deepTools: NGS Data Analysis Toolkit
Overview
deepTools is a comprehensive suite of Python command-line tools designed for processing and analyzing high-throughput sequencing data. Use deepTools to perform quality control, normalize data, compare samples, and generate publication-quality visualizations for ChIP-seq, RNA-seq, ATAC-seq, MNase-seq, and other NGS experiments.
Core capabilities:
- Convert BAM alignments to normalized coverage tracks (bigWig/bedGraph)
- Quality control assessment (fingerprint, correlation, coverage)
- Sample comparison and correlation analysis
- Heatmap and profile plot generation around genomic features
- Enrichment analysis and peak region visualization
When to Use This Skill
Use this skill when a request involves any of the following:
- File conversion: "Convert BAM to bigWig", "generate coverage tracks", "normalize ChIP-seq data"
- Quality control: "check ChIP quality", "compare replicates", "assess sequencing depth", "QC analysis"
- Visualization: "create heatmap around TSS", "plot ChIP signal", "visualize enrichment", "generate profile plot"
- Sample comparison: "compare treatment vs control", "correlate samples", "PCA analysis"
- Analysis workflows: "analyze ChIP-seq data", "RNA-seq coverage", "ATAC-seq analysis", "complete workflow"
- Specific file types: BAM, bigWig, or BED region files in a genomics context
Quick Start
For users new to deepTools, start with file validation and common workflows:
1. Validate Input Files
Before running any analysis, validate BAM, bigWig, and BED files using the validation script:
python scripts/validate_files.py --bam sample1.bam sample2.bam --bed regions.bed
This checks file existence, BAM indices, and format correctness.
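To make the index check concrete, here is a minimal sketch of the kind of test involved. It is not the code of scripts/validate_files.py; the function name `bam_problems` and the two index naming patterns it accepts are assumptions chosen for illustration.

```python
from pathlib import Path


def bam_problems(bam_path):
    """Return a list of human-readable problems for one BAM path (empty if none).

    Hypothetical helper: it only checks that the file and an index exist,
    not that the BAM content is well-formed.
    """
    bam = Path(bam_path)
    if not bam.is_file():
        return [f"{bam}: file not found"]
    # `samtools index sample.bam` writes sample.bam.bai; some pipelines
    # produce sample.bai instead, so accept either name.
    candidates = [Path(str(bam) + ".bai"), bam.with_suffix(".bai")]
    if not any(c.is_file() for c in candidates):
        return [f"{bam}: no .bai index found (run `samtools index`)"]
    return []
```

deepTools needs random access to BAM files, so a missing index is worth catching before a long job starts.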
2. Generate Workflow Template
For standard analyses, use the workflow generator to create customized scripts:
# List available workflows
python scripts/workflow_generator.py --list
# Generate ChIP-seq QC workflow
python scripts/workflow_generator.py chipseq_qc -o qc_workflow.sh \
--input-bam Input.bam --chip-bams "ChIP1.bam ChIP2.bam" \
--genome-size 2913022398
# Make executable and run
chmod +x qc_workflow.sh
./qc_workflow.sh
3. Most Common Operations
See assets/quick_reference.md for frequently used commands and parameters.
Installation
uv pip install deeptools
Core Workflows
deepTools workflows typically follow this pattern: QC → Normalization → Comparison/Visualization
ChIP-seq Quality Control Workflow
When users request ChIP-seq QC or quality assessment:
- Generate workflow script using scripts/workflow_generator.py chipseq_qc
- Key QC steps:
  - Sample correlation (multiBamSummary + plotCorrelation)
  - PCA analysis (plotPCA)
  - Coverage assessment (plotCoverage)
  - Fragment size validation (bamPEFragmentSize)
  - ChIP enrichment strength (plotFingerprint)
Interpreting results:
- Correlation: Replicates should cluster together with high correlation (>0.9)
- Fingerprint: A strong ChIP curve stays low and then rises steeply at the highest-ranked bins; a curve close to the diagonal indicates poor enrichment
- Coverage: Assess whether sequencing depth is adequate for the planned analysis
Full workflow details in references/workflows.md → "ChIP-seq Quality Control Workflow"
ChIP-seq Complete Analysis Workflow
For full ChIP-seq analysis from BAM to visualizations:
- Generate coverage tracks with normalization (bamCoverage)
- Create comparison tracks (bamCompare for log2 ratio)
- Compute signal matrices around features (computeMatrix)
- Generate visualizations (plotHeatmap, plotProfile)
- Enrichment analysis at peaks (plotEnrichment)
Use scripts/workflow_generator.py chipseq_analysis to generate a template.
Complete command sequences in references/workflows.md → "ChIP-seq Analysis Workflow"
RNA-seq Coverage Workflow
For strand-specific RNA-seq coverage tracks:
Use bamCoverage with --filterRNAstrand to separate forward and reverse strands.
Important: NEVER use --extendReads for RNA-seq (extended reads would span splice junctions).
Normalization: use CPM for fixed-size bins and RPKM for gene-level analysis.
Template available: scripts/workflow_generator.py rnaseq_coverage
Details in references/workflows.md → "RNA-seq Coverage Workflow"
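The strand-specific recipe above can be sketched as a small command builder. This is an illustration only, not the output of the workflow generator; the function name, the output naming scheme, and the default bin size are assumptions.

```python
import shlex


def stranded_coverage_commands(bam, prefix, bin_size=10):
    """Build one bamCoverage command per strand, CPM-normalized.

    Deliberately omits --extendReads, which should not be used for RNA-seq.
    """
    commands = []
    for strand in ("forward", "reverse"):
        commands.append([
            "bamCoverage", "--bam", bam,
            "--outFileName", f"{prefix}.{strand}.bw",  # hypothetical naming
            "--filterRNAstrand", strand,
            "--normalizeUsing", "CPM",
            "--binSize", str(bin_size),
        ])
    return commands


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them.
    for cmd in stranded_coverage_commands("rnaseq.bam", "rnaseq"):
        print(shlex.join(cmd))
```

Keeping each command as an argument list means it can be passed directly to `subprocess.run` without shell quoting problems.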
ATAC-seq Analysis Workflow
ATAC-seq requires Tn5 offset correction:
- Shift reads using alignmentSieve with --ATACshift
- Generate coverage with bamCoverage
- Analyze fragment sizes (expect nucleosome ladder pattern)
- Visualize at peaks if available
Template: scripts/workflow_generator.py atacseq
Full workflow in references/workflows.md → "ATAC-seq Workflow"
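As a rough illustration of what the Tn5 offset correction does to coordinates, the sketch below applies the commonly cited convention of moving the left end of a fragment by +4 bp and the right end by -5 bp. It is a simplification for intuition, not the alignmentSieve implementation, and the coordinate convention (0-based, half-open) is an assumption.

```python
def tn5_shift(start, end):
    """Shift a fragment's ends toward the presumed Tn5 insertion sites.

    `start` and `end` are the fragment's leftmost and rightmost coordinates
    (assumed 0-based, half-open). Uses the common +4 / -5 convention.
    """
    return start + 4, end - 5


def fragment_length(start, end):
    """Length in bp of a half-open interval."""
    return end - start
```

For example, a 200 bp fragment at (100, 300) becomes (104, 295), which is 191 bp long after shifting.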
Tool Categories and Common Tasks
BAM/bigWig Processing
Convert BAM to normalized coverage:
bamCoverage --bam input.bam --outFileName output.bw \
--normalizeUsing RPGC --effectiveGenomeSize 2913022398 \
--binSize 10 --numberOfProcessors 8
Compare two samples (log2 ratio):
bamCompare -b1 treatment.bam -b2 control.bam -o ratio.bw \
--operation log2 --scaleFactorsMethod readCount
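To show what the log2 ratio with read-count scaling means numerically, here is a minimal sketch. It assumes that readCount scaling brings the larger sample down to the depth of the smaller one and that a pseudocount of 1 is added to both sides before taking the ratio; verify both assumptions against the bamCompare help for your installed version.

```python
import math


def read_count_scale_factors(mapped_reads_1, mapped_reads_2):
    """Scale factors that bring both samples to the depth of the smaller one."""
    smallest = min(mapped_reads_1, mapped_reads_2)
    return smallest / mapped_reads_1, smallest / mapped_reads_2


def log2_ratio(count_1, count_2, scale_1, scale_2, pseudocount=1.0):
    """log2 of the scaled per-bin counts; the pseudocount avoids division by zero."""
    return math.log2(
        (count_1 * scale_1 + pseudocount) / (count_2 * scale_2 + pseudocount)
    )
```

With 20 million treatment reads and 10 million control reads the factors are (0.5, 1.0). A bin with 30 treatment reads and 7 control reads then gives log2((15 + 1) / (7 + 1)) = 1.0, i.e. twofold enrichment after depth correction.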
Key tools: bamCoverage, bamCompare, multiBamSummary, multiBigwigSummary, correctGCBias, alignmentSieve
Complete reference: references/tools_reference.md → "BAM and bigWig File Processing Tools"
Quality Control
Check ChIP enrichment:
plotFingerprint -b input.bam chip.bam -o fingerprint.png \
--extendReads 200 --ignoreDuplicates
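The idea behind a fingerprint curve can be sketched in a few lines: sort the per-bin read counts, then plot the cumulative fraction of reads against bin rank. This is a conceptual sketch rather than the plotFingerprint implementation (which samples the genome and works on BAM files); the function name is hypothetical.

```python
def fingerprint_curve(bin_counts):
    """Cumulative fraction of reads after sorting bins from lowest to highest count."""
    ordered = sorted(bin_counts)
    total = sum(ordered)
    running = 0
    curve = []
    for count in ordered:
        running += count
        curve.append(running / total)
    return curve
```

Evenly distributed reads such as `[5, 5, 5, 5]` rise linearly along the diagonal, while a sample with reads concentrated in a few bins, such as `[1, 1, 1, 97]`, stays near zero and jumps at the last bin.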
Sample correlation:
multiBamSummary bins --bamfiles *.bam -o counts.npz
plotCorrelation -in counts.npz --corMethod pearson \
--whatToPlot heatmap -o correlation.png
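For intuition about the correlation values reported for replicates, the sketch below computes a Pearson coefficient from two vectors of per-bin read counts. The count vectors are made-up illustration data, and this is plain Python rather than anything deepTools runs internally.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical per-bin counts for two replicates of the same ChIP.
replicate_1 = [10, 50, 5, 80, 20]
replicate_2 = [12, 45, 7, 85, 18]
```

Here `pearson(replicate_1, replicate_2)` is above 0.9, the rough threshold suggested for good replicates. Pearson is sensitive to a few very high bins, so comparing it with a rank-based measure such as Spearman is a reasonable cross-check.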
Key tools: plotFingerprint, plotCoverage, plotCorrelation, plotPCA, bamPEFragmentSize
Complete reference: references/tools_reference.md → "Quality Control Tools"
Visualization
Create heatmap around TSS:
# Compute matrix
computeMatrix reference-point -S signal.bw -R genes.bed \
-b 3000 -a 3000 --referencePoint TSS -o matrix.gz
# Generate heatmap
plotHeatmap -m matrix.gz -o heatmap.png \
--colorMap RdBu --kmeans 3
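The window that reference-point mode with -b and -a describes depends on strand, which the following sketch makes explicit. It assumes BED-style 0-based, half-open coordinates, takes the TSS as the interval start on the plus strand and the interval end on the minus strand, and ignores off-by-one details at the minus-strand TSS; it is not computeMatrix's own code.

```python
def reference_point_window(start, end, strand, before=3000, after=3000):
    """Return (lo, hi) of the window around the TSS of one BED interval.

    `before` is upstream and `after` is downstream relative to the direction
    of transcription, so the two distances swap sides on the minus strand.
    """
    if strand == "-":
        tss = end
        lo, hi = tss - after, tss + before
    else:
        tss = start
        lo, hi = tss - before, tss + after
    return max(lo, 0), hi  # clamp at the chromosome start
```

For a gene at 10000-15000, the plus-strand window is (7000, 13000) and the minus-strand window is (12000, 18000). This is why a BED file with a correct strand column matters for TSS-centered heatmaps.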
Create profile plot:
plotProfile -m matrix.gz -o profile.png \
--plotType lines --colors blue red
Key tools: computeMatrix, plotHeatmap, plotProfile, plotEnrichment
Complete reference: references/tools_reference.md → "Visualization Tools"
Normalization Methods
Choosing the correct normalization is critical for valid comparisons. Consult references/normalization_methods.md for comprehensive guidance.
Quick selection guide:
- ChIP-seq coverage: Use RPGC or CPM
- ChIP-seq comparison: Use bamCompare with log2 and readCount
- RNA-seq bins: Use CPM
- RNA-seq genes: Use RPKM (accounts for gene length)
- ATAC-seq: Use RPGC or CPM
Normalization methods:
- RPGC: 1× genome coverage (requires --effectiveGenomeSize)
- CPM: Counts per million mapped reads
- RPKM: Reads per kb per million (accounts for region length)
- BPM: Bins per million mapped reads (analogous to TPM in RNA-seq)
- None: Raw counts (not recommended for comparisons)
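The per-bin arithmetic behind these methods can be written out as below. The formulas follow the definitions in the deepTools documentation as understood here; function and parameter names are illustrative, and the actual tools apply additional filtering (for example of duplicates or ignored chromosomes) before counting.

```python
def cpm(reads_in_bin, total_mapped_reads):
    """Counts per million mapped reads."""
    return reads_in_bin / (total_mapped_reads / 1e6)


def rpkm(reads_in_bin, total_mapped_reads, bin_length_bp):
    """Reads per kilobase of bin per million mapped reads."""
    return reads_in_bin / ((total_mapped_reads / 1e6) * (bin_length_bp / 1e3))


def bpm(reads_in_bin, all_bin_counts):
    """Bins per million: normalize by the sum over all bins instead of read total."""
    return reads_in_bin / (sum(all_bin_counts) / 1e6)


def rpgc(reads_in_bin, total_mapped_reads, fragment_length_bp, effective_genome_size):
    """Scale so that the genome-wide average coverage becomes 1x."""
    current_coverage = total_mapped_reads * fragment_length_bp / effective_genome_size
    return reads_in_bin / current_coverage
```

For example, 50 reads in a 10 bp bin from a library of 10 million mapped reads give a CPM of 5 and an RPKM of 500.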
Full explanation: references/normalization_methods.md
Effective Genome Sizes
RPGC normalization requires the effective genome size, i.e. the mappable portion of the genome. Common values:
| Organism | Assembly | Size | Usage |
|---|---|---|---|
| Human | GRCh38/hg38 | 2,913,022,398 | --effectiveGenomeSize 2913022398 |
| Mouse | GRCm38/mm10 | 2,652,783,500 | --effectiveGenomeSize 2652783500 |
| Zebrafish | GRCz11 | 1,368,780,147 | --effectiveGenomeSize 1368780147 |
| Drosophila | dm6 | 142,573,017 | --effectiveGenomeSize 142573017 |
| C. elegans | ce10/ce11 | 100,286,401 | --effectiveGenomeSize 100286401 |
Complete table with read-length-specific values: references/effective_genome_sizes.md
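When scripting several runs, a small lookup keeps the genome size consistent across commands. The sketch below copies the values from the table above; the dictionary name, the alias keys, and the helper function are assumptions for illustration.

```python
# Values copied from the table above; keys include common aliases.
EFFECTIVE_GENOME_SIZE = {
    "GRCh38": 2913022398, "hg38": 2913022398,
    "GRCm38": 2652783500, "mm10": 2652783500,
    "GRCz11": 1368780147,
    "dm6": 142573017,
    "ce10": 100286401, "ce11": 100286401,
}


def rpgc_args(assembly):
    """Return the bamCoverage arguments for RPGC normalization of one assembly."""
    try:
        size = EFFECTIVE_GENOME_SIZE[assembly]
    except KeyError:
        raise ValueError(f"no effective genome size recorded for {assembly!r}")
    return ["--normalizeUsing", "RPGC", "--effectiveGenomeSize", str(size)]
```

Raising an error for unknown assemblies is a deliberate choice here: silently falling back to another genome's size would produce tracks that look plausible but are scaled incorrectly.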
Common Parameters Across Tools
Many deepTools commands share these options:
Performance:
--numberOfProcessors, -p: Enable parallel processing (always