deepTools: NGS Data Analysis Toolkit
Overview
deepTools is a comprehensive suite of Python command-line tools designed for processing and analyzing high-throughput sequencing data. Use deepTools to perform quality control, normalize data, compare samples, and generate publication-quality visualizations for ChIP-seq, RNA-seq, ATAC-seq, MNase-seq, and other NGS experiments.
Core capabilities:
- Convert BAM alignments to normalized coverage tracks (bigWig/bedGraph)
- Quality control assessment (fingerprint, correlation, coverage)
- Sample comparison and correlation analysis
- Heatmap and profile plot generation around genomic features
- Enrichment analysis and peak region visualization
When to Use This Skill
This skill should be used when:
- File conversion: "Convert BAM to bigWig", "generate coverage tracks", "normalize ChIP-seq data"
- Quality control: "check ChIP quality", "compare replicates", "assess sequencing depth", "QC analysis"
- Visualization: "create heatmap around TSS", "plot ChIP signal", "visualize enrichment", "generate profile plot"
- Sample comparison: "compare treatment vs control", "correlate samples", "PCA analysis"
- Analysis workflows: "analyze ChIP-seq data", "RNA-seq coverage", "ATAC-seq analysis", "complete workflow"
- Working with specific file types: BAM files, bigWig files, BED region files in genomics context
Quick Start
For users new to deepTools, start with file validation and common workflows:
- Validate Input Files
Before running any analysis, validate BAM, bigWig, and BED files using the validation script:
python scripts/validate_files.py --bam sample1.bam sample2.bam --bed regions.bed
This checks file existence, BAM indices, and format correctness.
- Generate Workflow Template
For standard analyses, use the workflow generator to create customized scripts:
# List available workflows
python scripts/workflow_generator.py --list
# Generate ChIP-seq QC workflow
python scripts/workflow_generator.py chipseq_qc -o qc_workflow.sh \
    --input-bam Input.bam --chip-bams "ChIP1.bam ChIP2.bam" \
    --genome-size 2913022398
# Make executable and run
chmod +x qc_workflow.sh
./qc_workflow.sh
- Most Common Operations
See assets/quick_reference.md for frequently used commands and parameters.
Installation
uv pip install deeptools
Core Workflows
deepTools workflows typically follow this pattern: QC → Normalization → Comparison/Visualization
ChIP-seq Quality Control Workflow
When users request ChIP-seq QC or quality assessment:
- Generate workflow script using scripts/workflow_generator.py chipseq_qc
- Key QC steps:
  - Sample correlation (multiBamSummary + plotCorrelation)
  - PCA analysis (plotPCA)
  - Coverage assessment (plotCoverage)
  - Fragment size validation (bamPEFragmentSize)
  - ChIP enrichment strength (plotFingerprint)
Interpreting results:
- Correlation: Replicates should cluster together with high correlation (>0.9)
- Fingerprint: Strong ChIP shows steep rise; flat diagonal indicates poor enrichment
- Coverage: Assess if sequencing depth is adequate for analysis
Full workflow details in references/workflows.md → "ChIP-seq Quality Control Workflow"
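The >0.9 replicate threshold above can be checked numerically. The following is a minimal sketch, assuming per-bin read counts for each sample are already available as plain Python lists (for example, exported from a multiBamSummary run). The function names and the toy counts are hypothetical and not part of deepTools.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length count vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def replicates_agree(counts_a, counts_b, threshold=0.9):
    """True when two samples correlate above the chosen threshold."""
    return pearson(counts_a, counts_b) > threshold


# Toy per-bin counts (made up for illustration)
rep1 = [10, 52, 8, 120, 33, 5]
rep2 = [12, 49, 9, 110, 30, 7]
input_control = [20, 22, 19, 21, 20, 23]

print("rep1 vs rep2 agree:", replicates_agree(rep1, rep2))
print("rep1 vs input agree:", replicates_agree(rep1, input_control))
```

In practice plotCorrelation computes and plots these values directly; the sketch only shows what the threshold is applied to.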
ChIP-seq Complete Analysis Workflow
For full ChIP-seq analysis from BAM to visualizations:
- Generate coverage tracks with normalization (bamCoverage)
- Create comparison tracks (bamCompare for log2 ratio)
- Compute signal matrices around features (computeMatrix)
- Generate visualizations (plotHeatmap, plotProfile)
- Enrichment analysis at peaks (plotEnrichment)
Use scripts/workflow_generator.py chipseq_analysis to generate template.
Complete command sequences in references/workflows.md → "ChIP-seq Analysis Workflow"
RNA-seq Coverage Workflow
For strand-specific RNA-seq coverage tracks:
Use bamCoverage with --filterRNAstrand to separate forward and reverse strands.
Important: NEVER use --extendReads for RNA-seq (would extend over splice junctions).
Use normalization: CPM for fixed bins, RPKM for gene-level analysis.
Template available: scripts/workflow_generator.py rnaseq_coverage
Details in references/workflows.md → "RNA-seq Coverage Workflow"
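The strand-specific guidance above can be sketched as a small command builder. This is an illustrative helper, not the output of scripts/workflow_generator.py: the function name, the output naming scheme, and the file names are assumptions. It uses only the bamCoverage options mentioned in this document and deliberately omits --extendReads.

```python
import shlex


def stranded_coverage_commands(bam, prefix, threads=8):
    """Build one bamCoverage command per strand for a stranded RNA-seq BAM."""
    commands = []
    for strand in ("forward", "reverse"):
        commands.append([
            "bamCoverage",
            "--bam", bam,
            "--outFileName", f"{prefix}.{strand}.bw",
            "--filterRNAstrand", strand,   # keep reads from one strand only
            "--normalizeUsing", "CPM",     # CPM for fixed-size bins
            "--numberOfProcessors", str(threads),
            # no --extendReads: extension would run across splice junctions
        ])
    return commands


for command in stranded_coverage_commands("rnaseq.bam", "rnaseq"):
    print(shlex.join(command))
```

Each list can be passed to subprocess.run(command, check=True) once the printed commands look right.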
ATAC-seq Analysis Workflow
ATAC-seq requires Tn5 offset correction:
- Shift reads using alignmentSieve with --ATACshift
- Generate coverage with bamCoverage
- Analyze fragment sizes (expect nucleosome ladder pattern)
- Visualize at peaks if available
Template: scripts/workflow_generator.py atacseq
Full workflow in references/workflows.md → "ATAC-seq Workflow"
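To make the Tn5 offset correction concrete, here is a minimal sketch of the coordinate arithmetic. It assumes the commonly used offsets of +4 and -5 bp, which the alignmentSieve help describes as equivalent to --shift 4 -5 5 -4; treat those numbers as an assumption and confirm them with alignmentSieve --help for your installed version. The function name and the coordinate convention (0-based, half-open) are hypothetical.

```python
def atac_shift(start, end, read1_on_left=True):
    """Apply a Tn5-style offset to one fragment.

    start/end are 0-based, half-open fragment coordinates.
    Assumed offsets: (+4, -5) when read 1 is the left-most mate,
    (+5, -4) otherwise.
    """
    left_shift, right_shift = (4, -5) if read1_on_left else (5, -4)
    new_start = start + left_shift
    new_end = end + right_shift
    if new_end <= new_start:
        raise ValueError("fragment is too short to shift")
    return new_start, new_end


print(atac_shift(100, 300))
print(atac_shift(100, 300, read1_on_left=False))
```

alignmentSieve performs this shift on the BAM records themselves; the sketch only shows the effect on fragment boundaries.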
Tool Categories and Common Tasks
BAM/bigWig Processing
Convert BAM to normalized coverage:
bamCoverage --bam input.bam --outFileName output.bw \
    --normalizeUsing RPGC --effectiveGenomeSize 2913022398 \
    --binSize 10 --numberOfProcessors 8
Compare two samples (log2 ratio):
bamCompare -b1 treatment.bam -b2 control.bam -o ratio.bw \
    --operation log2 --scaleFactorsMethod readCount
Key tools: bamCoverage, bamCompare, multiBamSummary, multiBigwigSummary, correctGCBias, alignmentSieve
Complete reference: references/tools_reference.md → "BAM and bigWig File Processing Tools"
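The log2 comparison with read-count scaling can be illustrated with a short worked example. This is a sketch under stated assumptions: the larger library is scaled down to the size of the smaller one, and a pseudocount of 1 is added to both values before taking the ratio. Check bamCompare --help for the exact defaults of your version. Function names and numbers are hypothetical.

```python
from math import log2


def read_count_scale_factors(n_treatment, n_control):
    """Scale the larger library down to the size of the smaller one."""
    smaller = min(n_treatment, n_control)
    return smaller / n_treatment, smaller / n_control


def log2_ratio(treatment, control, sf_treatment, sf_control, pseudocount=1.0):
    """log2 of scaled treatment over scaled control for a single bin."""
    scaled_t = treatment * sf_treatment + pseudocount
    scaled_c = control * sf_control + pseudocount
    return log2(scaled_t / scaled_c)


# Treatment was sequenced twice as deeply as the control
sf_t, sf_c = read_count_scale_factors(20_000_000, 10_000_000)
print("scale factors:", sf_t, sf_c)
print("log2 ratio for one bin:", log2_ratio(62, 15, sf_t, sf_c))
```

The pseudocount keeps the ratio defined in bins where the control has no reads.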
Quality Control
Check ChIP enrichment:
plotFingerprint -b input.bam chip.bam -o fingerprint.png \
    --extendReads 200 --ignoreDuplicates
Sample correlation:
multiBamSummary bins --bamfiles *.bam -o counts.npz
plotCorrelation -in counts.npz --corMethod pearson \
    --whatToPlot heatmap -o correlation.png
Key tools: plotFingerprint, plotCoverage, plotCorrelation, plotPCA, bamPEFragmentSize
Complete reference: references/tools_reference.md → "Quality Control Tools"
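The idea behind a fingerprint plot can be sketched in a few lines: rank the bins by read count and accumulate the fraction of reads they contain. This is a simplified illustration of the concept, not plotFingerprint's implementation; the function name and the toy counts are hypothetical.

```python
def fingerprint_curve(bin_counts):
    """Cumulative fraction of reads across bins sorted by increasing count."""
    total = sum(bin_counts)
    if total == 0:
        raise ValueError("no reads in any bin")
    curve = []
    running = 0
    for count in sorted(bin_counts):
        running += count
        curve.append(running / total)
    return curve


# Evenly covered sample: the curve follows the diagonal
print(fingerprint_curve([5, 5, 5, 5]))
# Enriched sample: most reads sit in a few bins, so the curve rises late and steeply
print(fingerprint_curve([1, 1, 1, 17]))
```

A curve that stays low across most bins and then climbs sharply corresponds to the strong enrichment pattern described in the interpretation notes.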
Visualization
Create heatmap around TSS:
# Compute matrix
computeMatrix reference-point -S signal.bw -R genes.bed \
    -b 3000 -a 3000 --referencePoint TSS -o matrix.gz
# Generate heatmap
plotHeatmap -m matrix.gz -o heatmap.png \
    --colorMap RdBu --kmeans 3
Create profile plot:
plotProfile -m matrix.gz -o profile.png \
    --plotType lines --colors blue red
Key tools: computeMatrix, plotHeatmap, plotProfile, plotEnrichment
Complete reference: references/tools_reference.md → "Visualization Tools"
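The two-step heatmap process (computeMatrix, then plotHeatmap) can be wrapped so that the matrix file name is shared between both commands. The helper below is a hypothetical sketch that uses only options shown in this section; the function name and default file names are assumptions.

```python
import shlex


def tss_heatmap_commands(bigwig, regions_bed, out_png,
                         flank=3000, matrix="matrix.gz"):
    """Return the computeMatrix and plotHeatmap commands for a TSS heatmap."""
    compute = [
        "computeMatrix", "reference-point",
        "-S", bigwig,
        "-R", regions_bed,
        "-b", str(flank), "-a", str(flank),   # bp upstream / downstream
        "--referencePoint", "TSS",
        "-o", matrix,
    ]
    plot = [
        "plotHeatmap",
        "-m", matrix,                         # same matrix written above
        "-o", out_png,
        "--colorMap", "RdBu",
    ]
    return [compute, plot]


for command in tss_heatmap_commands("signal.bw", "genes.bed", "heatmap.png"):
    print(shlex.join(command))
```

To execute the steps in order, pass each list to subprocess.run(command, check=True) so that a failing computeMatrix stops the run before plotting.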
Normalization Methods
Choosing the correct normalization is critical for valid comparisons. Consult references/normalization_methods.md for comprehensive guidance.
Quick selection guide:
- ChIP-seq coverage: Use RPGC or CPM
- ChIP-seq comparison: Use bamCompare with log2 and readCount
- RNA-seq bins: Use CPM
- RNA-seq genes: Use RPKM (accounts for gene length)
- ATAC-seq: Use RPGC or CPM
Normalization methods:
- RPGC: Reads per genomic content, scaled to 1× genome coverage (requires --effectiveGenomeSize)
- CPM: Counts per million mapped reads
- RPKM: Reads per kb per million mapped reads (accounts for region length)
- BPM: Bins per million mapped reads (analogous to TPM in RNA-seq)
- None: Raw counts (not recommended for comparisons)
Full explanation: references/normalization_methods.md
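As a worked illustration of the methods listed above, the sketch below applies the per-bin formulas as described in the bamCoverage help text. The function names and the example numbers are hypothetical; consult references/normalization_methods.md for the authoritative definitions.

```python
def cpm(reads_in_bin, total_mapped_reads):
    """Counts per million mapped reads."""
    return reads_in_bin / (total_mapped_reads / 1e6)


def rpkm(reads_in_bin, total_mapped_reads, bin_length_bp):
    """Reads per kilobase of bin per million mapped reads."""
    return reads_in_bin / ((total_mapped_reads / 1e6) * (bin_length_bp / 1e3))


def bpm(reads_in_bin, sum_of_reads_over_all_bins):
    """Bins per million: bin count relative to the sum over all bins."""
    return reads_in_bin / (sum_of_reads_over_all_bins / 1e6)


def rpgc(reads_in_bin, total_mapped_reads, fragment_length, effective_genome_size):
    """Scale so that the genome-wide average coverage becomes 1x."""
    average_coverage = total_mapped_reads * fragment_length / effective_genome_size
    return reads_in_bin / average_coverage


print("CPM :", cpm(50, 10_000_000))
print("RPKM:", rpkm(50, 10_000_000, 500))
print("BPM :", bpm(50, 25_000_000))
print("RPGC:", rpgc(30, 20_000_000, 200, 2_000_000_000))
```

The RPGC example has an average coverage of 2×, so a bin with 30 reads is reported as 15.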
Effective Genome Sizes
RPGC normalization requires effective genome size. Common values:
| Organism | Assembly | Size | Usage |
|---|---|---|---|
| Human | GRCh38/hg38 | 2,913,022,398 | --effectiveGenomeSize 2913022398 |
| Mouse | GRCm38/mm10 | 2,652,783,500 | --effectiveGenomeSize 2652783500 |
| Zebrafish | GRCz11 | 1,368,780,147 | --effectiveGenomeSize 1368780147 |
| Drosophila | dm6 | 142,573,017 | --effectiveGenomeSize 142573017 |
| C. elegans | ce10/ce11 | 100,286,401 | --effectiveGenomeSize 100286401 |
Complete table with read-length-specific values: references/effective_genome_sizes.md
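The table above can be turned into a small lookup so that the value is not retyped by hand. This is a hypothetical helper: the dictionary holds only the values listed in this section, and the function name is an assumption.

```python
# Values copied from the table above; keys are lower-cased assembly names
EFFECTIVE_GENOME_SIZES = {
    "grch38": 2913022398, "hg38": 2913022398,
    "grcm38": 2652783500, "mm10": 2652783500,
    "grcz11": 1368780147,
    "dm6": 142573017,
    "ce10": 100286401, "ce11": 100286401,
}


def rpgc_args(assembly):
    """Return the RPGC normalization arguments for a known assembly."""
    key = assembly.lower()
    if key not in EFFECTIVE_GENOME_SIZES:
        raise KeyError(f"no effective genome size listed for {assembly!r}")
    return [
        "--normalizeUsing", "RPGC",
        "--effectiveGenomeSize", str(EFFECTIVE_GENOME_SIZES[key]),
    ]


print(" ".join(rpgc_args("hg38")))
print(" ".join(rpgc_args("mm10")))
```

For read-length-specific values, extend the dictionary from references/effective_genome_sizes.md rather than reusing these defaults.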
Common Parameters Across Tools
Many deepTools commands share these options:
Performance:
- --numberOfProcessors, -p: Enable parallel processing (always use available cores)
- --region: Process specific regions for testing (e.g., chr1:1-1000000)
Read Filtering:
- --ignoreDuplicates: Count reads with the same orientation and start position only once, which excludes likely PCR duplicates (recommended for most analyses)
- --minMappingQuality: Filter by alignment quality (e.g., --minMappingQuality 10)
- --minFragmentLength / --maxFragmentLength: Fragment length bounds
- --samFlagInclude / --samFlagExclude: SAM flag filtering
Read Processing:
- --extendReads: Extend to fragment length (ChIP-seq: YES, RNA-seq: NO)
- --centerReads: Center at fragment midpoint for sharper signals
Best Practices
File Validation
Always validate files first using scripts/validate_files.py to check:
- File existence and readability
- BAM indices present (.bai files)
- BED format correctness
- File sizes reasonable
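The checks listed above can be sketched as follows. This is an illustrative approximation and not the contents of scripts/validate_files.py; the function names, the messages, and the demo files are all hypothetical. The demo writes placeholder files to a temporary directory so that the sketch runs without real sequencing data.

```python
import os
import tempfile


def check_bam(path):
    """Return a list of problems for one BAM file (empty list means OK)."""
    if not os.path.isfile(path):
        return [f"{path}: file not found"]
    problems = []
    if os.path.getsize(path) == 0:
        problems.append(f"{path}: file is empty")
    # An index is usually named sample.bam.bai; sample.bai is also seen
    candidates = [path + ".bai", os.path.splitext(path)[0] + ".bai"]
    if not any(os.path.isfile(candidate) for candidate in candidates):
        problems.append(f"{path}: no .bai index (run: samtools index {path})")
    return problems


def check_bed(path):
    """Return a list of problems for one BED file (empty list means OK)."""
    if not os.path.isfile(path):
        return [f"{path}: file not found"]
    problems = []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue  # skip blank lines and header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                problems.append(f"{path}:{number}: fewer than 3 columns")
            elif not (fields[1].isdigit() and fields[2].isdigit()):
                problems.append(f"{path}:{number}: non-integer coordinates")
            elif int(fields[1]) >= int(fields[2]):
                problems.append(f"{path}:{number}: start >= end")
    return problems


# Demo on placeholder files: a BAM without index and a BED with one bad line
with tempfile.TemporaryDirectory() as tmp:
    bam_path = os.path.join(tmp, "sample.bam")
    bed_path = os.path.join(tmp, "regions.bed")
    with open(bam_path, "wb") as handle:
        handle.write(b"placeholder")
    with open(bed_path, "w") as handle:
        handle.write("chr1\t100\t200\nchr1\t500\t400\n")
    bam_problems = check_bam(bam_path)
    bed_problems = check_bed(bed_path)

print(bam_problems)
print(bed_problems)
```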
Analysis Strategy
- Start with QC: Run correlation, coverage, and fingerprint analysis before proceeding
- Test on small regions: Use --region chr1:1-10000000 for parameter testing
- Document commands: Save full command lines for reproducibility
- Use consistent normalization: Apply same method across samples in comparisons
- Verify genome assembly: Ensure BAM and BED files use matching genome builds
ChIP-seq Specific
- Always extend reads for ChIP-seq: --extendReads 200
- Remove duplicates: Use --ignoreDuplicates in most cases
- Check enrichment first: Run plotFingerprint before detailed analysis
- GC correction: Only apply if significant bias detected; never use --ignoreDuplicates after GC correction
RNA-seq Specific
- Never extend reads for RNA-seq (would span splice junctions)
- Strand-specific: Use --filterRNAstrand forward/reverse for stranded libraries
- Normalization: CPM for bins, RPKM for genes
ATAC-seq Specific
- Apply Tn5 correction: Use alignmentSieve with --ATACshift
- Fragment filtering: Set appropriate min/max fragment lengths
- Check nucleosome pattern: Fragment size plot should show ladder pattern
Performance Optimization
- Use multiple processors: --numberOfProcessors 8 (or available cores)
- Increase bin size for faster processing and smaller files
- Process chromosomes separately for memory-limited systems
- Pre-filter BAM files using alignmentSieve to create reusable filtered files
- Use bigWig over bedGraph: Compressed and faster to process
Troubleshooting
Common Issues
BAM index missing:
samtools index input.bam
Out of memory: Process chromosomes individually using --region:
bamCoverage --bam input.bam -o chr1.bw --region chr1
Slow processing: Increase --numberOfProcessors and/or increase --binSize
bigWig files too large: Increase bin size: --binSize 50 or larger
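The per-chromosome approach for memory-limited systems can be scripted. The sketch below is a hypothetical helper that generates one bamCoverage command per chromosome with a larger bin size; the function name, the output naming, and the chromosome list are assumptions. Combining the per-chromosome outputs afterwards is not covered here.

```python
import shlex


def per_chromosome_commands(bam, chromosomes, bin_size=50, threads=8):
    """Build one bamCoverage command per chromosome to limit memory use."""
    commands = []
    for chrom in chromosomes:
        commands.append([
            "bamCoverage",
            "--bam", bam,
            "-o", f"{chrom}.bw",
            "--region", chrom,            # restrict work to one chromosome
            "--binSize", str(bin_size),   # larger bins give smaller, faster output
            "--numberOfProcessors", str(threads),
        ])
    return commands


for command in per_chromosome_commands("input.bam", ["chr1", "chr2", "chrX"]):
    print(shlex.join(command))
```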
Validation Errors
Run validation script to identify issues:
python scripts/validate_files.py --bam *.bam --bed regions.bed
Common errors and their solutions are explained in the script output.
Reference Documentation
This skill includes comprehensive reference documentation:
references/tools_reference.md
Complete documentation of all deepTools commands organized by category:
- BAM and bigWig processing tools (9 tools)
- Quality control tools (6 tools)
- Visualization tools (3 tools)
- Miscellaneous tools (2 tools)
Each tool includes:
- Purpose and overview
- Key parameters with explanations
- Usage examples
- Important notes and best practices
Use this reference when: Users ask about specific tools, parameters, or detailed usage.
references/workflows.md
Complete workflow examples for common analyses:
- ChIP-seq quality control workflow
- ChIP-seq complete analysis workflow
- RNA-seq coverage workflow
- ATAC-seq analysis workflow
- Multi-sample comparison workflow
- Peak region analysis workflow
- Troubleshooting and performance tips
Use this reference when: Users need complete analysis pipelines or workflow examples.
references/normalization_methods.md
Comprehensive guide to normalization methods:
- Detailed explanation of each method (RPGC, CPM, RPKM, BPM, etc.)
- When to use each method
- Formulas and interpretation
- Selection guide by experiment type
- Common pitfalls and solutions
- Quick reference table
Use this reference when: Users ask about normalization, comparing samples, or which method to use.
references/effective_genome_sizes.md
Effective genome size values and usage:
- Common organism values (human, mouse, fly, worm, zebrafish)
- Read-length-specific values
- Calculation methods
- When and how to use in commands
- Custom genome calculation instructions
Use this reference when: Users need genome size for RPGC normalization or GC bias correction.
Helper Scripts
scripts/validate_files.py
Validates BAM, bigWig, and BED files for deepTools analysis. Checks file existence, indices, and format.
Usage:
python scripts/validate_files.py --bam sample1.bam sample2.bam \
    --bed peaks.bed --bigwig signal.bw
When to use: Before starting any analysis, or when troubleshooting errors.
scripts/workflow_generator.py
Generates customizable bash script templates for common deepTools workflows.
Available workflows:
- chipseq_qc: ChIP-seq quality control
- chipseq_analysis: Complete ChIP-seq analysis
- rnaseq_coverage: Strand-specific RNA-seq coverage
- atacseq: ATAC-seq with Tn5 correction
Usage:
# List workflows
python scripts/workflow_generator.py --list
# Generate workflow
python scripts/workflow_generator.py chipseq_qc -o qc.sh \
    --input-bam Input.bam --chip-bams "ChIP1.bam ChIP2.bam" \
    --genome-size 2913022398 --threads 8
# Run generated workflow
chmod +x qc.sh
./qc.sh
When to use: Users request standard workflows or need template scripts to customize.
Assets
assets/quick_reference.md
Quick reference card with most common commands, effective genome sizes, and typical workflow pattern.
When to use: Users need quick command examples without detailed documentation.
Handling User Requests
For New Users
- Start with installation verification
- Validate input files using scripts/validate_files.py
- Recommend appropriate workflow based on experiment type
- Generate workflow template using scripts/workflow_generator.py
- Guide through customization and execution
For Experienced Users
- Provide specific tool commands for requested operations
- Reference appropriate sections in references/tools_reference.md
- Suggest optimizations and best practices
- Offer troubleshooting for issues
For Specific Tasks
"Convert BAM to bigWig":
- Use bamCoverage with appropriate normalization
- Recommend RPGC or CPM based on use case
- Provide effective genome size for organism
- Suggest relevant parameters (extendReads, ignoreDuplicates, binSize)
"Check ChIP quality":
- Run full QC workflow or use plotFingerprint specifically
- Explain interpretation of results
- Suggest follow-up actions based on results
"Create heatmap":
- Guide through two-step process: computeMatrix → plotHeatmap
- Help choose appropriate matrix mode (reference-point vs scale-regions)
- Suggest visualization parameters and clustering options
"Compare samples":
- Recommend bamCompare for two-sample comparison
- Suggest multiBamSummary + plotCorrelation for multiple samples
- Guide normalization method selection
Referencing Documentation
When users need detailed information:
- Tool details: Direct to specific sections in references/tools_reference.md
- Workflows: Use references/workflows.md for complete analysis pipelines
- Normalization: Consult references/normalization_methods.md for method selection
- Genome sizes: Reference references/effective_genome_sizes.md
Search references using grep patterns:
# Find tool documentation
grep -A 20 "^### toolname" references/tools_reference.md
# Find workflow
grep -A 50 "^## Workflow Name" references/workflows.md
# Find normalization method
grep -A 15 "^### Method Name" references/normalization_methods.md
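Where grep is unavailable, the same lookup can be approximated in Python. The sketch below returns the first heading that starts with a given prefix plus a fixed number of following lines; unlike grep -A, it stops at the first match. The function name and the sample lines are hypothetical and stand in for the contents of a reference file.

```python
def grep_after(lines, heading_prefix, after=20):
    """Return the first line starting with heading_prefix plus `after` lines."""
    for index, line in enumerate(lines):
        if line.startswith(heading_prefix):
            return lines[index:index + after + 1]
    return []


# Stand-in for open("references/tools_reference.md").read().splitlines()
sample_reference = [
    "## BAM and bigWig File Processing Tools",
    "### bamCoverage",
    "Purpose: convert BAM to coverage track",
    "Key parameters: (placeholder text)",
    "### bamCompare",
    "Purpose: compare two BAM files",
]

for line in grep_after(sample_reference, "### bamCoverage", after=2):
    print(line)
```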
Example Interactions
User: "I need to analyze my ChIP-seq data"
Response approach:
- Ask about files available (BAM files, peaks, genes)
- Validate files using validation script
- Generate chipseq_analysis workflow template
- Customize for their specific files and organism
- Explain each step as script runs
User: "Which normalization should I use?"
Response approach:
- Ask about experiment type (ChIP-seq, RNA-seq, etc.)
- Ask about comparison goal (within-sample or between-sample)
- Consult references/normalization_methods.md selection guide
- Recommend appropriate method with justification
- Provide command example with parameters
User: "Create a heatmap around TSS"
Response approach:
- Verify bigWig and gene BED files available
- Use computeMatrix with reference-point mode at TSS
- Generate plotHeatmap with appropriate visualization parameters
- Suggest clustering if dataset is large
- Offer profile plot as complement
Key Reminders
- File validation first: Always validate input files before analysis
- Normalization matters: Choose appropriate method for comparison type
- Extend reads carefully: YES for ChIP-seq, NO for RNA-seq
- Use all cores: Set --numberOfProcessors to available cores
- Test on regions: Use --region for parameter testing
- Check QC first: Run quality control before detailed analysis
- Document everything: Save commands for reproducibility
- Reference documentation: Use comprehensive references for detailed guidance