Methylation Pipeline
Complete workflow from bisulfite sequencing FASTQ to differentially methylated regions.
Workflow Overview
FASTQ files | v [1. QC & Trimming] -----> fastp/Trim Galore | v [2. Alignment] ---------> Bismark | v [3. Deduplication] -----> deduplicate_bismark | v [4. Methylation Calling] -> bismark_methylation_extractor | v [5. Analysis] -----------> methylKit (R) | v [6. DMR Detection] ------> methylKit/DSS | v Differentially methylated regions
Primary Path: Bismark + methylKit
Step 1: Quality Control
Trim Galore recommended for bisulfite data (handles adapter bias)
trim_galore --paired --fastqc
-o trimmed/
sample_R1.fastq.gz sample_R2.fastq.gz
Or fastp with conservative settings
fastp -i sample_R1.fastq.gz -I sample_R2.fastq.gz
-o trimmed/sample_R1.fq.gz -O trimmed/sample_R2.fq.gz
--detect_adapter_for_pe
--qualified_quality_phred 20
--length_required 35
--html qc/sample_fastp.html
Step 2: Bismark Alignment
Prepare genome (once)
bismark_genome_preparation --bowtie2 genome/
Align
bismark --genome genome/
-1 trimmed/sample_R1_val_1.fq.gz
-2 trimmed/sample_R2_val_2.fq.gz
-o aligned/
--parallel 4
--temp_dir tmp/
Output: sample_R1_val_1_bismark_bt2_pe.bam
QC Checkpoint: Check Bismark report
-
Mapping efficiency >50% (BS-seq has lower rates)
-
Bisulfite conversion rate >99%
Step 3: Deduplication
deduplicate_bismark
--bam
-p
-o deduplicated/
aligned/sample_R1_val_1_bismark_bt2_pe.bam
Step 4: Methylation Calling
bismark_methylation_extractor
--paired-end
--comprehensive
--bedGraph
--cytosine_report
--genome_folder genome/
-o methylation/
deduplicated/sample_R1_val_1_bismark_bt2_pe.deduplicated.bam
Generate summary report
bismark2report bismark2summary
Step 5: Analysis with methylKit
library(methylKit)
Read methylation calls
files <- list( 'methylation/control_1.CpG_report.txt', 'methylation/control_2.CpG_report.txt', 'methylation/treated_1.CpG_report.txt', 'methylation/treated_2.CpG_report.txt' )
sample_ids <- c('control_1', 'control_2', 'treated_1', 'treated_2') treatment <- c(0, 0, 1, 1)
Read cytosine reports
meth_obj <- methRead( location = as.list(files), sample.id = as.list(sample_ids), assembly = 'hg38', treatment = treatment, context = 'CpG', pipeline = 'bismarkCytosineReport' )
Filter by coverage
meth_filtered <- filterByCoverage(meth_obj, lo.count = 10, hi.perc = 99.9)
Normalize coverage
meth_norm <- normalizeCoverage(meth_filtered)
Merge samples (keep sites covered in all)
meth_merged <- unite(meth_norm, destrand = TRUE)
Sample statistics
getMethylationStats(meth_obj[[1]], plot = TRUE) getCoverageStats(meth_obj[[1]], plot = TRUE)
Step 6: DMR Detection
Calculate differential methylation (per CpG)
diff_meth <- calculateDiffMeth(meth_merged)
Get significant DMCs
dmc <- getMethylDiff(diff_meth, difference = 25, qvalue = 0.01)
Tile into regions (DMRs)
tiles <- tileMethylCounts(meth_merged, win.size = 1000, step.size = 1000) diff_tiles <- calculateDiffMeth(tiles) dmr <- getMethylDiff(diff_tiles, difference = 25, qvalue = 0.01)
Export
write.csv(as.data.frame(dmc), 'dmc_results.csv') write.csv(as.data.frame(dmr), 'dmr_results.csv')
Annotate with genomic features
library(genomation) gene_obj <- readTranscriptFeatures('genes.bed') annotateWithGeneParts(as(dmr, 'GRanges'), gene_obj)
Parameter Recommendations
Step Parameter Value
Trim Galore default Recommended for BS-seq
Bismark --parallel 4 (per sample parallelization)
methylKit lo.count 10 (minimum coverage)
methylKit difference 25 (% methylation difference)
methylKit qvalue 0.01
DMR tiles win.size 500-1000 bp
Troubleshooting
Issue Likely Cause Solution
Low mapping rate Normal for BS-seq Expect 40-70%
Low conversion Failed bisulfite treatment Check spike-in controls
Few DMRs Low coverage, small differences Increase sequencing, relax thresholds
Biased positions M-bias Trim 10bp from read ends
Complete Pipeline Script
#!/bin/bash set -e
THREADS=4 GENOME="genome/" SAMPLES="control_1 control_2 treated_1 treated_2" OUTDIR="methylation_results"
mkdir -p ${OUTDIR}/{trimmed,aligned,deduplicated,methylation,qc}
Step 1: QC
for sample in $SAMPLES; do
trim_galore --paired --fastqc -o ${OUTDIR}/trimmed/
${sample}_R1.fastq.gz ${sample}_R2.fastq.gz
done
Step 2: Alignment
for sample in $SAMPLES; do
bismark --genome ${GENOME}
-1 ${OUTDIR}/trimmed/${sample}_R1_val_1.fq.gz
-2 ${OUTDIR}/trimmed/${sample}_R2_val_2.fq.gz
-o ${OUTDIR}/aligned/
--parallel ${THREADS} --temp_dir tmp/
done
Step 3: Deduplication
for sample in $SAMPLES; do
deduplicate_bismark --bam -p
-o ${OUTDIR}/deduplicated/
${OUTDIR}/aligned/${sample}_R1_val_1_bismark_bt2_pe.bam
done
Step 4: Methylation calling
for sample in $SAMPLES; do
bismark_methylation_extractor --paired-end --comprehensive
--bedGraph --cytosine_report
--genome_folder ${GENOME}
-o ${OUTDIR}/methylation/
${OUTDIR}/deduplicated/${sample}_R1_val_1_bismark_bt2_pe.deduplicated.bam
done
bismark2report echo "Pipeline complete. Run R script for DMR analysis."
Related Skills
-
methylation-analysis/bismark-alignment - Bismark parameters
-
methylation-analysis/methylation-calling - Calling details
-
methylation-analysis/methylkit-analysis - methylKit functions
-
methylation-analysis/dmr-detection - DMR algorithms