TF Footprinting
TOBIAS Workflow
1. Correct Tn5 bias
tobias ATACorrect
--bam sample.bam
--genome genome.fa
--peaks peaks.bed
--outdir corrected/
--cores 8
2. Calculate footprint scores
tobias FootprintScores
--signal corrected/sample_corrected.bw
--regions peaks.bed
--output footprints.bw
--cores 8
3. Bind TF motifs
tobias BINDetect
--motifs JASPAR_motifs.pfm
--signals footprints.bw
--genome genome.fa
--peaks peaks.bed
--outdir bindetect_output/
--cores 8
TOBIAS Differential Footprinting
Compare conditions
tobias BINDetect
--motifs JASPAR_motifs.pfm
--signals condition1.bw condition2.bw
--genome genome.fa
--peaks consensus_peaks.bed
--outdir differential_footprints/
--cond_names condition1 condition2
--cores 8
Output includes:
- Differential binding scores
- Per-TF statistics
- Bound/unbound site predictions
Download JASPAR Motifs
Download JASPAR motifs
wget https://jaspar.genereg.net/download/data/2022/CORE/JASPAR2022_CORE_vertebrates_non-redundant_pfms_jaspar.txt mv JASPAR2022_CORE_vertebrates_non-redundant_pfms_jaspar.txt JASPAR_motifs.pfm
Prepare Input Files
Ensure BAM is sorted and indexed
samtools sort -@ 8 sample.bam -o sample.sorted.bam samtools index sample.sorted.bam
Filter peaks (remove blacklist, size filter)
bedtools intersect -v -a peaks.narrowPeak -b blacklist.bed |
awk '$3-$2 >= 100 && $3-$2 <= 5000' > filtered_peaks.bed
HINT-ATAC Alternative
RGT suite HINT-ATAC
rgt-hint footprinting
--atac-seq
--organism hg38
--output-prefix sample
sample.bam peaks.bed
PIQ Footprinting
PIQ (another footprinting tool)
library(PIQ)
Load data
bam <- 'sample.bam' pwms <- readMotifs('JASPAR_motifs.pfm')
Run footprinting
piq_results <- piq(bam, pwms, genome='hg38')
Aggregate Footprint Plots
TOBIAS PlotAggregate
tobias PlotAggregate
--TFBS bindetect_output//beds/_bound.bed
--signals corrected/sample_corrected.bw
--output aggregate_footprints.pdf
--share_y
--plot_boundaries
Python: Custom Footprint Analysis
import pyBigWig import numpy as np import pandas as pd from pyfaidx import Fasta
def extract_footprint_signal(bigwig_file, bed_file, flank=100): '''Extract signal around binding sites.''' bw = pyBigWig.open(bigwig_file)
signals = []
for line in open(bed_file):
fields = line.strip().split('\t')
chrom, start, end = fields[0], int(fields[1]), int(fields[2])
center = (start + end) // 2
try:
vals = bw.values(chrom, center - flank, center + flank)
if vals:
signals.append(vals)
except:
continue
avg_signal = np.nanmean(signals, axis=0)
return avg_signal
def plot_footprint(signal, output_file): '''Plot aggregate footprint.''' import matplotlib.pyplot as plt
x = np.arange(-len(signal)//2, len(signal)//2)
plt.figure(figsize=(8, 4))
plt.plot(x, signal, 'b-', linewidth=2)
plt.axvline(0, color='red', linestyle='--', alpha=0.5)
plt.xlabel('Distance from motif center (bp)')
plt.ylabel('ATAC-seq signal')
plt.title('Aggregate Footprint')
plt.savefig(output_file, dpi=150)
plt.close()
Scan for Motifs
Find motif occurrences in peaks
Using FIMO (MEME suite)
fimo --oc fimo_output motifs.meme peaks.fa
Or HOMER
findMotifsGenome.pl peaks.bed hg38 motif_analysis/ -find motif.motif
Interpret Footprint Depth
Footprint Depth Interpretation
Deep footprint Strong TF binding
Shallow footprint Weak/transient binding
No footprint No binding or wrong motif
Shoulders only Nucleosome positioning
Quality Considerations
Footprinting requires:
- High read depth (>50M reads)
- NFR-enriched signal (filter for <100bp fragments)
- Good Tn5 bias correction
Extract NFR reads
samtools view -h sample.bam |
awk 'substr($0,1,1)=="@" || ($9>0 && $9<100) || ($9<0 && $9>-100)' |
samtools view -b > nfr.bam
Differential TF Activity
def compare_footprints(tf_name, cond1_bw, cond2_bw, motif_bed): '''Compare TF footprints between conditions.''' sig1 = extract_footprint_signal(cond1_bw, motif_bed) sig2 = extract_footprint_signal(cond2_bw, motif_bed)
# Calculate footprint depth
depth1 = np.nanmean(sig1[:30]) - np.nanmin(sig1[40:60])
depth2 = np.nanmean(sig2[:30]) - np.nanmin(sig2[40:60])
diff = depth2 - depth1
return {
'TF': tf_name,
'depth_cond1': depth1,
'depth_cond2': depth2,
'difference': diff
}
TOBIAS Output Files
File Description
*_corrected.bw Bias-corrected signal
*_footprints.bw Footprint scores
*_bound.bed Predicted bound sites
*_unbound.bed Predicted unbound sites
*_overview.txt Per-TF statistics
Related Skills
-
atac-seq/atac-peak-calling - Generate peaks
-
atac-seq/atac-qc - Verify data quality
-
chip-seq/peak-annotation - Annotate binding sites
-
sequence-manipulation/motif-search - Find motifs