Quality Reports
Generate quality reports for FASTQ files using FastQC and aggregate multiple reports with MultiQC.
FastQC - Single Sample Reports
Basic Usage
Single file
fastqc sample.fastq.gz
Multiple files
fastqc *.fastq.gz
Specify output directory (it must already exist; FastQC will not create it)
fastqc -o qc_reports/ sample_R1.fastq.gz sample_R2.fastq.gz
Set threads (each thread processes one file at a time)
fastqc -t 4 *.fastq.gz
Output Files
FastQC produces two files per input:
- sample_fastqc.html - Interactive HTML report
- sample_fastqc.zip - Data files and images
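To check which reports a batch run should have produced, the naming rule above can be sketched in Python. The list of stripped suffixes is an assumption based on FastQC's observed behaviour, and `expected_outputs` is a hypothetical helper, not part of FastQC.

```python
from pathlib import Path

# Suffixes assumed to be stripped by FastQC, applied in this order.
# This is an illustrative subset, not an exhaustive list.
_STRIPPED_SUFFIXES = (".gz", ".bz2", ".txt", ".fastq", ".fq")


def expected_outputs(fastq_path, outdir="."):
    """Return the (html, zip) paths FastQC is expected to write for one input."""
    name = Path(fastq_path).name
    for suffix in _STRIPPED_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    out = Path(outdir)
    return out / f"{name}_fastqc.html", out / f"{name}_fastqc.zip"
```

Comparing these paths against the files actually present in the output directory is a quick way to spot inputs that FastQC skipped or failed on.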
Key Modules
| Module | What It Shows | Warning Signs |
| --- | --- | --- |
| Per base sequence quality | Quality scores across the read | Drop below Q20 at 3' end |
| Per sequence quality scores | Quality score distribution | Bimodal distribution |
| Per base sequence content | Nucleotide composition | Imbalance at read start (often normal for RNA-seq) |
| Per sequence GC content | GC distribution | Secondary peak (contamination) |
| Per base N content | Unknown bases | High N content |
| Sequence length distribution | Read lengths | Unexpected variation |
| Sequence duplication levels | Duplicate reads | High duplication (PCR) |
| Overrepresented sequences | Common sequences | Adapter contamination |
| Adapter content | Adapter sequences | Visible adapter curves |
Extract Data from ZIP
Unzip to access raw data
unzip sample_fastqc.zip
View summary
cat sample_fastqc/summary.txt
Get per-base quality (from the module header to its >>END_MODULE marker)
sed -n '/^>>Per base sequence quality/,/^>>END_MODULE/p' sample_fastqc/fastqc_data.txt
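The same extraction can be sketched in Python, reading straight from the ZIP without unpacking it. This assumes the usual layout: `summary.txt` holds tab-separated status, module and filename, and each section of `fastqc_data.txt` runs from a `>>Module name` line to `>>END_MODULE`. The function names are illustrative.

```python
import zipfile


def read_member(zip_path, member):
    """Read one text file (e.g. 'sample_fastqc/summary.txt') from a FastQC ZIP."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.read(member).decode()


def parse_summary(text):
    """Map module name -> PASS/WARN/FAIL from the contents of summary.txt."""
    statuses = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:
            statuses[fields[1]] = fields[0]
    return statuses


def extract_module(text, module_name):
    """Return (header, rows) for one module section of fastqc_data.txt.

    If a section has several '#' lines, the last one is kept as the header.
    """
    header, rows, inside = [], [], False
    for line in text.splitlines():
        if line.startswith(">>END_MODULE"):
            if inside:
                break
        elif line.startswith(">>"):
            inside = line[2:].split("\t")[0] == module_name
        elif inside and line.startswith("#"):
            header = line[1:].split("\t")
        elif inside and line:
            rows.append(line.split("\t"))
    return header, rows
```

For example, `extract_module(read_member("sample_fastqc.zip", "sample_fastqc/fastqc_data.txt"), "Per base sequence quality")` would return the per-base table as lists of strings.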
MultiQC - Aggregate Reports
Basic Usage
Aggregate all FastQC reports in current directory
multiqc .
Specify input and output
multiqc qc_reports/ -o multiqc_output/
Custom report name
multiqc . -n my_project_qc
Force overwrite
multiqc . -f
Common Options
Use static (flat) plots instead of interactive ones
multiqc --flat .
Export plots as static images alongside the report
multiqc . --export
Only specific modules
multiqc . -m fastqc
Ignore input files matching a glob pattern
multiqc . --ignore '*_trimmed*'
Exclude samples whose names match a glob pattern
multiqc . --ignore-samples '*negative*'
Output Files
- multiqc_report.html - Interactive HTML report
- multiqc_data/ - Directory with data tables
  - multiqc_fastqc.txt - FastQC metrics
  - multiqc_general_stats.txt - Summary statistics
  - multiqc_sources.txt - Source files used
Extract Data Programmatically
import pandas as pd
general_stats = pd.read_csv('multiqc_data/multiqc_general_stats.txt', sep='\t')
print(general_stats.columns)
fastqc_data = pd.read_csv('multiqc_data/multiqc_fastqc.txt', sep='\t')
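Column names in the general-stats table can differ between MultiQC versions, so hard-coding them is fragile. The sketch below looks a column up by substring and flags samples above a threshold. It uses only the standard library so it also runs where pandas is not installed. `flag_samples` and the example hint are hypothetical, and the first column is assumed to hold the sample name.

```python
import csv
import io


def flag_samples(tsv_text, column_hint, threshold):
    """Return {sample: value} for rows whose matching column exceeds threshold."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fields = reader.fieldnames or []
    matches = [name for name in fields if column_hint in name]
    if not matches:
        raise KeyError(f"no column containing {column_hint!r}")
    column, sample_col = matches[0], fields[0]
    flagged = {}
    for row in reader:
        try:
            value = float(row[column])
        except (TypeError, ValueError):
            continue  # skip missing or non-numeric cells
        if value > threshold:
            flagged[row[sample_col]] = value
    return flagged
```

A call such as `flag_samples(open('multiqc_data/multiqc_general_stats.txt').read(), 'percent_duplicates', 50)` would list samples with more than 50% duplication, provided a column containing that substring exists in your file.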
Batch Processing
Process Multiple Samples
All FASTQ files in parallel (the output directory must exist first)
mkdir -p qc_reports
fastqc -t 8 -o qc_reports/ raw_data/*.fastq.gz
Then aggregate
multiqc qc_reports/ -o multiqc_output/
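When the two steps need to run from a script, they can be wrapped with `subprocess`. This is a minimal sketch that assumes `fastqc` and `multiqc` are on the PATH and that inputs end in `.fastq.gz`. `build_commands` and `run_qc` are illustrative names, and only flags already shown above are used.

```python
import subprocess
from pathlib import Path


def build_commands(fastq_dir, report_dir, multiqc_dir, threads=8):
    """Build the FastQC and MultiQC argument lists without running them."""
    fastqs = sorted(str(p) for p in Path(fastq_dir).glob("*.fastq.gz"))
    if not fastqs:
        raise FileNotFoundError(f"no *.fastq.gz files in {fastq_dir}")
    fastqc = ["fastqc", "-t", str(threads), "-o", str(report_dir), *fastqs]
    multiqc = ["multiqc", str(report_dir), "-o", str(multiqc_dir)]
    return fastqc, multiqc


def run_qc(fastq_dir, report_dir, multiqc_dir, threads=8):
    """Run FastQC on every file, then aggregate the reports with MultiQC."""
    commands = build_commands(fastq_dir, report_dir, multiqc_dir, threads)
    # FastQC does not create its output directory, so make it first.
    Path(report_dir).mkdir(parents=True, exist_ok=True)
    for command in commands:
        subprocess.run(command, check=True)  # raises if a tool exits non-zero
```

Calling `run_qc("raw_data", "qc_reports", "multiqc_output")` mirrors the two shell commands above. Keeping command construction separate from execution makes the argument lists easy to inspect or log.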
Before and After Trimming
Create separate directories
mkdir -p qc_reports/raw qc_reports/trimmed
QC raw reads
fastqc -o qc_reports/raw/ raw_data/*.fastq.gz
After trimming (using fastp, cutadapt, etc.)
fastqc -o qc_reports/trimmed/ trimmed_data/*.fastq.gz
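Compare with MultiQC (--dirs prepends directory names so raw and trimmed samples with the same file name stay distinct)
multiqc --dirs qc_reports/ -o qc_comparison/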
Interpretation Guide
Quality Scores
| Phred Score | Error Rate | Interpretation |
| --- | --- | --- |
| Q40 | 0.0001 | Excellent |
| Q30 | 0.001 | Good (Illumina target) |
| Q20 | 0.01 | Acceptable |
| Q10 | 0.1 | Poor |
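The error rates in this table follow from the Phred definition, Q = -10 * log10(p). A short sketch of the conversion is shown below, together with decoding a FASTQ quality string. The Phred+33 offset is an assumption that holds for Sanger and current Illumina encoding, and the function names are illustrative.

```python
import math


def phred_to_error(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred score corresponding to error probability p."""
    return -10 * math.log10(p)


def decode_quality(quality_string, offset=33):
    """Convert a FASTQ quality string to Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]
```

For instance, the quality character `I` decodes to Q40 and `5` to Q20 under Phred+33.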
Common Issues
| Issue | Likely Cause | Action |
| --- | --- | --- |
| Low quality at 3' end | Normal decline in signal over sequencing cycles | Trim 3' end |
| Adapter contamination | Short inserts | Trim adapters |
| GC bias | Library prep | Consider correction |
| High duplication | Low complexity, PCR | Mark/remove duplicates |
| Overrepresented seqs | Adapters, primers | Check sequences |
Configuration
Custom Adapters
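Create a tab-separated adapter file (name, tab, sequence), for example ~/.fastqc/Configuration/adapter_list.txt, and pass it to FastQC with --adapters:
Custom_Adapter_Name	ACGTACGTACGT
fastqc --adapters ~/.fastqc/Configuration/adapter_list.txt sample.fastq.gz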
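Adapter files are easy to get wrong by hand, because the name and sequence must be separated by a tab rather than spaces. A small sketch that builds the file contents from a dictionary is shown below. `format_adapter_list` is a hypothetical helper, and restricting sequences to A, C, G and T is a conservative assumption rather than a documented FastQC rule.

```python
def format_adapter_list(adapters):
    """Return adapter-file text with one 'name<TAB>sequence' line per entry."""
    lines = []
    for name, sequence in adapters.items():
        sequence = sequence.upper()
        if "\t" in name:
            raise ValueError(f"adapter name contains a tab: {name!r}")
        if not sequence or set(sequence) - set("ACGT"):
            raise ValueError(f"invalid adapter sequence for {name!r}")
        lines.append(f"{name}\t{sequence}")
    return "\n".join(lines) + "\n"
```

The returned string can be written to a file with `Path(...).write_text(...)` and then supplied to FastQC as a custom adapter list.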
Custom Limits
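Create a limits file, for example ~/.fastqc/Configuration/limits.txt, to customize thresholds, and pass it to FastQC with --limits:
Warn if mean quality is below 25, fail if it is below 20
quality_sequence warn 25
quality_sequence error 20
fastqc --limits ~/.fastqc/Configuration/limits.txt sample.fastq.gz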
Related Skills
-
adapter-trimming - Remove adapters detected by FastQC
-
fastp-workflow - All-in-one QC and trimming
-
sequence-io - FASTQ file reading/writing