bio-read-alignment-bwa-alignment

# Index reference genome (required once) bwa-mem2 index reference.fa

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "bio-read-alignment-bwa-alignment" with this command: npx skills add gptomics/bioskills/gptomics-bioskills-bio-read-alignment-bwa-alignment

BWA-MEM2 Alignment

Build Index

Index reference genome (required once)

bwa-mem2 index reference.fa

Creates: reference.fa.0123, reference.fa.amb, reference.fa.ann, reference.fa.bwt.2bit.64, reference.fa.pac

Basic Alignment

Paired-end reads

bwa-mem2 mem -t 8 reference.fa reads_1.fq.gz reads_2.fq.gz > aligned.sam

Single-end reads

bwa-mem2 mem -t 8 reference.fa reads.fq.gz > aligned.sam

Alignment with Read Groups

Add read group information (required for GATK)

bwa-mem2 mem -t 8
-R '@RG\tID:sample1\tSM:sample1\tPL:ILLUMINA\tLB:lib1'
reference.fa reads_1.fq.gz reads_2.fq.gz > aligned.sam

Direct to Sorted BAM

Pipe to samtools for sorted BAM output

bwa-mem2 mem -t 8
-R '@RG\tID:sample1\tSM:sample1\tPL:ILLUMINA'
reference.fa reads_1.fq.gz reads_2.fq.gz |
samtools sort -@ 4 -o aligned.sorted.bam -

Index the BAM

samtools index aligned.sorted.bam

Mark Duplicates Pipeline

Full pipeline: align, fixmate, sort, markdup

bwa-mem2 mem -t 8 -R '@RG\tID:sample1\tSM:sample1\tPL:ILLUMINA'
reference.fa reads_1.fq.gz reads_2.fq.gz |
samtools fixmate -m -@ 4 - - |
samtools sort -@ 4 - |
samtools markdup -@ 4 - aligned.markdup.bam

samtools index aligned.markdup.bam

Common Options

bwa-mem2 mem -t 8 \ # Threads -M \ # Mark shorter split hits as secondary (Picard compatible) -Y \ # Use soft clipping for supplementary alignments -K 100000000 \ # Process INT input bases in each batch -R '@RG\tID:s1\tSM:s1' \ # Read group reference.fa r1.fq r2.fq

Key Parameters

Parameter Default Description

-t 1 Number of threads

-k 19 Minimum seed length

-w 100 Band width for extension

-r 1.5 Re-seeding trigger ratio

-c 500 Skip seeds with more than INT hits

-A 1 Match score

-B 4 Mismatch penalty

-O 6 Gap open penalty

-E 1 Gap extension penalty

-M off Mark secondary alignments

Output Filters

Filter unmapped and low quality

bwa-mem2 mem -t 8 reference.fa r1.fq r2.fq |
samtools view -@ 4 -bS -q 20 -F 4 - |
samtools sort -@ 4 -o aligned.filtered.bam -

Split Read Alignment

For SV detection, use -Y for soft clipping

bwa-mem2 mem -t 8 -Y reference.fa r1.fq r2.fq > aligned.sam

Memory Requirements

  • Index loading: ~10GB for human genome

  • Per thread: ~1-2GB

  • Typical human WGS: 30-50GB RAM with 8 threads

BWA-MEM (Alternative)

Build index

bwa index reference.fa

Paired-end alignment

bwa mem -t 8 reference.fa reads_1.fq.gz reads_2.fq.gz > aligned.sam

With read groups

bwa mem -t 8 -R '@RG\tID:sample1\tSM:sample1\tPL:ILLUMINA'
reference.fa reads_1.fq.gz reads_2.fq.gz > aligned.sam

Direct to sorted BAM

bwa mem -t 8 -R '@RG\tID:sample1\tSM:sample1\tPL:ILLUMINA'
reference.fa reads_1.fq.gz reads_2.fq.gz |
samtools sort -@ 4 -o aligned.sorted.bam -

BWA-MEM vs BWA-MEM2

Feature BWA-MEM BWA-MEM2

Status Active Archived

Speed 1x 2-3x faster

Index format .bwt .bwt.2bit.64

Results Baseline Nearly identical

Memory ~5GB ~10GB

Related Skills

  • read-qc/fastp-workflow - Preprocess reads before alignment

  • alignment-files/alignment-sorting - Post-alignment processing

  • alignment-files/duplicate-handling - Mark duplicates

  • variant-calling/variant-calling - Call variants from BAM

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

bioskills

No summary provided by upstream source.

Repository SourceNeeds Review
General

bio-data-visualization-genome-tracks

No summary provided by upstream source.

Repository SourceNeeds Review
General

bio-epitranscriptomics-merip-preprocessing

No summary provided by upstream source.

Repository SourceNeeds Review
General

bio-data-visualization-multipanel-figures

No summary provided by upstream source.

Repository SourceNeeds Review