bio-alignment-filtering

Filter alignments by flags, quality, and regions using samtools and pysam.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "bio-alignment-filtering" with this command: npx skills add gptomics/bioskills/gptomics-bioskills-bio-alignment-filtering

Alignment Filtering

Filter alignments by flags, quality, and regions using samtools and pysam.

Filter Flags

Option Description

-f FLAG

Include reads with ALL bits set

-F FLAG

Exclude reads with ANY bits set

-G FLAG

Exclude reads with ALL bits set

-q MAPQ

Minimum mapping quality

-L BED

Include reads overlapping regions

Common FLAG Values

Flag Hex Meaning

1 0x1 Paired

2 0x2 Proper pair

4 0x4 Unmapped

8 0x8 Mate unmapped

16 0x10 Reverse strand

32 0x20 Mate reverse strand

64 0x40 First in pair (read1)

128 0x80 Second in pair (read2)

256 0x100 Secondary alignment

512 0x200 Failed QC

1024 0x400 Duplicate

2048 0x800 Supplementary

Filter by FLAG

Keep Only Mapped Reads

samtools view -F 4 -o mapped.bam input.bam

Keep Only Unmapped Reads

samtools view -f 4 -o unmapped.bam input.bam

Keep Only Properly Paired

samtools view -f 2 -o proper.bam input.bam

Remove Duplicates

samtools view -F 1024 -o nodup.bam input.bam

Remove Secondary and Supplementary

samtools view -F 2304 -o primary.bam input.bam

Keep Only Primary Alignments

samtools view -F 256 -F 2048 -o primary.bam input.bam

Or combined: -F 2304

Keep Read1 Only

samtools view -f 64 -o read1.bam input.bam

Keep Read2 Only

samtools view -f 128 -o read2.bam input.bam

Forward Strand Only

samtools view -F 16 -o forward.bam input.bam

Reverse Strand Only

samtools view -f 16 -o reverse.bam input.bam

Filter by Mapping Quality

Minimum MAPQ

samtools view -q 30 -o highqual.bam input.bam

MAPQ and Mapped

samtools view -F 4 -q 30 -o filtered.bam input.bam

Common MAPQ Thresholds

MAPQ Meaning

0 Mapped to multiple locations equally well

20 ~1% chance of wrong mapping

30 ~0.1% chance of wrong mapping

40 ~0.01% chance of wrong mapping

60 Unique mapping (BWA max)

Filter by Region

Single Region

samtools view -o region.bam input.bam chr1:1000000-2000000

Multiple Regions

samtools view -o regions.bam input.bam chr1:1000-2000 chr2:3000-4000

Regions from BED File

samtools view -L targets.bed -o targets.bam input.bam

Combine Region and Quality

samtools view -q 30 -L targets.bed -o filtered.bam input.bam

Combined Filters

Standard Quality Filter

Primary, mapped, non-duplicate, MAPQ >= 30

samtools view -F 3332 -q 30 -o filtered.bam input.bam

3332 = 4 (unmapped) + 256 (secondary) + 1024 (duplicate) + 2048 (supplementary)

Variant Calling Prep

Properly paired, primary, no duplicates, MAPQ >= 20

samtools view -f 2 -F 3328 -q 20 -o clean.bam input.bam

3328 = 256 (secondary) + 1024 (duplicate) + 2048 (supplementary)

Note: -f 2 (proper pair) implies mapped, so -F 4 is not strictly needed

ChIP-seq Filter

Remove duplicates and low MAPQ

samtools view -F 1024 -q 30 -o filtered.bam input.bam

Subsample Reads

Random Subsample

Keep ~10% of reads

samtools view -s 0.1 -o subset.bam input.bam

With seed for reproducibility

samtools view -s 42.1 -o subset.bam input.bam

Subsample to Target Count

Calculate fraction needed

total=$(samtools view -c input.bam) frac=$(echo "scale=4; 1000000 / $total" | bc) samtools view -s "$frac" -o subset.bam input.bam

pysam Python Alternative

Basic Filtering

import pysam

with pysam.AlignmentFile('input.bam', 'rb') as infile: with pysam.AlignmentFile('filtered.bam', 'wb', header=infile.header) as outfile: for read in infile: if read.is_unmapped: continue if read.mapping_quality < 30: continue if read.is_duplicate: continue outfile.write(read)

Filter with Function

import pysam

def passes_filter(read): if read.is_unmapped: return False if read.is_secondary or read.is_supplementary: return False if read.is_duplicate: return False if read.mapping_quality < 30: return False return True

with pysam.AlignmentFile('input.bam', 'rb') as infile: with pysam.AlignmentFile('filtered.bam', 'wb', header=infile.header) as outfile: for read in infile: if passes_filter(read): outfile.write(read)

Filter by Region

import pysam

with pysam.AlignmentFile('input.bam', 'rb') as infile: with pysam.AlignmentFile('region.bam', 'wb', header=infile.header) as outfile: for read in infile.fetch('chr1', 1000000, 2000000): outfile.write(read)

Filter from BED File

import pysam

def read_bed(bed_path): regions = [] with open(bed_path) as f: for line in f: if line.startswith('#'): continue parts = line.strip().split('\t') regions.append((parts[0], int(parts[1]), int(parts[2]))) return regions

regions = read_bed('targets.bed')

with pysam.AlignmentFile('input.bam', 'rb') as infile: with pysam.AlignmentFile('targets.bam', 'wb', header=infile.header) as outfile: for chrom, start, end in regions: for read in infile.fetch(chrom, start, end): outfile.write(read)

Subsample

import pysam import random

random.seed(42) fraction = 0.1

with pysam.AlignmentFile('input.bam', 'rb') as infile: with pysam.AlignmentFile('subset.bam', 'wb', header=infile.header) as outfile: for read in infile: if random.random() < fraction: outfile.write(read)

Quick Reference

Task samtools command

Mapped only view -F 4

Unmapped only view -f 4

Properly paired view -f 2

Primary only view -F 2304

No duplicates view -F 1024

High MAPQ view -q 30

Region view file.bam chr1:1-1000

BED regions view -L file.bed

Subsample 10% view -s 0.1

Standard filter view -F 3332 -q 30

Common Filter Combinations

Purpose Flags

Clean reads -F 3332 -q 30 (mapped, primary, no dups, high qual)

Variant calling -f 2 -F 3328 -q 20 (proper pair, primary, no dups)

Coverage analysis -F 1284 -q 1 (mapped, primary, no dups)

Count unique -F 2304 (primary only)

Flag breakdowns:

  • 2304 = 256 + 2048 (secondary + supplementary)

  • 3328 = 256 + 1024 + 2048 (secondary + duplicate + supplementary)

  • 3332 = 4 + 256 + 1024 + 2048 (unmapped + secondary + duplicate + supplementary)

  • 1284 = 4 + 256 + 1024 (unmapped + secondary + duplicate)

Related Skills

  • sam-bam-basics - View and understand alignment files

  • alignment-sorting - Sort before/after filtering

  • alignment-indexing - Required for region filtering

  • duplicate-handling - Mark duplicates before filtering

  • bam-statistics - Check filter effects

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

bioskills

No summary provided by upstream source.

Repository SourceNeeds Review
General

bio-data-visualization-genome-tracks

No summary provided by upstream source.

Repository SourceNeeds Review
General

bio-epitranscriptomics-merip-preprocessing

No summary provided by upstream source.

Repository SourceNeeds Review
General

bio-data-visualization-multipanel-figures

No summary provided by upstream source.

Repository SourceNeeds Review