bio-bam

Toolkit for BAM/SAM/CRAM alignment file analysis: extract reads, identify indels, and calculate coverage. Designed for WGS/WES sequencing result inspection and quality control.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "bio-bam" with this command: npx skills add dakesan/cc-dnawork-plugin/dakesan-cc-dnawork-plugin-bio-bam

BAM Toolkit

Toolkit for BAM/SAM/CRAM alignment file analysis: extract reads, identify indels, and calculate coverage. Designed for WGS/WES sequencing result inspection and quality control.

Quick Start

Install

uv pip install pysam typer

Basic Usage

1. 特定領域のリードを抽出

python scripts/extract_reads.py --bam alignment.bam --region chr1:1000-2000 --output reads.json --format json

2. Indel を抽出

python scripts/extract_indels.py --bam alignment.bam --region chr1:1000-2000 --output indels.json

3. カバレッジ統計を計算

python scripts/calculate_coverage.py --bam alignment.bam --region chr1:1000-2000 --output coverage.json

Scripts

extract_reads.py - Read Extraction

Extract reads from BAM/SAM/CRAM files for specific genomic regions and export as BAM or JSON format.

Required Arguments

  • --bam PATH

  • Input BAM/SAM/CRAM file path

  • --region TEXT

  • Genomic region (e.g., chr1:1000-2000 )

Optional Arguments

Output:

  • --output PATH

  • Output file path (BAM or JSON based on extension)

  • --format TEXT

  • Output format: bam or json (default: bam )

Filters:

  • --min-mapq INT

  • Minimum mapping quality (default: 0)

  • --proper-pairs

  • Only properly paired reads

  • --no-duplicates

  • Exclude duplicate reads

Output Format (JSON)

{ "region": "chr1:1000-2000", "total_reads": 150, "filtered_from": 200, "filters": { "min_mapq": 30, "proper_pairs_only": true, "no_duplicates": true }, "reads": [ { "query_name": "read1", "reference_start": 1050, "reference_end": 1150, "sequence": "ATCG...", "quality": "IIII...", "mapping_quality": 60, "is_reverse": false, "cigar": "100M", "is_proper_pair": true, "is_duplicate": false, "is_paired": true, "mate_is_reverse": true, "mate_reference_name": "chr1", "mate_reference_start": 1200, "template_length": 250 } ] }

Output Format (BAM)

When --format bam is specified, outputs a filtered BAM file containing only reads that pass the specified filters.

Usage Examples

Extract reads as JSON

python scripts/extract_reads.py
--bam alignment.bam
--region chr1:1000-2000
--output reads.json
--format json

Extract reads as BAM

python scripts/extract_reads.py
--bam alignment.bam
--region chr1:1000-2000
--output reads.bam

Extract high-quality properly-paired reads

python scripts/extract_reads.py
--bam alignment.bam
--region chr1:1000-2000
--min-mapq 30
--proper-pairs
--no-duplicates
--output filtered.bam

extract_indels.py - Insertion/Deletion Extraction

Extract insertions and deletions from reads in specified genomic regions by parsing CIGAR strings.

Required Arguments

  • --bam PATH

  • Input BAM/SAM/CRAM file path

  • --region TEXT

  • Genomic region (e.g., chr1:1000-2000 )

Optional Arguments

Output:

  • --output PATH
  • JSON output file path (default: stdout)

Filters:

  • --min-mapq INT

  • Minimum mapping quality (default: 20)

  • --min-indel-size INT

  • Minimum indel size in bp (default: 1)

Output Format (JSON)

{ "region": "chr1:1000-2000", "filters": { "min_mapq": 20, "min_indel_size": 1 }, "reads": { "total": 500, "processed": 450 }, "summary": { "total_insertions": 15, "total_deletions": 8, "unique_insertions": 5, "unique_deletions": 3 }, "insertions": [ { "position": 1050, "size": 3, "sequence": "ATG", "read_name": "read1", "mapping_quality": 60, "count": 3, "supporting_reads": ["read1", "read2", "read3"] } ], "deletions": [ { "position": 1100, "size": 2, "read_name": "read2", "mapping_quality": 55, "count": 2, "supporting_reads": ["read2", "read5"] } ] }

Implementation Details

  • Parses CIGAR strings to identify I (insertion) and D (deletion) operations

  • Extracts sequences for insertions from read sequences

  • Groups identical indels by position and sequence/size

  • Counts supporting reads for each unique indel

  • Filters by mapping quality and minimum indel size

Usage Examples

Extract all indels

python scripts/extract_indels.py
--bam alignment.bam
--region chr1:1000-2000
--output indels.json

Extract large indels only (>= 5bp)

python scripts/extract_indels.py
--bam alignment.bam
--region chr1:1000-2000
--min-indel-size 5
--output large_indels.json

High-quality indels only

python scripts/extract_indels.py
--bam alignment.bam
--region chr1:1000-2000
--min-mapq 30
--output hq_indels.json

calculate_coverage.py - Coverage Calculation

Calculate coverage statistics for specified genomic regions using pileup operations.

Required Arguments

  • --bam PATH

  • Input BAM/SAM/CRAM file path

  • --region TEXT

  • Genomic region (e.g., chr1:1000-2000 )

Optional Arguments

Output:

  • --output PATH
  • JSON output file path (default: stdout)

Options:

  • --min-mapq INT

  • Minimum mapping quality (default: 0)

  • --min-baseq INT

  • Minimum base quality (default: 0)

Output Format (JSON)

{ "region": "chr1:1000-2000", "filters": { "min_mapq": 0, "min_baseq": 0 }, "statistics": { "total_bases": 1000, "mean_coverage": 45.3, "median_coverage": 48, "min_coverage": 0, "max_coverage": 120, "bases_with_coverage": 995, "bases_without_coverage": 5, "percent_covered": 99.5 } }

Usage Examples

Calculate coverage statistics

python scripts/calculate_coverage.py
--bam alignment.bam
--region chr1:1000-2000
--output coverage.json

High-quality bases only

python scripts/calculate_coverage.py
--bam alignment.bam
--region chr1:1000-2000
--min-mapq 30
--min-baseq 20
--output hq_coverage.json

Workflow Examples

Example 1: Comprehensive Read Analysis Workflow

Combine all three scripts for complete BAM analysis:

Step 1: Calculate coverage statistics

python scripts/calculate_coverage.py
--bam alignment.bam
--region chr1:1000-2000
--output coverage.json

Step 2: Extract indels

python scripts/extract_indels.py
--bam alignment.bam
--region chr1:1000-2000
--min-mapq 30
--output indels.json

Step 3: Extract reads as JSON for detailed inspection

python scripts/extract_reads.py
--bam alignment.bam
--region chr1:1000-2000
--min-mapq 30
--format json
--output reads.json

Example 2: High-Quality Indel Discovery

Identify large insertions and deletions with high-quality reads:

Extract large indels (>= 10bp) with high mapping quality

python scripts/extract_indels.py
--bam alignment.bam
--region chr1:1000-2000
--min-mapq 30
--min-indel-size 10
--output large_indels.json

Extract supporting reads for manual validation

python scripts/extract_reads.py
--bam alignment.bam
--region chr1:1000-2000
--min-mapq 30
--output supporting_reads.bam

Example 3: Coverage Quality Control

Assess sequencing coverage quality across target region:

Calculate coverage statistics with quality filters

python scripts/calculate_coverage.py
--bam alignment.bam
--region chr1:1000-2000
--min-mapq 30
--min-baseq 20
--output coverage_stats.json

Extract reads from low-coverage regions for inspection

python scripts/extract_reads.py
--bam alignment.bam
--region chr1:1500-1600
--format json
--output low_coverage_reads.json

Error Handling

Missing BAM Index

$ python scripts/extract_reads.py --bam alignment.bam --region chr1:1000-2000

Error fetching reads: random access requires an index Make sure the region 'chr1:1000-2000' exists and BAM file is indexed.

Solution: Create BAM index with samtools:

samtools index alignment.bam

Invalid Region Format

$ python scripts/extract_reads.py --bam alignment.bam --region 1000-2000

Error: Invalid region format: 1000-2000. Expected format: chr1:1000-2000

Solution: Use correct region format chr:start-end :

python scripts/extract_reads.py --bam alignment.bam --region chr1:1000-2000

Best Practices

  1. Always Index BAM Files

Create BAM index before using any scripts to enable efficient random access:

✅ Good: Index BAM file first

samtools index alignment.bam

Then use bam-toolkit scripts

python scripts/extract_reads.py --bam alignment.bam --region chr1:1000-2000

  1. Apply Quality Filters for Reliable Results

Use mapping quality and base quality filters to ensure reliable analysis:

✅ Good: Apply quality filters

python scripts/extract_indels.py
--bam alignment.bam
--region chr1:1000-2000
--min-mapq 30

✅ Good: Use proper pairs for structural variant analysis

python scripts/extract_reads.py
--bam alignment.bam
--region chr1:1000-2000
--proper-pairs
--no-duplicates

  1. Extract Reads as BAM for Further Processing

Use BAM output when planning to use extracted reads with other tools:

✅ Good: Extract as BAM for use with samtools/IGV

python scripts/extract_reads.py
--bam alignment.bam
--region chr1:1000-2000
--min-mapq 30
--output filtered.bam

Then index and use with other tools

samtools index filtered.bam samtools view filtered.bam

When to Use bam-toolkit vs samtools

Task bam-toolkit samtools

Read extraction as JSON ✅ extract_reads.py

Indel extraction with grouping ✅ extract_indels.py

Coverage statistics as JSON ✅ calculate_coverage.py ✅ samtools depth

BAM-to-BAM filtering ✅ extract_reads.py ✅ samtools view

Read alignment statistics

✅ samtools flagstat

BAM indexing

✅ samtools index

Recommended Workflow:

  • Index BAM files with samtools

  • Use bam-toolkit scripts for structured JSON output

  • Use samtools for standard BAM operations (sorting, indexing, format conversion)

Related Skills

  • vcf-toolkit - VCF/BCF variant file operations

  • sequence-io - FASTA/FASTQ sequence file operations

  • blast-search - BLAST homology search

  • blat-api-searching - BLAT genome mapping

Troubleshooting

BAM File Not Readable

Ensure BAM file is properly formatted and readable:

Check BAM file integrity

samtools quickcheck alignment.bam

View BAM header

samtools view -H alignment.bam

Chromosome Name Mismatch

Chromosome names must match between BAM file and region specification:

Check chromosome names in BAM

samtools idxstats alignment.bam | cut -f1

Use correct chromosome name

If BAM uses "1" instead of "chr1":

python scripts/extract_reads.py --bam alignment.bam --region 1:1000-2000

Empty Output

If output is empty, check:

  • Region contains aligned reads: samtools view alignment.bam chr1:1000-2000 | head

  • Filters are not too restrictive (try reducing --min-mapq )

  • Chromosome name matches BAM file

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

General

bio-blast

No summary provided by upstream source.

Repository SourceNeeds Review
General

bio-cosmic

No summary provided by upstream source.

Repository SourceNeeds Review
General

bio-vcf

No summary provided by upstream source.

Repository SourceNeeds Review
General

bio-fasta

No summary provided by upstream source.

Repository SourceNeeds Review