bio-long-read-sequencing-clair3-variants

Clair3 Variant Calling

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "bio-long-read-sequencing-clair3-variants" with this command: npx skills add gptomics/bioskills/gptomics-bioskills-bio-long-read-sequencing-clair3-variants

Basic Usage

# ONT variant calling
run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --threads=32 \
    --platform=ont \
    --model_path=${CONDA_PREFIX}/bin/models/ont \
    --output=clair3_output

# PacBio HiFi variant calling
run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --threads=32 \
    --platform=hifi \
    --model_path=${CONDA_PREFIX}/bin/models/hifi \
    --output=clair3_output

# Output: clair3_output/merge_output.vcf.gz

Platform-Specific Models

| Platform | Model | Recommended Coverage |
|---|---|---|
| ONT R10 | r1041_e82_400bps_sup_v430 | 30-60x |
| ONT R9 | r941_prom_sup_g5014 | 30-60x |
| PacBio HiFi | hifi | 20-40x |
| PacBio CLR | (none) | Use PEPPER-Margin-DeepVariant |

# List available models
ls ${CONDA_PREFIX}/bin/models/

# Specify exact model (--platform is still required)
run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --platform=ont \
    --model_path=${CONDA_PREFIX}/bin/models/r1041_e82_400bps_sup_v430 \
    --output=clair3_out \
    --threads=32
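
The model table above can be expressed as a small lookup that also catches a missing model directory before a long run starts. This is a minimal Python sketch, assuming the conda layout used in the commands above (`${CONDA_PREFIX}/bin/models/`). The `MODELS` keys and the `resolve_model_path` helper are hypothetical names introduced for illustration; they are not part of Clair3.

```python
import os
from pathlib import Path

# Model names copied from the table above; the dictionary keys are
# illustrative labels, not Clair3 identifiers.
MODELS = {
    'ont_r10': 'r1041_e82_400bps_sup_v430',
    'ont_r9': 'r941_prom_sup_g5014',
    'hifi': 'hifi',
}


def resolve_model_path(chemistry, models_dir=None):
    """Return the model directory for a chemistry label.

    Raises KeyError for an unknown label and FileNotFoundError when the
    directory is not present on disk.
    """
    if chemistry not in MODELS:
        raise KeyError(f'no Clair3 model listed for {chemistry!r}')
    if models_dir is None:
        # Assumption: conda install with models under $CONDA_PREFIX/bin/models
        models_dir = Path(os.environ.get('CONDA_PREFIX', '')) / 'bin' / 'models'
    path = Path(models_dir) / MODELS[chemistry]
    if not path.is_dir():
        raise FileNotFoundError(f'model directory not found: {path}')
    return path
```

The returned path can be passed as `--model_path`. Failing early on a missing directory is a design choice for this sketch; it corresponds to the "Missing model" case in the troubleshooting table.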

Key Parameters

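| Parameter | Description |
|---|---|
| --platform | Sequencing platform: ont, hifi, or ilmn |
| --model_path | Path to the trained model directory |
| --bed_fn | Restrict calling to the regions in a BED file |
| --include_all_ctgs | Call on all contigs (not just chr1-22,X,Y) |
| --no_phasing_for_fa | Skip phasing during full-alignment calling |
| --gvcf | Output gVCF format |
| --qual | Quality threshold: variants above it are marked PASS, others LowQual (default: 2) |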

Region-Specific Calling

# Call variants in specific regions
run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --bed_fn=target_regions.bed \
    --threads=32 \
    --platform=ont \
    --model_path=${CONDA_PREFIX}/bin/models/ont \
    --output=clair3_targeted

# Call on non-human genomes (all contigs)
run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --include_all_ctgs \
    --threads=32 \
    --platform=hifi \
    --model_path=${CONDA_PREFIX}/bin/models/hifi \
    --output=clair3_all_contigs
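
The file passed to `--bed_fn` is a BED file: tab-separated, with a 0-based start and an exclusive end. Regions are often quoted as 1-based inclusive coordinates (for example `chr1:1000-2000`), so an off-by-one error is easy to introduce. The following Python sketch shows one way to write such a file. `write_bed` is a hypothetical helper, not a Clair3 or bcftools function.

```python
from pathlib import Path


def write_bed(regions, bed_path):
    """Write (chrom, start, end) tuples to a BED file.

    Input coordinates are 1-based and inclusive; BED output uses a
    0-based start and an exclusive end.
    """
    lines = []
    for chrom, start, end in regions:
        if start < 1 or end < start:
            raise ValueError(f'invalid region: {chrom}:{start}-{end}')
        # Shift only the start; a 1-based inclusive end equals a 0-based exclusive end
        lines.append(f'{chrom}\t{start - 1}\t{end}')
    bed_path = Path(bed_path)
    bed_path.write_text('\n'.join(lines) + '\n')
    return bed_path
```

For example, `write_bed([('chr1', 1000, 2000)], 'target_regions.bed')` writes the single line `chr1`, `999`, `2000` separated by tabs.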

gVCF Output

# Generate gVCF for joint calling
run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --gvcf \
    --threads=32 \
    --platform=ont \
    --model_path=${CONDA_PREFIX}/bin/models/ont \
    --output=clair3_gvcf

# Combine multiple samples (bcftools merge joins records across samples; it does not re-genotype them)
bcftools merge sample1.g.vcf.gz sample2.g.vcf.gz -Oz -o cohort.vcf.gz
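
For more than a couple of samples, the merge command is easier to assemble programmatically. This is a minimal Python sketch that only builds the argument list for `bcftools merge` with the `-Oz` and `-o` options. `bcftools_merge_cmd` is a hypothetical helper, and the input names follow the example above. Note that `bcftools merge` expects bgzipped, indexed inputs.

```python
def bcftools_merge_cmd(gvcfs, output='cohort.vcf.gz'):
    """Build the argv for merging per-sample gVCFs into one compressed VCF."""
    gvcfs = [str(p) for p in gvcfs]
    if len(gvcfs) < 2:
        raise ValueError('bcftools merge needs at least two input files')
    if len(set(gvcfs)) != len(gvcfs):
        raise ValueError('duplicate input files')
    # -Oz writes bgzip-compressed VCF, matching the .vcf.gz output name
    return ['bcftools', 'merge', *gvcfs, '-Oz', '-o', output]
```

The list can be passed to `subprocess.run(cmd, check=True)`. Building the list separately keeps the command easy to log or inspect before anything runs.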

Phased Variant Calling

# With phasing information (requires haplotagged BAM)
run_clair3.sh \
    --bam_fn=haplotagged.bam \
    --ref_fn=reference.fasta \
    --enable_phasing \
    --longphase_for_phasing \
    --threads=32 \
    --platform=ont \
    --model_path=${CONDA_PREFIX}/bin/models/ont \
    --output=clair3_phased

Quality Filtering

# Filter by quality score
bcftools view -i 'QUAL>20' clair3_output/merge_output.vcf.gz -Oz -o filtered.vcf.gz

# Filter by genotype quality
bcftools view -i 'GQ>30' clair3_output/merge_output.vcf.gz -Oz -o high_gq.vcf.gz

# SNPs only
bcftools view -v snps clair3_output/merge_output.vcf.gz -Oz -o snps.vcf.gz

# Indels only
bcftools view -v indels clair3_output/merge_output.vcf.gz -Oz -o indels.vcf.gz
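
To make explicit what the `QUAL>20` filter does, here is a minimal pure-Python sketch of the same selection: header lines are copied, and a record is kept only when its sixth column (QUAL) is strictly greater than the threshold. `filter_by_qual` is a hypothetical helper for illustration. It writes ordinary gzip rather than BGZF, so the result cannot be indexed with tabix; bcftools remains the better tool for real pipelines.

```python
import gzip


def filter_by_qual(vcf_in, vcf_out, min_qual=20.0):
    """Copy header lines and records with QUAL > min_qual.

    Returns the number of records kept.
    """
    kept = 0
    with gzip.open(vcf_in, 'rt') as src, gzip.open(vcf_out, 'wt') as dst:
        for line in src:
            if line.startswith('#'):
                dst.write(line)  # meta-information and column header lines
                continue
            qual = line.rstrip('\n').split('\t')[5]  # QUAL is the sixth VCF column
            # A missing QUAL ('.') never passes the comparison
            if qual != '.' and float(qual) > min_qual:
                dst.write(line)
                kept += 1
    return kept
```

The comparison is strict, so a record with QUAL exactly equal to the threshold is dropped.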

Python Wrapper

import os
import subprocess
from pathlib import Path


def run_clair3(bam, reference, output_dir, platform='ont', model_path=None,
               threads=32, bed=None, gvcf=False, include_all_ctgs=False):
    """Run Clair3 and return the path to the merged VCF."""
    if model_path is None:
        # Default to the conda-installed model named after the platform
        conda_prefix = os.environ.get('CONDA_PREFIX', '')
        model_path = f'{conda_prefix}/bin/models/{platform}'

    cmd = [
        'run_clair3.sh',
        f'--bam_fn={bam}',
        f'--ref_fn={reference}',
        f'--threads={threads}',
        f'--platform={platform}',
        f'--model_path={model_path}',
        f'--output={output_dir}'
    ]

    if bed:
        cmd.append(f'--bed_fn={bed}')
    if gvcf:
        cmd.append('--gvcf')
    if include_all_ctgs:
        cmd.append('--include_all_ctgs')

    subprocess.run(cmd, check=True)
    return Path(output_dir) / 'merge_output.vcf.gz'


def filter_variants(vcf, output, min_qual=20, variant_type=None):
    """Filter a VCF by QUAL (and optionally by variant type), then index it."""
    cmd = ['bcftools', 'view', '-i', f'QUAL>{min_qual}']
    if variant_type:
        cmd.extend(['-v', variant_type])
    cmd.extend([vcf, '-Oz', '-o', output])
    subprocess.run(cmd, check=True)
    subprocess.run(['bcftools', 'index', '-t', output], check=True)
    return output


# Example
vcf = run_clair3('sample.bam', 'ref.fa', 'clair3_out', platform='hifi', threads=48)
snps = filter_variants(str(vcf), 'snps_q20.vcf.gz', min_qual=20, variant_type='snps')

Comparison with Other Callers

| Caller | Best For | Speed | Accuracy |
|---|---|---|---|
| Clair3 | ONT/HiFi germline | Fast | High |
| DeepVariant | HiFi, Illumina | Medium | Very high |
| PEPPER-DV | ONT (integrated) | Slow | Very high |
| Longshot | ONT SNPs | Fast | Good |

Troubleshooting

| Issue | Solution |
|---|---|
| Missing model | Download from Clair3 releases or use conda models |
| Low call rate | Check coverage; lower the --qual threshold |
| Slow performance | Increase --threads or use --bed_fn for targeted calling |
| Missing variants on non-human genomes | Use --include_all_ctgs |

Docker Usage

# Using Docker
docker run -v /data:/data \
    hkubal/clair3:latest \
    /opt/bin/run_clair3.sh \
    --bam_fn=/data/sample.bam \
    --ref_fn=/data/reference.fasta \
    --threads=32 \
    --platform=ont \
    --model_path=/opt/models/ont \
    --output=/data/clair3_output

# Singularity
singularity exec clair3.sif run_clair3.sh \
    --bam_fn=sample.bam \
    --ref_fn=reference.fasta \
    --threads=32 \
    --platform=ont \
    --model_path=/opt/models/ont \
    --output=clair3_output
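
In the Docker command above, every file path is given as seen from inside the container (`/data/...`), not from the host. The following Python sketch builds the same argument list from a host directory and file names, which makes that mapping explicit. `docker_clair3_cmd` is a hypothetical helper; the image name, script path, and model location are taken from the command above, and it assumes the BAM and reference both live in the mounted directory.

```python
from pathlib import Path


def docker_clair3_cmd(data_dir, bam, reference, output='clair3_output',
                      platform='ont', threads=32,
                      image='hkubal/clair3:latest'):
    """Build the docker argv; bam, reference and output are names inside data_dir."""
    host = Path(data_dir).resolve()  # docker -v needs an absolute host path
    mount = '/data'  # container-side mount point, as in the command above
    return [
        'docker', 'run', '-v', f'{host}:{mount}',
        image, '/opt/bin/run_clair3.sh',
        f'--bam_fn={mount}/{bam}',
        f'--ref_fn={mount}/{reference}',
        f'--threads={threads}',
        f'--platform={platform}',
        f'--model_path=/opt/models/{platform}',
        f'--output={mount}/{output}',
    ]
```

The list can be handed to `subprocess.run(cmd, check=True)`. Results then appear on the host under `data_dir/clair3_output`.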

Related Skills

  • variant-calling/bcftools-basics - VCF manipulation

  • variant-calling/filtering-best-practices - Quality filtering

  • long-read-sequencing/long-read-qc - Input quality control

  • long-read-sequencing/long-read-alignment - Mapping with minimap2
