bio-fasta

Read, write, and manipulate biological sequence files (FASTA, GenBank, FASTQ).

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "bio-fasta" with this command: npx skills add dakesan/cc-dnawork-plugin/dakesan-cc-dnawork-plugin-bio-fasta

Sequence I/O

When to Use This Skill

This skill should be used when:

  • Reading or writing sequence files (FASTA, GenBank, FASTQ)

  • Converting between sequence file formats

  • Manipulating sequences (complement, reverse complement, translate)

  • Extracting sequences from large indexed FASTA files (faidx)

  • Calculating sequence statistics (GC content, molecular weight, Tm)

When NOT to Use This Skill

  • NGS alignment files (SAM/BAM/VCF) → Use pysam

  • BLAST searches → Use gget (quick) or blat-integration (large-scale)

  • Multiple sequence alignment → Use msa-advanced

  • Phylogenetic analysis → Use etetoolkit

  • NCBI database queries → Use pubmed-database or gene-database

Tool Selection Guide

| Task | Tool | Reference |
|------|------|-----------|
| Parse FASTA/GenBank/FASTQ | Bio.SeqIO | biopython_seqio.md |
| Convert file formats | Bio.SeqIO.convert() | biopython_seqio.md |
| Sequence operations | Bio.Seq | biopython_seqio.md |
| Large FASTA random access | pysam.FastaFile + faidx | faidx.md |
| GC%, Tm, molecular weight | Bio.SeqUtils | utilities.md |

Quick Start

Installation

uv pip install biopython pysam

Read FASTA

from Bio import SeqIO

for record in SeqIO.parse("sequences.fasta", "fasta"):
    print(f"{record.id}: {len(record.seq)} bp")

Convert GenBank to FASTA

from Bio import SeqIO

SeqIO.convert("input.gb", "genbank", "output.fasta", "fasta")

Random Access with faidx

import pysam

# Create index (once)
pysam.faidx("reference.fasta")

# Random access
fasta = pysam.FastaFile("reference.fasta")
seq = fasta.fetch("chr1", 1000, 2000)  # 0-based coordinates
fasta.close()
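To illustrate why an index makes random access cheap, here is a minimal standard-library sketch of the idea behind a .fai file: for each sequence it records the length, the byte offset of the first base, the bases per line, and the bytes per line, so a coordinate can be turned into a file position with arithmetic. The helpers build_index and fetch are hypothetical names for this illustration, not pysam's implementation, and the sketch assumes every line of a record (except the last) has the same width.

```python
import os
import tempfile


def build_index(path):
    """Return {name: (length, offset, linebases, linewidth)}, like .fai columns."""
    index = {}
    name = None
    pos = 0  # byte position just past the line being processed
    with open(path, "rb") as handle:
        for line in handle:
            pos += len(line)
            if line.startswith(b">"):
                name = line[1:].split()[0].decode()
                # Sequence data starts right after the header line.
                index[name] = [0, pos, 0, 0]
                continue
            bases = len(line.rstrip(b"\r\n"))
            if name is None or bases == 0:
                continue
            entry = index[name]
            if entry[2] == 0:
                # The first sequence line defines the line geometry.
                entry[2], entry[3] = bases, len(line)
            entry[0] += bases
    return {key: tuple(value) for key, value in index.items()}


def fetch(path, index, name, start, end):
    """Return bases [start, end) using 0-based, half-open coordinates."""
    length, offset, linebases, linewidth = index[name]
    if start < 0:
        raise ValueError("start must be >= 0")
    end = min(end, length)
    if start >= end:
        return ""

    def byte_of(i):
        # Whole lines before base i, plus the position within its line.
        return offset + (i // linebases) * linewidth + (i % linebases)

    with open(path, "rb") as handle:
        handle.seek(byte_of(start))
        raw = handle.read(byte_of(end - 1) - byte_of(start) + 1)
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()


# Demo on a small temporary FASTA file (made-up sequences).
fd, path = tempfile.mkstemp(suffix=".fasta")
with os.fdopen(fd, "wb") as handle:
    handle.write(b">chr1 demo\nACGTACGTAC\nGGGGCCCCAA\nTT\n>chr2\nAAAA\n")

index = build_index(path)
window = fetch(path, index, "chr1", 8, 12)  # spans a line break
whole_chr2 = fetch(path, index, "chr2", 0, 4)
os.remove(path)

print(index["chr1"])
print(window)
```

Only the bytes covering the requested window are read, which is why lookups stay fast on large files. For real work, pysam.faidx() and pysam.FastaFile remain the tools to use.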

Sequence Operations

from Bio.Seq import Seq

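seq = Seq("ATGCGATCGATCG")
print(seq.complement())          # base-by-base complement, same orientation
print(seq.reverse_complement())  # complement read from the opposite strand
# 13 nt is not a multiple of three, so translate() warns about a partial codon
print(seq.translate())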
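For readers who want to see what the complement and reverse-complement operations compute, the following is a plain-Python sketch using a translation table. It is not how Bio.Seq is implemented: it handles only unambiguous DNA letters, and the function names complement and reverse_complement are hypothetical helpers for this illustration.

```python
# Map each unambiguous DNA base to its pairing partner, in both cases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(dna):
    """Swap each base for its partner, keeping the original orientation."""
    return dna.translate(_COMPLEMENT)


def reverse_complement(dna):
    """Complement the sequence and reverse it, giving the opposite strand 5'->3'."""
    return complement(dna)[::-1]


dna = "ATGCGATCGATCG"
print(complement(dna))          # TACGCTAGCTAGC
print(reverse_complement(dna))  # CGATCGATCGCAT
```

Applying reverse_complement twice returns the original sequence, which is a handy sanity check when strand handling is in doubt.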

Reference Documentation

Consult the appropriate reference file for detailed documentation:

references/biopython_seqio.md

  • Bio.Seq object and sequence operations

  • Bio.SeqIO for file parsing and writing

  • SeqRecord object and annotations

  • Supported file formats

  • Format conversion patterns

references/faidx.md

  • Creating FASTA index with pysam.faidx()

  • pysam.FastaFile for random access

  • Coordinate systems (0-based vs 1-based)

  • Performance considerations for large files

  • Common patterns (variant context, gene extraction)

references/utilities.md

  • GC content calculation (gc_fraction)

  • Molecular weight (molecular_weight)

  • Melting temperature (MeltingTemp)

  • Codon usage analysis

  • Restriction enzyme sites
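As a rough illustration of two of the statistics listed above, here is a standard-library sketch of GC content and of the Wallace rule for melting temperature, Tm = 2(A+T) + 4(G+C), which is an approximation intended for short oligonucleotides. The functions gc_content and wallace_tm are hypothetical helpers, not the Bio.SeqUtils API; the library functions handle ambiguity codes and offer more accurate Tm models that this sketch omits.

```python
def gc_content(dna):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)


def wallace_tm(dna):
    """Wallace rule estimate in degrees Celsius: 2*(A+T) + 4*(G+C)."""
    dna = dna.upper()
    at = dna.count("A") + dna.count("T")
    gc = dna.count("G") + dna.count("C")
    return 2 * at + 4 * gc


oligo = "ATGCGATCGATCG"  # 6 A/T and 7 G/C bases
print(round(gc_content(oligo), 3))  # 0.538
print(wallace_tm(oligo))            # 40
```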

references/formats.md

  • FASTA format specification

  • GenBank format specification

  • FASTQ format and quality scores

  • Format detection and validation
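Two of the topics above lend themselves to a short sketch: decoding FASTQ quality strings and guessing a file's format from its first line. The code below assumes Phred+33 encoding (quality = ASCII code minus 33) and uses a simple first-character heuristic; phred_scores and guess_format are hypothetical helpers, and the heuristic is not a validator, so a file can pass it and still be malformed.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(char) - offset for char in quality]


def guess_format(text):
    """Guess 'fasta', 'fastq', or 'genbank' from the first non-blank line."""
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(">"):
            return "fasta"
        if line.startswith("@"):
            return "fastq"
        if line.startswith("LOCUS"):
            return "genbank"
        return None  # first content line matches none of the known markers
    return None  # empty input


print(phred_scores("I!5"))               # [40, 0, 20]
print(guess_format(">seq1\nACGT\n"))     # fasta
print(guess_format("@r1\nACGT\n+\nIIII\n"))  # fastq
```

The value returned by guess_format matches the format strings that SeqIO.parse() expects, so it can be used to choose the second argument, ideally after a closer check of the file.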

Coordinate Systems

Biopython: Uses Python-style 0-based, half-open intervals for slicing.

pysam.FastaFile.fetch():

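  • Numeric arguments: 0-based, half-open (fetch("chr1", 999, 2000) returns 0-based positions 999-1999, which are 1-based positions 1000-2000)

  • Region strings: 1-based, inclusive (fetch(region="chr1:1000-2000") returns 1-based positions 1000-2000, the same 1001 bases)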
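The two conventions above differ only by an offset on the start coordinate, which the following standard-library sketch makes explicit. The helper names region_to_zero_based and zero_based_to_region are hypothetical and exist only for this illustration; the sketch assumes region strings of the form contig:start-end.

```python
import re

_REGION = re.compile(r"^(?P<contig>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def region_to_zero_based(region):
    """Convert 1-based inclusive 'contig:start-end' to 0-based half-open."""
    match = _REGION.match(region.replace(",", ""))
    if match is None:
        raise ValueError(f"not a contig:start-end region: {region!r}")
    start, end = int(match["start"]), int(match["end"])
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in region: {region!r}")
    # Only the start shifts; the inclusive end equals the exclusive end.
    return match["contig"], start - 1, end


def zero_based_to_region(contig, start, end):
    """Convert 0-based half-open coordinates to a 1-based inclusive string."""
    return f"{contig}:{start + 1}-{end}"


contig, start, end = region_to_zero_based("chr1:1000-2000")
print(contig, start, end)                        # chr1 999 2000
print(end - start)                               # 1001
print(zero_based_to_region("chr1", 999, 2000))   # chr1:1000-2000

# Python slicing follows the 0-based, half-open convention:
# 1-based bases 3-5 of "ACGTACGT" are the slice [2:5].
print("ACGTACGT"[2:5])                           # GTA
```

With half-open intervals the length is simply end minus start, which is one reason the convention is popular in code even though region strings stay 1-based for readability.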

Common Pitfalls

  • Coordinate confusion: Remember which tool uses 0-based vs 1-based

  • Missing faidx index: Random access requires .fai file

  • Format mismatch: Verify file format matches the format string in SeqIO.parse()

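  • Iterator exhaustion: SeqIO.parse() returns an iterator that can be consumed only once; convert it to a list, or call SeqIO.parse() again, if multiple passes are needed

  • Large files: Iterate over SeqIO.parse() directly rather than wrapping it in list(), so only one record is held in memory at a time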
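The iterator pitfalls above can be reproduced without any third-party library. In this sketch a hand-written generator, parse_fasta, stands in for SeqIO.parse(); it is a hypothetical, much simplified parser used only to show the single-pass behaviour and the two usual workarounds.

```python
import io


def parse_fasta(handle):
    """Yield (identifier, sequence) pairs lazily, one record at a time."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


text = ">seq1 first\nACGT\nAC\n>seq2\nGGCC\n"

# Pitfall: the second loop over the same iterator sees nothing.
records = parse_fasta(io.StringIO(text))
first_pass = [name for name, _ in records]
second_pass = [name for name, _ in records]
print(first_pass)   # ['seq1', 'seq2']
print(second_pass)  # []

# Small files: materialise once, then iterate as often as needed.
cached = list(parse_fasta(io.StringIO(text)))

# Large files: stay streaming and compute what is needed in a single pass.
total_bases = sum(len(seq) for _, seq in parse_fasta(io.StringIO(text)))
print(total_bases)  # 10
```

The trade-off is memory against convenience: list() holds every record at once, while the streaming form keeps only the current record but allows a single pass per parse call.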

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
