bio-workflows-microbiome-pipeline

Paired-End FASTQ (16S V4) │ ▼ ┌──────────────────────────────────────────────────┐ │ microbiome-pipeline │ ├──────────────────────────────────────────────────┤ │ 1. Quality Filtering (DADA2 filterAndTrim) │ │ 2. Error Learning & Denoising │ │ 3. Merge Pairs & Remove Chimeras │ │ 4. Taxonomy Assignment (SILVA) │ │ 5. Create phyloseq Object │ │ 6. Alpha/Beta Diversity │ │ 7. Differential Abundance (ALDEx2) │ │ 8. Visualization & Export │ └──────────────────────────────────────────────────┘ │ ▼ ASV Table + Taxonomy + Diversity Plots + Differential Taxa

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "bio-workflows-microbiome-pipeline" with this command: npx skills add gptomics/bioskills/gptomics-bioskills-bio-workflows-microbiome-pipeline

Microbiome Pipeline

Pipeline Overview

Paired-End FASTQ (16S V4) │ ▼ ┌──────────────────────────────────────────────────┐ │ microbiome-pipeline │ ├──────────────────────────────────────────────────┤ │ 1. Quality Filtering (DADA2 filterAndTrim) │ │ 2. Error Learning & Denoising │ │ 3. Merge Pairs & Remove Chimeras │ │ 4. Taxonomy Assignment (SILVA) │ │ 5. Create phyloseq Object │ │ 6. Alpha/Beta Diversity │ │ 7. Differential Abundance (ALDEx2) │ │ 8. Visualization & Export │ └──────────────────────────────────────────────────┘ │ ▼ ASV Table + Taxonomy + Diversity Plots + Differential Taxa

Complete R Workflow

library(dada2) library(phyloseq) library(ALDEx2) library(vegan) library(ggplot2)

=== CONFIGURATION ===

path <- 'raw_reads' silva_train <- 'silva_nr99_v138.1_train_set.fa.gz' silva_species <- 'silva_species_assignment_v138.1.fa.gz' metadata_file <- 'sample_metadata.csv'

=== 1. READ FILES ===

fnFs <- sort(list.files(path, pattern = '_R1_001.fastq.gz', full.names = TRUE)) fnRs <- sort(list.files(path, pattern = 'R2_001.fastq.gz', full.names = TRUE)) sample_names <- sapply(strsplit(basename(fnFs), ''), [, 1)

Setup filtered files

filtFs <- file.path('filtered', paste0(sample_names, '_F_filt.fastq.gz')) filtRs <- file.path('filtered', paste0(sample_names, '_R_filt.fastq.gz'))

=== 2. FILTER & TRIM ===

out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, truncLen = c(240, 160), maxN = 0, maxEE = c(2, 2), truncQ = 2, rm.phix = TRUE, compress = TRUE, multithread = TRUE)

=== 3. LEARN ERRORS & DENOISE ===

errF <- learnErrors(filtFs, multithread = TRUE) errR <- learnErrors(filtRs, multithread = TRUE) dadaFs <- dada(filtFs, err = errF, multithread = TRUE) dadaRs <- dada(filtRs, err = errR, multithread = TRUE)

=== 4. MERGE & CHIMERAS ===

mergers <- mergePairs(dadaFs, filtFs, dadaRs, filtRs, verbose = TRUE) seqtab <- makeSequenceTable(mergers) seqtab_nochim <- removeBimeraDenovo(seqtab, method = 'consensus', multithread = TRUE)

=== 5. ASSIGN TAXONOMY ===

taxa <- assignTaxonomy(seqtab_nochim, silva_train, multithread = TRUE) taxa <- addSpecies(taxa, silva_species)

=== 6. BUILD PHYLOGENETIC TREE (for UniFrac) ===

library(DECIPHER) library(phangorn)

seqs <- getSequences(seqtab_nochim) names(seqs) <- paste0('ASV', seq_along(seqs)) alignment <- AlignSeqs(DNAStringSet(seqs), anchor = NA, processors = NULL) phang_align <- phyDat(as(alignment, 'matrix'), type = 'DNA') dm <- dist.ml(phang_align) tree <- NJ(dm) tree <- midpoint(ladderize(tree))

=== 7. CREATE PHYLOSEQ ===

metadata <- read.csv(metadata_file, row.names = 1) ps <- phyloseq(otu_table(seqtab_nochim, taxa_are_rows = FALSE), tax_table(taxa), sample_data(metadata), phy_tree(tree)) taxa_names(ps) <- paste0('ASV', seq(ntaxa(ps)))

=== 8. DIVERSITY ===

Alpha diversity (including Faith's PD with tree)

library(picante) alpha_div <- estimate_richness(ps, measures = c('Observed', 'Shannon', 'Simpson')) faith_pd <- pd(t(otu_table(ps)), phy_tree(ps), include.root = TRUE) alpha_div$PD <- faith_pd$PD alpha_div$Group <- sample_data(ps)$Group

Beta diversity (Bray-Curtis and UniFrac)

bray_dist <- phyloseq::distance(ps, method = 'bray') unifrac_dist <- UniFrac(ps, weighted = TRUE) pcoa_bray <- ordinate(ps, method = 'PCoA', distance = bray_dist) pcoa_unifrac <- ordinate(ps, method = 'PCoA', distance = unifrac_dist)

PERMANOVA on both metrics

meta_df <- data.frame(sample_data(ps)) permanova_bray <- adonis2(bray_dist ~ Group, data = meta_df, permutations = 999) permanova_unifrac <- adonis2(unifrac_dist ~ Group, data = meta_df, permutations = 999)

=== 9. DIFFERENTIAL ABUNDANCE ===

Filter low-abundance taxa

ps_filt <- filter_taxa(ps, function(x) sum(x > 0) > 0.1 * nsamples(ps), TRUE)

ALDEx2

otu <- as.data.frame(t(otu_table(ps_filt))) groups <- as.character(sample_data(ps_filt)$Group) aldex_results <- aldex(otu, groups, mc.samples = 128, test = 'welch', effect = TRUE) aldex_results$significant <- aldex_results$we.eBH < 0.05 & abs(aldex_results$effect) > 1

=== 10. OUTPUT ===

cat('Pipeline complete!\n') cat(' ASVs:', ntaxa(ps), '\n') cat(' Samples:', nsamples(ps), '\n') cat(' PERMANOVA R2:', round(permanova$R2[1], 3), 'p =', permanova$Pr(>F)[1], '\n') cat(' Differential taxa:', sum(aldex_results$significant), '\n')

QC Checkpoints

Stage Check Expected Action if Failed

Filter

70% reads pass 70% Adjust truncLen/maxEE

Merge

80% pairs merge 80% Check amplicon length

Chimera <25% chimeras <25% Check PCR cycles

Taxonomy

80% genus assigned 80% Try different database

Rarefaction Curves plateau Plateau Increase depth

PERMANOVA p < 0.05 p < 0.05 Check experimental design

Output Files

microbiome_results/ ├── phyloseq_object.rds # Complete phyloseq ├── asv_table.csv # ASV counts ├── taxonomy.csv # Taxonomic assignments ├── alpha_diversity.csv # Per-sample metrics ├── aldex2_results.csv # Differential taxa ├── read_tracking.csv # Reads per pipeline stage ├── plots/ │ ├── quality_profiles.pdf │ ├── alpha_diversity.pdf │ ├── beta_diversity_pcoa.pdf │ ├── taxonomic_barplot.pdf │ └── aldex2_effect_plot.pdf

Workflow Variants

ITS Fungal Workflow

Key differences for ITS:

1. No truncLen (variable length amplicons)

out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs, maxN = 0, maxEE = c(2, 2), truncQ = 2, minLen = 50, rm.phix = TRUE, multithread = TRUE)

2. Use UNITE database

taxa <- assignTaxonomy(seqtab_nochim, 'sh_general_release_dynamic_25.07.2023.fasta', multithread = TRUE)

Different 16S Regions

V3-V4 (~460bp): truncLen = c(280, 200)

V4 (~253bp): truncLen = c(240, 160)

V1-V3 (~500bp): truncLen = c(260, 220)

GTDB Taxonomy

For environmental samples, GTDB may be more accurate

taxa <- assignTaxonomy(seqtab_nochim, 'GTDB_bac120_arc53_ssu_r214_fullTaxo.fa.gz', multithread = TRUE)

Related Skills

  • microbiome/amplicon-processing - DADA2 details

  • microbiome/taxonomy-assignment - Database options, IDTAXA

  • microbiome/diversity-analysis - Diversity metrics, Faith's PD

  • microbiome/differential-abundance - ALDEx2, ANCOM-BC2

  • microbiome/functional-prediction - PICRUSt2 functional analysis

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Automation

bio-read-qc-fastp-workflow

No summary provided by upstream source.

Repository SourceNeeds Review
Automation

bio-workflow-management-cwl-workflows

No summary provided by upstream source.

Repository SourceNeeds Review
Automation

bio-workflows-scrnaseq-pipeline

No summary provided by upstream source.

Repository SourceNeeds Review
Automation

bio-workflows-genome-assembly-pipeline

No summary provided by upstream source.

Repository SourceNeeds Review