scvi-tools Deep Learning Skill
This skill provides guidance for deep learning-based single-cell analysis using scvi-tools, an open-source framework of probabilistic (deep generative) models for single-cell genomics.
How to Use This Skill
- Identify the appropriate workflow from the model and workflow tables below.
- Read the corresponding reference file for detailed steps and code.
- Use the scripts in scripts/ to avoid rewriting common code.
- For installation or GPU issues, consult references/environment_setup.md.
- For debugging, consult references/troubleshooting.md.
When to Use This Skill
- When scvi-tools, scVI, scANVI, or related models are mentioned
- When deep learning-based batch correction or integration is needed
- When working with multi-modal data (CITE-seq, multiome)
- When reference mapping or label transfer is required
- When analyzing ATAC-seq or spatial transcriptomics data
- When learning latent representations of single-cell data
Model Selection Guide
| Data Type | Model | Primary Use Case |
|---|---|---|
| scRNA-seq | scVI | Unsupervised integration, DE, imputation |
| scRNA-seq + labels | scANVI | Label transfer, semi-supervised integration |
| CITE-seq (RNA + protein) | totalVI | Multi-modal integration, protein denoising |
| scATAC-seq | PeakVI | Chromatin accessibility analysis |
| Multiome (RNA + ATAC) | MultiVI | Joint modality analysis |
| Spatial + scRNA reference | DestVI | Cell type deconvolution |
| RNA velocity | veloVI | Transcriptional dynamics |
| Cross-technology | sysVI | System-level batch correction |
Workflow Reference Files
| Workflow | Reference File | Description |
|---|---|---|
| Environment Setup | references/environment_setup.md | Installation, GPU, version info |
| Data Preparation | references/data_preparation.md | Formatting data for any model |
| scRNA Integration | references/scrna_integration.md | scVI/scANVI batch correction |
| ATAC-seq Analysis | references/atac_peakvi.md | PeakVI for chromatin accessibility |
| CITE-seq Analysis | references/citeseq_totalvi.md | totalVI for protein + RNA |
| Multiome Analysis | references/multiome_multivi.md | MultiVI for RNA + ATAC |
| Spatial Deconvolution | references/spatial_deconvolution.md | DestVI spatial analysis |
| Label Transfer | references/label_transfer.md | scANVI reference mapping |
| scArches Mapping | references/scarches_mapping.md | Query-to-reference mapping |
| Batch Correction | references/batch_correction_sysvi.md | sysVI for strong cross-technology batch effects |
| RNA Velocity | references/rna_velocity_velovi.md | veloVI dynamics |
| Troubleshooting | references/troubleshooting.md | Common issues and solutions |
CLI Scripts
Modular scripts for common workflows. Chain together or modify as needed.
Pipeline Scripts
| Script | Purpose | Usage |
|---|---|---|
| prepare_data.py | QC, filtering, HVG selection | python scripts/prepare_data.py raw.h5ad prepared.h5ad --batch-key batch |
| train_model.py | Train any scvi-tools model | python scripts/train_model.py prepared.h5ad results/ --model scvi |
| cluster_embed.py | Neighbors, UMAP, Leiden | python scripts/cluster_embed.py adata.h5ad results/ |
| differential_expression.py | DE analysis | python scripts/differential_expression.py model/ adata.h5ad de.csv --groupby leiden |
| transfer_labels.py | Label transfer with scANVI | python scripts/transfer_labels.py ref_model/ query.h5ad results/ |
| integrate_datasets.py | Multi-dataset integration | python scripts/integrate_datasets.py results/ data1.h5ad data2.h5ad |
| validate_adata.py | Check data compatibility | python scripts/validate_adata.py data.h5ad --batch-key batch |
Example Workflow
1. Validate input data
python scripts/validate_adata.py raw.h5ad --batch-key batch --suggest
2. Prepare data (QC, HVG selection)
python scripts/prepare_data.py raw.h5ad prepared.h5ad --batch-key batch --n-hvgs 2000
3. Train model
python scripts/train_model.py prepared.h5ad results/ --model scvi --batch-key batch
4. Cluster and visualize
python scripts/cluster_embed.py results/adata_trained.h5ad results/ --resolution 0.8
5. Differential expression
python scripts/differential_expression.py results/model results/adata_clustered.h5ad results/de.csv --groupby leiden
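The same five steps can also be chained from Python. The sketch below is an illustration, not one of the skill's scripts: it assumes it is run from the skill's root directory so that scripts/ resolves, it reuses the commands above verbatim, and it only prints them unless dry_run=False. The names PIPELINE and run_pipeline are hypothetical.

```python
import shlex
import subprocess

# The five commands from the example workflow, as argument lists (hypothetical constant).
PIPELINE = [
    ["python", "scripts/validate_adata.py", "raw.h5ad", "--batch-key", "batch", "--suggest"],
    ["python", "scripts/prepare_data.py", "raw.h5ad", "prepared.h5ad",
     "--batch-key", "batch", "--n-hvgs", "2000"],
    ["python", "scripts/train_model.py", "prepared.h5ad", "results/",
     "--model", "scvi", "--batch-key", "batch"],
    ["python", "scripts/cluster_embed.py", "results/adata_trained.h5ad", "results/",
     "--resolution", "0.8"],
    ["python", "scripts/differential_expression.py", "results/model",
     "results/adata_clustered.h5ad", "results/de.csv", "--groupby", "leiden"],
]


def run_pipeline(steps, dry_run=True):
    """Run each step in order, stopping at the first failure.

    With dry_run=True (the default) the commands are only printed and returned.
    """
    executed = []
    for argv in steps:
        print("$", shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status,
            # so later steps never run on missing or partial outputs.
            subprocess.run(argv, check=True)
        executed.append(argv)
    return executed


run_pipeline(PIPELINE)  # prints the five commands without running them
```

Using check=True makes the chain fail fast, so, for example, clustering is never attempted on an AnnData file that the training step did not write.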
Python Utilities
scripts/model_utils.py provides importable functions for custom workflows:
| Function | Purpose |
|---|---|
| prepare_adata() | Data preparation (QC, HVG, layer setup) |
| train_scvi() | Train scVI or scANVI |
| evaluate_integration() | Compute integration metrics |
| get_marker_genes() | Extract DE markers |
| save_results() | Save model, data, plots |
| auto_select_model() | Suggest best model |
| quick_clustering() | Neighbors + UMAP + Leiden |
Critical Requirements
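Raw counts required: scvi-tools models expect raw (unnormalized) integer counts. Copy the counts into a layer before normalizing and point setup_anndata at that layer:
import scvi
adata.layers["counts"] = adata.X.copy()  # before normalization
scvi.model.SCVI.setup_anndata(adata, layer="counts")
HVG selection: Use 2000-4000 highly variable genes. The seurat_v3 flavor expects raw counts, so compute HVGs on the counts layer and then subset:
import scanpy as sc
sc.pp.highly_variable_genes(adata, n_top_genes=2000, batch_key="batch", layer="counts", flavor="seurat_v3")
adata = adata[:, adata.var["highly_variable"]].copy()
Batch information: Pass batch_key to setup_anndata so the model can correct for batch effects during integration:
scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="batch")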
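Before calling setup_anndata it can help to sanity-check the three requirements above. The function below is a hypothetical sketch using NumPy only: it assumes a dense cells x genes array standing in for adata.layers["counts"] and a list standing in for adata.obs["batch"], and it treats 2000-4000 genes as the target range from the HVG guideline. For a SciPy sparse matrix, the same sign and integer checks would apply to its stored values (the .data attribute).

```python
import numpy as np


def check_scvi_inputs(counts, batch_labels=None, min_genes=2000, max_genes=4000):
    """Return a list of problems found in the inputs (empty if none).

    Hypothetical helper: `counts` is a dense cells x genes array standing in for
    adata.layers["counts"], and `batch_labels` stands in for adata.obs["batch"].
    """
    counts = np.asarray(counts)
    problems = []

    # Requirement 1: raw counts, i.e. non-negative whole numbers.
    if counts.size and counts.min() < 0:
        problems.append("negative values found; expected raw counts")
    if np.any(counts != np.round(counts)):
        problems.append("non-integer values found; data may be normalized or log-transformed")

    # Requirement 2: roughly 2000-4000 highly variable genes.
    n_genes = counts.shape[1]
    if not min_genes <= n_genes <= max_genes:
        problems.append(f"{n_genes} genes; consider subsetting to {min_genes}-{max_genes} HVGs")

    # Requirement 3: one batch label per cell, with at least two batches to integrate.
    if batch_labels is not None:
        labels = list(batch_labels)
        if len(labels) != counts.shape[0]:
            problems.append("batch labels do not match the number of cells")
        elif len(set(labels)) < 2:
            problems.append("only one batch present; nothing to integrate")

    return problems


rng = np.random.default_rng(0)
raw = rng.poisson(1.0, size=(6, 2000))  # toy counts: 6 cells x 2000 genes
print(check_scvi_inputs(raw, ["a", "a", "a", "b", "b", "b"]))  # → []
print(check_scvi_inputs(np.log1p(raw), ["a"] * 6))  # flags non-integer values and a single batch
```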
Quick Decision Tree
Need to integrate scRNA-seq data?
├── Have cell type labels? → scANVI (references/label_transfer.md)
└── No labels? → scVI (references/scrna_integration.md)
Have multi-modal data?
├── CITE-seq (RNA + protein)? → totalVI (references/citeseq_totalvi.md)
├── Multiome (RNA + ATAC)? → MultiVI (references/multiome_multivi.md)
└── scATAC-seq only? → PeakVI (references/atac_peakvi.md)
Have spatial data?
└── Need cell type deconvolution? → DestVI (references/spatial_deconvolution.md)
Have a pre-trained reference model?
└── Map query to reference? → scArches (references/scarches_mapping.md)
Need RNA velocity?
└── veloVI (references/rna_velocity_velovi.md)
Strong cross-technology batch effects?
└── sysVI (references/batch_correction_sysvi.md)
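The decision tree can also be written as a small function, which makes the branch order explicit. This is a hypothetical sketch, separate from auto_select_model() in scripts/model_utils.py: the DatasetInfo fields are invented for illustration, and the precedence (pre-trained reference first, then velocity, spatial, multi-modal, cross-technology batch effects, labels, and finally scVI) is an assumption, since the tree itself does not rank its branches.

```python
from dataclasses import dataclass


@dataclass
class DatasetInfo:
    """Hypothetical description of a dataset; field names are illustrative."""
    has_rna: bool = True
    has_protein: bool = False            # CITE-seq antibody counts
    has_atac: bool = False               # chromatin accessibility peaks
    is_spatial: bool = False
    has_labels: bool = False             # cell type annotations available
    has_reference_model: bool = False    # pre-trained model to map a query onto
    needs_velocity: bool = False
    strong_cross_tech_batch: bool = False


def suggest_model(info: DatasetInfo) -> tuple[str, str]:
    """Walk the Quick Decision Tree and return (model, reference file)."""
    # Branch precedence below is an assumption; the tree does not rank its branches.
    if info.has_reference_model:
        return "scArches", "references/scarches_mapping.md"
    if info.needs_velocity:
        return "veloVI", "references/rna_velocity_velovi.md"
    if info.is_spatial:
        return "DestVI", "references/spatial_deconvolution.md"
    if info.has_protein:
        return "totalVI", "references/citeseq_totalvi.md"
    if info.has_atac and info.has_rna:
        return "MultiVI", "references/multiome_multivi.md"
    if info.has_atac:
        return "PeakVI", "references/atac_peakvi.md"
    if info.strong_cross_tech_batch:
        return "sysVI", "references/batch_correction_sysvi.md"
    if info.has_labels:
        return "scANVI", "references/label_transfer.md"
    return "scVI", "references/scrna_integration.md"


print(suggest_model(DatasetInfo(has_labels=True)))  # → ('scANVI', 'references/label_transfer.md')
print(suggest_model(DatasetInfo(has_rna=False, has_atac=True)))  # → ('PeakVI', 'references/atac_peakvi.md')
```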
Key Resources
- scvi-tools Documentation
- scvi-tools Tutorials
- Model Hub
- GitHub Issues