scVelo — RNA Velocity Analysis
Overview
scVelo is the leading Python package for RNA velocity analysis in single-cell RNA-seq data. It infers cell state transitions by modeling the kinetics of mRNA splicing — using the ratio of unspliced (pre-mRNA) to spliced (mature mRNA) abundances to determine whether a gene is being upregulated or downregulated in each cell. This allows reconstruction of developmental trajectories and identification of cell fate decisions without requiring time-course data.
Installation: pip install scvelo
Key resources:
-
Documentation: https://scvelo.readthedocs.io/
-
Paper: Bergen et al. (2020) Nature Biotechnology. PMID: 32747759
When to Use This Skill
Use scVelo when:
-
Trajectory inference from snapshot data: Determine which direction cells are differentiating
-
Cell fate prediction: Identify progenitor cells and their downstream fates
-
Driver gene identification: Find genes whose dynamics best explain observed trajectories
-
Developmental biology: Model hematopoiesis, neurogenesis, epithelial-to-mesenchymal transitions
-
Latent time estimation: Order cells along a pseudotime derived from splicing dynamics
-
Complement to Scanpy: Add directional information to UMAP embeddings
Prerequisites
scVelo requires count matrices for both unspliced and spliced RNA. These are generated by:
-
STARsolo or kallisto|bustools with lamanno mode
-
velocyto CLI: velocyto run10x / velocyto run
-
alevin-fry / simpleaf with spliced/unspliced output
Data is stored in an AnnData object with layers["spliced"] and layers["unspliced"] .
Standard RNA Velocity Workflow
- Setup and Data Loading
import scvelo as scv import scanpy as sc import numpy as np import matplotlib.pyplot as plt
Configure settings
scv.settings.verbosity = 3 # Show computation steps scv.settings.presenter_view = True scv.settings.set_figure_params('scvelo')
Load data (AnnData with spliced/unspliced layers)
Option A: Load from loom (velocyto output)
adata = scv.read("cellranger_output.loom", cache=True)
Option B: Merge velocyto loom with Scanpy-processed AnnData
adata_processed = sc.read_h5ad("processed.h5ad") # Has UMAP, clusters adata_velocity = scv.read("velocyto.loom") adata = scv.utils.merge(adata_processed, adata_velocity)
Verify layers
print(adata)
obs × var: N × G
layers: 'spliced', 'unspliced' (required)
obsm['X_umap'] (required for visualization)
- Preprocessing
Filter and normalize (follows Scanpy conventions)
scv.pp.filter_and_normalize( adata, min_shared_counts=20, # Minimum counts in spliced+unspliced n_top_genes=2000 # Top highly variable genes )
Compute first and second order moments (means and variances)
knn_connectivities must be computed first
sc.pp.neighbors(adata, n_neighbors=30, n_pcs=30) scv.pp.moments( adata, n_pcs=30, n_neighbors=30 )
- Velocity Estimation — Stochastic Model
The stochastic model is fast and suitable for exploratory analysis:
Stochastic velocity (faster, less accurate)
scv.tl.velocity(adata, mode='stochastic') scv.tl.velocity_graph(adata)
Visualize
scv.pl.velocity_embedding_stream( adata, basis='umap', color='leiden', title="RNA Velocity (Stochastic)" )
- Velocity Estimation — Dynamical Model (Recommended)
The dynamical model fits the full splicing kinetics and is more accurate:
Recover dynamics (computationally intensive; ~10-30 min for 10K cells)
scv.tl.recover_dynamics(adata, n_jobs=4)
Compute velocity from dynamical model
scv.tl.velocity(adata, mode='dynamical') scv.tl.velocity_graph(adata)
- Latent Time
The dynamical model enables computation of a shared latent time (pseudotime):
Compute latent time
scv.tl.latent_time(adata)
Visualize latent time on UMAP
scv.pl.scatter( adata, color='latent_time', color_map='gnuplot', size=80, title='Latent time' )
Identify top genes ordered by latent time
top_genes = adata.var['fit_likelihood'].sort_values(ascending=False).index[:300] scv.pl.heatmap( adata, var_names=top_genes, sortby='latent_time', col_color='leiden', n_convolve=100 )
- Driver Gene Analysis
Identify genes with highest velocity fit
scv.tl.rank_velocity_genes(adata, groupby='leiden', min_corr=0.3) df = scv.DataFrame(adata.uns['rank_velocity_genes']['names']) print(df.head(10))
Speed and coherence
scv.tl.velocity_confidence(adata) scv.pl.scatter( adata, c=['velocity_length', 'velocity_confidence'], cmap='coolwarm', perc=[5, 95] )
Phase portraits for specific genes
scv.pl.velocity(adata, ['Cpe', 'Gnao1', 'Ins2'], ncols=3, figsize=(16, 4))
- Velocity Arrows and Pseudotime
Arrow plot on UMAP
scv.pl.velocity_embedding( adata, arrow_length=3, arrow_size=2, color='leiden', basis='umap' )
Stream plot (cleaner visualization)
scv.pl.velocity_embedding_stream( adata, basis='umap', color='leiden', smooth=0.8, min_mass=4 )
Velocity pseudotime (alternative to latent time)
scv.tl.velocity_pseudotime(adata) scv.pl.scatter(adata, color='velocity_pseudotime', cmap='gnuplot')
- PAGA Trajectory Graph
PAGA graph with velocity-informed transitions
scv.tl.paga(adata, groups='leiden') df = scv.get_df(adata, 'paga/transitions_confidence', precision=2).T df.style.background_gradient(cmap='Blues').format('{:.2g}')
Plot PAGA with velocity
scv.pl.paga( adata, basis='umap', size=50, alpha=0.1, min_edge_width=2, node_size_scale=1.5 )
Complete Workflow Script
import scvelo as scv import scanpy as sc
def run_rna_velocity(adata, n_top_genes=2000, mode='dynamical', n_jobs=4): """ Complete RNA velocity workflow.
Args:
adata: AnnData with 'spliced' and 'unspliced' layers, UMAP in obsm
n_top_genes: Number of top HVGs for velocity
mode: 'stochastic' (fast) or 'dynamical' (accurate)
n_jobs: Parallel jobs for dynamical model
Returns:
Processed AnnData with velocity information
"""
scv.settings.verbosity = 2
# 1. Preprocessing
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=n_top_genes)
if 'neighbors' not in adata.uns:
sc.pp.neighbors(adata, n_neighbors=30)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
# 2. Velocity estimation
if mode == 'dynamical':
scv.tl.recover_dynamics(adata, n_jobs=n_jobs)
scv.tl.velocity(adata, mode=mode)
scv.tl.velocity_graph(adata)
# 3. Downstream analyses
if mode == 'dynamical':
scv.tl.latent_time(adata)
scv.tl.rank_velocity_genes(adata, groupby='leiden', min_corr=0.3)
scv.tl.velocity_confidence(adata)
scv.tl.velocity_pseudotime(adata)
return adata
Key Output Fields in AnnData
After running the workflow, the following fields are added:
Location Key Description
adata.layers
velocity
RNA velocity per gene per cell
adata.layers
fit_t
Fitted latent time per gene per cell
adata.obsm
velocity_umap
2D velocity vectors on UMAP
adata.obs
velocity_pseudotime
Pseudotime from velocity
adata.obs
latent_time
Latent time from dynamical model
adata.obs
velocity_length
Speed of each cell
adata.obs
velocity_confidence
Confidence score per cell
adata.var
fit_likelihood
Gene-level model fit quality
adata.var
fit_alpha
Transcription rate
adata.var
fit_beta
Splicing rate
adata.var
fit_gamma
Degradation rate
adata.uns
velocity_graph
Cell-cell transition probability matrix
Velocity Models Comparison
Model Speed Accuracy When to Use
stochastic
Fast Moderate Exploratory; large datasets
deterministic
Medium Moderate Simple linear kinetics
dynamical
Slow High Publication-quality; identifies driver genes
Best Practices
-
Start with stochastic mode for exploration; switch to dynamical for final analysis
-
Need good coverage of unspliced reads: Short reads (< 100 bp) may miss intron coverage
-
Minimum 2,000 cells: RNA velocity is noisy with fewer cells
-
Velocity should be coherent: Arrows should follow known biology; randomness indicates issues
-
k-NN bandwidth matters: Too few neighbors → noisy velocity; too many → oversmoothed
-
Sanity check: Root cells (progenitors) should have high unspliced/spliced ratios for marker genes
-
Dynamical model requires distinct kinetic states: Works best for clear differentiation processes
Troubleshooting
Problem Solution
Missing unspliced layer Re-run velocyto or use STARsolo with --soloFeatures Gene Velocyto
Very few velocity genes Lower min_shared_counts ; check sequencing depth
Random-looking arrows Try different n_neighbors or velocity model
Memory error with dynamical Set n_jobs=1 ; reduce n_top_genes
Negative velocity everywhere Check that spliced/unspliced layers are not swapped
Additional Resources
-
scVelo documentation: https://scvelo.readthedocs.io/
-
Tutorial notebooks: https://scvelo.readthedocs.io/tutorials/
-
Paper: Bergen V et al. (2020) Nature Biotechnology. PMID: 32747759
-
velocyto (preprocessing): http://velocyto.org/
-
CellRank (fate prediction, extends scVelo): https://cellrank.readthedocs.io/
-
dynamo (metabolic labeling alternative): https://dynamo-release.readthedocs.io/