Multi-Omics Integration
Coordinate and integrate multiple omics datasets for comprehensive systems biology analysis. This skill orchestrates specialized ToolUniverse skills to perform cross-omics correlation, multi-omics clustering, pathway-level integration, and unified interpretation.
When to Use This Skill
- User has multiple omics datasets (RNA-seq + proteomics, methylation + expression, etc.)
- Cross-omics correlation queries (e.g., "How does methylation affect expression?")
- Multi-omics biomarker discovery or patient subtyping
- Systems biology questions requiring multiple molecular layers
- Precision medicine applications with multi-omics patient data
Workflow Overview
Phase 1: Data Loading & QC
Load each omics type, format-specific QC, normalize
Supported: RNA-seq, proteomics, methylation, CNV/SNV, metabolomics
Phase 2: Sample Matching
Harmonize sample IDs, find common samples, handle missing omics
Phase 3: Feature Mapping
Map features to common gene-level identifiers
CpG->gene (promoter), CNV->gene, metabolite->enzyme
Phase 4: Cross-Omics Correlation
RNA vs Protein (translation efficiency)
Methylation vs Expression (epigenetic regulation)
CNV vs Expression (dosage effect)
eQTL variants vs Expression (genetic regulation)
Phase 5: Multi-Omics Clustering
MOFA+, NMF, SNF for patient subtyping
Phase 6: Pathway-Level Integration
Aggregate omics evidence at pathway level
Score pathway dysregulation with combined evidence
Phase 7: Biomarker Discovery
Feature selection across omics, multi-omics classification
Phase 8: Integrated Report
Summary, correlations, clusters, pathways, biomarkers
See: phase_details.md for complete code and implementation details.
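As an illustration of Phase 3 (CpG->gene mapping via promoter probes), here is a minimal sketch. It assumes a beta matrix with probes as rows and samples as columns, plus an annotation table whose column names (`probe`, `gene`, `region`) are hypothetical; the region labels follow the Illumina manifest convention (`TSS200`, `TSS1500`), and averaging promoter probes is one common choice rather than the only one.

```python
import pandas as pd

def cpg_to_gene(beta, annotation, regions=("TSS200", "TSS1500")):
    """Average promoter-probe beta values per gene.

    beta: DataFrame, probes x samples.
    annotation: DataFrame with hypothetical columns 'probe', 'gene', 'region'.
    """
    # Keep promoter probes that are actually present in the beta matrix
    promoter = annotation[annotation["region"].isin(regions)]
    promoter = promoter[promoter["probe"].isin(beta.index)]
    # Relabel probe rows with their gene, then average probes per gene
    joined = beta.loc[promoter["probe"]].copy()
    joined.index = promoter["gene"].to_numpy()
    return joined.groupby(level=0).mean()

# Toy example: two promoter probes for gene A, one gene-body probe for gene B
beta = pd.DataFrame(
    {"S1": [0.2, 0.4, 0.9], "S2": [0.1, 0.3, 0.8]},
    index=["cg1", "cg2", "cg3"],
)
annotation = pd.DataFrame({
    "probe": ["cg1", "cg2", "cg3"],
    "gene": ["A", "A", "B"],
    "region": ["TSS200", "TSS1500", "Body"],
})
gene_beta = cpg_to_gene(beta, annotation)
```

Gene B drops out here because its only probe is outside the promoter regions; whether that is desirable depends on the analysis.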
Supported Data Types
| Omics | Formats | QC Focus |
|---|---|---|
| Transcriptomics | CSV/TSV, HDF5, h5ad | Low-count filter, normalize (TPM/DESeq2), log-transform |
| Proteomics | MaxQuant, Spectronaut, DIA-NN | Missing value imputation, median/quantile normalization |
| Methylation | IDAT, beta matrices | Failed probes, batch correction, cross-reactive filter |
| Genomics | VCF, SEG (CNV) | Variant QC, CNV segmentation |
| Metabolomics | Peak tables | Missing values, normalization |
Core Operations
Sample Matching
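def match_samples_across_omics(omics_data_dict):
    """Match samples across omics datasets (each a DataFrame with samples as columns)."""
    sample_ids = {k: set(df.columns) for k, df in omics_data_dict.items()}
    # Keep only samples present in every omics layer, in a stable sorted order
    common_samples = sorted(set.intersection(*sample_ids.values()))
    if not common_samples:
        raise ValueError("No samples are shared across all omics datasets")
    matched_data = {k: df[common_samples] for k, df in omics_data_dict.items()}
    return common_samples, matched_data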
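Matching by exact column label only works if sample IDs are spelled the same way in every dataset. The sketch below shows one way to harmonize IDs first and to see which samples are missing from which layer. The helper names (`harmonize_sample_ids`, `sample_presence`) and the normalization rules (trim, uppercase, `.` to `-`) are assumptions for illustration; real datasets may need different rules.

```python
import pandas as pd

def harmonize_sample_ids(df):
    """Return a copy of df with normalized sample (column) labels."""
    out = df.copy()
    out.columns = (
        pd.Index(out.columns).astype(str)
        .str.strip()                          # drop stray whitespace
        .str.upper()                          # case-insensitive matching
        .str.replace(".", "-", regex=False)   # e.g. R-mangled IDs
    )
    # Note: normalization can create duplicate labels; check before matching
    return out

def sample_presence(omics_data_dict):
    """Boolean table: rows are samples, columns are omics layers."""
    all_samples = sorted(set().union(*(df.columns for df in omics_data_dict.values())))
    return pd.DataFrame(
        {name: pd.Index(all_samples).isin(df.columns)
         for name, df in omics_data_dict.items()},
        index=all_samples,
    )

rna = pd.DataFrame([[1.0, 2.0, 3.0]], index=["TP53"], columns=["s1 ", "S2", "s.3"])
protein = pd.DataFrame([[0.5, 0.7]], index=["TP53"], columns=["S1", "S-3"])
omics = {"rna": harmonize_sample_ids(rna), "protein": harmonize_sample_ids(protein)}

presence = sample_presence(omics)
common = presence[presence.all(axis=1)].index.tolist()
```

The presence table also supports the pairwise case: selecting rows where two specific layers are both `True` gives the samples usable for that pair.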
Cross-Omics Correlation
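from scipy.stats import spearmanr  # pearsonr can be swapped in for linear relationships

# Assumes rna and protein are gene x sample DataFrames with matched sample columns
# RNA vs Protein: expect positive r ~ 0.4-0.6
# Methylation vs Expression: expect negative r (promoter repression)
# CNV vs Expression: expect positive r (dosage effect)
common_genes = rna.index.intersection(protein.index)
results = {}
for gene in common_genes:
    r, p = spearmanr(rna.loc[gene], protein.loc[gene])
    results[gene] = (r, p)  # keep per-gene correlation and p-value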
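Testing thousands of genes calls for multiple-testing correction. The following sketch wraps the per-gene Spearman test in a function and adds Benjamini-Hochberg adjusted p-values. The function names and the `min_samples=10` default (taken from the common-sample minimum stated in this document) are assumptions, and gene identifiers are assumed to be unique row labels.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(adjusted, 0.0, 1.0)
    return q

def correlate_layers(layer_a, layer_b, min_samples=10):
    """Per-gene Spearman correlation between two gene x sample DataFrames."""
    genes = layer_a.index.intersection(layer_b.index)
    samples = layer_a.columns.intersection(layer_b.columns)
    rows = []
    for gene in genes:
        # Pair the two layers for this gene and drop samples missing in either
        pair = pd.concat(
            [layer_a.loc[gene, samples], layer_b.loc[gene, samples]], axis=1
        ).dropna()
        if len(pair) < min_samples:
            continue
        rho, p = spearmanr(pair.iloc[:, 0], pair.iloc[:, 1])
        if np.isnan(rho):  # e.g. a constant feature
            continue
        rows.append({"gene": gene, "rho": rho, "p": p, "n": len(pair)})
    result = pd.DataFrame(rows, columns=["gene", "rho", "p", "n"])
    if not result.empty:
        result["q"] = benjamini_hochberg(result["p"])
    return result
```

Calling `correlate_layers(rna, protein)` returns one row per testable gene; filtering on `q` rather than `p` controls the false discovery rate across genes.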
Pathway Integration
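import numpy as np
# Score pathway dysregulation using combined evidence from all omics.
# Inputs: per-gene arrays for the pathway's member genes, aligned in the same gene order.
gene_score = np.abs(rna_fc) + np.abs(protein_fc) + np.abs(meth_diff) + np.abs(cnv)  # per-gene evidence
pathway_score = np.mean(gene_score)  # per-pathway aggregate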
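Summing raw effect sizes mixes layers measured on different scales (for example, log fold changes and methylation beta differences). One possible refinement, sketched below, scales each layer to [0, 1] before averaging and tolerates genes missing from some layers. The function names, the max-scaling choice, and the `min_genes` threshold are assumptions for illustration, not the method in phase_details.md.

```python
import pandas as pd

def gene_evidence(layer_effects):
    """Combine per-gene effect sizes from several layers into one score.

    layer_effects: dict of layer name -> Series indexed by gene.
    """
    table = pd.DataFrame(layer_effects)       # genes x layers, NaN where a gene is absent
    scaled = table.abs() / table.abs().max()  # scale each layer by its largest effect
    return scaled.mean(axis=1, skipna=True)   # average over the layers that have the gene

def pathway_scores(evidence, gene_sets, min_genes=2):
    """Mean gene evidence per pathway, skipping pathways with too few scored genes."""
    scores = {}
    for name, genes in gene_sets.items():
        members = evidence.reindex(list(genes)).dropna()
        if len(members) >= min_genes:
            scores[name] = members.mean()
    return pd.Series(scores, dtype=float).sort_values(ascending=False)

# Toy example with hypothetical genes and pathways
rna_fc = pd.Series({"A": 2.0, "B": -1.0, "C": 0.0})
meth_diff = pd.Series({"A": -0.4, "B": 0.2})
evidence = gene_evidence({"rna": rna_fc, "methylation": meth_diff})
ranked = pathway_scores(evidence, {"P1": ["A", "B"], "P2": ["B", "C"], "P3": ["C", "Z"]})
```

Max-scaling is sensitive to outliers; rank-based or z-score scaling would be reasonable alternatives under the same structure.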
See: phase_details.md for full implementations of each operation.
Multi-Omics Clustering Methods
| Method | Description | Best For |
|---|---|---|
| MOFA+ | Latent factors explaining cross-omics variation | Identifying shared/omics-specific drivers |
| Joint NMF | Shared decomposition across omics | Patient subtype discovery |
| SNF | Similarity network fusion | Integrating heterogeneous data types |
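To make the joint NMF row concrete, here is a minimal NumPy sketch using multiplicative updates with a sample-factor matrix `W` shared across layers and one loading matrix per layer. It assumes every layer is a non-negative samples x features array with the same sample order; the function names, iteration count, and random initialization are illustrative choices, and dedicated packages offer more robust implementations.

```python
import numpy as np

def joint_nmf(layers, n_factors, n_iter=200, seed=0, eps=1e-9):
    """Factorize each layer X_k as W @ H_k with a shared, non-negative W.

    layers: dict of name -> non-negative array (samples x features).
    """
    rng = np.random.default_rng(seed)
    n_samples = next(iter(layers.values())).shape[0]
    W = rng.random((n_samples, n_factors)) + eps
    H = {k: rng.random((n_factors, X.shape[1])) + eps for k, X in layers.items()}
    for _ in range(n_iter):
        # Update each layer's loadings with W held fixed
        for k, X in layers.items():
            H[k] *= (W.T @ X) / (W.T @ W @ H[k] + eps)
        # Update the shared sample factors using evidence from all layers
        numer = sum(X @ H[k].T for k, X in layers.items())
        denom = W @ sum(H[k] @ H[k].T for k in layers) + eps
        W *= numer / denom
    return W, H

def reconstruction_error(layers, W, H):
    """Total squared reconstruction error across layers."""
    return sum(float(np.sum((X - W @ H[k]) ** 2)) for k, X in layers.items())

# Toy data: two layers with a shared two-group structure
group = np.repeat([0, 1], 3)
layer_a = np.where(group[:, None] == 0, [5.0, 5.0, 0.0, 0.0], [0.0, 0.0, 5.0, 5.0])
layer_b = np.where(group[:, None] == 0, [3.0, 0.0, 0.0], [0.0, 3.0, 3.0])
W, H = joint_nmf({"a": layer_a, "b": layer_b}, n_factors=2)
labels = W.argmax(axis=1)  # assign each sample to its dominant factor
```

Layers with negative values (such as centered log-expression) would need shifting or splitting into positive and negative parts before this applies, and scaling layers to comparable magnitude keeps one omics type from dominating `W`.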
ToolUniverse Skills Coordination
| Skill | Used For | Phase |
|---|---|---|
| tooluniverse-rnaseq-deseq2 | RNA-seq analysis | 1, 4 |
| tooluniverse-epigenomics | Methylation, ChIP-seq | 1, 4 |
| tooluniverse-variant-analysis | CNV/SNV processing | 1, 3, 4 |
| tooluniverse-protein-interactions | Protein network context | 6 |
| tooluniverse-gene-enrichment | Pathway enrichment | 6 |
| tooluniverse-expression-data-retrieval | Public data retrieval | 1 |
| tooluniverse-target-research | Gene/protein annotation | 3, 8 |
Use Cases
Cancer Multi-Omics
Integrate TCGA RNA-seq + proteomics + methylation + CNV to identify patient subtypes, cross-omics driver genes, and multi-omics biomarkers.
eQTL + Expression + Methylation
Identify SNP -> methylation -> expression regulatory chains (mediation analysis).
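A regulatory chain of this kind can be explored with a product-of-coefficients mediation sketch for a single SNP/CpG/gene triplet, shown below. It uses ordinary least squares via NumPy, assumes linear effects and no covariates, and omits significance testing (a bootstrap on the indirect effect is the usual next step). The function names are hypothetical.

```python
import numpy as np

def ols_coefficients(y, *predictors):
    """Least-squares coefficients for y ~ intercept + predictors (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef

def mediation_effects(snp, methylation, expression):
    """Decompose the SNP effect on expression into direct and methylation-mediated parts."""
    a = ols_coefficients(methylation, snp)[1]                       # SNP -> methylation
    _, direct, b = ols_coefficients(expression, snp, methylation)   # adjusted effects
    total = ols_coefficients(expression, snp)[1]                    # SNP -> expression
    return {"a": a, "b": b, "indirect": a * b, "direct": direct, "total": total}

# Simulated triplet: the SNP raises methylation, which represses expression
rng = np.random.default_rng(0)
snp = rng.integers(0, 3, size=500).astype(float)   # genotype dosage 0/1/2
meth = 0.5 * snp + rng.normal(0, 0.1, size=500)
expr = -2.0 * meth + rng.normal(0, 0.1, size=500)
effects = mediation_effects(snp, meth, expr)
```

An indirect effect that accounts for most of the total effect, with a small direct effect, is consistent with methylation mediating the SNP's influence, though it does not establish causality on its own.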
Drug Response Multi-Omics
Predict drug response using baseline multi-omics profiles; identify resistance/sensitivity pathways.
See: phase_details.md "Use Cases" for detailed step-by-step workflows.
Quantified Minimums
| Component | Requirement |
|---|---|
| Omics types | At least 2 datasets |
| Common samples | At least 10 across omics |
| Cross-correlation | Pearson/Spearman computed |
| Clustering | At least one method (MOFA+, NMF, or SNF) |
| Pathway integration | Enrichment with multi-omics evidence scores |
| Report | Summary, correlations, clusters, pathways, biomarkers |
Limitations
- Sample size: n >= 20 common samples recommended for integration (10 is the hard minimum)
- Missing data: Fall back to pairwise integration when not all samples have all omics
- Batch effects: Different platforms require careful normalization
- Computational: Large datasets may require significant memory
- Interpretation: Results require domain expertise for validation
References
- MOFA+: https://doi.org/10.1186/s13059-020-02015-1
- Similarity Network Fusion: https://doi.org/10.1038/nmeth.2810
- Multi-omics review: https://doi.org/10.1038/s41576-019-0093-7
- See individual ToolUniverse skill documentation for omics-specific methods
Detailed Reference
- phase_details.md - Complete code for all phases, correlation functions, clustering, pathway integration, biomarker discovery, report template, and detailed use cases