LatchBio Integration
Overview
Latch is a Python framework for building and deploying bioinformatics workflows as serverless pipelines. Built on Flyte, create workflows with @workflow/@task decorators, manage cloud data with LatchFile/LatchDir, configure resources, and integrate Nextflow/Snakemake pipelines.
Core Capabilities
The Latch platform provides four main areas of functionality:
- Workflow Creation and Deployment
-
Define serverless workflows using Python decorators
-
Support for native Python, Nextflow, and Snakemake pipelines
-
Automatic containerization with Docker
-
Auto-generated no-code user interfaces
-
Version control and reproducibility
- Data Management
-
Cloud storage abstractions (LatchFile, LatchDir)
-
Structured data organization with Registry (Projects → Tables → Records)
-
Type-safe data operations with links and enums
-
Automatic file transfer between local and cloud
-
Glob pattern matching for file selection
- Resource Configuration
-
Pre-configured task decorators (@small_task, @large_task, @small_gpu_task, @large_gpu_task)
-
Custom resource specifications (CPU, memory, GPU, storage)
-
GPU support (K80, V100, A100)
-
Timeout and storage configuration
-
Cost optimization strategies
- Verified Workflows
-
Production-ready pre-built pipelines
-
Bulk RNA-seq, DESeq2, pathway analysis
-
AlphaFold and ColabFold for protein structure prediction
-
Single-cell tools (ArchR, scVelo, emptyDropsR)
-
CRISPR analysis, phylogenetics, and more
Quick Start
Installation and Setup
Install Latch SDK
python3 -m pip install latch
Login to Latch
latch login
Initialize a new workflow
latch init my-workflow
Register workflow to platform
latch register my-workflow
Prerequisites:
-
Docker installed and running
-
Latch account credentials
-
Python 3.8+
Basic Workflow Example
from latch import workflow, small_task from latch.types import LatchFile
@small_task def process_file(input_file: LatchFile) -> LatchFile: """Process a single file""" # Processing logic return output_file
@workflow def my_workflow(input_file: LatchFile) -> LatchFile: """ My bioinformatics workflow
Args:
input_file: Input data file
"""
return process_file(input_file=input_file)
When to Use This Skill
This skill should be used when encountering any of the following scenarios:
Workflow Development:
-
"Create a Latch workflow for RNA-seq analysis"
-
"Deploy my pipeline to Latch"
-
"Convert my Nextflow pipeline to Latch"
-
"Add GPU support to my workflow"
-
Working with @workflow , @task decorators
Data Management:
-
"Organize my sequencing data in Latch Registry"
-
"How do I use LatchFile and LatchDir?"
-
"Set up sample tracking in Latch"
-
Working with latch:/// paths
Resource Configuration:
-
"Configure GPU for AlphaFold on Latch"
-
"My task is running out of memory"
-
"How do I optimize workflow costs?"
-
Working with task decorators
Verified Workflows:
-
"Run AlphaFold on Latch"
-
"Use DESeq2 for differential expression"
-
"Available pre-built workflows"
-
Using latch.verified module
Detailed Documentation
This skill includes comprehensive reference documentation organized by capability:
references/workflow-creation.md
Read this for:
-
Creating and registering workflows
-
Task definition and decorators
-
Supporting Python, Nextflow, Snakemake
-
Launch plans and conditional sections
-
Workflow execution (CLI and programmatic)
-
Multi-step and parallel pipelines
-
Troubleshooting registration issues
Key topics:
-
latch init and latch register commands
-
@workflow and @task decorators
-
LatchFile and LatchDir basics
-
Type annotations and docstrings
-
Launch plans with preset parameters
-
Conditional UI sections
references/data-management.md
Read this for:
-
Cloud storage with LatchFile and LatchDir
-
Registry system (Projects, Tables, Records)
-
Linked records and relationships
-
Enum and typed columns
-
Bulk operations and transactions
-
Integration with workflows
-
Account and workspace management
Key topics:
-
latch:/// path format
-
File transfer and glob patterns
-
Creating and querying Registry tables
-
Column types (string, number, file, link, enum)
-
Record CRUD operations
-
Workflow-Registry integration
references/resource-configuration.md
Read this for:
-
Task resource decorators
-
Custom CPU, memory, GPU configuration
-
GPU types (K80, V100, A100)
-
Timeout and storage settings
-
Resource optimization strategies
-
Cost-effective workflow design
-
Monitoring and debugging
Key topics:
-
@small_task , @large_task , @small_gpu_task , @large_gpu_task
-
@custom_task with precise specifications
-
Multi-GPU configuration
-
Resource selection by workload type
-
Platform limits and quotas
references/verified-workflows.md
Read this for:
-
Pre-built production workflows
-
Bulk RNA-seq and DESeq2
-
AlphaFold and ColabFold
-
Single-cell analysis (ArchR, scVelo)
-
CRISPR editing analysis
-
Pathway enrichment
-
Integration with custom workflows
Key topics:
-
latch.verified module imports
-
Available verified workflows
-
Workflow parameters and options
-
Combining verified and custom steps
-
Version management
Common Workflow Patterns
Complete RNA-seq Pipeline
from latch import workflow, small_task, large_task from latch.types import LatchFile, LatchDir
@small_task def quality_control(fastq: LatchFile) -> LatchFile: """Run FastQC""" return qc_output
@large_task def alignment(fastq: LatchFile, genome: str) -> LatchFile: """STAR alignment""" return bam_output
@small_task def quantification(bam: LatchFile) -> LatchFile: """featureCounts""" return counts
@workflow def rnaseq_pipeline( input_fastq: LatchFile, genome: str, output_dir: LatchDir ) -> LatchFile: """RNA-seq analysis pipeline""" qc = quality_control(fastq=input_fastq) aligned = alignment(fastq=qc, genome=genome) return quantification(bam=aligned)
GPU-Accelerated Workflow
from latch import workflow, small_task, large_gpu_task from latch.types import LatchFile
@small_task def preprocess(input_file: LatchFile) -> LatchFile: """Prepare data""" return processed
@large_gpu_task def gpu_computation(data: LatchFile) -> LatchFile: """GPU-accelerated analysis""" return results
@workflow def gpu_pipeline(input_file: LatchFile) -> LatchFile: """Pipeline with GPU tasks""" preprocessed = preprocess(input_file=input_file) return gpu_computation(data=preprocessed)
Registry-Integrated Workflow
from latch import workflow, small_task from latch.registry.table import Table from latch.registry.record import Record from latch.types import LatchFile
@small_task def process_and_track(sample_id: str, table_id: str) -> str: """Process sample and update Registry""" # Get sample from registry table = Table.get(table_id=table_id) records = Record.list(table_id=table_id, filter={"sample_id": sample_id}) sample = records[0]
# Process
input_file = sample.values["fastq_file"]
output = process(input_file)
# Update registry
sample.update(values={"status": "completed", "result": output})
return "Success"
@workflow def registry_workflow(sample_id: str, table_id: str): """Workflow integrated with Registry""" return process_and_track(sample_id=sample_id, table_id=table_id)
Best Practices
Workflow Design
-
Use type annotations for all parameters
-
Write clear docstrings (appear in UI)
-
Start with standard task decorators, scale up if needed
-
Break complex workflows into modular tasks
-
Implement proper error handling
Data Management
-
Use consistent folder structures
-
Define Registry schemas before bulk entry
-
Use linked records for relationships
-
Store metadata in Registry for traceability
Resource Configuration
-
Right-size resources (don't over-allocate)
-
Use GPU only when algorithms support it
-
Monitor execution metrics and optimize
-
Design for parallel execution when possible
Development Workflow
-
Test locally with Docker before registration
-
Use version control for workflow code
-
Document resource requirements
-
Profile workflows to determine actual needs
Troubleshooting
Common Issues
Registration Failures:
-
Ensure Docker is running
-
Check authentication with latch login
-
Verify all dependencies in Dockerfile
-
Use --verbose flag for detailed logs
Resource Problems:
-
Out of memory: Increase memory in task decorator
-
Timeouts: Increase timeout parameter
-
Storage issues: Increase ephemeral storage_gib
Data Access:
-
Use correct latch:/// path format
-
Verify file exists in workspace
-
Check permissions for shared workspaces
Type Errors:
-
Add type annotations to all parameters
-
Use LatchFile/LatchDir for file/directory parameters
-
Ensure workflow return type matches actual return
Additional Resources
-
Official Documentation: https://docs.latch.bio
-
GitHub Repository: https://github.com/latchbio/latch
-
Slack Community: Join Latch SDK workspace
-
API Reference: https://docs.latch.bio/api/latch.html
-
Blog: https://blog.latch.bio
Support
For issues or questions:
-
Check documentation links above
-
Search GitHub issues
-
Ask in Slack community
-
Contact support@latch.bio