Machine Learning Domain
Layer 3: Domain Constraints
Domain Constraints → Design Implications
| Domain Rule | Design Constraint | Rust Implication |
|---|---|---|
| Large data | Efficient memory | Zero-copy, streaming |
| GPU acceleration | CUDA/Metal support | candle, tch-rs |
| Model portability | Standard formats | ONNX |
| Batch processing | Throughput over latency | Batched inference |
| Numerical precision | Float handling | ndarray, careful f32/f64 |
| Reproducibility | Deterministic | Seeded random, versioning (sketch below) |
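As a small illustration of the reproducibility row, here is a minimal sketch of deterministic sampling with a seeded RNG. It assumes the rand crate's 0.8-style API; the train/test mask and the 80/20 ratio are invented for the example.

```rust
use rand::{rngs::StdRng, Rng, SeedableRng};

// Hypothetical train/test split: the same seed always produces the same
// mask, so an experiment can be re-run bit-for-bit on another machine.
fn train_test_mask(n_samples: usize, seed: u64) -> Vec<bool> {
    let mut rng = StdRng::seed_from_u64(seed);
    (0..n_samples).map(|_| rng.gen_bool(0.8)).collect()
}
```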
Critical Constraints
Memory Efficiency
RULE: Avoid copying large tensors
WHY: Memory bandwidth is the bottleneck
RUST: References, views, in-place ops
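A minimal sketch of this rule with ndarray: rows are visited through mutable views and scaled in place, so the matrix is never copied. The row-normalization task itself is just an example.

```rust
use ndarray::{Array2, Axis};

// L2-normalize each row without allocating a second matrix:
// axis_iter_mut yields mutable views into the existing buffer.
fn normalize_rows(data: &mut Array2<f32>) {
    for mut row in data.axis_iter_mut(Axis(0)) {
        let norm = row.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            // In-place scaling; no temporary array is created.
            row.mapv_inplace(|x| x / norm);
        }
    }
}
```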
GPU Utilization
RULE: Batch operations for GPU efficiency
WHY: GPU overhead per kernel launch
RUST: Batch sizes, async data loading
Model Portability
RULE: Use standard model formats
WHY: Train in Python, deploy in Rust
RUST: ONNX via tract or candle
Trace Down ↓
From constraints to design (Layer 2):
"Need efficient data pipelines" ↓ m10-performance: Streaming, batching ↓ polars: Lazy evaluation
"Need GPU inference" ↓ m07-concurrency: Async data loading ↓ candle/tch-rs: CUDA backend
"Need model loading" ↓ m12-lifecycle: Lazy init, caching ↓ tract: ONNX runtime
Use Case → Framework
| Use Case | Recommended | Why |
|---|---|---|
| Inference only | tract (ONNX) | Lightweight, portable |
| Training + inference | candle, burn | Pure Rust, GPU |
| PyTorch models | tch-rs | Direct bindings |
| Data pipelines | polars | Fast, lazy eval |
Key Crates
| Purpose | Crate |
|---|---|
| Tensors | ndarray |
| ONNX inference | tract |
| ML framework | candle, burn |
| PyTorch bindings | tch-rs |
| Data processing | polars |
| Embeddings | fastembed |
Design Patterns
| Pattern | Purpose | Implementation |
|---|---|---|
| Model loading | Once, reuse | `OnceLock<Model>` |
| Batching | Throughput | Collect then process |
| Streaming | Large data | Iterator-based (sketch below) |
| GPU async | Parallelism | Data loading parallel to compute |
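The streaming row is worth a sketch. Assuming a hypothetical text format with one comma-separated feature vector per line, an iterator keeps memory constant because rows are parsed on demand:

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};

// Stream samples one line at a time; nothing is held in memory beyond
// the current row, so dataset size does not matter.
fn stream_samples(path: &str) -> std::io::Result<impl Iterator<Item = Vec<f32>>> {
    let reader = BufReader::new(File::open(path)?);
    Ok(reader.lines().filter_map(|line| {
        // Skip unreadable or malformed lines instead of buffering everything.
        let line = line.ok()?;
        line.split(',')
            .map(|field| field.trim().parse::<f32>().ok())
            .collect::<Option<Vec<f32>>>()
    }))
}
```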
Code Pattern: Inference Server
```rust
use std::sync::OnceLock;
use tract_onnx::prelude::*;

// Alias for the optimized, runnable tract plan (same type as the static below).
type Model = SimplePlan<TypedFact, Box<dyn TypedOp>, Graph<TypedFact, Box<dyn TypedOp>>>;

static MODEL: OnceLock<Model> = OnceLock::new();

fn get_model() -> &'static Model {
    MODEL.get_or_init(|| {
        tract_onnx::onnx()
            .model_for_path("model.onnx")
            .unwrap()
            .into_optimized()
            .unwrap()
            .into_runnable()
            .unwrap()
    })
}

async fn predict(input: Vec<f32>) -> anyhow::Result<Vec<f32>> {
    let model = get_model();
    // Reshape the flat input into a (1, n) batch and convert it to a tract tensor.
    let tensor: Tensor = tract_ndarray::arr1(&input)
        .into_shape((1, input.len()))?
        .into();
    let result = model.run(tvec!(tensor.into()))?;
    // Copy the first output back into a plain Vec<f32>.
    Ok(result[0].to_array_view::<f32>()?.iter().copied().collect())
}
```
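The `OnceLock` keeps exactly one optimized plan alive for the life of the process, so the ONNX graph is parsed and optimized once rather than per request. Note that the `unwrap`s inside `get_or_init` will panic if `model.onnx` is missing or invalid, so in a real server it is worth forcing initialization at startup rather than on the first request.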
Code Pattern: Batched Inference
```rust
async fn batch_predict(inputs: Vec<Vec<f32>>, batch_size: usize) -> anyhow::Result<Vec<Vec<f32>>> {
    // Reuses get_model() from the inference-server pattern above.
    let model = get_model();
    let mut results = Vec::with_capacity(inputs.len());

    for batch in inputs.chunks(batch_size) {
        // Stack the inputs into one (batch, features) tensor.
        let features = batch[0].len();
        let flat: Vec<f32> = batch.iter().flatten().copied().collect();
        let tensor: Tensor =
            tract_ndarray::Array2::from_shape_vec((batch.len(), features), flat)?.into();

        // Run inference once per batch instead of once per input.
        let output = model.run(tvec!(tensor.into()))?;

        // Unstack: one output row per input.
        let view = output[0].to_array_view::<f32>()?;
        results.extend(view.outer_iter().map(|row| row.iter().copied().collect::<Vec<f32>>()));
    }

    Ok(results)
}
```
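`batch_size` is the throughput/latency knob: larger chunks amortize per-call overhead (and, on GPU backends, per-kernel-launch cost), while smaller chunks return first results sooner. The sketch assumes every input has the same feature length; ragged inputs would need padding before stacking.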
Common Mistakes
| Mistake | Domain Violation | Fix |
|---|---|---|
| Clone tensors | Memory waste | Use views |
| Single inference | GPU underutilized | Batch processing |
| Load model per request | Slow | Singleton pattern |
| Sync data loading | GPU idle | Async pipeline |
Trace to Layer 1
| Constraint | Layer 2 Pattern | Layer 1 Implementation |
|---|---|---|
| Memory efficiency | Zero-copy | ndarray views |
| Model singleton | Lazy init | `OnceLock` |
| Batch processing | Chunked iteration | `chunks()` + parallel |
| GPU async | Concurrent loading | `tokio::spawn` + GPU (sketch below) |
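A minimal sketch of the `tokio::spawn` + GPU row: overlap data loading with compute so the accelerator never waits on I/O. `load_batch` and `run_inference` are hypothetical stand-ins for real disk reads and model calls; both are blocking work, so `spawn_blocking` is used rather than plain async tasks.

```rust
use tokio::task;

// Hypothetical stand-ins: load_batch reads one batch from disk,
// run_inference pushes it through the model.
fn load_batch(path: &str) -> anyhow::Result<Vec<f32>> {
    Ok(std::fs::read(path)?.iter().map(|&b| b as f32).collect())
}

fn run_inference(_batch: Vec<f32>) -> anyhow::Result<()> {
    // ... call into tract / candle here ...
    Ok(())
}

// Overlap loading with compute: while batch N runs, batch N+1 is loading.
async fn pipeline(paths: Vec<String>) -> anyhow::Result<()> {
    let mut prefetch = paths
        .first()
        .cloned()
        .map(|p| task::spawn_blocking(move || load_batch(&p)));

    for i in 0..paths.len() {
        // Kick off the next load before blocking on the current batch.
        let next = paths
            .get(i + 1)
            .cloned()
            .map(|p| task::spawn_blocking(move || load_batch(&p)));

        let batch = prefetch.take().expect("prefetch was scheduled").await??;
        task::spawn_blocking(move || run_inference(batch)).await??;

        prefetch = next;
    }
    Ok(())
}
```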
Related Skills
| When | See |
|---|---|
| Performance | m10-performance |
| Lazy initialization | m12-lifecycle |
| Async patterns | m07-concurrency |
| Memory efficiency | m01-ownership |