rust-ml

Machine Learning Development

Domain Constraints

| Domain Rule | Design Constraint | Rust Implication |
|---|---|---|
| Large data | Efficient memory | Zero-copy, streaming |
| GPU acceleration | CUDA/Metal support | candle, tch-rs |
| Model portability | Standard formats | ONNX |
| Batch processing | Throughput over latency | Batched inference |
| Numerical precision | Float handling | ndarray, careful f32/f64 |
| Reproducibility | Deterministic | Seeded random, versioning |
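The "careful f32/f64" constraint is easiest to see with a long reduction. The sketch below uses only the standard library; the function names `sum_f32` and `sum_f64_acc` are illustrative and not taken from any crate. It sums the same `f32` values once with an `f32` accumulator and once with an `f64` accumulator and prints how far each result drifts from the reference value.

```rust
/// Sum f32 values with an f32 accumulator.
fn sum_f32(values: &[f32]) -> f32 {
    values.iter().sum()
}

/// Sum the same f32 values, but accumulate in f64.
fn sum_f64_acc(values: &[f32]) -> f64 {
    values.iter().map(|&v| f64::from(v)).sum()
}

fn main() {
    let values = vec![0.1_f32; 1_000_000];
    // Reference: the stored f32 value, widened to f64, times the element count.
    let reference = f64::from(0.1_f32) * 1_000_000.0;

    let narrow_err = (f64::from(sum_f32(&values)) - reference).abs();
    let wide_err = (sum_f64_acc(&values) - reference).abs();

    println!("f32 accumulator error: {narrow_err:e}");
    println!("f64 accumulator error: {wide_err:e}");
}
```

Storing data as `f32` while accumulating in `f64` is one common compromise between memory use and accuracy; whether it is needed depends on the length of the reduction and the tolerance of the model.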

Critical Rules

Avoid copying large tensors: memory bandwidth is usually the bottleneck. Pass references, use views, and prefer in-place operations.
Batch operations for GPU efficiency: every kernel launch carries fixed overhead, so batch inputs to amortize it.
Use standard model formats: train in Python, export to ONNX, and deploy in Rust.
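The first rule can be illustrated without any tensor crate. The following is a minimal sketch, assuming a hypothetical row-major `Matrix` type defined only for this example: rows are handed out as borrowed slices (views), and scaling mutates the existing buffer rather than allocating a new one. Crates such as ndarray offer the same idea through their own view types.

```rust
/// Hypothetical row-major matrix stored in one contiguous buffer.
struct Matrix {
    data: Vec<f32>,
    cols: usize,
}

impl Matrix {
    /// Borrow one row as a slice: no allocation, no copy.
    fn row(&self, i: usize) -> &[f32] {
        &self.data[i * self.cols..(i + 1) * self.cols]
    }

    /// Scale every element in place instead of building a new buffer.
    fn scale_in_place(&mut self, factor: f32) {
        for v in &mut self.data {
            *v *= factor;
        }
    }
}

fn main() {
    let mut m = Matrix {
        data: vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        cols: 3,
    };
    m.scale_in_place(2.0);
    // `second` points into `m.data`; nothing was cloned.
    let second = m.row(1);
    println!("{second:?}");
}
```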

Use Case to Framework

| Use Case | Recommended | Why |
|---|---|---|
| Inference only | tract (ONNX) | Lightweight, portable |
| Training + inference | candle, burn | Pure Rust, GPU |
| PyTorch models | tch-rs | Direct bindings |
| Data pipelines | polars | Fast, lazy eval |

Key Crates

| Purpose | Crate |
|---|---|
| Tensors | ndarray |
| ONNX inference | tract |
| ML framework | candle, burn |
| PyTorch bindings | tch-rs |
| Data processing | polars |
| Embeddings | fastembed |

Inference Server Pattern

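use std::sync::OnceLock;

use tract_onnx::prelude::*;

// Runnable plan type; the exact type may differ between tract versions.
type Model = SimplePlan<TypedFact, Box<dyn TypedOp>, Graph<TypedFact, Box<dyn TypedOp>>>;

// Loaded once on first use, then shared by every request.
static MODEL: OnceLock<Model> = OnceLock::new();

fn get_model() -> &'static Model {
    MODEL.get_or_init(|| {
        tract_onnx::onnx()
            .model_for_path("model.onnx")
            .expect("failed to read model.onnx")
            .into_optimized()
            .expect("failed to optimize model")
            .into_runnable()
            .expect("failed to build runnable plan")
    })
}

async fn predict(input: Vec<f32>) -> anyhow::Result<Vec<f32>> {
    let model = get_model();
    // Move the Vec into a (1, n) array instead of copying it.
    let len = input.len();
    let input: Tensor = tract_ndarray::Array2::from_shape_vec((1, len), input)?.into();
    let result = model.run(tvec!(input.into()))?;
    Ok(result[0].to_array_view::<f32>()?.iter().copied().collect())
}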
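The load-once behaviour this pattern relies on can be checked in isolation. The sketch below is standard library only and assumes a hypothetical `DummyModel` in place of a real ONNX plan; the `LOADS` counter exists purely to show that the initializer runs a single time, even when several threads ask for the model at once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

/// Hypothetical stand-in for a loaded model.
struct DummyModel {
    weights: Vec<f32>,
}

/// Counts how often the initializer actually runs.
static LOADS: AtomicUsize = AtomicUsize::new(0);
static MODEL: OnceLock<DummyModel> = OnceLock::new();

fn get_model() -> &'static DummyModel {
    MODEL.get_or_init(|| {
        LOADS.fetch_add(1, Ordering::SeqCst);
        DummyModel {
            weights: vec![0.5, 0.25],
        }
    })
}

fn main() {
    // Several "requests" race to obtain the model.
    let handles: Vec<_> = (0..4)
        .map(|_| thread::spawn(|| get_model().weights.len()))
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    println!("initializer ran {} time(s)", LOADS.load(Ordering::SeqCst));
}
```

`OnceLock::get_or_init` blocks concurrent callers until the first initializer finishes, so a slow model load delays the first requests rather than being repeated by each of them.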

Batched Inference Pattern

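// Assumes get_model() from the inference server pattern, a model exported with a
// batch dimension, equally sized input rows, and batch_size > 0.
async fn batch_predict(inputs: Vec<Vec<f32>>, batch_size: usize) -> anyhow::Result<Vec<Vec<f32>>> {
    let model = get_model();
    let mut results = Vec::with_capacity(inputs.len());

    for batch in inputs.chunks(batch_size) {
        // Stack the batch into one (batch, features) tensor.
        let width = batch[0].len();
        let flat: Vec<f32> = batch.iter().flatten().copied().collect();
        let tensor: Tensor = tract_ndarray::Array2::from_shape_vec((batch.len(), width), flat)?.into();
        // One run per batch amortizes the per-call overhead.
        let output = model.run(tvec!(tensor.into()))?;
        // Unstack: one output row per input.
        let view = output[0].to_array_view::<f32>()?;
        results.extend(view.outer_iter().map(|row| row.iter().copied().collect::<Vec<f32>>()));
    }

    Ok(results)
}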
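The stacking and unstacking steps of the loop above can also be written as two small helpers. This is a sketch under stated assumptions: it works on flat `Vec<f32>` buffers with the standard library only, the names `stack_inputs` and `unstack_outputs` are illustrative, and the model call is replaced by a stand-in that doubles every value.

```rust
/// Flatten equally sized rows into one contiguous row-major buffer.
/// Returns None if the batch is empty, has zero-width rows, or is ragged.
fn stack_inputs(batch: &[Vec<f32>]) -> Option<(Vec<f32>, usize)> {
    let width = batch.first()?.len();
    if width == 0 || batch.iter().any(|row| row.len() != width) {
        return None;
    }
    let flat = batch.iter().flatten().copied().collect();
    Some((flat, width))
}

/// Split a flat row-major buffer back into one Vec per row.
/// `width` must be non-zero.
fn unstack_outputs(flat: &[f32], width: usize) -> Vec<Vec<f32>> {
    flat.chunks(width).map(|row| row.to_vec()).collect()
}

fn main() {
    let inputs: Vec<Vec<f32>> = vec![vec![1.0, 2.0], vec![3.0, 4.0], vec![5.0, 6.0]];
    let mut results = Vec::with_capacity(inputs.len());

    for batch in inputs.chunks(2) {
        let (flat, width) = stack_inputs(batch).expect("ragged or empty batch");
        // Stand-in for a model call: double every value in the batch.
        let output: Vec<f32> = flat.iter().map(|v| v * 2.0).collect();
        results.extend(unstack_outputs(&output, width));
    }

    println!("{results:?}");
}
```

Rejecting ragged batches up front keeps shape errors close to the caller instead of surfacing inside the model run.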

Common Mistakes

| Mistake | Domain Violation | Fix |
|---|---|---|
| Clone tensors | Memory waste | Use views |
| Single inference | GPU underutilized | Batch processing |
| Load model per request | Slow | Singleton pattern |
| Sync data loading | GPU idle | Async pipeline |
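The last row, overlapping data loading with compute, can be sketched with the standard library alone. Note that this example uses a background thread and a bounded channel rather than an async runtime; `load_batch` and `run_model` are hypothetical stand-ins for real I/O and a real model call.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical loader: stands in for disk or network reads.
fn load_batch(index: usize) -> Vec<f32> {
    vec![index as f32; 4]
}

/// Hypothetical compute step: stands in for a model call.
fn run_model(batch: &[f32]) -> f32 {
    batch.iter().sum()
}

/// Load batches on a background thread while the caller consumes them.
fn pipeline(num_batches: usize) -> Vec<f32> {
    // Bounded channel: the loader can queue at most 2 batches ahead.
    let (tx, rx) = sync_channel::<Vec<f32>>(2);

    let loader = thread::spawn(move || {
        for i in 0..num_batches {
            if tx.send(load_batch(i)).is_err() {
                break; // the consumer hung up
            }
        }
        // `tx` is dropped here, which ends the consumer's iteration.
    });

    // Compute on each batch as it arrives; loading continues in parallel.
    let outputs = rx.iter().map(|batch| run_model(&batch)).collect();
    loader.join().expect("loader thread panicked");
    outputs
}

fn main() {
    println!("{:?}", pipeline(3));
}
```

The channel capacity bounds memory use: a loader that outruns the consumer blocks on `send` instead of queueing batches without limit.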
