Daft UDF Tuning
Optimize User-Defined Functions for performance.
UDF Types
Type Decorator Use Case
Stateless @daft.func
Simple transforms. Use async for I/O-bound tasks.
Stateful @daft.cls
Expensive init (e.g., loading models). Supports gpus=N .
Batch @daft.func.batch
Vectorized CPU/GPU ops (NumPy/PyTorch). Faster.
Quick Recipes
- Async I/O (Web APIs)
@daft.func async def fetch(url: str): async with aiohttp.ClientSession() as s: return await s.get(url).text()
- GPU Batch Inference (PyTorch/Models)
@daft.cls(gpus=1) class Classifier: def init(self): self.model = load_model().cuda() # Run once per worker
@daft.method.batch(batch_size=32)
def predict(self, images):
return self.model(images.to_pylist())
Run with concurrency
df.with_column("preds", Classifier(max_concurrency=4).predict(df["img"]))
Tuning Keys
-
max_concurrency : Total parallel UDF instances.
-
gpus=N : GPU request per instance.
-
batch_size : Rows per call. Too small = overhead; too big = OOM.
-
into_batches(N) : Pre-slice partitions if memory is tight.