ml-engineering

Production-grade ML/AI systems, MLOps, and model deployment.

Safety Notice

This listing is imported from the skills.sh public index metadata. Review the upstream SKILL.md and repository scripts before running anything.


Install skill "ml-engineering" with this command: npx skills add eyadsibai/ltk/eyadsibai-ltk-ml-engineering

ML Engineering Guide

Production-grade ML/AI systems, MLOps, and model deployment.

When to Use

  • Deploying ML models to production

  • Building ML platforms and infrastructure

  • Implementing MLOps pipelines

  • Integrating LLMs into production systems

  • Setting up model monitoring and drift detection

Tech Stack

| Category | Tools |
| --- | --- |
| ML Frameworks | PyTorch, TensorFlow, Scikit-learn, XGBoost |
| LLM Frameworks | LangChain, LlamaIndex, DSPy |
| Data Tools | Spark, Airflow, dbt, Kafka, Databricks |
| Deployment | Docker, Kubernetes, AWS/GCP/Azure |
| Monitoring | MLflow, Weights & Biases, Prometheus |
| Databases | PostgreSQL, BigQuery, Snowflake, Pinecone |

Production Patterns

Model Deployment Pipeline

Model serving with FastAPI

```python
from fastapi import FastAPI
import torch

app = FastAPI()
model = torch.load("model.pth")
model.eval()  # disable dropout/batch-norm updates for inference

@app.post("/predict")
async def predict(data: dict):
    tensor = preprocess(data)  # project-specific preprocessing (not shown)
    with torch.no_grad():
        prediction = model(tensor)
    return {"prediction": prediction.tolist()}
```

Feature Store Integration

Feast feature store

```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")
features = store.get_online_features(
    features=["user_features:age", "user_features:location"],
    entity_rows=[{"user_id": 123}],
).to_dict()
```

Model Monitoring

Drift detection

```python
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# ref_df: training/reference data, curr_df: recent production data (pandas DataFrames)
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=ref_df, current_data=curr_df)
```

MLOps Best Practices

Development

  • Test-driven development for ML pipelines

  • Version control models and data

  • Reproducible experiments with MLflow
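One lightweight way to make experiments reproducible is to derive a deterministic run ID from the experiment config, so identical settings always map to the same ID. This is a minimal sketch (the `run_id` helper is hypothetical, not part of MLflow):

```python
import hashlib
import json

def run_id(config: dict) -> str:
    """Derive a deterministic run ID from an experiment config.

    Sorting keys makes the hash independent of dict ordering, so the
    same hyperparameters always produce the same ID.
    """
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

cfg = {"lr": 3e-4, "epochs": 10, "model": "resnet50"}
# Re-ordering the dict does not change the ID
assert run_id(cfg) == run_id({"model": "resnet50", "epochs": 10, "lr": 3e-4})
```

The same ID can then be attached to MLflow runs and model artifacts to tie results back to exact configurations.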

Production

  • A/B testing infrastructure

  • Canary deployments for models

  • Automated retraining pipelines

  • Model monitoring and drift detection
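Canary deployments can be sketched as a sticky traffic splitter; in practice the split usually happens at the ingress or service-mesh layer, but the core idea fits in a few lines (a toy illustration, names are assumptions):

```python
import hashlib

def route(user_id: str, canary_fraction: float = 0.05) -> str:
    """Sticky canary routing: hash the user ID so the same user always
    hits the same model version for the duration of a rollout."""
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_fraction * 100 else "stable"
```

Hashing the ID (instead of sampling randomly per request) keeps each user's experience consistent, which makes canary metrics easier to interpret.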

Performance Targets

| Metric | Target |
| --- | --- |
| P50 Latency | < 50 ms |
| P95 Latency | < 100 ms |
| P99 Latency | < 200 ms |
| Throughput | 1000 RPS |
| Availability | 99.9% |
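The latency targets above can be checked against recorded samples with a nearest-rank percentile helper (a sketch for load-test output, not a full SLO monitor; the sample data is made up):

```python
import math

def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least
    p% of the data at or below it."""
    ordered = sorted(samples)
    k = max(math.ceil(p / 100 * len(ordered)), 1) - 1
    return ordered[k]

# Hypothetical latency samples (ms) from a load test
latencies = [10] * 10 + [20, 30, 40, 50, 60, 70, 80, 90, 95, 180]
assert percentile(latencies, 50) < 50    # P50 target
assert percentile(latencies, 95) < 100   # P95 target
assert percentile(latencies, 99) < 200   # P99 target
```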

LLM Integration Patterns

RAG System

Basic RAG with LangChain

```python
from langchain.vectorstores import Pinecone
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import RetrievalQA

vectorstore = Pinecone.from_existing_index(
    index_name="docs",
    embedding=OpenAIEmbeddings(),
)
qa = RetrievalQA.from_chain_type(
    llm=llm,  # any LangChain-compatible LLM, e.g. ChatOpenAI()
    retriever=vectorstore.as_retriever(),
)
```

Prompt Management

Structured prompts with DSPy

```python
import dspy

class QA(dspy.Signature):
    """Answer questions based on context."""

    context = dspy.InputField()
    question = dspy.InputField()
    answer = dspy.OutputField()

qa = dspy.Predict(QA)
```

Common Commands

Development

```bash
python -m pytest tests/ -v --cov
python -m black src/
python -m pylint src/
```

Training

```bash
python scripts/train.py --config prod.yaml
mlflow run . -P epochs=10
```

Deployment

```bash
docker build -t model:v1 .
kubectl apply -f k8s/model-serving.yaml
```

Monitoring

```bash
mlflow ui --port 5000
```

Security & Compliance

  • Authentication for model endpoints

  • Data encryption (at rest & in transit)

  • PII handling and anonymization

  • GDPR/CCPA compliance

  • Model access audit logging
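PII handling before logging can be illustrated with simple pattern masking. This is a sketch only (the regexes are assumptions and far from exhaustive; real compliance work needs a dedicated PII-detection tool):

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def mask_pii(text: str) -> str:
    """Replace email addresses and SSN-like patterns before text
    reaches logs or monitoring pipelines."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)

mask_pii("contact alice@example.com, SSN 123-45-6789")
# → 'contact [EMAIL], SSN [SSN]'
```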

