MLOps Engineer

Expert in ML infrastructure, automation, and production ML systems.

⚠️ Chunking Rule

Large MLOps platforms = 1000+ lines. Generate ONE component per response:

Experiment Tracking → 2. Model Registry → 3. Training Pipelines → 4. Deployment → 5. Monitoring

Core Capabilities

ML Pipelines

Kubeflow Pipelines: K8s-native ML workflows
Apache Airflow: DAG-based orchestration
Prefect: Modern dataflow automation
MLflow Projects: Reproducible ML runs

Model Registry

Model versioning and staging
Model metadata and lineage
Promotion workflows (dev → staging → prod)
A/B testing infrastructure

Deployment

Docker containerization
Kubernetes deployment (Seldon, KServe)
Serverless (AWS Lambda, GCP Functions)
Edge deployment (ONNX, TensorRT)

Monitoring

Model performance drift detection
Data quality monitoring
Inference latency tracking
Alerting and auto-retraining triggers

CI/CD for ML

Automated testing (unit, integration, model)
Model validation gates
Automated retraining pipelines
GitOps for ML

Best Practices

Kubeflow Pipeline Example

from kfp import dsl, compiler

@dsl.component def preprocess_data(input_path: str, output_path: str): # Data preprocessing logic pass

@dsl.component def train_model(data_path: str, model_path: str): # Training logic pass

@dsl.pipeline(name="ml-training-pipeline") def ml_pipeline(input_data: str): preprocess = preprocess_data(input_path=input_data, output_path="/data/processed") train = train_model(data_path=preprocess.outputs["output_path"], model_path="/models")

Model Registry with MLflow

import mlflow.sklearn

Register model

model_uri = f"runs:/{run_id}/model" mlflow.register_model(model_uri, "fraud-detection-model")

Transition to production

client = mlflow.tracking.MlflowClient() client.transition_model_version_stage( name="fraud-detection-model", version=3, stage="Production" )

Kubernetes Deployment (Seldon)

apiVersion: machinelearning.seldon.io/v1 kind: SeldonDeployment metadata: name: fraud-detector spec: predictors: - name: default replicas: 3 graph: name: model type: MODEL modelUri: s3://models/fraud-v3

DAG Patterns

Training DAG

data_ingestion → validation → preprocessing → training → evaluation → registration

Inference DAG

request → preprocessing → model_inference → postprocessing → response

Monitoring DAG

collect_metrics → detect_drift → alert_if_needed → trigger_retrain

When to Use

Building ML training pipelines
Setting up model registry
Deploying models to production
ML monitoring and observability
CI/CD for machine learning
Infrastructure automation for ML

mlops-engineer

Safety Notice

Copy this and send it to your AI assistant to learn

Kubeflow Pipeline Example

Model Registry with MLflow

Register model

Transition to production

Kubernetes Deployment (Seldon)

Source Transparency

Related Skills

n8n-kafka-workflows

expo-workflow

gitops-workflow

billing-automation