# AI/ML Engineering
Build production AI systems with modern patterns and tools.
## Quick Reference

### The 2026 AI Stack

| Layer | Tool | Purpose |
| --- | --- | --- |
| Prompting | DSPy | Programmatic prompt optimization |
| Orchestration | LangGraph | Stateful multi-agent workflows |
| RAG | LlamaIndex | Document ingestion and retrieval |
| Vectors | Qdrant / Pinecone | Embedding storage and search |
| Evaluation | RAGAS | RAG quality metrics |
| Experiment Tracking | MLflow / W&B | Logging, versioning, comparison |
| Serving | BentoML / vLLM | Model deployment |
| Protocol | MCP | Tool and context integration |
## DSPy: Programmatic Prompting
Hand-tuned prompt strings are brittle and hard to maintain. DSPy treats prompts as optimizable code:
```python
import dspy

# Assumes an LM is configured first, e.g.:
# dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))


class QA(dspy.Signature):
    """Answer questions with short factoid answers."""

    question = dspy.InputField()
    answer = dspy.OutputField(desc="1-5 words")


# Create module
qa = dspy.Predict(QA)

# Use it
result = qa(question="What is the capital of France?")
print(result.answer)  # "Paris"
```
Optimize with real data:
```python
from dspy.teleprompt import BootstrapFewShot

# exact_match is a metric function; train_data is a list of dspy.Example
optimizer = BootstrapFewShot(metric=exact_match)
optimized_qa = optimizer.compile(qa, trainset=train_data)
```
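For reference, a minimal sketch of what `exact_match` and `train_data` could look like; the metric follows DSPy's `(example, pred, trace)` convention, and the rows are made-up examples:

```python
import dspy

def exact_match(example, pred, trace=None):
    # True when the predicted answer matches the gold answer (case-insensitive)
    return example.answer.strip().lower() == pred.answer.strip().lower()

train_data = [
    dspy.Example(question="What is the capital of France?", answer="Paris").with_inputs("question"),
    dspy.Example(question="Who wrote Hamlet?", answer="William Shakespeare").with_inputs("question"),
]
```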
## RAG Architecture (Production)
```
Query → Rewrite → Hybrid Retrieval → Rerank → Generate → Cite
           │             │              │
           v             v              v
    Query expansion  Dense + BM25  Cross-encoder
```
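The rerank stage is usually the cheapest quality win. A minimal sketch using a sentence-transformers cross-encoder; the model name is illustrative, not prescribed by this stack:

```python
from sentence_transformers import CrossEncoder

# Illustrative model choice; any cross-encoder reranker works here
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, passages: list[str], top_k: int = 5) -> list[str]:
    # Score each (query, passage) pair jointly, then keep the best top_k
    scores = reranker.predict([(query, p) for p in passages])
    ranked = sorted(zip(passages, scores), key=lambda x: x[1], reverse=True)
    return [p for p, _ in ranked[:top_k]]
```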
LlamaIndex + LangGraph Pattern:
```python
from llama_index.core import VectorStoreIndex
from langgraph.graph import StateGraph

# Data layer (LlamaIndex); `docs` is your loaded document list
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()

# Control layer (LangGraph)
def retrieve(state):
    response = query_engine.query(state["question"])
    return {"context": response.response, "sources": response.source_nodes}

# `State` and `generate_answer` are defined by your application
graph = StateGraph(State)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate_answer)
graph.add_edge("retrieve", "generate")
```
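A sketch of the pieces the snippet leaves open: the `State` schema, a stub `generate_answer` node, and compilation. These shapes are assumptions, and in a real module the definitions would precede the graph construction:

```python
from typing import TypedDict

class State(TypedDict):
    question: str
    context: str
    sources: list
    answer: str

def generate_answer(state: State) -> dict:
    # Stub: replace with a real LLM call that conditions on state["context"]
    return {"answer": f"(grounded answer to {state['question']!r})"}

# Wire up and run, continuing the graph built above
graph.set_entry_point("retrieve")
app = graph.compile()
result = app.invoke({"question": "What is our refund policy?"})
```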
## MCP Integration
The Model Context Protocol (MCP) is the open standard for connecting models to tools and context:
```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my-tools")

@mcp.tool()
async def search_docs(query: str) -> str:
    """Search the knowledge base."""
    # vector_store and format_results come from your application
    results = await vector_store.search(query)
    return format_results(results)

if __name__ == "__main__":
    mcp.run()
```
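On the consumer side, a sketch of calling that tool over stdio with the SDK's client primitives; the server path and tool arguments are illustrative, and the client API surface may shift between SDK versions:

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the server as a subprocess and talk to it over stdio
    params = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("search_docs", {"query": "pricing"})
            print(result.content)

asyncio.run(main())
```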
## Embeddings (2026)

| Model | Dimensions | Best For |
| --- | --- | --- |
| text-embedding-3-large | 3072 | General purpose |
| BGE-M3 | 1024 | Multilingual RAG |
| Qwen3-Embedding | Flexible | Custom domains |
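As an illustration of the first row, a minimal sketch of embedding and comparing texts with the OpenAI SDK (assumes `OPENAI_API_KEY` is set in the environment):

```python
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> list[float]:
    resp = client.embeddings.create(model="text-embedding-3-large", input=text)
    return resp.data[0].embedding

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product over the product of the norms
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

print(cosine(embed("vector database"), embed("embedding store")))
```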
## Fine-Tuning with LoRA/QLoRA
```python
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
)

# base_model is a pretrained transformers model loaded elsewhere
model = get_peft_model(base_model, config)
```
With QLoRA (4-bit quantized base weights plus LoRA adapters), training fits in roughly 24 GB of VRAM, i.e., a single RTX 4090.
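A minimal sketch of that QLoRA setup: quantize the base model to 4-bit with bitsandbytes, prepare it for k-bit training, then attach the adapters. The checkpoint name is an assumption; swap in your own:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",  # assumed checkpoint; use your own
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
)
model = get_peft_model(base_model, config)
model.print_trainable_parameters()
```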
## MLOps Pipeline
```python
import mlflow

# MLflow tracking
mlflow.set_experiment("rag-v2")

with mlflow.start_run():
    mlflow.log_params({"chunk_size": 512, "model": "gpt-4"})
    mlflow.log_metrics({"faithfulness": 0.92, "relevance": 0.88})
    mlflow.log_artifact("prompts/qa.txt")
```
## Evaluation with RAGAS
```python
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

# dataset: a datasets.Dataset with question/answer/contexts/ground_truth columns
results = evaluate(
    dataset,
    metrics=[faithfulness, answer_relevancy, context_precision],
)
print(results)  # {'faithfulness': 0.92, 'answer_relevancy': 0.88, ...}
```
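A sketch of the `dataset` shape RAGAS expects; the column names follow the RAGAS docs (older versions use `ground_truths`), and the rows are made-up examples:

```python
from datasets import Dataset

dataset = Dataset.from_dict({
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["France's capital and largest city is Paris."]],
    "ground_truth": ["Paris"],
})
```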
## Vector Database Selection

| DB | Best For | Pricing |
| --- | --- | --- |
| Qdrant | Self-hosted, filtering | 1GB free forever |
| Pinecone | Managed, zero-ops | Free tier available |
| Weaviate | Knowledge graphs | 14-day trial |
| Milvus | Billion-scale | Self-hosted |
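For the self-hosted row, a minimal sketch with the official Qdrant client; the collection name and toy vectors are illustrative (1024 dims matches BGE-M3 above):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")  # in-process for demos; use a URL in production

client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)
client.upsert(
    collection_name="docs",
    points=[PointStruct(id=1, vector=[0.1] * 1024, payload={"text": "hello"})],
)
hits = client.search(collection_name="docs", query_vector=[0.1] * 1024, limit=3)
```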
## Agents

- **ai-engineer** - LLM integration, RAG, MCP, production AI
- **mlops-engineer** - Model deployment, monitoring, pipelines
- **data-scientist** - Analysis, modeling, experimentation
- **ml-researcher** - Cutting-edge architectures, paper implementation
- **cv-engineer** - Computer vision, VLMs, image processing
## Deep Dives

- `references/dspy-guide.md`
- `references/rag-patterns.md`
- `references/mcp-integration.md`
- `references/fine-tuning.md`
- `references/evaluation.md`
## Examples

- `examples/rag-pipeline/`
- `examples/mcp-server/`
- `examples/dspy-optimization/`