# AI Engineering

Build production LLM applications and AI systems.
## When to Use

- Integrating LLM APIs
- Building RAG systems
- Creating AI agents
- Vector database setup
- Token optimization
## LLM Integration

### API Setup

```python
from anthropic import Anthropic

client = Anthropic()

def chat(messages: list[dict], system: str | None = None) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        system=system or "You are a helpful assistant.",
        messages=messages,
    )
    return response.content[0].text
```
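A quick usage example (the prompt is just an illustration):

```python
reply = chat([{"role": "user", "content": "Explain RAG in one sentence."}])
print(reply)
```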
With retry and error handling:

```python
import logging

from tenacity import retry, stop_after_attempt, wait_exponential

logger = logging.getLogger(__name__)

@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def safe_chat(messages: list[dict], system: str | None = None) -> str:
    try:
        return chat(messages, system)
    except Exception as e:
        logger.error(f"LLM call failed: {e}")
        raise
```
### Structured Output

```python
import json

def extract_structured(text: str, schema: dict) -> dict:
    prompt = f"""Extract information from the text according to this schema:
{json.dumps(schema, indent=2)}

Text: {text}

Return valid JSON only."""
    response = chat([{"role": "user", "content": prompt}])
    return json.loads(response)
```
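Usage with a simple schema (the field names here are hypothetical):

```python
schema = {"name": "string", "email": "string", "company": "string"}
data = extract_structured(
    "Jane Doe (jane@acme.com) is an engineer at Acme.", schema
)
# data is a dict with keys matching the schema, e.g. data["name"]
```

Note that `json.loads` raises `json.JSONDecodeError` if the model wraps its answer in markdown fences, so strip fences or retry on failure in production.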
## RAG System

### Document Processing

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

def chunk_documents(docs: list[str], chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=overlap,
        separators=["\n\n", "\n", ". ", " "],
    )
    # create_documents accepts raw strings; return plain-text chunks
    return [doc.page_content for doc in splitter.create_documents(docs)]
```
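Usage, where `docs` is a list of raw document strings (loading them is out of scope here):

```python
docs = ["First document text...", "Second document text..."]
chunks = chunk_documents(docs)  # plain-text chunks, used by the vector store below
```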
Vector Store
from qdrant_client import QdrantClient from qdrant_client.models import Distance, VectorParams
client = QdrantClient(":memory:") # or url="http://localhost:6333"
Create collection
client.create_collection( collection_name="docs", vectors_config=VectorParams(size=1536, distance=Distance.COSINE) )
Upsert vectors
client.upsert( collection_name="docs", points=[ {"id": i, "vector": embed(chunk), "payload": {"text": chunk}} for i, chunk in enumerate(chunks) ] )
Search
results = client.search( collection_name="docs", query_vector=embed(query), limit=5 )
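The snippets above assume an `embed()` helper that is not defined in this document. A minimal sketch, assuming OpenAI's `text-embedding-3-small` (which matches the 1536-dimension collection config; any embedding model of the same dimension works):

```python
from openai import OpenAI

openai_client = OpenAI()

def embed(text: str) -> list[float]:
    # Returns a 1536-dimension vector for the given text
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding
```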
### RAG Query

```python
def rag_query(question: str, top_k: int = 5) -> str:
    # Retrieve relevant chunks
    results = qdrant.search(
        collection_name="docs",
        query_vector=embed(question),
        limit=top_k,
    )
    context = "\n\n".join(r.payload["text"] for r in results)

    prompt = f"""Answer based on the context below.

Context:
{context}

Question: {question}

Answer:"""
    return chat([{"role": "user", "content": prompt}])
```
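Example call (the question is illustrative):

```python
answer = rag_query("How do I configure authentication?")
```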
## Cost Optimization

- Cache frequent queries (see the sketch after this list)
- Use smaller models for simple tasks
- Batch requests when possible
- Track token usage per feature (also sketched below)
- Set max_tokens appropriately
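A minimal sketch of query caching and per-feature token tracking, assuming the Anthropic `client` from API Setup; the in-memory dicts are placeholders for Redis or a metrics backend:

```python
import hashlib
import json

_cache: dict[str, str] = {}
usage_by_feature: dict[str, int] = {}

def cached_chat(messages: list[dict], feature: str = "default") -> str:
    # Key the cache on the exact message history
    key = hashlib.sha256(
        json.dumps(messages, sort_keys=True).encode()
    ).hexdigest()
    if key in _cache:
        return _cache[key]

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=messages,
    )
    # Anthropic responses include token counts; aggregate them per feature
    usage_by_feature[feature] = (
        usage_by_feature.get(feature, 0)
        + response.usage.input_tokens
        + response.usage.output_tokens
    )
    _cache[key] = response.content[0].text
    return _cache[key]
```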
## Examples

Input: "Add AI chat to this app"
Action: Set up LLM client, create chat endpoint, add error handling

Input: "Build RAG for documentation"
Action: Chunk docs, create embeddings, set up vector store, implement search