Prompt Engineering Suite
Design, version, and optimize prompts for production LLM applications.
Overview
-
Designing prompts for new LLM features
-
Improving accuracy with Chain-of-Thought reasoning
-
Few-shot learning with example selection
-
Managing prompts in production (versioning, A/B testing)
-
Automatic prompt optimization with DSPy
Quick Reference
Chain-of-Thought Pattern
from langchain_core.prompts import ChatPromptTemplate
COT_SYSTEM = """You are a helpful assistant that solves problems step-by-step.
When solving problems:
- Break down the problem into clear steps
- Show your reasoning for each step
- Verify your answer before responding
- If uncertain, acknowledge limitations
Format your response as: STEP 1: [description] Reasoning: [your thought process]
STEP 2: [description] Reasoning: [your thought process]
...
FINAL ANSWER: [your conclusion]"""
cot_prompt = ChatPromptTemplate.from_messages([ ("system", COT_SYSTEM), ("human", "Problem: {problem}\n\nThink through this step-by-step."), ])
Few-Shot with Dynamic Examples
from langchain_core.prompts import FewShotChatMessagePromptTemplate
examples = [ {"input": "What is 2+2?", "output": "4"}, {"input": "What is the capital of France?", "output": "Paris"}, ]
few_shot = FewShotChatMessagePromptTemplate( examples=examples, example_prompt=ChatPromptTemplate.from_messages([ ("human", "{input}"), ("ai", "{output}"), ]), )
final_prompt = ChatPromptTemplate.from_messages([ ("system", "You are a helpful assistant. Answer concisely."), few_shot, ("human", "{input}"), ])
Prompt Versioning with Langfuse SDK v3
from langfuse import Langfuse
Note: Langfuse SDK v3 is OTEL-native (acquired by ClickHouse Jan )
langfuse = Langfuse()
Get versioned prompt with label
prompt = langfuse.get_prompt( name="customer-support-v2", label="production", # production, staging, canary cache_ttl_seconds=300, )
Compile with variables
compiled = prompt.compile( customer_name="John", issue="billing question" )
DSPy 3.1.0 Automatic Optimization
import dspy
class OptimizedQA(dspy.Module): def init(self): self.generate = dspy.Predict("question -> answer")
def forward(self, question):
return self.generate(question=question)
Optimize with MIPROv2 (recommended) or BootstrapFewShot
optimizer = dspy.MIPROv2(metric=answer_match) # Data+demo-aware Bayesian optimization optimized = optimizer.compile(OptimizedQA(), trainset=examples)
Alternative: GEPA (July 2025) - Reflective Prompt Evolution
Uses model introspection to analyze failures and propose better prompts
Pattern Selection Guide
Pattern When to Use Example Use Case
Zero-shot Simple, well-defined tasks Classification, extraction
Few-shot Complex tasks needing examples Format conversion, style matching
CoT Reasoning, math, logic Problem solving, analysis
Zero-shot CoT Quick reasoning boost Add "Let's think step by step"
ReAct Tool use, multi-step Agent tasks, API calls
Structured JSON/schema output Data extraction, API responses
Key Decisions
Decision Recommendation
Few-shot examples 3-5 diverse, representative examples
Example ordering Most similar examples last (recency bias)
CoT trigger "Let's think step by step" or explicit format
Prompt versioning Langfuse with labels (production/staging)
A/B testing 50+ samples, track via trace metadata
Auto-optimization DSPy BootstrapFewShot for few-shot tuning
Anti-Patterns (FORBIDDEN)
NEVER hardcode prompts without versioning
PROMPT = "You are a helpful assistant..." # No version control!
NEVER use single example for few-shot
examples = [{"input": "x", "output": "y"}] # Too few!
NEVER skip CoT for complex reasoning
response = llm.complete("Solve: 15% of 240") # No reasoning!
ALWAYS version prompts
prompt = langfuse.get_prompt("assistant", label="production")
ALWAYS use 3-5 diverse examples
examples = [ex1, ex2, ex3, ex4, ex5]
ALWAYS use CoT for math/logic
response = llm.complete("Solve: 15% of 240. Think step by step.")
Detailed Documentation
Resource Description
references/chain-of-thought.md CoT patterns, zero-shot CoT, self-consistency
references/few-shot-patterns.md Example selection, ordering, formatting
references/prompt-versioning.md Langfuse integration, A/B testing
references/prompt-optimization.md DSPy, automatic tuning, evaluation
scripts/cot-template.py Full Chain-of-Thought implementation
scripts/few-shot-template.py Few-shot with dynamic example selection
scripts/jinja2-prompts.py Jinja2 templates (): async, caching, LLM filters, Anthropic format
Related Skills
-
langfuse-observability
-
Prompt management and A/B testing tracking
-
llm-evaluation
-
Evaluating prompt effectiveness
-
function-calling
-
Structured output patterns
-
llm-testing
-
Testing prompt variations
Capability Details
chain-of-thought
Keywords: CoT, step by step, reasoning, think, chain of thought Solves:
-
Improve accuracy on complex reasoning tasks
-
Debug LLM reasoning process
-
Implement self-consistency with multiple CoT paths
few-shot-learning
Keywords: few-shot, examples, in-context learning, demonstrations Solves:
-
Format LLM output with examples
-
Handle complex tasks without fine-tuning
-
Select optimal examples for task
prompt-versioning
Keywords: version, prompt management, A/B test, production prompt Solves:
-
Manage prompts in production
-
A/B test prompt variations
-
Roll back to previous versions
prompt-optimization
Keywords: DSPy, optimize, tune, automatic prompt, OPRO Solves:
-
Automatically optimize prompts
-
Find best few-shot examples
-
Improve accuracy without manual tuning
zero-shot-cot
Keywords: zero-shot CoT, think step by step, reasoning trigger Solves:
-
Quick reasoning boost without examples
-
Add "Let's think step by step" trigger
-
Improve accuracy on math/logic
self-consistency
Keywords: self-consistency, multiple paths, voting, ensemble Solves:
-
Generate multiple reasoning paths
-
Vote on most common answer
-
Improve reliability on hard problems