Prompt Engineering Suite

Design, version, and optimize prompts for production LLM applications.

Overview

Designing prompts for new LLM features
Improving accuracy with Chain-of-Thought reasoning
Few-shot learning with example selection
Managing prompts in production (versioning, A/B testing)
Automatic prompt optimization with DSPy

Quick Reference

Chain-of-Thought Pattern

from langchain_core.prompts import ChatPromptTemplate

COT_SYSTEM = """You are a helpful assistant that solves problems step-by-step.

When solving problems:

Break down the problem into clear steps
Show your reasoning for each step
Verify your answer before responding
If uncertain, acknowledge limitations

Format your response as:
STEP 1: [description]
Reasoning: [your thought process]

STEP 2: [description]
Reasoning: [your thought process]

...

FINAL ANSWER: [your conclusion]"""

cot_prompt = ChatPromptTemplate.from_messages([
    ("system", COT_SYSTEM),
    ("human", "Problem: {problem}\n\nThink through this step-by-step."),
])
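Because the system prompt fixes the response layout, the final answer can be separated from the reasoning with a small parser. The following is a minimal sketch, assuming the model actually follows the STEP n / FINAL ANSWER layout requested above; `parse_cot_response` is a hypothetical helper and not part of LangChain.

```python
import re


def parse_cot_response(text: str) -> dict:
    """Split a CoT response into its steps and its final answer.

    Hypothetical helper: assumes the "STEP n: ..." / "FINAL ANSWER: ..."
    layout requested by COT_SYSTEM. final_answer is None when the
    marker is missing from the response.
    """
    # Everything after the last "FINAL ANSWER:" marker is treated as the answer.
    head, marker, tail = text.rpartition("FINAL ANSWER:")
    body = head if marker else text
    final_answer = tail.strip() if marker else None

    # Each step runs from "STEP n:" up to the next step or the end of the body.
    steps = [
        match.group(1).strip()
        for match in re.finditer(
            r"STEP \d+:(.*?)(?=STEP \d+:|\Z)", body, flags=re.DOTALL
        )
    ]
    return {"steps": steps, "final_answer": final_answer}
```

Returning `None` for a missing marker lets the caller decide whether to retry or to fall back to the raw text.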

Few-Shot with Dynamic Examples

from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

examples = [
    {"input": "What is 2+2?", "output": "4"},
    {"input": "What is the capital of France?", "output": "Paris"},
]

few_shot = FewShotChatMessagePromptTemplate(
    examples=examples,
    example_prompt=ChatPromptTemplate.from_messages([
        ("human", "{input}"),
        ("ai", "{output}"),
    ]),
)

final_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer concisely."),
    few_shot,
    ("human", "{input}"),
])
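The snippet above passes a fixed example list. To choose examples per query, one option is to rank a larger pool by similarity to the incoming input. The sketch below uses plain word overlap (Jaccard similarity) as a stand-in for embedding-based selection; `select_examples` is a hypothetical helper, and it returns the most similar example last, in line with the ordering recommendation under Key Decisions.

```python
def _tokens(text: str) -> set[str]:
    """Lowercased whitespace tokens; crude, but enough for a sketch."""
    return set(text.lower().split())


def select_examples(query: str, pool: list[dict], k: int = 3) -> list[dict]:
    """Pick the k examples whose "input" overlaps most with the query.

    Hypothetical helper using Jaccard word overlap; a production system
    would more likely rank by embedding similarity.
    """
    if k <= 0:
        return []
    query_tokens = _tokens(query)

    def score(example: dict) -> float:
        example_tokens = _tokens(example["input"])
        union = query_tokens | example_tokens
        return len(query_tokens & example_tokens) / len(union) if union else 0.0

    # Sort ascending so the most similar example ends up last in the prompt.
    ranked = sorted(pool, key=score)
    return ranked[-k:]
```

The returned list has the same shape as `examples` above, so it could be passed as `examples=` when building the few-shot template for each request.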

Prompt Versioning with Langfuse SDK v3

from langfuse import Langfuse

Note: Langfuse SDK v3 is OTEL-native (acquired by ClickHouse Jan )

langfuse = Langfuse()

Get versioned prompt with label

prompt = langfuse.get_prompt(
    name="customer-support-v2",
    label="production",  # production, staging, canary
    cache_ttl_seconds=300,
)

Compile with variables

compiled = prompt.compile(
    customer_name="John",
    issue="billing question",
)
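Langfuse text prompts mark variables with double curly braces, and `compile` substitutes the keyword arguments into them. For local experiments without a Langfuse server, that substitution can be approximated as below. This is a minimal stand-in, not Langfuse's implementation: `compile_template` is a hypothetical name, and leaving unknown placeholders untouched is an assumption of this sketch.

```python
import re


def compile_template(template: str, **variables: object) -> str:
    """Fill {{name}} placeholders from keyword arguments.

    Hypothetical stand-in for a prompt's compile step. Placeholders with
    no matching keyword are left in place (an assumption of this sketch).
    """

    def replace(match: re.Match) -> str:
        key = match.group(1)
        return str(variables[key]) if key in variables else match.group(0)

    # Match {{ name }} with optional whitespace inside the braces.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", replace, template)


template = "Hello {{customer_name}}, I understand you have a {{issue}}."
print(compile_template(template, customer_name="John", issue="billing question"))
```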

DSPy 3.1.0 Automatic Optimization

import dspy

class OptimizedQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.Predict("question -> answer")

    def forward(self, question):
        return self.generate(question=question)

Optimize with MIPROv2 (recommended) or BootstrapFewShot

optimizer = dspy.MIPROv2(metric=answer_match)  # Data+demo-aware Bayesian optimization; answer_match is a user-defined metric
optimized = optimizer.compile(OptimizedQA(), trainset=examples)  # examples: a list of dspy.Example objects

Alternative: GEPA (July 2025) - Reflective Prompt Evolution

Uses model introspection to analyze failures and propose better prompts
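The optimizer call above refers to an `answer_match` metric that is never defined. A DSPy metric is a callable that receives the gold example, the prediction, and an optional trace. The sketch below is one possible definition, assuming both objects expose an `answer` field as in the "question -> answer" signature; the normalization rules are an illustrative choice, and `SimpleNamespace` is used only as a stand-in so the block runs without DSPy installed.

```python
from types import SimpleNamespace


def _normalize(text: str) -> str:
    """Lowercase, trim, drop a trailing period, and collapse whitespace."""
    return " ".join(text.lower().strip().rstrip(".").split())


def answer_match(example, pred, trace=None) -> bool:
    """Exact match after light normalization.

    Follows the (example, pred, trace=None) metric calling convention;
    the normalization itself is an assumption of this sketch.
    """
    return _normalize(example.answer) == _normalize(pred.answer)


# Stand-ins for dspy.Example / dspy.Prediction, which expose fields as attributes.
gold = SimpleNamespace(question="What is the capital of France?", answer="Paris")
prediction = SimpleNamespace(answer=" paris. ")
print(answer_match(gold, prediction))  # True
```

The training set passed to `compile` would then be a list of `dspy.Example` objects, for instance built with `dspy.Example(question=..., answer=...).with_inputs("question")`.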

Pattern Selection Guide

Pattern | When to Use | Example Use Case
Zero-shot | Simple, well-defined tasks | Classification, extraction
Few-shot | Complex tasks needing examples | Format conversion, style matching
CoT | Reasoning, math, logic | Problem solving, analysis
Zero-shot CoT | Quick reasoning boost | Add "Let's think step by step"
ReAct | Tool use, multi-step | Agent tasks, API calls
Structured | JSON/schema output | Data extraction, API responses

Key Decisions

Decision | Recommendation
Few-shot examples | 3-5 diverse, representative examples
Example ordering | Most similar examples last (recency bias)
CoT trigger | "Let's think step by step" or explicit format
Prompt versioning | Langfuse with labels (production/staging)
A/B testing | 50+ samples, track via trace metadata
Auto-optimization | DSPy BootstrapFewShot for few-shot tuning
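For the A/B testing recommendation, each user should see the same prompt variant on every request, otherwise the samples are not comparable. One common way to get that is hash-based bucketing, sketched below. `assign_variant` and the experiment name are hypothetical, and the variant names simply reuse the labels from the versioning example.

```python
import hashlib


def assign_variant(
    user_id: str,
    experiment: str,
    variants: tuple[str, ...] = ("production", "canary"),
) -> str:
    """Deterministically map a user to one prompt label.

    Hypothetical helper: hashing experiment + user id gives a stable,
    roughly uniform split without storing any assignment state.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % len(variants)
    return variants[bucket]


label = assign_variant("user-123", "support-prompt-test")
print(label)
```

The returned label could be passed as `label=` when fetching the prompt and recorded in trace metadata, so that results can be grouped by variant once enough samples have accumulated.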

Anti-Patterns (FORBIDDEN)

NEVER hardcode prompts without versioning

PROMPT = "You are a helpful assistant..." # No version control!

NEVER use a single example for few-shot

examples = [{"input": "x", "output": "y"}] # Too few!

NEVER skip CoT for complex reasoning

response = llm.complete("Solve: 15% of 240") # No reasoning!

ALWAYS version prompts

prompt = langfuse.get_prompt("assistant", label="production")

ALWAYS use 3-5 diverse examples

examples = [ex1, ex2, ex3, ex4, ex5]

ALWAYS use CoT for math/logic

response = llm.complete("Solve: 15% of 240. Think step by step.")

Detailed Documentation

Resource | Description
references/chain-of-thought.md | CoT patterns, zero-shot CoT, self-consistency
references/few-shot-patterns.md | Example selection, ordering, formatting
references/prompt-versioning.md | Langfuse integration, A/B testing
references/prompt-optimization.md | DSPy, automatic tuning, evaluation
scripts/cot-template.py | Full Chain-of-Thought implementation
scripts/few-shot-template.py | Few-shot with dynamic example selection
scripts/jinja2-prompts.py | Jinja2 templates (): async, caching, LLM filters, Anthropic format

Related Skills

langfuse-observability: Prompt management and A/B testing tracking
llm-evaluation: Evaluating prompt effectiveness
function-calling: Structured output patterns
llm-testing: Testing prompt variations

Capability Details

chain-of-thought

Keywords: CoT, step by step, reasoning, think, chain of thought
Solves:

Improve accuracy on complex reasoning tasks
Debug LLM reasoning process
Implement self-consistency with multiple CoT paths

few-shot-learning

Keywords: few-shot, examples, in-context learning, demonstrations
Solves:

Format LLM output with examples
Handle complex tasks without fine-tuning
Select optimal examples for task

prompt-versioning

Keywords: version, prompt management, A/B test, production prompt
Solves:

Manage prompts in production
A/B test prompt variations
Roll back to previous versions
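Rollback is cheap when versions are immutable and labels are pointers to them: reverting production means moving the label back, not editing a prompt. The toy registry below illustrates that idea in memory. `PromptRegistry` is a hypothetical class written for this sketch and is not the Langfuse API.

```python
class PromptRegistry:
    """Toy in-memory registry: versions are immutable, labels point at versions."""

    def __init__(self) -> None:
        self._versions: dict[str, list[str]] = {}
        self._labels: dict[tuple[str, str], int] = {}

    def create(self, name: str, text: str) -> int:
        """Store a new version and return its 1-based version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(text)
        return len(versions)

    def set_label(self, name: str, label: str, version: int) -> None:
        """Point a label at an existing version (used for promote and rollback)."""
        if not 1 <= version <= len(self._versions.get(name, [])):
            raise KeyError(f"{name} has no version {version}")
        self._labels[(name, label)] = version

    def get(self, name: str, label: str = "production") -> str:
        """Return the prompt text the label currently points at."""
        return self._versions[name][self._labels[(name, label)] - 1]


registry = PromptRegistry()
v1 = registry.create("assistant", "You are a helpful assistant.")
v2 = registry.create("assistant", "You are a concise, helpful assistant.")
registry.set_label("assistant", "production", v2)  # promote v2
registry.set_label("assistant", "production", v1)  # roll back to v1
print(registry.get("assistant"))
```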

prompt-optimization

Keywords: DSPy, optimize, tune, automatic prompt, OPRO
Solves:

Automatically optimize prompts
Find best few-shot examples
Improve accuracy without manual tuning

zero-shot-cot

Keywords: zero-shot CoT, think step by step, reasoning trigger
Solves:

Quick reasoning boost without examples
Add "Let's think step by step" trigger
Improve accuracy on math/logic

self-consistency

Keywords: self-consistency, multiple paths, voting, ensemble
Solves:

Generate multiple reasoning paths
Vote on most common answer
Improve reliability on hard problems
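The voting step of self-consistency can be sketched in a few lines, assuming the final answers have already been extracted from several independently sampled CoT responses. `majority_vote` is a hypothetical helper, and the light normalization is an illustrative choice.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> tuple[str, float]:
    """Return the most common answer and its share of the votes.

    Answers are compared after trimming and lowercasing. On a tie, the
    answer that appeared first wins.
    """
    if not answers:
        raise ValueError("need at least one answer")
    normalized = [answer.strip().lower() for answer in answers]
    winner, count = Counter(normalized).most_common(1)[0]
    return winner, count / len(normalized)


# Final answers from five sampled reasoning paths for "15% of 240".
answer, agreement = majority_vote(["36", "36 ", "35", "36", "40"])
print(answer, agreement)  # 36 0.6
```

The agreement share doubles as a rough confidence signal: a low value suggests the problem may need more samples or a human check.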

