Prompt Engineering — Operational Skill
Modern Best Practices (January 2026): versioned prompts, explicit output contracts, regression tests, and safety threat modeling for tool/RAG prompts (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
This skill provides operational guidance for building production-ready prompts across standard tasks, RAG workflows, agent orchestration, structured outputs, hidden reasoning, and multi-step planning.
All content is operational, not theoretical. Focus on patterns, checklists, and copy-paste templates.
Quick Start (60 seconds)
- Pick a pattern from the decision tree (structured output, extractor, RAG, tools/agent, rewrite, classification).
- Start from a template in assets/ and fill in TASK, INPUT, RULES, and OUTPUT FORMAT.
- Add guardrails: instruction/data separation, “no invented details”, missing → null or an explicit “missing” note.
- Add validation: JSON parse check, schema check, citations check, post-tool checks.
- Add evals: 10–20 cases while iterating, 50–200 before release, plus adversarial injection cases.
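The validation step above can be sketched in a few lines. This is a minimal example, assuming the model's reply arrives as a raw string and that the required field names (invented here) come from your output contract:

```python
import json

REQUIRED_KEYS = {"name", "amount", "currency"}  # hypothetical schema fields

def validate_reply(raw: str) -> dict:
    """Parse-and-check gate: JSON parse first, then required-key check."""
    data = json.loads(raw)  # raises JSONDecodeError (a ValueError) on non-JSON
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return data

print(validate_reply('{"name": "ACME", "amount": 12.5, "currency": "EUR"}'))
```

A schema validator (e.g. against a full JSON Schema) can replace the key check once the contract grows beyond flat fields.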
Model Notes (2026)
This skill includes Claude Code + Codex CLI optimizations:
- Action directives: Frame for implementation, not suggestions
- Parallel tool execution: Independent tool calls can run simultaneously
- Long-horizon task management: State tracking, incremental progress, context compaction resilience
- Positive framing: Describe desired behavior rather than prohibitions
- Style matching: Prompt formatting influences output style
- Domain-specific patterns: Specialized guidance for frontend, research, and agentic coding
- Style-adversarial resilience: Stress-test refusals with poetic/role-play rewrites; normalize or decline stylized harmful asks before tool use
Prefer “brief justification” over requesting chain-of-thought. When using private reasoning patterns, instruct: think internally; output only the final answer.
Quick Reference
| Task | Pattern to Use | Key Components | When to Use |
|------|----------------|----------------|-------------|
| Machine-parseable output | Structured Output | JSON schema, "JSON-only" directive, no prose | API integrations, data extraction |
| Field extraction | Deterministic Extractor | Exact schema, missing → null, no transformations | Form data, invoice parsing |
| Use retrieved context | RAG Workflow | Context relevance check, chunk citations, explicit missing info | Knowledge bases, documentation search |
| Internal reasoning | Hidden Chain-of-Thought | Internal reasoning, final answer only | Classification, complex decisions |
| Tool-using agent | Tool/Agent Planner | Plan-then-act, one tool per turn | Multi-step workflows, API calls |
| Text transformation | Rewrite + Constrain | Style rules, meaning preservation, format spec | Content adaptation, summarization |
| Classification | Decision Tree | Ordered branches, mutually exclusive, JSON result | Routing, categorization, triage |
Decision Tree: Choosing the Right Pattern
```text
User needs: [Prompt Type]
|
|-- Output must be machine-readable?
|   |-- Extract specific fields only? -> Deterministic Extractor Pattern
|   `-- Generate structured data?     -> Structured Output Pattern (JSON)
|
|-- Use external knowledge?
|   `-- Retrieved context must be cited? -> RAG Workflow Pattern
|
|-- Requires reasoning but hide process?
|   `-- Classification or decision task? -> Hidden Chain-of-Thought Pattern
|
|-- Needs to call external tools/APIs?
|   `-- Multi-step workflow? -> Tool/Agent Planner Pattern
|
|-- Transform existing text?
|   `-- Style/format constraints? -> Rewrite + Constrain Pattern
|
`-- Classify or route to categories?
    `-- Mutually exclusive rules? -> Decision Tree Pattern
```
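The same top-down routing can be sketched as a plain function, first matching branch wins. The flag names are illustrative, not part of any template:

```python
def choose_pattern(*, machine_readable=False, extract_fields=False,
                   uses_retrieval=False, hidden_reasoning=False,
                   uses_tools=False, transforms_text=False,
                   classifies=False) -> str:
    """Walk the decision tree top-down; the first matching branch wins."""
    if machine_readable:
        return "Deterministic Extractor" if extract_fields else "Structured Output (JSON)"
    if uses_retrieval:
        return "RAG Workflow"
    if hidden_reasoning:
        return "Hidden Chain-of-Thought"
    if uses_tools:
        return "Tool/Agent Planner"
    if transforms_text:
        return "Rewrite + Constrain"
    if classifies:
        return "Decision Tree"
    return "Structured Output (JSON)"  # safe default when nothing matches

print(choose_pattern(machine_readable=True, extract_fields=True))
# -> Deterministic Extractor
```

Ordering matters: machine-readable output wins over retrieval, which wins over reasoning style, mirroring the tree above.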
Copy/Paste: Minimal Prompt Skeletons
- Generic "output contract" skeleton

```text
TASK: {{one_sentence_task}}
INPUT: {{input_data}}
RULES:
- Follow TASK exactly.
- Use only INPUT (and tool outputs if tools are allowed).
- No invented details. Missing required info -> say what is missing.
- Keep reasoning hidden.
- Follow OUTPUT FORMAT exactly.
OUTPUT FORMAT: {{schema_or_format_spec}}
```
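One way to fill the skeleton programmatically; the slot names mirror the `{{...}}` placeholders above, and the sample task/input values are invented:

```python
SKELETON = """TASK: {task}
INPUT: {input_data}
RULES:
- Follow TASK exactly.
- Use only INPUT (and tool outputs if tools are allowed).
- No invented details. Missing required info -> say what is missing.
- Keep reasoning hidden.
- Follow OUTPUT FORMAT exactly.
OUTPUT FORMAT: {output_format}"""

prompt = SKELETON.format(
    task="Extract the invoice total.",
    input_data="Invoice #123 ... total due: $41.00",
    output_format='{"total": <number>}',
)
print(prompt.splitlines()[0])  # -> TASK: Extract the invoice total.
```

Keeping the skeleton as one shared constant (rather than copy-pasting it per call site) is what makes the contract auditable and versionable.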
- Tool/agent skeleton (deterministic)

```text
AVAILABLE TOOLS: {{tool_signatures_or_names}}
WORKFLOW:
- Make a short plan.
- Call tools only when required to complete the task.
- Validate tool outputs before using them.
- If the environment supports parallel tool calls, run independent calls in parallel.
```
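The "parallel independent calls, serialized writes" rule can be sketched with asyncio. The tool functions below are stand-ins, not a real tool API:

```python
import asyncio

async def read_tool(name: str) -> str:    # stand-in for an independent read
    await asyncio.sleep(0.01)
    return f"{name}:ok"

async def write_tool(name: str) -> str:   # stand-in for a mutating call
    await asyncio.sleep(0.01)
    return f"{name}:written"

async def run_plan() -> list[str]:
    # Independent reads run concurrently; gather preserves call order.
    reads = await asyncio.gather(read_tool("search"), read_tool("fetch"))
    results = list(reads)
    # Writes stay serialized, each validated before the next begins.
    for name in ("edit_a", "edit_b"):
        out = await write_tool(name)
        assert out.endswith(":written")   # validate tool output before use
        results.append(out)
    return results

print(asyncio.run(run_plan()))
# -> ['search:ok', 'fetch:ok', 'edit_a:written', 'edit_b:written']
```

Serializing writes keeps the workspace deterministic even when the environment would happily run them concurrently.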
- RAG skeleton (grounded)

```text
RETRIEVED CONTEXT: {{chunks_with_ids}}
RULES:
- Use only retrieved context for factual claims.
- Cite chunk ids for each claim.
- If evidence is missing, say what is missing.
```
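A minimal check for the citation rule above. It assumes chunk ids appear in the answer as `[C1]`-style markers; the id format is an assumption, so adapt the regex to your own chunk naming:

```python
import re

def check_citations(answer: str, known_ids: set[str]) -> list[str]:
    """Return cited ids that were never retrieved (i.e. likely hallucinated)."""
    cited = set(re.findall(r"\[(C\d+)\]", answer))
    if not cited:
        return ["<no citations found>"]
    return sorted(cited - known_ids)

print(check_citations("Latency is 20ms [C1], cost is $5 [C9].", {"C1", "C2"}))
# -> ['C9']  (C9 was cited but never retrieved)
```

An empty list means every citation maps to a real chunk; anything else should fail the response before it reaches the user.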
Operational Checklists
Use these references when validating or debugging prompts:
- frameworks/shared-skills/skills/ai-prompt-engineering/references/quality-checklists.md
- frameworks/shared-skills/skills/ai-prompt-engineering/references/production-guidelines.md
Context Engineering (2026)
True expertise in prompting extends beyond writing instructions to shaping the entire context in which the model operates. Context engineering encompasses:
- Conversation history: What prior turns inform the current response
- Retrieved context (RAG): External knowledge injected into the prompt
- Structured inputs: JSON schemas, system/user message separation
- Tool outputs: Results from previous tool calls that shape next steps
Context Engineering vs Prompt Engineering
| Aspect | Prompt Engineering | Context Engineering |
|--------|--------------------|---------------------|
| Focus | Instruction text | Full input pipeline |
| Scope | Single prompt | RAG + history + tools |
| Optimization | Word choice, structure | Information architecture |
| Goal | Clear instructions | Optimal context window |
Key Context Engineering Patterns
- Context Prioritization: Place the most relevant information first; models attend more strongly to early context.
- Context Compression: Summarize history, truncate tool outputs, select the most relevant RAG chunks.
- Context Separation: Use clear delimiters (`<system>`, `<user>`, `<context>`) to separate instruction types.
- Dynamic Context: Adjust context to task complexity: simple tasks need less context, complex tasks need more.
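Prioritization and compression-by-selection combine naturally into one packing step. A sketch, where the relevance scores are hypothetical values from a retriever and the character budget stands in for a token budget:

```python
def pack_context(chunks: list[tuple[float, str]], budget_chars: int) -> str:
    """Sort by relevance (prioritization), then pack until the budget is
    exhausted (compression by selection). Most relevant text lands first."""
    picked, used = [], 0
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        if used + len(text) > budget_chars:
            continue  # skip chunks that would blow the budget
        picked.append(text)
        used += len(text)
    return "\n---\n".join(picked)  # clear delimiter between chunks

chunks = [(0.2, "low-relevance filler " * 10), (0.9, "key fact A"), (0.7, "key fact B")]
print(pack_context(chunks, budget_chars=30))
```

A production version would count tokens rather than characters and might summarize oversized chunks instead of dropping them, but the ordering-then-budget shape stays the same.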
Core Concepts vs Implementation Practices
Core Concepts (Vendor-Agnostic)
- Prompt contract: inputs, allowed tools, output schema, max tokens, and refusal rules.
- Context engineering: conversation history, RAG context, tool outputs, and structured inputs shape model behavior.
- Determinism controls: temperature/top_p, constrained decoding/structured outputs, and strict formatting.
- Cost & latency budgets: prompt length and max output drive token spend and tail latency; enforce hard limits and measure p95/p99.
- Evaluation: golden sets + regression gates + A/B + post-deploy monitoring.
- Security: prompt injection, data exfiltration, and tool misuse are the primary threats (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
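For the latency-budget point, tail percentiles are easy to compute without extra dependencies. A nearest-rank sketch; the sample latencies are invented:

```python
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: small, dependency-free, good enough for budgets."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

latencies_ms = [120, 135, 140, 150, 180, 210, 450, 130, 125, 900]
print("p95:", percentile(latencies_ms, 95))
print("p99:", percentile(latencies_ms, 99))
```

With only ten samples, p95/p99 both land on the worst observation, which is the practical point: tail metrics need enough traffic before a budget gate on them is meaningful.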
Implementation Practices (Model/Platform-Specific)
- Use model-specific structured output features when available; keep a schema validator as the source of truth.
- Align tracing/metrics with OpenTelemetry GenAI semantic conventions (https://opentelemetry.io/docs/specs/semconv/gen-ai/).
Do / Avoid
Do
- Do keep prompts small and modular; centralize shared fragments (policies, schemas, style).
- Do add a prompt eval harness and block merges on regressions.
- Do prefer "brief justification" over requesting chain-of-thought; treat hidden reasoning as model-internal.
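A toy regression gate for the eval-harness point. `run_model` is a stub standing in for a real model call, and the golden set is deliberately tiny:

```python
def run_model(prompt: str) -> str:
    """Stub: a real harness would call the deployed prompt/model here."""
    return "positive" if "love" in prompt else "negative"

GOLDEN_SET = [  # (input, expected) pairs; grow to 50-200 before release
    ("I love this product", "positive"),
    ("This is terrible", "negative"),
]

def eval_gate(min_pass_rate: float = 1.0) -> bool:
    passed = sum(run_model(x) == want for x, want in GOLDEN_SET)
    rate = passed / len(GOLDEN_SET)
    print(f"pass rate: {rate:.0%}")
    return rate >= min_pass_rate  # block the merge when False

assert eval_gate(), "regression detected: block merge"
```

Wiring this into CI (e.g. via Promptfoo or DeepEval, as covered under Production Guidelines) turns prompt changes into testable diffs rather than silent behavior shifts.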
Avoid
- Avoid prompt sprawl (many near-duplicates with no owner or tests).
- Avoid brittle multi-step chains without intermediate validation.
- Avoid mixing policy and product copy in the same prompt (harder to audit and update).
Navigation: Core Patterns
- Core Patterns - 7 production-grade prompt patterns
- Structured Output (JSON), Deterministic Extractor, RAG Workflow
- Hidden Chain-of-Thought, Tool/Agent Planner, Rewrite + Constrain, Decision Tree
- Each pattern includes a structure template and validation checklist
Navigation: Best Practices
Best Practices (Core) - Foundation rules for production-grade prompts
- System instruction design, output contract specification, action directives
- Context handling, error recovery, positive framing, style matching, style-adversarial red teaming
- Anti-patterns, Claude 4+ specific optimizations
Production Guidelines - Deployment and operational guidance
- Evaluation & testing (Prompt CI/CD), model parameters, few-shot selection
- Safety & guardrails, conversation memory, context compaction resilience
- Answer engineering, decomposition, multilingual/multimodal, benchmarking
- CI/CD Tools (2026): Promptfoo, DeepEval integration patterns
- Security (2026): PromptGuard 4-layer defense, Microsoft Prompt Shields, taint tracking
Quality Checklists - Validation checklists before deployment
- Prompt QA, JSON validation, agent workflow checks
- RAG workflow, safety & security, performance optimization
- Testing coverage, anti-patterns, quality score rubric
Domain-Specific Patterns - Claude 4+ optimized patterns for specialized domains
- Frontend/visual code: Creativity encouragement, design variations, micro-interactions
- Research tasks: Success criteria, verification, hypothesis tracking
- Agentic coding: No-speculation rule, principled implementation, investigation patterns
- Cross-domain best practices and quality modifiers
Navigation: Specialized Patterns
RAG Patterns - Retrieval-augmented generation workflows
- Context grounding, chunk citation, missing information handling
Agent and Tool Patterns - Tool use and agent orchestration
- Plan-then-act workflows, tool calling, multi-step reasoning, generate-verify-revise chains
- Multi-Agent Orchestration (2026): centralized, handoff, federated patterns; plan-and-execute (90% cost reduction)
Extraction Patterns - Deterministic field extraction
- Schema-based extraction, null handling, no hallucinations
Reasoning Patterns (Hidden CoT) - Internal reasoning without visible output
- Hidden reasoning, final answer only, classification workflows
- Extended Thinking API (Claude 4+): budget management, think tool, multishot patterns
Additional Patterns - Extended prompt engineering techniques
- Advanced patterns, edge cases, optimization strategies
Prompt Testing & CI/CD - Automated prompt evaluation pipelines
- Promptfoo, DeepEval integration, regression detection, A/B testing, quality gates
Multimodal Prompt Patterns - Vision, audio, and document input patterns
- Image description, OCR+LLM, bounding box prompts, Whisper conditioning, video frame analysis
Prompt Security & Defense - Securing LLM applications against adversarial attacks
- Injection detection (PromptGuard, Prompt Shields), defense-in-depth, taint tracking, red team testing
Navigation: Templates
Templates are copy-paste ready and organized by complexity:
Quick Templates
- Quick Template - Fast, minimal prompt structure
Standard Templates
- Standard Template - Production-grade operational prompt
- Agent Template - Tool-using agent with planning
- RAG Template - Retrieval-augmented generation
- Chain-of-Thought Template - Hidden reasoning pattern
- JSON Extractor Template - Deterministic field extraction
- Prompt Evaluation Template - Regression tests, A/B testing, rollout gates
External Resources
External references are listed in data/sources.json:
- Official documentation (OpenAI, Anthropic, Google)
- LLM frameworks (LangChain, LlamaIndex)
- Vector databases (Pinecone, Weaviate, FAISS)
- Evaluation tools (OpenAI Evals, HELM)
- Safety guides and standards
- RAG and retrieval resources
Freshness Rule (2026)
When asked for “latest” prompting recommendations, prefer provider docs and standards from data/sources.json. If web search is unavailable, state the constraint and avoid overconfident “current best” claims.
Related Skills
This skill provides foundational prompt engineering patterns. For specialized implementations:
AI/LLM Skills:
- AI Agents Development - Production agent patterns, MCP integration, orchestration
- AI LLM Engineering - LLM application architecture and deployment
- AI LLM RAG Engineering - Advanced RAG pipelines and chunking strategies
- AI LLM Search & Retrieval - Search optimization, hybrid retrieval, reranking
- AI LLM Development - Fine-tuning, evaluation, dataset creation
Software Development Skills:
- Software Architecture Design - System design patterns
- Software Backend - Backend implementation
- Foundation API Design - API design and contracts
Usage Notes
For Claude Code:
- Reference this skill when building prompts for agents, commands, or integrations
- Use the Quick Reference table for fast pattern lookup
- Follow the Decision Tree to select the appropriate pattern
- Validate outputs with the Quality Checklists before deployment
- Use templates as starting points; customize for specific use cases
For Codex CLI:
- Use the same patterns and templates; adapt tool-use wording to the local tool interface
- For long-horizon tasks, track progress explicitly (a step list/plan) and update it as work completes
- Run independent reads/searches in parallel when the environment supports it; keep writes/edits serialized
- AGENTS.md Integration: Place project-specific prompt guidance in AGENTS.md files at global (~/.codex/AGENTS.md), project-level (./AGENTS.md), or subdirectory scope for layered instructions
- Reasoning Effort: Use medium for interactive coding (the default), high/xhigh for complex autonomous multi-hour tasks
Fact-Checking
- Use web search/web fetch to verify current external facts, versions, pricing, deadlines, regulations, or platform behavior before final answers.
- Prefer primary sources; report source links and dates for volatile information.
- If web access is unavailable, state the limitation and mark guidance as unverified.