Prompt Engineering — Operational Skill
Modern Best Practices (January 2026): versioned prompts, explicit output contracts, regression tests, and safety threat modeling for tool/RAG prompts (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
This skill provides operational guidance for building production-ready prompts across standard tasks, RAG workflows, agent orchestration, structured outputs, hidden reasoning, and multi-step planning.
All content is operational, not theoretical. Focus on patterns, checklists, and copy-paste templates.
Quick Start (60 seconds)
- Pick a pattern from the decision tree (structured output, extractor, RAG, tools/agent, rewrite, classification).
- Start from a template in assets/ and fill in TASK, INPUT, RULES, and OUTPUT FORMAT.
- Add guardrails: instruction/data separation, “no invented details”, missing → null or an explicit “missing” note.
- Add validation: JSON parse check, schema check, citations check, post-tool checks.
- Add evals: 10–20 cases while iterating, 50–200 before release, plus adversarial injection cases.
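The validation step above can be sketched in a few lines. This is a minimal example, assuming the model's reply arrives as a raw string and that the required field names (invented here) come from your output contract:

```python
import json

REQUIRED_KEYS = {"name", "amount", "currency"}  # hypothetical schema fields

def validate_reply(raw: str) -> dict:
    """Parse-and-check gate: JSON parse first, then required-key check."""
    data = json.loads(raw)  # raises JSONDecodeError (a ValueError) on non-JSON
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return data

print(validate_reply('{"name": "ACME", "amount": 12.5, "currency": "EUR"}'))
```

A schema validator (e.g. against a full JSON Schema) can replace the key check once the contract grows beyond flat fields.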
Model Notes (2026)
This skill includes Claude Code + Codex CLI optimizations:
- Action directives: Frame for implementation, not suggestions
- Parallel tool execution: Independent tool calls can run simultaneously
- Long-horizon task management: State tracking, incremental progress, context compaction resilience
- Positive framing: Describe desired behavior rather than prohibitions
- Style matching: Prompt formatting influences output style
- Domain-specific patterns: Specialized guidance for frontend, research, and agentic coding
- Style-adversarial resilience: Stress-test refusals with poetic/role-play rewrites; normalize or decline stylized harmful asks before tool use
Prefer “brief justification” over requesting chain-of-thought. When using private reasoning patterns, instruct: think internally; output only the final answer.
Quick Reference
| Task | Pattern to Use | Key Components | When to Use |
|------|----------------|----------------|-------------|
| Machine-parseable output | Structured Output | JSON schema, "JSON-only" directive, no prose | API integrations, data extraction |
| Field extraction | Deterministic Extractor | Exact schema, missing → null, no transformations | Form data, invoice parsing |
| Use retrieved context | RAG Workflow | Context relevance check, chunk citations, explicit missing info | Knowledge bases, documentation search |
| Internal reasoning | Hidden Chain-of-Thought | Internal reasoning, final answer only | Classification, complex decisions |
| Tool-using agent | Tool/Agent Planner | Plan-then-act, one tool per turn | Multi-step workflows, API calls |
| Text transformation | Rewrite + Constrain | Style rules, meaning preservation, format spec | Content adaptation, summarization |
| Classification | Decision Tree | Ordered branches, mutually exclusive, JSON result | Routing, categorization, triage |
Decision Tree: Choosing the Right Pattern
```text
User needs: [Prompt Type]
|
|-- Output must be machine-readable?
|   |-- Extract specific fields only? -> Deterministic Extractor Pattern
|   `-- Generate structured data?     -> Structured Output Pattern (JSON)
|
|-- Use external knowledge?
|   `-- Retrieved context must be cited? -> RAG Workflow Pattern
|
|-- Requires reasoning but hide process?
|   `-- Classification or decision task? -> Hidden Chain-of-Thought Pattern
|
|-- Needs to call external tools/APIs?
|   `-- Multi-step workflow? -> Tool/Agent Planner Pattern
|
|-- Transform existing text?
|   `-- Style/format constraints? -> Rewrite + Constrain Pattern
|
`-- Classify or route to categories?
    `-- Mutually exclusive rules? -> Decision Tree Pattern
```
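The same top-down routing can be sketched as a plain function, first matching branch wins. The flag names are illustrative, not part of any template:

```python
def choose_pattern(*, machine_readable=False, extract_fields=False,
                   uses_retrieval=False, hidden_reasoning=False,
                   uses_tools=False, transforms_text=False,
                   classifies=False) -> str:
    """Walk the decision tree top-down; the first matching branch wins."""
    if machine_readable:
        return "Deterministic Extractor" if extract_fields else "Structured Output (JSON)"
    if uses_retrieval:
        return "RAG Workflow"
    if hidden_reasoning:
        return "Hidden Chain-of-Thought"
    if uses_tools:
        return "Tool/Agent Planner"
    if transforms_text:
        return "Rewrite + Constrain"
    if classifies:
        return "Decision Tree"
    return "Structured Output (JSON)"  # safe default when nothing matches

print(choose_pattern(machine_readable=True, extract_fields=True))
# -> Deterministic Extractor
```

Ordering matters: machine-readable output wins over retrieval, which wins over reasoning style, mirroring the tree above.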
Copy/Paste: Minimal Prompt Skeletons
- Generic "output contract" skeleton

```text
TASK: {{one_sentence_task}}
INPUT: {{input_data}}
RULES:
- Follow TASK exactly.
- Use only INPUT (and tool outputs if tools are allowed).
- No invented details. Missing required info -> say what is missing.
- Keep reasoning hidden.
- Follow OUTPUT FORMAT exactly.
OUTPUT FORMAT: {{schema_or_format_spec}}
```
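One way to fill the skeleton programmatically; the slot names mirror the `{{...}}` placeholders above, and the sample task/input values are invented:

```python
SKELETON = """TASK: {task}
INPUT: {input_data}
RULES:
- Follow TASK exactly.
- Use only INPUT (and tool outputs if tools are allowed).
- No invented details. Missing required info -> say what is missing.
- Keep reasoning hidden.
- Follow OUTPUT FORMAT exactly.
OUTPUT FORMAT: {output_format}"""

prompt = SKELETON.format(
    task="Extract the invoice total.",
    input_data="Invoice #123 ... total due: $41.00",
    output_format='{"total": <number>}',
)
print(prompt.splitlines()[0])  # -> TASK: Extract the invoice total.
```

Keeping the skeleton as one shared constant (rather than copy-pasting it per call site) is what makes the contract auditable and versionable.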
- Tool/agent skeleton (deterministic)

```text
AVAILABLE TOOLS: {{tool_signatures_or_names}}
WORKFLOW:
- Make a short plan.
- Call tools only when required to complete the task.
- Validate tool outputs before using them.
- If the environment supports parallel tool calls, run independent calls in parallel.
```
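The "parallel independent calls, serialized writes" rule can be sketched with asyncio. The tool functions below are stand-ins, not a real tool API:

```python
import asyncio

async def read_tool(name: str) -> str:    # stand-in for an independent read
    await asyncio.sleep(0.01)
    return f"{name}:ok"

async def write_tool(name: str) -> str:   # stand-in for a mutating call
    await asyncio.sleep(0.01)
    return f"{name}:written"

async def run_plan() -> list[str]:
    # Independent reads run concurrently; gather preserves call order.
    reads = await asyncio.gather(read_tool("search"), read_tool("fetch"))
    results = list(reads)
    # Writes stay serialized, each validated before the next begins.
    for name in ("edit_a", "edit_b"):
        out = await write_tool(name)
        assert out.endswith(":written")   # validate tool output before use
        results.append(out)
    return results

print(asyncio.run(run_plan()))
# -> ['search:ok', 'fetch:ok', 'edit_a:written', 'edit_b:written']
```

Serializing writes keeps the workspace deterministic even when the environment would happily run them concurrently.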
- RAG skeleton (grounded)

```text
RETRIEVED CONTEXT: {{chunks_with_ids}}
RULES:
- Use only retrieved context for factual claims.
- Cite chunk ids for each claim.
- If evidence is missing, say what is missing.
```
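A minimal check for the citation rule above. It assumes chunk ids appear in the answer as `[C1]`-style markers; the id format is an assumption, so adapt the regex to your own chunk naming:

```python
import re

def check_citations(answer: str, known_ids: set[str]) -> list[str]:
    """Return cited ids that were never retrieved (i.e. likely hallucinated)."""
    cited = set(re.findall(r"\[(C\d+)\]", answer))
    if not cited:
        return ["<no citations found>"]
    return sorted(cited - known_ids)

print(check_citations("Latency is 20ms [C1], cost is $5 [C9].", {"C1", "C2"}))
# -> ['C9']  (C9 was cited but never retrieved)
```

An empty list means every citation maps to a real chunk; anything else should fail the response before it reaches the user.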
Operational Checklists
Use these references when validating or debugging prompts:
- frameworks/shared-skills/skills/ai-prompt-engineering/references/quality-checklists.md
- frameworks/shared-skills/skills/ai-prompt-engineering/references/production-guidelines.md
Context Engineering (2026)
True expertise in prompting extends beyond writing instructions to shaping the entire context in which the model operates. Context engineering encompasses:
- Conversation history: What prior turns inform the current response
- Retrieved context (RAG): External knowledge injected into the prompt
- Structured inputs: JSON schemas, system/user message separation
- Tool outputs: Results from previous tool calls that shape next steps
Context Engineering vs Prompt Engineering
| Aspect | Prompt Engineering | Context Engineering |
|--------|--------------------|---------------------|
| Focus | Instruction text | Full input pipeline |
| Scope | Single prompt | RAG + history + tools |
| Optimization | Word choice, structure | Information architecture |
| Goal | Clear instructions | Optimal context window |
Key Context Engineering Patterns
- Context Prioritization: Place the most relevant information first; models attend more strongly to early context.
- Context Compression: Summarize history, truncate tool outputs, select the most relevant RAG chunks.
- Context Separation: Use clear delimiters (`<system>`, `<user>`, `<context>`) to separate instruction types.
- Dynamic Context: Adjust context to task complexity: simple tasks need less context, complex tasks need more.
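Prioritization and compression-by-selection combine naturally into one packing step. A sketch, where the relevance scores are hypothetical values from a retriever and the character budget stands in for a token budget:

```python
def pack_context(chunks: list[tuple[float, str]], budget_chars: int) -> str:
    """Sort by relevance (prioritization), then pack until the budget is
    exhausted (compression by selection). Most relevant text lands first."""
    picked, used = [], 0
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        if used + len(text) > budget_chars:
            continue  # skip chunks that would blow the budget
        picked.append(text)
        used += len(text)
    return "\n---\n".join(picked)  # clear delimiter between chunks

chunks = [(0.2, "low-relevance filler " * 10), (0.9, "key fact A"), (0.7, "key fact B")]
print(pack_context(chunks, budget_chars=30))
```

A production version would count tokens rather than characters and might summarize oversized chunks instead of dropping them, but the ordering-then-budget shape stays the same.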
Core Concepts vs Implementation Practices
Core Concepts (Vendor-Agnostic)
- Prompt contract: inputs, allowed tools, output schema, max tokens, and refusal rules.
- Context engineering: conversation history, RAG context, tool outputs, and structured inputs shape model behavior.
- Determinism controls: temperature/top_p, constrained decoding/structured outputs, and strict formatting.
- Cost & latency budgets: prompt length and max output drive token spend and tail latency; enforce hard limits and measure p95/p99.
- Evaluation: golden sets + regression gates + A/B + post-deploy monitoring.
- Security: prompt injection, data exfiltration, and tool misuse are the primary threats (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
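For the latency-budget point, tail percentiles are easy to compute without extra dependencies. A nearest-rank sketch; the sample latencies are invented:

```python
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: small, dependency-free, good enough for budgets."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

latencies_ms = [120, 135, 140, 150, 180, 210, 450, 130, 125, 900]
print("p95:", percentile(latencies_ms, 95))
print("p99:", percentile(latencies_ms, 99))
```

With only ten samples, p95/p99 both land on the worst observation, which is the practical point: tail metrics need enough traffic before a budget gate on them is meaningful.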
Implementation Practices (Model/Platform-Specific)
- Use model-specific structured output features when available; keep a schema validator as the source of truth.
- Align tracing/metrics with OpenTelemetry GenAI semantic conventions (https://opentelemetry.io/docs/specs/semconv/gen-ai/).
Do / Avoid
Do
- Do keep prompts small and modular; centralize shared fragments (policies, schemas, style).
- Do add a prompt eval harness and block merges on regressions.
- Do prefer "brief justification" over requesting chain-of-thought; treat hidden reasoning as model-internal.
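A toy regression gate for the eval-harness point. `run_model` is a stub standing in for a real model call, and the golden set is deliberately tiny:

```python
def run_model(prompt: str) -> str:
    """Stub: a real harness would call the deployed prompt/model here."""
    return "positive" if "love" in prompt else "negative"

GOLDEN_SET = [  # (input, expected) pairs; grow to 50-200 before release
    ("I love this product", "positive"),
    ("This is terrible", "negative"),
]

def eval_gate(min_pass_rate: float = 1.0) -> bool:
    passed = sum(run_model(x) == want for x, want in GOLDEN_SET)
    rate = passed / len(GOLDEN_SET)
    print(f"pass rate: {rate:.0%}")
    return rate >= min_pass_rate  # block the merge when False

assert eval_gate(), "regression detected: block merge"
```

Wiring this into CI (e.g. via Promptfoo or DeepEval, as covered under Production Guidelines) turns prompt changes into testable diffs rather than silent behavior shifts.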
Avoid
- Avoid prompt sprawl (many near-duplicates with no owner or tests).
- Avoid brittle multi-step chains without intermediate validation.
- Avoid mixing policy and product copy in the same prompt (harder to audit and update).
Navigation: Core Patterns
- Core Patterns - 7 production-grade prompt patterns
- Structured Output (JSON), Deterministic Extractor, RAG Workflow
- Hidden Chain-of-Thought, Tool/Agent Planner, Rewrite + Constrain, Decision Tree
- Each pattern includes a structure template and validation checklist
Navigation: Best Practices
Best Practices (Core) - Foundation rules for production-grade prompts
- System instruction design, output contract specification, action directives
- Context handling, error recovery, positive framing, style matching, style-adversarial red teaming
- Anti-patterns, Claude 4+ specific optimizations
Production Guidelines - Deployment and operational guidance
- Evaluation & testing (Prompt CI/CD), model parameters, few-shot selection
- Safety & guardrails, conversation memory, context compaction resilience
- Answer engineering, decomposition, multilingual/multimodal, benchmarking
- CI/CD Tools (2026): Promptfoo, DeepEval integration patterns
- Security (2026): PromptGuard 4-layer defense, Microsoft Prompt Shields, taint tracking
Quality Checklists - Validation checklists before deployment
- Prompt QA, JSON validation, agent workflow checks
- RAG workflow, safety & security, performance optimization
- Testing coverage, anti-patterns, quality score rubric
Domain-Specific Patterns - Claude 4+ optimized patterns for specialized domains
- Frontend/visual code: Creativity encouragement, design variations, micro-interactions
- Research tasks: Success criteria, verification, hypothesis tracking
- Agentic coding: No-speculation rule, principled implementation, investigation patterns
- Cross-domain best practices and quality modifiers
Navigation: Specialized Patterns
RAG Patterns - Retrieval-augmented generation workflows
- Context grounding, chunk citation, missing information handling
Agent and Tool Patterns - Tool use and agent orchestration
- Plan-then-act workflows, tool calling, multi-step reasoning, generate-verify-revise chains
- Multi-Agent Orchestration (2026): centralized, handoff, federated patterns; plan-and-execute (90% cost reduction)
Extraction Patterns - Deterministic field extraction
- Schema-based extraction, null handling, no hallucinations
Reasoning Patterns (Hidden CoT) - Internal reasoning without visible output
- Hidden reasoning, final answer only, classification workflows
- Extended Thinking API (Claude 4+): budget management, think tool, multishot patterns
Additional Patterns - Extended prompt engineering techniques
- Advanced patterns, edge cases, optimization strategies
Prompt Testing & CI/CD - Automated prompt evaluation pipelines
- Promptfoo, DeepEval integration, regression detection, A/B testing, quality gates
Multimodal Prompt Patterns - Vision, audio, and document input patterns
- Image description, OCR+LLM, bounding box prompts, Whisper conditioning, video frame analysis
Prompt Security & Defense - Securing LLM applications against adversarial attacks
- Injection detection (PromptGuard, Prompt Shields), defense-in-depth, taint tracking, red team testing
Navigation: Templates
Templates are copy-paste ready and organized by complexity:
Quick Templates
- Quick Template - Fast, minimal prompt structure
Standard Templates
- Standard Template - Production-grade operational prompt
- Agent Template - Tool-using agent with planning
- RAG Template - Retrieval-augmented generation
- Chain-of-Thought Template - Hidden reasoning pattern
- JSON Extractor Template - Deterministic field extraction
- Prompt Evaluation Template - Regression tests, A/B testing, rollout gates
External Resources
External references are listed in data/sources.json:
- Official documentation (OpenAI, Anthropic, Google)
- LLM frameworks (LangChain, LlamaIndex)
- Vector databases (Pinecone, Weaviate, FAISS)
- Evaluation tools (OpenAI Evals, HELM)
- Safety guides and standards
- RAG and retrieval resources
Freshness Rule (2026)
When asked for “latest” prompting recommendations, prefer provider docs and standards from data/sources.json. If web search is unavailable, state the constraint and avoid overconfident “current best” claims.
Related Skills
This skill provides foundational prompt engineering patterns. For specialized implementations:
AI/LLM Skills:
- AI Agents Development - Production agent patterns, MCP integration, orchestration
- AI LLM Engineering - LLM application architecture and deployment
- AI LLM RAG Engineering - Advanced RAG pipelines and chunking strategies
- AI LLM Search & Retrieval - Search optimization, hybrid retrieval, reranking
- AI LLM Development - Fine-tuning, evaluation, dataset creation
Software Development Skills:
- Software Architecture Design - System design patterns
- Software Backend - Backend implementation
- Foundation API Design - API design and contracts
Usage Notes
For Claude Code:
- Reference this skill when building prompts for agents, commands, or integrations
- Use the Quick Reference table for fast pattern lookup
- Follow the Decision Tree to select the appropriate pattern
- Validate outputs with the Quality Checklists before deployment
- Use templates as starting points; customize for specific use cases
For Codex CLI:
- Use the same patterns and templates; adapt tool-use wording to the local tool interface
- For long-horizon tasks, track progress explicitly (a step list/plan) and update it as work completes
- Run independent reads/searches in parallel when the environment supports it; keep writes/edits serialized
- AGENTS.md Integration: Place project-specific prompt guidance in AGENTS.md files at global (~/.codex/AGENTS.md), project-level (./AGENTS.md), or subdirectory scope for layered instructions
- Reasoning Effort: Use medium for interactive coding (the default), high/xhigh for complex autonomous multi-hour tasks
Fact-Checking
- Use web search/web fetch to verify current external facts, versions, pricing, deadlines, regulations, or platform behavior before final answers.
- Prefer primary sources; report source links and dates for volatile information.
- If web access is unavailable, state the limitation and mark guidance as unverified.