Agent Skills
Guide for creating modular, self-contained Agent Skills that extend Claude's capabilities with specialized knowledge.
What Are Agent Skills?
Agent Skills are organized directories containing instructions, scripts, and resources that Claude can dynamically discover and load. They enable a single general-purpose agent to gain domain-specific expertise without requiring separate custom agents for each use case.
Key Concepts
- Modularity: Self-contained packages that can be mixed and matched
- Reusability: Share and distribute expertise across projects and teams
- Progressive Disclosure: Load context only when needed, keeping interactions efficient
- Specialization: Deep domain knowledge without sacrificing generality
Skill Categories
Skills fall into two categories (source: Anthropic PDF Guide):
Capability Uplift: Enhances Claude's core abilities (coding, analysis, reasoning). These are stable across model versions because they build on general capabilities. Example: a code review skill that adds structured review steps.
Encoded Preference: Encodes user-specific workflows, formatting, and conventions. These may need updates when models change because they depend on model behavior for fidelity. Example: a commit message skill that enforces team-specific format.
When creating a skill, identify its category — this determines testing strategy and maintenance expectations.
How Skills Work
Skills operate on a principle of progressive disclosure across multiple levels:
Level 1: Discovery
Agent system prompts include only skill names and descriptions, allowing Claude to decide when each skill is relevant based on the task at hand.
Level 2: Activation
When Claude determines a skill applies, it loads the full SKILL.md file into context, gaining access to the complete procedural knowledge and guidelines.
Level 3+: Deep Context
Additional bundled files (like references, forms, or documentation) load only when needed for specific scenarios, keeping token usage efficient.
This tiered approach maintains efficient context windows while supporting potentially unbounded skill complexity.
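The three levels can be pictured with a small Python model. This is only an illustrative sketch, not a description of how Claude loads skills internally: the `Skill` class, its method names, and the sample skill contents are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical in-memory model of a skill, for illustration only."""
    name: str
    description: str
    body: str  # full SKILL.md content
    references: dict[str, str] = field(default_factory=dict)

    def discovery_text(self) -> str:
        # Level 1: only the name and description are visible.
        return f"{self.name}: {self.description}"

    def activate(self) -> str:
        # Level 2: the full SKILL.md body is loaded.
        return self.body

    def load_reference(self, path: str) -> str:
        # Level 3+: a bundled file is loaded on demand.
        return self.references[path]


skill = Skill(
    name="git-operations",
    description="Guides Git workflows. Use when committing or rebasing.",
    body="# Git Operations\nFollow the Conventional Commits specification.",
    references={"references/conventional-commits.md": "Detailed spec..."},
)

context = [skill.discovery_text()]  # always present
context.append(skill.activate())    # added once the skill is judged relevant
context.append(skill.load_reference("references/conventional-commits.md"))
```

Each step adds text to the context only when the previous level has shown it to be necessary, which is the property the tiered design is meant to preserve.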
Skill Structure
Minimal Requirements
Every skill must have:
skill-name/
└── SKILL.md
Complete Structure
More complex skills can include additional resources:
skill-name/
├── SKILL.md       # Required: Core skill definition
├── scripts/       # Optional: Executable code for deterministic tasks
├── references/    # Optional: Documentation loaded on-demand
└── assets/        # Optional: Templates, images, boilerplate
SKILL.md Format
Each SKILL.md file must begin with YAML frontmatter followed by Markdown content:
---
name: skill-name
description: Concise explanation of when Claude should use this skill
license: MIT
---

# Skill Name

Main instructional content goes here...
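To make the two-part layout concrete, the following sketch splits a SKILL.md string into its frontmatter fields and Markdown body. It assumes the frontmatter is delimited by `---` lines and contains only flat `key: value` pairs; `split_frontmatter` is a hypothetical helper, and a real YAML parser would be needed for nested values such as `metadata`.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, Markdown body).

    Handles only flat `key: value` lines, which is an assumption of
    this sketch rather than a property of YAML in general.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must begin with a '---' frontmatter line")
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        raise ValueError("frontmatter is missing its closing '---' line") from None
    fields = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return fields, body


example = """---
name: skill-name
description: Concise explanation of when Claude should use this skill
license: MIT
---

# Skill Name

Main instructional content goes here...
"""

fields, body = split_frontmatter(example)
```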
Required YAML Properties
- name: Hyphen-case identifier matching the directory name
  - Maximum 64 characters
  - Must contain only lowercase letters, numbers, and hyphens
  - Cannot contain XML tags
  - Cannot contain reserved words: "anthropic", "claude"
- description: Explains the skill's purpose and when Claude should use it
  - Must be non-empty
  - Maximum 1024 characters
  - Cannot contain XML tags

The description should include both what the Skill does and when Claude should use it. For complete authoring guidance, see the best practices guide.
Description Constraints (from Anthropic best-practices):
- Maximum 1024 characters
- Must use third person (not "I can help you" or "You can use this")
- Must include both what it does AND when to use it
- Use pattern: [What it does]. Use when [trigger conditions].
Critical: The description is the ONLY text Claude sees during skill discovery (Level 1). The body's "When to Use" section only loads AFTER activation (Level 2) and cannot trigger it. All activation triggers must be in the description.
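The constraints on `name` and `description` lend themselves to a mechanical check. The sketch below is one possible reading of the rules above, not an official validator: the function names are hypothetical, the reserved-word check treats any substring match as a violation (an assumption), and the "Use when" test is only a rough heuristic for the recommended pattern.

```python
import re

RESERVED_WORDS = ("anthropic", "claude")
NAME_PATTERN = re.compile(r"[a-z0-9-]+")
XML_TAG_PATTERN = re.compile(r"<[^>]+>")


def validate_name(name: str) -> list[str]:
    """Return a list of rule violations for a skill name (empty if valid)."""
    errors = []
    if len(name) > 64:
        errors.append("name exceeds 64 characters")
    # The character rule also excludes XML tags, since '<' and '>' are not allowed.
    if not NAME_PATTERN.fullmatch(name):
        errors.append("name must contain only lowercase letters, numbers, and hyphens")
    if any(word in name for word in RESERVED_WORDS):
        errors.append("name contains a reserved word")
    return errors


def validate_description(description: str) -> list[str]:
    """Return a list of rule violations for a skill description (empty if valid)."""
    errors = []
    if not description.strip():
        errors.append("description must be non-empty")
    if len(description) > 1024:
        errors.append("description exceeds 1024 characters")
    if XML_TAG_PATTERN.search(description):
        errors.append("description must not contain XML tags")
    # Heuristic only: looks for the literal "Use when" trigger phrase.
    if "use when" not in description.lower():
        errors.append("description lacks a 'Use when' trigger clause")
    return errors
```

For example, `validate_name("git-operations")` returns an empty list, while `validate_name("claude-helper")` reports the reserved word.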
Optional YAML Properties
- license: License name or filename reference
- allowed-tools: List of pre-approved tools (supported in Claude Code only)
- metadata: Key-value string pairs for client-specific properties
Markdown Body
The content section has no restrictions and should contain:
- When to activate the skill
- Core procedural knowledge
- Best practices and guidelines
- Examples and patterns
- References to additional resources (if any)
Creating Skills: Seven-Step Workflow
1. Understanding Through Examples
Gather concrete use cases to clarify what the skill should support. Real-world examples reveal actual needs better than theoretical requirements.
Example:
Use Case: Help developers follow Git best practices
Examples:
- Creating conventional commit messages
- Rebasing feature branches
- Resolving merge conflicts
- Creating descriptive branch names
2. Planning Resources
Analyze examples to identify needed components:
- Scripts: For tasks requiring deterministic reliability or that would need repeated rewriting
- References: Documentation to load into context as needed
- Assets: Output files like templates or boilerplate (not loaded into context)
Example:
Git skill resources:
- scripts/analyze-commit.sh - Parse git diff for commit message
- references/conventional-commits.md - Detailed commit format spec
- assets/gitignore-templates/ - Common .gitignore files
3. Initialization
Create the skill directory structure with the required SKILL.md file. Ensure the directory name matches the name property exactly.
mkdir -p my-skill/{scripts,references,assets}
touch my-skill/SKILL.md
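The same scaffolding can be done in Python, which also makes it easy to keep the directory name and the `name` property in sync. This is a minimal sketch: `init_skill` is a hypothetical helper, and the stub frontmatter it writes is a placeholder that assumes the description needs no YAML quoting.

```python
from pathlib import Path


def init_skill(root: Path, name: str, description: str) -> Path:
    """Create <root>/<name>/ with the optional subdirectories and a stub SKILL.md."""
    skill_dir = root / name
    for sub in ("scripts", "references", "assets"):
        (skill_dir / sub).mkdir(parents=True, exist_ok=True)
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.exists():  # never overwrite an existing skill definition
        skill_md.write_text(
            f"---\nname: {name}\ndescription: {description}\n---\n\n# {name}\n",
            encoding="utf-8",
        )
    return skill_md
```

Because the directory and the frontmatter are both derived from the same `name` argument, the two cannot drift apart at creation time.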
4. Editing
Develop resource files and update SKILL.md with:
- Purpose and activation criteria
- Usage guidelines and best practices
- Implementation details and examples
- References to supplementary files
Use imperative/infinitive form rather than second-person instruction for clarity.
Keep core procedural information in SKILL.md and detailed reference material in separate files.
5. Documentation
Document all sources in the plugin's sources.md. For each skill created, record:
- URLs of documentation, guides, and references used
- Purpose of each source
- Key topics and concepts extracted
- Date accessed (if relevant)
This maintains traceability and helps others understand the skill's foundation.
6. Validation
Test the skill using the validation loop pattern:
- Define success criteria (what correct activation and output look like)
- Create eval prompts: both in-scope (should activate) and out-of-scope (should not)
- Run evaluations and record pass/fail rates
- Verify progressive disclosure works (references load when needed)
- Check token usage remains efficient
- If any validation fails, iterate on the skill before publishing
For the complete evaluation methodology, see references/evaluation-guide.md.
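The activation part of this loop can be sketched as a small harness that records pass rates for in-scope and out-of-scope prompts. Everything here is hypothetical: `run_activation_evals` is an illustrative helper, and `fake_activates` is a stand-in for whatever mechanism actually reports whether the skill was activated for a prompt.

```python
from typing import Callable


def run_activation_evals(
    activates: Callable[[str], bool],
    in_scope: list[str],
    out_of_scope: list[str],
) -> dict[str, float]:
    """Return pass rates for prompts that should and should not activate the skill."""
    in_pass = sum(1 for prompt in in_scope if activates(prompt))
    out_pass = sum(1 for prompt in out_of_scope if not activates(prompt))
    return {
        "in_scope_pass_rate": in_pass / len(in_scope),
        "out_of_scope_pass_rate": out_pass / len(out_of_scope),
    }


# Stand-in for a real harness: pretend the skill activates on the word "commit".
def fake_activates(prompt: str) -> bool:
    return "commit" in prompt.lower()


results = run_activation_evals(
    fake_activates,
    in_scope=["Write a commit message for this diff", "Rebase my feature branch"],
    out_of_scope=["Summarize this PDF"],
)
```

In this toy run the rebase prompt is missed, so the in-scope pass rate is 0.5, which would signal that the description needs more trigger scenarios before publishing.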
7. Iteration
Refine based on real-world usage and evaluation data:
- Optimize descriptions: Reduce false positives (too broad) and false negatives (too narrow)
- Test across models: Verify behavior on Haiku, Sonnet, and Opus
- Monitor activation: Track when the skill triggers correctly vs incorrectly
- Deprecation signal: If the base model passes evals without the skill loaded, the skill may no longer be needed
For description optimization techniques, see references/evaluation-guide.md.
Best Practices
Evaluation-Driven Development
Build skills using an evaluation-first approach (source: Anthropic Blog Post):
- Write evals first: Define test prompts and expected behaviors before writing skill content
- Test with and without: Compare Claude's output with the skill loaded vs without it
- Measure, don't guess: Track pass rates, token usage, and timing rather than subjective quality
- Run A/B comparisons: Use independent agents to compare skill versions blindly
- Detect obsolescence: When the base model passes evals without the skill, consider deprecation
For the complete methodology, see references/evaluation-guide.md. For a copyable checklist, see templates/evaluation-checklist.md.
Degree of Freedom
Balance specificity against fragility in skill instructions (source: Anthropic PDF Guide):
- Specify constraints, not implementations: "Ensure commit messages follow conventional format" not "Run git commit -m with prefix type(scope):"
- Allow model adaptation: Instructions should work across Haiku, Sonnet, and Opus without modification
- Test fragility: If a minor model update breaks your skill, instructions are too rigid
- Test looseness: If Claude produces inconsistent results, instructions are too loose
For the full framework with examples, see references/design-patterns.md.
Context Window Discipline
The context window is a shared resource (source: Anthropic PDF Guide):
- Keep SKILL.md under 500 lines; larger files degrade performance in smaller context windows
- Move detailed content to references/ and load only when needed
- Monitor cumulative load: skill + prompt + conversation history must all fit
- Every line in SKILL.md is loaded on every activation, so justify each line's presence
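The 500-line guideline is easy to enforce automatically. The following is a minimal sketch; the helper names are hypothetical, and it counts raw lines (including blank ones and frontmatter), which is one reasonable interpretation of the limit.

```python
from pathlib import Path

MAX_SKILL_LINES = 500  # guideline stated above


def skill_line_count(skill_md: Path) -> int:
    """Count all lines in SKILL.md, including frontmatter and blank lines."""
    return len(skill_md.read_text(encoding="utf-8").splitlines())


def within_budget(skill_md: Path, limit: int = MAX_SKILL_LINES) -> bool:
    """Return True when the file stays under the line limit."""
    return skill_line_count(skill_md) < limit
```

A check like this could run in CI or a pre-commit hook so that growth in SKILL.md prompts a move of detail into references/.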
Structure for Scale
Split unwieldy SKILL.md files into separate referenced documents:
- Keep commonly-used contexts together
- Separate mutually exclusive information to reduce token usage
- Use progressive disclosure to load details only when needed
- Reference depth: Keep references one level deep only (SKILL.md → reference, not reference → reference)
- TOC in long references: Add a Table of Contents to reference files over 100 lines
- Scripts: Execute scripts for deterministic tasks; read scripts for patterns to adapt contextually
For design patterns and detailed guidance, see references/design-patterns.md.
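The Table of Contents rule for long reference files can be checked with a short script. This sketch is a rough heuristic under stated assumptions: `references_missing_toc` is a hypothetical helper, it only inspects `.md` files directly inside the references directory, and it detects a TOC by looking for the literal phrase "table of contents".

```python
from pathlib import Path


def references_missing_toc(references_dir: Path, threshold: int = 100) -> list[str]:
    """List reference files longer than `threshold` lines with no apparent TOC."""
    missing = []
    for path in sorted(references_dir.glob("*.md")):
        text = path.read_text(encoding="utf-8")
        is_long = len(text.splitlines()) > threshold
        has_toc = "table of contents" in text.lower()  # heuristic match only
        if is_long and not has_toc:
            missing.append(path.name)
    return missing
```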
Claude A/B Testing
Compare skill effectiveness using blind evaluation (source: Anthropic Blog Post):
- Run the same prompt through Agent A (with skill) and Agent B (without skill)
- Each agent uses a clean context, with no accumulated state between tests
- A comparator agent judges outputs without knowing which is which
- Track token usage, timing, and quality metrics independently
- Run 10+ evals for statistical significance
For detailed setup instructions, see references/evaluation-guide.md.
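The blinding step can be sketched as follows. This is an illustration of the idea rather than the setup described in the referenced guide: `blind_pair` and `tally` are hypothetical helpers, the neutral labels "X" and "Y" are arbitrary, and the comparator agent itself is left out.

```python
import random


def blind_pair(
    output_with_skill: str,
    output_without_skill: str,
    rng: random.Random,
) -> tuple[dict[str, str], dict[str, str]]:
    """Return (outputs under neutral labels, hidden key mapping label -> variant)."""
    variants = [
        ("with_skill", output_with_skill),
        ("without_skill", output_without_skill),
    ]
    rng.shuffle(variants)  # randomize which variant the judge sees first
    shown = {"X": variants[0][1], "Y": variants[1][1]}
    key = {"X": variants[0][0], "Y": variants[1][0]}
    return shown, key


def tally(verdicts: list[str], keys: list[dict[str, str]]) -> dict[str, int]:
    """Count wins per variant from the judge's 'X'/'Y' verdicts."""
    wins = {"with_skill": 0, "without_skill": 0}
    for verdict, key in zip(verdicts, keys):
        wins[key[verdict]] += 1
    return wins
```

The judge only ever receives `shown`; the `key` is kept aside and consulted after the verdict is recorded, so the label order cannot leak which output used the skill.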
Consider Claude's Perspective
The skill name and description heavily influence when Claude activates it. Pay particular attention to:
- Name: Should be clear and reflect the domain (e.g., git-operations, elixir-phoenix)
- Description: Should specify both what the skill does and when to use it
Critical: The description is the ONLY text Claude sees during skill discovery (Level 1). The body's "When to Use" section only loads AFTER activation (Level 2) and cannot trigger it. All activation triggers must be in the description using patterns like "Use when [scenarios]".
Description optimization (source: Anthropic Blog Post):
- False positives: Description too broad; add domain-specific terms
- False negatives: Description too narrow; add synonyms and trigger scenarios
- Target: 90%+ true positive rate, <5% false positive rate
- Test with 10+ in-scope prompts and 5+ out-of-scope prompts
Monitor real usage patterns and iterate based on actual behavior.
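The targets above can be checked from raw activation counts. This sketch assumes the usual definitions (true positive rate = activations on in-scope prompts / all in-scope prompts; false positive rate = activations on out-of-scope prompts / all out-of-scope prompts); the function names are hypothetical.

```python
def activation_rates(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Return (true positive rate, false positive rate) from activation counts.

    tp/fn: in-scope prompts that did / did not activate the skill.
    fp/tn: out-of-scope prompts that did / did not activate the skill.
    """
    return tp / (tp + fn), fp / (fp + tn)


def meets_targets(tp: int, fn: int, fp: int, tn: int) -> bool:
    """Check the 90%+ true positive and <5% false positive targets."""
    if tp + fn < 10 or fp + tn < 5:
        raise ValueError("need at least 10 in-scope and 5 out-of-scope prompts")
    tpr, fpr = activation_rates(tp, fn, fp, tn)
    return tpr >= 0.90 and fpr < 0.05
```

Note that with only 5 out-of-scope prompts, a single false activation already gives a 20% false positive rate, so at that sample size the target can only be met with zero false activations.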
Platform Constraints
Skills may run in different environments with different capabilities (source: Anthropic PDF Guide):
| Platform | Script Execution | Network | Filesystem |
|---|---|---|---|
| Claude Code (CLI) | Full Bash access | Available | Full access |
| Claude.ai (Web) | Sandbox only | Limited | Limited |
| API | Tool-dependent | Tool-dependent | Tool-dependent |
| Mobile | None | None | Read-only |
Document which platform features each skill requires. Never assume external API availability.
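One way to document platform requirements is to keep the table as data and query it. The sketch below copies the table's values verbatim; the dictionary layout and the `platforms_lacking` helper are hypothetical, and the check only flags platforms where a required capability is listed as "None".

```python
# Capability values copied from the platform table above.
PLATFORM_CAPABILITIES = {
    "Claude Code (CLI)": {"scripts": "Full Bash access", "network": "Available", "filesystem": "Full access"},
    "Claude.ai (Web)": {"scripts": "Sandbox only", "network": "Limited", "filesystem": "Limited"},
    "API": {"scripts": "Tool-dependent", "network": "Tool-dependent", "filesystem": "Tool-dependent"},
    "Mobile": {"scripts": "None", "network": "None", "filesystem": "Read-only"},
}


def platforms_lacking(required: set[str]) -> list[str]:
    """List platforms where any required capability is entirely unavailable."""
    return [
        platform
        for platform, caps in PLATFORM_CAPABILITIES.items()
        if any(caps[capability] == "None" for capability in required)
    ]
```

For instance, `platforms_lacking({"scripts"})` returns `["Mobile"]`. Values such as "Limited" or "Tool-dependent" still need a human judgment, so this is a first filter rather than a compatibility guarantee.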
Iterate Collaboratively
Work with Claude to capture successful approaches and common mistakes into reusable skill components. Ask Claude to self-reflect on what contextual information actually matters.
Write for AI Consumption
Use clear, imperative language that Claude can follow:
- "Follow the Conventional Commits specification"
- "Use descriptive branch names with type prefixes"
- "Run tests before committing"
Avoid hedging language like "You should try to" or "It might be good to" or "Consider following".
Include concrete examples wherever possible to illustrate patterns and approaches.
Security Considerations
Install skills only from trusted sources. When evaluating unfamiliar skills:
- Thoroughly audit bundled files and scripts
- Review code dependencies
- Examine instructions directing Claude to connect with external services
- Verify the skill doesn't request sensitive information or dangerous operations
Anti-Fabrication Requirements
All skills MUST adhere to strict anti-fabrication requirements to ensure factual, measurable content. Every SKILL.md must include anti-fabrication rules, either inline (template below) or by referencing core:anti-fabrication.
For skill-creation-specific anti-fabrication guidance, see references/anti-fabrication.md. For the authoritative anti-fabrication guide, see the core:anti-fabrication skill.
Core Principles
- Base all outputs on actual analysis of real data using tool execution
- Execute Read, Glob, Bash, or other validation tools before making claims
- Mark uncertain information as "requires analysis", "needs validation", or "requires investigation"
- Use precise, factual language without superlatives or unsubstantiated performance claims
- Execute tests before marking tasks complete and report actual results
- Validate integration recommendations through actual framework detection using tool analysis
Prohibited Language and Claims
- Superlatives: Avoid "excellent", "comprehensive", "advanced", "optimal", "perfect"
- Unsubstantiated Metrics: Never fabricate percentages, success rates, or performance numbers
- Assumed Capabilities: Don't claim features exist without tool verification
- Generic Claims: Replace vague statements with specific, measurable observations
- Fabricated Testing: Never report test results without actual execution
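The superlatives rule is the one item in this list that can be linted mechanically. The following sketch scans text for the five words named above; `find_superlatives` is a hypothetical helper, and it matches whole words only, so inflected forms such as "perfectly" are not flagged.

```python
import re

# Words listed under "Superlatives" above.
PROHIBITED_SUPERLATIVES = ("excellent", "comprehensive", "advanced", "optimal", "perfect")
_SUPERLATIVE_PATTERN = re.compile(
    r"\b(" + "|".join(PROHIBITED_SUPERLATIVES) + r")\b", re.IGNORECASE
)


def find_superlatives(text: str) -> list[tuple[int, str]]:
    """Return (line number, word) pairs for each prohibited superlative found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _SUPERLATIVE_PATTERN.finditer(line):
            hits.append((lineno, match.group(0).lower()))
    return hits
```

The other items (fabricated metrics, unverified capabilities) depend on whether a claim was actually measured, which a text scan cannot determine.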
Time and Effort Estimation Rule
- Never provide time estimates, effort estimates, or completion timelines without actual measurement or analysis
- If estimates are requested, execute tools to analyze scope (e.g., count files, measure complexity, assess dependencies) before providing data-backed estimates
- When estimates cannot be measured, explicitly state "timeline requires analysis of [specific factors]"
- Avoid fabricated scheduling language like "15 minutes", "2 hours", "quick task" without factual basis
Validation Requirements
- File Claims: Use Read or Glob tools before claiming files exist or contain specific content
- System Integration: Use Bash or appropriate tools to verify system capabilities
- Framework Detection: Execute actual detection logic before claiming framework presence
- Test Results: Only report test outcomes after actual execution with tool verification
- Performance Claims: Base any performance statements on actual measurement or analysis
Skill Examples
For annotated examples of simple and complex skills with category classifications, see references/examples.md.
Common Pitfalls
For common mistakes and how to avoid them, see references/examples.md.
References
claude-skills/
├── references/
│   ├── design-patterns.md      # Degree of freedom, validation loops, conditional workflows
│   ├── evaluation-guide.md     # Eval-driven development, A/B testing, multi-model testing
│   ├── anti-fabrication.md     # Skill-creation-specific anti-fab guidance
│   └── examples.md             # Annotated skill examples and common pitfalls
└── templates/
    ├── evaluation-checklist.md # Copyable eval checklist
    ├── level1.md               # Example skill metadata
    ├── level2.md               # Example skill body
    ├── level3.md               # Example skill folder structure
    └── skill.md                # Example basic skill
For more information:
- Agent Skills Blog: https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills
- Building Skills Guide (PDF): https://resources.anthropic.com/hubfs/The-Complete-Guide-to-Building-Skill-for-Claude.pdf
- Improving Skill Creator Blog: https://claude.com/blog/improving-skill-creator-test-measure-and-refine-agent-skills
- Example Skills: https://github.com/anthropics/skills
- Skills Cookbook: https://github.com/anthropics/claude-cookbooks/tree/main/skills
- Skill Creator Guide: https://github.com/anthropics/skills/blob/main/skill-creator/SKILL.md
- Agent Skills Specification: https://github.com/anthropics/skills/blob/main/agent_skills_spec.md