context-engineering

Optimize Claude Code context usage through monitoring, reduction strategies, progressive disclosure, planning/execution separation, and file-based optimization. Task-based operations for context window management, token efficiency, and maintaining conversation quality. Use when managing token costs, optimizing context usage, preventing context overflow, or improving multi-turn conversation quality.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "context-engineering" with this command: npx skills add adaptationio/skrillz/adaptationio-skrillz-context-engineering

Context Engineering

Overview

context-engineering provides systematic strategies for optimizing Claude Code context window usage. It helps you monitor token consumption, reduce context load, design context-efficient skills, and apply proven optimization patterns.

Purpose: Maximize Claude Code effectiveness while managing token costs and maintaining conversation quality

The 5 Context Optimization Operations:

  1. Monitor Context Usage - Track token consumption, identify heavy consumers
  2. Reduce Context Load - Remove stale content, minimize loaded files
  3. Optimize Skill Design - Progressive disclosure, efficient reference loading
  4. Separate Planning/Execution - Keep execution context clean
  5. File-Based Strategies - Externalize large data (95% token savings)

Key Benefits:

  • 29-39% performance improvement with context editing strategies
  • 95% token savings using file-based approaches for large data
  • Sustained quality in multi-turn conversations
  • Cost reduction through efficient token usage
  • Prevention of context overflow and conversation drift

Context Window Sizes (2025):

  • Sonnet 4/4.5: 200k tokens (standard), 500k-1M (beta for Tier 4)
  • Auto-compaction: Triggers around 80% usage (~160k for 200k window)

When to Use

Use context-engineering when:

  1. Approaching Context Limits - Context usage >60-70%, need to optimize before hitting limits
  2. Building Large Skills - Creating skills with extensive documentation, need efficient loading strategies
  3. Token Cost Management - Reducing API costs through optimization
  4. Multi-Turn Conversations - Maintaining coherence across extended sessions
  5. Skill Design Phase - Planning context-efficient architecture from start
  6. Performance Optimization - Improving response quality and latency
  7. Conversation Quality - Preventing drift and maintaining focus
  8. MCP Integration - Managing context from Model Context Protocol servers
  9. Large Data Handling - Working with extensive datasets or outputs

Prerequisites

  • Understanding of Claude context windows
  • Access to context monitoring (Claude Code /context command)
  • Familiarity with progressive disclosure pattern
  • Skills under development or optimization

Operations

Operation 1: Monitor Context Usage

Purpose: Track token consumption, identify context-heavy elements, and detect optimization opportunities

When to Use This Operation:

  • Beginning of optimization effort (baseline measurement)
  • During development (continuous monitoring)
  • When approaching context limits (>60-70% usage)
  • Investigating performance issues

Process:

  1. Check Current Context Usage

    Use /context command in Claude Code to view:
    - Total tokens used
    - Percentage of context window
    - Files loaded
    - Recent tool calls
    
  2. Identify Heavy Consumers

    • Large files loaded (>5,000 tokens each)
    • Extensive conversation history
    • Many tool call results
    • Large CLAUDE.md files
  3. Analyze Usage Patterns

    • Which files are loaded but rarely referenced?
    • Are all tool results still relevant?
    • Is conversation history necessary?
    • Are there duplicate or redundant contexts?
  4. Document Baseline

    • Record current token usage
    • Note context-heavy elements
    • Identify optimization targets
  5. Set Optimization Goals

    • Target token reduction (e.g., reduce by 20%)
    • Performance improvement targets
    • Quality maintenance requirements
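The per-file portion of this audit can be scripted. The sketch below uses the rough heuristic of ~4 characters per token; `rank_heavy_consumers` and its 5,000-token threshold are illustrative choices, not part of Claude Code:

```python
import os

def estimate_tokens(path, chars_per_token=4):
    """Rough token estimate for a file: character count / ~4 chars per token."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return len(f.read()) // chars_per_token

def rank_heavy_consumers(paths, threshold=5000):
    """Return (path, estimated_tokens) pairs over the threshold, heaviest first."""
    sized = [(p, estimate_tokens(p)) for p in paths if os.path.isfile(p)]
    return sorted((s for s in sized if s[1] >= threshold),
                  key=lambda s: s[1], reverse=True)
```

Running this over the currently loaded files yields the "Heavy Consumers" list for the baseline document.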

Validation Checklist:

  • Current context usage measured (tokens and percentage)
  • Context-heavy elements identified (files, history, tool results)
  • Baseline documented for comparison
  • Optimization targets set
  • High-impact optimization opportunities noted

Outputs:

  • Current context usage metrics
  • List of context-heavy elements
  • Baseline measurement
  • Optimization targets
  • Priority optimization opportunities

Time Estimate: 10-15 minutes

Example:

Context Usage Analysis
======================
Current Usage: 145,000 tokens (72% of 200k window)

Heavy Consumers:
1. CLAUDE.md: 25,000 tokens (17%)
2. Large skill files: 40,000 tokens (28%)
   - planning-architect/SKILL.md: 15,000 tokens
   - development-workflow/common-patterns.md: 12,000 tokens
   - review-multi/scoring-rubric.md: 8,000 tokens
3. Conversation history: 30,000 tokens (21%)
4. Tool call results: 20,000 tokens (14%)

Optimization Opportunities:
- Split large CLAUDE.md (25k → 10k target)
- Use references/ loading instead of full files (40k → 15k)
- Clear old tool results (20k → 5k)

Target: Reduce to ~100k tokens (50% of window, 31% reduction)

Operation 2: Reduce Context Load

Purpose: Remove stale content, minimize loaded files, and reduce token consumption

When to Use This Operation:

  • Context usage >70% (approaching limits)
  • Performance degradation noticed
  • Before major operations (clear space)
  • Periodic maintenance (every few hours)

Process:

  1. Remove Stale Tool Results

    • Identify old tool call results no longer needed
    • Tool results from exploratory work
    • Superseded information
  2. Minimize File Loading

    • Only load files actively needed
    • Use Grep instead of Read for searching
    • Load specific sections, not entire files
    • Unload files when done with them
  3. Optimize Conversation History

    • Context editing auto-clears stale content
    • Summarize long conversations if needed
    • Start fresh session for new major tasks
  4. Reduce CLAUDE.md Size

    • Keep only essential long-term instructions
    • Move project-specific details to separate files
    • Use CLAUDE.local.md for temporary preferences
    • Target: <5,000 tokens for CLAUDE.md
  5. Apply Progressive Loading

    • Load overview/index files first
    • Load detailed references only when needed
    • Use skill references/ on-demand loading

Validation Checklist:

  • Stale tool results cleared
  • Only necessary files loaded
  • CLAUDE.md size optimized (<5,000 tokens if possible)
  • Progressive loading applied where relevant
  • Context usage reduced measurably

Outputs:

  • Reduced token count
  • Cleaner context window
  • List of removed/optimized elements
  • New context usage measurement

Time Estimate: 15-30 minutes

Example Reduction:

Before Optimization: 145,000 tokens (72%)

Actions Taken:
1. Cleared 50 old tool results: -15,000 tokens
2. Unloaded 3 large files no longer needed: -18,000 tokens
3. Optimized CLAUDE.md (split to CLAUDE.local.md): -12,000 tokens
4. Used references/ loading instead of full files: -25,000 tokens

After Optimization: 75,000 tokens (37%)

Reduction: 70,000 tokens (48% reduction)
Quality Impact: None - relevant context maintained

Operation 3: Optimize Skill Design

Purpose: Design context-efficient skills using progressive disclosure, lazy loading, and token-aware architecture

When to Use This Operation:

  • Planning new skills (design for efficiency from start)
  • Refactoring existing skills (improve context efficiency)
  • Building large/complex skills (manage context proactively)
  • Creating skill ecosystems (coordinate context usage)

Process:

  1. Apply Progressive Disclosure

    • SKILL.md: Overview + essentials only (<1,200 lines, ~3,000-5,000 tokens)
    • references/: Detailed guides loaded on-demand (300-600 lines each, ~1,000-2,000 tokens)
    • scripts/: Automation loaded when needed

    Token Impact: 70-80% reduction vs monolithic (5k vs 20k+ tokens)

  2. Design for Lazy Loading

    • Separate content into focused reference files
    • Each reference file covers one topic
    • Load specific reference, not entire skill
    • Example: Load references/structure-review-guide.md not entire review-multi
  3. Optimize File Sizes

    • SKILL.md target: 800-1,200 lines (2,500-4,000 tokens)
    • Reference files: 300-600 lines (1,000-2,000 tokens each)
    • Keep files focused and concise
  4. Use Token-Efficient Formats

    • Tables instead of prose (higher information density)
    • Lists instead of paragraphs (more scannable)
    • Code blocks for examples (clear and concise)
    • Quick Reference sections (high-density lookup)
  5. Consider Context Budget

    • Estimate token usage for skill
    • Simple skill: 5,000-10,000 tokens total
    • Medium skill: 10,000-20,000 tokens total
    • Complex skill: 25,000-40,000 tokens total
    • Design within budget
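Step 5's budget arithmetic can be sketched with the document's ~3-4 tokens-per-line heuristic (the 3.5 midpoint and the file layout below are illustrative assumptions):

```python
TOKENS_PER_LINE = 3.5  # midpoint of the ~3-4 tokens/line heuristic

def estimate_budget(line_counts):
    """Estimate per-file and total token cost from line counts per file."""
    per_file = {name: round(lines * TOKENS_PER_LINE)
                for name, lines in line_counts.items()}
    return per_file, sum(per_file.values())

# Hypothetical skill layout: SKILL.md plus two reference files
per_file, total = estimate_budget({
    "SKILL.md": 900,
    "references/api-guide.md": 400,
    "references/examples.md": 300,
})
```

With progressive disclosure, only SKILL.md's share (~3,150 tokens here) loads by default; the total matters only if every file is read.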

Validation Checklist:

  • Progressive disclosure applied (SKILL.md + references/)
  • SKILL.md <1,200 lines (~4,000 tokens or less)
  • Reference files 300-600 lines each
  • Files focused on single topics (lazy loadable)
  • Token-efficient formats used (tables, lists, code blocks)
  • Estimated total tokens within budget
  • Skill can be partially loaded (references/ on-demand)

Outputs:

  • Context-efficient skill design
  • Token budget estimate
  • Progressive disclosure plan
  • Reference file organization

Time Estimate: 20-40 minutes (during planning phase)

Example:

Skill Design: api-integration

Token Budget Analysis:
- SKILL.md: 900 lines → ~3,000 tokens
- references/ (3 files):
  - api-guide.md: 400 lines → ~1,300 tokens
  - auth-patterns.md: 350 lines → ~1,200 tokens
  - examples.md: 300 lines → ~1,000 tokens
- README.md: 300 lines → ~1,000 tokens

Total if all loaded: ~7,500 tokens
Typical usage: SKILL.md only → 3,000 tokens (60% savings)
With 1 reference: 3,000 + 1,200 → 4,200 tokens (44% savings)

Progressive Disclosure Impact:
- Monolithic (all in SKILL.md): ~7,500 tokens always loaded
- Progressive (SKILL.md + on-demand refs): 3,000-4,500 tokens typical
- Token Savings: 40-60% depending on usage

Design: ✅ Context-efficient with progressive disclosure

Operation 4: Separate Planning from Execution

Purpose: Keep execution context clean by separating exploratory planning from focused implementation

When to Use This Operation:

  • Starting complex development work
  • Context getting cluttered with exploration
  • Need clean context for implementation
  • Before critical/focused work

Process:

  1. Planning Phase (Separate Session)

    • Broad codebase exploration
    • Research and pattern discovery
    • Architecture decisions
    • Task breakdown
    • Output: Plan documents, task lists

    Characteristics: High context usage, exploratory, broad

  2. Execution Phase (Fresh Session)

    • Load plan documents (not exploration history)
    • Focused implementation
    • Specific file operations
    • Minimal context bloat

    Characteristics: Clean context, focused, efficient

  3. Session Transition

    • End planning session when plan complete
    • Save planning artifacts (plans, task lists, decisions)
    • Start new session for execution
    • Load only: plan docs, CLAUDE.md, immediate dependencies
  4. Maintain Clean Execution Context

    • Don't re-explore during execution
    • Follow plan, don't re-research
    • Load files as needed, unload when done
    • Keep focus on implementation

Validation Checklist:

  • Planning and execution in separate sessions (when appropriate)
  • Planning artifacts saved and documented
  • Execution session starts with clean context
  • Only plan docs and essentials loaded for execution
  • No re-exploration during execution
  • Context stays focused on current task

Outputs:

  • Clean execution context
  • Focused implementation
  • Reduced context bloat
  • Better performance and quality

Time Estimate: Planning decision (0-5 min), session management (as needed)

Example:

Planning Session (Context: 150k tokens, 75% usage):
- Explored 20 files for research
- Analyzed patterns across codebase
- Made architecture decisions
- Created detailed plan
- Output: skill-plan.md (comprehensive)

[End session, save plan]

Execution Session (Context: 30k tokens, 15% usage):
- Load: CLAUDE.md + skill-plan.md + development-workflow
- Context: Clean, focused, 30k tokens
- Implementation: Follow plan, build skill
- Load references as needed (not all at once)

Result: 80% context reduction (150k → 30k)
Quality: Higher (clean context, focused work)

Research Finding: "Separating planning from execution keeps implementation context clean" - confirmed by 2025 best practices


Operation 5: File-Based Optimization

Purpose: Externalize large data to temporary files for on-demand analysis, achieving 95% token savings

When to Use This Operation:

  • Handling large data sets (>5,000 tokens)
  • Processing extensive outputs (logs, reports)
  • Working with large MCP responses
  • Managing generated content

Process:

  1. Identify Large Data

    • Data >5,000 tokens (typically >20,000 characters)
    • Repeated reference to same large content
    • Extensive generated outputs
    • Large MCP server responses
  2. Externalize to Files

    # Save large data to temp file
    echo "large data here" > /tmp/large-data.txt
    

    Instead of keeping in conversation context

  3. Reference File Instead of Content

    Large dataset saved to: /tmp/analysis-data.json (50,000 tokens)
    
    To analyze: Read /tmp/analysis-data.json when needed
    

    Token Impact: 50,000 tokens → ~500 tokens (~99% reduction)

  4. Load On-Demand

    • Read file only when specific analysis needed
    • Process in chunks if necessary
    • Don't keep full content in context
  5. Clean Up Temp Files

    • Remove temp files when no longer needed
    • Don't accumulate unused files
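Steps 1-3 together amount to: if an output is large, write it to a temp file and keep only a short reference in context. A minimal sketch, where the 5,000-token threshold and the reference-message format are assumptions:

```python
import os
import tempfile

TOKEN_THRESHOLD = 5000  # externalize anything above ~5k tokens (~20k characters)

def externalize_if_large(text, chars_per_token=4):
    """Return small text unchanged; save large text to a temp file and
    return a short reference line instead."""
    est_tokens = len(text) // chars_per_token
    if est_tokens <= TOKEN_THRESHOLD:
        return text
    fd, path = tempfile.mkstemp(prefix="context-", suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write(text)
    return f"Large output saved to: {path} (~{est_tokens} tokens). Read on-demand."
```

The short reference line replaces the full payload in context; step 5's cleanup still applies (delete the temp file when done).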

Validation Checklist:

  • Large data identified (>5,000 tokens)
  • Data externalized to files
  • File paths documented (reference in conversation)
  • On-demand loading used (not full content in context)
  • Token savings measured (before/after)
  • Quality maintained (can access data when needed)
  • Temp files cleaned up when done

Outputs:

  • Externalized data files
  • File path references
  • Significant token savings (often 90-95%)
  • Maintained data accessibility

Time Estimate: 10-20 minutes (setup and management)

Example:

Scenario: Analyzing large log file (100,000 tokens)

Before Optimization:
- Full log in conversation context: 100,000 tokens
- Context usage: 50% just for log data

After Optimization:
1. Save log to /tmp/app-log.txt
2. Reference in context: "Log saved to /tmp/app-log.txt (100k tokens)"
3. Read specific sections when needed:
   - Read first 50 lines for overview
   - Grep for errors
   - Read relevant sections on-demand

Token Usage:
- Before: 100,000 tokens in context
- After: ~500 tokens (file reference) + ~2,000 tokens (specific reads)
- Savings: 97,500 tokens (97.5% reduction)

Quality: Maintained - can still analyze log on-demand
Access: Full log available when needed

Research Finding: "File-based approach achieves 95% token savings" - proven in 2025 optimization studies


Best Practices

1. Quality Over Quantity

Practice: Focus on relevant, high-quality context rather than loading everything

Rationale: Every piece should be current, accurate, and directly relevant to task

Application: Before loading file, ask: "Do I need this right now for current task?"

2. Progressive Disclosure Always

Practice: Design all skills with SKILL.md + references/ pattern

Rationale: 70-80% token reduction vs monolithic design

Application: SKILL.md <1,200 lines, details in references/ loaded on-demand

3. Monitor Regularly

Practice: Check context usage periodically, especially in long sessions

Rationale: Auto-compaction triggers at ~80%, but proactive monitoring prevents drift

Application: Check /context every 30-60 minutes in active development

4. File-Based for Large Data

Practice: Externalize data >5,000 tokens to files

Rationale: 95% token savings while maintaining accessibility

Application: Save to /tmp/, reference file path, load on-demand

5. Separate Planning from Execution

Practice: Plan in one session, execute in clean session with plan artifacts

Rationale: Keeps execution context focused, prevents exploratory noise

Application: When planning >1 hour, start fresh session for implementation

6. Clean Context Before Critical Work

Practice: Reduce context load before important/complex operations

Rationale: Clean context improves quality and performance

Application: Before complex implementation, clear unnecessary context

7. Use Appropriate Tools

Practice: Choose tools that minimize context usage

Rationale: Some tools add more context than others

Application:

  • Grep instead of Read for searching (doesn't load full file)
  • Glob for finding files (doesn't load content)
  • Task agents for exploration (separate context)

8. Optimize CLAUDE.md

Practice: Keep CLAUDE.md concise (<5,000 tokens), split if needed

Rationale: CLAUDE.md loaded every session, large files waste context

Application:

  • Essential standards in CLAUDE.md (project level)
  • Temporary preferences in CLAUDE.local.md
  • Specific domain knowledge in separate files loaded as needed
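A quick guard for the <5,000-token CLAUDE.md target, again using the ~4 characters-per-token heuristic (the function name is illustrative):

```python
def claude_md_over_budget(path="CLAUDE.md", budget_tokens=5000, chars_per_token=4):
    """True if the file's rough token estimate exceeds the budget."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return len(f.read()) // chars_per_token > budget_tokens
```

If it returns True, split detail out to CLAUDE.local.md or separate knowledge files as described above.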

Context Budgets for Skills

Simple Skills (Total: 5,000-10,000 tokens)

Structure:

  • SKILL.md only or SKILL.md + 1-2 small references
  • No scripts or minimal automation
  • ~400-800 lines total

Token Breakdown:

  • SKILL.md: 3,000-4,000 tokens
  • References (if any): 1,000-2,000 tokens each
  • README: 1,000 tokens

Example: format-validator, simple helpers


Medium Skills (Total: 10,000-20,000 tokens)

Structure:

  • SKILL.md + 3-5 references + scripts
  • ~1,500-3,000 lines total

Token Breakdown:

  • SKILL.md: 3,000-5,000 tokens
  • References: 3-5 files × 1,500 tokens = 4,500-7,500 tokens
  • Scripts: 2-3 files × 1,000 tokens = 2,000-3,000 tokens
  • README: 1,000-1,500 tokens

Progressive Loading: Load SKILL.md (3-5k) + specific reference when needed (+1.5k) = 4.5-6.5k typical

Example: prompt-builder, skill-researcher


Complex Skills (Total: 25,000-40,000 tokens)

Structure:

  • SKILL.md + 7-10 references + 4+ scripts
  • ~4,000-7,000 lines total

Token Breakdown:

  • SKILL.md: 4,000-6,000 tokens
  • References: 7-10 files × 2,000 tokens = 14,000-20,000 tokens
  • Scripts: 4-6 files × 1,500 tokens = 6,000-9,000 tokens
  • README: 1,500-2,000 tokens

Progressive Loading: Load SKILL.md (4-6k) + 1-2 references as needed (+2-4k) = 6-10k typical

Example: review-multi, testing-validator

Key: Even complex skills typically load only 6-10k tokens (not the full 25-40k)


Common Mistakes

Mistake 1: Loading Everything Upfront

Symptom: Context quickly fills with all skill files

Cause: Reading all files instead of progressive loading

Fix: Load SKILL.md first, load references/ only when needed for specific operations

Prevention: Follow progressive disclosure pattern

Mistake 2: Not Monitoring Context

Symptom: Unexpected context overflow, performance degradation

Cause: No visibility into token usage

Fix: Check /context regularly, monitor usage patterns

Prevention: Check context every 30-60 min in active sessions

Mistake 3: Large Monolithic CLAUDE.md

Symptom: Every session starts with 20k-50k tokens used

Cause: Putting everything in CLAUDE.md

Fix: Split CLAUDE.md - essentials only, separate files for detailed knowledge

Prevention: Keep CLAUDE.md <5,000 tokens, use multiple files

Mistake 4: Keeping Stale Tool Results

Symptom: Context bloated with old exploratory results

Cause: Not clearing old tool calls

Fix: Context editing auto-clears, but can manually manage by starting fresh sessions

Prevention: Fresh session for major transitions (planning → execution)

Mistake 5: Not Using File-Based Strategies

Symptom: Large data sets consuming 30-50% of context

Cause: Keeping large outputs in conversation

Fix: Save to /tmp/ files, reference file path, load on-demand

Prevention: Any data >5,000 tokens → externalize to file

Mistake 6: Ignoring Progressive Disclosure

Symptom: Skills with 2,000+ line SKILL.md files

Cause: Not using references/ for detailed content

Fix: Extract detailed content to references/, keep SKILL.md as overview

Prevention: Design with progressive disclosure from start (use planning-architect)


Quick Reference

Context Window Limits (2025)

| Model | Standard | Beta (Tier 4) | Auto-Compact |
|-------|----------|---------------|--------------|
| Sonnet 4/4.5 | 200k tokens | 500k-1M tokens | ~80% (~160k for 200k) |

Token Savings Strategies

| Strategy | Token Savings | Application |
|----------|---------------|-------------|
| Progressive Disclosure | 70-80% | SKILL.md + references/ vs monolithic |
| File-Based Externalization | 95% | Large data >5k tokens to /tmp/ files |
| Context Editing | 29-39% | Auto-clears stale content |
| Lazy Loading | 60-70% | Load references on-demand vs all upfront |
| Optimized CLAUDE.md | Variable | Keep <5k tokens vs 20-50k bloat |

Skill Token Budgets

| Skill Complexity | Total Tokens | Typical Load | Progressive Load |
|------------------|--------------|--------------|------------------|
| Simple | 5k-10k | 3k-4k | SKILL.md only |
| Medium | 10k-20k | 4k-6k | SKILL.md + 1 reference |
| Complex | 25k-40k | 6k-10k | SKILL.md + 2 references |

Context Usage Guidelines

| Usage % | Status | Action |
|---------|--------|--------|
| <50% | ✅ Healthy | Normal operation |
| 50-70% | ⚠️ Monitor | Check periodically, plan optimization |
| 70-80% | ⚠️ Optimize | Reduce context load soon |
| >80% | ❌ Critical | Immediate optimization needed (auto-compact triggers) |
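These bands can be encoded directly (thresholds from the guidelines above; the 200k default window matches the standard Sonnet size):

```python
def usage_status(used_tokens, window=200_000):
    """Map context usage to a (status, action) band."""
    pct = 100 * used_tokens / window
    if pct < 50:
        return "healthy", "normal operation"
    if pct < 70:
        return "monitor", "check periodically, plan optimization"
    if pct < 80:
        return "optimize", "reduce context load soon"
    return "critical", "immediate optimization needed (auto-compact triggers)"
```

For example, 145k tokens in a 200k window (72.5%) lands in the "optimize" band.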

Optimization Decision Tree

Is context >70%?
├─ Yes → Reduce immediately (Operation 2)
│   ├─ Clear stale tool results
│   ├─ Unload unnecessary files
│   └─ Start fresh session if needed
│
└─ No → Preventive optimization
    ├─ Is data >5k tokens? → File-based (Operation 5)
    ├─ Building skills? → Progressive disclosure (Operation 3)
    └─ Long session? → Consider planning/execution split (Operation 4)

Quick Optimization Actions

Immediate (When context >80%):

1. Check usage: /context command
2. Clear old tool results (context editing helps)
3. Start fresh session with essentials only
4. Load plan docs, not exploration history

Preventive (During development):

1. Design skills with progressive disclosure
2. Use file-based for large data (>5k tokens)
3. Monitor context every 30-60 min
4. Separate planning from execution (for complex work)

Common Commands

# Monitor context usage
/context

# Read specific lines (not full file)
Read file_path --limit 50

# Search without loading (uses Grep, more efficient)
Grep "pattern" path/

# Find files without loading content
Glob "*.py" path/

# Externalize large data
Bash: command > /tmp/output.txt
# Then reference: "See /tmp/output.txt for results"

Token Estimation

Quick Estimation:

  • 1 line of code/text ≈ 3-4 tokens (average)
  • 1,000 lines ≈ 3,000-4,000 tokens
  • Dense prose: 3-3.5 tokens/line
  • Code with comments: 2.5-3 tokens/line
  • Tables/lists: 2-2.5 tokens/line (more efficient)
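As a sketch, the per-format rates above as a lookup (using midpoints; real tokenizer counts will vary):

```python
TOKENS_PER_LINE = {
    "prose": 3.25,  # dense prose: 3-3.5 tokens/line
    "code": 2.75,   # code with comments: 2.5-3 tokens/line
    "table": 2.25,  # tables/lists: 2-2.5 tokens/line
}

def estimate_tokens_for_lines(line_count, kind="prose"):
    """Rough token estimate for a block of the given content kind."""
    return round(line_count * TOKENS_PER_LINE[kind])
```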

For More Information

  • Context monitoring: references/context-monitoring-guide.md
  • Reduction strategies: references/reduction-strategies.md
  • Optimization patterns: references/optimization-patterns.md
  • Analysis script: scripts/analyze-context-usage.py

context-engineering helps you maximize Claude Code effectiveness through strategic context management, ensuring optimal performance, quality, and cost-efficiency throughout development.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Repository SourceNeeds Review