# Skill Create Flow
Build a new skill using a repeatable, high-signal workflow that separates methodology discovery from packaging, without requiring any other skills to be installed or invoked.
## Best for

This workflow excels at creating procedural agent skills where the value comes from following a structured methodology rather than raw knowledge retrieval.

Ideal skill types:

- Methodology-based: debugging, code reviews, writing specs, systematic problem-solving
- Decision-heavy: architecture decisions, UX design, business strategy, negotiation
- Multi-step workflows: a defined sequence of operations with clear deliverables
- Quality-focused: where "excellent vs mediocre" output is distinguishable

Less ideal for:

- Pure reference/lookup skills (use knowledge retrieval instead)
- Simple one-shot commands (use direct tool invocation instead)
- Skills requiring external dependencies (this flow produces standalone skills)
## Scope & Outputs

You are producing a new skill folder (with its own SKILL.md and optional bundled resources). This flow helps you:

- Narrow the intent until “excellent vs mediocre” is judgeable.
- Extract expert frameworks (when needed).
- Generate a production-ready, testable skill spec and artifacts.

Target artifacts (recommended):

- `SKILL.md`
- `examples/` (1 tiny example)
- `tests/` (3–5 eval prompts or scenarios)
- `index_entry.json` (if you maintain a skill index)
- `CHANGELOG.md` (optional but helpful for iteration)
## Decision Rules (Pick the Track)

### Track A — Non-technical / judgment-heavy (default if unsure)

Examples: writing, sales, hiring, product decisions, strategy, negotiation.

Use the full flow below (Steps 1–6).

### Track B — Technical / objective correctness dominates

Examples: code generation patterns, CLI automation, infrastructure scripts, data pipelines.

Usually shorten Step 2 (“expert frameworks”) and spend more time on Step 5 (“validation”).
## Workflow (Standalone)

### Step 1 — Lock the intent (narrowing)

Converge from broad → specific using this 5-layer funnel:

1. Domain: what domain is the skill for?
2. Context (5W1H): who/what/where/when/why/how constraints.
3. Comparative choice: pick the closest of 2–3 similar scenarios.
4. Boundaries (via negativa): confirm what is explicitly excluded.
5. Concrete anchor: one real, recent case with inputs + desired output.
Deliverable from Step 1:

- A 5–10 line “skill brief” covering:
  - target user + context
  - inputs the user will provide
  - outputs the skill must produce
  - what’s explicitly out of scope
  - quality bar (“excellent output looks like…”)
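As an illustration, a finished brief for a hypothetical PR-security-review skill might look like the sketch below; every detail in it is invented for the example.

```text
Skill brief: pr-security-review
- Target user + context: backend engineers reviewing pull requests before merge
- Inputs: a PR diff plus the repo's security guidelines
- Outputs: a findings list (severity, file:line, suggested fix) and a merge/block recommendation
- Out of scope: performance review, style nits, full penetration testing
- Quality bar: every finding cites a concrete line and a concrete fix; no generic "be careful" advice
```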
### Step 2 — Extract expert frameworks (masters-level)

Produce:

- 1–3 core frameworks/checklists experts use in this exact scenario.
- Common failure modes + counterexamples.
- Minimal decision rules (when to do A vs B; when to stop; when to ask for more info).

Stop condition: you can explain the method in a way that is specific, testable, and not generic advice.
### Step 3 — Draft the trigger contract (frontmatter)

Write a frontmatter description that triggers reliably:

- Include 3–6 concrete “when to use” phrases users will actually type.
- Include exclusions if you keep getting false triggers.
- Avoid “does everything” descriptions.

Template:

```yaml
name: <skill-name>
description: <What it does>. Use when <trigger 1>, <trigger 2>, or <trigger 3>. Avoid when <non-goal>.
```
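Filled in for a hypothetical PR-security-review skill (the name and trigger phrases below are invented for illustration), the contract might read:

```yaml
name: pr-security-review
description: >-
  Reviews pull request diffs for security issues. Use when asked to "review
  this PR", "check this diff for vulnerabilities", or "audit this change".
  Avoid when the request is about performance or code style.
```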
### Step 4 — Write SKILL.md (lean + procedural)

Keep SKILL.md short and procedural:

- Decision rules come first.
- The output contract (what files/sections are produced) is explicit.
- Long templates go into references/ if they exceed ~100 lines.

Minimum recommended sections:

- Scope & boundaries
- Inputs (what the user must provide)
- Procedure (step-by-step)
- Decision rules & failure handling
- Output contract
- Test prompts (what you will validate with)
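A minimal skeleton with those sections, sketched here as a starting point rather than a required layout (all placeholders are hypothetical):

```markdown
# <Skill Name>

## Scope & boundaries
Handles X. Does not handle Y or Z.

## Inputs
- <input 1 the user must provide>
- <input 2>

## Procedure
1. <step>
2. <step>

## Decision rules & failure handling
- If <condition>, do A; otherwise do B.
- If required inputs are missing, ask before proceeding.

## Output contract
Produces <file/section> containing <required properties>.

## Test prompts
- <prompt that must trigger>
- <prompt that must not trigger>
```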
### Step 5 — Create supporting artifacts (examples/evals/index/changelog)

A) Tiny example (`examples/<name>-example.txt`)

- ≤30 lines; show the most common invocation and expected output shape.

B) Evals / test prompts (`tests/evals_<name>.yaml`)

Include 3–5 scenarios:

- happy path
- missing input
- edge case
- out-of-scope
- safety/constraint case (if relevant)
Minimal template:

```yaml
cases:
  - name: happy_path
    prompt: |
      <a realistic user request that should trigger the skill>
    expect:
      - <must-have output property 1>
      - <must-have output property 2>
```
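Once an eval file exists, a few lines of Python can catch structural mistakes (missing prompt, empty expect list, duplicate names) before you run anything against a model. This is a hypothetical helper, not part of the flow itself; the inline `cases` data is illustrative and stands in for a parsed eval file.

```python
# Validate the structure of eval cases before running them against a model.
# The inline `cases` list mirrors what tests/evals_<name>.yaml would contain
# after parsing with a YAML loader; the data here is invented for illustration.
cases = [
    {"name": "happy_path",
     "prompt": "Review this PR for security issues",
     "expect": ["lists findings", "cites line numbers"]},
    {"name": "out_of_scope",
     "prompt": "What's the weather today?",
     "expect": ["declines and explains scope"]},
]

def validate_cases(cases):
    """Return a list of human-readable problems; an empty list means well-formed."""
    problems = []
    seen = set()
    for i, case in enumerate(cases):
        name = case.get("name")
        if not name:
            problems.append(f"case {i}: missing name")
        elif name in seen:
            problems.append(f"case {i}: duplicate name {name!r}")
        seen.add(name)
        if not case.get("prompt", "").strip():
            problems.append(f"case {i}: empty prompt")
        if not case.get("expect"):
            problems.append(f"case {i}: no expected properties")
    return problems

print(validate_cases(cases))  # [] when every case is well-formed
```

Running this as a pre-check keeps Step 6 focused on behavior rather than typos in the eval file.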
C) Index entry (`index_entry.json`), only if you keep an index:

```json
{
  "slug": "<skill-name>",
  "name": "<Human title>",
  "summary": "<=160 chars>",
  "entry": "skills/<skill-name>/SKILL.md"
}
```
D) `CHANGELOG.md` (optional)

Use it if you expect iteration; otherwise skip.
### Step 6 — Validate via simulations

Dry-run the new skill against your test prompts:

- Does it trigger when it should?
- Does it avoid false triggers?
- Does it produce the promised output contract?

If it fails, iterate:

- Tighten the frontmatter description.
- Move long content into references/.
- Add or adjust decision rules.
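As a rough first pass before any live dry-run, you can approximate trigger behavior with simple keyword overlap between the frontmatter description and each test prompt. This sketch assumes trigger phrases appear near-verbatim in prompts; real skill routing is semantic, so treat a mismatch as a hint to tighten the description, not a verdict. The description and prompts below are invented for the example.

```python
# Crude trigger check: flag test prompts that share no meaningful words with
# the skill's frontmatter description. Purely lexical; a real router matches
# semantically, so use this only to spot obvious gaps.
STOPWORDS = {"a", "an", "the", "for", "to", "of", "when", "use", "or", "and", "in", "on", "my"}

def keywords(text):
    """Lowercase words with trailing punctuation stripped, minus stopwords."""
    return {w.strip(".,?!").lower() for w in text.split()} - STOPWORDS

def overlap_report(description, prompts):
    """Map each prompt to the sorted keywords it shares with the description."""
    desc_words = keywords(description)
    return {p: sorted(desc_words & keywords(p)) for p in prompts}

description = "Reviews pull requests for security issues. Use when asked to review a PR or audit code."
prompts = [
    "Please review this PR for vulnerabilities",   # should trigger
    "What's the weather in Paris?",                # should not trigger
]
for prompt, shared in overlap_report(description, prompts).items():
    print(f"{'TRIGGER?' if shared else 'NO MATCH':8} {prompt} -> {shared}")
```

An empty overlap for a prompt that should trigger usually means the description is missing a phrase users actually type; add it in Step 3 and re-check.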
## Prompt Templates

### Template 1 — Step 1 (narrowing)
“I want a skill for: <topic>. Help me narrow it using this 5-layer funnel: domain → 5W1H → pick closest scenario → boundaries → one real recent case. Ask only the minimum questions (1–3 at a time).”
### Template 2 — Step 2 (expert frameworks)
“For the exact scenario we narrowed to, give me 1–3 expert-level frameworks/checklists, decision rules, and common failure modes. Keep it specific and testable (no generic advice).”
### Template 3 — Step 4/5 (write artifacts)

“Using the skill brief + frameworks, draft:

- SKILL.md (lean + procedural),
- 1 tiny example,
- 3–5 eval prompts,
- optional index entry + changelog.

Keep templates short; push long templates into references/.”
## Notes (Keep It Lean)

- Prefer decision rules over long explanations.
- Prefer one tiny example over many medium ones.
- If you feel tempted to add a long “how skills work” section, don’t: keep the flow operational.