semgrep-rule-creator

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "semgrep-rule-creator" with this command: npx skills add oimiragieo/agent-studio/oimiragieo-agent-studio-semgrep-rule-creator

Semgrep Rule Creator

Security Notice

AUTHORIZED USE ONLY: These skills are for DEFENSIVE security analysis and authorized research:

  • Custom security rule development for owned codebases

  • Coding standard enforcement via automated checks

  • CI/CD security gate rule authoring

  • Vulnerability pattern codification for prevention

  • Educational purposes in controlled environments

NEVER use for:

  • Creating rules to bypass security controls

  • Scanning systems without authorization

  • Any illegal activities

Step 1: Define the Detection Goal

Before writing a rule, clearly define:

  • What to detect: The vulnerable or undesired code pattern

  • Why it matters: The security impact or quality concern

  • What languages: Which programming languages to target

  • True positive example: Code that SHOULD match

  • True negative example: Code that should NOT match (safe alternative)

  • False positive risks: What similar-looking code is actually safe

Detection Goal Template

Rule: [rule-id]

  • Detect: [description of what to find]
  • Why: [security impact / quality concern]
  • Languages: [javascript, typescript, python, etc.]
  • CWE: [CWE-XXX]
  • OWASP: [A0X category]
  • True Positive: [code example that should match]
  • True Negative: [safe code that should NOT match]

Step 2: Write the Semgrep Rule

Basic Rule Structure

rules:
  - id: rule-id-here
    message: >
      Clear description of what was found and why it matters.
      Include remediation guidance in the message.
    severity: ERROR # ERROR, WARNING, INFO
    languages: [javascript, typescript]
    metadata:
      cwe:
        - CWE-089
      owasp:
        - A03:2021
      confidence: HIGH # HIGH, MEDIUM, LOW
      impact: HIGH # HIGH, MEDIUM, LOW
      category: security
      subcategory:
        - vuln
      technology:
        - express
        - node.js
      references:
        - https://owasp.org/Top10/A03_2021-Injection/
      source-rule-url: https://semgrep.dev/r/rule-id
    # Pattern goes here (see below)

Pattern Types

Simple Pattern Match

pattern: |
  eval($X)

Pattern with Alternatives (OR)

pattern-either:
  - pattern: eval($X)
  - pattern: new Function($X)
  - pattern: setTimeout($X, ...)
  - pattern: setInterval($X, ...)

Pattern with Exclusions (AND NOT)

patterns:
  - pattern: $DB.query($QUERY)
  - pattern-not: $DB.query($QUERY, $PARAMS)
  - pattern-not: $DB.query($QUERY, [...])
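
To make the exclusion concrete, here is a small JavaScript fixture showing the two call shapes this pattern is meant to separate. The `db` object is a hypothetical stub that only records calls, and the comments state the intended outcome of the pattern rather than verified Semgrep output.

```javascript
// Hypothetical stub standing in for a real database client; it only records calls.
const calls = [];
const db = {
  query(sql, params) {
    calls.push({ sql, hasParams: params !== undefined });
  },
};

const userId = "42";

// Intended match: a single argument, so no exclusion applies.
db.query("SELECT * FROM users WHERE id = " + userId);

// Intended non-match: a second argument carries the parameters.
db.query("SELECT * FROM users WHERE id = $1", [userId]);

console.log(calls.map((c) => c.hasParams)); // [ false, true ]
```

A fixture like this doubles as the starting point for the annotated test file described in Step 4.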

Pattern Inside Context

patterns:
  - pattern: $RES.send($DATA)
  - pattern-inside: |
      app.$METHOD($PATH, function($REQ, $RES) { ... })
  - pattern-not-inside: |
      app.$METHOD($PATH, authenticate, function($REQ, $RES) { ... })

Metavariable Constraints

patterns:
  - pattern: crypto.createHash($ALGO)
  - metavariable-regex:
      metavariable: $ALGO
      regex: (md5|sha1|MD5|SHA1)
  - focus-metavariable: $ALGO

patterns:
  - pattern: setTimeout($FUNC, $TIME)
  - metavariable-comparison:
      metavariable: $TIME
      comparison: $TIME > 60000

Taint Mode Rules (Advanced)

For tracking data flow from sources to sinks:

mode: taint
pattern-sources:
  - patterns:
      - pattern: $REQ.query.$PARAM
  - patterns:
      - pattern: $REQ.body.$PARAM
  - patterns:
      - pattern: $REQ.params.$PARAM
pattern-sinks:
  - patterns:
      - pattern: $DB.query($SINK, ...)
      - focus-metavariable: $SINK
pattern-sanitizers:
  - patterns:
      - pattern: escape($X)
  - patterns:
      - pattern: sanitize($X)
  - patterns:
      - pattern: $DB.query($QUERY, [...])
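
The source, sink, and sanitizer roles are easier to see against concrete code. The following JavaScript sketch assumes a hypothetical `db` stub and a stand-in `escape` helper (neither comes from a real library); the comments describe how the taint rule above is intended to classify each call, not verified Semgrep output.

```javascript
// Hypothetical stubs: `db` records queries, `escape` is a stand-in sanitizer.
const executed = [];
const db = { query: (sql, params) => executed.push({ sql, params }) };
const escape = (value) => String(value).replace(/'/g, "''");

function handler(req) {
  const name = req.query.name; // source: $REQ.query.$PARAM

  // Intended finding: the source reaches the first argument of db.query unchanged.
  db.query("SELECT * FROM users WHERE name = '" + name + "'");

  // Intended non-finding: the value passes through escape($X) first.
  db.query("SELECT * FROM users WHERE name = '" + escape(name) + "'");

  // Intended non-finding: the query text is constant and the value travels separately.
  db.query("SELECT * FROM users WHERE name = $1", [name]);
}

handler({ query: { name: "O'Brien" } });
console.log(executed.length); // 3
```

Listing a function as a sanitizer tells the rule to trust it, so only helpers that are known to be safe for the sink in question belong there.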

Step 3: Common Rule Templates

SQL Injection Detection

rules:
  - id: sql-injection-string-concat
    message: >
      Possible SQL injection via string concatenation. User input appears
      to be concatenated into a SQL query string. Use parameterized
      queries instead.
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-089]
      owasp: [A03:2021]
      confidence: HIGH
      impact: HIGH
      category: security
    patterns:
      - pattern-either:
          - pattern: $DB.query("..." + $VAR + "...")
          - pattern: $DB.query(`...${$VAR}...`)
      - pattern-not: $DB.query("..." + $VAR + "...", [...])
    # Placeholder only: the "..." in fix is emitted literally, so adapt
    # the query text before enabling autofix.
    fix: |
      $DB.query("... $1 ...", [$VAR])

XSS Detection

rules:
  - id: xss-innerhtml-assignment
    message: >
      Direct assignment to innerHTML with potentially untrusted data.
      Use textContent for text or a sanitization library for HTML.
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-079]
      owasp: [A03:2021]
      confidence: MEDIUM
      impact: HIGH
      category: security
    pattern-either:
      - pattern: $EL.innerHTML = $DATA
      - pattern: document.getElementById($ID).innerHTML = $DATA

Hardcoded Secrets

rules:
  - id: hardcoded-api-key
    message: >
      Hardcoded API key detected. Store secrets in environment variables
      or a secrets manager.
    severity: ERROR
    languages: [javascript, typescript, python]
    metadata:
      cwe: [CWE-798]
      owasp: [A07:2021]
      confidence: MEDIUM
      impact: HIGH
      category: security
    patterns:
      - pattern: $KEY = "$VALUE"
      - metavariable-regex:
          metavariable: $VALUE
          regex: ^(AKIA[0-9A-Z]{16}|sk-[a-zA-Z0-9]{48}|ghp_[a-zA-Z0-9]{36})$
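
Before embedding a key-format regex in a rule, it can help to sanity-check it against synthetic strings. The sketch below does this with JavaScript's `RegExp`, which may differ in details from the regex engine Semgrep uses, so treat it as a quick plausibility check only. The sample names and values are invented for illustration and none is a real credential.

```javascript
// The key-format regex from the rule above, anchored for whole-string checks.
const KEY_FORMAT = /^(AKIA[0-9A-Z]{16}|sk-[a-zA-Z0-9]{48}|ghp_[a-zA-Z0-9]{36})$/;

// Synthetic values built to specific lengths; none is a real credential.
const samples = {
  awsLike: "AKIA" + "A".repeat(16),
  tooShort: "AKIA" + "A".repeat(8),
  githubLike: "ghp_" + "a".repeat(36),
  envLookup: "process.env.API_KEY",
};

// Map each sample name to whether the regex accepts it.
const results = Object.fromEntries(
  Object.entries(samples).map(([name, value]) => [name, KEY_FORMAT.test(value)])
);
console.log(results);
```

The two correctly sized samples should be accepted and the other two rejected; a mismatch here usually points at a wrong length quantifier.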

Missing Authentication

rules:
  - id: express-route-missing-auth
    message: >
      Express route handler without authentication middleware.
      Add authentication middleware before the handler.
    severity: WARNING
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-306]
      owasp: [A07:2021]
      confidence: MEDIUM
      impact: HIGH
      category: security
    patterns:
      - pattern-either:
          - pattern: app.post($PATH, function($REQ, $RES) { ... })
          - pattern: app.put($PATH, function($REQ, $RES) { ... })
          - pattern: app.delete($PATH, function($REQ, $RES) { ... })
          - pattern: router.post($PATH, function($REQ, $RES) { ... })
          - pattern: router.put($PATH, function($REQ, $RES) { ... })
          - pattern: router.delete($PATH, function($REQ, $RES) { ... })
      - pattern-not-inside: |
          app.$METHOD($PATH, $AUTH, function($REQ, $RES) { ... })
      - pattern-not-inside: |
          router.$METHOD($PATH, $AUTH, function($REQ, $RES) { ... })
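
The two route shapes this rule distinguishes can be sketched as follows. The `app` object here is a hypothetical stub, not Express itself, so the snippet runs without dependencies; `authenticate` is likewise a placeholder middleware. The comments give the intended classification, not verified Semgrep output.

```javascript
// Hypothetical stub for an Express-style app; it records how many handlers each route gets.
const routes = [];
const app = {
  post: (path, ...handlers) => routes.push({ path, handlers: handlers.length }),
};

// Placeholder middleware that simply continues to the next handler.
const authenticate = (req, res, next) => next();

// Intended match: the handler is registered with nothing in front of it.
app.post("/transfer", function (req, res) {
  res.send("ok");
});

// Intended non-match: a middleware argument precedes the handler.
app.post("/transfer-safe", authenticate, function (req, res) {
  res.send("ok");
});

console.log(routes);
```

Note that the rule only checks that some argument sits before the handler; it cannot tell whether that argument actually authenticates the request, which is one reason the confidence is set to MEDIUM.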

Insecure Randomness

rules:
  - id: insecure-random-for-security
    message: >
      Math.random() is not cryptographically secure. Use
      crypto.getRandomValues() or crypto.randomBytes() for
      security-sensitive random values.
    severity: WARNING
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-330]
      confidence: MEDIUM
      impact: MEDIUM
      category: security
    patterns:
      - pattern: Math.random()
      - pattern-inside: |
          function $FUNC(...) { ... }
      - metavariable-regex:
          metavariable: $FUNC
          regex: (generateToken|createSecret|randomPassword|generateKey|createSession|generateId|createNonce)
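
The remediation named in the rule message can be sketched in Node.js as below. `generateToken` shows the shape the rule is meant to flag, and `secureToken` is a hypothetical replacement built on `crypto.randomBytes`; the function names and the 32-byte default are assumptions chosen for illustration.

```javascript
const crypto = require("node:crypto");

// Shape the rule is meant to flag: Math.random() inside a function
// whose name suggests a security-sensitive value.
function generateToken() {
  return Math.random().toString(36).slice(2);
}

// Remediation sketch: draw bytes from the platform CSPRNG and hex-encode them.
function secureToken(byteLength = 32) {
  return crypto.randomBytes(byteLength).toString("hex");
}

console.log(secureToken().length); // 64
```

Hex encoding doubles the byte count, which is why 32 random bytes yield a 64-character token.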

Step 4: Write Rule Tests

Test File Format

Create a test file alongside the rule:

// ruleid: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = ' + userId);

// ruleid: sql-injection-string-concat
db.query(`SELECT * FROM users WHERE id = ${userId}`);

// ok: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = $1', [userId]);

// ok: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = ?', [userId]);

Running Tests

# Test a single rule
semgrep --test --config=rules/sql-injection.yml tests/

# Test all rules
semgrep --test --config=rules/ tests/

# Validate rule syntax
semgrep --validate --config=rules/

Step 5: Rule Optimization

Performance Best Practices

  • Be specific with patterns: Avoid overly broad matches like $X($Y)

  • Use pattern-inside to scope: Narrow the search context

  • Use language-specific syntax: Leverage language features

  • Avoid deep ellipsis nesting: ... ... ... is slow

  • Use focus-metavariable: Narrow the reported location

  • Test with large codebases: Verify performance at scale

Reducing False Positives

  • Add pattern-not for safe patterns: Exclude known-safe alternatives

  • Use metavariable-regex: Constrain metavariable values

  • Use pattern-not-inside: Exclude safe contexts

  • Set appropriate confidence: Be honest about detection certainty

  • Add technology metadata: Help users filter relevant rules

  • Provide fix suggestions: When possible, include fix: field

Rule Validation Checklist

  • Rule has unique, descriptive ID

  • Message explains the issue AND remediation

  • Severity matches actual risk

  • Metadata includes CWE, OWASP, confidence, impact

  • At least 2 true positive test cases

  • At least 2 true negative test cases

  • Rule validated with semgrep --validate

  • Rule tested with semgrep --test

  • Performance acceptable on large codebase

  • Fix suggestion provided (if applicable)

Semgrep Pattern Syntax Reference

| Syntax | Meaning | Example |
| --- | --- | --- |
| $X | Single metavariable | eval($X) |
| $...X | Multiple metavariable args | func($...ARGS) |
| ... | Ellipsis (any statements) | if (...) { ... } |
| <... $X ...> | Deep expression match | <... eval($X) ...> |
| pattern-either | OR operator | Match any of N patterns |
| pattern-not | NOT operator | Exclude specific patterns |
| pattern-inside | Context requirement | Must be inside this pattern |
| pattern-not-inside | Context exclusion | Must NOT be inside this |
| metavariable-regex | Regex constraint | Constrain $X to match regex |
| metavariable-comparison | Numeric constraint | $X > 100 |
| focus-metavariable | Narrow match location | Report only $X location |

Related Skills

  • static-analysis: CodeQL and Semgrep with SARIF output

  • variant-analysis: Pattern-based vulnerability discovery

  • differential-review: Security-focused diff analysis

  • insecure-defaults: Hardcoded credentials detection

  • security-architect: STRIDE threat modeling

Agent Integration

  • security-architect (primary): Custom rule development for security audits

  • code-reviewer (primary): Automated code review rule authoring

  • penetration-tester (secondary): Vulnerability detection rule creation

  • qa (secondary): Quality enforcement rule authoring

Iron Laws

  • NEVER publish a rule without at least 2 true positive and 2 true negative test cases

  • ALWAYS validate rule syntax with semgrep --validate before committing

  • NEVER set confidence to HIGH without testing the rule against a real codebase

  • ALWAYS include WHAT was found, WHY it matters, and HOW to fix it in every rule message

  • NEVER use pattern-regex as the primary matcher — use structural patterns and constrain with metavariable-regex

Anti-Patterns

| Anti-Pattern | Why It Fails | Correct Approach |
| --- | --- | --- |
| Publishing untested rules | False positives erode developer trust and rules get ignored | Write test cases with // ruleid: and // ok: annotations and run semgrep --test |
| Setting HIGH confidence without validation | Overconfident rules mislead reviewers into trusting bad signal | Calibrate confidence based on measured false positive rate on real codebases |
| Vague rule messages | Developers cannot remediate without specific guidance | Include WHAT was found, WHY it matters, and HOW to fix it in every message |
| Overly broad patterns with no exclusions | High false positive rate causes rule fatigue | Add pattern-not clauses for all known-safe alternatives |
| Using pattern-regex as primary matcher | Regex is slower and less precise than structural pattern matching | Use structural patterns as primary; constrain with metavariable-regex only |

Memory Protocol (MANDATORY)

Before starting: Read .claude/context/memory/learnings.md

After completing:

  • New pattern -> .claude/context/memory/learnings.md

  • Issue found -> .claude/context/memory/issues.md

  • Decision made -> .claude/context/memory/decisions.md

ASSUME INTERRUPTION: If it's not in memory, it didn't happen.

Cross-Reference: Creator Ecosystem

This skill is part of the Creator Ecosystem. When research uncovers gaps, trigger the appropriate companion creator:

| Gap Discovered | Required Artifact | Creator to Invoke | When |
| --- | --- | --- | --- |
| Domain knowledge needs a reusable skill | skill | Skill({ skill: 'skill-creator' }) | Gap is a full skill domain |
| Existing skill has incomplete coverage | skill update | Skill({ skill: 'skill-updater' }) | Close skill exists but incomplete |
| Capability needs a dedicated agent | agent | Skill({ skill: 'agent-creator' }) | Agent to own the capability |
| Existing agent needs capability update | agent update | Skill({ skill: 'agent-updater' }) | Close agent exists but incomplete |
| Domain needs code/project scaffolding | template | Skill({ skill: 'template-creator' }) | Reusable code patterns needed |
| Behavior needs pre/post execution guards | hook | Skill({ skill: 'hook-creator' }) | Enforcement behavior required |
| Process needs multi-phase orchestration | workflow | Skill({ skill: 'workflow-creator' }) | Multi-step coordination needed |
| Artifact needs structured I/O validation | schema | Skill({ skill: 'schema-creator' }) | JSON schema for artifact I/O |
| User interaction needs a slash command | command | Skill({ skill: 'command-creator' }) | User-facing shortcut needed |
| Repeated logic needs a reusable CLI tool | tool | Skill({ skill: 'tool-creator' }) | CLI utility needed |
| Narrow/single-artifact capability only | inline | Document within this artifact only | Too specific to generalize |

Ecosystem Alignment Contract (MANDATORY)

This creator skill is part of a coordinated creator ecosystem. Any artifact created here must align with and validate against related creators:

  • agent-creator for ownership and execution paths

  • skill-creator for capability packaging and assignment

  • tool-creator for executable automation surfaces

  • hook-creator for enforcement and guardrails

  • rule-creator and semgrep-rule-creator for policy and static checks

  • template-creator for standardized scaffolds

  • workflow-creator for orchestration and phase gating

  • command-creator for user/operator command UX

Cross-Creator Handshake (Required)

Before completion, verify all relevant handshakes:

  • Artifact route exists in .claude/CLAUDE.md and related routing docs.

  • Discovery/registry entries are updated (catalog/index/registry as applicable).

  • Companion artifacts are created or explicitly waived with reason.

  • validate-integration.cjs passes for the created artifact.

  • Skill index is regenerated when skill metadata changes.

Research Gate (Exa + arXiv — BOTH MANDATORY)

For new patterns, templates, or workflows, research is mandatory:

  • Use Exa for implementation and ecosystem patterns:
    • mcp__Exa__web_search_exa({ query: '<topic> 2025 best practices' })
    • mcp__Exa__get_code_context_exa({ query: '<topic> implementation examples' })

  • Search arXiv for academic research (mandatory for AI/ML, agents, evaluation, orchestration, memory/RAG, security):
    • Via Exa: mcp__Exa__web_search_exa({ query: 'site:arxiv.org <topic> 2024 2025' })
    • Direct API: WebFetch({ url: 'https://arxiv.org/search/?query=<topic>&searchtype=all&start=0' })

  • Record decisions, constraints, and non-goals in artifact references/docs.

  • Keep updates minimal and avoid overengineering.

arXiv is mandatory (not fallback) when topic involves: AI agents, LLM evaluation, orchestration, memory/RAG, security, static analysis, or any emerging methodology.

Regression-Safe Delivery

  • Follow strict RED -> GREEN -> REFACTOR for behavior changes.

  • Run targeted tests for changed modules.

  • Run lint/format on changed files.

  • Keep commits scoped by concern (logic/docs/generated artifacts).
