Semgrep Rule Creator

Security Notice

AUTHORIZED USE ONLY: These skills are for DEFENSIVE security analysis and authorized research:

Custom security rule development for owned codebases
Coding standard enforcement via automated checks
CI/CD security gate rule authoring
Vulnerability pattern codification for prevention
Educational purposes in controlled environments

NEVER use for:

Creating rules to bypass security controls
Scanning systems without authorization
Any illegal activities

Step 1: Define the Detection Goal

Before writing a rule, clearly define:

What to detect: The vulnerable or undesired code pattern
Why it matters: The security impact or quality concern
What languages: Which programming languages to target
True positive example: Code that SHOULD match
True negative example: Code that should NOT match (safe alternative)
False positive risks: What similar-looking code is actually safe

Detection Goal Template

Rule: [rule-id]

Detect: [description of what to find]
Why: [security impact / quality concern]
Languages: [javascript, typescript, python, etc.]
CWE: [CWE-XXX]
OWASP: [A0X category]
True Positive: [code example that should match]
True Negative: [safe code that should NOT match]
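To make the template's last two fields concrete, here is a hypothetical detection goal ("config strings must never run as code") written as a true positive / true negative pair in JavaScript. The function names are illustrative assumptions, not part of any rule in this skill.

```javascript
// Hypothetical detection goal: "config strings must never be executed as code".

// True positive: eval() executes whatever the caller passes in, so a rule
// matching eval($X) SHOULD report this line.
function loadConfigUnsafe(text) {
  return eval("(" + text + ")");
}

// True negative: JSON.parse() only accepts data, never code, so the rule
// should NOT report this line.
function loadConfigSafe(text) {
  return JSON.parse(text);
}

// Usage: both accept plain JSON, but only the safe version rejects code.
const config = loadConfigSafe('{"port": 8080}');
console.log(config.port); // 8080
```

The safe version also fails loudly (with a SyntaxError) when handed code instead of data, which is exactly the behavior the true negative is meant to capture.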

Step 2: Write the Semgrep Rule

Basic Rule Structure

rules:
  - id: rule-id-here
    message: >
      Clear description of what was found and why it matters.
      Include remediation guidance in the message.
    severity: ERROR # ERROR, WARNING, INFO
    languages: [javascript, typescript]
    metadata:
      cwe:
        - CWE-89
      owasp:
        - A03:2021
      confidence: HIGH # HIGH, MEDIUM, LOW
      impact: HIGH # HIGH, MEDIUM, LOW
      category: security
      subcategory:
        - vuln
      technology:
        - express
        - node.js
      references:
        - https://owasp.org/Top10/A03_2021-Injection/
      source-rule-url: https://semgrep.dev/r/rule-id
    # Pattern goes here (see below)

Pattern Types

Simple Pattern Match

pattern: eval($X)

Pattern with Alternatives (OR)

pattern-either:
  - pattern: eval($X)
  - pattern: new Function($X)
  - pattern: setTimeout($X, ...)
  - pattern: setInterval($X, ...)

Pattern with Exclusions (AND NOT)

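patterns:
  - pattern: $DB.query($QUERY, ...)
  - pattern-not: $DB.query($QUERY, $PARAMS)
  - pattern-not: $DB.query($QUERY, [...])

The trailing ... lets the first pattern match calls of any arity. A bare $DB.query($QUERY) matches only single-argument calls, so the pattern-not clauses would never remove anything.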
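The exclusions above are safe because a parameterized call hands the driver the SQL text and the values separately. A minimal sketch, assuming a hypothetical in-memory db stub (not a real driver API) that records what it receives:

```javascript
// Hypothetical stand-in for a database client: it records each call so we
// can inspect exactly what reached the driver.
const calls = [];
const db = {
  query(text, params = []) {
    calls.push({ text, params });
  },
};

const userId = "1 OR 1=1"; // attacker-controlled input

// Matched by the primary pattern: the input is spliced into the SQL text.
db.query("SELECT * FROM users WHERE id = " + userId);

// Removed by the pattern-not clauses: the SQL text stays constant and the
// input travels separately as a bound parameter.
db.query("SELECT * FROM users WHERE id = $1", [userId]);

console.log(calls[0].text); // SELECT * FROM users WHERE id = 1 OR 1=1
console.log(calls[1].text, calls[1].params);
```

In the first call the injected OR 1=1 has become part of the query itself; in the second it is only ever a value.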

Pattern Inside Context

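patterns:
  - pattern: $RES.send($DATA)
  - pattern-inside: |
      app.$METHOD($PATH, ..., function($REQ, $RES) { ... })
  - pattern-not-inside: |
      app.$METHOD($PATH, authenticate, function($REQ, $RES) { ... })

The ... before the handler lets pattern-inside accept routes with or without middleware, so the pattern-not-inside clause has authenticated routes to exclude.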

Metavariable Constraints

patterns:
  - pattern: crypto.createHash($ALGO)
  - metavariable-regex:
      metavariable: $ALGO
      regex: (md5|sha1|MD5|SHA1)
  - focus-metavariable: $ALGO
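A quick way to sanity-check a metavariable-regex is to run the same expression against sample bindings. This sketch uses JavaScript's RegExp as a stand-in for Semgrep's regex engine, so flag syntax and edge cases may differ; it assumes the bound $ALGO text may include the string's quotes, which an unanchored regex tolerates either way. It shows that mixed-case spellings such as 'Sha1' slip past the rule's regex; in the rule itself a case-insensitive variant could be written with an inline flag such as (?i)(md5|sha1), assuming the engine accepts it.

```javascript
// The metavariable-regex from the rule above, evaluated with JavaScript's
// RegExp as a stand-in for Semgrep's engine. Inputs include the quotes
// because a bound string literal may carry them (assumption).
const weakHash = /(md5|sha1|MD5|SHA1)/;
const weakHashAnyCase = /(md5|sha1)/i; // same alternation, case-insensitive

const candidates = ["'md5'", "'SHA1'", "'Sha1'", "'sha256'"];
for (const algo of candidates) {
  console.log(algo, weakHash.test(algo), weakHashAnyCase.test(algo));
}
// 'md5' true true
// 'SHA1' true true
// 'Sha1' false true
// 'sha256' false false
```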

patterns:
  - pattern: setTimeout($FUNC, $TIME)
  - metavariable-comparison:
      metavariable: $TIME
      comparison: $TIME > 60000
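The comparison is an ordinary numeric predicate over the literal bound to $TIME. The sketch below applies the same predicate to a few hypothetical delays to show the boundary: 60000 itself is not flagged because > is strict. If $TIME binds to something Semgrep cannot resolve to a number, such as a function call, the comparison should not match (an assumption worth covering in your tests).

```javascript
// Semgrep evaluates `comparison: $TIME > 60000` against the integer literal
// bound to $TIME. The same predicate in plain JavaScript, applied to a few
// hypothetical setTimeout delays (milliseconds):
const exceedsOneMinute = (time) => time > 60000;

const delays = [1000, 60000, 60001, 3600000];
const flagged = delays.filter(exceedsOneMinute);
console.log(flagged); // [ 60001, 3600000 ]
```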

Taint Mode Rules (Advanced)

For tracking data flow from sources to sinks:

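# Taint keys sit inside a rule, alongside id, message, languages, and severity
mode: taint
pattern-sources:
  - patterns:
      - pattern: $REQ.query.$PARAM
  - patterns:
      - pattern: $REQ.body.$PARAM
  - patterns:
      - pattern: $REQ.params.$PARAM
pattern-sinks:
  - patterns:
      - pattern: $DB.query($SINK, ...)
      # Only the first argument (the SQL text) is treated as the sink
      - focus-metavariable: $SINK
pattern-sanitizers:
  - patterns:
      - pattern: escape($X)
  - patterns:
      - pattern: sanitize($X)

Parameterized calls such as $DB.query($QUERY, [...]) need no sanitizer entry: because focus-metavariable limits the sink to the first argument, tainted values passed in the parameter array are not reported. Declaring the whole call a sanitizer is also risky, since it could hide a query that concatenates user input into the SQL text while also passing a parameter array.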
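A taint rule is tested the same way as any other rule (Step 4). The following is a hypothetical fixture for the taint fragment above, assuming the rule id sql-injection-taint; the db and escape stubs exist only so the file runs under Node.js, and the quote-doubling escape is an illustration, not a real defense.

```javascript
// Hypothetical fixture for a taint rule with id "sql-injection-taint".
// Stubs let the file run under Node.js; in a real project db, escape, and
// the request objects come from your application.
const executed = [];
const db = { query: (text, params = []) => executed.push({ text, params }) };
// Naive quote-doubling, for illustration only; do not use as a real defense.
const escape = (value) => String(value).replace(/'/g, "''");

function unsafe(req) {
  // ruleid: sql-injection-taint
  db.query("SELECT * FROM users WHERE id = '" + req.query.id + "'");
}

function sanitized(req) {
  // ok: sql-injection-taint
  db.query("SELECT * FROM users WHERE id = '" + escape(req.query.id) + "'");
}

function parameterized(req) {
  // ok: sql-injection-taint
  db.query("SELECT * FROM users WHERE id = $1", [req.query.id]);
}

// Run all three with the same attacker-controlled value.
const req = { query: { id: "1' OR '1'='1" } };
unsafe(req);
sanitized(req);
parameterized(req);
console.log(executed.map((call) => call.text));
```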

Step 3: Common Rule Templates

SQL Injection Detection

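rules:
  - id: sql-injection-string-concat
    message: >
      Possible SQL injection via string concatenation. User input appears to
      be concatenated into a SQL query string. Use parameterized queries instead.
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-89]
      owasp: [A03:2021]
      confidence: HIGH
      impact: HIGH
      category: security
    patterns:
      - pattern-either:
          # Concatenation with or without a trailing string literal
          - pattern: $DB.query("..." + $VAR)
          - pattern: $DB.query("..." + $VAR + "...")
          # Template literal interpolation
          - pattern: $DB.query(`...${$VAR}...`)
      - pattern-not: $DB.query("..." + $VAR + "...", [...])

The rule has no fix: field. Autofix substitutes metavariables into the fix text but copies everything else verbatim, so it cannot rewrite a concatenated string into placeholders; remediation stays manual.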
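The message's remediation, parameterized queries, can stay ergonomic for template-literal code with a small tagged-template helper. This is a sketch: the sql tag is hypothetical (some query libraries ship similar helpers), and the $1, $2 placeholder style assumes a PostgreSQL-style driver.

```javascript
// Hypothetical tagged-template helper: turns each ${value} into a numbered
// placeholder ($1, $2, ...) and collects the values separately, so user
// input never becomes part of the SQL text.
function sql(strings, ...values) {
  const text = strings
    .slice(1)
    .reduce((acc, part, i) => acc + "$" + (i + 1) + part, strings[0]);
  return { text, values };
}

const userId = "42; DROP TABLE users";
const query = sql`SELECT * FROM users WHERE id = ${userId} AND active = ${true}`;
console.log(query.text); // SELECT * FROM users WHERE id = $1 AND active = $2
console.log(query.values); // [ '42; DROP TABLE users', true ]

// A call such as db.query(query.text, query.values) contains no string
// concatenation or interpolation inside the query() argument, so the rule
// above does not report it.
```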

XSS Detection

rules:
  - id: xss-innerhtml-assignment
    message: >
      Direct assignment to innerHTML with potentially untrusted data. Use
      textContent for text or a sanitization library for HTML.
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-79]
      owasp: [A03:2021]
      confidence: MEDIUM
      impact: HIGH
      category: security
    patterns:
      # $EL matches any receiver, including document.getElementById($ID)
      - pattern: $EL.innerHTML = $DATA
      # Constant strings cannot carry user input
      - pattern-not: $EL.innerHTML = "..."
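The message recommends textContent or a sanitization library. Where code genuinely has to build markup from untrusted text, a minimal fallback is to escape HTML-significant characters first. The escapeHtml helper below is hypothetical and only suits text content and quoted attribute values, not URLs, unquoted attributes, or script contexts; if a vetted helper like it exists in your codebase, a pattern-not clause can exclude its call sites.

```javascript
// Hypothetical helper: escape HTML-significant characters so untrusted text
// renders as text. `&` is replaced first so the entities introduced by the
// later replacements are not escaped a second time.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const comment = '<img src=x onerror="alert(1)">';
console.log(escapeHtml(comment)); // &lt;img src=x onerror=&quot;alert(1)&quot;&gt;

// In a browser, the preferred true negative needs no escaping at all:
//   el.textContent = comment;
```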

Hardcoded Secrets

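rules:
  - id: hardcoded-api-key
    message: >
      Hardcoded API key detected. Store secrets in environment variables or a
      secrets manager.
    severity: ERROR
    languages: [javascript, typescript, python]
    metadata:
      cwe: [CWE-798]
      owasp: [A02:2021]
      confidence: MEDIUM
      impact: HIGH
      category: security
    patterns:
      - pattern: $KEY = $VALUE
      - metavariable-regex:
          metavariable: $VALUE
          regex: (AKIA[0-9A-Z]{16}|sk-[a-zA-Z0-9]{48}|ghp_[a-zA-Z0-9]{36})

Semgrep accepts only one top-level matching key per rule (pattern, patterns, pattern-either, or pattern-regex), so the key shapes are expressed as a metavariable-regex on the assigned value. This also follows the Iron Laws below: match structurally, then constrain with metavariable-regex.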
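Before shipping the key-shape regex, it helps to check it against known-fake values. This sketch anchors the rule's regex so each sample is tested as a whole value; AKIAIOSFODNN7EXAMPLE is the placeholder access key ID from AWS documentation, and the other samples are synthetic. The API_KEY environment variable name is an assumption.

```javascript
// The key-shape regex from the rule above, anchored here so each sample is
// tested as a whole value. All sample values are fake.
const keyShape = /^(AKIA[0-9A-Z]{16}|sk-[a-zA-Z0-9]{48}|ghp_[a-zA-Z0-9]{36})$/;

const samples = {
  aws: "AKIAIOSFODNN7EXAMPLE", // AWS documentation placeholder
  skStyle: "sk-" + "a".repeat(48), // synthetic
  github: "ghp_" + "A1".repeat(18), // synthetic, 36 chars after the prefix
  tooShort: "AKIA1234",
  placeholder: "changeme",
};

for (const [name, value] of Object.entries(samples)) {
  console.log(name, keyShape.test(value));
}
// aws true
// skStyle true
// github true
// tooShort false
// placeholder false

// True negative for the rule: read the secret from the environment instead.
const apiKey = process.env.API_KEY;
```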

Missing Authentication

rules:
  - id: express-route-missing-auth
    message: >
      Express route handler without authentication middleware. Add
      authentication middleware before the handler.
    severity: WARNING
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-306]
      owasp: [A07:2021]
      confidence: MEDIUM
      impact: HIGH
      category: security
    patterns:
      - pattern-either:
          - pattern: app.post($PATH, function($REQ, $RES) { ... })
          - pattern: app.put($PATH, function($REQ, $RES) { ... })
          - pattern: app.delete($PATH, function($REQ, $RES) { ... })
          - pattern: router.post($PATH, function($REQ, $RES) { ... })
          - pattern: router.put($PATH, function($REQ, $RES) { ... })
          - pattern: router.delete($PATH, function($REQ, $RES) { ... })
      - pattern-not-inside: |
          app.$METHOD($PATH, $AUTH, function($REQ, $RES) { ... })
      - pattern-not-inside: |
          router.$METHOD($PATH, $AUTH, function($REQ, $RES) { ... })
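A hypothetical test fixture for this rule. The app and authenticate stubs only record registrations, so the file runs under Node.js without Express. The last case marks a known false positive (authentication applied through app.use, which per-route patterns cannot see) with Semgrep's todook: annotation, which records the limitation without failing the test run; confirm that your Semgrep version supports it.

```javascript
// Hypothetical fixture for express-route-missing-auth. The stubs record
// registrations only; no server is started.
const routes = [];
const app = {
  post: (path, ...handlers) => routes.push({ method: "POST", path, handlers }),
  use: (...handlers) => routes.push({ method: "USE", path: "*", handlers }),
};
function authenticate(req, res, next) {
  next();
}

// ruleid: express-route-missing-auth
app.post("/orders", function (req, res) {
  res.send("created");
});

// ok: express-route-missing-auth
app.post("/orders", authenticate, function (req, res) {
  res.send("created");
});

// Known false positive: auth applied globally via app.use() is invisible to
// the rule's per-route patterns.
app.use(authenticate);
// todook: express-route-missing-auth
app.post("/invoices", function (req, res) {
  res.send("created");
});
```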

Insecure Randomness

rules:
  - id: insecure-random-for-security
    message: >
      Math.random() is not cryptographically secure. Use crypto.getRandomValues()
      or crypto.randomBytes() for security-sensitive random values.
    severity: WARNING
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-330]
      confidence: MEDIUM
      impact: MEDIUM
      category: security
    patterns:
      - pattern: Math.random()
      - pattern-inside: |
          function $FUNC(...) { ... }
      - metavariable-regex:
          metavariable: $FUNC
          regex: (generateToken|createSecret|randomPassword|generateKey|createSession|generateId|createNonce)
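A hypothetical fixture pairing the weak and strong versions of a token generator. It assumes Node.js, where crypto.randomBytes is the built-in CSPRNG; the function names are illustrative. Note that generateTokenLegacy still matches the generateToken alternative, because the rule's regex is not anchored at the end.

```javascript
// Hypothetical fixture for insecure-random-for-security (Node.js).
const crypto = require("node:crypto");

function generateTokenLegacy() {
  // ruleid: insecure-random-for-security
  return Math.random().toString(36).slice(2);
}

function generateToken() {
  // ok: insecure-random-for-security
  return crypto.randomBytes(32).toString("hex"); // 32 random bytes -> 64 hex chars
}

console.log(generateToken().length); // 64
```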

Step 4: Write Rule Tests

Test File Format

Create a test file alongside the rule:

// ruleid: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = ' + userId);

// ruleid: sql-injection-string-concat
db.query(`SELECT * FROM users WHERE id = ${userId}`);

// ok: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = $1', [userId]);

// ok: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = ?', [userId]);

Running Tests

# Test a single rule
semgrep --test --config=rules/sql-injection.yml tests/

# Test all rules
semgrep --test --config=rules/ tests/

# Validate rule syntax
semgrep --validate --config=rules/

Step 5: Rule Optimization

Performance Best Practices

Be specific with patterns: Avoid overly broad matches like $X($Y)
Use pattern-inside to scope: Narrow the search context
Use language-specific syntax: Leverage language features
Avoid deep ellipsis nesting: ... ... ... is slow
Use focus-metavariable: Narrow the reported location
Test with large codebases: Verify performance at scale

Reducing False Positives

Add pattern-not for safe patterns: Exclude known-safe alternatives
Use metavariable-regex: Constrain metavariable values
Use pattern-not-inside: Exclude safe contexts
Set appropriate confidence: Be honest about detection certainty
Add technology metadata: Help users filter relevant rules
Provide fix suggestions: When possible, include a fix: field

Rule Validation Checklist

Rule has unique, descriptive ID
Message explains the issue AND remediation
Severity matches actual risk
Metadata includes CWE, OWASP, confidence, impact
At least 2 true positive test cases
At least 2 true negative test cases
Rule validated with semgrep --validate
Rule tested with semgrep --test
Performance acceptable on large codebase
Fix suggestion provided (if applicable)
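The two-and-two minimum can be enforced mechanically before running semgrep --test. A sketch of a hypothetical pre-commit helper that counts ruleid: and ok: annotations for one rule in a test file's source text (it recognizes // and # comments; adjust for other languages):

```javascript
// Hypothetical helper: count `ruleid:` and `ok:` annotations for one rule.
// todoruleid:/todook: lines are ignored because the keyword must come right
// after the comment marker.
function countAnnotations(source, ruleId) {
  const counts = { ruleid: 0, ok: 0 };
  for (const line of source.split("\n")) {
    const match = line.match(/^\s*(?:\/\/|#)\s*(ruleid|ok):\s*(\S+)\s*$/);
    if (match && match[2] === ruleId) counts[match[1]] += 1;
  }
  return counts;
}

// Checklist rule: at least `minimum` true positives and true negatives.
function meetsMinimum(counts, minimum = 2) {
  return counts.ruleid >= minimum && counts.ok >= minimum;
}

const testFile = [
  "// ruleid: sql-injection-string-concat",
  "db.query('SELECT * FROM users WHERE id = ' + userId);",
  "// ruleid: sql-injection-string-concat",
  "db.query(`SELECT * FROM users WHERE id = ${userId}`);",
  "// ok: sql-injection-string-concat",
  "db.query('SELECT * FROM users WHERE id = $1', [userId]);",
  "// ok: sql-injection-string-concat",
  "db.query('SELECT * FROM users WHERE id = ?', [userId]);",
].join("\n");

const counts = countAnnotations(testFile, "sql-injection-string-concat");
console.log(counts); // { ruleid: 2, ok: 2 }
console.log(meetsMinimum(counts)); // true
```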

Semgrep Pattern Syntax Reference

| Syntax | Meaning | Example |
| --- | --- | --- |
| $X | Single metavariable | eval($X) |
| $...X | Metavariable for multiple arguments | func($...ARGS) |
| ... | Ellipsis (any sequence of arguments or statements) | if (...) { ... } |
| <... $X ...> | Deep expression match | <... eval($X) ...> |
| pattern-either | OR operator | Match any of N patterns |
| pattern-not | NOT operator | Exclude specific patterns |
| pattern-inside | Context requirement | Must be inside this pattern |
| pattern-not-inside | Context exclusion | Must NOT be inside this pattern |
| metavariable-regex | Regex constraint | Constrain $X to match a regex |
| metavariable-comparison | Numeric constraint | $X > 100 |
| focus-metavariable | Narrow match location | Report only the $X location |

Related Skills

static-analysis: CodeQL and Semgrep with SARIF output
variant-analysis: Pattern-based vulnerability discovery
differential-review: Security-focused diff analysis
insecure-defaults: Hardcoded credentials detection
security-architect: STRIDE threat modeling

Agent Integration

security-architect (primary): Custom rule development for security audits
code-reviewer (primary): Automated code review rule authoring
penetration-tester (secondary): Vulnerability detection rule creation
qa (secondary): Quality enforcement rule authoring

Iron Laws

NEVER publish a rule without at least 2 true positive and 2 true negative test cases
ALWAYS validate rule syntax with semgrep --validate before committing
NEVER set confidence to HIGH without testing the rule against a real codebase
ALWAYS include WHAT was found, WHY it matters, and HOW to fix it in every rule message
NEVER use pattern-regex as the primary matcher — use structural patterns and constrain with metavariable-regex

Anti-Patterns

| Anti-Pattern | Why It Fails | Correct Approach |
| --- | --- | --- |
| Publishing untested rules | False positives erode developer trust and rules get ignored | Write test cases with // ruleid: and // ok: annotations and run semgrep --test |
| Setting HIGH confidence without validation | Overconfident rules mislead reviewers into trusting bad signal | Calibrate confidence based on measured false positive rate on real codebases |
| Vague rule messages | Developers cannot remediate without specific guidance | Include WHAT was found, WHY it matters, and HOW to fix it in every message |
| Overly broad patterns with no exclusions | High false positive rate causes rule fatigue | Add pattern-not clauses for all known-safe alternatives |
| Using pattern-regex as primary matcher | Regex is slower and less precise than structural pattern matching | Use structural patterns as primary; constrain with metavariable-regex only |

Memory Protocol (MANDATORY)

Before starting: Read .claude/context/memory/learnings.md

After completing:

New pattern -> .claude/context/memory/learnings.md
Issue found -> .claude/context/memory/issues.md
Decision made -> .claude/context/memory/decisions.md

ASSUME INTERRUPTION: If it's not in memory, it didn't happen.

Cross-Reference: Creator Ecosystem

This skill is part of the Creator Ecosystem. When research uncovers gaps, trigger the appropriate companion creator:

| Gap Discovered | Required Artifact | Creator to Invoke | When |
| --- | --- | --- | --- |
| Domain knowledge needs a reusable skill | skill | Skill({ skill: 'skill-creator' }) | Gap is a full skill domain |
| Existing skill has incomplete coverage | skill update | Skill({ skill: 'skill-updater' }) | Close skill exists but incomplete |
| Capability needs a dedicated agent | agent | Skill({ skill: 'agent-creator' }) | Agent to own the capability |
| Existing agent needs capability update | agent update | Skill({ skill: 'agent-updater' }) | Close agent exists but incomplete |
| Domain needs code/project scaffolding | template | Skill({ skill: 'template-creator' }) | Reusable code patterns needed |
| Behavior needs pre/post execution guards | hook | Skill({ skill: 'hook-creator' }) | Enforcement behavior required |
| Process needs multi-phase orchestration | workflow | Skill({ skill: 'workflow-creator' }) | Multi-step coordination needed |
| Artifact needs structured I/O validation | schema | Skill({ skill: 'schema-creator' }) | JSON schema for artifact I/O |
| User interaction needs a slash command | command | Skill({ skill: 'command-creator' }) | User-facing shortcut needed |
| Repeated logic needs a reusable CLI tool | tool | Skill({ skill: 'tool-creator' }) | CLI utility needed |
| Narrow/single-artifact capability only | inline | Document within this artifact only | Too specific to generalize |

Ecosystem Alignment Contract (MANDATORY)

This creator skill is part of a coordinated creator ecosystem. Any artifact created here must align with and validate against related creators:

agent-creator for ownership and execution paths
skill-creator for capability packaging and assignment
tool-creator for executable automation surfaces
hook-creator for enforcement and guardrails
rule-creator and semgrep-rule-creator for policy and static checks
template-creator for standardized scaffolds
workflow-creator for orchestration and phase gating
command-creator for user/operator command UX

Cross-Creator Handshake (Required)

Before completion, verify all relevant handshakes:

Artifact route exists in .claude/CLAUDE.md and related routing docs.
Discovery/registry entries are updated (catalog/index/registry as applicable).
Companion artifacts are created or explicitly waived with reason.
validate-integration.cjs passes for the created artifact.
Skill index is regenerated when skill metadata changes.

Research Gate (Exa + arXiv — BOTH MANDATORY)

For new patterns, templates, or workflows, research is mandatory:

Use Exa for implementation and ecosystem patterns:
mcp__Exa__web_search_exa({ query: '<topic> 2025 best practices' })
mcp__Exa__get_code_context_exa({ query: '<topic> implementation examples' })
Search arXiv for academic research (mandatory for AI/ML, agents, evaluation, orchestration, memory/RAG, security):
Via Exa: mcp__Exa__web_search_exa({ query: 'site:arxiv.org <topic> 2024 2025' })
Direct API: WebFetch({ url: 'https://arxiv.org/search/?query=<topic>&searchtype=all&start=0' })
Record decisions, constraints, and non-goals in artifact references/docs.
Keep updates minimal and avoid overengineering.

arXiv is mandatory (not fallback) when topic involves: AI agents, LLM evaluation, orchestration, memory/RAG, security, static analysis, or any emerging methodology.

Regression-Safe Delivery

Follow strict RED -> GREEN -> REFACTOR for behavior changes.
Run targeted tests for changed modules.
Run lint/format on changed files.
Keep commits scoped by concern (logic/docs/generated artifacts).
