Semgrep Rule Creator
Security Notice
AUTHORIZED USE ONLY: These skills are for DEFENSIVE security analysis and authorized research:
- Custom security rule development for owned codebases
- Coding standard enforcement via automated checks
- CI/CD security gate rule authoring
- Vulnerability pattern codification for prevention
- Educational purposes in controlled environments
NEVER use for:
- Creating rules to bypass security controls
- Scanning systems without authorization
- Any illegal activities
Step 1: Define the Detection Goal
Before writing a rule, clearly define:
- What to detect: The vulnerable or undesired code pattern
- Why it matters: The security impact or quality concern
- What languages: Which programming languages to target
- True positive example: Code that SHOULD match
- True negative example: Code that should NOT match (safe alternative)
- False positive risks: What similar-looking code is actually safe
Detection Goal Template
Rule: [rule-id]
- Detect: [description of what to find]
- Why: [security impact / quality concern]
- Languages: [javascript, typescript, python, etc.]
- CWE: [CWE-XXX]
- OWASP: [A0X category]
- True Positive: [code example that should match]
- True Negative: [safe code that should NOT match]
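If the template is kept as a structured record instead of free text, a script can flag blank fields before anyone starts writing the rule. The sketch below assumes this. DetectionGoal and missing_fields are hypothetical names whose fields mirror the template above.

```python
from dataclasses import dataclass, fields


@dataclass
class DetectionGoal:
    """Hypothetical record mirroring the Detection Goal Template."""
    rule_id: str
    detect: str
    why: str
    languages: list[str]
    cwe: str
    owasp: str
    true_positive: str
    true_negative: str

    def missing_fields(self) -> list[str]:
        """Return the names of template fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


goal = DetectionGoal(
    rule_id="sql-injection-string-concat",
    detect="SQL query built by concatenating a variable into a string",
    why="Attacker-controlled input can change the query (SQL injection)",
    languages=["javascript", "typescript"],
    cwe="CWE-89",
    owasp="A03:2021",
    true_positive="db.query('SELECT * FROM users WHERE id = ' + userId)",
    true_negative="",  # not written yet
)
print(goal.missing_fields())  # ['true_negative']
```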
Step 2: Write the Semgrep Rule
Basic Rule Structure
rules:
  - id: rule-id-here
    message: >
      Clear description of what was found and why it matters.
      Include remediation guidance in the message.
    severity: ERROR # ERROR, WARNING, INFO
    languages: [javascript, typescript]
    metadata:
      cwe:
        - CWE-89
      owasp:
        - A03:2021
      confidence: HIGH # HIGH, MEDIUM, LOW
      impact: HIGH # HIGH, MEDIUM, LOW
      category: security
      subcategory:
        - vuln
      technology:
        - express
        - node.js
      references:
        - https://owasp.org/Top10/A03_2021-Injection/
      source-rule-url: https://semgrep.dev/r/rule-id
    # Pattern goes here (see below)
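Once a YAML loader has parsed the file, each rule is a plain mapping. That makes its structure easy to check before Semgrep ever runs. Semgrep's rule schema expects a non-taint rule to use a single top-level matcher, while a taint rule needs sources and sinks instead. The checker below sketches those checks. structure_problems is a hypothetical helper, and the key set covers only the matchers used in this guide, not every form Semgrep accepts.

```python
# Top-level matcher keys used in this guide. Assumption: Semgrep accepts more
# forms than these, so treat this as a sketch rather than a full validator.
MATCHER_KEYS = {"pattern", "patterns", "pattern-either", "pattern-regex"}
TAINT_KEYS = {"pattern-sources", "pattern-sinks"}
REQUIRED_KEYS = ("id", "message", "severity", "languages")


def structure_problems(rule: dict) -> list[str]:
    """Return structural problems for one rule mapping (as loaded from YAML)."""
    problems = [f"missing required key: {k}" for k in REQUIRED_KEYS if k not in rule]
    if rule.get("mode") == "taint":
        # Taint rules need sources and sinks instead of a single matcher.
        missing = sorted(TAINT_KEYS - set(rule))
        problems += [f"taint rule missing: {k}" for k in missing]
    else:
        used = sorted(MATCHER_KEYS & set(rule))
        if len(used) != 1:
            problems.append(f"expected exactly one top-level matcher, found {used}")
    return problems


# A rule that mixes two top-level matchers, written as the dict a YAML
# loader would produce for it.
rule = {
    "id": "rule-id-here",
    "message": "Clear description of what was found and how to fix it.",
    "severity": "ERROR",
    "languages": ["javascript", "typescript"],
    "pattern-either": [{"pattern": "eval($X)"}],
    "pattern-regex": r"eval\(",
}
print(structure_problems(rule))  # reports the two competing matchers
```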
Pattern Types
Simple Pattern Match
pattern: |
  eval($X)
Pattern with Alternatives (OR)
pattern-either:
  - pattern: eval($X)
  - pattern: new Function($X)
  - pattern: setTimeout($X, ...)
  - pattern: setInterval($X, ...)
Pattern with Exclusions (AND NOT)
patterns:
  # "..." accepts any further arguments, so two-argument calls reach the exclusions
  - pattern: $DB.query($QUERY, ...)
  - pattern-not: $DB.query($QUERY, $PARAMS)
  - pattern-not: $DB.query($QUERY, [...])
Pattern Inside Context
patterns:
  - pattern: $RES.send($DATA)
  # "..." lets the context match routes with or without middleware
  - pattern-inside: |
      app.$METHOD($PATH, ..., function($REQ, $RES) { ... })
  - pattern-not-inside: |
      app.$METHOD($PATH, authenticate, function($REQ, $RES) { ... })
Metavariable Constraints
patterns:
  - pattern: crypto.createHash($ALGO)
  - metavariable-regex:
      metavariable: $ALGO
      regex: (md5|sha1|MD5|SHA1)
  - focus-metavariable: $ALGO

patterns:
  - pattern: setTimeout($FUNC, $TIME)
  - metavariable-comparison:
      metavariable: $TIME
      comparison: $TIME > 60000
Taint Mode Rules (Advanced)
For tracking data flow from sources to sinks:
mode: taint
pattern-sources:
  - patterns:
      - pattern: $REQ.query.$PARAM
  - patterns:
      - pattern: $REQ.body.$PARAM
  - patterns:
      - pattern: $REQ.params.$PARAM
pattern-sinks:
  - patterns:
      - pattern: $DB.query($SINK, ...)
      - focus-metavariable: $SINK
pattern-sanitizers:
  - patterns:
      - pattern: escape($X)
  - patterns:
      - pattern: sanitize($X)
# $DB.query($QUERY, [...]) is deliberately not a sanitizer: a parameter array
# protects only the bound values, so a tainted query string is still injectable.
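Semgrep's taint engine works on the program's syntax tree and handles much more than this (fields, function calls, control flow). The toy model below shows only the core idea the rule encodes. Taint starts at a source, spreads through assignments, and is cleared by a sanitizer, and a finding is reported when tainted data reaches a sink. Step and tainted_sinks are hypothetical names, and each step stands in for one line of the JavaScript shown in its comment.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One hypothetical program step: `target = <kind>(args)`."""
    target: str                 # variable assigned ("" for a bare call)
    kind: str                   # "source", "sanitizer", "sink", or "copy"
    args: tuple[str, ...] = ()  # variables the step reads


def tainted_sinks(steps: list[Step]) -> list[int]:
    """Return indexes of sink steps that receive a tainted variable."""
    tainted: set[str] = set()
    findings: list[int] = []
    for i, step in enumerate(steps):
        reads_taint = any(arg in tainted for arg in step.args)
        if step.kind == "source":
            tainted.add(step.target)          # e.g. id = req.query.id
        elif step.kind == "sanitizer":
            tainted.discard(step.target)      # escape(x) returns clean data
        elif step.kind == "copy":
            # Assignments and concatenations carry taint from their inputs.
            if reads_taint:
                tainted.add(step.target)
            else:
                tainted.discard(step.target)
        elif step.kind == "sink" and reads_taint:
            findings.append(i)                # tainted data reached db.query
    return findings


program = [
    Step("id", "source"),                # id = req.query.id
    Step("sql", "copy", ("id",)),        # sql = "SELECT ... " + id
    Step("", "sink", ("sql",)),          # db.query(sql)        -> finding
    Step("safe", "sanitizer", ("id",)),  # safe = escape(id)
    Step("sql2", "copy", ("safe",)),     # sql2 = "SELECT ... " + safe
    Step("", "sink", ("sql2",)),         # db.query(sql2)       -> no finding
]
print(tainted_sinks(program))  # [2]
```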
Step 3: Common Rule Templates
SQL Injection Detection
rules:
  - id: sql-injection-string-concat
    message: >
      Possible SQL injection via string concatenation. User input appears
      to be concatenated into a SQL query string. Use parameterized
      queries instead.
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-89]
      owasp: ["A03:2021"]
      confidence: HIGH
      impact: HIGH
      category: security
    patterns:
      - pattern-either:
          - pattern: $DB.query("..." + $VAR + "...")
          - pattern: $DB.query("..." + $VAR)
          - pattern: $DB.query(`...${$VAR}...`)
      - pattern-not: $DB.query("..." + $VAR + "...", [...])
    # No fix: field. Turning a concatenated query into a parameterized one
    # requires choosing placeholders, which a text template cannot do safely.
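The remediation in the message ("use parameterized queries") can be demonstrated end to end. The sketch below swaps in Python's built-in sqlite3 module for the JavaScript db.query driver that the rule targets. Placeholder syntax depends on the driver: sqlite3 uses ?, while the $1 form in the test cases belongs to drivers such as node-postgres.

```python
import sqlite3

# In-memory database standing in for the application's real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("1", "alice"), ("2", "bob")])

user_id = "1' OR '1'='1"  # attacker-controlled input

# Vulnerable: the input becomes part of the SQL text (what the rule flags).
concat_rows = conn.execute(
    "SELECT name FROM users WHERE id = '" + user_id + "'"
).fetchall()

# Safe: the input is bound to the ? placeholder and compared as a plain value.
param_rows = conn.execute(
    "SELECT name FROM users WHERE id = ?", (user_id,)
).fetchall()

print(concat_rows)  # every row: the injected OR '1'='1' is always true
print(param_rows)   # [] because no id equals the literal input string
```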
XSS Detection
rules:
  - id: xss-innerhtml-assignment
    message: >
      Direct assignment to innerHTML with potentially untrusted data.
      Use textContent for text or a sanitization library for HTML.
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-79]
      owasp: ["A03:2021"]
      confidence: MEDIUM
      impact: HIGH
      category: security
    patterns:
      - pattern-either:
          - pattern: $EL.innerHTML = $DATA
          - pattern: document.getElementById($ID).innerHTML = $DATA
      # A constant string cannot carry attacker-controlled data
      - pattern-not: $EL.innerHTML = "..."
Hardcoded Secrets
rules:
  - id: hardcoded-api-key
    message: >
      Hardcoded API key detected. Store secrets in environment
      variables or a secrets manager.
    severity: ERROR
    languages: [javascript, typescript, python]
    metadata:
      cwe: [CWE-798]
      owasp: ["A02:2021"]
      confidence: MEDIUM
      impact: HIGH
      category: security
    # Structural assignment patterns are the primary matcher; "=~/.../"
    # constrains the string literal with a regex instead of using a
    # top-level pattern-regex (a rule takes only one top-level matcher).
    pattern-either:
      - pattern: $KEY = "=~/^AKIA[0-9A-Z]{16}$/"
      - pattern: $KEY = "=~/^sk-[a-zA-Z0-9]{48}$/"
      - pattern: $KEY = "=~/^ghp_[a-zA-Z0-9]{36}$/"
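Before a rule relies on them, the key-shape regexes can be checked against known positives and near misses. The sketch below does this with Python's re module, calling fullmatch so that the whole literal has to have the key shape. Apart from AWS's documented example key ID, the sample values are synthetic.

```python
import re

# The three key shapes from this rule, combined into one alternation.
SECRET_RE = re.compile(r"AKIA[0-9A-Z]{16}|sk-[a-zA-Z0-9]{48}|ghp_[a-zA-Z0-9]{36}")

# value -> should it be reported?
samples = {
    "AKIAIOSFODNN7EXAMPLE": True,   # AWS's documented example access key ID
    "sk-" + "a" * 48: True,         # synthetic value with the right shape
    "ghp_" + "0" * 36: True,        # synthetic value with the right shape
    "AKIA-not-a-key": False,
    "sk-short": False,
    "ghp_" + "0" * 35: False,       # one character too short
}

# fullmatch requires the entire literal to match one of the shapes.
mismatches = [
    value for value, expected in samples.items()
    if (SECRET_RE.fullmatch(value) is not None) != expected
]
print(mismatches)  # []
```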
Missing Authentication
rules:
  - id: express-route-missing-auth
    message: >
      Express route handler without authentication middleware.
      Add authentication middleware before the handler.
    severity: WARNING
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-306]
      owasp: ["A07:2021"]
      confidence: MEDIUM
      impact: HIGH
      category: security
    patterns:
      - pattern-either:
          - pattern: app.post($PATH, function($REQ, $RES) { ... })
          - pattern: app.put($PATH, function($REQ, $RES) { ... })
          - pattern: app.delete($PATH, function($REQ, $RES) { ... })
          - pattern: router.post($PATH, function($REQ, $RES) { ... })
          - pattern: router.put($PATH, function($REQ, $RES) { ... })
          - pattern: router.delete($PATH, function($REQ, $RES) { ... })
      - pattern-not-inside: |
          app.$METHOD($PATH, $AUTH, function($REQ, $RES) { ... })
      - pattern-not-inside: |
          router.$METHOD($PATH, $AUTH, function($REQ, $RES) { ... })
Insecure Randomness
rules:
  - id: insecure-random-for-security
    message: >
      Math.random() is not cryptographically secure. Use
      crypto.getRandomValues() or crypto.randomBytes() for
      security-sensitive random values.
    severity: WARNING
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-330]
      confidence: MEDIUM
      impact: MEDIUM
      category: security
    patterns:
      - pattern: Math.random()
      - pattern-inside: |
          function $FUNC(...) { ... }
      - metavariable-regex:
          metavariable: $FUNC
          regex: (generateToken|createSecret|randomPassword|generateKey|createSession|generateId|createNonce)
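Semgrep's documentation describes metavariable-regex matching as left-anchored, with an implicit start anchor but no implicit end anchor. Assuming that behavior, Python's re.match is a close stand-in for previewing which function names this regex accepts. The sample names below are hypothetical.

```python
import re

# The $FUNC regex from the rule above.
FUNC_RE = re.compile(
    r"(generateToken|createSecret|randomPassword|generateKey|"
    r"createSession|generateId|createNonce)"
)

names = ["generateToken", "generateTokenForUser", "userGenerateToken", "shuffleCards"]

# re.match anchors only at the start of the string, so names that merely
# begin with a listed word are accepted too.
flagged = [name for name in names if FUNC_RE.match(name)]
print(flagged)  # ['generateToken', 'generateTokenForUser']
```

The preview shows that prefixed names such as generateTokenForUser are caught, while userGenerateToken is not. To catch names where the listed word appears mid-name, prepend .* to the regex.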
Step 4: Write Rule Tests
Test File Format
Create a test file with the same base name as the rule file (for example, tests/sql-injection.js for rules/sql-injection.yml). Put each // ruleid: or // ok: annotation on the line directly above the code it describes:

// ruleid: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = ' + userId);

// ruleid: sql-injection-string-concat
db.query(`SELECT * FROM users WHERE id = ${userId}`);

// ok: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = $1', [userId]);

// ok: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = ?', [userId]);
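The minimum of two true positives and two true negatives can be enforced before running semgrep --test by counting annotations in the test file. The sketch below does this. count_annotations is a hypothetical helper, and it ignores comma-separated rule ID lists and the todoruleid/todook variants.

```python
import re

# "// ruleid: <id>" or "// ok: <id>"; comma-separated ID lists and the
# todoruleid/todook variants are deliberately not handled in this sketch.
ANNOTATION_RE = re.compile(r"//\s*(ruleid|ok):\s*([\w.-]+)")


def count_annotations(test_source: str, rule_id: str) -> dict[str, int]:
    """Count ruleid/ok annotations for one rule in a test file's text."""
    counts = {"ruleid": 0, "ok": 0}
    for kind, annotated_id in ANNOTATION_RE.findall(test_source):
        if annotated_id == rule_id:
            counts[kind] += 1
    return counts


test_file = """\
// ruleid: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = ' + userId);

// ok: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = ?', [userId]);
"""

counts = count_annotations(test_file, "sql-injection-string-concat")
meets_minimum = counts["ruleid"] >= 2 and counts["ok"] >= 2
print(counts, meets_minimum)  # {'ruleid': 1, 'ok': 1} False
```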
Running Tests
# Test a single rule
semgrep --test --config=rules/sql-injection.yml tests/

# Test all rules
semgrep --test --config=rules/ tests/

# Validate rule syntax
semgrep --validate --config=rules/
Step 5: Rule Optimization
Performance Best Practices
- Be specific with patterns: Avoid overly broad matches like $X($Y)
- Use pattern-inside to scope: Narrow the search context
- Use language-specific syntax: Write patterns as code in the target language rather than as generic regex
- Avoid deep ellipsis nesting: Patterns that chain many ellipses (... ... ...) are slow to match
- Use focus-metavariable: Narrow the reported location
- Test with large codebases: Verify performance at scale
Reducing False Positives
- Add pattern-not for safe patterns: Exclude known-safe alternatives
- Use metavariable-regex: Constrain metavariable values
- Use pattern-not-inside: Exclude safe contexts
- Set appropriate confidence: Be honest about detection certainty
- Add technology metadata: Help users filter relevant rules
- Provide fix suggestions: When possible, include a fix: field
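Confidence becomes measurable once a sample of findings from a real codebase has been triaged. The false positive rate can then drive the label. The sketch below shows one way to do this. The 10% and 40% cut-offs are hypothetical values chosen for illustration, not Semgrep guidance, and the function refuses to run with no triaged findings, in line with the rule against assigning HIGH confidence without real-codebase testing.

```python
def false_positive_rate(true_positives: int, false_positives: int) -> float:
    """Share of triaged findings that turned out to be false positives."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no triaged findings; run the rule on a real codebase first")
    return false_positives / total


def suggested_confidence(fp_rate: float) -> str:
    """Map a measured FP rate to a confidence label (illustrative cut-offs)."""
    if fp_rate <= 0.10:
        return "HIGH"
    if fp_rate <= 0.40:
        return "MEDIUM"
    return "LOW"


rate = false_positive_rate(true_positives=18, false_positives=2)
print(rate, suggested_confidence(rate))  # 0.1 HIGH
```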
Rule Validation Checklist
- Rule has unique, descriptive ID
- Message explains the issue AND remediation
- Severity matches actual risk
- Metadata includes CWE, OWASP, confidence, impact
- At least 2 true positive test cases
- At least 2 true negative test cases
- Rule validated with semgrep --validate
- Rule tested with semgrep --test
- Performance acceptable on large codebase
- Fix suggestion provided (if applicable)
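Several checklist items can be verified mechanically from the loaded rule mappings before human review. The sketch below assumes rules have already been parsed from YAML into dicts. checklist_problems and duplicate_ids are hypothetical helpers, and the eight-word message threshold is an arbitrary proxy for "explains the issue AND remediation".

```python
from collections import Counter

VALID_SEVERITIES = {"ERROR", "WARNING", "INFO"}
REQUIRED_METADATA = ("cwe", "owasp", "confidence", "impact")


def checklist_problems(rule: dict) -> list[str]:
    """Flag checklist items that can be verified from the rule mapping alone."""
    problems = []
    if rule.get("severity") not in VALID_SEVERITIES:
        problems.append(f"unexpected severity: {rule.get('severity')!r}")
    metadata = rule.get("metadata") or {}
    for key in REQUIRED_METADATA:
        if not metadata.get(key):
            problems.append(f"metadata missing {key}")
    # Crude proxy: a very short message rarely covers both the issue and the fix.
    if len(rule.get("message", "").split()) < 8:
        problems.append("message may not explain both the issue and the remediation")
    return problems


def duplicate_ids(rules: list[dict]) -> list[str]:
    """Return rule IDs that appear more than once across a rule set."""
    counts = Counter(r.get("id") for r in rules)
    return sorted(rule_id for rule_id, n in counts.items() if rule_id and n > 1)


rule = {
    "id": "insecure-random-for-security",
    "message": (
        "Math.random() is not cryptographically secure. Use crypto.getRandomValues() "
        "or crypto.randomBytes() for security-sensitive random values."
    ),
    "severity": "WARNING",
    "metadata": {"cwe": ["CWE-330"], "confidence": "MEDIUM", "impact": "MEDIUM"},
}
print(checklist_problems(rule))  # ['metadata missing owasp']
```

When run against the Insecure Randomness template, the checker reports the missing owasp entry that the checklist asks for.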
Semgrep Pattern Syntax Reference
| Syntax | Meaning | Example |
|---|---|---|
| $X | Single metavariable | eval($X) |
| $...X | Metavariable matching multiple arguments | func($...ARGS) |
| ... | Ellipsis (any sequence of arguments, statements, etc.) | if (...) { ... } |
| <... $X ...> | Deep expression match | <... eval($X) ...> |
| pattern-either | OR operator | Match any of N patterns |
| pattern-not | NOT operator | Exclude specific patterns |
| pattern-inside | Context requirement | Must be inside this pattern |
| pattern-not-inside | Context exclusion | Must NOT be inside this pattern |
| metavariable-regex | Regex constraint | Constrain $X to match a regex |
| metavariable-comparison | Numeric constraint | $X > 100 |
| focus-metavariable | Narrow match location | Report only the location of $X |
Related Skills
- static-analysis: CodeQL and Semgrep with SARIF output
- variant-analysis: Pattern-based vulnerability discovery
- differential-review: Security-focused diff analysis
- insecure-defaults: Hardcoded credentials detection
- security-architect: STRIDE threat modeling
Agent Integration
- security-architect (primary): Custom rule development for security audits
- code-reviewer (primary): Automated code review rule authoring
- penetration-tester (secondary): Vulnerability detection rule creation
- qa (secondary): Quality enforcement rule authoring
Iron Laws
- NEVER publish a rule without at least 2 true positive and 2 true negative test cases
- ALWAYS validate rule syntax with semgrep --validate before committing
- NEVER set confidence to HIGH without testing the rule against a real codebase
- ALWAYS include WHAT was found, WHY it matters, and HOW to fix it in every rule message
- NEVER use pattern-regex as the primary matcher; use structural patterns and constrain them with metavariable-regex
Anti-Patterns
| Anti-Pattern | Why It Fails | Correct Approach |
|---|---|---|
| Publishing untested rules | False positives erode developer trust and rules get ignored | Write test cases with // ruleid: and // ok: annotations and run semgrep --test |
| Setting HIGH confidence without validation | Overconfident rules mislead reviewers into trusting bad signal | Calibrate confidence based on measured false positive rate on real codebases |
| Vague rule messages | Developers cannot remediate without specific guidance | Include WHAT was found, WHY it matters, and HOW to fix it in every message |
| Overly broad patterns with no exclusions | High false positive rate causes rule fatigue | Add pattern-not clauses for all known-safe alternatives |
| Using pattern-regex as primary matcher | Regex is slower and less precise than structural pattern matching | Use structural patterns as primary; constrain with metavariable-regex only |
Memory Protocol (MANDATORY)
Before starting: Read .claude/context/memory/learnings.md
After completing:
- New pattern -> .claude/context/memory/learnings.md
- Issue found -> .claude/context/memory/issues.md
- Decision made -> .claude/context/memory/decisions.md
ASSUME INTERRUPTION: If it's not in memory, it didn't happen.
Cross-Reference: Creator Ecosystem
This skill is part of the Creator Ecosystem. When research uncovers gaps, trigger the appropriate companion creator:
| Gap Discovered | Required Artifact | Creator to Invoke | When |
|---|---|---|---|
| Domain knowledge needs a reusable skill | skill | Skill({ skill: 'skill-creator' }) | Gap is a full skill domain |
| Existing skill has incomplete coverage | skill update | Skill({ skill: 'skill-updater' }) | Close skill exists but incomplete |
| Capability needs a dedicated agent | agent | Skill({ skill: 'agent-creator' }) | Agent to own the capability |
| Existing agent needs capability update | agent update | Skill({ skill: 'agent-updater' }) | Close agent exists but incomplete |
| Domain needs code/project scaffolding | template | Skill({ skill: 'template-creator' }) | Reusable code patterns needed |
| Behavior needs pre/post execution guards | hook | Skill({ skill: 'hook-creator' }) | Enforcement behavior required |
| Process needs multi-phase orchestration | workflow | Skill({ skill: 'workflow-creator' }) | Multi-step coordination needed |
| Artifact needs structured I/O validation | schema | Skill({ skill: 'schema-creator' }) | JSON schema for artifact I/O |
| User interaction needs a slash command | command | Skill({ skill: 'command-creator' }) | User-facing shortcut needed |
| Repeated logic needs a reusable CLI tool | tool | Skill({ skill: 'tool-creator' }) | CLI utility needed |
| Narrow/single-artifact capability only | inline | Document within this artifact only | Too specific to generalize |
Ecosystem Alignment Contract (MANDATORY)
This creator skill is part of a coordinated creator ecosystem. Any artifact created here must align with and validate against related creators:
- agent-creator for ownership and execution paths
- skill-creator for capability packaging and assignment
- tool-creator for executable automation surfaces
- hook-creator for enforcement and guardrails
- rule-creator and semgrep-rule-creator for policy and static checks
- template-creator for standardized scaffolds
- workflow-creator for orchestration and phase gating
- command-creator for user/operator command UX
Cross-Creator Handshake (Required)
Before completion, verify all relevant handshakes:
- Artifact route exists in .claude/CLAUDE.md and related routing docs.
- Discovery/registry entries are updated (catalog/index/registry as applicable).
- Companion artifacts are created or explicitly waived with reason.
- validate-integration.cjs passes for the created artifact.
- Skill index is regenerated when skill metadata changes.
Research Gate (Exa + arXiv — BOTH MANDATORY)
For new patterns, templates, or workflows, research is mandatory:
- Use Exa for implementation and ecosystem patterns:
  - mcp__Exa__web_search_exa({ query: '<topic> 2025 best practices' })
  - mcp__Exa__get_code_context_exa({ query: '<topic> implementation examples' })
- Search arXiv for academic research (mandatory for AI/ML, agents, evaluation, orchestration, memory/RAG, security):
  - Via Exa: mcp__Exa__web_search_exa({ query: 'site:arxiv.org <topic> 2024 2025' })
  - Direct API: WebFetch({ url: 'https://arxiv.org/search/?query=<topic>&searchtype=all&start=0' })
- Record decisions, constraints, and non-goals in artifact references/docs.
- Keep updates minimal and avoid overengineering.
arXiv is mandatory (not fallback) when topic involves: AI agents, LLM evaluation, orchestration, memory/RAG, security, static analysis, or any emerging methodology.
Regression-Safe Delivery
- Follow strict RED -> GREEN -> REFACTOR for behavior changes.
- Run targeted tests for changed modules.
- Run lint/format on changed files.
- Keep commits scoped by concern (logic/docs/generated artifacts).