YARA Authoring Skill

Overview

This skill implements Trail of Bits' YARA authoring methodology for the agent-studio framework. YARA-X is the Rust-based successor to legacy YARA, offering improved performance, safety, and new features. This skill teaches you to think and act like an expert YARA author, producing detection rules that are precise, efficient, and maintainable.

Source repository: https://github.com/trailofbits/skills

License: CC-BY-SA-4.0

Target: YARA-X (with legacy YARA compatibility guidance)

When to Use

When creating detection rules for malware samples
When building threat hunting rules for IOC identification
When converting legacy YARA rules to YARA-X format
When optimizing existing rules for performance and accuracy
When reviewing YARA rules for quality and false positive rates
When building rule sets for automated scanning pipelines

Iron Laws

EVERY RULE MUST HAVE EFFICIENT ATOMS AND PASS LINTING — a rule without efficient atoms degrades scanner performance across the entire rule set; always run yr check and yr debug atoms before deployment.
NEVER write rules without testing against both positive and negative samples — false positives on clean files are as harmful as missed detections; validate FP rate before deploying.
ALWAYS include complete metadata (author, date, description, reference, hash) — rules without metadata are unauditable and unmaintainable in enterprise rule sets.
NEVER use single-byte atoms or patterns starting with common bytes (0x00, 0xFF, 0x90) — these generate massive false positive rates and degrade the entire YARA scanning pipeline.
ALWAYS use the YARA-X toolchain (yr) by default — legacy yara/yarac tooling lacks memory safety, performance optimizations, and modern module support; use YARA-X unless backward compatibility is explicitly required.

YARA-X vs Legacy YARA

Key Differences

| Feature | Legacy YARA | YARA-X |
|---|---|---|
| Language | C | Rust |
| Safety | Manual memory management | Memory-safe |
| Performance | Good | Better (parallelism) |
| Modules | PE, ELF, math, etc. | Same + new modules |
| Syntax | YARA syntax | Compatible + extensions |
| Toolchain | yara, yarac | yr CLI |

YARA-X CLI Commands

# Scan a file
yr scan rule.yar target_file

# Check rule syntax
yr check rule.yar

# View rule atoms (for efficiency analysis)
yr debug atoms rule.yar

# Format a rule
yr fmt rule.yar
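The check and atom commands above can be chained into a small pre-deployment gate. The following is a minimal sketch, assuming the `yr` binary is on PATH and that the subcommands behave as listed in this document (they are not verified here, so confirm them with `yr help` for your installed version). The function names `build_lint_commands` and `lint_rule` are hypothetical.

```python
import shutil
import subprocess


def build_lint_commands(rule_path: str) -> list[list[str]]:
    """Return the yr invocations listed in this document, in order."""
    return [
        ["yr", "check", rule_path],           # syntax check
        ["yr", "debug", "atoms", rule_path],  # atom listing, as named in this document
    ]


def lint_rule(rule_path: str) -> bool:
    """Run each command; return True only if every one exits with status 0."""
    if shutil.which("yr") is None:
        raise FileNotFoundError("yr not found on PATH")
    for command in build_lint_commands(rule_path):
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            # Surface the tool's own error text instead of guessing at its format.
            print(result.stderr.strip())
            return False
    return True
```

Calling `lint_rule("rule.yar")` in a CI step would then block deployment whenever either command fails.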

Rule Structure

Standard Template

import "pe"
import "math"

rule MalwareFamily_Variant : tag1 tag2 {
    meta:
        author = "analyst-name"
        date = "2026-02-21"
        description = "Detects MalwareFamily variant based on [specific indicators]"
        reference = "https://example.com/analysis-report"
        hash = "sha256-of-sample"
        tlp = "WHITE"
        score = 75

    strings:
        // Unique byte sequences from the malware
        $hex_pattern1 = { 48 8B 05 ?? ?? ?? ?? 48 89 45 F0 }
        $hex_pattern2 = { E8 ?? ?? ?? ?? 85 C0 74 ?? }

        // String indicators
        $str_mutex     = "Global\\MalwareMutex_v2" ascii wide
        $str_c2        = "https://evil.example.com/gate.php" ascii
        $str_useragent = "Mozilla/5.0 (compatible; MalBot/1.0)" ascii

        // Encoded/obfuscated patterns
        $b64_config    = "aHR0cHM6Ly9ldmlsLmV4YW1wbGUuY29t" ascii  // base64

    condition:
        uint16(0) == 0x5A4D and  // MZ header (PE file)
        filesize < 5MB and
        (
            2 of ($hex_*) or
            ($str_mutex and 1 of ($str_c2, $str_useragent)) or
            $b64_config
        )
}
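A literal such as `$b64_config` only matches when the plaintext happened to start on a 3-byte boundary in the encoder's input. The Python sketch below illustrates why: the same plaintext yields three different base64 substrings depending on its byte offset. The helper name `base64_variants` is hypothetical. YARA's `base64` string modifier is designed to cover these alignments, so this sketch is for understanding rather than a replacement for it.

```python
import base64


def base64_variants(plain: bytes) -> list[str]:
    """Return the stable base64 substring for each of the 3 byte alignments."""
    variants = []
    for shift in range(3):
        # Prepend `shift` dummy bytes to emulate unknown data before `plain`.
        encoded = base64.b64encode(b"\x00" * shift + plain).decode("ascii")
        # Leading characters that mix dummy bits with plaintext bits.
        lead = (0, 2, 3)[shift]
        # Trailing '=' padding plus the one character that depends on following data.
        trail = (0, 3, 2)[(shift + len(plain)) % 3]
        variants.append(encoded[lead:len(encoded) - trail])
    return variants


if __name__ == "__main__":
    for variant in base64_variants(b"https://evil.example.com"):
        print(variant)
```

The first variant printed is the string used for `$b64_config` in the template; the other two are what the same URL looks like when one or two bytes precede it in the encoded blob.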

Metadata Fields (Required)

| Field | Purpose | Example |
|---|---|---|
| author | Who wrote the rule | "Trail of Bits" |
| date | When rule was created | "2026-02-21" |
| description | What the rule detects | "Detects XYZ malware loader" |
| reference | Source analysis/report | "https://..." |
| hash | Sample hash for validation | "sha256:abc123..." |
| tlp | Traffic Light Protocol | "WHITE", "GREEN", "AMBER", "RED" |
| score | Confidence (0-100) | 75 |
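The required fields in the table above can be checked mechanically before review. The sketch below is a rough text-based check, not a YARA parser: it assumes one `key = value` pair per line inside the `meta:` section, and it would be confused by a metadata value that itself contains `strings:` or `condition:`. The name `missing_meta_fields` is hypothetical.

```python
import re

REQUIRED_META = ("author", "date", "description", "reference", "hash", "tlp", "score")


def missing_meta_fields(rule_text: str) -> list[str]:
    """Return required metadata keys that never appear as `key = value`."""
    # Isolate the meta section: everything between 'meta:' and the next section.
    match = re.search(
        r"\bmeta\s*:(.*?)(?:\bstrings\s*:|\bcondition\s*:)", rule_text, re.S
    )
    meta_block = match.group(1) if match else ""
    # Collect identifiers that start a line and are followed by '='.
    present = set(re.findall(r"^\s*(\w+)\s*=", meta_block, re.M))
    return [field for field in REQUIRED_META if field not in present]
```

An empty return value means all seven fields are present; anything else lists what the author still needs to fill in.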

String Pattern Best Practices

Hex Patterns

// GOOD: Specific bytes with targeted wildcards
$good = { 48 8B 05 ?? ?? ?? ?? 48 89 45 F0 }

// BAD: Too many wildcards (poor atoms)
$bad = { ?? ?? ?? ?? 48 ?? ?? ?? ?? ?? }

// GOOD: Use jumps for variable-length gaps
$jump = { 48 8B 05 [4-8] 48 89 45 }

// GOOD: Use alternations for variant bytes
$alt = { 48 (8B | 89) 05 ?? ?? ?? ?? }

Text Strings

// Case-insensitive matching
$str1 = "CreateRemoteThread" ascii nocase

// Wide strings (UTF-16)
$str2 = "cmd.exe" ascii wide

// Full-word matching (avoid substring false positives)
$str3 = "evil" ascii fullword
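To make the `ascii wide` combination concrete, this short Python sketch prints the two byte sequences that `$str2` would look for. It assumes plain ASCII input, for which the `wide` form (each character followed by a zero byte) coincides with UTF-16LE encoding. The helper name `wide` is chosen here for illustration.

```python
def wide(text: str) -> bytes:
    """Encode ASCII text as interleaved-zero bytes (UTF-16LE for ASCII input)."""
    return text.encode("utf-16-le")


pattern = "cmd.exe"
print(pattern.encode("ascii").hex(" "))  # 63 6d 64 2e 65 78 65
print(wide(pattern).hex(" "))            # 63 00 6d 00 64 00 2e 00 65 00 78 00 65 00
```

Specifying both modifiers lets one string definition cover binaries that store the command line in either form.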

Regular Expressions

// Use sparingly - regex is slower than literal strings
$re1 = /https?:\/\/[a-z0-9.\-]+\.(xyz|top|club)\//

// Prefer hex patterns over regex for binary content
// WRONG: $re2 = /\x48\x8B\x05/
// RIGHT: $hex2 = { 48 8B 05 }

Atom Analysis

Atoms are the fixed byte sequences YARA uses to pre-filter which rules to evaluate. Efficient atoms = fast scanning.

How to Check Atoms

# View atoms for a rule
yr debug atoms rule.yar

# Good output: distinctive atoms, ideally 4+ bytes
# Atom: 48 8B 05 (from $hex_pattern1)
# Atom: CreateRemoteThread (from $str1)

# Bad output: short or common atoms
# Atom: 00 00 (too common, will match everything)

Atom Quality Guidelines

| Atom Length | Quality | Action |
|---|---|---|
| 1-2 bytes | Poor | Rewrite pattern with more specific bytes |
| 3 bytes | Acceptable | Consider extending if possible |
| 4+ bytes | Good | Ideal for efficient scanning |
| Common bytes (00, FF, 90) | Poor | Avoid patterns starting with common bytes |
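The guidelines above can be approximated before running any tool by measuring the longest run of fully specified bytes in a hex pattern. The sketch below is a heuristic only and does not reproduce the scanner's real atom selection: it treats wildcards, nibble wildcards, jumps, and everything inside an alternation as breaks, which is more conservative than a real engine that may expand alternations. The names `longest_fixed_run` and `classify` are hypothetical.

```python
import re

COMMON_BYTES = {0x00, 0xFF, 0x90}

# A concrete byte, a full wildcard, a jump like [4-8], or any other single symbol.
TOKEN = re.compile(r"(?P<byte>[0-9A-Fa-f]{2})|\?\?|\[[^\]]*\]|\S")


def longest_fixed_run(hex_pattern: str) -> bytes:
    """Return the longest run of concrete bytes outside any alternation."""
    best = b""
    current = bytearray()
    depth = 0  # nesting level of ( ... ) alternations
    for match in TOKEN.finditer(hex_pattern):
        token = match.group(0)
        if token == "(":
            depth += 1
        elif token == ")":
            depth -= 1
        if match.group("byte") and depth == 0:
            current.append(int(token, 16))
        else:
            current.clear()  # wildcard, jump, brace, or alternation breaks the run
        if len(current) > len(best):
            best = bytes(current)
    return best


def classify(hex_pattern: str) -> str:
    """Map the longest fixed run onto the quality table above."""
    run = longest_fixed_run(hex_pattern)
    if len(run) <= 2 or run[0] in COMMON_BYTES:
        return "poor"
    return "acceptable" if len(run) == 3 else "good"
```

For example, `classify("{ 48 8B 05 ?? ?? ?? ?? 48 89 45 F0 }")` rates the template's first hex pattern as good because of its trailing four fixed bytes, while the all-wildcard `$bad` pattern is rated poor.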

Condition Logic

Performance-Ordered Conditions

Place cheap checks first to enable short-circuit evaluation:

condition:
    // 1. File type check (instant)
    uint16(0) == 0x5A4D and

    // 2. File size check (instant)
    filesize < 10MB and

    // 3. Simple string matches (fast)
    $str_mutex and

    // 4. Complex conditions (slower)
    2 of ($hex_*) and

    // 5. Module calls (slowest)
    pe.imports("kernel32.dll", "VirtualAllocEx")
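The effect of ordering can be illustrated with a toy model of short-circuit evaluation. In this Python sketch the cost numbers are invented for illustration, the byte-substring checks are stand-ins for YARA string and module lookups, and only the condition evaluation step is modeled (not the string search that a scanner performs beforehand). The names `evaluate` and `CHECKS` are hypothetical.

```python
from typing import Callable

Check = tuple[str, int, Callable[[bytes], bool]]


def evaluate(data: bytes, checks: list[Check]) -> tuple[bool, int]:
    """AND the checks in order, stopping at the first failure.

    Returns (result, total illustrative cost spent).
    """
    spent = 0
    for _name, cost, predicate in checks:
        spent += cost
        if not predicate(data):
            return False, spent
    return True, spent


CHECKS: list[Check] = [
    ("magic", 1, lambda d: d[:2] == b"MZ"),             # uint16(0) == 0x5A4D
    ("size", 1, lambda d: len(d) < 10 * 1024 * 1024),   # filesize < 10MB
    ("mutex", 5, lambda d: b"Global\\MalwareMutex_v2" in d),
    ("module", 50, lambda d: b"VirtualAllocEx" in d),   # stand-in for a module call
]

text_file = b"just a plain text file"
print(evaluate(text_file, CHECKS))                  # (False, 1)
print(evaluate(text_file, list(reversed(CHECKS))))  # (False, 50)
```

Both orderings reject the text file, but the cheap-first ordering does so after spending one unit instead of fifty.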

Common Condition Patterns

// At least N of a set
2 of ($indicator_*)

// All of a set
all of ($required_*)

// Any of a set
any of ($optional_*)

// String at specific offset
$mz at 0

// String in specific range
$header in (0..1024)

// Count-based
#suspicious_call > 5

Rule Categories

Category 1: Malware Family Detection

Targets specific malware families with high-confidence indicators.

rule APT_Backdoor_SilentMoon {
    meta:
        description = "Detects SilentMoon backdoor used by APT group"
        score = 90
    strings:
        $config_marker = { 53 4D 43 46 47 }  // "SMCFG"
        $decrypt_routine = { 31 C0 8A 04 08 34 ?? 88 04 08 41 }
    condition:
        uint16(0) == 0x5A4D and
        $config_marker and
        $decrypt_routine
}

Category 2: Technique Detection

Targets specific attack techniques regardless of malware family.

rule TECHNIQUE_ProcessHollowing {
    meta:
        description = "Detects process hollowing technique indicators"
        score = 60
    strings:
        $api1 = "NtUnmapViewOfSection" ascii
        $api2 = "WriteProcessMemory" ascii
        $api3 = "SetThreadContext" ascii
        $api4 = "ResumeThread" ascii
    condition:
        uint16(0) == 0x5A4D and
        3 of ($api*)
}

Category 3: Packer/Obfuscator Detection

Identifies packed or obfuscated executables.

rule PACKER_UPX {
    meta:
        description = "Detects UPX packed executables"
        score = 30
    strings:
        $upx0 = "UPX0" ascii
        $upx1 = "UPX1" ascii
        $upx2 = "UPX!" ascii
    condition:
        uint16(0) == 0x5A4D and
        2 of ($upx*)
}

Common Pitfalls

Over-broad rules: Too many wildcards = too many false positives. Be specific.
Under-tested rules: Always test against known-clean files to measure FP rate.
Missing metadata: Rules without metadata are unmaintainable. Always include all required fields.
Ignoring atoms: A rule with poor atoms slows down the entire scanning pipeline.
Hardcoded offsets: Use in (range) instead of exact offsets when possible -- variants shift bytes.
Legacy syntax: Use YARA-X features and the yr toolchain, not legacy yara/yarac.
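The "under-tested rules" pitfall can be turned into an explicit gate. The sketch below assumes you have already scanned a malicious corpus and a clean corpus by whatever means and counted the hits. The thresholds, the `CorpusResult` type, and the `ready_to_deploy` function are hypothetical choices, not part of any YARA tooling.

```python
from dataclasses import dataclass


@dataclass
class CorpusResult:
    scanned: int  # number of files scanned
    matched: int  # number of files the rule fired on

    @property
    def rate(self) -> float:
        return self.matched / self.scanned if self.scanned else 0.0


def ready_to_deploy(
    malicious: CorpusResult,
    clean: CorpusResult,
    min_tp: float = 1.0,
    max_fp: float = 0.0,
) -> bool:
    """Pass only if the rule hits enough samples and stays quiet on clean files."""
    # An empty corpus proves nothing, so treat it as a failure rather than a pass.
    if malicious.scanned == 0 or clean.scanned == 0:
        return False
    return malicious.rate >= min_tp and clean.rate <= max_fp
```

With the default thresholds, a single hit on the clean corpus blocks deployment; teams that tolerate a small false positive rate can raise `max_fp` deliberately instead of skipping the test.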

Linting Checklist

Before deploying any rule:

Rule compiles without errors: yr check rule.yar
Rule has efficient atoms: yr debug atoms rule.yar
All required metadata fields present
Tested against target sample (true positive confirmed)
Tested against clean file corpus (false positive rate acceptable)
Condition logic is performance-ordered (cheap checks first)
No overly broad wildcard patterns
Rule follows naming convention: CATEGORY_FamilyName_Variant
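The naming item in the checklist is easy to automate. The sketch below encodes one possible reading of CATEGORY_FamilyName_Variant: an uppercase category followed by one or more underscore-separated alphanumeric parts, with the variant treated as optional because examples such as PACKER_UPX omit it. The regular expression and the function names are assumptions made for illustration.

```python
import re

# Uppercase category, then one or more underscore-separated alphanumeric parts.
RULE_NAME = re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Za-z0-9]+)+")


def follows_convention(rule_name: str) -> bool:
    """Return True if the whole name matches the assumed convention."""
    return RULE_NAME.fullmatch(rule_name) is not None


def rule_names(source: str) -> list[str]:
    """Extract identifiers that follow the `rule` keyword at the start of a line."""
    return re.findall(r"^\s*(?:private\s+|global\s+)*rule\s+(\w+)", source, re.M)
```

Running `[n for n in rule_names(text) if not follows_convention(n)]` over a rule file gives the list of names to fix before deployment.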

Integration with Agent-Studio

Recommended Workflow

1. Analyze malware sample with binary-analysis-patterns or memory-forensics
2. Extract indicators and patterns
3. Use yara-authoring to create detection rules
4. Lint and atom-analyze rules
5. Test rules against known samples and clean corpus
6. Use variant-analysis to find similar samples for rule tuning

Complementary Skills

| Skill | Relationship |
|---|---|
| binary-analysis-patterns | Extract indicators from malware for rule authoring |
| memory-forensics | Extract memory artifacts for memory-scanning rules |
| variant-analysis | Find malware variants to tune rule coverage |
| static-analysis | Automated analysis to complement YARA detection |
| protocol-reverse-engineering | Extract network signatures for YARA rules |

Anti-Patterns

| Anti-Pattern | Why It Fails | Correct Approach |
|---|---|---|
| Over-broad wildcards (?? ?? ?? ??) | Poor atoms cause rule to run against every file byte; massive performance degradation | Use at least 4 consecutive fixed bytes; scope wildcards to specific positions |
| Skipping atom analysis | Invisible performance sink; rule may have 1-byte atoms causing false positives | Always run yr debug atoms rule.yar before deployment |
| Missing metadata fields | Rules become unauditable; cannot trace origin, sample, or analyst | Always include: author, date, description, reference, hash, tlp, score |
| Conditions before file type checks | Expensive string matching runs on non-matching file types | Place uint16(0) == 0x5A4D (or equivalent) first in every condition |
| Using nocase on short strings | Short case-insensitive patterns match everywhere in arbitrary data | Reserve nocase for strings >= 8 bytes; use exact case for shorter patterns |

Memory Protocol

Before starting: Check for existing YARA rules in the project for naming conventions and pattern reuse.

During authoring: Write rules incrementally, testing each against the target sample. Document atom analysis results.

After completion: Record effective patterns, atom quality metrics, and false positive rates to .claude/context/memory/learnings.md for improving future rule authoring.
