Binary Analysis Patterns
Security Notice
AUTHORIZED USE ONLY: These skills are for DEFENSIVE security analysis and authorized research:
-
Authorized pentesting engagements with written authorization
-
CTF competitions and security research
-
Defensive security and malware analysis
-
Security research with proper disclosure
-
Educational purposes in controlled environments
NEVER use for:
-
Creating or enhancing malicious code
-
Unauthorized access to systems
-
Bypassing software licensing illegitimately
-
Intellectual property theft
-
Any illegal activities
Comprehensive patterns and techniques for analyzing compiled binaries, understanding assembly code, and reconstructing program logic.
Disassembly Fundamentals
x86-64 Instruction Patterns
Function Prologue/Epilogue
; Standard prologue push rbp ; Save base pointer mov rbp, rsp ; Set up stack frame sub rsp, 0x20 ; Allocate local variables
; Leaf function (no calls) ; May skip frame pointer setup sub rsp, 0x18 ; Just allocate locals
; Standard epilogue mov rsp, rbp ; Restore stack pointer pop rbp ; Restore base pointer ret
; Leave instruction (equivalent) leave ; mov rsp, rbp; pop rbp ret
Calling Conventions
System V AMD64 (Linux, macOS)
; Arguments: RDI, RSI, RDX, RCX, R8, R9, then stack ; Return: RAX (and RDX for 128-bit) ; Caller-saved: RAX, RCX, RDX, RSI, RDI, R8-R11 ; Callee-saved: RBX, RBP, R12-R15
; Example: func(a, b, c, d, e, f, g) mov rdi, [a] ; 1st arg mov rsi, [b] ; 2nd arg mov rdx, [c] ; 3rd arg mov rcx, [d] ; 4th arg mov r8, [e] ; 5th arg mov r9, [f] ; 6th arg push [g] ; 7th arg on stack call func
Microsoft x64 (Windows)
; Arguments: RCX, RDX, R8, R9, then stack ; Shadow space: 32 bytes reserved on stack ; Return: RAX
; Example: func(a, b, c, d, e) sub rsp, 0x28 ; Shadow space + alignment mov rcx, [a] ; 1st arg mov rdx, [b] ; 2nd arg mov r8, [c] ; 3rd arg mov r9, [d] ; 4th arg mov [rsp+0x20], [e] ; 5th arg on stack call func add rsp, 0x28
ARM Assembly Patterns
ARM64 (AArch64) Calling Convention
; Arguments: X0-X7 ; Return: X0 (and X1 for 128-bit) ; Frame pointer: X29 ; Link register: X30
; Function prologue stp x29, x30, [sp, #-16]! ; Save FP and LR mov x29, sp ; Set frame pointer
; Function epilogue ldp x29, x30, [sp], #16 ; Restore FP and LR ret
ARM32 Calling Convention
; Arguments: R0-R3, then stack ; Return: R0 (and R1 for 64-bit) ; Link register: LR (R14)
; Function prologue push {fp, lr} add fp, sp, #4
; Function epilogue pop {fp, pc} ; Return by popping PC
Control Flow Patterns
Conditional Branches
; if (a == b) cmp eax, ebx jne skip_block ; ... if body ... skip_block:
; if (a < b) - signed cmp eax, ebx jge skip_block ; Jump if greater or equal ; ... if body ... skip_block:
; if (a < b) - unsigned cmp eax, ebx jae skip_block ; Jump if above or equal ; ... if body ... skip_block:
Loop Patterns
; for (int i = 0; i < n; i++) xor ecx, ecx ; i = 0 loop_start: cmp ecx, [n] ; i < n jge loop_end ; ... loop body ... inc ecx ; i++ jmp loop_start loop_end:
; while (condition) jmp loop_check loop_body: ; ... body ... loop_check: cmp eax, ebx jl loop_body
; do-while loop_body: ; ... body ... cmp eax, ebx jl loop_body
Switch Statement Patterns
; Jump table pattern mov eax, [switch_var] cmp eax, max_case ja default_case jmp [jump_table + eax*8]
; Sequential comparison (small switch) cmp eax, 1 je case_1 cmp eax, 2 je case_2 cmp eax, 3 je case_3 jmp default_case
Data Structure Patterns
Array Access
; array[i] - 4-byte elements mov eax, [rbx + rcx*4] ; rbx=base, rcx=index
; array[i] - 8-byte elements mov rax, [rbx + rcx*8]
; Multi-dimensional array[i][j] ; arr[i][j] = base + (i * cols + j) * element_size imul eax, [cols] add eax, [j] mov edx, [rbx + rax*4]
Structure Access
struct Example { int a; // offset 0 char b; // offset 4 // padding // offset 5-7 long c; // offset 8 short d; // offset 16 };
; Accessing struct fields mov rdi, [struct_ptr] mov eax, [rdi] ; s->a (offset 0) movzx eax, byte [rdi+4] ; s->b (offset 4) mov rax, [rdi+8] ; s->c (offset 8) movzx eax, word [rdi+16] ; s->d (offset 16)
Linked List Traversal
; while (node != NULL) list_loop: test rdi, rdi ; node == NULL? jz list_done ; ... process node ... mov rdi, [rdi+8] ; node = node->next (assuming next at offset 8) jmp list_loop list_done:
Common Code Patterns
String Operations
; strlen pattern xor ecx, ecx strlen_loop: cmp byte [rdi + rcx], 0 je strlen_done inc ecx jmp strlen_loop strlen_done: ; ecx contains length
; strcpy pattern strcpy_loop: mov al, [rsi] mov [rdi], al test al, al jz strcpy_done inc rsi inc rdi jmp strcpy_loop strcpy_done:
; memcpy using rep movsb mov rdi, dest mov rsi, src mov rcx, count rep movsb
Arithmetic Patterns
; Multiplication by constant ; x * 3 lea eax, [rax + rax*2]
; x * 5 lea eax, [rax + rax*4]
; x * 10 lea eax, [rax + rax*4] ; x * 5 add eax, eax ; * 2
; Division by power of 2 (signed) mov eax, [x] cdq ; Sign extend to EDX:EAX and edx, 7 ; For divide by 8 add eax, edx ; Adjust for negative sar eax, 3 ; Arithmetic shift right
; Modulo power of 2 and eax, 7 ; x % 8
Bit Manipulation
; Test specific bit test eax, 0x80 ; Test bit 7 jnz bit_set
; Set bit or eax, 0x10 ; Set bit 4
; Clear bit and eax, ~0x10 ; Clear bit 4
; Toggle bit xor eax, 0x10 ; Toggle bit 4
; Count leading zeros bsr eax, ecx ; Bit scan reverse xor eax, 31 ; Convert to leading zeros
; Population count (popcnt) popcnt eax, ecx ; Count set bits
Decompilation Patterns
Variable Recovery
; Local variable at rbp-8 mov qword [rbp-8], rax ; Store to local mov rax, [rbp-8] ; Load from local
; Stack-allocated array lea rax, [rbp-0x40] ; Array starts at rbp-0x40 mov [rax], edx ; array[0] = value mov [rax+4], ecx ; array[1] = value
Function Signature Recovery
; Identify parameters by register usage func: ; rdi used as first param (System V) mov [rbp-8], rdi ; Save param to local ; rsi used as second param mov [rbp-16], rsi ; Identify return by RAX at end mov rax, [result] ret
Type Recovery
; 1-byte operations suggest char/bool movzx eax, byte [rdi] ; Zero-extend byte movsx eax, byte [rdi] ; Sign-extend byte
; 2-byte operations suggest short movzx eax, word [rdi] movsx eax, word [rdi]
; 4-byte operations suggest int/float mov eax, [rdi] movss xmm0, [rdi] ; Float
; 8-byte operations suggest long/double/pointer mov rax, [rdi] movsd xmm0, [rdi] ; Double
Ghidra Analysis Tips
Improving Decompilation
// In Ghidra scripting // Fix function signature Function func = getFunctionAt(toAddr(0x401000)); func.setReturnType(IntegerDataType.dataType, SourceType.USER_DEFINED);
// Create structure type StructureDataType struct = new StructureDataType("MyStruct", 0); struct.add(IntegerDataType.dataType, "field_a", null); struct.add(PointerDataType.dataType, "next", null);
// Apply to memory createData(toAddr(0x601000), struct);
Pattern Matching Scripts
Find all calls to dangerous functions
for func in currentProgram.getFunctionManager().getFunctions(True): for ref in getReferencesTo(func.getEntryPoint()): if func.getName() in ["strcpy", "sprintf", "gets"]: print(f"Dangerous call at {ref.getFromAddress()}")
IDA Pro Patterns
IDAPython Analysis
import idaapi import idautils import idc
Find all function calls
def find_calls(func_name): for func_ea in idautils.Functions(): for head in idautils.Heads(func_ea, idc.find_func_end(func_ea)): if idc.print_insn_mnem(head) == "call": target = idc.get_operand_value(head, 0) if idc.get_func_name(target) == func_name: print(f"Call to {func_name} at {hex(head)}")
Rename functions based on strings
def auto_rename(): for s in idautils.Strings(): for xref in idautils.XrefsTo(s.ea): func = idaapi.get_func(xref.frm) if func and "sub_" in idc.get_func_name(func.start_ea): # Use string as hint for naming pass
Best Practices
Analysis Workflow
-
Initial triage: File type, architecture, imports/exports
-
String analysis: Identify interesting strings, error messages
-
Function identification: Entry points, exports, cross-references
-
Control flow mapping: Understand program structure
-
Data structure recovery: Identify structs, arrays, globals
-
Algorithm identification: Crypto, hashing, compression
-
Documentation: Comments, renamed symbols, type definitions
Common Pitfalls
-
Optimizer artifacts: Code may not match source structure
-
Inline functions: Functions may be expanded inline
-
Tail call optimization: jmp instead of call
- ret
-
Dead code: Unreachable code from optimization
-
Position-independent code: RIP-relative addressing
Iron Laws
-
ALWAYS perform static analysis before any dynamic execution — disassemble and map control flow first; executing untrusted binaries without prior static analysis is a security risk and destroys reproducible evidence.
-
NEVER trust decompiler output as ground truth — decompilers produce approximations; always cross-reference decompiled output with the raw disassembly for security-critical paths.
-
NEVER assume sequential execution — indirect jumps, virtual dispatch tables, and JIT-compiled code all break linear flow; always check cross-references and jump tables before tracing a code path.
-
ALWAYS document architecture and calling convention at the start — x86, x64, ARM, and MIPS have different calling conventions; misidentifying them causes every parameter and return-value analysis to be wrong.
-
ALWAYS mark assumptions as unverified until confirmed by dynamic analysis — static analysis is incomplete; flag every inferred behavior with [STATIC-ONLY] and validate with dynamic trace evidence when possible.
Anti-Patterns
Anti-Pattern Why It Fails Correct Approach
Executing binary before static analysis Destroys forensic state; safety risk Map entry points and control flow statically first
Trusting decompiler output verbatim Decompilers introduce artifacts and errors Cross-reference with raw disassembly for critical paths
Assuming linear code flow Misses indirect jumps, vtables, JIT paths Check all xrefs and jump tables before tracing
Skipping architecture documentation Wrong calling convention invalidates all analysis Document arch + ABI before starting any analysis
Treating static inferences as confirmed Inferences can be wrong without dynamic validation Mark as [STATIC-ONLY] until runtime trace confirms
Memory Protocol (MANDATORY)
Before starting: Read C:\dev\projects\agent-studio.claude\context\memory\learnings.md
After completing:
-
New pattern -> C:\dev\projects\agent-studio.claude\context\memory\learnings.md
-
Issue found -> C:\dev\projects\agent-studio.claude\context\memory\issues.md
-
Decision made -> C:\dev\projects\agent-studio.claude\context\memory\decisions.md
ASSUME INTERRUPTION: If it's not in memory, it didn't happen.