Assembly Guide
Applies to: x86-64 (System V ABI), ARM64 (AAPCS64); NASM and GAS syntax
Core Principles
- Clarity Over Cleverness: Comment every instruction's purpose; assembly lacks self-documentation
- ABI Compliance: Follow calling conventions precisely for interoperability with C/system code
- Minimal Register Pressure: Preserve callee-saved registers, minimize spills to stack
- Correctness First: Get it working correctly, then profile, then optimize with SIMD
- Structured Layout: Use consistent label naming, section organization, and macro definitions
Guardrails
Architecture Selection
- Declare target architecture at the top of every file
- x86-64: default for Linux/macOS server and desktop workloads
- ARM64: default for Apple Silicon, mobile, and embedded Linux
- Never mix architecture-specific code without %ifdef / .ifdef guards
Calling Conventions
- x86-64 System V ABI (Linux, macOS, BSD):
  - Arguments: rdi, rsi, rdx, rcx, r8, r9 (integer/pointer, in order)
  - Floating-point arguments: xmm0-xmm7
  - Return value: rax (integer), xmm0 (float)
  - Caller-saved (volatile): rax, rcx, rdx, rsi, rdi, r8-r11, and all xmm registers
  - Callee-saved (non-volatile): rbx, rbp, r12-r15
  - Stack: rsp must be 16-byte aligned immediately before each call instruction, so at function entry rsp sits 8 bytes past a 16-byte boundary (the return address has just been pushed)
- ARM64 AAPCS64 (Linux, macOS):
  - Arguments: x0-x7 (integer/pointer), d0-d7 (float)
  - Return value: x0 (integer), d0 (float)
  - Callee-saved: x19-x28, x29 (frame pointer), and the low 64 bits of v8-v15 (d8-d15); x30 (link register) must be saved by any function that makes a call
  - Stack: sp must be 16-byte aligned whenever it is used to access memory
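The x86-64 rules above can be seen working together in a small non-leaf function. The following is a minimal NASM sketch rather than a definitive implementation: `apply_twice` is an illustrative name, and `helper` is a hypothetical C function assumed to be linked into the same executable.

```nasm
; long apply_twice(long x)
; Args:    rdi = x
; Returns: rax = helper(x) + helper(x + 1)
; Assumes: long helper(long) is a hypothetical C function linked into
;          the same executable.

        default rel
        extern  helper

        section .text
        global  apply_twice

apply_twice:
        push    rbp                 ; entry: rsp % 16 == 8; after this push: 0
        mov     rbp, rsp
        push    rbx                 ; callee-saved, so save before use
        push    r12                 ; second push brings rsp back to 16-byte alignment
        mov     rbx, rdi            ; rbx = x (must survive the calls)

        call    helper              ; rax = helper(x); rdi may be clobbered
        mov     r12, rax            ; r12 = first result

        lea     rdi, [rbx + 1]      ; first argument = x + 1
        call    helper              ; rax = helper(x + 1)
        add     rax, r12            ; rax = helper(x) + helper(x + 1)

        pop     r12                 ; restore in reverse order of the pushes
        pop     rbx
        pop     rbp
        ret
```

Values that must survive a call live in callee-saved registers (rbx, r12) because `helper` is free to overwrite every caller-saved register, including rdi. If `helper` lived in a shared library instead, NASM's ELF `wrt ..plt` qualifier on the call target is the usual way to route the call through the PLT.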
Register Usage
- Document which registers hold which logical values at function entry
- Never clobber callee-saved registers without saving and restoring them
- Use rbp / x29 as the frame pointer for debuggability (omit only in leaf functions)
- Reserve scratch registers for temporaries; name them in comments
- Do not leave stale upper bits in a return register: extend values narrower than 64 bits according to the C return type (zero-extend unsigned, sign-extend signed); on x86-64, writing a 32-bit register already zeroes bits 63:32
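A minimal NASM sketch of widening a narrow value before returning it, for x86-64 System V. The function names `low_byte` and `widen_i32` are hypothetical and chosen only for illustration.

```nasm
        section .text
        global  low_byte
        global  widen_i32

; uint8_t low_byte(uint64_t v)
; Args:    rdi = v
; Returns: rax = v & 0xff (all upper bits cleared)
low_byte:
        movzx   eax, dil            ; zero-extend 8 -> 32; the 32-bit write
                                    ; also clears bits 63:32 of rax
        ret

; int64_t widen_i32(int32_t v)
; Args:    edi = v
; Returns: rax = v sign-extended to 64 bits
widen_i32:
        movsxd  rax, edi            ; sign-extend 32 -> 64
        ret
```

The choice between `movzx` and `movsx`/`movsxd` follows the signedness of the C type, not the size of the register being written.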
Stack Management
- Always maintain 16-byte stack alignment on x86-64 and ARM64
- Allocate local variables by subtracting from rsp / sp in the prologue
- Deallocate in the epilogue before ret, so the stack pointer holds its entry value when the function returns
- Use the red zone (128 bytes below rsp) only in leaf functions under the System V ABI
- Never write below the stack pointer outside the red zone
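The two allocation styles above can be sketched side by side in NASM for x86-64 System V. Both functions are deliberately contrived (the work could be done in registers) and their names are hypothetical; the point is only where the locals live.

```nasm
        section .text
        global  sum_via_locals
        global  swap_halves

; long sum_via_locals(long a, long b)
; Args: rdi = a, rsi = b | Returns: rax = a + b
; Uses an explicit frame with two 8-byte locals.
sum_via_locals:
        push    rbp
        mov     rbp, rsp            ; rsp is 16-byte aligned here
        sub     rsp, 16             ; allocate locals; a multiple of 16 keeps alignment
        mov     [rbp - 8], rdi      ; local0 = a
        mov     [rbp - 16], rsi     ; local1 = b
        mov     rax, [rbp - 8]
        add     rax, [rbp - 16]     ; rax = a + b
        mov     rsp, rbp            ; deallocate locals
        pop     rbp
        ret

; uint64_t swap_halves(uint64_t v)
; Args: rdi = v | Returns: rax = v with its 32-bit halves exchanged
; Leaf function: spills into the red zone without moving rsp.
swap_halves:
        mov     [rsp - 8], rdi      ; spill v into the red zone
        mov     eax, [rsp - 8]      ; eax = low half (upper rax cleared)
        shl     rax, 32             ; rax = low half << 32
        mov     edx, [rsp - 4]      ; edx = high half (little-endian layout)
        or      rax, rdx            ; rax = (low << 32) | high
        ret
```

`swap_halves` is only safe because it makes no calls: a `call` would push a return address over the red zone data.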
Documentation
- File header: purpose, target architecture, assembler syntax, author
- Function header: C-style prototype comment, argument register mapping, return value
- Inline comments: explain the why, not the what (avoid ; increment counter)
- Label naming: module_function_sublabel (e.g., crypto_sha256_loop)
- Constants: use equ / .equ directives with descriptive names
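As an illustration of these conventions applied together, here is a hedged NASM template. The file name, the `checksum` module, and the constant name are all hypothetical.

```nasm
; ---------------------------------------------------------------------------
; File:    checksum.asm   (hypothetical example file)
; Purpose: Byte-wise additive checksum over a buffer
; Target:  x86-64, System V ABI
; Syntax:  NASM
; Author:  <name>
; ---------------------------------------------------------------------------

CHECKSUM_SEED   equ 0               ; initial accumulator value

        section .text
        global  checksum_sum8

; uint32_t checksum_sum8(const uint8_t *buf, size_t len)
; Args:     rdi = buf, rsi = len
; Returns:  eax = sum of all bytes (mod 2^32)
; Clobbers: rdx, rdi, rsi
checksum_sum8:
        mov     eax, CHECKSUM_SEED
        test    rsi, rsi
        jz      checksum_sum8_done  ; empty buffer: return the seed unchanged
checksum_sum8_loop:
        movzx   edx, byte [rdi]     ; widen first so the add never sees stale upper bits
        add     eax, edx
        inc     rdi
        dec     rsi
        jnz     checksum_sum8_loop
checksum_sum8_done:
        ret
```

The labels follow the module_function_sublabel pattern, and the only magic number is hidden behind a named `equ` constant.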
Key Patterns
x86-64 Function with Frame Pointer
    ; long compute(long x, long y, long z)
    ; Args:    rdi = x, rsi = y, rdx = z
    ; Returns: rax = x * y + z
    global compute
    compute:
        push    rbp             ; save frame pointer
        mov     rbp, rsp        ; establish stack frame
        mov     rax, rdi        ; rax = x
        imul    rax, rsi        ; rax = x * y
        add     rax, rdx        ; rax = x * y + z
        pop     rbp             ; restore frame pointer
        ret
ARM64 AAPCS Function
    // int64_t multiply_add(int64_t a, int64_t b, int64_t c)
    // Args: x0 = a, x1 = b, x2 = c | Returns: x0 = a * b + c
    .global multiply_add
    multiply_add:
        stp     x29, x30, [sp, #-16]!   // save fp and lr
        mov     x29, sp                 // establish stack frame
        mul     x0, x0, x1              // x0 = a * b
        add     x0, x0, x2              // x0 = a * b + c
        ldp     x29, x30, [sp], #16     // restore fp and lr
        ret
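SIMD / SSE (4 floats per iteration)
    ; void add_f32(float *dst, const float *a, const float *b, size_t n)
    ; Args: rdi = dst, rsi = a, rdx = b, rcx = n
    ; Note: processes n / 4 full groups; any n % 4 trailing elements are skipped
    global add_f32
    add_f32:
        shr     rcx, 2          ; rcx = n / 4 (number of 4-float groups)
        test    rcx, rcx
        jz      .done           ; fewer than 4 elements: nothing to do
    .loop:
        movups  xmm0, [rsi]     ; load 4 floats from a (unaligned allowed)
        movups  xmm1, [rdx]     ; load 4 floats from b; addps with a memory
                                ; operand would fault on unaligned addresses
        addps   xmm0, xmm1      ; add the 4 float pairs
        movups  [rdi], xmm0     ; store result
        add     rsi, 16
        add     rdx, 16
        add     rdi, 16
        dec     rcx
        jnz     .loop
    .done:
        ret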
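Because the loop above works in groups of four, lengths that are not a multiple of four leave up to three elements untouched. One way to cover them is a scalar tail loop; the sketch below assumes nothing about pointer alignment, and the name `add_f32_any` is hypothetical.

```nasm
        section .text
        global  add_f32_any

; void add_f32_any(float *dst, const float *a, const float *b, size_t n)
; Args: rdi = dst, rsi = a, rdx = b, rcx = n
add_f32_any:
        mov     rax, rcx
        shr     rax, 2              ; rax = number of full 4-float groups
        and     ecx, 3              ; rcx = leftover elements (n % 4)
        test    rax, rax
        jz      .tail
.vec:
        movups  xmm0, [rsi]         ; unaligned load of 4 floats from a
        movups  xmm1, [rdx]         ; unaligned load of 4 floats from b
        addps   xmm0, xmm1
        movups  [rdi], xmm0
        add     rsi, 16
        add     rdx, 16
        add     rdi, 16
        dec     rax
        jnz     .vec
.tail:
        test    ecx, ecx
        jz      .done
.scalar:
        movss   xmm0, [rsi]         ; one float from a
        addss   xmm0, [rdx]         ; scalar memory operands have no alignment requirement
        movss   [rdi], xmm0
        add     rsi, 4
        add     rdx, 4
        add     rdi, 4
        dec     ecx
        jnz     .scalar
.done:
        ret
```

The tail runs at most three iterations, so its cost is negligible next to the vector loop for any non-trivial n.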
Linux x86-64 Syscall Interface
    ; Syscall: rax = number, args in rdi/rsi/rdx/r10/r8/r9, return in rax
    ; Note: r10 replaces rcx; the syscall instruction clobbers rcx and r11
    SYS_WRITE equ 1
    SYS_EXIT  equ 60

    section .data
    msg     db "Hello, world!", 10
    msg_len equ $ - msg

    section .text
    global _start
    _start:
        mov     rax, SYS_WRITE  ; write(stdout, msg, msg_len)
        mov     rdi, 1          ; fd = STDOUT
        lea     rsi, [rel msg]  ; RIP-relative for PIC
        mov     rdx, msg_len
        syscall
        mov     rax, SYS_EXIT   ; exit(0)
        xor     edi, edi
        syscall
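Raw syscalls report failure by returning a negated errno value in rax, in the range -4095 to -1, rather than by setting a C-style `errno`. The sketch below shows one way to check for that; the constant names `STDOUT` and `MAX_ERRNO` and the choice to exit with the errno value are assumptions made for illustration.

```nasm
SYS_WRITE       equ 1
SYS_EXIT        equ 60
STDOUT          equ 1
MAX_ERRNO       equ 4095            ; kernel error returns lie in [-4095, -1]

        section .data
msg:            db  "checked write", 10
msg_len         equ $ - msg

        section .text
        global  _start
_start:
        mov     eax, SYS_WRITE
        mov     edi, STDOUT
        lea     rsi, [rel msg]
        mov     edx, msg_len
        syscall                     ; clobbers rcx and r11
        cmp     rax, -MAX_ERRNO
        jae     .failed             ; unsigned compare catches -4095..-1
        xor     edi, edi            ; success: exit status 0
        jmp     .exit
.failed:
        neg     rax                 ; rax = positive errno value
        mov     edi, eax            ; exit status = errno
.exit:
        mov     eax, SYS_EXIT
        syscall
```

A successful `write` may still return fewer bytes than requested; a production version would loop until the whole buffer is written.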
Position-Independent Code (PIC)
    default rel                     ; all memory refs become RIP-relative

    section .data
    counter dq 0

    section .text
    global get_counter
    get_counter:
        mov     rax, [counter]      ; RIP-relative with default rel
        ret

    global increment_counter
    increment_counter:
        mov     eax, 1
        lock xadd [counter], rax    ; atomically add 1; rax = previous value
        inc     rax                 ; return the new value; xadd fetches it in the
                                    ; same atomic step, so no racy reload is needed
        ret
Debugging
GDB Commands
    gdb ./program
    (gdb) layout asm                    # show disassembly window
    (gdb) layout regs                   # show registers window
    (gdb) stepi                         # step one instruction
    (gdb) nexti                         # step over call
    (gdb) info registers                # print all register values
    (gdb) p/x $rax                      # print rax in hex
    (gdb) x/4gx $rsp                    # examine 4 quad-words at stack pointer
    (gdb) break *0x401000               # break at address
    (gdb) display/i $pc                 # show current instruction after each step
    (gdb) set disassembly-flavor intel
objdump & strace
    objdump -d -M intel program             # disassemble with Intel syntax
    objdump -h program                      # show section headers
    objdump -t program                      # show symbol table
    objdump -r program.o                    # show relocations (PIC debugging)

    strace ./program                        # trace all syscalls
    strace -e trace=write,read ./program    # filter specific syscalls
Tooling
Assemblers & Linkers
NASM (Intel syntax)
    nasm -f elf64 -g -F dwarf program.asm -o program.o    # Linux
    nasm -f macho64 program.asm -o program.o              # macOS
GAS (AT&T syntax, supports .intel_syntax)
    as --64 -g program.s -o program.o
LLVM
    clang -c program.s -o program.o
Linking
    ld -o program program.o           # freestanding (no libc; entry point is _start)
    gcc -o program program.o          # with libc (C interop)
    gcc -shared -o libfoo.so foo.o    # shared library (requires PIC)
Verification
    nm program.o              # verify symbol visibility
    nm -u program.o           # check undefined references
    readelf -S program.o      # verify section layout
In GDB:
    (gdb) p/x $rsp & 0xf      # 0x0 immediately before a call instruction; 0x8 at function entry (x86-64)
References
For detailed patterns and code examples, see:
- references/patterns.md -- Prologue/epilogue, syscall examples, SIMD patterns
External References
- x86-64 System V ABI Specification
- ARM Architecture Reference Manual
- NASM Documentation
- GAS Manual (GNU Assembler)
- Intel Intrinsics Guide (SSE/AVX)
- Linux Syscall Table (x86-64)
- Agner Fog's Optimization Manuals
- Felix Cloutier x86 Instruction Reference