assembly-arm

ARM / AArch64 Assembly

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "assembly-arm" with this command: npx skills add mohitmishra786/low-level-dev-skills/mohitmishra786-low-level-dev-skills-assembly-arm

ARM / AArch64 Assembly

Purpose

Guide agents through AArch64 (64-bit) and ARM (32-bit Thumb) assembly: registers, calling conventions, inline asm, and NEON/SVE SIMD patterns.

Triggers

  • "How do I read ARM64 assembly output?"

  • "What are the AArch64 registers and calling convention?"

  • "How do I write inline asm for ARM?"

  • "What is the difference between AArch64 and ARM Thumb?"

  • "How do I use NEON intrinsics?"

Workflow

  1. Generate ARM assembly

AArch64 (native or cross-compile)

aarch64-linux-gnu-gcc -S -O2 foo.c -o foo.s

32-bit ARM Thumb

arm-linux-gnueabihf-gcc -S -O2 -mthumb foo.c -o foo.s

From objdump

aarch64-linux-gnu-objdump -d -S prog

From GDB on target

(gdb) disassemble /s main

  1. AArch64 registers (AAPCS64)

Register Alias Role

x0 –x7

— Arguments 1–8 and return values

x8

xr

Indirect result location (struct return)

x9 –x15

— Caller-saved temporaries

x16 –x17

ip0 , ip1

Intra-procedure-call temporaries (used by linker)

x18

pr

Platform register (reserved on some OS)

x19 –x28

— Callee-saved

x29

fp

Frame pointer (callee-saved)

x30

lr

Link register (return address)

sp

— Stack pointer (must be 16-byte aligned at call)

pc

— Program counter (not directly accessible)

xzr

wzr

Zero register (reads as 0, writes discarded)

v0 –v7

q0 –q7

FP/SIMD args and return

v8 –v15

— Callee-saved SIMD (lower 64 bits only)

v16 –v31

— Caller-saved temporaries

Width variants: x0 (64-bit), w0 (32-bit, zero-extends to 64), h0 (16), b0 (8).

  1. AAPCS64 calling convention

Integer/pointer args: x0 –x7

Float/SIMD args: v0 –v7

Return: x0 (int), x0 +x1 (128-bit), v0 (float/SIMD) Callee-saved: x19 –x28 , x29 (fp), x30 (lr), v8 –v15 (lower 64 bits) Caller-saved: everything else

Stack must be 16-byte aligned at any bl or blr instruction.

  1. Common AArch64 instructions

Instruction Effect

mov x0, x1

Copy register

mov x0, #42

Load immediate

movz x0, #0x1234, lsl #16

Move zero-extended with shift

movk x0, #0xabcd

Move with keep (partial update)

ldr x0, [x1]

Load 64-bit from address in x1

ldr x0, [x1, #8]

Load from x1+8

str x0, [x1, #8]

Store x0 to x1+8

ldp x0, x1, [sp, #16]

Load pair (two regs at once)

stp x29, x30, [sp, #-16]!

Store pair, pre-decrement sp

add x0, x1, x2

x0 = x1 + x2

add x0, x1, #8

x0 = x1 + 8

sub x0, x1, x2

x0 = x1 - x2

mul x0, x1, x2

x0 = x1 * x2

sdiv x0, x1, x2

Signed divide

udiv x0, x1, x2

Unsigned divide

cmp x0, x1

Set flags for x0 - x1

cbz x0, label

Branch if x0 == 0

cbnz x0, label

Branch if x0 != 0

bl func

Branch with link (call)

blr x0

Branch with link to address in x0

ret

Return (branch to x30)

ret x0

Return to address in x0

adrp x0, symbol

PC-relative page address

add x0, x0, :lo12:symbol

Low 12 bits of symbol offset

  1. Typical function prologue/epilogue

// Non-leaf function stp x29, x30, [sp, #-32]! // save fp, lr; allocate 32 bytes mov x29, sp // set frame pointer stp x19, x20, [sp, #16] // save callee-saved registers // ... body ... ldp x19, x20, [sp, #16] // restore ldp x29, x30, [sp], #32 // restore fp, lr; deallocate ret

// Leaf function (no calls, no callee-saved regs needed) // Can use red zone (no rsp adjustment) — but AArch64 has no red zone sub sp, sp, #16 // allocate locals // ... body ... add sp, sp, #16 ret

  1. Inline assembly (GCC/Clang)

// Barrier asm volatile ("dmb ish" ::: "memory");

// Load acquire static inline int load_acquire(volatile int *p) { int val; asm volatile ("ldar %w0, %1" : "=r"(val) : "Q"(*p)); return val; }

// Store release static inline void store_release(volatile int *p, int val) { asm volatile ("stlr %w1, %0" : "=Q"(*p) : "r"(val)); }

// Read system counter static inline uint64_t read_cntvct(void) { uint64_t val; asm volatile ("mrs %0, cntvct_el0" : "=r"(val)); return val; }

AArch64-specific constraints:

  • "Q" — memory operand suitable for exclusive/acquire/release instructions

  • "r" — any general-purpose register

  • "w" — any FP/SIMD register

  1. NEON SIMD intrinsics

#include <arm_neon.h>

// Add 4 floats at once float32x4_t a = vld1q_f32(arr_a); // load 4 floats float32x4_t b = vld1q_f32(arr_b); float32x4_t c = vaddq_f32(a, b); vst1q_f32(result, c);

// Horizontal sum float32x4_t sum = vpaddq_f32(c, c); sum = vpaddq_f32(sum, sum); float total = vgetq_lane_f32(sum, 0);

Naming convention: v<op><q>_<type>

  • q suffix: 128-bit (quad) vector

  • _f32 : float32, _s32 : int32, _u8 : uint8, etc.

For a register reference, see references/reference.md.

Related skills

  • Use skills/low-level-programming/assembly-x86 for x86-64 assembly

  • Use skills/compilers/cross-gcc for cross-compilation toolchain

  • Use skills/debuggers/gdb for debugging ARM code with gdbserver

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

cmake

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

static-analysis

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

llvm

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

gdb

No summary provided by upstream source.

Repository SourceNeeds Review