compiler-development

Compiler Development Skill

This skill provides comprehensive knowledge of building compilers and language implementations using the LLVM infrastructure.

Compiler Architecture Overview

Classic Three-Phase Design

Source Code → Frontend → Middle-End (Optimizer) → Backend → Machine Code
                  ↓                 ↓                ↓
               AST/IR        LLVM IR Passes     Target Code
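To make the three phases concrete, here is a minimal sketch that uses a toy stack-machine IR instead of LLVM IR. The names Instr, frontend, optimize, and backend, the postfix input syntax, and the textual output format are all assumptions made for illustration; a real compiler would parse a richer language and emit LLVM IR.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stack-machine IR shared by all three phases.
struct Instr {
    enum class Op { Push, LoadArg, Add, Mul } op;
    int operand;  // only meaningful for Push
};

// Frontend: translate postfix source text such as "2 3 + x *" into IR.
// "x" stands for the single runtime input; any other word must be an integer
// (std::stoi throws on anything else).
std::vector<Instr> frontend(const std::string& source) {
    std::vector<Instr> ir;
    std::istringstream words(source);
    std::string word;
    while (words >> word) {
        if (word == "+")      ir.push_back({Instr::Op::Add, 0});
        else if (word == "*") ir.push_back({Instr::Op::Mul, 0});
        else if (word == "x") ir.push_back({Instr::Op::LoadArg, 0});
        else                  ir.push_back({Instr::Op::Push, std::stoi(word)});
    }
    return ir;
}

// Middle-end: fold "Push a, Push b, Add/Mul" into a single Push.
std::vector<Instr> optimize(const std::vector<Instr>& ir) {
    std::vector<Instr> out;
    for (const Instr& instr : ir) {
        std::size_t n = out.size();
        bool isArith = instr.op == Instr::Op::Add || instr.op == Instr::Op::Mul;
        bool foldable = isArith && n >= 2 &&
                        out[n - 1].op == Instr::Op::Push &&
                        out[n - 2].op == Instr::Op::Push;
        if (foldable) {
            int b = out[n - 1].operand;
            int a = out[n - 2].operand;
            out.pop_back();
            out.pop_back();
            out.push_back({Instr::Op::Push,
                           instr.op == Instr::Op::Add ? a + b : a * b});
        } else {
            out.push_back(instr);
        }
    }
    return out;
}

// Backend: print one line of pseudo-assembly per IR instruction.
std::string backend(const std::vector<Instr>& ir) {
    std::string text;
    for (const Instr& instr : ir) {
        switch (instr.op) {
            case Instr::Op::Push:
                text += "push " + std::to_string(instr.operand) + "\n";
                break;
            case Instr::Op::LoadArg: text += "loadarg\n"; break;
            case Instr::Op::Add:     text += "add\n";     break;
            case Instr::Op::Mul:     text += "mul\n";     break;
        }
    }
    return text;
}
```

Calling backend(optimize(frontend("2 3 + x *"))) runs all three phases in order. Because each phase only depends on the IR, the optimizer can be reused with a different frontend or backend, which is the main point of the design.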

Frontend Development

Lexical Analysis

#include <string>

// Token types for a simple language
enum class TokenKind {
    Identifier,
    Number,
    String,
    Keyword,
    Operator,
    Punctuation,
    EndOfFile
};

// SourceLocation is assumed to be defined elsewhere (e.g. file, line, column).
struct Token {
    TokenKind kind;
    std::string value;
    SourceLocation location;
};
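A hand-written scanner that produces such tokens might look like the following sketch. It uses a trimmed-down token set and a plain byte offset in place of SourceLocation; the tokenize function and the rule that any unrecognized character is a one-character operator are assumptions for illustration only.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, reduced token set for this sketch.
enum class TokenKind { Identifier, Number, Operator, EndOfFile };

struct Token {
    TokenKind kind;
    std::string value;
    std::size_t offset;  // byte offset into the source; stands in for SourceLocation
};

// Scans the whole input eagerly and returns the token list,
// always terminated by an EndOfFile token.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    auto at = [&](std::size_t k) { return static_cast<unsigned char>(src[k]); };

    while (i < src.size()) {
        if (std::isspace(at(i))) {
            ++i;
            continue;
        }
        std::size_t start = i;
        if (std::isalpha(at(i)) || src[i] == '_') {
            // Identifier: letter or underscore, then letters, digits, underscores.
            while (i < src.size() && (std::isalnum(at(i)) || src[i] == '_')) ++i;
            tokens.push_back({TokenKind::Identifier, src.substr(start, i - start), start});
        } else if (std::isdigit(at(i))) {
            // Number: a run of decimal digits.
            while (i < src.size() && std::isdigit(at(i))) ++i;
            tokens.push_back({TokenKind::Number, src.substr(start, i - start), start});
        } else {
            // Anything else is treated as a one-character operator.
            ++i;
            tokens.push_back({TokenKind::Operator, std::string(1, src[start]), start});
        }
    }
    tokens.push_back({TokenKind::EndOfFile, "", src.size()});
    return tokens;
}
```

For the input "x1 = 42 + y" this yields five tokens followed by EndOfFile. Recording the offset on every token is what later lets the parser report errors at a precise position.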

Parser Implementation

Recursive Descent: Easy to implement, good error messages
Operator Precedence Parsing: Efficient for expression parsing
LALR/LR: Use tools like Bison for complex grammars
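The first two approaches are often combined: recursive descent for primaries and statements, operator precedence (precedence climbing) for binary expressions. The sketch below shows that combination on arithmetic over non-negative integers. The ExprParser class, its precedence table, and the choice to evaluate directly instead of building an AST are assumptions made to keep the example short.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical sketch: parses and evaluates + - * / with parentheses.
class ExprParser {
    std::string src;
    std::size_t pos = 0;

    void skipSpaces() {
        while (pos < src.size() &&
               std::isspace(static_cast<unsigned char>(src[pos])))
            ++pos;
    }

    // Binding power of a binary operator; -1 means "not an operator".
    static int precedence(char op) {
        switch (op) {
            case '+': case '-': return 10;
            case '*': case '/': return 20;
            default:            return -1;
        }
    }

    // Recursive descent part: primary := number | '(' expr ')'
    double parsePrimary() {
        skipSpaces();
        if (pos < src.size() && src[pos] == '(') {
            ++pos;
            double value = parseExpr(0);
            skipSpaces();
            if (pos >= src.size() || src[pos] != ')')
                throw std::runtime_error("expected ')' at offset " + std::to_string(pos));
            ++pos;
            return value;
        }
        std::size_t start = pos;
        while (pos < src.size() &&
               std::isdigit(static_cast<unsigned char>(src[pos])))
            ++pos;
        if (start == pos)
            throw std::runtime_error("expected number at offset " + std::to_string(pos));
        return std::stod(src.substr(start, pos - start));
    }

    // Precedence climbing: keep consuming operators that bind at least
    // as tightly as minPrec.
    double parseExpr(int minPrec) {
        double lhs = parsePrimary();
        for (;;) {
            skipSpaces();
            if (pos >= src.size()) return lhs;
            char op = src[pos];
            int prec = precedence(op);
            if (prec < minPrec) return lhs;  // weaker operator or not an operator
            ++pos;
            // prec + 1 makes every operator left-associative.
            double rhs = parseExpr(prec + 1);
            switch (op) {
                case '+': lhs += rhs; break;
                case '-': lhs -= rhs; break;
                case '*': lhs *= rhs; break;
                case '/': lhs /= rhs; break;
            }
        }
    }

public:
    explicit ExprParser(std::string text) : src(std::move(text)) {}

    double parse() {
        double value = parseExpr(0);
        skipSpaces();
        if (pos != src.size())
            throw std::runtime_error("unexpected input at offset " + std::to_string(pos));
        return value;
    }
};
```

ExprParser("1 + 2 * 3").parse() evaluates the multiplication first because '*' has the higher precedence. Adding an operator only requires a new row in precedence and a new case in the switch, which is why this technique scales better for expressions than one grammar rule per precedence level.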

AST Design

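// Base class for all expression nodes
class Expr {
public:
    virtual ~Expr() = default;
    virtual llvm::Value* codegen() = 0;
};

// Binary operation such as a + b. Builder is assumed to be an
// llvm::IRBuilder<> visible here (a global, as in the Kaleidoscope tutorial).
class BinaryExpr : public Expr {
    std::unique_ptr<Expr> LHS, RHS;
    char Op;

public:
    llvm::Value* codegen() override {
        llvm::Value* L = LHS->codegen();
        llvm::Value* R = RHS->codegen();
        if (!L || !R)
            return nullptr;  // propagate errors from the operands

        switch (Op) {
            case '+': return Builder.CreateFAdd(L, R, "addtmp");
            case '-': return Builder.CreateFSub(L, R, "subtmp");
            case '*': return Builder.CreateFMul(L, R, "multmp");
            case '/': return Builder.CreateFDiv(L, R, "divtmp");
            default:  return nullptr;  // unknown operator
        }
    }
};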

LLVM IR Generation

Module and Context Setup

#include <memory>

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"

class CodeGen {
    std::unique_ptr<llvm::LLVMContext> Context;
    std::unique_ptr<llvm::Module> Module;
    std::unique_ptr<llvm::IRBuilder<>> Builder;

public:
    CodeGen() {
        // The context owns types and constants; the module and builder refer to it
        Context = std::make_unique<llvm::LLVMContext>();
        Module = std::make_unique<llvm::Module>("my_module", *Context);
        Builder = std::make_unique<llvm::IRBuilder<>>(*Context);
    }
};

Function Generation

// Intended as a member of CodeGen: uses its Module, Context, and Builder.
llvm::Function* createFunction(const std::string& name,
                               llvm::Type* returnType,
                               std::vector<llvm::Type*> params) {
    llvm::FunctionType* FT =
        llvm::FunctionType::get(returnType, params, /*isVarArg=*/false);
    llvm::Function* F = llvm::Function::Create(
        FT, llvm::Function::ExternalLinkage, name, Module.get());

    // Create the entry block and point the builder at it
    llvm::BasicBlock* BB = llvm::BasicBlock::Create(*Context, "entry", F);
    Builder->SetInsertPoint(BB);

    return F;
}

JIT Compilation

LLVM ORC JIT

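#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/Support/TargetSelect.h"

// The native target must be initialized before creating the JIT
llvm::InitializeNativeTarget();
llvm::InitializeNativeTargetAsmPrinter();

auto JIT = llvm::orc::LLJITBuilder().create();
if (!JIT) {
    handleError(JIT.takeError());
}

// Add module; addIRModule returns an llvm::Error that must be checked
if (auto Err = (*JIT)->addIRModule(llvm::orc::ThreadSafeModule(
        std::move(Module), std::move(Context)))) {
    handleError(std::move(Err));
}

// Look up symbol and execute
auto Sym = (*JIT)->lookup("main");
if (!Sym) {
    handleError(Sym.takeError());
}
// Recent LLVM releases return an ExecutorAddr here; older releases returned a
// JITEvaluatedSymbol and used (int (*)())Sym->getAddress() instead.
auto MainFn = Sym->toPtr<int (*)()>();
int result = MainFn();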

Optimization Pass Pipeline

New Pass Manager (Recommended)

#include "llvm/Passes/PassBuilder.h"

void optimizeModule(llvm::Module& M) {
    llvm::PassBuilder PB;
    llvm::LoopAnalysisManager LAM;
    llvm::FunctionAnalysisManager FAM;
    llvm::CGSCCAnalysisManager CGAM;
    llvm::ModuleAnalysisManager MAM;

    PB.registerModuleAnalyses(MAM);
    PB.registerCGSCCAnalyses(CGAM);
    PB.registerFunctionAnalyses(FAM);
    PB.registerLoopAnalyses(LAM);
    PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

    llvm::ModulePassManager MPM =
        PB.buildPerModuleDefaultPipeline(llvm::OptimizationLevel::O2);
    MPM.run(M, MAM);
}

Custom Pass Implementation

struct MyPass : public llvm::PassInfoMixin<MyPass> {
    llvm::PreservedAnalyses run(llvm::Function& F,
                                llvm::FunctionAnalysisManager& FAM) {
        for (auto& BB : F) {
            for (auto& I : BB) {
                // Transform instructions
            }
        }
        return llvm::PreservedAnalyses::none();
    }
};

Language Implementation Patterns

Memory-Safe Languages

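Use LLVM's sanitizer instrumentation (AddressSanitizer, MemorySanitizer) to catch memory errors in generated code
Emit explicit bounds checks before indexed accesses; getelementptr (GEP) only computes addresses and never checks bounds
Integrate reference counting or garbage collection through the language runtime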
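The bounds-checking item can be modelled in plain C++ before writing any IR. In the sketch below, Slice is a hypothetical "fat pointer" (data pointer plus length) and checkedLoad shows the comparison a frontend would place in front of every indexed load. In generated LLVM IR the same logic would typically be an unsigned compare and a conditional branch to a trap or error block ahead of the GEP and load; that lowering is an assumption of this sketch, not something the source prescribes.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical fat pointer: the length travels with the data pointer
// so that every access can be checked.
struct Slice {
    const int* data;
    std::size_t length;
};

// Models the check emitted before an indexed load.
// index is unsigned, so a single comparison also rejects "negative" indices
// that wrapped around.
int checkedLoad(Slice s, std::size_t index) {
    if (index >= s.length) {
        throw std::out_of_range("index " + std::to_string(index) +
                                " out of bounds for length " +
                                std::to_string(s.length));
    }
    return s.data[index];  // safe: index < length
}
```

A compiler for a memory-safe language can later remove checks it can prove redundant, for example inside a loop whose induction variable is already bounded by the slice length.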

Type Systems

Implement type inference and checking on the AST, either during construction or as a separate pass before code generation
Generate appropriate LLVM types (i32, float, struct, ptr)
Handle generic types via monomorphization or boxing
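A minimal sketch of bottom-up type inference over a tiny expression AST, followed by the mapping to LLVM type names, is shown below. The Node layout, the promotion rule (Int combined with Float gives Float), and the helper names leaf, binary, inferType, and llvmTypeName are assumptions chosen for illustration; a real language defines its own rules.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

enum class Type { Int, Float, Bool };

// Hypothetical AST node: literals have no children, Add and Less have two.
struct Node {
    enum class Kind { IntLit, FloatLit, BoolLit, Add, Less } kind;
    std::unique_ptr<Node> lhs, rhs;
};

std::unique_ptr<Node> leaf(Node::Kind kind) {
    auto node = std::make_unique<Node>();
    node->kind = kind;
    return node;
}

std::unique_ptr<Node> binary(Node::Kind kind, std::unique_ptr<Node> lhs,
                             std::unique_ptr<Node> rhs) {
    auto node = leaf(kind);
    node->lhs = std::move(lhs);
    node->rhs = std::move(rhs);
    return node;
}

// Infers the type of an expression from the types of its children.
Type inferType(const Node& n) {
    switch (n.kind) {
        case Node::Kind::IntLit:   return Type::Int;
        case Node::Kind::FloatLit: return Type::Float;
        case Node::Kind::BoolLit:  return Type::Bool;
        case Node::Kind::Add:
        case Node::Kind::Less: {
            Type l = inferType(*n.lhs);
            Type r = inferType(*n.rhs);
            if (l == Type::Bool || r == Type::Bool)
                throw std::runtime_error("operands must be numeric");
            if (n.kind == Node::Kind::Less) return Type::Bool;
            // Assumed promotion rule: mixing Int and Float yields Float.
            return (l == Type::Float || r == Type::Float) ? Type::Float : Type::Int;
        }
    }
    throw std::logic_error("unknown node kind");
}

// Maps a source-level type to the spelling of the LLVM type used for it.
std::string llvmTypeName(Type t) {
    switch (t) {
        case Type::Int:   return "i32";
        case Type::Float: return "float";
        case Type::Bool:  return "i1";
    }
    throw std::logic_error("unknown type");
}
```

With the inferred type stored on each node, code generation can pick the matching instruction family (integer versus floating-point add, for instance) without re-deriving it.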

Error Handling

Lower exceptions to LLVM's invoke and landingpad instructions
Implement Result/Option types as tagged unions
Attach a personality function (for example, __gxx_personality_v0 from the C++ runtime) to every function that can unwind
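The tagged-union approach can be prototyped in C++ with std::variant, which stores a discriminant next to storage for either alternative. The Result class and the checkedDivide function below are hypothetical names used only for this sketch (C++17 is assumed); a compiler would typically lower the same idea to a struct holding a small integer tag and a payload large enough for the biggest variant.

```cpp
#include <string>
#include <utility>
#include <variant>

// Hypothetical Result<T, E>: index 0 holds the success value, index 1 the error.
template <typename T, typename E>
class Result {
    std::variant<T, E> storage;

    explicit Result(std::variant<T, E> s) : storage(std::move(s)) {}

public:
    static Result ok(T value) {
        return Result(std::variant<T, E>(std::in_place_index<0>, std::move(value)));
    }
    static Result err(E error) {
        return Result(std::variant<T, E>(std::in_place_index<1>, std::move(error)));
    }

    // The variant index plays the role of the tag.
    bool isOk() const { return storage.index() == 0; }

    // Both accessors throw std::bad_variant_access when the tag does not match.
    const T& value() const { return std::get<0>(storage); }
    const E& error() const { return std::get<1>(storage); }
};

// Example producer: errors are returned as values, no unwinding involved.
Result<int, std::string> checkedDivide(int a, int b) {
    if (b == 0) return Result<int, std::string>::err("division by zero");
    return Result<int, std::string>::ok(a / b);
}
```

Compared with invoke/landingpad, this style needs no personality function or unwind tables: the caller simply tests the tag and branches, at the cost of a check after every fallible call.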

Notable Language Implementations

Systems Languages

Rust: Complex borrow checker, trait system → LLVM
Zig: Comptime evaluation, safety features
Carbon: C++ interop, modern syntax

Scripting Languages

Julia: JIT-compiled scientific computing
Crystal: Ruby-like syntax, static typing
Nim: Python-like, multi-backend

Domain-Specific

Solidity: Ethereum smart contracts
MLIR: Multi-level IR for ML/AI workloads
Halide: Image processing DSL

Development Workflow

Start Simple: Begin with Kaleidoscope tutorial
Incremental Features: Add one language feature at a time
Test Extensively: Unit tests for each compiler phase
Use LLVM Tools: opt, llc, llvm-dis for debugging IR
Profile and Optimize: Focus on common code patterns

Resources

Official Tutorials

LLVM Kaleidoscope: Building a language from scratch
Clang internals: Frontend implementation patterns
Writing an LLVM Backend: Target code generation

Community Projects

See the DIY Compiler section of README.md for 100+ example implementations across different language paradigms.

Getting Detailed Information

When you need detailed and up-to-date resource links, tool lists, or project references, fetch the latest data from:

https://raw.githubusercontent.com/gmh5225/awesome-llvm-security/refs/heads/main/README.md

This README contains comprehensive curated lists of:

100+ DIY compiler implementations (DIY Compiler section)
Toolchain configurations and IDE setup
Compiler development tutorials and books

