llm-training

Frameworks and techniques for training and finetuning large language models.


Framework Comparison

| Framework | Best For | Multi-GPU | Memory Efficient |
|---|---|---|---|
| Accelerate | Simple distributed training | Yes | Basic |
| DeepSpeed | Large models, ZeRO | Yes | Excellent |
| PyTorch Lightning | Clean training loops | Yes | Good |
| Ray Train | Scalable, multi-node | Yes | Good |
| TRL | RLHF, reward modeling | Yes | Good |
| Unsloth | Fast LoRA finetuning | Limited | Excellent |

Accelerate (HuggingFace)

Minimal wrapper for distributed training. Run `accelerate config` for interactive setup.

Key concept: wrap the model, optimizer, and dataloader with `accelerator.prepare()`, and call `accelerator.backward(loss)` in place of `loss.backward()`.

DeepSpeed (Large Models)

Microsoft's optimization library for training massive models.

ZeRO Stages:

  • Stage 1: Optimizer states partitioned across GPUs

  • Stage 2: + Gradients partitioned

  • Stage 3: + Parameters partitioned (for largest models, 100B+)
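The memory effect of the stages can be sketched with back-of-envelope arithmetic (assuming mixed-precision Adam: 2 bytes fp16 weights + 2 bytes fp16 gradients + 12 bytes fp32 optimizer states per parameter, ignoring activations):

```python
# Rough per-GPU memory for model states under each ZeRO stage.
# Byte counts follow the standard mixed-precision Adam accounting;
# activations and buffers are not included.
def zero_gb_per_gpu(n_params, stage, n_gpus):
    weights, grads, optim = 2.0, 2.0, 12.0  # bytes per parameter
    if stage >= 1:
        optim /= n_gpus    # Stage 1: partition optimizer states
    if stage >= 2:
        grads /= n_gpus    # Stage 2: also partition gradients
    if stage >= 3:
        weights /= n_gpus  # Stage 3: also partition parameters
    return n_params * (weights + grads + optim) / 1e9

# A 7B-parameter model on 8 GPUs:
print(zero_gb_per_gpu(7e9, stage=0, n_gpus=8))  # no ZeRO: 112 GB per GPU
print(zero_gb_per_gpu(7e9, stage=2, n_gpus=8))  # ZeRO-2: ~26 GB per GPU
print(zero_gb_per_gpu(7e9, stage=3, n_gpus=8))  # ZeRO-3: 14 GB per GPU
```

This is why a 7B model that cannot fit on a single 80 GB GPU with plain data parallelism becomes trainable at higher ZeRO stages.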

Key concept: configure via a JSON file; higher stages save more memory but add communication overhead.
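As a sketch, a minimal ZeRO-2 config might look like the following (shown as the Python dict equivalent of the JSON file; key names follow DeepSpeed's config schema, but the values here are arbitrary examples):

```python
# Illustrative DeepSpeed config; normally written as JSON and passed
# via the --deepspeed flag or the Trainer's deepspeed argument.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,  # partition optimizer states + gradients
        "offload_optimizer": {"device": "cpu"},  # optional CPU offload
    },
}
```

Switching to `"stage": 3` additionally partitions the parameters themselves, at the cost of more inter-GPU communication.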

TRL (RLHF/DPO)

HuggingFace library for reinforcement learning from human feedback.

Training types:

  • SFT (Supervised Finetuning): Standard instruction tuning

  • DPO (Direct Preference Optimization): Simpler than RLHF, uses preference pairs

  • PPO: Classic RLHF with reward model

Key concept: DPO is often preferred over PPO: it is simpler, needs no separate reward model, and trains directly on chosen/rejected response pairs.
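A toy numeric sketch of the DPO objective may help: the standard loss is -log σ(β·margin), where the margin is the difference in policy-vs-reference log-ratios between the chosen and rejected responses (the log-probabilities below are invented numbers, not from any real model):

```python
import math

# DPO loss on a single preference pair; all log-probs are made-up values.
def dpo_loss(chosen_logp, chosen_ref_logp,
             rejected_logp, rejected_ref_logp, beta=0.1):
    margin = (chosen_logp - chosen_ref_logp) - (rejected_logp - rejected_ref_logp)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))  # -log(sigmoid)

# Policy identical to reference: margin is 0, loss is ln 2.
print(dpo_loss(-12.0, -12.0, -15.0, -15.0))  # ≈ 0.693
# Policy favors the chosen response more than the reference does: lower loss.
print(dpo_loss(-10.0, -12.0, -17.0, -15.0))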

Unsloth (Fast LoRA)

Optimized LoRA finetuning: roughly 2x faster and around 60% less memory, per the project's own benchmarks.

Key concept: Drop-in replacement for standard LoRA with automatic optimizations. Best for 7B-13B models.
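The reason LoRA finetuning is so cheap is plain arithmetic (this is general LoRA math, not Unsloth-specific; the 4096×4096 layer shape below is illustrative):

```python
# LoRA factors the weight update of a d_out x d_in matrix as B @ A,
# with A: (rank, d_in) and B: (d_out, rank), so only rank * (d_in + d_out)
# parameters are trained per adapted matrix.
def lora_params(d_in, d_out, rank):
    return rank * (d_in + d_out)

full = 4096 * 4096                    # one full projection matrix
lora = lora_params(4096, 4096, 16)    # its rank-16 adapter
print(lora, lora / full)              # 131072 trainable params, < 1% of full
```

At rank 16 the adapter is under 1% of the matrix it adapts, which is why optimizer state and gradient memory shrink so dramatically.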

Memory Optimization Techniques

| Technique | Memory Savings | Trade-off |
|---|---|---|
| Gradient checkpointing | ~30-50% | Slower training |
| Mixed precision (fp16/bf16) | ~50% | Minor precision loss |
| 4-bit quantization (QLoRA) | ~75% | Some quality loss |
| Flash Attention | ~20-40% | Requires compatible GPU |
| Gradient accumulation | Effective batch ↑ | No memory cost |
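Gradient accumulation's effect is simple arithmetic (function name is illustrative): the optimizer steps once every N micro-batches, so the effective batch size grows without any extra activation memory.

```python
# Effective batch size under gradient accumulation: the optimizer sees
# the summed gradients of accum_steps micro-batches on each of n_gpus.
def effective_batch_size(micro_batch, accum_steps, n_gpus=1):
    return micro_batch * accum_steps * n_gpus

print(effective_batch_size(4, 8, n_gpus=2))  # → 64
```

The trade-off is wall-clock time, not memory: each optimizer step now requires `accum_steps` forward/backward passes.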

Decision Guide

| Scenario | Recommendation |
|---|---|
| Simple finetuning | Accelerate + PEFT |
| 7B-13B models | Unsloth (fastest) |
| 70B+ models | DeepSpeed ZeRO-3 |
| RLHF/DPO alignment | TRL |
| Multi-node cluster | Ray Train |
| Clean code structure | PyTorch Lightning |

