AMD/ROCm Docker Environment Probe
Run this before writing any optimization code. AMD's ROCm Docker images silently set framework defaults that differ from stock PyTorch. These hidden defaults cause stalls, crashes, and wrong results that cannot be diagnosed by reading the code alone.
Why This Exists
Problem: ROCm Docker images override PyTorch/Triton defaults at the system level. For example,
max_autotune=True as a global default means torch.compile(mode="default") benchmarks every
GEMM across ATEN+TRITON+CPP backends. With hundreds of matmuls in a compiled graph, autotuning
never finishes — the process hangs indefinitely with no error message.
These defaults are invisible to pip list, rocm-smi, or any surface-level inspection. You have
to introspect the framework config objects at runtime to see them.
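A minimal sketch of that kind of runtime introspection, assuming nothing about how the probe itself is written: `read_setting` and `snapshot_inductor_settings` are hypothetical helper names, and the attribute paths are the ones listed under Category 2 below. Missing attributes are reported rather than raised, since config names can differ between PyTorch builds.

```python
from types import SimpleNamespace


def read_setting(root, dotted_path, default="<missing>"):
    """Walk a dotted attribute path; return `default` if any hop is absent."""
    current = root
    for name in dotted_path.split("."):
        try:
            current = getattr(current, name)
        except AttributeError:
            return default
    return current


def snapshot_inductor_settings():
    """Read a few inductor settings; returns {} when PyTorch is not importable."""
    try:
        import torch._inductor.config as inductor_config
    except ImportError:
        return {}
    paths = [
        "max_autotune",
        "max_autotune_gemm_backends",
        "triton.cudagraphs",
        "triton.cudagraph_trees",
        "memory_planning",
    ]
    return {path: read_setting(inductor_config, path) for path in paths}


# Stand-in object so the helper can be exercised without PyTorch installed.
fake_config = SimpleNamespace(
    max_autotune=True, triton=SimpleNamespace(cudagraphs=False)
)
print(read_setting(fake_config, "triton.cudagraphs"))       # False
print(read_setting(fake_config, "triton.cudagraph_trees"))  # <missing>
```

Inside the container, `snapshot_inductor_settings()` would return the values the image actually ships with, which can then be compared against what stock PyTorch is expected to use.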
How to Use
Step 1: Run the probe script
python /path/to/env_probe.py
Or, if the skill is installed as a Claude Code command, copy the probe script from references/env_probe.py and run it inside your Docker container.
The probe script is self-contained: it has no dependencies beyond PyTorch, which your Docker image already includes.
Step 2: Read the output
The probe outputs a structured report with three severity levels:
| Level | Meaning | Action |
|---|---|---|
| CRITICAL | Will cause hangs, crashes, or silent wrong results | Must fix before proceeding |
| WARNING | Suboptimal default, will hurt performance | Fix before benchmarking |
| INFO | Informational, no action needed | Document for reproducibility |
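One way to model these levels in code is sketched below. The `Severity`, `Finding`, `format_report`, and `has_critical` names, and the one-line output format, are illustrative assumptions; the real probe's report layout may differ.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Lower value sorts first, so CRITICAL items lead the report.
    CRITICAL = 0
    WARNING = 1
    INFO = 2


@dataclass(frozen=True)
class Finding:
    severity: Severity
    name: str
    observed: str
    fix: str = ""  # recommended fix; empty for INFO items


def format_report(findings):
    """Render findings one per line, most severe first."""
    ordered = sorted(findings, key=lambda f: (f.severity, f.name))
    lines = []
    for f in ordered:
        line = f"[{f.severity.name}] {f.name} = {f.observed}"
        if f.fix:
            line += f" | fix: {f.fix}"
        lines.append(line)
    return "\n".join(lines)


def has_critical(findings):
    """True when at least one finding must be fixed before proceeding."""
    return any(f.severity == Severity.CRITICAL for f in findings)


example_findings = [
    Finding(Severity.INFO, "python", "3.10"),
    Finding(Severity.CRITICAL, "max_autotune", "True", "set to False"),
]
print(format_report(example_findings))
```

Sorting by severity keeps the items that block further work at the top, so a reader can stop at the first INFO line.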
Step 3: Apply fixes
Each CRITICAL/WARNING item includes a recommended fix — either a Python config line or an
environment variable to set. Apply these fixes at the top of your script, before any
torch.compile() or torch.cuda.CUDAGraph() call.
What the Probe Checks
Category 1: Surface Facts (versions, hardware)
- Python version, PyTorch version, Triton version
- ROCm version, GPU architecture (gfx target)
- AITER, Composable Kernel, flash-attn availability and versions
- hipBLASLt availability
Category 2: Runtime Behavior Defaults (the hidden landmines)
- torch._inductor.config.max_autotune: if True, causes indefinite stall with torch.compile
- torch._inductor.config.max_autotune_gemm_backends: which backends inductor will benchmark
- torch._inductor.config.triton.cudagraphs: unstable on ROCm
- torch._inductor.config.triton.cudagraph_trees: unstable on ROCm
- torch._inductor.config.memory_planning: causes deep recursion crash on ROCm
- torch._dynamo.config.cache_size_limit: too small causes recompilation loops
- torch.backends.cudnn.benchmark and allow_tf32 defaults
Category 3: Known Bug Markers
- hipBLASLt solver discovery (HIPBLAS_STATUS_NOT_INITIALIZED)
- FP8 flash attention availability
- gfx950/gfx942 ASM GEMM kernel availability
- AITER function signatures (argument combos that were broken in older versions)
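A signature check of this kind can be sketched with the standard `inspect` module. The helper name `accepts_keywords` and the sample functions are hypothetical, and no real AITER function is named here. This only detects whether a parameter exists; it cannot tell whether a given argument combination works at runtime.

```python
import inspect


def accepts_keywords(func, keywords):
    """Return True/False if `func` accepts every keyword, None if unknown."""
    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):
        # Some compiled extension functions expose no introspectable signature.
        return None
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return True  # **kwargs accepts any keyword name
    allowed = {
        name
        for name, p in params.items()
        if p.kind
        in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return set(keywords) <= allowed


# Hypothetical stand-ins for a library function in two different versions.
def attention_old(q, k, v):
    return None


def attention_new(q, k, v, *, causal=False):
    return None


print(accepts_keywords(attention_old, ["causal"]))  # False
print(accepts_keywords(attention_new, ["causal"]))  # True
```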
Category 4: Environment Variables
- HIP_VISIBLE_DEVICES, ROCR_VISIBLE_DEVICES
- HSA_ENABLE_SDMA, HIP_FORCE_DEV_KERNARG
- PYTORCH_TUNABLEOP_ENABLED, PYTORCH_TUNABLEOP_TUNING
- TORCH_COMPILE_DEBUG, TORCHINDUCTOR_* overrides
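Collecting these variables might look like the sketch below. `WATCHED_VARS` and `snapshot_env` are hypothetical names. Unset variables are recorded as `None` so the report distinguishes "unset" from "set to an empty string".

```python
import os

# Variables named in this category; TORCHINDUCTOR_* is matched by prefix.
WATCHED_VARS = (
    "HIP_VISIBLE_DEVICES",
    "ROCR_VISIBLE_DEVICES",
    "HSA_ENABLE_SDMA",
    "HIP_FORCE_DEV_KERNARG",
    "PYTORCH_TUNABLEOP_ENABLED",
    "PYTORCH_TUNABLEOP_TUNING",
    "TORCH_COMPILE_DEBUG",
)


def snapshot_env(environ=None):
    """Return watched variables (None if unset) plus any TORCHINDUCTOR_* ones."""
    env = os.environ if environ is None else environ
    report = {name: env.get(name) for name in WATCHED_VARS}
    for name in sorted(env):
        if name.startswith("TORCHINDUCTOR_"):
            report[name] = env[name]
    return report


# Passing a plain dict makes the helper easy to try outside the container.
sample_env = {
    "HSA_ENABLE_SDMA": "0",
    "TORCHINDUCTOR_MAX_AUTOTUNE": "1",
    "PATH": "/usr/bin",
}
for key, value in snapshot_env(sample_env).items():
    print(f"{key}={value}")
```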
Recommended Inductor Configuration for ROCm
When the probe flags inductor defaults as CRITICAL, apply this configuration block before any
torch.compile() call:
import torch._inductor.config as inductor_config
import torch._dynamo.config as dynamo_config
# Prevent indefinite GEMM autotuning stall
inductor_config.max_autotune = False
inductor_config.max_autotune_gemm_backends = "ATEN"
# Disable unstable triton cudagraphs on ROCm
inductor_config.triton.cudagraphs = False
inductor_config.triton.cudagraph_trees = False
# Prevent deep recursion crash
inductor_config.memory_planning = False
# Prevent cache eviction / recompilation loops
dynamo_config.cache_size_limit = 128
See references/inductor-rocm-defaults.md for the full explanation of each setting and when you might want to override them.
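If you want a record of what the image had set before you changed it, the same overrides can be applied through a small helper that returns the previous values. This is a sketch under stated assumptions: `apply_overrides` and `ROCM_SAFE_OVERRIDES` are hypothetical names, and the stand-in object below only mimics the shape of the inductor config module.

```python
from types import SimpleNamespace

# Same inductor values as the configuration block above, keyed by dotted path.
ROCM_SAFE_OVERRIDES = {
    "max_autotune": False,
    "max_autotune_gemm_backends": "ATEN",
    "triton.cudagraphs": False,
    "triton.cudagraph_trees": False,
    "memory_planning": False,
}


def apply_overrides(config, overrides):
    """Set each dotted-path override on `config`; return the replaced values."""
    previous = {}
    for dotted_path, value in overrides.items():
        *parents, leaf = dotted_path.split(".")
        target = config
        for name in parents:
            target = getattr(target, name)
        # Reading first means a misspelled path raises AttributeError
        # instead of silently creating a new attribute.
        previous[dotted_path] = getattr(target, leaf)
        setattr(target, leaf, value)
    return previous


# Stand-in config so the helper can be tried without PyTorch installed.
fake_config = SimpleNamespace(
    max_autotune=True,
    max_autotune_gemm_backends="ATEN,TRITON,CPP",
    memory_planning=True,
    triton=SimpleNamespace(cudagraphs=True, cudagraph_trees=True),
)
image_defaults = apply_overrides(fake_config, ROCM_SAFE_OVERRIDES)
print(image_defaults["max_autotune"])  # True
print(fake_config.max_autotune)        # False
```

In a real script the first argument would be the imported `torch._inductor.config` module, and the returned dictionary can be logged for reproducibility.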
Integration with Other Skills
- amd-rocm-porting: Run env-probe as Phase 0.5 (after Phase 0 environment setup, before Phase 1 porting)
- amd-inference-optimization: Run env-probe before Phase 0 profiling baseline
- rocprofv3-profiler: Probe checks that rocprofv3 is available and functional
Adding New Checks
When you discover a new Docker-specific gotcha, add it to references/env_probe.py:
- Add the check function
- Add it to the appropriate category in run_all_checks()
- Include the severity level (CRITICAL/WARNING/INFO) and recommended fix
- Document the failure mode (what happens if the agent doesn't know about this)
This skill is meant to grow — every experiment that hits an environment issue should contribute a new check back to the probe.
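The steps above could be laid out as follows. This is a hypothetical structure, not the actual contents of references/env_probe.py: the `CHECKS` registry, the `register_check` decorator, the dict-shaped result, and this version of `run_all_checks()` are all assumptions made for illustration.

```python
import platform

CHECKS = {}  # category name -> list of zero-argument check functions


def register_check(category):
    """Decorator that files a check function under a category."""
    def decorator(func):
        CHECKS.setdefault(category, []).append(func)
        return func
    return decorator


@register_check("surface_facts")
def check_python_version():
    """Failure mode if unknown: results cannot be reproduced on another image."""
    return {
        "name": "python_version",
        "severity": "INFO",
        "observed": platform.python_version(),
        "fix": "",
    }


def run_all_checks():
    """Run every registered check; a crashing check becomes a WARNING entry."""
    results = []
    for category, checks in CHECKS.items():
        for check in checks:
            try:
                result = check()
            except Exception as exc:  # keep probing even if one check fails
                result = {
                    "name": check.__name__,
                    "severity": "WARNING",
                    "observed": f"check raised {exc!r}",
                    "fix": "",
                }
            results.append({"category": category, **result})
    return results


for entry in run_all_checks():
    print(entry["category"], entry["severity"], entry["name"], entry["observed"])
```

Keeping the failure mode in the docstring puts the documentation next to the check it describes, which makes it harder for the two to drift apart.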