Metal GPU Code Skill
Write production-quality Metal code with correct patterns, optimal performance, and clear explanations.
When to Read References
For detailed API topology, Metal 4 specifics, and Apple Silicon optimization patterns, read:
/mnt/skills/user/metal-gpu/references/metal-api-guide.md
Core Principles
- Always start with the device: MTLCreateSystemDefaultDevice() returns the GPU every Metal workflow begins with (or nil if none is available)
- Command pattern: Device → Command Queue → Command Buffer → Command Encoder → endEncoding() → Commit
- Shaders are MSL (Metal Shading Language): C++14-based, with Metal-specific types and attributes
- Resource management matters: use appropriate storage modes and avoid unnecessary copies
- Triple buffering for render loops keeps the CPU and GPU working in parallel
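The first principle and the "handle nil" rule can be combined in a small setup helper. It turns the optional results of device and queue creation into descriptive errors instead of force-unwrapping them. This is a minimal sketch that assumes a Mac with a Metal-capable GPU. `MetalSetupError` and `makeDeviceAndQueue()` are hypothetical names, not Metal API.

```swift
import Metal

// Hypothetical error type for this sketch; not part of Metal.
enum MetalSetupError: Error {
    case noDevice        // no Metal-capable GPU (e.g. some VMs or CI hosts)
    case noCommandQueue
}

// Acquire the default GPU and one long-lived command queue.
// The queue is meant to be created once and reused for the app's lifetime.
func makeDeviceAndQueue() throws -> (MTLDevice, MTLCommandQueue) {
    guard let device = MTLCreateSystemDefaultDevice() else {
        throw MetalSetupError.noDevice
    }
    guard let queue = device.makeCommandQueue() else {
        throw MetalSetupError.noCommandQueue
    }
    return (device, queue)
}
```

Throwing instead of returning nil lets the caller show a meaningful message, for example on a machine without a supported GPU.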
Quick Reference: Metal Command Pipeline
MTLDevice
 └─ makeCommandQueue() → MTLCommandQueue
     └─ makeCommandBuffer() → MTLCommandBuffer
         ├─ makeRenderCommandEncoder(descriptor:) → MTLRenderCommandEncoder
         ├─ makeComputeCommandEncoder() → MTLComputeCommandEncoder
         └─ makeBlitCommandEncoder() → MTLBlitCommandEncoder
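The chain in the tree can be exercised end to end with a blit encoder, which needs no shaders. This minimal sketch assumes a Mac with a Metal-capable GPU. `copyWithBlit` is a hypothetical helper. It blocks with `waitUntilCompleted()` only because it is a one-shot tool rather than a render loop.

```swift
import Metal

// Walks Device → Queue → Command Buffer → Encoder → endEncoding → Commit,
// using a blit encoder to copy one buffer into another.
func copyWithBlit(_ values: [Float]) -> [Float]? {
    guard !values.isEmpty,
          let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue() else { return nil }

    let length = values.count * MemoryLayout<Float>.stride
    // .storageModeShared: the CPU and GPU see the same memory.
    guard let source = device.makeBuffer(bytes: values, length: length, options: .storageModeShared),
          let destination = device.makeBuffer(length: length, options: .storageModeShared),
          let commandBuffer = queue.makeCommandBuffer(),
          let blit = commandBuffer.makeBlitCommandEncoder() else { return nil }

    blit.copy(from: source, sourceOffset: 0, to: destination, destinationOffset: 0, size: length)
    blit.endEncoding()              // must end encoding before commit
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()  // acceptable in a one-shot tool, not in a render loop

    let pointer = destination.contents().bindMemory(to: Float.self, capacity: values.count)
    return Array(UnsafeBufferPointer(start: pointer, count: values.count))
}
```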
Writing Shaders (MSL)
Use Metal Shading Language. Always include:
- #include <metal_stdlib> and using namespace metal;
- Correct attribute qualifiers: [[vertex_id]], [[position]], [[stage_in]], [[attribute(n)]], [[buffer(n)]], [[texture(n)]], [[sampler(n)]], [[thread_position_in_grid]]
- Proper address space qualifiers on pointer and reference arguments: device, constant, threadgroup, thread
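These rules can be checked without an Xcode project by compiling MSL source at runtime with `makeLibrary(source:options:)`. This sketch assumes a Mac with a Metal device. It compiles a small kernel that uses the required header and namespace, `[[buffer(n)]]` and `[[thread_position_in_grid]]` attributes, and explicit `device`/`constant` address spaces. The kernel name `scale` and the helper `compileScaleLibrary` are illustrative.

```swift
import Metal

// MSL source as a string: required header and namespace,
// [[buffer(n)]] bindings, and explicit address spaces.
let scaleSource = """
#include <metal_stdlib>
using namespace metal;

kernel void scale(const device float *input  [[buffer(0)]],
                  device float       *output [[buffer(1)]],
                  constant float     &factor [[buffer(2)]],
                  uint id [[thread_position_in_grid]])
{
    output[id] = input[id] * factor;
}
"""

// A syntax error surfaces as a thrown error that carries the compiler log.
func compileScaleLibrary(device: MTLDevice) throws -> MTLLibrary {
    try device.makeLibrary(source: scaleSource, options: nil)
}
```

Runtime compilation is convenient for experiments and tests. Shipping apps usually precompile shaders into the default library.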
Vertex Shader Pattern
#include <metal_stdlib>
using namespace metal;

struct VertexIn {
    float3 position [[attribute(0)]];
    float3 normal   [[attribute(1)]];
    float2 texCoord [[attribute(2)]];
};

struct VertexOut {
    float4 position [[position]];
    float3 normal;
    float2 texCoord;
};

vertex VertexOut vertex_main(VertexIn in [[stage_in]],
                             constant float4x4 &mvp [[buffer(1)]]) {
    VertexOut out;
    out.position = mvp * float4(in.position, 1.0);
    out.normal = in.normal;
    out.texCoord = in.texCoord;
    return out;
}
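The `[[stage_in]]` struct is filled according to the vertex descriptor, so the Swift-side vertex data must use the same byte offsets. This sketch shows one interleaved layout, assuming tightly packed 32-bit floats (3 + 3 + 2 = 8 floats per vertex). `PackedVertex` is a hypothetical name. Swift does not formally guarantee struct layout, but in practice fields are placed in declaration order. The sketch also shows why `SIMD3<Float>` is a poor fit for packed vertex data: Swift pads it to 16 bytes.

```swift
// Hypothetical interleaved vertex matching VertexIn's three attributes.
// Float tuples have 4-byte alignment, so the fields sit back to back:
// position at 0, normal at 12, texCoord at 24 (32 bytes per vertex).
struct PackedVertex {
    var position: (Float, Float, Float)   // attribute(0), .float3
    var normal: (Float, Float, Float)     // attribute(1), .float3
    var texCoord: (Float, Float)          // attribute(2), .float2
}

let positionOffset = MemoryLayout<PackedVertex>.offset(of: \PackedVertex.position)!
let normalOffset = MemoryLayout<PackedVertex>.offset(of: \PackedVertex.normal)!
let texCoordOffset = MemoryLayout<PackedVertex>.offset(of: \PackedVertex.texCoord)!
let vertexStride = MemoryLayout<PackedVertex>.stride

// SIMD3<Float> occupies 16 bytes (12 bytes of data + 4 of padding),
// so using it for position and normal would shift every offset.
let simd3Size = MemoryLayout<SIMD3<Float>>.size
```

These values are the ones a vertex descriptor for this layout would use: offsets 0, 12 and 24 bytes, and a stride of 32.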
Fragment Shader Pattern
fragment float4 fragment_main(VertexOut in [[stage_in]],
                              texture2d<float> albedo [[texture(0)]],
                              sampler texSampler [[sampler(0)]]) {
    float4 color = albedo.sample(texSampler, in.texCoord);
    return color;
}
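On the Swift side, the fragment shader's `[[texture(0)]]` and `[[sampler(0)]]` slots are filled with `setFragmentTexture(_:index:)` and `setFragmentSamplerState(_:index:)`. This sketch builds a bilinear, repeating sampler once at load time. `makeLinearRepeatSampler` and `bindAlbedo` are hypothetical helper names.

```swift
import Metal

// Build the sampler once and reuse it; sampler states are immutable.
func makeLinearRepeatSampler(device: MTLDevice) -> MTLSamplerState? {
    let descriptor = MTLSamplerDescriptor()
    descriptor.minFilter = .linear
    descriptor.magFilter = .linear
    descriptor.mipFilter = .linear        // blend between mip levels if the texture has them
    descriptor.sAddressMode = .repeat     // wrap texture coordinates outside [0, 1]
    descriptor.tAddressMode = .repeat
    return device.makeSamplerState(descriptor: descriptor)
}

// Binding inside a render pass; indices must match the MSL attributes.
func bindAlbedo(_ texture: MTLTexture, sampler: MTLSamplerState,
                on encoder: MTLRenderCommandEncoder) {
    encoder.setFragmentTexture(texture, index: 0)        // [[texture(0)]]
    encoder.setFragmentSamplerState(sampler, index: 0)   // [[sampler(0)]]
}
```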
Compute Kernel Pattern
kernel void compute_main(const device float *input  [[buffer(0)]],
                         device float       *output [[buffer(1)]],
                         uint id [[thread_position_in_grid]]) {
    // One thread per element; assumes the grid size equals the element count
    output[id] = input[id] * 2.0f;
}
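`compute_main` assumes exactly one thread per element. That holds for `dispatchThreads`. With `dispatchThreadgroups`, however, the grid is rounded up to whole threadgroups, so extra threads run past the end of the buffer unless the kernel checks `id` against the element count. The arithmetic is shown below as a small pure-Swift sketch. `threadgroupCount` and `surplusThreads` are hypothetical helpers.

```swift
// Number of threadgroups needed to cover `elementCount` threads when each
// threadgroup has `threadsPerGroup` threads (ceiling division).
func threadgroupCount(elementCount: Int, threadsPerGroup: Int) -> Int {
    precondition(threadsPerGroup > 0, "threadgroup width must be positive")
    return (elementCount + threadsPerGroup - 1) / threadsPerGroup
}

// Threads launched beyond the last element. With dispatchThreadgroups the
// kernel must skip these, e.g. with `if (id >= count) return;`.
func surplusThreads(elementCount: Int, threadsPerGroup: Int) -> Int {
    threadgroupCount(elementCount: elementCount, threadsPerGroup: threadsPerGroup)
        * threadsPerGroup - elementCount
}
```

For example, 1000 elements in groups of 256 need 4 threadgroups, which launch 1024 threads, 24 of them surplus.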
Swift-Side Setup Patterns
Render Pipeline Setup
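import Metal

let device = MTLCreateSystemDefaultDevice()!
let commandQueue = device.makeCommandQueue()!

// Load shaders compiled into the app's default library
let library = device.makeDefaultLibrary()!
let vertexFunction = library.makeFunction(name: "vertex_main")
let fragmentFunction = library.makeFunction(name: "fragment_main")

// Pipeline descriptor
let pipelineDescriptor = MTLRenderPipelineDescriptor()
pipelineDescriptor.vertexFunction = vertexFunction
pipelineDescriptor.fragmentFunction = fragmentFunction
pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm

// Vertex descriptor: must describe every [[attribute(n)]] in VertexIn.
// Assumes one interleaved buffer of tightly packed floats at buffer index 0.
let floatSize = MemoryLayout<Float>.stride
let vertexDescriptor = MTLVertexDescriptor()
vertexDescriptor.attributes[0].format = .float3 // position
vertexDescriptor.attributes[0].offset = 0
vertexDescriptor.attributes[0].bufferIndex = 0
vertexDescriptor.attributes[1].format = .float3 // normal
vertexDescriptor.attributes[1].offset = 3 * floatSize
vertexDescriptor.attributes[1].bufferIndex = 0
vertexDescriptor.attributes[2].format = .float2 // texCoord
vertexDescriptor.attributes[2].offset = 6 * floatSize
vertexDescriptor.attributes[2].bufferIndex = 0
vertexDescriptor.layouts[0].stride = 8 * floatSize
pipelineDescriptor.vertexDescriptor = vertexDescriptor

let pipelineState = try! device.makeRenderPipelineState(descriptor: pipelineDescriptor)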
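`try!` crashes with little context when a shader name or vertex descriptor is wrong. The sketch below reports the error instead. It assumes shaders compiled from a source string, so it runs outside an app bundle, where `makeDefaultLibrary()` may return nil. The tiny shaders `passthrough_vertex` and `solid_fragment` and the helper `makePassthroughPipeline` are hypothetical. No vertex descriptor is needed because the vertex function reads positions by `[[vertex_id]]` rather than through `[[stage_in]]`.

```swift
import Metal

let passthroughSource = """
#include <metal_stdlib>
using namespace metal;

struct VOut { float4 position [[position]]; };

vertex VOut passthrough_vertex(uint vid [[vertex_id]],
                               const device float4 *positions [[buffer(0)]]) {
    VOut out;
    out.position = positions[vid];
    return out;
}

fragment float4 solid_fragment() {
    return float4(1.0, 0.0, 0.0, 1.0);
}
"""

func makePassthroughPipeline(device: MTLDevice) -> MTLRenderPipelineState? {
    do {
        let library = try device.makeLibrary(source: passthroughSource, options: nil)
        let descriptor = MTLRenderPipelineDescriptor()
        descriptor.vertexFunction = library.makeFunction(name: "passthrough_vertex")
        descriptor.fragmentFunction = library.makeFunction(name: "solid_fragment")
        descriptor.colorAttachments[0].pixelFormat = .bgra8Unorm  // must match the render target
        // Expensive: build once at load time and keep the state object.
        return try device.makeRenderPipelineState(descriptor: descriptor)
    } catch {
        print("Pipeline creation failed: \(error)")
        return nil
    }
}
```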
Compute Pipeline Setup
// Assumes `device`, `commandQueue` and `library` from the render setup,
// plus existing inputBuffer, outputBuffer and elementCount (> 0)
let computeFunction = library.makeFunction(name: "compute_main")!
let computePipeline = try! device.makeComputePipelineState(function: computeFunction)

let commandBuffer = commandQueue.makeCommandBuffer()!
let encoder = commandBuffer.makeComputeCommandEncoder()!
encoder.setComputePipelineState(computePipeline)
encoder.setBuffer(inputBuffer, offset: 0, index: 0)   // [[buffer(0)]]
encoder.setBuffer(outputBuffer, offset: 0, index: 1)  // [[buffer(1)]]

// dispatchThreads lets Metal shrink the last threadgroup to fit the grid
// (requires GPUs that support non-uniform threadgroups)
let gridSize = MTLSize(width: elementCount, height: 1, depth: 1)
let threadGroupSize = MTLSize(
    width: min(computePipeline.maxTotalThreadsPerThreadgroup, elementCount),
    height: 1,
    depth: 1
)
encoder.dispatchThreads(gridSize, threadsPerThreadgroup: threadGroupSize)
encoder.endEncoding()
commandBuffer.commit()
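The compute snippet commits the work but never reads the result. Below is a self-contained sketch of the same dispatch. It delivers the output from `addCompletedHandler` instead of blocking, and it assumes `.storageModeShared` buffers so the CPU can read them directly. `doubleOnGPU` is a hypothetical helper, and the kernel source is an inline copy of `compute_main`.

```swift
import Metal

let doubleSource = """
#include <metal_stdlib>
using namespace metal;

kernel void compute_main(const device float *input [[buffer(0)]],
                         device float *output [[buffer(1)]],
                         uint id [[thread_position_in_grid]]) {
    output[id] = input[id] * 2.0f;
}
"""

// Encodes and commits the kernel, then delivers results asynchronously.
// Returns false if any setup step failed (nothing was committed).
func doubleOnGPU(_ values: [Float], completion: @escaping ([Float]) -> Void) -> Bool {
    guard !values.isEmpty,
          let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue(),
          let library = try? device.makeLibrary(source: doubleSource, options: nil),
          let function = library.makeFunction(name: "compute_main"),
          let pipeline = try? device.makeComputePipelineState(function: function) else { return false }

    let length = values.count * MemoryLayout<Float>.stride
    guard let input = device.makeBuffer(bytes: values, length: length, options: .storageModeShared),
          let output = device.makeBuffer(length: length, options: .storageModeShared),
          let commandBuffer = queue.makeCommandBuffer(),
          let encoder = commandBuffer.makeComputeCommandEncoder() else { return false }

    encoder.setComputePipelineState(pipeline)
    encoder.setBuffer(input, offset: 0, index: 0)
    encoder.setBuffer(output, offset: 0, index: 1)
    let width = min(pipeline.maxTotalThreadsPerThreadgroup, values.count)
    encoder.dispatchThreads(MTLSize(width: values.count, height: 1, depth: 1),
                            threadsPerThreadgroup: MTLSize(width: width, height: 1, depth: 1))
    encoder.endEncoding()

    // Runs on a Metal-owned thread once the GPU finishes; the caller never waits.
    commandBuffer.addCompletedHandler { finished in
        guard finished.status == .completed else { completion([]); return }
        let pointer = output.contents().bindMemory(to: Float.self, capacity: values.count)
        completion(Array(UnsafeBufferPointer(start: pointer, count: values.count)))
    }
    commandBuffer.commit()
    return true
}
```

In an app, the completion closure would typically hop back to the main queue before touching UI state.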
MetalKit View Rendering
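import MetalKit

class Renderer: NSObject, MTKViewDelegate {
    let device: MTLDevice
    let commandQueue: MTLCommandQueue
    let pipelineState: MTLRenderPipelineState

    init(device: MTLDevice, pipelineState: MTLRenderPipelineState) {
        self.device = device
        self.commandQueue = device.makeCommandQueue()!
        self.pipelineState = pipelineState
        super.init()
    }

    // Required by MTKViewDelegate; update size-dependent state (e.g. projection) here
    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}

    func draw(in view: MTKView) {
        guard let drawable = view.currentDrawable,
              let descriptor = view.currentRenderPassDescriptor,
              let commandBuffer = commandQueue.makeCommandBuffer(),
              let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor) else { return }
        encoder.setRenderPipelineState(pipelineState)
        // Set buffers, draw primitives...
        encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3)
        encoder.endEncoding()
        commandBuffer.present(drawable)
        commandBuffer.commit()
    }
}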
Performance Best Practices
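- Storage modes: use .shared on Apple Silicon (unified memory), .private for GPU-only data, and .managed for CPU-written resources on Intel Macs with discrete GPUs
- Triple buffering: rotate 3 per-frame buffers with a semaphore so the CPU never overwrites data the GPU is still reading
- Avoid per-frame allocations: reuse buffers, textures, and pipeline states. Command buffers and command encoders are transient, single-use objects, so create new ones each frame (they are cheap to create)
- Use dispatchThreads over dispatchThreadgroups when possible; it requires non-uniform threadgroup support, which all Apple Silicon GPUs have
- Design for the tile-based deferred rendering (TBDR) architecture of Apple GPUs: use imageblocks and tile shaders, and make intermediate attachments memoryless where possible
- Compile pipelines ahead of time: pipeline creation is expensive, so do it at load time rather than per frame
- Use Metal GPU frame capture in Xcode to profile and debug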
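The triple-buffering rule reduces to a ring of per-frame resources guarded by a counting semaphore. This is a minimal pure-Swift sketch in which `FrameRing` is a hypothetical name. In a renderer, the resources would be per-frame uniform `MTLBuffer`s.

```swift
import Dispatch

// Hypothetical helper: hands out one of N per-frame resources and blocks the
// CPU only when all N are still in use by the GPU.
final class FrameRing<Resource> {
    private let resources: [Resource]
    private let inFlight: DispatchSemaphore
    private var nextIndex = 0

    init(resources: [Resource]) {
        precondition(!resources.isEmpty, "need at least one resource")
        self.resources = resources
        self.inFlight = DispatchSemaphore(value: resources.count)
    }

    // Call at the start of a frame on the CPU; waits if the GPU owns every slot.
    func beginFrame() -> Resource {
        inFlight.wait()
        let resource = resources[nextIndex]
        nextIndex = (nextIndex + 1) % resources.count
        return resource
    }

    // Call from the frame's command buffer completed handler.
    func endFrame() {
        inFlight.signal()
    }
}
```

Usage follows three steps. Call `beginFrame()` at the top of `draw(in:)` and write that frame's data into the returned buffer. Register `addCompletedHandler { _ in ring.endFrame() }` before `commit()`. Make sure every `beginFrame()` is matched by an `endFrame()` before the ring is released.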
Common Mistakes to Avoid
- Forgetting encoder.endEncoding() before committing
- Mismatched buffer indices between Swift and MSL
- Using the wrong pixel format for render targets (the pipeline's color attachment formats must match the render pass attachments)
- Not handling nil from optional Metal API calls
- Blocking the main thread with waitUntilCompleted() while waiting for the GPU; use addCompletedHandler instead
- Forgetting to set the vertex descriptor when the vertex function takes [[stage_in]] input
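One way to avoid mismatched buffer indices is a single source of truth for binding slots. In Xcode projects this is commonly a C header shared between Swift (via a bridging header) and MSL. When shaders are compiled from source at runtime, a Swift enum can emit the `#define`s instead. This sketch shows the latter approach; `BufferIndex` and the `BUFFER_INDEX_*` macro names are hypothetical.

```swift
// Hypothetical single source of truth for buffer binding slots.
enum BufferIndex: Int, CaseIterable {
    case vertices = 0   // [[buffer(0)]]: interleaved vertex data
    case uniforms = 1   // [[buffer(1)]]: the mvp matrix in vertex_main

    // Macro name used inside MSL source, e.g. BUFFER_INDEX_UNIFORMS.
    var macroName: String { "BUFFER_INDEX_\(String(describing: self).uppercased())" }
}

// Prepend this to MSL source and write [[buffer(BUFFER_INDEX_UNIFORMS)]] there;
// on the Swift side pass BufferIndex.uniforms.rawValue as the index argument.
func bufferIndexDefines() -> String {
    BufferIndex.allCases
        .map { "#define \($0.macroName) \($0.rawValue)" }
        .joined(separator: "\n")
}
```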
Metal 4 Notes
Metal 4 introduces a modernized core API. Key changes:
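- New compilation API (MTL4Compiler) for finer control over when and how shaders compile
- Updated command encoding patterns, including command buffers whose memory is managed explicitly through command allocators (MTL4CommandAllocator)
- See references/metal-api-guide.md for the full Metal 4 API topology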
Frameworks Ecosystem
- Metal: direct GPU access, shaders, pipelines
- MetalKit: view management (MTKView), texture loading, Model I/O integration
- MetalFX: upscaling (temporal/spatial) for performance
- Metal Performance Shaders: optimized compute and image-processing kernels
- Compositor Services: stereoscopic rendering for visionOS
- RealityKit: high-level 3D rendering (built on Metal)