metal-gpu

Write production-quality Metal code with correct patterns, optimal performance, and clear explanations.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "metal-gpu" with this command: npx skills add ios-agent/iosagent.dev/ios-agent-iosagent-dev-metal-gpu

Metal GPU Code Skill

When to Read References

For detailed API topology, Metal 4 specifics, and Apple Silicon optimization patterns, read:

/mnt/skills/user/metal-gpu/references/metal-api-guide.md

Core Principles

  • Always start with the device: MTLCreateSystemDefaultDevice() — every Metal workflow begins here

  • Command pattern: Device → Command Queue → Command Buffer → Command Encoder → Commit

  • Shaders are MSL (Metal Shading Language): C++14-based, with Metal-specific types and attributes

  • Resource management matters: Use appropriate storage modes, avoid unnecessary copies

  • Triple buffering for render loops to keep CPU and GPU in parallel

Quick Reference: Metal Command Pipeline

MTLDevice
└─ makeCommandQueue() → MTLCommandQueue
   └─ makeCommandBuffer() → MTLCommandBuffer
      ├─ makeRenderCommandEncoder(descriptor:) → MTLRenderCommandEncoder
      ├─ makeComputeCommandEncoder() → MTLComputeCommandEncoder
      └─ makeBlitCommandEncoder() → MTLBlitCommandEncoder

Writing Shaders (MSL)

Use Metal Shading Language. Always include:

  • #include <metal_stdlib> and using namespace metal;

  • Correct attribute qualifiers: [[vertex_id]], [[position]], [[stage_in]], [[buffer(n)]], [[texture(n)]]

  • Proper address space qualifiers: device, constant, threadgroup, thread

Vertex Shader Pattern

#include <metal_stdlib>
using namespace metal;

struct VertexIn {
    float3 position [[attribute(0)]];
    float3 normal   [[attribute(1)]];
    float2 texCoord [[attribute(2)]];
};

struct VertexOut {
    float4 position [[position]];
    float3 normal;
    float2 texCoord;
};

vertex VertexOut vertex_main(VertexIn in [[stage_in]],
                             constant float4x4 &mvp [[buffer(1)]]) {
    VertexOut out;
    out.position = mvp * float4(in.position, 1.0);
    out.normal = in.normal;
    out.texCoord = in.texCoord;
    return out;
}

Fragment Shader Pattern

fragment float4 fragment_main(VertexOut in [[stage_in]],
                              texture2d<float> albedo [[texture(0)]],
                              sampler texSampler [[sampler(0)]]) {
    float4 color = albedo.sample(texSampler, in.texCoord);
    return color;
}

Compute Kernel Pattern

kernel void compute_main(const device float *input [[buffer(0)]],
                         device float *output [[buffer(1)]],
                         uint id [[thread_position_in_grid]]) {
    // Assumes one thread per element, as when dispatchThreads is given an exactly sized grid.
    output[id] = input[id] * 2.0f;
}

Swift-Side Setup Patterns

Render Pipeline Setup

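import Metal

// Force unwraps and try! keep the sample short; production code should handle nil and thrown errors.
let device = MTLCreateSystemDefaultDevice()!
let commandQueue = device.makeCommandQueue()!

// Load shaders
let library = device.makeDefaultLibrary()!
let vertexFunction = library.makeFunction(name: "vertex_main")
let fragmentFunction = library.makeFunction(name: "fragment_main")

// Pipeline descriptor
let pipelineDescriptor = MTLRenderPipelineDescriptor()
pipelineDescriptor.vertexFunction = vertexFunction
pipelineDescriptor.fragmentFunction = fragmentFunction
pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm

// Vertex descriptor: every [[attribute(n)]] in VertexIn needs an entry, or pipeline creation fails.
// Offsets assume interleaved SIMD3<Float>, SIMD3<Float>, SIMD2<Float> fields in buffer 0.
let vertexDescriptor = MTLVertexDescriptor()
vertexDescriptor.attributes[0].format = .float3 // position
vertexDescriptor.attributes[0].offset = 0
vertexDescriptor.attributes[0].bufferIndex = 0
vertexDescriptor.attributes[1].format = .float3 // normal
vertexDescriptor.attributes[1].offset = 16
vertexDescriptor.attributes[1].bufferIndex = 0
vertexDescriptor.attributes[2].format = .float2 // texCoord
vertexDescriptor.attributes[2].offset = 32
vertexDescriptor.attributes[2].bufferIndex = 0
vertexDescriptor.layouts[0].stride = 48
pipelineDescriptor.vertexDescriptor = vertexDescriptor

let pipelineState = try! device.makeRenderPipelineState(descriptor: pipelineDescriptor)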
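
The offsets and stride in a vertex descriptor have to match how the vertex data is actually laid out in memory. The sketch below derives them from a CPU-side struct instead of hard-coding them. The `Vertex` type and its field names are hypothetical and only mirror the `VertexIn` fields in the shader above. Swift does not formally guarantee struct layout, so a type shared through a C header is the more robust route; treat this as an illustration only.

```swift
// Hypothetical CPU-side vertex type; field order mirrors VertexIn in the shader.
struct Vertex {
    var position: SIMD3<Float>   // attribute(0), .float3
    var normal: SIMD3<Float>     // attribute(1), .float3
    var texCoord: SIMD2<Float>   // attribute(2), .float2
}

// Byte offsets of each field as Swift lays the struct out.
// SIMD3<Float> occupies 16 bytes (padded to four lanes), SIMD2<Float> occupies 8.
let positionOffset = MemoryLayout<Vertex>.offset(of: \Vertex.position)!
let normalOffset = MemoryLayout<Vertex>.offset(of: \Vertex.normal)!
let texCoordOffset = MemoryLayout<Vertex>.offset(of: \Vertex.texCoord)!

// Distance in bytes between consecutive vertices, including trailing padding.
let vertexStride = MemoryLayout<Vertex>.stride

print(positionOffset, normalOffset, texCoordOffset, vertexStride) // 0 16 32 48
```

These values would then be assigned to `attributes[n].offset` and `layouts[0].stride` in place of literal numbers.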

Compute Pipeline Setup

// inputBuffer, outputBuffer, and elementCount are assumed to exist already.
let computeFunction = library.makeFunction(name: "compute_main")!
let computePipeline = try! device.makeComputePipelineState(function: computeFunction)

let commandBuffer = commandQueue.makeCommandBuffer()!
let encoder = commandBuffer.makeComputeCommandEncoder()!
encoder.setComputePipelineState(computePipeline)
encoder.setBuffer(inputBuffer, offset: 0, index: 0)
encoder.setBuffer(outputBuffer, offset: 0, index: 1)

// One thread per element; dispatchThreads handles a partial final threadgroup.
let gridSize = MTLSize(width: elementCount, height: 1, depth: 1)
let threadGroupSize = MTLSize(
    width: min(computePipeline.maxTotalThreadsPerThreadgroup, elementCount),
    height: 1,
    depth: 1
)
encoder.dispatchThreads(gridSize, threadsPerThreadgroup: threadGroupSize)
encoder.endEncoding()
commandBuffer.commit()
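
`dispatchThreads` relies on non-uniform threadgroup support. Where that is unavailable, the usual fallback is `dispatchThreadgroups`, which launches whole threadgroups only, so the group count has to be rounded up. The helper below is a minimal sketch of that rounding; the function name `threadgroupCount` is an assumption made for illustration, not part of Metal.

```swift
/// Number of threadgroups needed to cover `elementCount` threads, rounding up.
/// Hypothetical helper for illustration; not a Metal API.
func threadgroupCount(elementCount: Int, threadsPerThreadgroup: Int) -> Int {
    precondition(elementCount >= 0 && threadsPerThreadgroup > 0)
    // Integer ceiling division: adds (divisor - 1) before dividing.
    return (elementCount + threadsPerThreadgroup - 1) / threadsPerThreadgroup
}

let groups = threadgroupCount(elementCount: 1000, threadsPerThreadgroup: 256)
print(groups) // 4, which launches 1024 threads for 1000 elements
```

Because the rounded-up grid can contain more threads than elements, a kernel dispatched this way needs the element count passed in (for example through a `constant` buffer) and an early return when `id` is out of range.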

MetalKit View Rendering

import MetalKit

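class Renderer: NSObject, MTKViewDelegate {
    let device: MTLDevice
    let commandQueue: MTLCommandQueue
    let pipelineState: MTLRenderPipelineState

    // Stored properties need an initializer; the pipeline state is built once at load time.
    init?(device: MTLDevice, pipelineState: MTLRenderPipelineState) {
        guard let queue = device.makeCommandQueue() else { return nil }
        self.device = device
        self.commandQueue = queue
        self.pipelineState = pipelineState
        super.init()
    }

    // Required by MTKViewDelegate; update size-dependent resources here.
    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}

    func draw(in view: MTKView) {
        guard let drawable = view.currentDrawable,
              let descriptor = view.currentRenderPassDescriptor,
              let commandBuffer = commandQueue.makeCommandBuffer(),
              let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor) else { return }

        encoder.setRenderPipelineState(pipelineState)
        // Set buffers, draw primitives...
        encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3)

        encoder.endEncoding()
        commandBuffer.present(drawable)
        commandBuffer.commit()
    }
}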

Performance Best Practices

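  • Storage modes: Use .shared for CPU-written data on Apple Silicon (unified memory), .private for GPU-only data, and .managed on Intel Macs with discrete GPUs, where CPU writes must be flushed with didModifyRange(_:)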

  • Triple buffering: Rotate 3 buffers with a semaphore to avoid CPU/GPU stalls

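  • Avoid per-frame allocations: Reuse buffers, textures, and pipeline states. Command buffers and encoders are transient, single-use objects and are created fresh each frame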

  • Use dispatchThreads over dispatchThreadgroups when possible (Apple Silicon)

  • Prefer tile-based deferred rendering patterns on Apple GPUs — use imageblocks and tile shaders

  • Compile pipelines ahead of time: Pipeline creation is expensive, do it at load time

  • Use Metal GPU frame capture in Xcode to profile and debug
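
The triple-buffering item above can be sketched with a counting semaphore that tracks how many frames are in flight. The `FrameRing` class and its method names are hypothetical; only `DispatchSemaphore` is a real API here. The sketch assumes `acquire()` is always called from the same render thread, while `release()` may be called from any thread.

```swift
import Dispatch

/// Hands out one of `count` buffer slots per frame and blocks when all are in flight.
/// Hypothetical helper for illustration; not a Metal or MetalKit type.
final class FrameRing {
    let count: Int
    private let semaphore: DispatchSemaphore
    private var nextIndex = 0

    init(count: Int = 3) {
        self.count = count
        self.semaphore = DispatchSemaphore(value: count)
    }

    /// Call at the start of a frame. Blocks until the GPU has finished with a slot.
    func acquire() -> Int {
        semaphore.wait()
        let index = nextIndex
        nextIndex = (nextIndex + 1) % count
        return index
    }

    /// Call when the GPU has finished the frame that used a slot.
    func release() {
        semaphore.signal()
    }
}

// Simulated frames: each one acquires a slot and releases it immediately.
let ring = FrameRing(count: 3)
for frame in 0..<4 {
    let slot = ring.acquire()
    print("frame \(frame) uses slot \(slot)")
    ring.release()
}
```

In a real render loop, the slot index would select which of three uniform buffers the CPU writes this frame, and `release()` would be called from the closure passed to `commandBuffer.addCompletedHandler` rather than immediately.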

Common Mistakes to Avoid

  • Forgetting encoder.endEncoding() before committing

  • Mismatched buffer indices between Swift and MSL

  • Using wrong pixel format for render targets

  • Not handling nil from optional Metal API calls

  • Blocking the main thread waiting for GPU completion — use addCompletedHandler instead

  • Forgetting to set the vertex descriptor when using [[stage_in]]
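
One way to reduce the risk of mismatched buffer indices is to name them once on the Swift side and use the names wherever a buffer is bound. The `BufferIndex` enum below is a hypothetical sketch; its values mirror the `[[buffer(n)]]` slots used by `vertex_main` earlier in this document.

```swift
/// Buffer binding slots shared by all encoder calls.
/// Hypothetical enum for illustration; raw values must match [[buffer(n)]] in the shader.
enum BufferIndex: Int {
    case vertices = 0   // vertex data, also the vertex descriptor's bufferIndex
    case uniforms = 1   // the mvp matrix, [[buffer(1)]] in vertex_main
}

print(BufferIndex.vertices.rawValue, BufferIndex.uniforms.rawValue) // 0 1
```

A binding call would then read `encoder.setVertexBuffer(uniformBuffer, offset: 0, index: BufferIndex.uniforms.rawValue)`, where `uniformBuffer` is an assumed existing MTLBuffer. Projects that share a C header between Swift and MSL can define the same constants once for both sides.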

Metal 4 Notes

Metal 4 introduces a modernized core API. Key changes:

  • New compilation API for finer shader compilation control

  • Updated command encoding patterns

  • See references/metal-api-guide.md for the full Metal 4 API topology

Frameworks Ecosystem

| Framework | Purpose |
| --- | --- |
| Metal | Direct GPU access, shaders, pipelines |
| MetalKit | View management, texture loading, model I/O |
| MetalFX | Upscaling (temporal/spatial) for performance |
| Metal Performance Shaders | Optimized compute & image processing kernels |
| Compositor Services | Stereoscopic rendering for visionOS |
| RealityKit | High-level 3D rendering (uses Metal underneath) |

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
