mac-mini-ai

Mac Mini AI — run LLMs, image generation, speech-to-text, and embeddings on your Mac Mini. M4 (16-32GB) and M4 Pro (24-64GB) configurations make the Mac Mini the most affordable entry point for local AI. Stack multiple Mac Minis into a fleet for the cost of one cloud GPU. Route requests across all your Mac Minis automatically.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy the command below and send it to your AI assistant to install this skill:

Install skill "mac-mini-ai" with this command: npx skills add twinsgeeks/mac-mini-ai

Mac Mini AI — The $599 AI Node

The Mac Mini is the most cost-effective hardware for local AI. Starting at $599 with 16GB of unified memory, it runs 3B-7B models comfortably, and the higher-memory configurations reach 70B-class quantized models. Three Mac Minis cost roughly one month of cloud GPU rental, and after purchase there are no recurring API bills.

This skill turns one Mac Mini into an AI server and multiple Mac Minis into a fleet.

Mac Mini configurations for AI

Config                   Chip     Unified Memory   Price    LLM Sweet Spot
Mac Mini M4 (16GB)       M4       16GB             $599     3B-7B models (phi4-mini, llama3.2:3b)
Mac Mini M4 (24GB)       M4       24GB             $799     7B-14B models (phi4, gemma3:12b)
Mac Mini M4 (32GB)       M4       32GB             $999     14B-22B models (qwen3:14b, codestral)
Mac Mini M4 Pro (48GB)   M4 Pro   48GB             $1,399   22B-32B models (qwen3:32b)
Mac Mini M4 Pro (64GB)   M4 Pro   64GB             $1,799   32B-70B models (llama3.3:70b quantized)

The Mac Mini fleet strategy

Three Mac Minis (32GB each) for $3,000 give you:

  • 96GB total unified memory across the fleet
  • Each node serves a different model at the same time
  • The router picks the best device for every request
  • $0/month in API fees after purchase

Mac Mini #1 (32GB) — deepseek-r1:14b        ─┐
Mac Mini #2 (32GB) — codestral + phi4        ├──→  Router  ←──  Your apps
Mac Mini #3 (32GB) — qwen3:14b + embeddings ─┘
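
Because every node sits behind one router endpoint, application code never names a device. Here is a minimal sketch of what that buys you, assuming the three models in the diagram are already pulled and the router is on the default port used throughout this page:

from openai import OpenAI

# One endpoint for the whole fleet; the router decides which Mac Mini serves each model.
client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

for model in ("deepseek-r1:14b", "codestral", "qwen3:14b"):
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Reply with one short sentence."}],
    )
    print(f"{model}: {reply.choices[0].message.content}")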

Setup

pip install ollama-herd    # PyPI: https://pypi.org/project/ollama-herd/

On one Mac Mini (the router):

herd

On every other Mac Mini:

herd-node

Devices discover each other automatically. No IP configuration, no Docker, no Kubernetes.
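
To confirm discovery worked, you can poll the fleet status endpoint described in the monitoring section below. A quick sketch; the "nodes" key is an assumption about the payload, so dump the raw JSON if it differs:

import json
import time
import urllib.request

# Wait until at least one herd-node has registered with the router.
for _ in range(30):
    with urllib.request.urlopen("http://localhost:11435/fleet/status") as resp:
        status = json.load(resp)
    nodes = status.get("nodes", [])  # key name is an assumption
    if nodes:
        print(f"{len(nodes)} Mac Mini(s) online")
        break
    time.sleep(2)
else:
    print("No nodes discovered yet; check that herd-node is running")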

Use your Mac Mini

Chat with an LLM

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="phi4",
    messages=[{"role": "user", "content": "Write a Python web scraper"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

Ollama API

curl http://localhost:11435/api/chat -d '{
  "model": "gemma3:12b",
  "messages": [{"role": "user", "content": "Explain recursion simply"}],
  "stream": false
}'

Image generation (optional)

uv tool install mflux    # Install on any Mac Mini
curl -o art.png http://localhost:11435/api/generate-image \
  -H "Content-Type: application/json" \
  -d '{"model": "z-image-turbo", "prompt": "a stack of Mac Minis glowing", "width": 512, "height": 512}'

Speech-to-text

curl http://localhost:11435/api/transcribe -F "file=@meeting.wav" -F "model=qwen3-asr"
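
From Python, the multipart upload looks like this (requires the requests package; the "text" field in the response is an assumption, so inspect the raw JSON on first use):

import requests

# Mirror the curl -F flags: one file part, one form field.
with open("meeting.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:11435/api/transcribe",
        files={"file": f},
        data={"model": "qwen3-asr"},
    )
print(resp.json().get("text", resp.text))  # "text" key is an assumption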

Embeddings for RAG

curl http://localhost:11435/api/embed \
  -d '{"model": "nomic-embed-text", "input": "Mac Mini home server local AI"}'

Best models for Mac Mini

RAM    Best models                                         Why
16GB   phi4-mini (3.8B), gemma3:4b, nomic-embed-text       Small but capable, leaves room for the OS
24GB   phi4 (14B), gemma3:12b, codestral                   Sweet spot for single-model use
32GB   qwen3:14b, deepseek-r1:14b, codestral + phi4-mini   Two models simultaneously
48GB   qwen3:32b, deepseek-r1:32b                          Larger models, great quality
64GB   llama3.3:70b (quantized)                            Near-frontier quality on a Mac Mini

Monitor your Mac Mini fleet

Dashboard at http://localhost:11435/dashboard — see every Mac Mini's status, loaded models, and queue depths.

# Fleet overview
curl -s http://localhost:11435/fleet/status | python3 -m json.tool

# Model recommendations for your hardware
curl -s http://localhost:11435/dashboard/api/recommendations | python3 -m json.tool
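
The same endpoints work from a script. The per-node field names below (name, loaded_models, queue_depth) are assumptions about the payload; print the raw JSON once to confirm:

import json
import urllib.request

def get(path):
    with urllib.request.urlopen(f"http://localhost:11435{path}") as resp:
        return json.load(resp)

status = get("/fleet/status")
for node in status.get("nodes", []):  # field names are assumptions
    print(node.get("name"), node.get("loaded_models"), "queue:", node.get("queue_depth"))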

Works with any OpenAI-compatible tool

Tool           Connection
Open WebUI     Ollama URL: http://mac-mini-ip:11435
Aider          aider --openai-api-base http://mac-mini-ip:11435/v1
Continue.dev   Base URL: http://mac-mini-ip:11435/v1
LangChain      ChatOpenAI(base_url="http://mac-mini-ip:11435/v1")
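
For example, the LangChain row expands to the following (requires the langchain-openai package; "phi4" stands in for whatever model your fleet has pulled):

from langchain_openai import ChatOpenAI

# Point LangChain at the router instead of the OpenAI cloud.
llm = ChatOpenAI(
    model="phi4",  # any model pulled on the fleet
    base_url="http://mac-mini-ip:11435/v1",
    api_key="not-needed",
)
print(llm.invoke("Summarize what unified memory means for local LLMs.").content)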


Contribute

Ollama Herd is open source (MIT). Built for the Mac Mini fleet community:

  • Star on GitHub — help other Mac Mini owners find us
  • Open an issue — share your Mac Mini fleet setup
  • PRs welcome from humans and AI agents. CLAUDE.md gives full context.
  • Running a Mac Mini cluster? We'd love to hear about it.

Guardrails

  • No automatic downloads — model pulls require explicit user confirmation.
  • Model deletion requires explicit user confirmation.
  • All requests stay local — no data leaves your network.
  • Never delete or modify files in ~/.fleet-manager/.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Ollama Herd (General)

Ollama Herd — multimodal Ollama model router that herds your Ollama LLMs into one smart Ollama endpoint. Route Ollama Llama, Qwen, DeepSeek, Phi, Mist...

Homelab AI (Coding)

Home lab AI — turn your spare machines into a local AI home lab cluster. LLM inference, image generation, speech-to-text, and embeddings across macOS, Linux,...

GPU Cluster Manager (Coding)

Turn your spare GPUs into one inference endpoint. Auto-discovers machines on your network, routes requests to the best available device, learns when your mac...

Apple Silicon AI (Coding)

Apple Silicon AI — run LLMs, image generation, speech-to-text, and embeddings on Mac Studio, Mac Mini, MacBook Pro, and Mac Pro. Turn your Apple Silicon devi...
