gemma-gemma3

Gemma 3 by Google — run Gemma 3 (4B, 12B, 27B) across your local device fleet. Google's most capable open model with 128K context, strong coding, and multilingual support. Fleet-routed to the best available machine via Ollama Herd. Cross-platform (macOS, Linux, Windows). Zero cloud costs.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy the command below and send it to your AI assistant to install this skill:

Install skill "gemma-gemma3" with this command: npx skills add twinsgeeks/gemma-gemma3

Gemma 3 — Run Google's Open Models Across Your Fleet

Gemma 3 is Google's most capable open-source LLM family. 128K context window, strong coding performance, multilingual support across 140+ languages. The fleet router picks the best device for every request — no manual load balancing.

Supported Gemma models

Model          Parameters   Ollama name   Best for
Gemma 3 27B    27B          gemma3:27b    Highest quality — rivals much larger models
Gemma 3 12B    12B          gemma3:12b    Balanced quality and speed
Gemma 3 4B     4B           gemma3:4b     Fast, runs on low-RAM devices
Gemma 3 1B     1B           gemma3:1b     Ultra-light, instant responses
CodeGemma 7B   7B           codegemma     Code-focused variant

Quick start

pip install ollama-herd    # PyPI: https://pypi.org/project/ollama-herd/
herd                       # start the router (port 11435)
herd-node                  # run on each device — finds the router automatically

No models are downloaded during installation. Models are pulled on demand when a request arrives, or manually via the dashboard. Pulls require user confirmation unless you opt in via auto_pull.
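
Before confirming a pull, you can see which models a node already has. A minimal sketch, assuming the router forwards Ollama's standard /api/tags endpoint (the helper names here are my own, not part of ollama-herd):

```python
import json
import urllib.request

ROUTER = "http://localhost:11435"

def pulled_models(base_url: str = ROUTER) -> list[str]:
    """Return the names of models already pulled, via Ollama's /api/tags."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]

def has_model(name: str, models: list[str]) -> bool:
    """True if `name` (e.g. 'gemma3:27b') is among the pulled models."""
    return name in models
```

If gemma3:27b is not in the list, the first request that targets it will trigger a pull, which you then confirm.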

Use Gemma through the fleet

OpenAI SDK (drop-in replacement)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

# Gemma 3 27B for complex reasoning
response = client.chat.completions.create(
    model="gemma3:27b",
    messages=[{"role": "user", "content": "Explain quantum entanglement to a 10-year-old"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

Code generation with CodeGemma

response = client.chat.completions.create(
    model="codegemma",
    messages=[{"role": "user", "content": "Write a binary search tree in Rust with insert, delete, and search"}],
)
print(response.choices[0].message.content)

curl (Ollama format)

# Gemma 3 27B
curl http://localhost:11435/api/chat -d '{
  "model": "gemma3:27b",
  "messages": [{"role": "user", "content": "Translate to Japanese: The weather is beautiful today"}],
  "stream": false
}'

curl (OpenAI format)

curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma3:4b", "messages": [{"role": "user", "content": "Hello"}]}'

Which Gemma for your hardware

Cross-platform: These are example configurations. Any device (Mac, Linux, Windows) with equivalent RAM works. The fleet router runs on all platforms.

Device               RAM     Best Gemma model
MacBook Air (8GB)    8GB     gemma3:1b — instant responses
Mac Mini (16GB)      16GB    gemma3:4b — strong for its size
Mac Mini (24GB)      24GB    gemma3:12b — great balance
MacBook Pro (36GB)   36GB    gemma3:27b — full power
Mac Studio (64GB+)   64GB+   gemma3:27b + codegemma simultaneously
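
The sizing table collapses into a small helper that picks the largest Gemma that fits comfortably. A sketch with illustrative thresholds mirroring the example rows above, not official sizing guidance:

```python
def pick_gemma(ram_gb: float) -> str:
    """Map available RAM (GB) to a reasonable Gemma model tag.

    Thresholds follow the example configurations in the table;
    leave headroom for the OS, other apps, or running two models
    at once on larger machines.
    """
    if ram_gb >= 36:
        return "gemma3:27b"
    if ram_gb >= 24:
        return "gemma3:12b"
    if ram_gb >= 16:
        return "gemma3:4b"
    return "gemma3:1b"
```

For example, pick_gemma(24) selects gemma3:12b, matching the Mac Mini (24GB) row.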

Why Gemma locally

  • 128K context — process entire codebases and long documents
  • 140+ languages — multilingual without switching models
  • Google quality, zero cost — no per-token charges after hardware
  • Privacy — all data stays on your network
  • Fleet routing — multiple machines share the load
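
The 128K window means an entire file can go into a single request. A minimal sketch of building such a payload (the helper name and the rough characters-per-token estimate are my own assumptions):

```python
from pathlib import Path

def long_doc_messages(path: str, question: str) -> list[dict]:
    """Build a chat payload that places a whole document in context.

    At a rough ~4 characters per token, a 128K-token window holds on
    the order of a few hundred KB of plain text, so most single
    source files or reports fit whole.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [{
        "role": "user",
        "content": f"Here is a document:\n\n{text}\n\nQuestion: {question}",
    }]
```

Pass the result to client.chat.completions.create(model="gemma3:27b", messages=...) exactly as in the streaming example above.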

Check what's running

# Models loaded in memory
curl -s http://localhost:11435/api/ps | python3 -m json.tool

# Fleet health
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

Web dashboard at http://localhost:11435/dashboard — live monitoring.

Also available on this fleet

Other LLMs

Llama 3.3, Qwen 3.5, DeepSeek-V3, DeepSeek-R1, Phi 4, Mistral, Codestral — same endpoint.

Image generation

curl -o image.png http://localhost:11435/api/generate-image \
  -d '{"model": "z-image-turbo", "prompt": "a gemstone catching light", "width": 1024, "height": 1024}'

Speech-to-text

curl http://localhost:11435/api/transcribe -F "file=@meeting.wav" -F "model=qwen3-asr"

Embeddings

curl http://localhost:11435/api/embed \
  -d '{"model": "nomic-embed-text", "input": "Google Gemma open source language model"}'
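
Embeddings become useful once you compare them. A sketch pairing the endpoint with cosine similarity; the embed helper assumes the router returns Ollama's standard {"embeddings": [...]} response shape:

```python
import json
import math
import urllib.request

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def embed(texts: list[str], model: str = "nomic-embed-text") -> list[list[float]]:
    """Fetch embeddings from the fleet (assumes Ollama's /api/embed schema)."""
    req = urllib.request.Request(
        "http://localhost:11435/api/embed",
        data=json.dumps({"model": model, "input": texts}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embeddings"]
```

With two embedded sentences, cosine(embed([a, b])[0], embed([a, b])[1]) gives a rough semantic-similarity score.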

Contribute

Ollama Herd is open source (MIT). Stars, issues, and PRs welcome — from humans and AI agents alike:

  • GitHub — 444 tests, fully async, CLAUDE.md makes AI agents productive instantly
  • Found a bug? Open an issue
  • Want to add a feature? Fork, branch, PR — the test suite runs in under 40 seconds

Guardrails

  • Model downloads require explicit user confirmation — Gemma models range from 1GB (1B) to 16GB (27B).
  • Model deletion requires explicit user confirmation.
  • Never delete or modify files in ~/.fleet-manager/.
  • No models are downloaded automatically — all pulls are user-initiated or require opt-in via auto_pull.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

  • Llama Llama3 (Coding): Llama 3 by Meta — run Llama 3.3, Llama 3.2, and Llama 3.1 across your local device fleet. The most popular open-source LLM family routed to the best availabl...
  • Mistral Codestral (Coding): Mistral and Codestral — run Mistral Large, Mistral-Nemo, Codestral, and Mistral-Small locally. Mistral AI's open-source LLMs for code generation and reasonin...
  • Local Llm Router (Coding): Local LLM model router for Llama, Qwen, DeepSeek, Phi, Mistral, and Gemma across multiple devices. Self-hosted local LLM inference routing on macOS, Linux, a...
  • Ollama Herd (General): Ollama multimodal model router for Llama, Qwen, DeepSeek, Phi, and Mistral — plus mflux image generation, speech-to-text, and embeddings. Self-hosted Ollama...