self-hosted-ai

Self-hosted AI: run your own LLM inference, image generation, speech-to-text, and embeddings. No cloud APIs, no SaaS subscriptions, no data leaving your network. A self-hosted alternative to OpenAI, DALL-E, the Whisper API, and cloud embedding services that routes requests across macOS, Linux, and Windows machines. A self-hosted, local AI inference platform with no cloud dependencies.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "self-hosted-ai" with this command: npx skills add twinsgeeks/self-hosted-ai

Self-Hosted AI — Own Your Entire AI Stack

Stop paying per token. Stop sending data to cloud APIs. Run self-hosted LLMs, image generation, speech-to-text, and embeddings on your own hardware. A single router makes all your devices act as one system.

What self-hosted AI replaces

| Cloud service | Self-hosted replacement | How |
|---|---|---|
| OpenAI API | Self-hosted Llama 3.3, Qwen 3.5, DeepSeek-R1 via Ollama | Same OpenAI SDK, swap the base URL |
| DALL-E / Midjourney | Self-hosted Stable Diffusion 3, Flux via mflux/DiffusionKit | POST /api/generate-image |
| Whisper API | Self-hosted Qwen3-ASR via MLX | POST /api/transcribe |
| OpenAI Embeddings | Self-hosted nomic-embed-text, mxbai-embed via Ollama | POST /api/embed |

An OpenAI-compatible API for chat, plus plain HTTP endpoints for images, audio, and embeddings. Zero per-request costs. All data stays on your self-hosted machines.

Self-Hosted Setup

pip install ollama-herd    # Self-hosted AI router from PyPI
herd                       # start the self-hosted router
herd-node                  # run on each self-hosted machine — auto-discovers the router

No Docker. No Kubernetes. No config files. Self-hosted devices find each other automatically on your local network.

Self-Hosted LLM Inference

Drop-in self-hosted replacement for the OpenAI SDK:

from openai import OpenAI

# Self-hosted inference client; replaces the OpenAI cloud endpoint.
# api_key is a placeholder: the SDK requires a non-empty value.
self_hosted_client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

self_hosted_response = self_hosted_client.chat.completions.create(
    model="llama3.3:70b",  # self-hosted model, no cloud dependency
    messages=[{"role": "user", "content": "Analyze this contract for risks"}],
    stream=True,
)
for chunk in self_hosted_response:
    # Some streamed chunks can arrive with no choices; skip them
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
print()  # finish the streamed output with a newline

Self-Hosted Ollama API

curl http://localhost:11435/api/chat -d '{
  "model": "deepseek-r1:70b",
  "messages": [{"role": "user", "content": "Explain self-hosted AI advantages over cloud APIs"}],
  "stream": false
}'
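The same /api/chat call can be made from Python without the openai package. The sketch below uses only the standard library. It assumes the router passes through Ollama's non-streaming response shape (`{"message": {"content": ...}}`) and listens on `localhost:11435` as in the curl example. The helper names `build_chat_request`, `extract_reply`, and `chat` are hypothetical.

```python
import json
import urllib.request

# Assumed router address, matching the curl example above.
ROUTER_URL = "http://localhost:11435"


def build_chat_request(model: str, prompt: str, base_url: str = ROUTER_URL) -> urllib.request.Request:
    """Build a non-streaming POST to the Ollama-style /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(body: bytes) -> str:
    """Pull the assistant text out of an Ollama /api/chat JSON response (assumed shape)."""
    return json.loads(body)["message"]["content"]


def chat(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send one prompt and return the reply; requires a running router."""
    with urllib.request.urlopen(build_chat_request(model, prompt), timeout=timeout) as resp:
        return extract_reply(resp.read())
```

`chat("deepseek-r1:70b", "Explain self-hosted AI advantages over cloud APIs")` needs a live router. The request builder and the response parser can be tested offline.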

Self-Hosted Image Generation

Self-hosted replacement for DALL-E and Midjourney:

# Install self-hosted image backends on any node
uv tool install mflux           # Self-hosted Flux models (~7s)
uv tool install diffusionkit    # Self-hosted Stable Diffusion 3/3.5

# Generate on your self-hosted fleet
curl -o self_hosted_output.png http://localhost:11435/api/generate-image \
  -H "Content-Type: application/json" \
  -d '{"model": "z-image-turbo", "prompt": "self-hosted AI generating product mockup", "width": 1024, "height": 1024}'
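The same image request can be sent from Python using only the standard library. This sketch assumes, as the `curl -o` call implies, that a successful response body is the raw PNG image. The PNG-signature check is a hypothetical safeguard so that a JSON error body is not silently saved as a .png file. The helper names `build_image_request`, `looks_like_png`, and `generate_image` are also hypothetical.

```python
import json
import urllib.request
from pathlib import Path

ROUTER_URL = "http://localhost:11435"  # assumed, matching the curl example above
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"   # the standard 8-byte PNG file header


def build_image_request(prompt, model="z-image-turbo", width=1024, height=1024, base_url=ROUTER_URL):
    """Build the JSON POST for /api/generate-image with the same fields as the curl call."""
    payload = {"model": model, "prompt": prompt, "width": width, "height": height}
    return urllib.request.Request(
        f"{base_url}/api/generate-image",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def looks_like_png(data: bytes) -> bool:
    """True if the bytes start with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)


def generate_image(prompt, out_path="self_hosted_output.png", timeout=600.0, **kwargs):
    """POST the request and write the image to disk, like `curl -o`; requires a running router."""
    with urllib.request.urlopen(build_image_request(prompt, **kwargs), timeout=timeout) as resp:
        data = resp.read()
    if not looks_like_png(data):
        raise ValueError(f"expected PNG bytes, got: {data[:80]!r}")
    Path(out_path).write_bytes(data)
    return Path(out_path)
```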

Self-Hosted Speech-to-Text

Self-hosted replacement for Whisper API:

curl http://localhost:11435/api/transcribe \
  -F "file=@self_hosted_meeting.wav" \
  -F "model=qwen3-asr"

All self-hosted transcription stays on your network. No audio data sent to cloud services.
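`curl -F` sends a multipart/form-data body. The sketch below builds the same upload with the Python standard library. The field names `file` and `model` come from the curl example. The response format of /api/transcribe is not described here, so the sketch returns the body as raw text. The helper names `encode_multipart` and `transcribe` are hypothetical.

```python
import urllib.request
import uuid
from pathlib import Path

ROUTER_URL = "http://localhost:11435"  # assumed, matching the curl example above


def encode_multipart(fields: dict[str, str], files: dict[str, tuple[str, bytes]]) -> tuple[bytes, str]:
    """Encode text fields and files as multipart/form-data, like curl -F."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    for name, (filename, content) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, model: str = "qwen3-asr", base_url: str = ROUTER_URL, timeout: float = 600.0) -> str:
    """Upload an audio file to /api/transcribe; requires a running router."""
    audio = Path(path)
    body, content_type = encode_multipart({"model": model}, {"file": (audio.name, audio.read_bytes())})
    req = urllib.request.Request(
        f"{base_url}/api/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")  # response format not specified here; returned as-is
```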

Self-Hosted Embeddings

Self-hosted replacement for OpenAI's embedding API:

curl http://localhost:11435/api/embed \
  -d '{"model": "nomic-embed-text", "input": "self-hosted document embedding for private RAG pipelines"}'
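In a private RAG pipeline, the embeddings are usually compared with cosine similarity to find the documents closest to a query. The sketch below makes that comparison. It assumes the router returns Ollama's /api/embed shape, `{"embeddings": [[...], ...]}`, and accepts a list for `input`. The helper names `embed`, `cosine_similarity`, and `rank_documents` are hypothetical, and a real pipeline would likely use a vector store instead of a linear scan.

```python
import json
import math
import urllib.request

ROUTER_URL = "http://localhost:11435"  # assumed, matching the curl example above


def embed(texts: list[str], model: str = "nomic-embed-text", base_url: str = ROUTER_URL) -> list[list[float]]:
    """Fetch one embedding per input text; requires a running router."""
    req = urllib.request.Request(
        f"{base_url}/api/embed",
        data=json.dumps({"model": model, "input": texts}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["embeddings"]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: list[float], doc_vecs: list[list[float]]) -> list[int]:
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```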

Self-Hosted Cost Comparison

| Service | Cloud cost | Self-hosted cost |
|---|---|---|
| GPT-4o (1M tokens/month) | ~$15-30/month | $0 (self-hosted hardware you own) |
| DALL-E (1000 images/month) | ~$40/month | $0 (self-hosted image gen) |
| Whisper API (10 hours audio/month) | ~$6/month | $0 (self-hosted transcription) |
| OpenAI embeddings (1M tokens/month) | ~$0.10/month | $0 (self-hosted embeddings) |
| Total | ~$60+/month | $0/month self-hosted |

After the hardware investment, self-hosted requests carry no per-request fees; the ongoing cost is electricity. No rate limits, no usage caps, no surprise bills.
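The table's total and a rough payback period can be checked with a few lines of arithmetic. The payback period is $\text{months} = \text{hardware cost} / (\text{monthly cloud spend} - \text{monthly power cost})$. The cloud figures below are copied from the table. The hardware price and power cost in the usage note are hypothetical placeholders, not numbers from this listing.

```python
import math

# Monthly cloud-cost ranges (low, high) in USD, copied from the cost table above.
CLOUD_COSTS = {
    "GPT-4o (1M tokens)": (15.0, 30.0),
    "DALL-E (1000 images)": (40.0, 40.0),
    "Whisper API (10 hours audio)": (6.0, 6.0),
    "OpenAI embeddings (1M tokens)": (0.10, 0.10),
}


def monthly_total(costs: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Sum the low and high ends of each service's monthly cost."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


def break_even_months(hardware_cost: float, monthly_savings: float, monthly_power_cost: float = 0.0) -> float:
    """Months until avoided cloud spend, net of electricity, repays the hardware."""
    net = monthly_savings - monthly_power_cost
    if net <= 0:
        return math.inf  # never pays back if power costs at least as much as the cloud bill
    return hardware_cost / net
```

With the table's low-end total of about $61/month, a hypothetical $2,000 machine drawing a hypothetical $10/month of power would take roughly 39 months to pay for itself. Heavier usage shortens that.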

Self-Hosted Advantages

  • Self-hosted data sovereignty: prompts, images, audio, and documents never leave your network
  • Self-hosted throughput: your hardware, no rate limits
  • Self-hosted uptime: cloud API outages don't affect your self-hosted fleet
  • Self-hosted flexibility: switch models instantly, no vendor lock-in
  • Self-hosted compliance: no third-party data processors, which simplifies HIPAA, GDPR, and SOC 2 obligations
  • Self-hosted predictability: hardware depreciates, but it never surprises you with a bill

Self-Hosted Fleet Routing

The self-hosted router scores each device on 7 signals and picks the best one for every request. Multiple self-hosted machines share the load automatically.
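This listing does not name the router's 7 signals or how they are weighted. The sketch below only illustrates the general idea of score-based routing: compute a weighted score per node and send the request to the highest scorer. Every signal, weight, and name here (`NodeStats`, `WEIGHTS`, `score`, `pick_node`) is hypothetical. None of it is taken from Ollama Herd.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Hypothetical per-node signals; NOT the router's actual 7 signals."""
    name: str
    model_loaded: bool        # is the requested model already in memory?
    free_memory_gb: float     # headroom for loading or running the model
    queue_depth: int          # requests currently waiting on this node
    recent_latency_ms: float  # recent response latency


# Hypothetical weights: positive values reward a signal, negative values penalize it.
WEIGHTS = {
    "model_loaded": 50.0,
    "free_memory_gb": 1.0,
    "queue_depth": -10.0,
    "recent_latency_ms": -0.05,
}


def score(node: NodeStats) -> float:
    """Weighted sum of the node's signals; higher is better."""
    return (
        WEIGHTS["model_loaded"] * (1.0 if node.model_loaded else 0.0)
        + WEIGHTS["free_memory_gb"] * node.free_memory_gb
        + WEIGHTS["queue_depth"] * node.queue_depth
        + WEIGHTS["recent_latency_ms"] * node.recent_latency_ms
    )


def pick_node(nodes: list[NodeStats]) -> NodeStats:
    """Route to the highest-scoring node."""
    if not nodes:
        raise ValueError("no nodes available")
    return max(nodes, key=score)
```

A weighted sum keeps each signal's influence explicit and easy to tune. The trade-off is that weights must be rebalanced whenever a signal's scale changes.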

# Self-hosted fleet overview
curl -s http://localhost:11435/fleet/status | python3 -m json.tool

# Self-hosted health checks
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

# Self-hosted model recommendations for your hardware
curl -s http://localhost:11435/dashboard/api/recommendations | python3 -m json.tool

Self-hosted dashboard at http://localhost:11435/dashboard for visual monitoring of your entire self-hosted fleet.


Contribute

Ollama Herd is open source (MIT). Self-hosted AI for everyone:

  • Star on GitHub — help others discover self-hosted AI
  • Open an issue — share your self-hosted setup
  • PRs welcome from humans and AI agents. CLAUDE.md gives full self-hosted context. 444 tests.

Self-Hosted Guardrails

  • No automatic downloads — all self-hosted model pulls require explicit user confirmation.
  • Self-hosted model deletion requires explicit user confirmation.
  • All self-hosted requests stay local — no data leaves your network. No telemetry, no analytics, no cloud callbacks.
  • Never delete or modify self-hosted files in ~/.fleet-manager/.
  • Your self-hosted fleet has zero cloud dependencies — works fully offline after initial model downloads.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
