deepseek

DeepSeek models on your local fleet — DeepSeek-V3, DeepSeek-V3.2, DeepSeek-R1, and DeepSeek-Coder routed across multiple devices via Ollama Herd. 7-signal scoring picks the best machine for every request. Run DeepSeek locally on Apple Silicon with zero cloud costs.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to install the skill:

Install skill "deepseek" with this command: npx skills add twinsgeeks/deepseek-deepseek-v3

DeepSeek — Run DeepSeek Models Across Your Local Fleet

Run DeepSeek-V3, DeepSeek-R1, and DeepSeek-Coder on your own hardware. The fleet router picks the best device for every request — no cloud API needed, zero per-token costs, all data stays on your machines.

Supported DeepSeek models

| Model | Parameters | Ollama name | Best for |
|---|---|---|---|
| DeepSeek-V3 | 671B MoE (37B active) | deepseek-v3 | General: matches GPT-4o on most benchmarks |
| DeepSeek-V3.1 | 671B MoE | deepseek-v3.1 | Hybrid thinking/non-thinking modes |
| DeepSeek-V3.2 | 671B MoE | deepseek-v3.2 | Improved reasoning and agent performance |
| DeepSeek-R1 | 1.5B–671B | deepseek-r1 | Reasoning: approaches o3 and Gemini 2.5 Pro |
| DeepSeek-Coder | 1.3B–33B | deepseek-coder | Code generation (trained on 87% code, 13% natural language) |
| DeepSeek-Coder-V2 | 236B MoE (21B active) | deepseek-coder-v2 | Code: matches GPT-4 Turbo on code tasks |

Setup

pip install ollama-herd
herd              # start the router (port 11435)
herd-node         # run on each machine

# Pull a DeepSeek model
ollama pull deepseek-r1:70b

Package: ollama-herd | Repo: github.com/geeks-accelerator/ollama-herd
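
Before wiring up clients, you can sanity-check that the router is reachable. A minimal sketch, assuming herd forwards Ollama's standard /api/tags model listing (the requests package is assumed installed):

import requests

# List the models visible through the fleet router. /api/tags is Ollama's
# standard listing endpoint; that herd forwards it is an assumption here.
resp = requests.get("http://localhost:11435/api/tags", timeout=5)
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])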

Use DeepSeek through the fleet

OpenAI SDK

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

# DeepSeek-R1 for reasoning
response = client.chat.completions.create(
    model="deepseek-r1:70b",
    messages=[{"role": "user", "content": "Prove that there are infinitely many primes"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
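
R1 models emit their chain of thought before the final answer; in Ollama builds this typically arrives wrapped in <think>...</think> tags inside the content. If you only want the final answer, a small post-processing sketch, assuming that tag format:

import re

def strip_thinking(text: str) -> str:
    # Drop everything inside <think>...</think>. The tag format is an
    # assumption based on how Ollama packages R1's reasoning trace;
    # adjust if your build differs.
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

print(strip_thinking("<think>Try Euclid's argument...</think>Assume finitely many primes..."))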

DeepSeek-Coder for code

response = client.chat.completions.create(
    model="deepseek-coder-v2:16b",
    messages=[{"role": "user", "content": "Write a Redis cache decorator in Python"}],
)
print(response.choices[0].message.content)

Ollama API

# DeepSeek-V3 general chat
curl http://localhost:11435/api/chat -d '{
  "model": "deepseek-v3",
  "messages": [{"role": "user", "content": "Explain quantum computing"}],
  "stream": false
}'

# DeepSeek-R1 reasoning
curl http://localhost:11435/api/chat -d '{
  "model": "deepseek-r1:70b",
  "messages": [{"role": "user", "content": "Solve this step by step: ..."}],
  "stream": false
}'
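
The same native endpoint works from plain Python too; a sketch mirroring the first curl call above (requests assumed installed):

import requests

# Native Ollama-style chat through the fleet router, equivalent to the
# curl example above.
resp = requests.post(
    "http://localhost:11435/api/chat",
    json={
        "model": "deepseek-v3",
        "messages": [{"role": "user", "content": "Explain quantum computing"}],
        "stream": False,
    },
    timeout=600,  # large models can take a while to answer
)
resp.raise_for_status()
print(resp.json()["message"]["content"])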

Hardware recommendations

DeepSeek models are large. Here's what fits where:

| Model | Min RAM | Recommended hardware |
|---|---|---|
| deepseek-r1:1.5b | 4 GB | Any Mac |
| deepseek-r1:7b | 8 GB | Mac Mini M4 (16 GB) |
| deepseek-r1:14b | 12 GB | Mac Mini M4 (24 GB) |
| deepseek-r1:32b | 24 GB | Mac Mini M4 Pro (48 GB) |
| deepseek-r1:70b | 48 GB | Mac Studio M4 Max (128 GB) |
| deepseek-coder-v2:16b | 12 GB | Mac Mini M4 (24 GB) |
| deepseek-v3 | 256 GB+ | Mac Studio M3 Ultra (512 GB) |

The fleet router automatically sends requests to the machine where the model is loaded — no manual routing needed.
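
When deciding which variant to pull on a given machine, the table above maps directly onto a size check. A hypothetical helper (thresholds copied from the table; psutil is an extra dependency, not part of ollama-herd):

import psutil  # pip install psutil

# Minimum RAM (GB) per R1 variant, taken from the table above.
MIN_RAM_GB = {
    "deepseek-r1:1.5b": 4,
    "deepseek-r1:7b": 8,
    "deepseek-r1:14b": 12,
    "deepseek-r1:32b": 24,
    "deepseek-r1:70b": 48,
}

def largest_r1_that_fits() -> str:
    total_gb = psutil.virtual_memory().total / 1e9
    fitting = [m for m, need in MIN_RAM_GB.items() if need <= total_gb]
    # Dict order is insertion order, so the last fitting entry is the largest.
    return fitting[-1] if fitting else "none"

print(largest_r1_that_fits())  # e.g. "deepseek-r1:14b" on a 24 GB Mac Mini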

Why run DeepSeek locally

  • Zero cost — DeepSeek API charges per token. Local is free after hardware.
  • Privacy — code and business data never leave your network.
  • No rate limits — DeepSeek API throttles during peak hours. Local has no throttle.
  • Availability — DeepSeek API has had outages. Your hardware doesn't depend on their servers.
  • Fleet routing — multiple machines share the load. One busy? Request goes to the next.

Fleet features

  • 7-signal scoring — picks the optimal node for every request
  • Auto-retry — fails over to next best node transparently
  • VRAM-aware fallback — routes to a loaded model in the same category instead of cold-loading
  • Context protection — prevents expensive model reloads from num_ctx changes
  • Request tagging — track per-project DeepSeek usage (see the sketch after this list)
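
How a tag actually rides along depends on ollama-herd's own docs, which aren't quoted here; the X-Herd-Tag header below is purely a placeholder assumption, using the OpenAI SDK's real extra_headers option and the client from the section above:

# X-Herd-Tag is a placeholder assumption, not a documented ollama-herd
# API -- check the repo for the real tagging mechanism.
response = client.chat.completions.create(
    model="deepseek-r1:70b",
    messages=[{"role": "user", "content": "Summarize this RFC"}],
    extra_headers={"X-Herd-Tag": "project-alpha"},
)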

Also available on this fleet

Other LLM models

Llama 3.3, Qwen 3.5, Phi 4, Mistral, Gemma 3 — any Ollama model routes through the same endpoint.
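
Because everything shares one endpoint, switching model families is just a string change. Reusing the OpenAI SDK client from above:

# Same client, same endpoint: only the model name changes.
response = client.chat.completions.create(
    model="llama3.3",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)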

Image generation

curl -o image.png http://localhost:11435/api/generate-image \
  -H "Content-Type: application/json" \
  -d '{"model":"z-image-turbo","prompt":"a sunset","width":1024,"height":1024,"steps":4}'

Speech-to-text

curl http://localhost:11435/api/transcribe -F "audio=@recording.wav"
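
The same upload from Python, mirroring the curl multipart call (requests assumed installed; the response shape is ollama-herd-specific, so it's printed raw):

import requests

# Multipart upload, equivalent to: curl ... -F "audio=@recording.wav"
with open("recording.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:11435/api/transcribe",
        files={"audio": f},
    )
resp.raise_for_status()
print(resp.json())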

Embeddings

curl http://localhost:11435/api/embeddings -d '{"model":"nomic-embed-text","prompt":"query"}'

Dashboard

http://localhost:11435/dashboard — monitor DeepSeek requests alongside all other models. Per-model latency, token throughput, health checks.

Full documentation

Agent Setup Guide

Guardrails

  • Never pull or delete DeepSeek models without user confirmation — downloads are 4-400+ GB.
  • Never delete or modify files in ~/.fleet-manager/.
  • If a DeepSeek model is too large for available memory, suggest a smaller variant.
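
A minimal sketch of the first guardrail in script form: prompt for confirmation before any pull (confirmed_pull is a hypothetical helper; only the standard ollama CLI is invoked):

import subprocess

def confirmed_pull(model: str) -> None:
    # Guardrail: DeepSeek pulls can run to hundreds of GB, so always ask first.
    answer = input(f"Pull {model}? This may download hundreds of GB [y/N]: ")
    if answer.strip().lower() == "y":
        subprocess.run(["ollama", "pull", model], check=True)
    else:
        print("Skipped.")

confirmed_pull("deepseek-r1:70b")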

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

  • Deepseek Deepseek Coder (Coding): run DeepSeek-V3, DeepSeek-R1, DeepSeek-Coder across your local fleet. 7-signal scoring routes every request to the best device. Cro...
  • Qwen Qwen3 (Coding): run Qwen3.5, Qwen3, Qwen3-Coder, Qwen2.5-Coder, and Qwen3-ASR across your local fleet. LLM inference, code generation, and speech-to-text from A...
  • Mistral Codestral (Coding): run Mistral Large, Mistral-Nemo, Codestral, and Mistral-Small locally. Mistral AI's open-source LLMs for code generation and reasonin...
  • Local Llm Router (Coding): local LLM model router for Llama, Qwen, DeepSeek, Phi, Mistral, and Gemma across multiple devices. Self-hosted local LLM inference routing on macOS, Linux, a...