qwen-qwen3-5

Qwen 3.5 by Alibaba — run Qwen 3.5 (the latest and most capable Qwen model) across your local device fleet. Qwen 3.5 rivals GPT-4o and Claude 3.5 on reasoning benchmarks. Plus Qwen3-Coder for code generation and Qwen3-ASR for speech-to-text. Fleet-routed to the best available machine via Ollama Herd. Zero cloud costs.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.


Install skill "qwen-qwen3-5" with this command: npx skills add twinsgeeks/qwen-qwen3-5

Qwen 3.5 — Alibaba's Latest LLM on Your Local Fleet

Qwen 3.5 is the newest and most capable model in the Qwen family. It rivals GPT-4o and Claude 3.5 Sonnet on reasoning, coding, and multilingual benchmarks — and you can run it locally on your own hardware for free.

Supported Qwen models

| Model | Parameters | Ollama name | Best for |
|---|---|---|---|
| Qwen 3.5 | 72B | qwen3.5 | Frontier reasoning; rivals GPT-4o |
| Qwen 3.5 | 32B | qwen3.5:32b | Strong quality at lower resource cost |
| Qwen 3.5 | 14B | qwen3.5:14b | Good balance for mid-range hardware |
| Qwen 3.5 | 7B | qwen3.5:7b | Fast on low-RAM devices |
| Qwen3-Coder | 32B | qwen3-coder:32b | Code generation in 80+ languages |
| Qwen2.5-Coder | 7B, 32B | qwen2.5-coder:32b | Proven code model |
| Qwen3-ASR | - | qwen3-asr | Speech-to-text transcription |

Quick start

pip install ollama-herd    # PyPI: https://pypi.org/project/ollama-herd/
herd                       # start the router (port 11435)
herd-node                  # run on each device — finds the router automatically

No models are downloaded during installation. Models are pulled on demand. All pulls require user confirmation.
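
If you want to script a pull yourself, a minimal sketch is below. It assumes the router forwards Ollama's standard /api/pull endpoint; the herd's own confirmation flow may differ, so treat it as illustrative.

# Sketch: pull a Qwen model on demand through the fleet router.
# Assumes the router at :11435 forwards Ollama's standard /api/pull
# endpoint; adjust if ollama-herd uses its own confirmation flow.
import json
import requests

resp = requests.post(
    "http://localhost:11435/api/pull",
    json={"model": "qwen3.5:7b"},
    stream=True,
)
for line in resp.iter_lines():
    if line:
        # Ollama streams JSON lines with a "status" field during pulls
        print(json.loads(line).get("status", ""))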

Use Qwen 3.5 through the fleet

OpenAI SDK

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

# Qwen 3.5 for complex reasoning
response = client.chat.completions.create(
    model="qwen3.5",
    messages=[{"role": "user", "content": "Compare microservices vs monolith architectures"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

Qwen3-Coder for code

response = client.chat.completions.create(
    model="qwen3-coder:32b",
    messages=[{"role": "user", "content": "Write a thread-safe connection pool in Go"}],
)
print(response.choices[0].message.content)

Ollama API

# Qwen 3.5 chat
curl http://localhost:11435/api/chat -d '{
  "model": "qwen3.5",
  "messages": [{"role": "user", "content": "Explain attention mechanisms"}],
  "stream": false
}'
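
The same call from Python, as a sketch using the official ollama client (pip install ollama) pointed at the router. It assumes the router accepts the native client exactly as it accepts the curl request above.

# Sketch: native Ollama chat through the fleet router using the official
# Python client. Assumes the router speaks the same /api/chat protocol
# shown in the curl example above.
from ollama import Client

client = Client(host="http://localhost:11435")
reply = client.chat(
    model="qwen3.5",
    messages=[{"role": "user", "content": "Explain attention mechanisms"}],
)
print(reply["message"]["content"])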

Qwen3-ASR speech-to-text

curl http://localhost:11435/api/transcribe \
  -F "file=@meeting.wav" \
  -F "model=qwen3-asr"

Hardware recommendations

Cross-platform: These are example configurations. Any device (Mac, Linux, Windows) with equivalent RAM works. The fleet router runs on all platforms.

| Device | RAM | Best Qwen model |
|---|---|---|
| Mac Mini (16GB) | 16GB | qwen3.5:7b |
| Mac Mini (32GB) | 32GB | qwen3.5:14b or qwen2.5-coder:32b |
| MacBook Pro (64GB) | 64GB | qwen3.5:32b or qwen3-coder:32b |
| Mac Studio (128GB) | 128GB | qwen3.5 (72B) at full quality |
| Mac Studio (256GB) | 256GB | qwen3.5 + qwen3-coder:32b simultaneously |
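
As a rough guide to the table, a tiny helper that maps a device's RAM to the suggested tag. The thresholds mirror the rows above and are illustrative only.

# Rough guideline only: map available RAM (in GB) to the Qwen tag
# suggested in the table above. Thresholds are illustrative, not exact.
def suggest_qwen_model(ram_gb: int) -> str:
    if ram_gb >= 128:
        return "qwen3.5"        # full 72B model
    if ram_gb >= 64:
        return "qwen3.5:32b"
    if ram_gb >= 32:
        return "qwen3.5:14b"
    return "qwen3.5:7b"

print(suggest_qwen_model(64))   # qwen3.5:32b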

Why Qwen 3.5 locally

  • GPT-4o quality — Qwen 3.5 72B matches GPT-4o on MMLU, HumanEval, and MT-Bench
  • Zero cost — no per-token charges after hardware
  • Privacy — all data stays on your network
  • No rate limits — Qwen's cloud API throttles during peak hours. Your hardware doesn't.
  • Fleet routing — multiple machines share the load

Also available on this fleet

Other LLMs

Llama 3.3, DeepSeek-V3, DeepSeek-R1, Phi 4, Mistral, Gemma 3, Codestral — same endpoint.

Image generation

curl -o image.png http://localhost:11435/api/generate-image \
  -d '{"model": "z-image-turbo", "prompt": "an AI assistant helping with code", "width": 1024, "height": 1024}'

Embeddings

curl http://localhost:11435/api/embed \
  -d '{"model": "nomic-embed-text", "input": "Qwen 3.5 large language model"}'

Monitor

curl -s http://localhost:11435/fleet/status | python3 -m json.tool
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

Dashboard at http://localhost:11435/dashboard.
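
A small Python sketch that polls the status endpoint and pretty-prints whatever JSON comes back, with no assumptions about the schema.

# Sketch: poll the fleet status endpoint and pretty-print the response.
import json
import requests

status = requests.get("http://localhost:11435/fleet/status").json()
print(json.dumps(status, indent=2))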


Contribute

Ollama Herd is open source (MIT):

  • Star on GitHub — help others run Qwen locally
  • Open an issue — share your Qwen setup, report bugs
  • PRs welcome: CLAUDE.md gives AI agents full context. 444 tests, async Python.

Guardrails

  • Model downloads require explicit user confirmation — Qwen models range from 4GB (7B) to 42GB (72B).
  • Model deletion requires explicit user confirmation.
  • Never delete or modify files in ~/.fleet-manager/.
  • No models are downloaded automatically — all pulls are user-initiated or require opt-in.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

Qwen Qwen3

Qwen Qwen3 — run Qwen3.5, Qwen3, Qwen3-Coder, Qwen2.5-Coder, and Qwen3-ASR across your local fleet. LLM inference, code generation, and speech-to-text from A...

General

Ollama Herd

Ollama multimodal model router for Llama, Qwen, DeepSeek, Phi, and Mistral — plus mflux image generation, speech-to-text, and embeddings. Self-hosted Ollama...

Coding

Deepseek Deepseek Coder

DeepSeek DeepSeek-Coder — run DeepSeek-V3, DeepSeek-R1, DeepSeek-Coder across your local fleet. 7-signal scoring routes every request to the best device. Cro...

Coding

Phi Phi4

Phi 4 by Microsoft — small but powerful LLMs that run on minimal hardware. Phi-4 (14B), Phi-4-mini (3.8B), and Phi-3.5 across your device fleet. Perfect for...
