venice-ai-api

Venice.ai API integration for privacy-first AI applications. Use when building applications with the Venice.ai API for chat completions, image generation, video generation, text-to-speech, speech-to-text, or embeddings. Triggers on Venice, Venice.ai, uncensored AI, privacy-first AI, or when users need an OpenAI-compatible API with uncensored models.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "venice-ai-api" with this command: npx skills add jrajasekera/claude-skills/jrajasekera-claude-skills-venice-ai-api

Venice.ai API Skill

Venice.ai provides privacy-first AI infrastructure with uncensored models and zero data retention. The API is OpenAI-compatible, allowing use of the OpenAI SDK with Venice's base URL. Inference runs on a decentralized network (DePIN) where nodes are disincentivized from retaining user data.

Quick Reference

  • Base URL: https://api.venice.ai/api/v1
  • Auth: Authorization: Bearer VENICE_API_KEY
  • SDK: Use OpenAI SDK with custom base URL
  • API Key Types: ADMIN (full access) or INFERENCE (inference only)

Setup

# Python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("VENICE_API_KEY"),
    base_url="https://api.venice.ai/api/v1"
)

// JavaScript
import OpenAI from 'openai';

const client = new OpenAI({
    apiKey: process.env.VENICE_API_KEY,
    baseURL: 'https://api.venice.ai/api/v1'
});

Account Tiers

| Tier | Qualification | Rate Limits | Use Case |
|---|---|---|---|
| Explorer | Pro subscription | Low RPM/TPM (~15-25 req/day) | Testing, prototyping |
| Paid | USD balance or staked VVV (Diems) | Standard production limits | Commercial apps |
| Partner | Enterprise agreement | Custom high-volume | Enterprise SaaS |

API Capabilities

1. Chat Completions

Text inference with multimodal support (text, images, audio, video).

completion = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello!"}
    ]
)

Popular Models:

  • llama-3.3-70b - Balanced performance (Tier M, 128K context)
  • zai-org-glm-4.7 - Complex tasks, deep reasoning (Tier L, 128K context)
  • mistral-31-24b - Vision + function calling (Tier S, 131K context)
  • venice-uncensored - No content filtering (Tier S, 32K context)
  • deepseek-ai-DeepSeek-R1 - Advanced reasoning, math, coding (Tier L, 64K context)
  • qwen3-235b - Massive MoE reasoning (Tier L)
  • qwen3-4b - Fast, lightweight (Tier XS, 40K context)

Venice Parameters (via extra_body in Python, direct in JS):

  • enable_web_search: "off" | "on" | "auto"
  • enable_web_scraping: boolean
  • enable_web_citations: boolean — adds ^index^ citation format
  • include_venice_system_prompt: boolean (default: true)
  • strip_thinking_response: boolean
  • disable_thinking: boolean
  • character_slug: string
  • prompt_cache_key: string — routing hint for cache hits
  • prompt_cache_retention: "default" | "extended" | "24h"

See references/chat-completions.md for full parameter reference.
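
The Venice-specific options listed above are not part of the standard OpenAI request, so in Python they go through the SDK's extra_body argument, wrapped in a venice_parameters object (the same shape the AI Characters example in this document uses). The following is a minimal sketch of a builder for that object. The helper name build_venice_extra_body is hypothetical, and its validation only covers the enable_web_search values listed above.

```python
from typing import Any, Dict

# Allowed values for enable_web_search, as listed in this document.
WEB_SEARCH_MODES = {"off", "on", "auto"}


def build_venice_extra_body(**params: Any) -> Dict[str, Dict[str, Any]]:
    """Wrap Venice-specific options for the OpenAI SDK's extra_body argument.

    Hypothetical helper: it only validates enable_web_search and drops
    options whose value is None.
    """
    mode = params.get("enable_web_search")
    if mode is not None and mode not in WEB_SEARCH_MODES:
        raise ValueError(
            f"enable_web_search must be one of {sorted(WEB_SEARCH_MODES)}"
        )
    cleaned = {key: value for key, value in params.items() if value is not None}
    return {"venice_parameters": cleaned}


extra_body = build_venice_extra_body(
    enable_web_search="auto",
    enable_web_citations=True,
    include_venice_system_prompt=False,
)
# Pass it along as: client.chat.completions.create(..., extra_body=extra_body)
```

Keeping the wrapper in one place avoids repeating the nested dictionary at every call site and catches a mistyped search mode before a request is sent.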

2. Image Generation

Generate images from text prompts.

import os

import requests

response = requests.post(
    "https://api.venice.ai/api/v1/image/generate",
    headers={"Authorization": f"Bearer {os.getenv('VENICE_API_KEY')}"},
    json={
        "model": "venice-sd35",
        "prompt": "A sunset over mountains",
        "width": 1024,
        "height": 1024
    }
)
# Response contains base64 images in images array
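
Since the response carries base64 strings in an images array, the images still have to be decoded before they can be viewed. A minimal sketch of that step follows. The helper name save_images is hypothetical, and it assumes each entry is a plain base64 string and that a .png extension suits the returned format.

```python
import base64
from pathlib import Path
from typing import Any, Dict, List


def save_images(payload: Dict[str, Any], out_dir: str, stem: str = "image") -> List[Path]:
    """Decode each base64 string in payload["images"] and write it to out_dir.

    Assumption: entries are plain base64 strings and PNG is an acceptable
    extension; adjust if a different output format was requested.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for index, encoded in enumerate(payload.get("images", [])):
        path = target / f"{stem}_{index}.png"
        path.write_bytes(base64.b64decode(encoded))
        written.append(path)
    return written
```

With the request above, a call such as save_images(response.json(), "out") would write one file per generated image.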

Image Models:

| Model | Best For | Pricing |
|---|---|---|
| qwen-image | Highest quality, editing | Variable |
| venice-sd35 | General purpose (default) | ~$0.01/image |
| hidream | Fast generation | ~$0.01/image |
| flux-2-pro | Professional quality | ~$0.04/image |
| flux-2-max | High-quality output | ~$0.02/image |
| nano-banana-pro | Photorealism, 2K/4K support | $0.18-$0.35 |

3. Image Upscaling

Enhance image resolution 2x or 4x.

import base64
import os

import requests

api_key = os.getenv("VENICE_API_KEY")

with open("image.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "https://api.venice.ai/api/v1/image/upscale",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "image": image_base64,
        "scale": 4  # 2 or 4
    }
)
# Returns raw image binary
with open("upscaled.png", "wb") as f:
    f.write(response.content)

Pricing: $0.02 (2x), $0.08 (4x)

4. Image Editing (Inpainting)

Modify existing images with AI-powered instructions.

import base64

with open("photo.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "https://api.venice.ai/api/v1/image/edit",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "prompt": "Change the sky to a sunset",
        "image": image_base64  # or URL starting with http/https
    }
)
# Returns raw image binary
with open("edited.png", "wb") as f:
    f.write(response.content)

Model: Uses Qwen-Image. Pricing: ~$0.04/edit.

See references/image-api.md for all parameters and style presets.

5. Video Generation

Async queue-based video generation. Always call /video/quote first for pricing.

Full Workflow:

import base64
import os
import time

import requests

api_key = os.getenv("VENICE_API_KEY")
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# Step 1: Get price quote
quote = requests.post(
    "https://api.venice.ai/api/v1/video/quote",
    headers=headers,
    json={
        "model": "kling-2.5-turbo-pro-text-to-video",
        "duration": "10s",
        "resolution": "720p",
        "aspect_ratio": "16:9",
        "audio": True
    }
)
print(f"Estimated cost: ${quote.json()['quote']}")

# Step 2: Queue the job (text-to-video)
queue_resp = requests.post(
    "https://api.venice.ai/api/v1/video/queue",
    headers=headers,
    json={
        "model": "kling-2.5-turbo-pro-text-to-video",
        "prompt": "A serene forest with sunlight filtering through trees",
        "negative_prompt": "low quality, blurry",
        "duration": "10s",
        "resolution": "720p",
        "aspect_ratio": "16:9",
        "audio": True
    }
)
queue_id = queue_resp.json()["queueid"]

# Step 3: Poll until complete
while True:
    status_resp = requests.post(
        "https://api.venice.ai/api/v1/video/retrieve",
        headers=headers,
        json={
            "model": "kling-2.5-turbo-pro-text-to-video",
            "queueid": queue_id,
            "delete_media_on_completion": False
        }
    )
    if (status_resp.status_code == 200
            and status_resp.headers.get("Content-Type") == "video/mp4"):
        with open("output.mp4", "wb") as f:
            f.write(status_resp.content)
        print("Video saved!")
        break
    else:
        status = status_resp.json()
        print(f"Status: {status['status']}, Duration: {status['executionDuration']}ms")
        time.sleep(10)

# Step 4: Cleanup (optional — deletes from Venice storage)
requests.post(
    "https://api.venice.ai/api/v1/video/complete",
    headers=headers,
    json={
        "model": "kling-2.5-turbo-pro-text-to-video",
        "queueid": queue_id
    }
)
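
The polling loop in Step 3 runs until the video arrives, which can hang indefinitely if a job never completes. One possible way to bound it is sketched below. The function poll_until_ready and its fetch callback are hypothetical names: fetch stands for a caller-supplied function that performs one /video/retrieve request and returns the video bytes when ready, or None while the job is still processing.

```python
import time
from typing import Callable, Optional


def poll_until_ready(
    fetch: Callable[[], Optional[bytes]],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bytes:
    """Call fetch() until it returns bytes, or raise TimeoutError.

    fetch is a hypothetical caller-supplied callback: it should issue one
    retrieve request and return the video bytes, or None if not ready yet.
    """
    deadline = clock() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        # Give up if the next wait would run past the deadline.
        if clock() + interval_s > deadline:
            raise TimeoutError(f"video not ready after {timeout_s} seconds")
        sleep(interval_s)
```

The sleep and clock parameters exist only so the loop can be tested without real waiting; the defaults use the standard library. The 600-second timeout is an arbitrary illustrative value, not a documented limit.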

Image-to-Video:

with open("image.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode("utf-8")

queue_resp = requests.post(
    "https://api.venice.ai/api/v1/video/queue",
    headers=headers,
    json={
        "model": "wan-2.5-preview-image-to-video",
        "prompt": "Animate this scene with gentle motion",
        "image_url": f"data:image/png;base64,{img_b64}",
        "duration": "5s",
        "resolution": "720p"
    }
)

Video Models:

| Model | Type | Features |
|---|---|---|
| kling-2.5-turbo-pro | Text/Image-to-Video | Fast, high quality |
| wan-2.5-preview | Image-to-Video | Animation specialist |
| ltx-2-full | Text/Image-to-Video | Full quality |
| veo3-fast | Text/Image-to-Video | Speed-optimized |
| sora-2 | Image-to-Video | High-end quality |

See references/video-api.md for full parameter reference.

6. Text-to-Speech

Convert text to audio with 60+ voices.

response = requests.post(
    "https://api.venice.ai/api/v1/audio/speech",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "input": "Hello, welcome to Venice.",
        "model": "tts-kokoro",
        "voice": "af_sky",
        "speed": 1.0,            # 0.25 to 4.0
        "response_format": "mp3"  # mp3, opus, aac, flac, wav, pcm
    }
)
with open("speech.mp3", "wb") as f:
    f.write(response.content)

Voices: af_sky, af_nova, am_liam, bf_emma, zf_xiaobei, jm_kumo, and 50+ more. Pricing: $3.50 per 1M characters.

7. Speech-to-Text

Transcribe audio files.

with open("audio.mp3", "rb") as f:
    response = requests.post(
        "https://api.venice.ai/api/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {api_key}"},
        files={"file": f},
        data={
            "model": "nvidia/parakeet-tdt-0.6b-v3",
            "response_format": "json",  # json or text
            "timestamps": "true"
        }
    )

Formats: WAV, FLAC, MP3, M4A, AAC, MP4. Pricing: $0.0001 per audio second.
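
The audio prices quoted in this document ($3.50 per 1M characters for text-to-speech, $0.0001 per audio second for transcription) can be turned into a rough budget estimate. The sketch below is only arithmetic on those two figures. The function names are hypothetical, the prices may change, and counting characters with len() is an assumption about how input is metered.

```python
from decimal import Decimal

# Prices quoted in this document; treat them as assumptions that may change.
TTS_USD_PER_MILLION_CHARS = Decimal("3.50")
STT_USD_PER_SECOND = Decimal("0.0001")


def tts_cost_usd(text: str) -> Decimal:
    """Estimated text-to-speech cost, assuming billing by len(text)."""
    return TTS_USD_PER_MILLION_CHARS * len(text) / Decimal(1_000_000)


def stt_cost_usd(audio_seconds: float) -> Decimal:
    """Estimated transcription cost for audio of the given length."""
    return STT_USD_PER_SECOND * Decimal(str(audio_seconds))
```

Decimal is used instead of float so that small per-unit prices do not pick up binary rounding noise when summed over many requests.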

8. Embeddings

Generate vector embeddings for RAG and semantic search.

response = requests.post(
    "https://api.venice.ai/api/v1/embeddings",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "text-embedding-bge-m3",
        "input": "Privacy-first AI infrastructure",
        "encoding_format": "float"  # or "base64"
    }
)
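
For the semantic search use case mentioned above, the returned vectors are typically compared with cosine similarity. The sketch below shows that comparison in plain Python. It assumes the vectors have already been extracted from the response; with an OpenAI-compatible response shape they would be expected under data[i]["embedding"], but that field path is an assumption here. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For more than a few thousand documents a vector index or NumPy would be a better fit; the pure-Python version is only meant to show the computation.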

9. Vision (Multimodal)

Analyze images with vision-capable models.

response = client.chat.completions.create(
    model="mistral-31-24b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://..."}}
        ]
    }]
)

10. Function Calling

Define tools for the model to call.

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "Weather in SF?"}],
    tools=tools
)
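
The request above only declares the tool; when the model decides to call it, the application has to run the function and send the result back. A minimal sketch of that dispatch step follows. It assumes the OpenAI SDK response shape, where each entry in response.choices[0].message.tool_calls exposes function.name and a JSON string in function.arguments. The registry, the stand-in get_weather body, and the run_tool_call helper are all hypothetical.

```python
import json
from typing import Any, Callable, Dict


def get_weather(location: str) -> Dict[str, Any]:
    """Stand-in implementation; a real one would query a weather service."""
    return {"location": location, "forecast": "unknown"}


# Maps tool names declared in `tools` to local Python callables.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one tool call and return a JSON string for the tool message."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        arguments = json.loads(arguments_json or "{}")
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"invalid arguments: {exc}"})
    return json.dumps(handler(**arguments))
```

In the OpenAI convention, each result is appended to the conversation as a message with role "tool" and the matching tool_call_id, and the chat completion is then requested again so the model can use the result.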

11. Structured Outputs

Get guaranteed JSON schema responses.

response = client.chat.completions.create(
    model="venice-uncensored",
    messages=[...],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "my_response",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {"answer": {"type": "string"}},
                "required": ["answer"],
                "additionalProperties": False
            }
        }
    }
)

Requirements: strict: true, additionalProperties: false, all fields in required.
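
The three requirements above are easy to miss when a schema is written by hand, so a small pre-flight check can help. The sketch below tests a response_format dictionary against them. The function name is hypothetical, and it inspects only the top-level object, not nested objects.

```python
from typing import Any, Dict, List


def check_structured_output_format(response_format: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the three rules hold.

    Only the top-level schema object is checked; nested objects are not
    traversed in this sketch.
    """
    problems = []
    spec = response_format.get("json_schema", {})
    schema = spec.get("schema", {})
    if spec.get("strict") is not True:
        problems.append("json_schema.strict must be True")
    if schema.get("additionalProperties") is not False:
        problems.append("schema.additionalProperties must be False")
    missing = sorted(set(schema.get("properties", {})) - set(schema.get("required", [])))
    if missing:
        problems.append(f"fields missing from required: {missing}")
    return problems
```

Running this before the API call turns a schema mistake into a readable local message instead of a rejected request.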

12. AI Characters

Interact with predefined AI personas.

# List characters
characters = requests.get(
    "https://api.venice.ai/api/v1/characters",
    headers={"Authorization": f"Bearer {api_key}"},
    params={"categories": "philosophy", "limit": 50}
).json()

# Chat with a character
response = client.chat.completions.create(
    model="venice-uncensored",
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    extra_body={
        "venice_parameters": {"character_slug": "alan-watts"}
    }
)

13. Model Discovery

Query available models and capabilities programmatically.

# List models by type
models = requests.get(
    "https://api.venice.ai/api/v1/models",
    headers={"Authorization": f"Bearer {api_key}"},
    params={"type": "text"}  # text, image, audio, video, embedding
).json()

# Get model traits for auto-selection
traits = requests.get(
    "https://api.venice.ai/api/v1/models/traits",
    params={"type": "text"}
).json()
# e.g. {"default": "zai-org-glm-4.7", "fastest": "qwen3-4b", "uncensored": "venice-uncensored"}

# Use trait as model ID for automatic routing
response = client.chat.completions.create(
    model="fastest",  # Venice routes to the current fastest model
    messages=[...]
)

Error Handling

Error Codes

| Status | Error Code | Meaning | Action |
|---|---|---|---|
| 400 | INVALID_REQUEST | Bad parameters | Check payload schema |
| 401 | AUTHENTICATION_FAILED | Invalid API key | Verify key and balance |
| 402 | - | Insufficient balance | Add USD or stake VVV |
| 403 | - | Unauthorized access | Check key type (ADMIN vs INFERENCE) |
| 413 | - | Payload too large | Reduce request size |
| 415 | - | Invalid content type | Use application/json |
| 422 | - | Content policy violation | Modify prompt |
| 429 | RATE_LIMIT_EXCEEDED | Too many requests | Backoff, wait for reset |
| 500 | INFERENCE_FAILED | Model error | Retry with backoff |
| 503 | - | Model at capacity | Retry later or switch model |
| 504 | - | Timeout | Use streaming for long responses |

Abuse Protection

Sending >20 failed requests in 30 seconds triggers a 30-second IP block. Always implement backoff.
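
Beyond backoff on individual requests, a client can count its own recent failures and stop before crossing the threshold described above. The sketch below keeps a sliding window of failure timestamps. The class name FailureWindow is hypothetical, and the defaults of 20 failures and 30 seconds are taken from the figures in this section.

```python
import time
from collections import deque
from typing import Callable, Deque


class FailureWindow:
    """Tracks recent failures so a client can pause before exceeding the limit."""

    def __init__(
        self,
        max_failures: int = 20,
        window_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.window_s = window_s
        self._clock = clock
        self._failures: Deque[float] = deque()

    def _prune(self) -> None:
        # Drop failures that have aged out of the window.
        cutoff = self._clock() - self.window_s
        while self._failures and self._failures[0] <= cutoff:
            self._failures.popleft()

    def record_failure(self) -> None:
        self._failures.append(self._clock())

    def should_pause(self) -> bool:
        """True once one more failure would push the count past max_failures."""
        self._prune()
        return len(self._failures) >= self.max_failures
```

A caller would invoke record_failure() after each non-2xx response and check should_pause() before sending the next request, sleeping until the window clears when it returns True.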

Retry with Exponential Backoff (Python)

import time
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_venice_session():
    """Create a requests session with automatic retry and backoff."""
    session = requests.Session()
    retry = Retry(
        total=3,
        backoff_factor=1,  # delays double per retry; urllib3 applies no delay before the first retry
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["POST", "GET"]
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    return session

session = create_venice_session()
response = session.post(url, json=payload, headers=headers)

Retry with Exponential Backoff (JavaScript)

async function veniceRequest(url, options, maxRetries = 3) {
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
        const response = await fetch(url, options);

        if (response.ok) return response;

        if ([429, 500, 502, 503, 504].includes(response.status)) {
            if (attempt < maxRetries) {
                const delay = Math.pow(2, attempt) * 1000;
                console.log(`Retry ${attempt + 1} in ${delay}ms (status ${response.status})`);
                await new Promise(r => setTimeout(r, delay));
                continue;
            }
        }

        throw new Error(`Venice API error: ${response.status} ${response.statusText}`);
    }
}

Rate Limit-Aware Client (Python)

import time
import requests

class VeniceClient:
    """Wrapper that respects rate limits using response headers."""
    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://api.venice.ai/api/v1"
        self.session = create_venice_session()
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }

    def request(self, method, path, **kwargs):
        resp = self.session.request(
            method, f"{self.base_url}{path}",
            headers=self.headers, **kwargs
        )
        remaining = resp.headers.get("x-ratelimit-remaining-requests")
        if remaining and int(remaining) <= 1:
            reset = resp.headers.get("x-ratelimit-reset-requests")
            if reset:
                wait = max(0, float(reset) - time.time())
                time.sleep(wait)
        resp.raise_for_status()
        return resp

Response Headers

Monitor these headers for production:

  • x-ratelimit-remaining-requests — Requests left in window
  • x-ratelimit-remaining-tokens — Tokens left in window
  • x-ratelimit-reset-requests — Timestamp when request count resets
  • x-venice-balance-usd — USD balance
  • x-venice-balance-diem — DIEM balance
  • x-venice-is-blurred — Image was blurred (safe mode)
  • x-venice-is-content-violation — Content policy violation
  • x-venice-model-deprecation-warning — Deprecation notice
  • x-venice-model-deprecation-date — Sunset date
  • CF-RAY — Request ID for support

Rate Limits by Model Tier

Text Models:

| Tier | RPM | TPM | Example Models |
|---|---|---|---|
| XS | 500 | 1,000,000 | qwen3-4b, llama-3.2-3b |
| S | 75 | 750,000 | mistral-31-24b, venice-uncensored |
| M | 50 | 750,000 | llama-3.3-70b, qwen3-next-80b |
| L | 20 | 500,000 | zai-org-glm-4.7, deepseek-ai-DeepSeek-R1 |

Other Endpoints:

| Endpoint | RPM |
|---|---|
| Image Generation | 20 |
| Audio Synthesis | 60 |
| Audio Transcription | 60 |
| Embeddings | 500 |
| Video Queue | 40 |
| Video Retrieve | 120 |
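
A requests-per-minute limit translates directly into a minimum spacing between calls: at 20 RPM (Tier L text models in the limits above), that is one request every 3 seconds. The sketch below paces calls on the client side using that spacing. The class name RequestPacer is hypothetical, and evenly spacing requests is a simplifying assumption, since this document does not describe how the server measures its window.

```python
import time
from typing import Callable, Optional


class RequestPacer:
    """Spaces calls evenly so they stay at or below a requests-per-minute limit."""

    def __init__(
        self,
        rpm: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rpm <= 0:
            raise ValueError("rpm must be positive")
        self.min_interval_s = 60.0 / rpm
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        start = now if self._next_allowed is None else max(now, self._next_allowed)
        delay = start - now
        if delay > 0:
            self._sleep(delay)
        self._next_allowed = start + self.min_interval_s
        return delay
```

Calling pacer.wait() immediately before each request keeps a batch job under the limit without relying on 429 responses, which also helps avoid accumulating failed requests.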

API Key Management

# Create key programmatically (requires ADMIN key)
curl -X POST https://api.venice.ai/api/v1/api_keys \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"apiKeyType": "INFERENCE", "description": "My App", "consumptionLimit": {"usd": 100}}'

# Check rate limits and balance
curl https://api.venice.ai/api/v1/api_keys/rate_limits \
  -H "Authorization: Bearer $VENICE_API_KEY"

# List keys
curl https://api.venice.ai/api/v1/api_keys \
  -H "Authorization: Bearer $VENICE_API_KEY"

# Delete key
curl -X DELETE "https://api.venice.ai/api/v1/api_keys?id={key_id}" \
  -H "Authorization: Bearer $VENICE_API_KEY"

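Reference Files

  • references/chat-completions.md - full chat completions parameter reference
  • references/image-api.md - image parameters and style presets
  • references/video-api.md - full video parameter reference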
