
Mistral Rate Limits

Overview

Rate limit management for Mistral AI API. Mistral enforces per-minute request and token limits that vary by model tier and subscription plan.

Prerequisites

  • Mistral API key configured

  • Understanding of token vs request rate limits

  • Retry infrastructure
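To see why the distinction between request and token limits matters, note that the average token budget per request is TPM / RPM: exceed it consistently and the token limit binds before the request limit does. A quick back-of-the-envelope check, using the tier numbers from the table below:

```python
# Average token budget per request for each tier (TPM / RPM)
tiers = {
    "mistral-large": (30, 200_000),
    "mistral-small": (120, 500_000),
}

for name, (rpm, tpm) in tiers.items():
    budget = tpm // rpm
    print(f"{name}: ~{budget} tokens per request before TPM binds")
```

At 30 requests/min and 200K tokens/min, mistral-large allows roughly 6,666 tokens per request on average; mistral-small allows about 4,166 despite its higher RPM.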

Mistral Rate Limits by Tier

| Model | Requests/min | Tokens/min | Tokens/month |
|---|---|---|---|
| mistral-small | 120 | 500,000 | Varies by plan |
| mistral-medium | 60 | 500,000 | Varies by plan |
| mistral-large | 30 | 200,000 | Varies by plan |
| mistral-embed | 300 | 1,000,000 | Varies by plan |

Instructions

Step 1: Token-Aware Rate Limiter

Mistral limits both requests and tokens per minute, so the limiter must track both budgets in the same sliding 60-second window.

```python
import time
import threading

class MistralRateLimiter:
    def __init__(self, rpm: int = 60, tpm: int = 200000):
        self.rpm = rpm
        self.tpm = tpm
        self.request_times = []
        self.token_usage = []  # list of (timestamp, tokens) pairs
        self.lock = threading.Lock()

    def wait_if_needed(self, estimated_tokens: int = 1000):
        # Note: sleeps while holding the lock, which serializes callers.
        with self.lock:
            now = time.time()
            cutoff = now - 60
            # Drop entries older than the 60-second window
            self.request_times = [t for t in self.request_times if t > cutoff]
            self.token_usage = [(t, n) for t, n in self.token_usage if t > cutoff]

            current_rpm = len(self.request_times)
            current_tpm = sum(n for _, n in self.token_usage)

            if current_rpm >= self.rpm:
                # Sleep until the oldest request leaves the window
                sleep_time = self.request_times[0] - cutoff
                time.sleep(sleep_time + 0.1)

            if self.token_usage and current_tpm + estimated_tokens > self.tpm:
                # Sleep until the oldest token usage leaves the window
                sleep_time = self.token_usage[0][0] - cutoff
                time.sleep(sleep_time + 0.1)

            self.request_times.append(time.time())

    def record_usage(self, tokens: int):
        with self.lock:
            self.token_usage.append((time.time(), tokens))
```

Usage

```python
limiter = MistralRateLimiter(rpm=30, tpm=200000)

def rate_limited_chat(client, messages, model="mistral-large-latest"):
    # Rough estimate: ~4 characters per token
    estimated = sum(len(m["content"]) // 4 for m in messages)
    limiter.wait_if_needed(estimated)
    response = client.chat.complete(model=model, messages=messages)
    limiter.record_usage(response.usage.total_tokens)
    return response
```

Step 2: Handle 429 Responses

```python
import time

def chat_with_retry(client, messages, model, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.complete(model=model, messages=messages)
        except Exception as e:
            # HTTP 429: Too Many Requests
            if hasattr(e, "status_code") and e.status_code == 429:
                wait = min(2 ** attempt + 1, 60)  # exponential backoff, capped at 60s
                print(f"Rate limited, waiting {wait}s (attempt {attempt + 1})")
                time.sleep(wait)
            else:
                raise
    raise Exception("Max retries exceeded")
```

Step 3: Model-Tier Routing for Throughput

Route requests to cheaper models when premium capacity is exhausted.

```python
import time

class ModelRouter:
    def __init__(self):
        self.limiters = {
            "mistral-large-latest": MistralRateLimiter(rpm=30, tpm=200000),
            "mistral-small-latest": MistralRateLimiter(rpm=120, tpm=500000),
        }

    def get_available_model(self, preferred: str = "mistral-large-latest") -> str:
        limiter = self.limiters[preferred]
        if self._has_capacity(limiter):
            return preferred
        # Fall back to the smaller model, which has more headroom
        return "mistral-small-latest"

    @staticmethod
    def _has_capacity(limiter: MistralRateLimiter) -> bool:
        # Check the request count in the limiter's 60-second window
        cutoff = time.time() - 60
        recent = [t for t in limiter.request_times if t > cutoff]
        return len(recent) < limiter.rpm
```

Step 4: Batch Embedding with Rate Awareness

```python
def batch_embed(client, texts: list[str], batch_size: int = 32):
    # mistral-embed tier: 300 requests/min, 1,000,000 tokens/min
    limiter = MistralRateLimiter(rpm=300, tpm=1000000)
    all_embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        estimated_tokens = sum(len(t) // 4 for t in batch)  # ~4 chars per token
        limiter.wait_if_needed(estimated_tokens)
        response = client.embeddings.create(model="mistral-embed", inputs=batch)
        all_embeddings.extend([d.embedding for d in response.data])
        limiter.record_usage(response.usage.total_tokens)
    return all_embeddings
```
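If a fixed item count still produces batches that blow past per-request token caps, batches can instead be sized by estimated tokens. A sketch (the `max_tokens` cap is illustrative, not a documented Mistral limit; the 4-characters-per-token estimate matches the one used above):

```python
def token_bounded_batches(texts: list[str], max_tokens: int = 8000):
    """Yield batches whose estimated token total stays under max_tokens."""
    batch, total = [], 0
    for text in texts:
        est = max(1, len(text) // 4)  # rough: ~4 characters per token
        if batch and total + est > max_tokens:
            yield batch
            batch, total = [], 0
        batch.append(text)
        total += est
    if batch:
        yield batch
```

Iterating over `token_bounded_batches(texts)` can replace the fixed `batch_size` slicing in `batch_embed` when documents vary widely in length.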

Error Handling

| Issue | Cause | Solution |
|---|---|---|
| 429 errors | Exceeded RPM or TPM | Use rate limiter, exponential backoff |
| Inconsistent limits | Different limits per model | Configure limiter per model tier |
| Batch embedding failures | Too many tokens per batch | Reduce batch size |
| Spike traffic blocked | No smoothing | Queue requests, spread over time |
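The smoothing remedy in the last row can be sketched as a simple pacer that spaces requests evenly across the minute instead of letting a burst exhaust the window up front (a minimal illustration, not part of the upstream skill):

```python
import time

class RequestPacer:
    """Spaces requests evenly: at most `rpm` per minute, one every 60/rpm seconds."""

    def __init__(self, rpm: int):
        self.interval = 60.0 / rpm
        self.next_slot = time.monotonic()

    def wait(self):
        # Block until this request's scheduled slot, then reserve the next one
        now = time.monotonic()
        if now < self.next_slot:
            time.sleep(self.next_slot - now)
        self.next_slot = max(now, self.next_slot) + self.interval
```

Calling `pacer.wait()` before each request turns a burst into a steady drip, which plays better with sliding-window enforcement than front-loading the whole budget.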

Examples

Rate Limit Dashboard

```python
status = {
    "rpm_used": len(limiter.request_times),
    "rpm_limit": limiter.rpm,
    "tpm_used": sum(n for _, n in limiter.token_usage),
    "tpm_limit": limiter.tpm,
    "utilization_pct": len(limiter.request_times) / limiter.rpm * 100,
}
```

Resources

  • Mistral Rate Limits

  • Mistral API Reference

Output

  • Configuration files or code changes applied to the project

  • Validation report confirming correct implementation

  • Summary of changes made and their rationale
