workers-ai

Run AI inference at the edge with OpenAI SDK and Workers AI. Load when generating text with LLMs, extracting structured JSON from text, building chat interfaces, streaming AI responses, generating embeddings, or integrating GPT-4/Claude via AI Gateway.

Install skill "workers-ai" with this command: npx skills add null-shot/cloudflare-skills/null-shot-cloudflare-skills-workers-ai

Workers AI

Run AI inference at the edge using Workers AI and industry-standard SDKs like OpenAI. Deploy LLM-powered applications with structured outputs, streaming responses, and AI Gateway integration.

FIRST: Installation

npm install openai

Optional dependencies for advanced use cases:

npm install ai @ai-sdk/openai  # For streaming with Vercel AI SDK

When to Use

| Use Case | Description |
|---|---|
| Text Generation | Generate content, summaries, translations |
| Structured Extraction | Extract structured data from unstructured text |
| Chat Interfaces | Build conversational AI applications |
| Content Moderation | Analyze and filter user-generated content |
| Embeddings | Generate vector embeddings for semantic search |
| RAG Pipelines | Combine with Vectorize for retrieval-augmented generation |

Quick Reference

| Task | API |
|---|---|
| Structured JSON output | response_format: { type: 'json_schema', json_schema: { name, schema } } |
| JSON mode (parse yourself) | response_format: { type: 'json_object' } |
| Stream responses | stream: true, or Vercel AI SDK's streamText() |
| Enable AI Gateway | Set baseURL in OpenAI client config |
| Generate embeddings | client.embeddings.create({ model, input }) |

Structured JSON Outputs

Workers AI supports structured JSON outputs using the OpenAI SDK's response_format API. This ensures the model returns data matching your schema.

import { OpenAI } from "openai";

interface Env {
  OPENAI_API_KEY: string;
}

// Define your JSON schema
const CalendarEventSchema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    date: { type: 'string' },
    participants: { type: 'array', items: { type: 'string' } },
  },
  required: ['name', 'date', 'participants'],
  // Strict mode requires every object to disallow extra properties
  additionalProperties: false,
};

export default {
  async fetch(request: Request, env: Env) {
    const client = new OpenAI({
      apiKey: env.OPENAI_API_KEY,
    });

    const response = await client.chat.completions.create({
      model: 'gpt-4o-2024-08-06',
      messages: [
        { role: 'system', content: 'Extract the event information.' },
        { role: 'user', content: 'Alice and Bob are going to a science fair on Friday.' },
      ],
      // Request structured JSON output constrained to the schema
      response_format: {
        type: 'json_schema',
        json_schema: {
          name: 'calendar_event',
          strict: true,
          schema: CalendarEventSchema,
        },
      },
    });

    // create() returns the JSON as a string in message.content, so parse it here
    const event = JSON.parse(response.choices[0].message.content ?? 'null');

    return Response.json({
      calendar_event: event,
    });
  }
}

wrangler.jsonc:

{
  "name": "my-ai-app",
  "main": "src/index.ts",
  "compatibility_date": "2025-01-17",
  "observability": {
    "enabled": true
  }
}
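Even with a schema in the request, the model's reply reaches your Worker as a JSON string, so a runtime check before use is a cheap safeguard. The sketch below is one way to do that for the calendar example. The `CalendarEvent` interface and the `isCalendarEvent` / `parseCalendarEvent` helpers are hypothetical names introduced here for illustration, not part of the OpenAI SDK.

```typescript
// Hypothetical type mirroring the calendar event schema used above.
interface CalendarEvent {
  name: string;
  date: string;
  participants: string[];
}

// Returns true only when every required field is present with the expected type.
function isCalendarEvent(value: unknown): value is CalendarEvent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.name === "string" &&
    typeof v.date === "string" &&
    Array.isArray(v.participants) &&
    v.participants.every((p) => typeof p === "string")
  );
}

// Parses the raw model output and returns null for anything that does not match.
function parseCalendarEvent(raw: string): CalendarEvent | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    return isCalendarEvent(parsed) ? parsed : null;
  } catch {
    return null; // not valid JSON at all
  }
}
```

A Worker could call `parseCalendarEvent` on the message content and return a 502 when it yields `null`, rather than passing malformed data downstream.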

Streaming Responses

For real-time chat experiences, use streaming to send tokens as they're generated.

import { OpenAI } from "openai";

interface Env {
  OPENAI_API_KEY: string;
}

export default {
  async fetch(request: Request, env: Env) {
    const client = new OpenAI({
      apiKey: env.OPENAI_API_KEY,
    });

    const stream = await client.chat.completions.create({
      model: 'gpt-4o',
      messages: [
        { role: 'user', content: 'Tell me a story about the edge.' }
      ],
      stream: true,
    });

    // Create a ReadableStream for SSE
    const encoder = new TextEncoder();
    const readable = new ReadableStream({
      async start(controller) {
        try {
          for await (const chunk of stream) {
            const content = chunk.choices[0]?.delta?.content;
            if (content) {
              controller.enqueue(encoder.encode(`data: ${JSON.stringify({ content })}\n\n`));
            }
          }
          controller.enqueue(encoder.encode('data: [DONE]\n\n'));
          controller.close();
        } catch (error) {
          controller.error(error);
        }
      },
    });

    return new Response(readable, {
      headers: {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
        'Connection': 'keep-alive',
      },
    });
  }
}
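On the receiving side, the client has to split the byte stream back into `data:` frames and stop at the `[DONE]` sentinel. The following is a minimal sketch of that parsing step, assuming the exact frame format emitted by the Worker above. `parseSseBuffer` and `SseParseResult` are hypothetical names; a browser `EventSource` or an SSE library would also work.

```typescript
interface SseParseResult {
  contents: string[]; // text fragments decoded from complete frames
  done: boolean;      // true once the [DONE] sentinel was seen
  rest: string;       // trailing partial frame to prepend to the next chunk
}

// Splits a buffer of SSE text into complete frames and extracts their content.
function parseSseBuffer(buffer: string): SseParseResult {
  const frames = buffer.split("\n\n");
  const rest = frames.pop() ?? ""; // the last piece is incomplete (or empty)
  const contents: string[] = [];
  let done = false;

  for (const frame of frames) {
    if (!frame.startsWith("data: ")) continue; // ignore comments and other fields
    const payload = frame.slice("data: ".length);
    if (payload === "[DONE]") {
      done = true;
      break;
    }
    const parsed = JSON.parse(payload) as { content?: string };
    if (typeof parsed.content === "string") contents.push(parsed.content);
  }
  return { contents, done, rest };
}
```

The caller keeps `rest` and prepends it to the next decoded chunk, so a frame split across two network reads is still parsed correctly.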

AI Gateway Integration

AI Gateway provides caching, rate limiting, analytics, and request logging for your AI requests. Configure it by setting the baseURL option in your OpenAI client.

import { OpenAI } from "openai";

interface Env {
  OPENAI_API_KEY: string;
  AI_GATEWAY_ACCOUNT_ID: string;
  AI_GATEWAY_ID: string;
}

export default {
  async fetch(request: Request, env: Env) {
    const client = new OpenAI({
      apiKey: env.OPENAI_API_KEY,
      // Route requests through AI Gateway (the SDK option is spelled baseURL)
      baseURL: `https://gateway.ai.cloudflare.com/v1/${env.AI_GATEWAY_ACCOUNT_ID}/${env.AI_GATEWAY_ID}/openai`
    });

    const response = await client.chat.completions.create({
      model: 'gpt-4o',
      messages: [
        { role: 'user', content: 'Hello, world!' }
      ],
    });

    return Response.json(response.choices[0].message);
  }
}

Benefits of AI Gateway:

  • Caching: Reduce costs by caching identical requests
  • Rate Limiting: Protect against abuse and control costs
  • Analytics: Monitor token usage, latency, and error rates
  • Logging: Inspect requests and responses for debugging
  • Multi-provider: Works with OpenAI, Anthropic, Azure, and more
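Because the gateway URL differs only in its final path segment per provider, a small builder keeps the pattern in one place. This is a sketch under assumptions: `gatewayBaseUrl` is a hypothetical helper, and the `anthropic` segment is assumed to follow the same shape as the `openai` one shown above, so verify provider paths against the AI Gateway documentation.

```typescript
// Provider path segments assumed for illustration; only "openai" appears in the example above.
type GatewayProvider = "openai" | "anthropic";

// Builds the gateway base URL for a provider from account and gateway identifiers.
function gatewayBaseUrl(
  accountId: string,
  gatewayId: string,
  provider: GatewayProvider,
): string {
  if (!accountId || !gatewayId) {
    throw new Error("accountId and gatewayId are required");
  }
  const account = encodeURIComponent(accountId);
  const gateway = encodeURIComponent(gatewayId);
  return `https://gateway.ai.cloudflare.com/v1/${account}/${gateway}/${provider}`;
}
```

The result can be passed as the client's base URL option, for example `gatewayBaseUrl(env.AI_GATEWAY_ACCOUNT_ID, env.AI_GATEWAY_ID, "openai")`.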

Model Selection

Choose models based on your use case:

| Model Family | Best For | Structured Output Support |
|---|---|---|
| GPT-4o | Complex reasoning, structured extraction | Yes |
| GPT-4o-mini | Fast, cost-effective tasks | Yes |
| GPT-3.5-turbo | Simple completions, high throughput | Limited |
| Claude 3.5 Sonnet | Long-form content, analysis | Via Anthropic SDK |
| Claude 3 Haiku | Fast responses, simple tasks | Via Anthropic SDK |

Choosing the right model:

  • Structured extraction: Use GPT-4o with json_schema
  • Chat interfaces: Use GPT-4o or Claude 3.5 Sonnet with streaming
  • High volume/low latency: Use GPT-4o-mini or Claude 3 Haiku
  • Complex reasoning: Use GPT-4o or Claude 3.5 Sonnet
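The guidance above can be captured as a lookup so that model choice lives in one place instead of being scattered across handlers. The sketch below covers only the OpenAI model names used elsewhere in this document. The `UseCase` labels and `pickModel` are hypothetical names, and the mapping is a starting point to adjust, not a recommendation from any provider.

```typescript
// Hypothetical use-case labels matching the bullets above.
type UseCase = "structured-extraction" | "chat" | "high-volume" | "complex-reasoning";

// One default model per use case; edit here to change the choice everywhere.
const MODEL_BY_USE_CASE: Record<UseCase, string> = {
  "structured-extraction": "gpt-4o",
  "chat": "gpt-4o",
  "high-volume": "gpt-4o-mini",
  "complex-reasoning": "gpt-4o",
};

function pickModel(useCase: UseCase): string {
  return MODEL_BY_USE_CASE[useCase];
}
```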

Response Formats

Workers AI supports multiple response format options:

// Option 1: JSON Schema (recommended for structured extraction)
// The schema is nested under json_schema and needs a name
response_format: {
  type: 'json_schema',
  json_schema: {
    name: 'person',
    schema: {
      type: 'object',
      properties: {
        name: { type: 'string' },
        age: { type: 'number' },
      },
      required: ['name']
    }
  }
}

// Option 2: JSON Object (parse manually)
response_format: {
  type: 'json_object'
}
// Remember to prompt the model to return JSON

// Option 3: Text (default)
// No response_format specified - returns plain text

Generating Embeddings

Use embeddings for semantic search, RAG, and similarity matching. Combine with Vectorize for storage.

import { OpenAI } from "openai";

interface Env {
  OPENAI_API_KEY: string;
  VECTORIZE: VectorizeIndex;
}

export default {
  async fetch(request: Request, env: Env) {
    const client = new OpenAI({
      apiKey: env.OPENAI_API_KEY,
    });

    const text = "Cloudflare Workers run at the edge";

    // Generate embedding
    const response = await client.embeddings.create({
      model: 'text-embedding-3-small',
      input: text,
    });

    const vector = response.data[0].embedding;

    // Store in Vectorize
    await env.VECTORIZE.upsert([
      {
        id: '1',
        values: vector,
        metadata: { text }
      }
    ]);

    return Response.json({ 
      dimensions: vector.length,
      stored: true 
    });
  }
}

wrangler.jsonc with Vectorize binding:

{
  "vectorize": [
    {
      "binding": "VECTORIZE",
      "index_name": "my-embeddings-index"
    }
  ]
}
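Similarity matching between two embeddings usually comes down to cosine similarity, which Vectorize computes for you when the index uses that metric. For small in-memory comparisons, a sketch like the one below is enough. `cosineSimilarity` is a hypothetical helper name, not an SDK function.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```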

Error Handling

Always handle AI API errors gracefully:

import { OpenAI } from "openai";

interface Env {
  OPENAI_API_KEY: string;
}

export default {
  async fetch(request: Request, env: Env) {
    const client = new OpenAI({
      apiKey: env.OPENAI_API_KEY,
    });

    try {
      const response = await client.chat.completions.create({
        model: 'gpt-4o',
        messages: [{ role: 'user', content: 'Hello!' }],
      });

      return Response.json(response.choices[0].message);
    } catch (error) {
      // Narrow the caught value to the SDK's APIError before reading its HTTP status
      const status = error instanceof OpenAI.APIError ? error.status : undefined;

      // Handle rate limits
      if (status === 429) {
        return Response.json(
          { error: 'Rate limit exceeded. Please try again later.' },
          { status: 429 }
        );
      }

      // Handle invalid requests
      if (status === 400) {
        return Response.json(
          { error: 'Invalid request. Check your parameters.' },
          { status: 400 }
        );
      }

      // Generic error
      console.error('AI request failed:', error);
      return Response.json(
        { error: 'Internal server error' },
        { status: 500 }
      );
    }
  }
}

Best Practices

  1. Use structured outputs: Set response_format with json_schema for reliable data extraction
  2. Enable observability: Set observability.enabled: true in wrangler.jsonc
  3. Stream for chat: Use streaming responses for better user experience
  4. Cache with AI Gateway: Route requests through AI Gateway to cache and monitor
  5. Handle errors: Always catch and handle API errors gracefully
  6. Choose right model: Balance cost, speed, and capability based on your use case
  7. Validate inputs: Sanitize user inputs before sending to AI models
  8. Set timeouts: Use appropriate timeouts for long-running requests
  9. Use embeddings wisely: Batch embedding generation when possible
  10. Monitor token usage: Track costs through AI Gateway analytics
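For item 9, batching mostly means grouping texts before calling the embeddings endpoint, since its `input` field accepts an array of strings as well as a single string. The following is a minimal sketch of the grouping step. `chunk` is a hypothetical helper, and the right batch size depends on provider limits not covered here.

```typescript
// Splits `items` into consecutive batches of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch can then be sent as the `input` of one `client.embeddings.create` call, which replaces many single-text requests with a few larger ones.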

Integration Patterns

Pattern 1: Chat with Message History

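import { OpenAI } from "openai";

interface Env {
  OPENAI_API_KEY: string;
  KV: KVNamespace;
}

export default {
  async fetch(request: Request, env: Env) {
    // The request body is untyped JSON; assert the shape this handler expects
    const { userId, message } = (await request.json()) as { userId: string; message: string };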
    const client = new OpenAI({ apiKey: env.OPENAI_API_KEY });

    // Get message history from KV
    const historyJson = await env.KV.get(`chat:${userId}`);
    const history = historyJson ? JSON.parse(historyJson) : [];

    // Add user message
    history.push({ role: 'user', content: message });

    // Get AI response
    const response = await client.chat.completions.create({
      model: 'gpt-4o',
      messages: history,
    });

    const assistantMessage = response.choices[0].message;
    history.push(assistantMessage);

    // Store updated history
    await env.KV.put(`chat:${userId}`, JSON.stringify(history), {
      expirationTtl: 3600 // 1 hour
    });

    return Response.json({ message: assistantMessage.content });
  }
}
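A history that grows without bound will eventually exceed the model's context window. One simple mitigation is to keep any system messages plus only the most recent turns before each request. The sketch below counts messages, not tokens, which is a rough proxy. `ChatMessage` and `trimHistory` are hypothetical names.

```typescript
// Hypothetical message shape matching the role/content objects stored in KV above.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Keeps every system message and the last `maxMessages` non-system messages.
function trimHistory(history: ChatMessage[], maxMessages: number): ChatMessage[] {
  const system = history.filter((m) => m.role === "system");
  const rest = history.filter((m) => m.role !== "system");
  // slice(-0) would return the whole array, so handle non-positive limits explicitly.
  const recent = maxMessages > 0 ? rest.slice(-maxMessages) : [];
  return [...system, ...recent];
}
```

Calling `trimHistory(history, 20)` before both the completion request and the KV write would also keep the stored value small.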

Pattern 2: RAG with Vectorize

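import { OpenAI } from "openai";

interface Env {
  OPENAI_API_KEY: string;
  VECTORIZE: VectorizeIndex;
}

export default {
  async fetch(request: Request, env: Env) {
    // The request body is untyped JSON; assert the shape this handler expects
    const { query } = (await request.json()) as { query: string };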
    const client = new OpenAI({ apiKey: env.OPENAI_API_KEY });

    // Generate query embedding
    const embeddingResponse = await client.embeddings.create({
      model: 'text-embedding-3-small',
      input: query,
    });

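    // Search similar documents; metadata is only returned when requested
    const results = await env.VECTORIZE.query(embeddingResponse.data[0].embedding, {
      topK: 3,
      returnMetadata: 'all',
    });

    // Build context from results, skipping matches without stored text
    const context = results.matches
      .map(match => match.metadata?.text)
      .filter((text): text is string => typeof text === 'string')
      .join('\n\n');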

    // Generate answer with context
    const response = await client.chat.completions.create({
      model: 'gpt-4o',
      messages: [
        { 
          role: 'system', 
          content: `Answer questions using this context:\n\n${context}` 
        },
        { role: 'user', content: query }
      ],
    });

    return Response.json({
      answer: response.choices[0].message.content,
      sources: results.matches.map(m => m.metadata),
    });
  }
}
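Passing every match into the prompt can pull in weak results and inflate token usage. The sketch below filters matches by a minimum score and stops once a character budget is reached. It assumes higher scores mean closer matches, as with a cosine-metric index. `Match` is a simplified hypothetical stand-in for the Vectorize match type, and `buildContext` is a hypothetical helper.

```typescript
// Simplified stand-in for a vector search match.
interface Match {
  score: number;
  metadata?: Record<string, unknown>;
}

// Joins the text of sufficiently similar matches without exceeding maxChars.
function buildContext(matches: Match[], minScore: number, maxChars: number): string {
  const parts: string[] = [];
  let used = 0;
  for (const match of matches) {
    if (match.score < minScore) continue; // too dissimilar to be useful
    const text = match.metadata?.text;
    if (typeof text !== "string") continue; // no stored text for this vector
    const cost = text.length + (parts.length > 0 ? 2 : 0); // 2 for the "\n\n" separator
    if (used + cost > maxChars) break; // budget exhausted
    parts.push(text);
    used += cost;
  }
  return parts.join("\n\n");
}
```

Characters are only a rough proxy for tokens, so treat `maxChars` as a conservative limit.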

Common Pitfalls

  1. Not handling rate limits: Always catch 429 errors and implement backoff
  2. Ignoring token limits: Monitor and truncate input to stay within model limits
  3. Not caching: Use AI Gateway or KV to cache responses for identical requests
  4. Blocking on responses: Use streaming for better perceived performance
  5. Missing error boundaries: Wrap AI calls in try-catch blocks
  6. Hardcoding API keys: Always use environment bindings
  7. Not validating schemas: Test your JSON schemas thoroughly
  8. Overfitting prompts: Keep system prompts concise and clear
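For pitfall 1, backoff can be as small as a capped exponential delay around the call. The following is a minimal sketch, assuming the thrown error exposes a numeric `status` property as the OpenAI SDK's API errors do. `backoffDelayMs` and `retryOn429` are hypothetical helpers, and the default delays are illustrative values.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Default sleep; injectable so tests can skip real waiting.
const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Calls `fn`, retrying only on HTTP 429 up to `maxRetries` times.
async function retryOn429<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const status = (error as { status?: number } | null)?.status;
      // Rethrow anything that is not a rate limit, or once retries are used up.
      if (status !== 429 || attempt >= maxRetries) throw error;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

This version has no jitter. Adding a small random offset to each delay helps when many Workers retry at the same moment.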
