OpenAI Responses API
Status: Production Ready Last Updated: 2026-01-21 API Launch: March 2025 Dependencies: openai@6.16.0 (Node.js) or fetch API (Cloudflare Workers)
What Is the Responses API?
OpenAI's unified interface for agentic applications, launched March 2025. Provides stateful conversations with preserved reasoning state across turns.
Key Innovation: Unlike Chat Completions (reasoning discarded between turns), Responses preserves the model's reasoning notebook, improving performance by 5% on TAUBench and enabling better multi-turn interactions.
vs Chat Completions:
Feature Chat Completions Responses API
State Manual history tracking Automatic (conversation IDs)
Reasoning Dropped between turns Preserved across turns (+5% TAUBench)
Tools Client-side round trips Server-side hosted
Output Single message Polymorphic (8 types)
Cache Baseline 40-80% better utilization
MCP Manual Built-in
Quick Start
npm install openai@6.16.0
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const response = await openai.responses.create({ model: 'gpt-5', input: 'What are the 5 Ds of dodgeball?', });
console.log(response.output_text);
Key differences from Chat Completions:
-
Endpoint: /v1/responses (not /v1/chat/completions )
-
Parameter: input (not messages )
-
Role: developer (not system )
-
Output: response.output_text (not choices[0].message.content )
When to Use Responses vs Chat Completions
Use Responses:
-
Agentic applications (reasoning + actions)
-
Multi-turn conversations (preserved reasoning = +5% TAUBench)
-
Built-in tools (Code Interpreter, File Search, Web Search, MCP)
-
Background processing (60s standard, 10min extended timeout)
Use Chat Completions:
-
Simple one-off generation
-
Fully stateless interactions
-
Legacy integrations
Stateful Conversations
Automatic State Management using conversation IDs:
// Create conversation const conv = await openai.conversations.create({ metadata: { user_id: 'user_123' }, });
// First turn const response1 = await openai.responses.create({ model: 'gpt-5', conversation: conv.id, input: 'What are the 5 Ds of dodgeball?', });
// Second turn - model remembers context + reasoning const response2 = await openai.responses.create({ model: 'gpt-5', conversation: conv.id, input: 'Tell me more about the first one', });
Benefits: No manual history tracking, reasoning preserved, 40-80% better cache utilization
Conversation Limits: 90-day expiration
Built-in Tools (Server-Side)
Server-side hosted tools eliminate backend round trips:
Tool Purpose Notes
code_interpreter
Execute Python code Sandboxed, 30s timeout (use background: true for longer)
file_search
RAG without vector stores Max 512MB per file, supports PDF/Word/Markdown/HTML/code
web_search
Real-time web information Automatic source citations
image_generation
DALL-E integration DALL-E 3 default
mcp
Connect external tools OAuth supported, tokens NOT stored
Usage:
const response = await openai.responses.create({ model: 'gpt-5', input: 'Calculate mean of: 10, 20, 30, 40, 50', tools: [{ type: 'code_interpreter' }], });
Web Search TypeScript Note
TypeScript Limitation: The web_search tool's external_web_access option is missing from SDK types (as of v6.16.0).
Workaround:
const response = await openai.responses.create({ model: 'gpt-5', input: 'Search for recent news', tools: [{ type: 'web_search', external_web_access: true, } as any], // ✅ Type assertion to suppress error });
Source: GitHub Issue #1716
MCP Server Integration
Built-in support for Model Context Protocol (MCP) servers to connect external tools (Stripe, databases, custom APIs).
User Approval Requirement
By default, explicit user approval is required before any data is shared with a remote MCP server (security feature).
Handling Approval:
const response = await openai.responses.create({ model: 'gpt-5', input: 'Get my Stripe balance', tools: [{ type: 'mcp', server_label: 'stripe', server_url: 'https://mcp.stripe.com', authorization: process.env.STRIPE_TOKEN, }], });
if (response.status === 'requires_approval') { // Show user: "This action requires sharing data with Stripe. Approve?" // After user approves, retry with approval token }
Alternative: Pre-approve MCP servers in OpenAI dashboard (users configure trusted servers via settings)
Source: Official MCP Guide
Basic MCP Usage
const response = await openai.responses.create({ model: 'gpt-5', input: 'Roll 2d6 dice', tools: [{ type: 'mcp', server_label: 'dice', server_url: 'https://example.com/mcp', authorization: process.env.TOKEN, // ⚠️ NOT stored, required each request }], });
MCP Output Types:
-
mcp_list_tools
-
Tools discovered on server
-
mcp_call
-
Tool invocation + result
-
message
-
Final response
Reasoning Preservation
Key Innovation: Model's internal reasoning state survives across turns (unlike Chat Completions which discards it).
Visual Analogy:
-
Chat Completions: Model tears out scratchpad page before responding
-
Responses API: Scratchpad stays open for next turn
Performance: +5% on TAUBench (GPT-5) purely from preserved reasoning
Reasoning Summaries (free):
response.output.forEach(item => { if (item.type === 'reasoning') console.log(item.summary[0].text); if (item.type === 'message') console.log(item.content[0].text); });
Important: Reasoning Traces Privacy
What You Get: Reasoning summaries (not full internal traces) What OpenAI Keeps: Full chain-of-thought reasoning (proprietary, for security/privacy)
For GPT-5-Thinking models:
-
OpenAI preserves reasoning internally in their backend
-
This preserved reasoning improves multi-turn performance (+5% TAUBench)
-
But developers only receive summaries, not the actual chain-of-thought
-
Full reasoning traces are not exposed (OpenAI's IP protection)
Source: Sean Goedecke Analysis
Background Mode
For long-running tasks, use background: true :
const response = await openai.responses.create({ model: 'gpt-5', input: 'Analyze 500-page document', background: true, tools: [{ type: 'file_search', file_ids: [fileId] }], });
// Poll for completion (check every 5s) const result = await openai.responses.retrieve(response.id); if (result.status === 'completed') console.log(result.output_text);
Timeout Limits:
-
Standard: 60 seconds
-
Background: 10 minutes
Performance Considerations
Time-to-First-Token (TTFT) Latency: Background mode currently has higher TTFT compared to synchronous responses. OpenAI is working to reduce this gap.
Recommendation:
-
For user-facing real-time responses, use sync mode (lower latency)
-
For long-running async tasks, use background mode (latency acceptable)
Source: OpenAI Background Mode Docs
Data Retention and Privacy
Default Retention: 30 days when store: true (default) Zero Data Retention (ZDR): Organizations with ZDR automatically enforce store: false
Background Mode: NOT ZDR compatible (stores data ~10 minutes for polling)
Timeline:
-
September 26, 2025: OpenAI court-ordered retention ended
-
Current: 30-day default retention with store: true
Control Storage:
// Disable storage (no retention) const response = await openai.responses.create({ model: 'gpt-5', input: 'Hello!', store: false, // ✅ No retention });
// ZDR organizations: store always treated as false const response = await openai.responses.create({ model: 'gpt-5', input: 'Hello!', store: true, // ⚠️ Ignored by OpenAI for ZDR orgs, treated as false });
ZDR Compliance:
-
Avoid background mode (requires temporary storage)
-
Explicitly set store: false for clarity
-
Note: 60s timeout applies in sync mode
Source: OpenAI Data Controls
Polymorphic Outputs
Returns 8 output types instead of single message:
Type Example
message
Final answer, explanation
reasoning
Step-by-step thought process (free!)
code_interpreter_call
Python code + results
mcp_call
Tool name, args, output
mcp_list_tools
Tool definitions from MCP server
file_search_call
Matched chunks, citations
web_search_call
URLs, snippets
image_generation_call
Image URL
Processing:
response.output.forEach(item => { if (item.type === 'reasoning') console.log(item.summary[0].text); if (item.type === 'web_search_call') console.log(item.results); if (item.type === 'message') console.log(item.content[0].text); });
// Or use helper for text-only console.log(response.output_text);
Migration from Chat Completions
Breaking Changes:
Feature Chat Completions Responses API
Endpoint /v1/chat/completions
/v1/responses
Parameter messages
input
Role system
developer
Output choices[0].message.content
output_text
State Manual array Automatic (conversation ID)
Streaming data: {"choices":[...]}
SSE with 8 item types
Example:
// Before const response = await openai.chat.completions.create({ model: 'gpt-5', messages: [ { role: 'system', content: 'You are a helpful assistant.' }, { role: 'user', content: 'Hello!' }, ], }); console.log(response.choices[0].message.content);
// After const response = await openai.responses.create({ model: 'gpt-5', input: [ { role: 'developer', content: 'You are a helpful assistant.' }, { role: 'user', content: 'Hello!' }, ], }); console.log(response.output_text);
Migration from Assistants API
CRITICAL: Assistants API Sunset Timeline
-
August 26, 2025: Assistants API officially deprecated
-
2025-2026: OpenAI providing migration utilities
-
August 26, 2026: Assistants API sunset (stops working)
Migrate before August 26, 2026 to avoid breaking changes.
Source: Assistants API Sunset Announcement
Key Breaking Changes:
Assistants API Responses API
Assistants (created via API) Prompts (created in dashboard)
Threads Conversations (store items, not just messages)
Runs (server-side lifecycle) Responses (stateless calls)
Run-Steps Items (polymorphic outputs)
Migration Example:
// Before (Assistants API - deprecated) const assistant = await openai.beta.assistants.create({ model: 'gpt-4', instructions: 'You are helpful.', });
const thread = await openai.beta.threads.create();
const run = await openai.beta.threads.runs.create(thread.id, { assistant_id: assistant.id, });
// After (Responses API - current) const conversation = await openai.conversations.create({ metadata: { purpose: 'customer_support' }, });
const response = await openai.responses.create({ model: 'gpt-5', conversation: conversation.id, input: [ { role: 'developer', content: 'You are helpful.' }, { role: 'user', content: 'Hello!' }, ], });
Migration Guide: Official Assistants Migration Docs
Known Issues Prevention
This skill prevents 11 documented errors:
- Session State Not Persisting
-
Cause: Not using conversation IDs or using different IDs per turn
-
Fix: Create conversation once (const conv = await openai.conversations.create() ), reuse conv.id for all turns
- MCP Server Connection Failed (mcp_connection_error )
-
Causes: Invalid URL, missing/expired auth token, server down
-
Fix: Verify URL is correct, test manually with fetch() , check token expiration
- Code Interpreter Timeout (code_interpreter_timeout )
-
Cause: Code runs longer than 30 seconds
-
Fix: Use background: true for extended timeout (up to 10 min)
- Image Generation Rate Limit (rate_limit_error )
-
Cause: Too many DALL-E requests
-
Fix: Implement exponential backoff retry (1s, 2s, 3s delays)
- File Search Relevance Issues
-
Cause: Vague queries return irrelevant results
-
Fix: Use specific queries ("pricing in Q4 2024" not "find pricing"), filter by chunk.score > 0.7
- Cost Tracking Confusion
-
Cause: Responses bills for input + output + tools + stored conversations (vs Chat Completions: input + output only)
-
Fix: Set store: false if not needed, monitor response.usage.tool_tokens
- Conversation Not Found (invalid_request_error )
-
Causes: ID typo, conversation deleted, or expired (90-day limit)
-
Fix: Verify exists with openai.conversations.list() before using
- Tool Output Parsing Failed
-
Cause: Accessing wrong output structure
-
Fix: Use response.output_text helper or iterate response.output.forEach(item => ...) checking item.type
- Zod v4 Incompatibility with Structured Outputs
-
Error: Invalid schema for response_format 'name': schema must be a JSON Schema of 'type: "object"', got 'type: "string"'.
-
Source: GitHub Issue #1597
-
Why It Happens: SDK's vendored zod-to-json-schema library doesn't support Zod v4 (missing ZodFirstPartyTypeKind export)
-
Prevention: Pin to Zod v3 ("zod": "^3.23.8" ) or use custom zodTextFormat with z.toJSONSchema({ target: "draft-7" })
// Workaround: Pin to Zod v3 (recommended) { "dependencies": { "openai": "^6.16.0", "zod": "^3.23.8" // DO NOT upgrade to v4 yet } }
- Background Mode Web Search Missing Sources
-
Error: web_search_call output items contain query but no sources/results
-
Source: GitHub Issue #1676
-
Why It Happens: When using background: true
- web_search tool, OpenAI doesn't return sources in the response
- Prevention: Use synchronous mode (background: false ) when web search sources are needed
// ✅ Sources available in sync mode const response = await openai.responses.create({ model: 'gpt-5', input: 'Latest AI news?', background: false, // Required for sources tools: [{ type: 'web_search' }], });
- Streaming Mode Missing output_text Helper
-
Error: finalResponse().output_text is undefined in streaming mode
-
Source: GitHub Issue #1662
-
Why It Happens: stream.finalResponse() doesn't include output_text convenience field (only available in non-streaming responses)
-
Prevention: Listen for output_text.done event or manually extract from output items
// Workaround: Listen for event const stream = openai.responses.stream({ model: 'gpt-5', input: 'Hello!' }); let outputText = ''; for await (const event of stream) { if (event.type === 'output_text.done') { outputText = event.output_text; // ✅ Available in event } }
Critical Patterns
✅ Always:
-
Use conversation IDs for multi-turn (40-80% better cache)
-
Handle all 8 output types in polymorphic responses
-
Use background: true for tasks >30s
-
Provide MCP authorization tokens (NOT stored, required each request)
-
Monitor response.usage.total_tokens for cost control
❌ Never:
-
Expose API keys in client-side code
-
Assume single message output (use response.output_text helper)
-
Reuse conversation IDs across users (security risk)
-
Ignore error types (handle rate_limit_error , mcp_connection_error specifically)
-
Poll faster than 1s for background tasks (use 5s intervals)
References
Official Docs:
-
Responses API Guide: https://platform.openai.com/docs/guides/responses
-
API Reference: https://platform.openai.com/docs/api-reference/responses
-
MCP Integration: https://platform.openai.com/docs/guides/tools-connectors-mcp
-
Blog Post: https://developers.openai.com/blog/responses-api/
-
Starter App: https://github.com/openai/openai-responses-starter-app
Skill Resources: templates/ , references/responses-vs-chat-completions.md , references/mcp-integration-guide.md , references/built-in-tools-guide.md , references/migration-guide.md , references/top-errors.md
Last verified: 2026-01-21 | Skill version: 2.1.0 | Changes: Added 3 TIER 1 issues (Zod v4, background web search, streaming output_text), 2 TIER 2 findings (MCP approval, reasoning privacy), Data Retention & ZDR section, Assistants API sunset timeline, background mode TTFT note, web search TypeScript limitation. Updated SDK version to 6.16.0.