Computer Use Skill
Step 1: Environment Setup
Before using Computer Use, ensure proper sandboxed environment:
Docker Container (Recommended):
Use Anthropic's reference container
docker run -it --rm
-e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY
-p 5900:5900 -p 8501:8501
ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest
Virtual Machine: Dedicated VM with minimal privileges and isolated network
Never run on host machine with access to sensitive data or credentials
Step 2: Tool Configuration
Configure the computer use tool with display settings:
const computerTool = { type: 'computer_20250124', // or "computer_20251124" for Opus 4.5 name: 'computer', display_width_px: 1024, display_height_px: 768, display_number: 1, };
Resolution Guidelines:
-
XGA (1024x768): Default, works well for most tasks
-
WXGA (1280x800): Better for wide content
-
1920x1080: Only if needed, may reduce accuracy
Step 3: Agent Loop Implementation
The core pattern for computer use is an agent loop:
async function computerUseAgentLoop(task, maxIterations = 50) { const messages = [{ role: 'user', content: task }];
for (let i = 0; i < maxIterations; i++) { // 1. Call Claude with computer tool const response = await anthropic.messages.create({ model: 'claude-sonnet-4-20250514', max_tokens: 4096, tools: [computerTool], messages, betas: ['computer-use-2025-01-24'], });
// 2. Check if task complete
if (response.stop_reason === 'end_turn') {
return extractFinalResult(response);
}
// 3. Process tool use requests
const toolResults = [];
for (const block of response.content) {
if (block.type === 'tool_use' && block.name === 'computer') {
const result = await executeComputerAction(block.input);
toolResults.push({
type: 'tool_result',
tool_use_id: block.id,
content: result,
});
}
}
// 4. Add assistant response and tool results
messages.push({ role: 'assistant', content: response.content });
messages.push({ role: 'user', content: toolResults });
}
throw new Error('Max iterations reached'); }
Step 4: Action Execution
Execute computer actions based on Claude's requests:
async function executeComputerAction(input) { const { action, coordinate, text, scroll_direction, scroll_amount } = input;
switch (action) { case 'screenshot': return await captureScreenshot();
case 'left_click':
await click(coordinate[0], coordinate[1]);
return await captureScreenshot();
case 'type':
await typeText(text);
return await captureScreenshot();
case 'key':
await pressKey(text);
return await captureScreenshot();
case 'mouse_move':
await moveMouse(coordinate[0], coordinate[1]);
return await captureScreenshot();
case 'scroll':
await scroll(coordinate, scroll_direction, scroll_amount);
return await captureScreenshot();
case 'left_click_drag':
await drag(input.start_coordinate, coordinate);
return await captureScreenshot();
case 'wait':
await sleep(input.duration * 1000);
return await captureScreenshot();
default:
throw new Error(`Unknown action: ${action}`);
} }
Step 5: Coordinate Scaling
When display resolution differs from tool configuration:
function scaleCoordinates(x, y, fromWidth, fromHeight, toWidth, toHeight) { return [Math.round((x * toWidth) / fromWidth), Math.round((y * toHeight) / fromHeight)]; }
// Example: Scale from 1024x768 to actual 1920x1080 const [scaledX, scaledY] = scaleCoordinates( 500, 400, // Claude's coordinates 1024, 768, // Tool configuration 1920, 1080 // Actual display );
</execution_process>
<best_practices>
Sandboxed Execution: ALWAYS run in Docker container or VM with minimal privileges. Never grant access to sensitive data, authentication credentials, or unrestricted internet.
Human Confirmation: Implement human-in-the-loop confirmation for meaningful actions like form submissions, file deletions, or external communications.
Prompt Injection Protection: Be aware that malicious content in screenshots can attempt to manipulate Claude. Validate actions against the original task.
Resolution Consistency: Keep display resolution consistent throughout a session. XGA (1024x768) provides best balance of accuracy and visibility.
Screenshot After Actions: Always return a screenshot after each action so Claude can verify the result and determine next steps.
Error Recovery: Implement graceful error handling. If an action fails, capture screenshot and let Claude decide how to proceed.
Rate Limiting: Add delays between rapid actions to allow UI to update. Use the wait action when needed.
Beta Headers: Always include the appropriate beta header for your model version.
</best_practices>
<code_example> Complete Agent Loop Example (Node.js)
const Anthropic = require('@anthropic-ai/sdk');
const anthropic = new Anthropic();
// Tool configuration const computerTool = { type: 'computer_20250124', name: 'computer', display_width_px: 1024, display_height_px: 768, display_number: 1, };
// Main agent loop
async function runComputerUseTask(task) {
console.log(Starting task: ${task});
const messages = [{ role: 'user', content: task }]; let iterations = 0; const maxIterations = 50;
while (iterations < maxIterations) {
iterations++;
console.log(Iteration ${iterations});
const response = await anthropic.messages.create({
model: 'claude-sonnet-4-20250514',
max_tokens: 4096,
tools: [computerTool],
messages,
betas: ['computer-use-2025-01-24'],
});
// Check for completion
if (response.stop_reason === 'end_turn') {
const textBlocks = response.content.filter(b => b.type === 'text');
return textBlocks.map(b => b.text).join('\n');
}
// Process tool calls
const toolResults = [];
for (const block of response.content) {
if (block.type === 'tool_use' && block.name === 'computer') {
console.log(`Action: ${block.input.action}`);
// Execute action and get screenshot
const screenshot = await executeAction(block.input);
toolResults.push({
type: 'tool_result',
tool_use_id: block.id,
content: [
{
type: 'image',
source: {
type: 'base64',
media_type: 'image/png',
data: screenshot,
},
},
],
});
}
}
messages.push({ role: 'assistant', content: response.content });
messages.push({ role: 'user', content: toolResults });
}
throw new Error('Task did not complete within iteration limit'); }
// Example usage runComputerUseTask('Open the calculator app and compute 25 * 47') .then(result => console.log('Result:', result)) .catch(err => console.error('Error:', err));
</code_example>
<code_example> Python Implementation
import anthropic import base64 from typing import Any
client = anthropic.Anthropic()
COMPUTER_TOOL = { "type": "computer_20250124", "name": "computer", "display_width_px": 1024, "display_height_px": 768, "display_number": 1 }
def run_computer_use_task(task: str, max_iterations: int = 50) -> str: """Execute a computer use task with agent loop.""" messages = [{"role": "user", "content": task}]
for iteration in range(max_iterations):
print(f"Iteration {iteration + 1}")
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
tools=[COMPUTER_TOOL],
messages=messages,
betas=["computer-use-2025-01-24"]
)
# Check for completion
if response.stop_reason == "end_turn":
text_blocks = [b.text for b in response.content if b.type == "text"]
return "\n".join(text_blocks)
# Process tool calls
tool_results = []
for block in response.content:
if block.type == "tool_use" and block.name == "computer":
print(f"Action: {block.input['action']}")
# Execute action in your environment
screenshot_b64 = execute_action(block.input)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": [{
"type": "image",
"source": {
"type": "base64",
"media_type": "image/png",
"data": screenshot_b64
}
}]
})
messages.append({"role": "assistant", "content": response.content})
messages.append({"role": "user", "content": tool_results})
raise Exception("Max iterations reached")
def execute_action(action_input: dict[str, Any]) -> str: """Execute computer action and return screenshot as base64.""" action = action_input["action"]
# Implement using your automation framework
# (pyautogui, pynput, or container-specific tools)
if action == "screenshot":
pass # Just capture
elif action == "left_click":
x, y = action_input["coordinate"]
# click(x, y)
elif action == "type":
text = action_input["text"]
# type_text(text)
elif action == "key":
key = action_input["text"]
# press_key(key)
# ... handle other actions
# Capture and return screenshot
return capture_screenshot_base64()
</code_example>
<usage_example> Available Actions Reference
// Basic actions (all versions) { "action": "screenshot" } { "action": "left_click", "coordinate": [500, 300] } { "action": "type", "text": "Hello, world!" } { "action": "key", "text": "ctrl+s" } { "action": "mouse_move", "coordinate": [500, 300] }
// Enhanced actions (computer_20250124) { "action": "scroll", "coordinate": [500, 400], "scroll_direction": "down", "scroll_amount": 3 } { "action": "left_click_drag", "start_coordinate": [100, 100], "coordinate": [300, 300] } { "action": "right_click", "coordinate": [500, 300] } { "action": "middle_click", "coordinate": [500, 300] } { "action": "double_click", "coordinate": [500, 300] } { "action": "triple_click", "coordinate": [500, 300] } { "action": "left_mouse_down", "coordinate": [500, 300] } { "action": "left_mouse_up", "coordinate": [500, 300] } { "action": "hold_key", "text": "shift", "duration": 1.0 } { "action": "wait", "duration": 2.0 }
// Opus 4.5 only (computer_20251124 with enable_zoom: true) { "action": "zoom", "coordinate": [500, 300], "zoom_direction": "in", "zoom_amount": 2 }
</usage_example>
<usage_example> Docker Reference Container
Pull and run the Anthropic reference container
docker pull ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest
Run with API key
docker run -it --rm
-e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY
-v $(pwd)/output:/home/computeruse/output
-p 5900:5900
-p 8501:8501
-p 6080:6080
-p 8080:8080
ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest
Access points:
- VNC: localhost:5900 (password: "secret")
- noVNC web: http://localhost:6080/vnc.html
- Streamlit UI: http://localhost:8501
- API: http://localhost:8080
</usage_example>
Tool Versions
Version Beta Header Model Support Key Features
computer_20250124
computer-use-2025-01-24
Sonnet, Haiku Enhanced actions (scroll, drag, wait, etc.)
computer_20251124
computer-use-2025-11-24
Opus 4.5 only Zoom action (requires enable_zoom: true )
Security Requirements
CRITICAL: Sandboxing is Mandatory
Computer Use provides direct control over a computer environment. NEVER run without proper sandboxing:
-
Use dedicated containers/VMs - Never on host machines with sensitive data
-
Minimal privileges - No root access, limited filesystem access
-
Network isolation - Restrict or block internet access
-
No credentials - Never expose API keys, passwords, or tokens in the environment
-
Human oversight - Require confirmation for destructive or external actions
Prompt Injection Risks
Malicious content displayed on screen can attempt to manipulate Claude:
-
Validate that actions align with the original task
-
Implement allowlists for permitted applications/websites
-
Monitor for suspicious instruction patterns in screenshots
Error Handling
Error Cause Resolution
invalid_request_error
Missing beta header Add betas: ["computer-use-2025-01-24"]
tool_use_error
Invalid coordinates Ensure coordinates within display bounds
rate_limit_error
Too many requests Implement exponential backoff
Action has no effect UI not ready Add wait action before retrying
Wrong element clicked Coordinate drift Re-capture screenshot and recalculate
Integration with Agents
Primary Agents
-
developer: Automated testing, UI verification
-
qa: End-to-end testing, visual regression
-
devops-troubleshooter: System debugging, log inspection
Use Cases
-
Automated form filling and data entry
-
Application testing and QA automation
-
Desktop application interaction
-
Browser automation (when headless won't work)
-
Legacy system integration
-
Visual verification and screenshots
Memory Protocol (MANDATORY)
Before starting:
cat .claude/context/memory/learnings.md
Check for:
-
Previous computer use configurations
-
Known automation patterns
-
Environment-specific settings
After completing:
-
New pattern discovered -> .claude/context/memory/learnings.md
-
Security concern found -> .claude/context/memory/issues.md
-
Architecture decision -> .claude/context/memory/decisions.md
ASSUME INTERRUPTION: If it's not in memory, it didn't happen.
Related
-
Anthropic Documentation: https://docs.anthropic.com/en/docs/build-with-claude/computer-use
-
Reference Implementation: https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo
-
API Reference: https://docs.anthropic.com/en/api/computer-use