virtual-desktop-pro

Persistent authenticated browser for OpenClaw via kasmweb/chrome Docker sidecar. Principal logs in once via noVNC — sessions saved permanently in Docker volume. Agent navigates any website, clicks, fills forms, extracts data, uploads files, takes screenshots, solves CAPTCHAs autonomously, and analyses pages with Claude Vision. Use when the task requires a real authenticated browser, not a static fetch.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "virtual-desktop-pro" with this command: npx skills add georges91560/virtual-desktop-pro

Virtual Desktop — Authenticated Browser Layer

What this skill does

Gives the agent a persistent authenticated browser (kasmweb/chrome) running as a Docker sidecar. Principal logs in once via noVNC. Sessions saved permanently.

| Capability | What it means |
|---|---|
| ANALYZE | Read any page, extract structured data, monitor changes over time |
| PLAN | Map the UI, identify selectors, prepare multi-step action sequences |
| EXECUTE | Click, type, fill forms, submit, upload, download, navigate any flow |
| SELF-CORRECT | Screenshot error state, identify root cause, retry with alternate approach |
| IMPROVE | Write UI patterns and selector maps to .learnings/ after every session |

Use cases: Google Workspace · social platforms · admin dashboards · e-commerce · forms · market research · data extraction · any platform with or without an API


Workspace Structure

/workspace/
├── screenshots/          ← visual proof of every action (auto-created)
├── logs/browser/         ← full tracebacks (auto-created)
├── tasks/lessons.md      ← immediate task capture during mission
├── AUDIT.md              ← append-only action log
├── memory/YYYY-MM-DD.md  ← daily session summary
└── .learnings/
    ├── ERRORS.md         ← errors, broken selectors, ref maps
    └── LEARNINGS.md      ← patterns, timing, navigation per platform

When to Use

Use this skill when the task requires a real authenticated browser:

  • Pages requiring login (Google, social networks, dashboards, admin panels)
  • JS-rendered pages where static fetch returns nothing useful
  • Multi-step flows: forms, checkouts, confirmations, file uploads
  • Platforms without an API
  • Screenshots or visual evidence of a page state
  • CAPTCHA-protected pages

Prefer a lighter path first — if a simple HTTP request or existing OpenClaw tool can answer the question, use that instead. This skill uses more tokens and resources than plain fetch.


Architecture

This skill runs a persistent kasmweb/chrome Docker sidecar alongside OpenClaw. Principal logs in once via noVNC (port 6901). Sessions saved permanently in a Docker volume.

Three execution paths — load only what the task needs:

| Path | When to use | File |
|---|---|---|
| OpenClaw native browser | Simple navigate/click/extract; fastest, fewest tokens | Built-in |
| browser_control.py | AUDIT logging, workflows, CAPTCHA, Vision | browser_control.py |
| noVNC (manual) | Initial login, 2FA, session renewal | Port 6901 |

Load only the smallest path needed. Simple navigation → OpenClaw native. Complex multi-step with logging → browser_control.py.


Setup — Run Once

OPENCLAW_DIR="${OPENCLAW_DIR:-$(pwd)}"
cd "$OPENCLAW_DIR"
CONTAINER="${OPENCLAW_CONTAINER:-$(docker ps --format '{{.Names}}' | grep openclaw | head -1)}"

# 1. Add kasmweb/chrome to docker-compose.yml
python3 -c "
import yaml
with open('docker-compose.yml') as f:
    data = yaml.safe_load(f) or {}
data.setdefault('services', {})['browser'] = {
    'image': 'kasmweb/chrome:1.15.0',
    'container_name': 'browser',
    'restart': 'unless-stopped',
    'shm_size': '1gb',
    # noVNC stays reachable for the principal; CDP is bound to loopback only
    'ports': ['6901:6901', '127.0.0.1:9222:9222'],
    'environment': [
        # Resolved by docker compose from .env (written in step 2), so the
        # container password matches the one shown to the principal
        'VNC_PW=\${VNC_PW}',
        'RESOLUTION=1920x1080',
        'CHROME_ARGS=--remote-debugging-port=9222 --remote-debugging-address=0.0.0.0 --no-sandbox --disable-blink-features=AutomationControlled --disable-infobars'
    ],
    'volumes': ['browser-profile:/home/kasm-user/chrome-profile'],
    'networks': list(data.get('networks', {'default': None}).keys())
}
data.setdefault('volumes', {})['browser-profile'] = None
with open('docker-compose.yml', 'w') as f:
    yaml.dump(data, f, default_flow_style=False, allow_unicode=True)
print('docker-compose.yml updated')
"

# 2. Update .env
# VNC_PW — generate a strong random password if not already set
if ! grep -q "VNC_PW" .env 2>/dev/null; then
  VNC_GENERATED=$(python3 -c "import secrets, string; print(''.join(secrets.choice(string.ascii_letters + string.digits) for _ in range(24)))")
  echo "VNC_PW=${VNC_GENERATED}" >> .env
  echo "✅ VNC_PW generated — save this: ${VNC_GENERATED}"
fi
grep -q "BROWSER_CDP_URL"     .env || echo "BROWSER_CDP_URL=http://browser:9222" >> .env
grep -q "CAPSOLVER_API_KEY"   .env || echo "CAPSOLVER_API_KEY="                  >> .env
grep -q "BROWSERBASE_API_KEY" .env || echo "BROWSERBASE_API_KEY="                >> .env

# 3. Update openclaw.json — hot reload, no restart needed
python3 -c "
import json, os
f = 'data/.openclaw/openclaw.json'
with open(f) as fp: cfg = json.load(fp)
cfg.setdefault('browser', {}).update({'enabled': True, 'headless': False,
    'noSandbox': True, 'defaultProfile': 'chrome-sidecar'})
profiles = cfg['browser'].setdefault('profiles', {})
profiles['chrome-sidecar'] = {'cdpUrl': 'http://browser:9222', 'color': '#4285F4'}
bb_key = os.environ.get('BROWSERBASE_API_KEY', '')
if bb_key:
    profiles['browserbase'] = {'cdpUrl': f'wss://connect.browserbase.com?apiKey={bb_key}', 'color': '#F97316'}
with open(f, 'w') as fp: json.dump(cfg, fp, indent=2)
print('openclaw.json updated — hot reload active')
"

# 4. Start browser container only — OpenClaw keeps running
docker compose up -d --no-deps browser
sleep 12

# 5. Install Python dependencies
docker exec "$CONTAINER" pip install requests playwright --break-system-packages -q
docker exec "$CONTAINER" node /app/node_modules/playwright-core/cli.js install chromium
echo "✅ Python dependencies installed"

# 6. Download CapSolver extension (optional — only if key present)
CAPSOLVER_KEY=$(grep '^CAPSOLVER_API_KEY=' .env | cut -d= -f2-)
if [ -n "$CAPSOLVER_KEY" ]; then
  docker exec "$CONTAINER" bash -c "
  apt-get update -qq && apt-get install -y -qq unzip curl
  curl -sL https://github.com/capsolver/capsolver-browser-extension/releases/latest/download/chrome.zip \
    -o /tmp/capsolver.zip
  unzip -q /tmp/capsolver.zip -d /data/.openclaw/capsolver-extension
  sed -i \"s/apiKey: \\\"\\\"/apiKey: \\\"$CAPSOLVER_KEY\\\"/\" \
    /data/.openclaw/capsolver-extension/assets/config.js 2>/dev/null
  "
  echo "✅ CapSolver extension configured"
fi

# 7. Create workspace directories and deploy browser_control.py
docker exec "$CONTAINER" bash -c "
mkdir -p /data/.openclaw/workspace/skills/virtual-desktop
mkdir -p /workspace/screenshots /workspace/logs/browser /workspace/.learnings /workspace/memory
touch /workspace/AUDIT.md /workspace/.learnings/ERRORS.md /workspace/.learnings/LEARNINGS.md
"
docker cp {baseDir}/browser_control.py \
  "$CONTAINER":/data/.openclaw/workspace/skills/virtual-desktop/browser_control.py
echo "✅ browser_control.py deployed"

# 8. Verify
docker ps | grep -E "openclaw|browser"
curl -s http://localhost:9222/json > /dev/null && echo "✅ Chrome CDP active" || echo "⏳ Chrome starting"
docker exec "$CONTAINER" \
  python3 /data/.openclaw/workspace/skills/virtual-desktop/browser_control.py status

# 9. Notify principal
VPS_IP=$(curl -s ifconfig.me 2>/dev/null || echo "YOUR_VPS_IP")
echo "Virtual Desktop ready — https://${VPS_IP}:6901"
echo "Log in to your platforms via noVNC then reply DONE."
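
Step 4 waits a fixed 12 seconds before continuing. A minimal sketch of a readiness check that polls Chrome's DevTools HTTP endpoint instead is shown below. The function name wait_for_cdp and its parameters are hypothetical and not part of browser_control.py; /json/version is the standard Chrome DevTools HTTP endpoint, and the base URL is assumed to match the port mapping used in the setup script.

```python
import http.client
import json
import time
import urllib.error
import urllib.request


def wait_for_cdp(base_url: str, timeout_s: float = 30.0, interval_s: float = 1.0) -> bool:
    """Poll <base_url>/json/version until it returns valid JSON or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/json/version", timeout=2) as resp:
                json.load(resp)  # Chrome answers with a JSON object once CDP is up
                return True
        except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
            # Not reachable yet (or not JSON yet): wait and try again
            time.sleep(interval_s)
    return False
```

Calling wait_for_cdp("http://localhost:9222") from the host returns True as soon as Chrome answers, and False if nothing responds within the timeout.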

Initial Login — Once Per Platform

https://YOUR_VPS_IP:6901   login: kasm_user   password: your VNC_PW

Open Chrome via noVNC and log in to every platform you want the agent to access. Sessions are saved in the Docker volume browser-profile and survive container restarts; they stay valid until the platform itself expires them.

Step by step — do this once after setup:

1. Open https://YOUR_VPS_IP:6901 in your browser
2. Enter password: your VNC_PW value from .env
3. Chrome Desktop opens inside the browser

4. Log in to Google (accounts.google.com)
   → Email + password + 2FA if required
   → "Trust this device" → YES
   → This unlocks: Gmail, Drive, Calendar, Docs,
     Sheets, Google AI Studio, YouTube, all Google services

5. Log in to every other platform you want the agent to access:
   → Twitter/X        → twitter.com
   → LinkedIn         → linkedin.com
   → Reddit           → reddit.com
   → Hostinger panel  → hpanel.hostinger.com
   → Any other site   → log in normally

6. After each login: Chrome saves the session automatically
   in the Docker volume browser-profile

7. Reply DONE to the agent on Telegram
   → The agent confirms sessions are active
   → It will not ask for your credentials again

What happens after:

The agent opens any platform → already logged in ✅
No credentials needed → ever again
Session expires (rare) → the agent notifies you on Telegram
  → You open noVNC → log in again → reply DONE
  → Takes 2 minutes

Important — 2FA:

Google 2FA → confirm once via noVNC
              Chrome remembers the device
              No 2FA required again on this browser

Other platforms → same principle
                  confirm once → trusted device → done

Quick Reference

| Reference | Content |
|---|---|
| OpenClaw native browser commands | See below (openclaw browser) |
| browser_control.py commands | See below ($BC) |
| CAPTCHA strategy | See CAPTCHA section |
| Residential proxy | See Proxy section |
| Claude Vision | See Vision section |
| Selectors, timing, auth flows | LEARNINGS.md (auto-built by agent) |
| Broken selectors, error recovery | ERRORS.md (auto-built by agent) |

OpenClaw Native Browser — Fastest Path

# Navigation
openclaw browser open <url>
openclaw browser snapshot [--interactive]
openclaw browser back | forward | reload | close

# Interaction
openclaw browser click <ref>
openclaw browser type <ref> "text"
openclaw browser select <ref> "value"
openclaw browser hover <ref>
openclaw browser scroll [--direction down|up|right|left]

# Files
openclaw browser upload /tmp/file.pdf
openclaw browser download <ref> file.pdf

# Cookies & storage
openclaw browser cookies | cookies set k v --url "https://example.com" | cookies clear
openclaw browser storage local get | set k v | clear

# Configuration
openclaw browser set geo 48.8566 2.3522 --origin "https://example.com"
openclaw browser set timezone Europe/Paris
openclaw browser set locale fr-FR
openclaw browser set device "iPhone 14"
openclaw browser set media dark
openclaw browser set headers --headers-json '{"X-Custom":"val"}'

# Debug
openclaw browser console --level error
openclaw browser requests --filter api
openclaw browser trace start | stop
openclaw browser status

# Stealth (if site blocks VPS)
openclaw browser --browser-profile browserbase open <url>

browser_control.py — With AUDIT Logging + CAPTCHA + Vision

BC="python3 /data/.openclaw/workspace/skills/virtual-desktop/browser_control.py"

$BC screenshot  <url> [label]
$BC navigate    <url> [selector]
$BC click       <url> <selector>
$BC click_xy    <url> <x> <y>
$BC fill        <url> <selector> <value>
$BC select      <url> <selector> <value>
$BC hover       <url> <selector>
$BC scroll      <url> <direction> [pixels]
$BC keyboard    <url> <selector> <key>
$BC extract     <url> <selector> [output_file]
$BC wait_for    <url> <selector> [timeout_ms]
$BC upload      <url> <file_selector> <file_path>
$BC analyze     <url_or_image> [question]     ← Claude Vision
$BC captcha     <url>                         ← Autonomous CAPTCHA
$BC workflow    <json_steps_file>             ← Multi-step workflow
$BC status

Workflow JSON Format

[
  { "action": "goto",       "target": "https://TARGET_URL" },
  { "action": "captcha" },
  { "action": "analyze",    "value": "Identify the key elements on this page" },
  { "action": "wait_for",   "target": ".loaded", "timeout_ms": 5000 },
  { "action": "fill",       "target": "#field",  "value": "text" },
  { "action": "click",      "target": "#btn" },
  { "action": "click_xy",   "x": 960, "y": 540 },
  { "action": "scroll",     "direction": "down" },
  { "action": "hover",      "target": "#menu" },
  { "action": "select",     "target": "#list",   "value": "option" },
  { "action": "keyboard",   "target": "#input",  "value": "Enter" },
  { "action": "extract",    "target": ".data",   "value": "/workspace/tasks/out.json" },
  { "action": "screenshot" },
  { "action": "wait",       "value": "2" }
]
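
A steps file can be checked before it is handed to the workflow command. The sketch below validates each step against the keys used in the example above and writes the file. The required-key table is inferred from that example and is an assumption, not a schema published by browser_control.py; validate_steps and write_workflow are hypothetical helper names.

```python
import json
from pathlib import Path

# Required keys per action, inferred from the example steps (assumption).
REQUIRED_KEYS = {
    "goto": {"target"},
    "captcha": set(),
    "analyze": {"value"},
    "wait_for": {"target"},
    "fill": {"target", "value"},
    "click": {"target"},
    "click_xy": {"x", "y"},
    "scroll": {"direction"},
    "hover": {"target"},
    "select": {"target", "value"},
    "keyboard": {"target", "value"},
    "extract": {"target"},
    "screenshot": set(),
    "wait": {"value"},
}


def validate_steps(steps: list) -> list:
    """Return a list of problems; an empty list means every step looks well-formed."""
    problems = []
    for i, step in enumerate(steps):
        action = step.get("action")
        if action not in REQUIRED_KEYS:
            problems.append(f"step {i}: unknown action {action!r}")
            continue
        missing = REQUIRED_KEYS[action] - set(step)
        if missing:
            problems.append(f"step {i}: {action} is missing {sorted(missing)}")
    return problems


def write_workflow(steps: list, path: str) -> Path:
    """Validate the steps, then write them as JSON for the workflow command."""
    problems = validate_steps(steps)
    if problems:
        raise ValueError("; ".join(problems))
    out = Path(path)
    out.write_text(json.dumps(steps, indent=2), encoding="utf-8")
    return out
```

Catching a missing target or value before the browser starts is cheaper than discovering it halfway through a multi-step flow.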

CAPTCHA — Autonomous Strategy

1. Auto-detection on every page load
   → reCAPTCHA v2/v3, hCaptcha, Cloudflare Turnstile

2. CapSolver API (if CAPSOLVER_API_KEY set)
   → Extracts sitekey → API → token → injects → continues
   → ~$0.001 per CAPTCHA

3. Cloudflare Turnstile
   → CapSolver Chrome extension handles in background → waits 60s

4. Fallback — if CapSolver fails or key not set
   → Screenshot → Telegram → principal opens noVNC → solves → agent continues
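
The decision order above can be sketched as a small pure function. This is an illustration of the ordering only, not the logic inside browser_control.py; the Strategy enum, the kind labels, and choose_strategy are hypothetical names. It assumes the Turnstile extension path also needs the CapSolver key, since the setup script only installs the extension when the key is present.

```python
from enum import Enum
from typing import Optional


class Strategy(Enum):
    NONE = "no CAPTCHA detected, continue"
    CAPSOLVER_API = "extract sitekey, request token from CapSolver, inject"
    CAPSOLVER_EXTENSION = "let the CapSolver extension solve it, wait up to 60s"
    MANUAL_NOVNC = "screenshot, Telegram alert, principal solves via noVNC"


# CAPTCHA kinds named in the strategy; the string labels are hypothetical.
API_KINDS = {"recaptcha_v2", "recaptcha_v3", "hcaptcha"}


def choose_strategy(kind: Optional[str], has_capsolver_key: bool) -> Strategy:
    """Pick the first applicable path for a detected CAPTCHA kind."""
    if kind is None:
        return Strategy.NONE
    if not has_capsolver_key:
        return Strategy.MANUAL_NOVNC
    if kind == "turnstile":
        return Strategy.CAPSOLVER_EXTENSION
    if kind in API_KINDS:
        return Strategy.CAPSOLVER_API
    # Unknown CAPTCHA type: hand over to the principal
    return Strategy.MANUAL_NOVNC
```

If an automated path fails at run time, the same fallback applies: switch to Strategy.MANUAL_NOVNC and wait for the principal.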

Proxy — If Site Blocks the VPS

# Browserbase — CAPTCHA + stealth + residential proxy built-in
# Free tier: 1 concurrent session, 1h/month — browserbase.com
# Add BROWSERBASE_API_KEY to .env
openclaw browser --browser-profile browserbase open <url>

# Custom proxy
# Add PROXY_URL=http://user:pass@proxy:port to .env
# browser_control.py reads it automatically via get_browser()
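
How get_browser() consumes PROXY_URL is not shown here. As a hedged illustration, the sketch below splits a URL of the form http://user:pass@proxy:port into the server/username/password dictionary that Playwright accepts as its proxy option. The function name proxy_settings is hypothetical, and whether browser_control.py uses this exact shape is an assumption.

```python
from urllib.parse import unquote, urlsplit


def proxy_settings(proxy_url: str) -> dict:
    """Split PROXY_URL into a Playwright-style proxy dict (server, username, password)."""
    parts = urlsplit(proxy_url)
    if not parts.scheme or not parts.hostname or parts.port is None:
        raise ValueError("PROXY_URL must look like scheme://[user:pass@]host:port")
    settings = {"server": f"{parts.scheme}://{parts.hostname}:{parts.port}"}
    # Credentials are optional; percent-encoded characters are decoded
    if parts.username:
        settings["username"] = unquote(parts.username)
    if parts.password:
        settings["password"] = unquote(parts.password)
    return settings
```

Keeping credentials out of the server string avoids leaking them into logs that print the proxy address.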

Claude Vision — Analyse Images and Pages

# Web page → auto screenshot + analysis
$BC analyze https://example.com "What does this page sell?"

# AI-generated image
$BC analyze https://site.com/image.png "Describe the visual elements"

# Existing screenshot
$BC analyze /workspace/screenshots/capture.png "Is there a form here?"

# Inside a workflow
{ "action": "analyze", "value": "Identify all form fields" }

Execution Protocol

FOR EVERY ACTION:
  1. Log to AUDIT.md: "BEFORE [action] on [url]"
  2. Detect CAPTCHA → resolve automatically if present
  3. Execute action
  4. Screenshot as proof
  5. Log to AUDIT.md: "OK/FAILED [action]"
  6. Telegram report if real-world consequences

NEVER:
  → Access platforms not authorized by the principal
  → Execute payments or destructive actions without explicit approval
  → Fail silently — always log
  → Retry more than 3 times without alerting the principal
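
The logging and retry rules above can be sketched as a wrapper around any single action. This is a simplified illustration with hypothetical names (audit, run_audited, MAX_ATTEMPTS): CAPTCHA detection and screenshots are left out, the alert callback stands in for the Telegram report, and the limit is read conservatively as three attempts in total before the principal is alerted.

```python
import datetime as dt
from pathlib import Path
from typing import Callable

MAX_ATTEMPTS = 3  # conservative reading of the retry limit (assumption)


def audit(audit_path: Path, message: str) -> None:
    """Append one timestamped line to the audit log; never rewrite earlier lines."""
    stamp = dt.datetime.now().isoformat(timespec="seconds")
    with audit_path.open("a", encoding="utf-8") as fh:
        fh.write(f"{stamp} {message}\n")


def run_audited(
    action: str,
    url: str,
    run: Callable[[], None],
    alert: Callable[[str], None],
    audit_path: Path,
) -> bool:
    """Log before and after each attempt; alert the principal once attempts run out."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        audit(audit_path, f"BEFORE {action} on {url} (attempt {attempt})")
        try:
            run()
        except Exception as exc:  # every failure is logged, none is swallowed silently
            audit(audit_path, f"FAILED {action}: {exc}")
            continue
        audit(audit_path, f"OK {action}")
        return True
    alert(f"{action} on {url} failed {MAX_ATTEMPTS} times")
    return False
```

Because every attempt writes a BEFORE line and an OK or FAILED line, the log shows exactly how many tries an action needed.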

Browser Traps

Avoid these common mistakes:

  • Guessing selectors from source → use snapshot --interactive or codegen to discover stable refs
  • Using force: true before understanding why → investigate the overlay/disabled state first
  • Driving a full browser when HTTP would work → more cost, more flake, less signal
  • Sharing one session across parallel tasks that mutate state → failures become order-dependent
  • Waiting on networkidle for chatty SPAs → analytics, polling, or sockets keep the page "busy" even when the UI is ready
  • Retrying the same selector 10 times → log to ERRORS.md and alert the principal instead
  • Accessing high-stakes flows (payments, production data) without explicit confirmation → require approval first

Error Recovery

CAPTCHA          → CapSolver auto → fallback noVNC
CLOUDFLARE       → switch to --browser-profile browserbase
SESSION EXPIRED  → Telegram → principal opens noVNC → reconnects
ELEMENT MISSING  → use analyze to understand the new layout
                 → log to .learnings/ERRORS.md with ref map
TIMEOUT          → check /workspace/logs/browser/YYYY-MM-DD.log

Files Written

| File | When | Content |
|---|---|---|
| /workspace/AUDIT.md | Every action | Before + after log, append-only |
| /workspace/screenshots/YYYY-MM-DD_*.png | Every action | Visual proof |
| /workspace/screenshots/*_analysis.txt | After analyze | Vision result |
| /workspace/logs/browser/YYYY-MM-DD.log | On exception | Full traceback |
| /workspace/.learnings/ERRORS.md | On failure | Errors + broken selectors |
| /workspace/.learnings/LEARNINGS.md | On discovery | Patterns + timing per platform |
| /workspace/tasks/lessons.md | During mission | Immediate task capture |
| /workspace/memory/YYYY-MM-DD.md | Daily | Session summary |

This skill does NOT:

  • Create files outside the paths listed above
  • Persist sessions or credentials beyond the Docker volume
  • Make undeclared network requests beyond the target sites and the optional services listed under External Endpoints
  • Access platforms not explicitly authorized by the principal

Self-Improvement

Write immediately after every session — do not batch:

# ERRORS.md — on failure
## [YYYY-MM-DD] [Platform] — [Title]
**Priority**: low|medium|high   **Status**: pending|resolved
**What happened**: ...   **Root cause**: ...   **Fix**: ...   **Ref map**: {"old_ref":"new_ref"}

# LEARNINGS.md — on discovery
## [YYYY-MM-DD] [Platform] — [Pattern]
**Category**: navigation|interaction|timing|auth_flow|captcha|vision
**Discovery**: ...   **Usage**: ...
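
An ERRORS.md entry in the template above can be produced by a small formatter so that every entry has the same fields. The sketch below is one possible reading: it treats the square brackets in the heading as placeholders rather than literal characters, which is an assumption, and format_error_entry and append_entry are hypothetical helper names.

```python
import datetime as dt
import json
from pathlib import Path
from typing import Optional

SEPARATOR = "\u2014"  # the dash character used in the heading template
PRIORITIES = {"low", "medium", "high"}


def format_error_entry(
    platform: str,
    title: str,
    priority: str,
    what_happened: str,
    root_cause: str,
    fix: str,
    ref_map: dict,
    day: Optional[dt.date] = None,
    status: str = "pending",
) -> str:
    """Render one ERRORS.md entry following the field layout of the template."""
    if priority not in PRIORITIES:
        raise ValueError(f"priority must be one of {sorted(PRIORITIES)}")
    day = day or dt.date.today()
    return (
        f"## {day.isoformat()} {platform} {SEPARATOR} {title}\n"
        f"**Priority**: {priority}   **Status**: {status}\n"
        f"**What happened**: {what_happened}   **Root cause**: {root_cause}   "
        f"**Fix**: {fix}   **Ref map**: {json.dumps(ref_map)}\n"
    )


def append_entry(path: Path, entry: str) -> None:
    """Append the entry right away, separated from earlier entries by a blank line."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(entry + "\n")
```

Serialising the ref map with json.dumps keeps it machine-readable, so a later session can parse old-to-new selector mappings back out of the file.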

Security

This skill opens port 6901 (noVNC) and stores authenticated browser sessions permanently.

REQUIRED before running:
  1. Set a strong VNC_PW in .env — never use the default
  2. Firewall port 6901 to your IP only:
     Hostinger → Panel → VPS → Firewall → restrict 6901 to your IP
     Or use SSH tunnel: ssh -L 6901:localhost:6901 user@YOUR_VPS_IP
  3. Only log in to accounts you trust the agent to access
  4. Optional keys (CapSolver, Browserbase, Anthropic) send data to
     those services — only add them if you trust and accept their costs

External Endpoints

| Endpoint | Data sent | Purpose |
|---|---|---|
| Any URL the principal authorizes | Browser requests, cookies, form data | Automation |
| http://browser:9222 | CDP protocol (internal only) | Browser control |
| https://api.capsolver.com | CAPTCHA sitekey + page URL | CAPTCHA solving (optional) |
| wss://connect.browserbase.com | Browser session | Stealth proxy (optional) |
| https://api.anthropic.com | Screenshot base64 | Claude Vision (optional) |
| https://registry.npmjs.org | Package metadata | Playwright install only |

No other data is sent externally.
