browserbase-scraper

Scrape Cloudflare-protected websites using Stagehand + Browserbase cloud browsers. Use when the user needs to extract data from websites with bot protection, bypass Cloudflare challenges, or scrape JavaScript-heavy pages that block traditional methods like curl or Playwright stealth. Supports AI-powered extraction via Gemini.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "browserbase-scraper" with this command: npx skills add wirelessjoe/browserbase-scraper-skill

Browserbase Scraper

Bypass Cloudflare and bot protection using Stagehand + Browserbase cloud browsers with AI-powered extraction.

When to Use

  • Website blocks curl/fetch with Cloudflare "Just a moment..." page
  • Playwright headless gets detected and blocked
  • Need structured data extraction from dynamic content
  • Scraping auction sites, marketplaces, or other protected pages

Prerequisites

npm install @browserbasehq/stagehand zod

Required environment variables:

  • BROWSERBASE_API_KEY — from browserbase.com dashboard
  • BROWSERBASE_PROJECT_ID — from browserbase.com
  • GOOGLE_GENERATIVE_AI_API_KEY — for Gemini extraction (or use OpenAI)

Quick Start

import { Stagehand } from '@browserbasehq/stagehand';

const stagehand = new Stagehand({
  env: 'BROWSERBASE',
  apiKey: process.env.BROWSERBASE_API_KEY,
  projectId: process.env.BROWSERBASE_PROJECT_ID,
  model: {
    modelName: 'google/gemini-3-flash-preview',
    apiKey: process.env.GOOGLE_GENERATIVE_AI_API_KEY,
  },
});

await stagehand.init();
const page = stagehand.context.pages()[0];

// Navigate (Cloudflare bypass is automatic)
await page.goto('https://protected-site.com/search?q=term');
await page.waitForTimeout(5000); // Let page fully load

// AI-powered extraction (instruction-only works best)
const data = await stagehand.extract(`
  Extract all product listings as JSON array:
  [{ "title": "...", "price": 123, "url": "..." }]
  Return ONLY the JSON array.
`);

await stagehand.close();

Key Patterns

1. Instruction-Only Extraction (Recommended)

Schema-based extraction often returns empty. Use natural language instructions instead:

const extraction = await stagehand.extract(`
  Look at this page and extract:
  - All item titles
  - Prices as numbers
  - URLs
  Return as JSON array.
`);

2. Handle Cloudflare Delays

Sometimes the challenge takes longer:

const title = await page.title();
if (title.toLowerCase().includes('moment')) {
  await page.waitForTimeout(10000); // Wait for challenge
}

3. Scroll to Load More

Many sites lazy-load content:

for (let i = 0; i < 5; i++) {
  await page.evaluate(() => window.scrollBy(0, window.innerHeight));
  await page.waitForTimeout(800);
}

4. Parse Extraction Results

The extraction returns a string that needs parsing:

let listings = [];
try {
  const jsonMatch = extraction?.extraction?.match(/\[[\s\S]*\]/);
  if (jsonMatch) listings = JSON.parse(jsonMatch[0]);
} catch (e) {
  console.log('Parse error:', e.message);
}

Browserbase Free Tier Limits

  • 1 concurrent session — cron jobs can conflict with interactive use
  • Sessions auto-close after inactivity
  • Use stagehand.close() to release session immediately

Cron Integration

For scheduled scraping, use OpenClaw cron with isolated sessions:

openclaw cron add \
  --name "Daily Scrape" \
  --cron "0 6 * * *" \
  --session isolated \
  --message "Run: node ~/scripts/scraper.js"

Troubleshooting

IssueSolution
Empty extractionUse instruction-only (no schema), increase wait time
Cloudflare loopWait 10-15s, check if title contains "moment"
Session limitClose other Browserbase sessions, check dashboard
429 errorsWait for session to complete, don't retry immediately

Example: Full Scraper

See scripts/example_scraper.js for a complete working example.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Automation

handdraw-flowchart

Create hand-drawn workflow diagrams from natural-language process descriptions by generating strictly validated Mermaid flowchart, sequenceDiagram, or classD...

Registry SourceRecently Updated
Automation

Find Agent

OceanBus-powered agent and service discovery via Yellow Pages. Use when users want to find someone, look for a service, reach out to an expert, discover anot...

Registry SourceRecently Updated
Automation

Qwen Web Agent

Browser automation for 通义千问 (Qwen) web interface at qianwen.com. Use when the agent needs to ask questions to Qwen AI and get back responses via browser auto...

Registry SourceRecently Updated
Automation

bot File Processor

通用文件处理技能,用于批量重命名和格式转换。当用户需要批量重命名文件(添加前缀/后缀、替换文本、编号重命名、正则表达式重命名)或转换文件格式(图片格式转换、PDF与图片互转、DOCX转PDF、Markdown转PDF)时使用此技能。

Registry SourceRecently Updated