Phoenix Scraper
Resilient three-tier failover scraper. Never returns empty — if one method fails, the next activates automatically.
Failover Chain
Tier 1: Brave Search API (fast, free tier, 2k req/month)
↓ (on block/empty/timeout)
Tier 2: Bright Data Web Unlocker (residential proxy, JS-render optional)
↓ (on block/429/timeout)
Tier 3: Playwright headless browser (full JS execution)
Quick Start
from scripts.phoenix_scraper import scrape
# Basic fetch
result = scrape("https://example.com/page")
# With JS rendering (for SPA/dynamic sites)
result = scrape("https://example.com/page", render_js=True)
# With specific Bright Data zone
result = scrape("https://linkedin.com/jobs/...", zone="job_search_scraper")
Zone Routing
| Use Case | Zone |
|---|---|
| Job boards (LinkedIn, Glassdoor, Reed, Indeed) | job_search_scraper |
| Social media, news, general web | web_unlocker |
| X.com / Twitter | Use X API v2 (see references/x-api.md) |
Bright Data render_js
Set render_js=True for JS-heavy sites (CWJobs, TotalJobs, ContractorUK). Adds "render": True (boolean) to payload and uses 60s timeout.
Critical: Use boolean True, not string "html" — Bright Data validation rejects strings.
Bright Data Premium Domains (Cost Note)
LinkedIn, Glassdoor, and other heavily-protected job boards may be classified as Premium Domains in your Bright Data zone (updated quarterly). API call syntax is identical — but cost per request is higher. Check your zone's Premium Domains list if costs spike unexpectedly.
Playwright Stealth (2026 Enhancement)
For Tier 3, consider installing playwright-stealth to patch headless browser fingerprints — reduces detection on Cloudflare/advanced bot-protected sites:
pip install playwright-stealth
# Optional enhancement in phoenix_scraper.py Tier 3:
from playwright_stealth import stealth_sync
stealth_sync(page)
The base Playwright tier works without this, but stealth patching significantly improves success rates on heavily protected sites (Coupang, Naver, etc.) as of 2026.
URL Formatting
- CWJobs/TotalJobs: use hyphen-slugs —
finance-systems-consultantNOTfinance+systems+consultant - Glassdoor:
https://www.glassdoor.co.uk/Job/united-kingdom-{slug}-jobs-SRCH_IL.0,14_IN2_KO15,{end}.htm
Environment Variables
BRIGHT_DATA_API_KEY=<key> # Bright Data API key
BRIGHT_DATA_ZONE=job_search_scraper # Default zone (override per-call)
BRAVE_API_KEY=<key> # Brave Search API key
X_BEARER_TOKEN=<token> # X API v2 bearer token (for X.com)
X.com Monitoring
For X/Twitter, use X API v2 (not scraping). See references/x-api.md for endpoint details and rate limits.
Error Handling
All tiers log failures before escalating. On total failure, returns {"success": False, "html": "", "method": "all_failed", "error": "<reason>"}.
Never raises exceptions — always returns a result dict.