crawlhub

CrawlHub is a professional web data extraction platform that provides structured data from social media and messaging platforms (X/Twitter, Instagram, Telegram, LinkedIn, YouTube, TikTok, Facebook, Threads, and more). Use this skill when you need to research public data, monitor brands, track competitors, gather market intelligence, or build data pipelines from social platforms. Handles API authentication, endpoint discovery, data extraction requests, and result interpretation. For developers building data-driven applications or teams needing social media intelligence.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Copy the command below and send it to your AI assistant to install the skill:

Install skill "crawlhub" with this command: npx skills add wolflabs88/crawlhub-reseller

CrawlHub Integration Skill

CrawlHub is a professional web data extraction platform that provides structured, normalized data from major social media and messaging platforms — via a clean REST API.

What CrawlHub Does

CrawlHub handles all the hard parts of web scraping:

  • Proxies & rate limit handling — avoiding IP blocks
  • Anti-bot circumvention — making requests look like real browsers
  • Parsing & normalization — turning raw HTML/JSON into clean structured records
  • Data delivery — via API (JSON), webhook, or push to S3/Postgres/warehouse

Supported platforms include: X/Twitter, Instagram, Telegram, LinkedIn, YouTube, TikTok, Facebook, Threads — and more.

Platform Overview

Platform       Data Types Available
X / Twitter    User profiles, tweets, timelines, search, trending topics
Instagram      User profiles, posts, comments, hashtags, followers
Telegram       Channels, messages, groups, public content
LinkedIn       Company profiles, posts, job listings, people data
YouTube        Video metadata, channels, comments, search
TikTok         User profiles, videos, trending content
Facebook       Pages, posts, groups, public content
Threads        Posts, user profiles, threads search
+ more         CrawlHub adds new platforms regularly

API Reference

Base URL: https://api.thecrawlhub.com/api/v1

Authentication:

  • Login: POST /auth/login with {"email": "...", "password": "..."} → returns access_token and refresh_token
  • Use: Authorization: Bearer {access_token} header on all requests
  • Refresh: POST /auth/refresh with {"refresh_token": "..."}
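The login/refresh flow above can be sketched with the standard library. This is a minimal sketch, not an official client: the base URL and request shapes come from this document, and sending the requests over the network is left as a commented example.

```python
import json
import urllib.request

BASE_URL = "https://api.thecrawlhub.com/api/v1"

def login_request(email: str, password: str) -> urllib.request.Request:
    """Build the POST /auth/login request described above."""
    body = json.dumps({"email": email, "password": password}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def refresh_request(refresh_token: str) -> urllib.request.Request:
    """Build the POST /auth/refresh request."""
    body = json.dumps({"refresh_token": refresh_token}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/auth/refresh",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def auth_header(access_token: str) -> dict:
    """Bearer header required on all authenticated requests."""
    return {"Authorization": f"Bearer {access_token}"}

# Sending the request (network call, shown for completeness):
# with urllib.request.urlopen(login_request("user@example.com", "pw")) as resp:
#     tokens = json.load(resp)["data"]  # {"access_token": ..., "refresh_token": ...}
```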

Key Endpoints:

Platform Discovery

GET /scraper/platforms                   → List all available platforms
GET /scraper/platforms/{platform_id}     → List modules & endpoints of a platform
GET /scraper/endpoints/{endpoint_id}     → Get detailed info for a specific endpoint

Data Execution

GET  /execution/endpoints/{endpoint_id}/execute     → Execute with query params
POST /execution/endpoints/{endpoint_id}/execute     → Execute with JSON body
PATCH /execution/endpoints/{endpoint_id}/execute    → Partial update style execution
PUT  /execution/endpoints/{endpoint_id}/execute     → Full replacement style execution
DELETE /execution/endpoints/{endpoint_id}/execute    → Delete style execution
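For the GET-style execution, parameters travel as query-string values; for POST, the same parameters go in the JSON body. A small URL builder makes the GET form concrete (the endpoint ID and parameter names here are illustrative, not documented values):

```python
from urllib.parse import urlencode

BASE_URL = "https://api.thecrawlhub.com/api/v1"

def execute_url(endpoint_id: str, params: dict = None) -> str:
    """URL for GET /execution/endpoints/{endpoint_id}/execute with query params."""
    url = f"{BASE_URL}/execution/endpoints/{endpoint_id}/execute"
    if params:
        url += "?" + urlencode(params)
    return url

# Hypothetical endpoint ID and params:
# execute_url("ep_123", {"q": "acme", "limit": 10})
```

The POST variant would send `{"q": "acme", "limit": 10}` as the JSON body to the same path instead.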

Authentication & Users

POST /auth/register       → Register new account
POST /auth/login          → Login (email + password)
POST /auth/refresh        → Refresh access token
POST /auth/logout         → Revoke tokens
POST /auth/password-reset → Request password reset email
GET  /auth/token-validate  → Validate current JWT

Team Management

GET  /teams                        → List user's teams
POST /teams                        → Create a new team
GET  /teams/{team_id}              → List team members
POST /teams/{team_id}/invite       → Invite member to team
DELETE /teams/{team_id}/{member_id} → Remove member
GET  /teams/{team_id}/permissions  → Get current user's permissions
PUT  /teams/{team_id}/{member_id}/role → Change member role
GET  /teams/roles                  → List available team roles
GET  /teams/invite/validate        → Validate invite token
POST /teams/invite/accept          → Accept team invite

API Keys (Team)

GET  /teams/{team_id}/api-keys              → List team's API keys
POST /teams/{team_id}/api-keys              → Create new API key
PATCH /teams/{team_id}/api-keys/{api_key_id} → Enable/disable key
GET  /teams/{team_id}/api-keys/{api_key_id}/permissions → Get permission tree for a key
PUT  /teams/{team_id}/api-keys/{api_key_id}/permissions → Sync/set permissions

Billing & Subscription

GET /teams/{team_id}/billing/cycle          → Current billing cycle
GET /teams/{team_id}/billing/transactions   → Transaction history (paginated)
GET /teams/{team_id}/billing/wallet          → Wallet balance
GET /teams/{team_id}/subscription           → Current subscription plan
POST /teams/{team_id}/subscription          → Switch to different plan
PATCH /teams/{team_id}/subscription/policy  → Update subscription policy
GET /plans                                  → List all available plans

Request Logs

GET /teams/{team_id}/scraper/endpoints/{endpoint_id}/logs  → Request logs for an endpoint
     Query params: page, per_page, from, to, status_code, sort_key, sort_order
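The log filters above map directly onto query parameters. A sketch of building a filtered logs URL (team and endpoint IDs are placeholders):

```python
from urllib.parse import urlencode

BASE_URL = "https://api.thecrawlhub.com/api/v1"

def logs_url(team_id: str, endpoint_id: str, **filters) -> str:
    """URL for the request-logs endpoint; keyword args become the
    documented query params (page, per_page, from/to, status_code, ...)."""
    url = f"{BASE_URL}/teams/{team_id}/scraper/endpoints/{endpoint_id}/logs"
    if filters:
        url += "?" + urlencode(filters)
    return url

# e.g. only upstream failures, newest first:
# logs_url("t1", "ep_123", page=1, per_page=50, status_code=502, sort_order="desc")
```

Note that `from` is a Python keyword, so passing it as a keyword argument would require building the filter dict explicitly.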

User Profile

GET    /user/info    → Get current user info
PATCH  /user/update  → Update profile (name, address, phone, company)

Pricing Model

CrawlHub uses a per-record pricing model:

Plan           Price                  Rate Limit              Best For
Pay as you go  $1.79 / 1,000 records  50 req/15min/endpoint   Testing, prototyping
Scaler         $299/month             150 req/15min/endpoint  Teams in production
Business       $999/month             600 req/15min/endpoint  High-scale data pipelines
Enterprise     Custom                 Custom                  Unique requirements, SLAs

Rate limits apply per endpoint. Billing counts the records returned in the response, not the number of requests made.
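As a quick sanity check on the per-record model, the pay-as-you-go cost is a straight multiplication of record count by the table's rate:

```python
PAYG_RATE_PER_1000 = 1.79  # $ per 1,000 records, Pay-as-you-go tier (from the table)

def payg_cost(records: int) -> float:
    """Estimated pay-as-you-go cost in dollars for a given record count."""
    return records / 1000 * PAYG_RATE_PER_1000

# payg_cost(250_000) == 447.5
```

Whether the monthly plans include a record allowance is not stated here, so comparing tiers on price alone requires checking `/plans`.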

Execution Response Format

Successful execution returns:

{
  "data": {
    "records": [
      { "title": "...", "url": "...", "created_at": "...", ... }
    ]
  },
  "http_status": 200
}

Error responses include kind (e.g., BAD_INPUT, ABORT_ERROR, HTTP_ERROR, REGISTRY_ERROR) and details.
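A thin parsing helper can separate the two shapes. The success shape matches the example above; the exact top-level layout of error payloads is an assumption here (the document only says they include `kind` and `details`):

```python
class CrawlHubError(Exception):
    """Raised for error responses; carries the documented `kind` field."""
    def __init__(self, kind: str, details):
        super().__init__(f"{kind}: {details}")
        self.kind = kind
        self.details = details

def extract_records(payload: dict) -> list:
    """Return the records list from a successful execution response,
    or raise CrawlHubError for the assumed error shape {"kind": ..., "details": ...}."""
    if "kind" in payload:
        raise CrawlHubError(payload["kind"], payload.get("details"))
    return payload["data"]["records"]
```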

Use Cases

  • Brand Intelligence — Monitor brand mentions, sentiment, emerging narratives
  • Competitive Intelligence — Track competitor content, launches, audience movements
  • Threat Intelligence — Surface threats, leaks, coordinated inauthentic activity
  • Crypto & Web3 Intelligence — Monitor tokens, projects, communities across X + Telegram
  • News & Media Monitoring — Breaking event coverage across platforms
  • Lead Generation — Build targeted outreach lists from public platform data
  • Academic Research — Collect public social data for research projects

Authentication Flow (Step by Step)

  1. Register or Login to get tokens:

    POST /auth/login
    Body: {"email": "user@example.com", "password": "password"}
    
    Response: {"data": {"access_token": "...", "refresh_token": "..."}}
    
  2. Use the access token in all subsequent requests:

    Authorization: Bearer eyJhbGc...
    
  3. When token expires, refresh:

    POST /auth/refresh
    Body: {"refresh_token": "eyJhbGc..."}
    
  4. Discover platforms and endpoints:

    GET /scraper/platforms
    GET /scraper/platforms/{platform_id}
    GET /scraper/endpoints/{endpoint_id}
    
  5. Execute an endpoint to get data:

    GET /execution/endpoints/{endpoint_id}/execute?param1=value1&param2=value2
    POST /execution/endpoints/{endpoint_id}/execute
    Body (JSON): {"param1": "value1", "param2": "value2"}
    

Error Handling

HTTP Status  Kind                 Cause
400          BAD_INPUT            Invalid request parameters
401          AUTH_HEADER_FORMAT   Missing or malformed Authorization header
401          INVALID_CREDENTIALS  Wrong email/password
403          ABORT_ERROR          Permission denied (endpoint-level)
404          REGISTRY_ERROR       Endpoint not found
405          METHOD_NOT_ALLOWED   Wrong HTTP method for endpoint
502          HTTP_ERROR           Upstream platform returned error
503          ABORT_ERROR          Server busy, retry later
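The table suggests a simple retry policy: only 503 ABORT_ERROR is explicitly marked "retry later". Treating 502 HTTP_ERROR (a transient upstream failure) as retryable too is a judgment call, not something the document states:

```python
# (status, kind) pairs worth retrying; 502 is an assumption, 503 is documented.
RETRYABLE = {(503, "ABORT_ERROR"), (502, "HTTP_ERROR")}

def should_retry(status: int, kind: str) -> bool:
    """4xx responses indicate a client-side problem and are never retried."""
    return (status, kind) in RETRYABLE
```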

Best Practices

  • Use idempotent retries — pass an X-Request-ID header when retrying to avoid duplicate billing
  • Check /plans before executing — so you know your current plan's rate limits
  • Monitor usage — via /teams/{team_id}/billing/transactions and request logs
  • Handle 503s gracefully — implement exponential backoff when server is busy
  • Store access tokens securely — never log them; refresh before expiry
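The first and fourth practices combine naturally: reuse one X-Request-ID across attempts while backing off exponentially. A minimal sketch (the wrapper name and callback shape are mine, not part of the API):

```python
import time
import uuid

def retry_with_backoff(send, max_attempts: int = 4, base_delay: float = 1.0):
    """Call send(headers) with exponential backoff. The same X-Request-ID is
    reused on every attempt so retries stay idempotent for billing."""
    headers = {"X-Request-ID": str(uuid.uuid4())}
    for attempt in range(max_attempts):
        try:
            return send(headers)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the last error
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```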

Notes

  • All timestamps are ISO 8601 / date-time format
  • Pagination uses page + per_page (max 100 per page)
  • All list endpoints return paged results
  • API keys (team-level) can have custom permission trees — useful for granular access control
  • CrawlHub adds new platforms and endpoints regularly — check /scraper/platforms periodically
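Since every list endpoint is paged with `page` + `per_page` (max 100), a generic pagination loop covers them all. A sketch, assuming an empty page signals the end of the listing:

```python
def iter_pages(fetch_page, per_page: int = 100):
    """Yield items from any paged list endpoint. fetch_page(page, per_page)
    should return that page's list; iteration stops at the first empty page.
    per_page is capped at the documented maximum of 100."""
    per_page = min(per_page, 100)
    page = 1
    while True:
        items = fetch_page(page, per_page)
        if not items:
            return
        yield from items
        page += 1
```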

