CrawlHub Integration Skill
CrawlHub is a professional web data extraction platform that provides structured, normalized data from major social media and messaging platforms through a REST API.
What CrawlHub Does
CrawlHub handles all the hard parts of web scraping:
- Proxies & rate limit handling — avoiding IP blocks
- Anti-bot circumvention — making requests look like real browsers
- Parsing & normalization — turning raw HTML/JSON into clean structured records
- Data delivery — via API (JSON), webhook, or push to S3/Postgres/warehouse
Supported platforms include: X/Twitter, Instagram, Telegram, LinkedIn, YouTube, TikTok, Facebook, Threads — and more.
Platform Overview
| Platform | Data Types Available |
|---|---|
| X / Twitter | User profiles, tweets, timelines, search, trending topics |
| Instagram | User profiles, posts, comments, hashtags, followers |
| Telegram | Channels, messages, groups, public content |
| LinkedIn | Company profiles, posts, job listings, people data |
| YouTube | Video metadata, channels, comments, search |
| TikTok | User profiles, videos, trending content |
| Facebook | Pages, posts, groups, public content |
| Threads | Posts, user profiles, threads search |
| + more | CrawlHub adds new platforms regularly |
API Reference
Base URL: https://api.thecrawlhub.com/api/v1
Authentication:
- Login: POST /auth/login with {"email": "...", "password": "..."} → returns access_token and refresh_token
- Use: send the header Authorization: Bearer {access_token} on all requests
- Refresh: POST /auth/refresh with {"refresh_token": "..."}
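A minimal sketch of the three authentication calls in Python, using only the standard library. It builds urllib.request.Request objects without sending them, so the request shapes can be checked offline. The helper names are hypothetical, the JSON Content-Type header is an assumption, and parse_tokens assumes the {"data": {...}} envelope that the login response uses in the step-by-step flow later in this document.

```python
import json
import urllib.request

# Base URL from this document; the helper names below are hypothetical.
BASE_URL = "https://api.thecrawlhub.com/api/v1"


def _json_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request with a JSON body."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def build_login_request(email: str, password: str) -> urllib.request.Request:
    """POST /auth/login with email and password."""
    return _json_post("/auth/login", {"email": email, "password": password})


def build_refresh_request(refresh_token: str) -> urllib.request.Request:
    """POST /auth/refresh with the refresh token."""
    return _json_post("/auth/refresh", {"refresh_token": refresh_token})


def parse_tokens(response_body: bytes) -> tuple:
    """Read both tokens from the (assumed) {"data": {...}} envelope."""
    data = json.loads(response_body)["data"]
    return data["access_token"], data["refresh_token"]


def bearer_header(access_token: str) -> dict:
    """The header every authenticated request carries."""
    return {"Authorization": f"Bearer {access_token}"}
```

To send a request, pass it to urllib.request.urlopen and hand the response's read() bytes to parse_tokens; urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so wrap it accordingly.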
Key Endpoints:
Platform Discovery
GET /scraper/platforms → List all available platforms
GET /scraper/platforms/{platform_id} → List modules & endpoints of a platform
GET /scraper/endpoints/{endpoint_id} → Get detailed info for a specific endpoint
Data Execution
GET /execution/endpoints/{endpoint_id}/execute → Execute with query params
POST /execution/endpoints/{endpoint_id}/execute → Execute with JSON body
PATCH /execution/endpoints/{endpoint_id}/execute → Partial update style execution
PUT /execution/endpoints/{endpoint_id}/execute → Full replacement style execution
DELETE /execution/endpoints/{endpoint_id}/execute → Delete style execution
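The GET and POST variants differ mainly in where the parameters travel. The hypothetical helper below puts them in the query string for GET and in a JSON body for every other method. It assumes PATCH, PUT, and DELETE also accept a JSON body, which this document does not state; GET /scraper/endpoints/{endpoint_id} is the place to check what a given endpoint expects. The parameter names used in the test are made up.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.thecrawlhub.com/api/v1"


def build_execute_request(
    endpoint_id: str,
    access_token: str,
    params: dict,
    method: str = "GET",
) -> urllib.request.Request:
    """Build (but do not send) a call to /execution/endpoints/{endpoint_id}/execute.

    GET sends params as a query string; other methods send them as a JSON body
    (an assumption for PATCH/PUT/DELETE).
    """
    safe_id = urllib.parse.quote(endpoint_id, safe="")
    path = f"{BASE_URL}/execution/endpoints/{safe_id}/execute"
    headers = {"Authorization": f"Bearer {access_token}"}
    method = method.upper()
    if method == "GET":
        query = urllib.parse.urlencode(params)
        url = f"{path}?{query}" if query else path
        return urllib.request.Request(url, headers=headers, method="GET")
    headers["Content-Type"] = "application/json"
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(path, data=body, headers=headers, method=method)
```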
Authentication & Users
POST /auth/register → Register new account
POST /auth/login → Login (email + password)
POST /auth/refresh → Refresh access token
POST /auth/logout → Revoke tokens
POST /auth/password-reset → Request password reset email
GET /auth/token-validate → Validate current JWT
Team Management
GET /teams → List user's teams
POST /teams → Create a new team
GET /teams/{team_id} → List team members
POST /teams/{team_id}/invite → Invite member to team
DELETE /teams/{team_id}/{member_id} → Remove member
GET /teams/{team_id}/permissions → Get current user's permissions
PUT /teams/{team_id}/{member_id}/role → Change member role
GET /teams/roles → List available team roles
GET /teams/invite/validate → Validate invite token
POST /teams/invite/accept → Accept team invite
API Keys (Team)
GET /teams/{team_id}/api-keys → List team's API keys
POST /teams/{team_id}/api-keys → Create new API key
PATCH /teams/{team_id}/api-keys/{api_key_id} → Enable/disable key
GET /teams/{team_id}/api-keys/{api_key_id}/permissions → Get permission tree for a key
PUT /teams/{team_id}/api-keys/{api_key_id}/permissions → Sync/set permissions
Billing & Subscription
GET /teams/{team_id}/billing/cycle → Current billing cycle
GET /teams/{team_id}/billing/transactions → Transaction history (paginated)
GET /teams/{team_id}/billing/wallet → Wallet balance
GET /teams/{team_id}/subscription → Current subscription plan
POST /teams/{team_id}/subscription → Switch to different plan
PATCH /teams/{team_id}/subscription/policy → Update subscription policy
GET /plans → List all available plans
Request Logs
GET /teams/{team_id}/scraper/endpoints/{endpoint_id}/logs → Request logs for an endpoint
Query params: page, per_page, from, to, status_code, sort_key, sort_order
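A small sketch that builds the logs URL from the documented query parameters. Because from is a reserved word in Python, the helper accepts it as from_; the function name, the rejection of unknown keys, and the skipping of None values are choices of this sketch, not API behavior. The expected format for from and to is not given here; ISO 8601, as the notes say for timestamps, is a reasonable guess.

```python
import urllib.parse

BASE_URL = "https://api.thecrawlhub.com/api/v1"

# Query parameters listed for the request-logs endpoint.
LOG_QUERY_PARAMS = {"page", "per_page", "from", "to", "status_code", "sort_key", "sort_order"}


def build_logs_url(team_id: str, endpoint_id: str, **filters) -> str:
    """Build GET /teams/{team_id}/scraper/endpoints/{endpoint_id}/logs (hypothetical helper).

    Pass `from` as `from_` (trailing underscores are stripped). Unknown keys raise
    ValueError; None values are left out of the query string.
    """
    query = {}
    for key, value in filters.items():
        name = key.rstrip("_")
        if name not in LOG_QUERY_PARAMS:
            raise ValueError(f"unsupported log filter: {key}")
        if value is not None:
            query[name] = value
    path = f"{BASE_URL}/teams/{team_id}/scraper/endpoints/{endpoint_id}/logs"
    return f"{path}?{urllib.parse.urlencode(query)}" if query else path
```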
User Profile
GET /user/info → Get current user info
PATCH /user/update → Update profile (name, address, phone, company)
Pricing Model
CrawlHub uses a per-record pricing model:
| Plan | Price | Rate Limit | Best For |
|---|---|---|---|
| Pay as you go | $1.79 / 1,000 records | 50 req/15min/endpoint | Testing, prototyping |
| Scaler | $299/month | 150 req/15min/endpoint | Teams in production |
| Business | $999/month | 600 req/15min/endpoint | High-scale data pipelines |
| Enterprise | Custom | Custom | Unique requirements, SLAs |
Rate limits apply per endpoint. Billing counts the records returned in responses, not the number of requests.
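The table translates into two quick calculations: cost scales with records returned, and a budget of N requests per 15 minutes per endpoint means one call every 900 / N seconds if spread evenly (18 s on Pay as you go, 6 s on Scaler, 1.5 s on Business). The sketch below assumes linear per-record proration and does not know whether CrawlHub's 15-minute window is fixed or sliding; neither is stated here. A client-side sliding window is the conservative choice because it also satisfies a fixed window. All names are hypothetical.

```python
import collections
from decimal import ROUND_HALF_UP, Decimal

# Figures copied from the pricing table; confirm current values with GET /plans.
PAYG_PRICE_PER_1000 = Decimal("1.79")
REQUESTS_PER_WINDOW = {"pay_as_you_go": 50, "scaler": 150, "business": 600}
WINDOW_SECONDS = 15 * 60


def payg_cost(records: int) -> Decimal:
    """Estimated pay-as-you-go cost in USD, assuming linear per-record proration."""
    cost = PAYG_PRICE_PER_1000 * records / 1000
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def min_interval_seconds(plan: str) -> float:
    """Even spacing between calls that keeps one endpoint inside its 15-minute budget."""
    return WINDOW_SECONDS / REQUESTS_PER_WINDOW[plan]


class EndpointRateLimiter:
    """Client-side sliding window: at most `limit` calls per endpoint per `window` seconds."""

    def __init__(self, limit: int, window: float = WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        self._calls = collections.defaultdict(collections.deque)

    def wait_time(self, endpoint_id: str, now: float) -> float:
        """Seconds to wait before endpoint_id may be called again (0.0 means go now)."""
        calls = self._calls[endpoint_id]
        while calls and now - calls[0] >= self.window:
            calls.popleft()  # forget calls that have left the window
        if len(calls) < self.limit:
            return 0.0
        return float(self.window - (now - calls[0]))

    def record(self, endpoint_id: str, now: float) -> None:
        """Register a call made at time `now` (for example time.monotonic())."""
        self._calls[endpoint_id].append(now)
```

In a loop, sleep for wait_time(endpoint_id, time.monotonic()), then call record with a fresh time.monotonic() just before the request goes out.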
Execution Response Format
Successful execution returns:
{
"data": {
"records": [
{ "title": "...", "url": "...", "created_at": "...", ... }
]
},
"http_status": 200
}
Error responses include a kind (e.g., BAD_INPUT, ABORT_ERROR, HTTP_ERROR, REGISTRY_ERROR) and details describing the failure.
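A sketch of unpacking an execution response: it returns data.records on success and raises a hypothetical CrawlHubError otherwise. The document names kind and details but not where they sit in the error body, so reading them as top-level keys is an assumption to check against a real error response.

```python
import json


class CrawlHubError(Exception):
    """Hypothetical exception carrying the HTTP status, error kind, and details."""

    def __init__(self, http_status: int, kind: str, details):
        super().__init__(f"{http_status} {kind}: {details}")
        self.http_status = http_status
        self.kind = kind
        self.details = details


def extract_records(http_status: int, body: bytes) -> list:
    """Return data.records for a 2xx response, else raise CrawlHubError.

    Assumes error bodies carry top-level "kind" and "details" keys.
    """
    payload = json.loads(body) if body else {}
    if 200 <= http_status < 300:
        return (payload.get("data") or {}).get("records", [])
    raise CrawlHubError(http_status, payload.get("kind", "UNKNOWN"), payload.get("details"))
```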
Use Cases
- Brand Intelligence — Monitor brand mentions, sentiment, emerging narratives
- Competitive Intelligence — Track competitor content, launches, audience movements
- Threat Intelligence — Surface threats, leaks, coordinated inauthentic activity
- Crypto & Web3 Intelligence — Monitor tokens, projects, communities across X + Telegram
- News & Media Monitoring — Breaking event coverage across platforms
- Lead Generation — Build targeted outreach lists from public platform data
- Academic Research — Collect public social data for research projects
Authentication Flow (Step by Step)
1. Register or log in to get tokens:
   POST /auth/login
   Body: {"email": "user@example.com", "password": "password"}
   Response: {"data": {"access_token": "...", "refresh_token": "..."}}
2. Use the access token in all subsequent requests:
   Authorization: Bearer eyJhbGc...
3. When the access token expires, refresh it:
   POST /auth/refresh
   Body: {"refresh_token": "eyJhbGc..."}
4. Discover platforms and endpoints:
   GET /scraper/platforms
   GET /scraper/platforms/{platform_id}
   GET /scraper/endpoints/{endpoint_id}
5. Execute an endpoint to get data, either with query parameters:
   GET /execution/endpoints/{endpoint_id}/execute?param1=value1&param2=value2
   or with a JSON body:
   POST /execution/endpoints/{endpoint_id}/execute
   Body: {"param1": "value1", "param2": "value2"}
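The flow above (log in, send the bearer token, refresh on expiry, execute) can be sketched as a small session object. The HTTP transport is injected as a plain function so the logic runs without network access. The class, its methods, sending login and refresh without a bearer header, and treating any 401 on execution as an expired token (refresh once, then retry) are all assumptions of this sketch.

```python
import json
from typing import Callable, Dict, Optional, Tuple

BASE_URL = "https://api.thecrawlhub.com/api/v1"

# Hypothetical transport: (method, url, headers, body) -> (status, response bytes).
# In production it would wrap urllib or another HTTP client.
Transport = Callable[[str, str, Dict[str, str], Optional[bytes]], Tuple[int, bytes]]


class CrawlHubSession:
    """Illustrative client for the login -> bearer token -> refresh -> execute flow."""

    def __init__(self, transport: Transport, base_url: str = BASE_URL):
        self.transport = transport
        self.base_url = base_url
        self.access_token: Optional[str] = None
        self.refresh_token: Optional[str] = None

    def _post(self, path: str, payload: dict, auth: bool = False) -> Tuple[int, bytes]:
        headers = {"Content-Type": "application/json"}
        if auth:
            headers["Authorization"] = f"Bearer {self.access_token}"
        body = json.dumps(payload).encode("utf-8")
        return self.transport("POST", self.base_url + path, headers, body)

    def _store_tokens(self, body: bytes) -> None:
        data = json.loads(body)["data"]
        self.access_token = data["access_token"]
        # Keep the old refresh token if the response does not include a new one.
        self.refresh_token = data.get("refresh_token", self.refresh_token)

    def login(self, email: str, password: str) -> None:
        status, body = self._post("/auth/login", {"email": email, "password": password})
        if status != 200:
            raise RuntimeError(f"login failed with HTTP {status}")
        self._store_tokens(body)

    def refresh(self) -> None:
        status, body = self._post("/auth/refresh", {"refresh_token": self.refresh_token})
        if status != 200:
            raise RuntimeError(f"refresh failed with HTTP {status}")
        self._store_tokens(body)

    def execute(self, endpoint_id: str, params: dict) -> Tuple[int, dict]:
        """POST params as JSON; on a 401, refresh the access token once and retry."""
        path = f"/execution/endpoints/{endpoint_id}/execute"
        status, body = self._post(path, params, auth=True)
        if status == 401:
            self.refresh()
            status, body = self._post(path, params, auth=True)
        return status, (json.loads(body) if body else {})
```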
Error Handling
| HTTP Status | Kind | Cause |
|---|---|---|
| 400 | BAD_INPUT | Invalid request parameters |
| 401 | AUTH_HEADER_FORMAT | Missing or malformed Authorization header |
| 401 | INVALID_CREDENTIALS | Wrong email/password |
| 403 | ABORT_ERROR | Permission denied (endpoint-level) |
| 404 | REGISTRY_ERROR | Endpoint not found |
| 405 | METHOD_NOT_ALLOWED | Wrong HTTP method for endpoint |
| 502 | HTTP_ERROR | Upstream platform returned error |
| 503 | ABORT_ERROR | Server busy, retry later |
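One way to act on the table is to map each response to a client action. ABORT_ERROR appears under both 403 and 503, so the sketch decides on the status code first and uses kind only to split the 401 cases. Treating 502 (upstream platform error) as retryable is a judgment call, and the Action names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    FIX_REQUEST = "fix_request"                # retrying the same request will fail again
    REAUTHENTICATE = "reauthenticate"          # refresh the access token or log in again
    CHECK_PERMISSIONS = "check_permissions"    # e.g. review the API key's permission tree
    RETRY_WITH_BACKOFF = "retry_with_backoff"  # transient: wait, then retry


def classify_error(http_status: int, kind: Optional[str] = None) -> Action:
    """Map a status code (and, for 401, the error kind) to a client action."""
    if http_status == 401:
        # A malformed header or a wrong password is not fixed by refreshing a token.
        if kind in ("AUTH_HEADER_FORMAT", "INVALID_CREDENTIALS"):
            return Action.FIX_REQUEST
        return Action.REAUTHENTICATE
    if http_status == 403:
        return Action.CHECK_PERMISSIONS
    if http_status >= 500:
        # 503: server busy. 502: upstream platform error, assumed transient here.
        return Action.RETRY_WITH_BACKOFF
    # 400 BAD_INPUT, 404 REGISTRY_ERROR, 405 METHOD_NOT_ALLOWED and other 4xx.
    return Action.FIX_REQUEST
```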
Best Practices
- Use idempotent retries: pass the same X-Request-ID header when retrying to avoid duplicate billing
- Check /plans before executing to understand your plan's rate limits (GET /teams/{team_id}/subscription shows which plan you are on)
- Monitor usage via /teams/{team_id}/billing/transactions and request logs
- Handle 503s gracefully: implement exponential backoff when the server is busy
- Store access tokens securely: never log them, and refresh before expiry
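The retry and backoff advice can be combined in one helper: generate an X-Request-ID once, reuse it on every attempt, and wait exponentially longer (with optional full jitter) between 503 responses. The send callback, its signature, and the default delays are hypothetical. Whether the server deduplicates billing on this header is as described above; the sketch only guarantees that the ID stays constant across attempts.

```python
import random
import time
import uuid
from typing import Callable, Optional

# Hypothetical callback: send(headers) performs the request and returns its HTTP status.
Send = Callable[[dict], int]


def send_with_retries(
    send: Send,
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    jitter: bool = True,
    sleep: Callable[[float], None] = time.sleep,
    request_id: Optional[str] = None,
) -> int:
    """Retry 503s with exponential backoff, reusing one X-Request-ID for every attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    headers = {"X-Request-ID": request_id or str(uuid.uuid4())}
    status = send(headers)
    for attempt in range(1, max_attempts):
        if status != 503:
            break
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        # Full jitter: a random wait in [0, delay] keeps clients from retrying in lockstep.
        sleep(random.uniform(0, delay) if jitter else delay)
        status = send(headers)
    return status
```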
Notes
- All timestamps are ISO 8601 / date-time format
- Pagination uses page and per_page (max 100 per page)
- All list endpoints return paged results
- API keys (team-level) can have custom permission trees for granular access control
- CrawlHub adds new platforms and endpoints regularly; check /scraper/platforms periodically
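A sketch of the two conventions in these notes: walking page/per_page pagination and parsing ISO 8601 timestamps. The fetch_page callback is hypothetical, page numbering is assumed to start at 1, and stopping at the first short page is an assumption because the paging metadata in list responses is not described here. datetime.fromisoformat accepts a trailing Z only from Python 3.11, so the sketch rewrites it as +00:00.

```python
from datetime import datetime, timezone
from typing import Callable, Iterator, List

MAX_PER_PAGE = 100  # documented maximum


def iter_pages(fetch_page: Callable[[int, int], List], per_page: int = MAX_PER_PAGE) -> Iterator:
    """Yield items from every page, starting at page 1 (assumed).

    fetch_page(page, per_page) is a hypothetical caller-supplied function that returns
    one page of items. Iteration stops at the first short or empty page (assumption).
    """
    if not 1 <= per_page <= MAX_PER_PAGE:
        raise ValueError(f"per_page must be between 1 and {MAX_PER_PAGE}")
    page = 1
    while True:
        items = fetch_page(page, per_page)
        yield from items
        if len(items) < per_page:
            return
        page += 1


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp into an aware datetime.

    A trailing "Z" is rewritten as "+00:00" for Python < 3.11; naive values are
    assumed to be UTC (an assumption, not documented behavior).
    """
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)
```

Because iter_pages is a generator, the per_page check runs when iteration starts, not when the function is called.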