# Firecrawl Web Scraper
Use this managed skill when the user wants to scrape web pages, crawl websites, discover URLs, search the web, or extract structured data from websites.
This skill uses Shift's local Skill Router. Do not ask the user to paste credentials into chat.
## Invocation

Send a POST request to `${SHIFT_LOCAL_GATEWAY}/skill-router/invoke`.
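A minimal Python sketch of building such an invocation, using only the standard library. It assumes the gateway address is exposed via the `SHIFT_LOCAL_GATEWAY` environment variable; the localhost fallback is illustrative, not specified by Shift:

```python
import json
import os
import urllib.request

# Assumption: the gateway URL comes from the environment; the fallback is illustrative.
GATEWAY = os.environ.get("SHIFT_LOCAL_GATEWAY", "http://localhost:8080")

def build_invoke_request(action: str, input_data: dict) -> urllib.request.Request:
    """Build a POST request to the Skill Router for the firecrawl web-scraper skill."""
    body = {
        "skillProvider": "firecrawl",
        "skill": "web-scraper",
        "action": action,
        "input": input_data,
    }
    return urllib.request.Request(
        f"{GATEWAY}/skill-router/invoke",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request is then a matter of `urllib.request.urlopen(build_invoke_request(...))` and reading the JSON response.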
## Scrape a single page

Extracts content from a URL as clean markdown.

`{ "skillProvider": "firecrawl", "skill": "web-scraper", "action": "scrape", "input": { "url": "https://example.com" } }`
Optional input fields:

- `formats`: array of output formats, default `["markdown"]`. Options: `markdown`, `html`, `links`, `screenshot`
- `onlyMainContent`: boolean, default `true`. Excludes nav/footer content
- `includeTags` / `excludeTags`: arrays of HTML tags to include or exclude
- `waitFor`: integer, milliseconds to wait before scraping
- `timeout`: integer, milliseconds, default `30000`
- `mobile`: boolean, default `false`. Emulates a mobile viewport
- `location`: object with `country` (ISO 3166-1 alpha-2) and a `languages` array
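The optional fields can be assembled client-side before invoking the action. A sketch (field names taken from the list above; the validation logic is our own, not part of the skill) that rejects typos early:

```python
def build_scrape_input(url: str, **options) -> dict:
    """Assemble the `input` object for a scrape action, passing only the
    optional fields the caller actually set so gateway defaults apply otherwise."""
    allowed = {"formats", "onlyMainContent", "includeTags", "excludeTags",
               "waitFor", "timeout", "mobile", "location"}
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(f"unsupported scrape options: {sorted(unknown)}")
    return {"url": url, **options}
```

Omitting unset options keeps the request minimal and lets the defaults documented above take effect.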
## Crawl a website (async)

Starts an asynchronous crawl job and returns a job ID to poll with `crawl_status`.

`{ "skillProvider": "firecrawl", "skill": "web-scraper", "action": "crawl", "input": { "url": "https://example.com", "limit": 50 } }`
Optional input fields:

- `limit`: max pages to crawl, default `10`
- `maxDepth`: maximum crawl depth
- `includePaths` / `excludePaths`: regex path filters
- `allowExternalLinks`: boolean, default `false`
- `allowSubdomains`: boolean, default `false`
- `scrapeOptions`: object with the same options as scrape (e.g. `{"formats": ["markdown"]}`)
## Check crawl status

Poll for crawl job results using the ID returned from crawl.

`{ "skillProvider": "firecrawl", "skill": "web-scraper", "action": "crawl_status", "input": { "crawlId": "crawl-job-uuid" } }`

Status values: `scraping`, `completed`, `failed`.
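Assuming the status payload carries a `status` field with the values listed above, a polling loop might look like the following. The `fetch_status` callable stands in for an actual gateway call, which also makes the loop easy to test:

```python
import time

def poll_crawl(crawl_id: str, fetch_status, interval: float = 2.0,
               max_attempts: int = 60) -> dict:
    """Poll crawl_status until the job leaves the `scraping` state.

    `fetch_status` is any callable that takes a crawl ID and returns the
    decoded crawl_status response as a dict.
    """
    for _ in range(max_attempts):
        result = fetch_status(crawl_id)
        if result.get("status") in ("completed", "failed"):
            return result
        time.sleep(interval)
    raise TimeoutError(f"crawl {crawl_id} did not finish within the polling budget")
```

A few seconds between polls is usually enough; a `max_attempts` cap avoids polling forever on a stuck job.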
## Map a website's URLs

Discovers all URLs on a site without scraping content. Use this to understand site structure before crawling.

`{ "skillProvider": "firecrawl", "skill": "web-scraper", "action": "map", "input": { "url": "https://example.com" } }`
Optional input fields:

- `search`: order results by relevance to this query
- `limit`: max URLs, default `5000`
- `includeSubdomains`: boolean, default `true`
- `ignoreSitemap`: boolean, default `false`
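One way to act on map output before committing to a crawl is to filter the discovered URLs down to a subtree of interest. A sketch, assuming the action returns a flat list of URL strings:

```python
import re

def select_paths(mapped_urls: list[str], pattern: str) -> list[str]:
    """Filter URLs returned by a map action down to those matching a regex,
    e.g. to pick a docs subtree before starting a targeted crawl."""
    compiled = re.compile(pattern)
    return [u for u in mapped_urls if compiled.search(u)]
```

The matched subset can then inform the `includePaths` filter of a subsequent crawl action.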
## Search the web

Searches the web and optionally scrapes the result pages.

`{ "skillProvider": "firecrawl", "skill": "web-scraper", "action": "search", "input": { "query": "latest AI research papers" } }`
Optional input fields:

- `limit`: max results, 1-100, default `5`
- `lang`: language code
- `country`: country code, default `US`
- `tbs`: time filter (`qdr:h` hour, `qdr:d` day, `qdr:w` week, `qdr:m` month, `qdr:y` year)
- `scrapeOptions`: object to control content extraction from results
## Extract structured data (async)

Extracts structured data from URLs using a prompt and/or a JSON schema. Returns a job ID to poll with `extract_status`.

`{ "skillProvider": "firecrawl", "skill": "web-scraper", "action": "extract", "input": { "urls": ["https://example.com/pricing"], "prompt": "Extract all pricing plans with name, price, and features", "schema": { ... } } }`
Optional input fields:

- `schema`: JSON Schema defining the expected output structure
- `enableWebSearch`: boolean, default `false`
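A hypothetical schema matching the pricing-extraction prompt above. The property names (`plans`, `name`, `price`, `features`) are illustrative choices, not mandated by the skill:

```python
# Illustrative JSON Schema for the pricing prompt; adjust field names to your needs.
pricing_schema = {
    "type": "object",
    "properties": {
        "plans": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "string"},
                    "features": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["name", "price"],
            },
        }
    },
    "required": ["plans"],
}

extract_input = {
    "urls": ["https://example.com/pricing"],
    "prompt": "Extract all pricing plans with name, price, and features",
    "schema": pricing_schema,
}
```

Supplying both a prompt and a schema tends to make the extracted structure more predictable than a prompt alone.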
## Check extract status

Poll for extract job results.

`{ "skillProvider": "firecrawl", "skill": "web-scraper", "action": "extract_status", "input": { "extractId": "extract-job-uuid" } }`
## Authentication
This skill requires a Firecrawl API key configured in Shift. Get one at https://www.firecrawl.dev.
Do not ask the user to paste raw credentials into the conversation. Shift handles authentication automatically when the required connection is configured.
## Agent behavior

- Prefer `scrape` for single pages. Use `crawl` only when multiple pages are needed.
- Use `map` first to discover site structure before starting a large crawl.
- For async actions (`crawl`, `extract`), poll status every few seconds until `completed` or `failed`.
- Default to markdown format unless the user specifically needs HTML or screenshots.
- When scraping fails, suggest that the user check the URL, or retry with `waitFor` for JavaScript-heavy pages.
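The last point can be sketched as a simple fallback wrapper. The `run_scrape` callable and the 5000 ms default are assumptions, not part of the skill:

```python
def scrape_with_fallback(url: str, run_scrape, wait_for_ms: int = 5000) -> dict:
    """Try a plain scrape first; on failure, retry once with `waitFor` so
    JavaScript-heavy pages have time to render.

    `run_scrape` is any callable that performs the scrape action on the
    given input dict and raises on failure.
    """
    try:
        return run_scrape({"url": url})
    except Exception:
        return run_scrape({"url": url, "waitFor": wait_for_ms})
```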