
Firecrawl Web Scraper

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "web-scraper" with this command: npx skills add tryshift-sh/skills-store/tryshift-sh-skills-store-web-scraper


Use this managed skill when the user wants to scrape web pages, crawl websites, discover URLs, search the web, or extract structured data from websites.

This skill uses Shift's local Skill Router. Do not ask the user to paste credentials into chat.

Invocation

Send a POST request to:

${SHIFT_LOCAL_GATEWAY}/skill-router/invoke
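As a sketch, the invocation can be wrapped in a small helper. Only the endpoint path, the request envelope, and the `SHIFT_LOCAL_GATEWAY` variable come from this document; the function name and the localhost fallback are illustrative assumptions:

```python
import json
import os
import urllib.request

# Gateway address comes from the environment, as in the docs; the
# localhost fallback is a hypothetical default for illustration only.
GATEWAY = os.environ.get("SHIFT_LOCAL_GATEWAY", "http://localhost:3333")

def build_invoke_request(action: str, input_data: dict) -> urllib.request.Request:
    """Builds (but does not send) the POST request for the Skill Router."""
    body = {
        "skillProvider": "firecrawl",
        "skill": "web-scraper",
        "action": action,
        "input": input_data,
    }
    return urllib.request.Request(
        f"{GATEWAY}/skill-router/invoke",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it would be: urllib.request.urlopen(build_invoke_request(...))
```

Separating request construction from sending keeps the payload shape easy to inspect; `urllib.request.urlopen` (or any HTTP client) can then dispatch it.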

Scrape a single page

Extracts content from a URL as clean markdown.

{ "skillProvider": "firecrawl", "skill": "web-scraper", "action": "scrape", "input": { "url": "https://example.com" } }

Optional input fields:

  • `formats`: array of output formats, default `["markdown"]`. Options: `markdown`, `html`, `links`, `screenshot`

  • `onlyMainContent`: boolean, default `true`. Excludes nav/footer content

  • `includeTags` / `excludeTags`: arrays of HTML tags to include or exclude

  • `waitFor`: integer, milliseconds to wait before scraping

  • `timeout`: integer, milliseconds, default `30000`

  • `mobile`: boolean, default `false`. Emulates a mobile viewport

  • `location`: object with `country` (ISO 3166-1 alpha-2) and a `languages` array
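Combining several of these fields, a scrape input for a JavaScript-heavy page might look like this (all values are illustrative):

```python
# Illustrative scrape input: clean markdown plus the page's links,
# waiting 2 seconds for client-side rendering, geolocated to Germany.
scrape_input = {
    "url": "https://example.com/docs",
    "formats": ["markdown", "links"],
    "onlyMainContent": True,
    "waitFor": 2000,
    "timeout": 30000,
    "location": {"country": "DE", "languages": ["de", "en"]},
}
```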

Crawl a website (async)

Starts an async crawl job. Returns a job ID to poll with `crawl_status`.

{ "skillProvider": "firecrawl", "skill": "web-scraper", "action": "crawl", "input": { "url": "https://example.com", "limit": 50 } }

Optional input fields:

  • `limit`: max pages to crawl, default `10`

  • `maxDepth`: maximum crawl depth

  • `includePaths` / `excludePaths`: regex path filters

  • `allowExternalLinks`: boolean, default `false`

  • `allowSubdomains`: boolean, default `false`

  • `scrapeOptions`: object with the same options as `scrape` (e.g. `{"formats": ["markdown"]}`)

Check crawl status

Poll for crawl job results using the ID returned from crawl.

{ "skillProvider": "firecrawl", "skill": "web-scraper", "action": "crawl_status", "input": { "crawlId": "crawl-job-uuid" } }

Status values: `scraping`, `completed`, `failed`.
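The poll loop can be sketched as follows, assuming an `invoke(action, input)` helper that POSTs to the gateway and returns the parsed JSON response (the helper itself is not shown; the `status` field matches the values above):

```python
import time

def poll_crawl(invoke, crawl_id, interval=3.0, max_polls=100):
    """Polls crawl_status until the job completes or fails.

    `invoke(action, input)` is assumed to return the parsed JSON response.
    """
    for _ in range(max_polls):
        result = invoke("crawl_status", {"crawlId": crawl_id})
        if result.get("status") in ("completed", "failed"):
            return result
        time.sleep(interval)  # back off between polls
    raise TimeoutError(f"crawl {crawl_id} still running after {max_polls} polls")
```

Bounding the loop with `max_polls` avoids polling forever if a job never leaves the `scraping` state.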

Map a website's URLs

Discovers all URLs on a site without scraping content. Use this to understand site structure before crawling.

{ "skillProvider": "firecrawl", "skill": "web-scraper", "action": "map", "input": { "url": "https://example.com" } }

Optional input fields:

  • `search`: order results by relevance to this query

  • `limit`: max URLs, default `5000`

  • `includeSubdomains`: boolean, default `true`

  • `ignoreSitemap`: boolean, default `false`
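The map-before-crawl pattern can be sketched like this, assuming an `invoke(action, input)` helper that returns the parsed JSON; the `links` field is an assumption about the response shape:

```python
def plan_crawl(invoke, url, path_prefix, limit=50):
    """Maps a site, then keeps only URLs matching a path filter.

    Assumes the map response carries discovered URLs in a `links` list.
    """
    mapped = invoke("map", {"url": url, "limit": 5000})
    candidates = mapped.get("links", [])
    return [u for u in candidates if path_prefix in u][:limit]
```

Filtering the mapped URLs before crawling keeps the crawl within its page `limit` instead of spending it on irrelevant sections.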

Search the web

Search the web and optionally scrape the result pages.

{ "skillProvider": "firecrawl", "skill": "web-scraper", "action": "search", "input": { "query": "latest AI research papers" } }

Optional input fields:

  • `limit`: max results, 1-100, default `5`

  • `lang`: language code

  • `country`: country code, default `US`

  • `tbs`: time filter (`qdr:h` hour, `qdr:d` day, `qdr:w` week, `qdr:m` month, `qdr:y` year)

  • `scrapeOptions`: object controlling content extraction from results
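For example, a search input restricted to the past week, with each result page scraped as markdown (values illustrative):

```python
# Illustrative search input: ten results from the past week (`qdr:w`),
# with each result page's content extracted as markdown.
search_input = {
    "query": "latest AI research papers",
    "limit": 10,
    "tbs": "qdr:w",
    "country": "US",
    "scrapeOptions": {"formats": ["markdown"]},
}
```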

Extract structured data (async)

Extracts structured data from URLs using a prompt and/or a JSON Schema. Returns a job ID to poll with `extract_status`.

{ "skillProvider": "firecrawl", "skill": "web-scraper", "action": "extract", "input": { "urls": ["https://example.com/pricing"], "prompt": "Extract all pricing plans with name, price, and features" } }

Optional input fields:

  • `schema`: JSON Schema defining the expected output structure

  • `enableWebSearch`: boolean, default `false`
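A `schema` constraining the pricing-plan extraction above might look like this (the shape is an illustrative JSON Schema, not a required one):

```python
# Illustrative JSON Schema for the pricing-plan extraction example above.
extract_input = {
    "urls": ["https://example.com/pricing"],
    "prompt": "Extract all pricing plans with name, price, and features",
    "schema": {
        "type": "object",
        "properties": {
            "plans": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "price": {"type": "string"},
                        "features": {"type": "array",
                                     "items": {"type": "string"}},
                    },
                    "required": ["name", "price"],
                },
            },
        },
    },
}
```

A schema like this turns free-form page text into a predictable list of objects, which is easier to validate than prompt-only output.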

Check extract status

Poll for extract job results.

{ "skillProvider": "firecrawl", "skill": "web-scraper", "action": "extract_status", "input": { "extractId": "extract-job-uuid" } }

Authentication

This skill requires a Firecrawl API key configured in Shift. Get one at https://www.firecrawl.dev.

Do not ask the user to paste raw credentials into the conversation. Shift handles authentication automatically when the required connection is configured.

Agent behavior

  • Prefer `scrape` for single pages. Use `crawl` only when multiple pages are needed.

  • Use `map` first to discover site structure before starting a large crawl.

  • For async actions (`crawl`, `extract`), poll status every few seconds until `completed` or `failed`.

  • Default to the `markdown` format unless the user specifically needs HTML or screenshots.

  • When scraping fails, suggest that the user check the URL or retry with `waitFor` for JavaScript-heavy pages.
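The last point can be sketched as a retry helper, assuming an `invoke(action, input)` helper and a `success` field in the response (both are assumptions; adjust to the actual response shape):

```python
def scrape_with_retry(invoke, url, wait_ms=5000):
    """Tries a plain scrape first; on failure retries once with waitFor,
    which gives JavaScript-heavy pages time to finish rendering.

    The `success` response field is an assumed shape.
    """
    result = invoke("scrape", {"url": url})
    if result.get("success"):
        return result
    return invoke("scrape", {"url": url, "waitFor": wait_ms})
```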

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
