link-scraper

Fetch and extract content from URLs with automatic summarization. This skill enables the agent to gather information from the web by scraping web pages, extracting main content, and providing concise summaries.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "link-scraper" with this command: npx skills add winsorllc/upgraded-carnival/winsorllc-upgraded-carnival-link-scraper

Link Scraper

When to Use

  • User shares a URL and asks "what's this about?"

  • Researching a topic that requires reading online articles

  • Extracting documentation or technical content from websites

  • Getting summaries of blog posts, news articles, or papers

  • Extracting code snippets or examples from web sources

  • Fetching content that the user wants analyzed or discussed

Setup

No additional installation required. Uses built-in Node.js modules and cheerio (an npm package) for HTML parsing.

If cheerio is not available, the script falls back to basic regex-based extraction.
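A regex-based fallback extractor might look something like the sketch below. It assumes nothing about how scrape.js is actually written. The name `extractBasic` is hypothetical. It fills the same fields as the skill's JSON output (title, description, content, wordCount, links, images, siteName). Reading `siteName` from the `og:site_name` meta tag is an assumption.

```javascript
// Hypothetical regex-only extractor sketching the "no cheerio" fallback path.
// Regexes are brittle on real-world HTML; this is an illustration, not scrape.js.

function decodeEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&nbsp;/g, " ")
    .replace(/&amp;/g, "&"); // decode &amp; last so "&amp;lt;" becomes "&lt;", not "<"
}

// Reads content="..." from a <meta> tag; assumes the key attribute comes before content.
function metaContent(html, attr, value) {
  const re = new RegExp(
    `<meta[^>]+${attr}=["']${value}["'][^>]*content=["']([^"']*)["']`,
    "i"
  );
  const match = html.match(re);
  return match ? decodeEntities(match[1].trim()) : "";
}

// Collects unique absolute http(s) URLs captured by `tagPattern`, resolved against pageUrl.
function collectUrls(html, tagPattern, pageUrl) {
  const urls = new Set();
  for (const match of html.matchAll(tagPattern)) {
    if (match[1].startsWith("#")) continue; // skip in-page anchors
    try {
      const url = new URL(decodeEntities(match[1]), pageUrl);
      if (url.protocol !== "http:" && url.protocol !== "https:") continue;
      url.hash = ""; // treat /page and /page#section as the same link
      urls.add(url.href);
    } catch {
      // skip values that cannot be parsed as URLs
    }
  }
  return [...urls];
}

function extractBasic(html, pageUrl) {
  const titleMatch = html.match(/<title\b[^>]*>([\s\S]*?)<\/title>/i);
  // Drop <head>, scripts, and styles, then strip remaining tags and collapse whitespace.
  const content = decodeEntities(
    html
      .replace(/<head\b[^>]*>[\s\S]*?<\/head>/gi, " ")
      .replace(/<(script|style|noscript)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
      .replace(/<[^>]+>/g, " ")
  )
    .replace(/\s+/g, " ")
    .trim();

  return {
    url: pageUrl,
    title: titleMatch ? decodeEntities(titleMatch[1].trim()) : "",
    description: metaContent(html, "name", "description"),
    content,
    wordCount: content ? content.split(" ").length : 0,
    links: collectUrls(html, /<a\s[^>]*href=["']([^"']+)["']/gi, pageUrl),
    images: collectUrls(html, /<img\s[^>]*src=["']([^"']+)["']/gi, pageUrl),
    siteName: metaContent(html, "property", "og:site_name"),
  };
}
```

With cheerio installed, a DOM-based parser would replace these regexes. The regex version is only meant to degrade gracefully.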

Usage

Extract a single URL

node /job/.pi/skills/link-scraper/scrape.js "https://example.com/article"

Extract multiple URLs

node /job/.pi/skills/link-scraper/scrape.js "https://example.com/page1" "https://example.com/page2"

Get just the title

node /job/.pi/skills/link-scraper/scrape.js --title "https://example.com"

Get full content (no summary)

node /job/.pi/skills/link-scraper/scrape.js --full "https://example.com"

Extract specific elements (CSS selector)

node /job/.pi/skills/link-scraper/scrape.js --selector "article" "https://example.com/blog"
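The flags above could be parsed roughly as in the sketch below. This is a hypothetical illustration that uses Node's built-in `util.parseArgs` (available in Node.js 18.3 and later). It is not the actual argument handling in scrape.js. The `parseScrapeArgs` name and the returned field names are assumptions.

```javascript
const { parseArgs } = require("node:util");

// Hypothetical parser for the documented flags: --title, --full, --selector <css>.
function parseScrapeArgs(args) {
  const { values, positionals } = parseArgs({
    args,
    options: {
      title: { type: "boolean" },   // only return the page title
      full: { type: "boolean" },    // return full content, no summary
      selector: { type: "string" }, // CSS selector to extract
    },
    allowPositionals: true, // URLs are passed as positional arguments
    strict: true,           // unknown flags raise an error instead of being ignored
  });

  if (positionals.length === 0) {
    throw new Error("Usage: scrape.js [options] <url>...");
  }

  const urls = positionals.map((raw) => {
    const url = new URL(raw); // throws on strings that are not absolute URLs
    if (url.protocol !== "http:" && url.protocol !== "https:") {
      throw new Error(`Unsupported URL scheme: ${url.protocol}`);
    }
    return url.href;
  });

  return {
    titleOnly: values.title === true,
    full: values.full === true,
    selector: values.selector ?? null,
    urls,
  };
}

// Example: const options = parseScrapeArgs(process.argv.slice(2));
```

Validating URLs up front means a typo fails immediately with a clear error, before any network request is made.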

Output Format

The scraper returns JSON with the following structure:

{
  "url": "https://example.com/article",
  "title": "Article Title",
  "description": "Brief description of the page...",
  "content": "Main content extracted from the page...",
  "wordCount": 500,
  "links": ["https://example.com/related1", "https://example.com/related2"],
  "images": ["https://example.com/image1.jpg"],
  "siteName": "Example Site"
}

When summarized:

{
  "url": "https://example.com/article",
  "title": "Article Title",
  "summary": "A concise 2-3 sentence summary of the article...",
  "keyPoints": [
    "First key point from the article",
    "Second key point",
    "Third key point"
  ],
  "wordCount": 500,
  "readTime": "2 min"
}
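How the skill actually produces its summary is not documented here. The sketch below shows one hypothetical way to derive the summarized shape from a full result, under several assumptions:

* A reading speed of 250 words per minute, which is consistent with 500 words mapping to "2 min" in the example above.
* A "lead" summary made of the first sentences.
* Key points chosen by a simple word-frequency score.

A language model could produce a very different summary.

```javascript
// Hypothetical sketch: derive the summarized shape from a full extraction result.
// Assumptions: 250 words/minute for readTime, lead sentences as the summary,
// and frequency-scored sentences as keyPoints.
const WORDS_PER_MINUTE = 250;

function splitSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/) // split after sentence-ending punctuation
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

function tokenize(sentence) {
  // Lowercased words longer than 3 characters: a crude stop-word filter.
  return (sentence.toLowerCase().match(/[a-z0-9']+/g) || []).filter((w) => w.length > 3);
}

function summarizeResult(result, { summarySentences = 3, maxKeyPoints = 3 } = {}) {
  const sentences = splitSentences(result.content || "");

  // Count how often each word appears across the whole document.
  const freq = new Map();
  for (const s of sentences) {
    for (const w of tokenize(s)) freq.set(w, (freq.get(w) || 0) + 1);
  }

  // Score each sentence by the average document frequency of its words.
  const scored = sentences.map((text, index) => {
    const words = tokenize(text);
    const total = words.reduce((sum, w) => sum + freq.get(w), 0);
    return { text, index, score: words.length ? total / words.length : 0 };
  });

  const keyPoints = [...scored]
    .sort((a, b) => b.score - a.score) // stable sort keeps earlier sentences first on ties
    .slice(0, maxKeyPoints)
    .sort((a, b) => a.index - b.index) // restore reading order
    .map((s) => s.text);

  const minutes = Math.max(1, Math.ceil(result.wordCount / WORDS_PER_MINUTE));
  return {
    url: result.url,
    title: result.title,
    summary: sentences.slice(0, summarySentences).join(" "),
    keyPoints,
    wordCount: result.wordCount,
    readTime: `${minutes} min`,
  };
}
```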

Common Workflows

Quick URL Summary

User: Check out https://github.com/openclaw/openclaw for me
Agent: [Uses link-scraper to fetch and summarize]

Research Task

User: Find information about AI agents
Agent: [Uses link-scraper to fetch relevant articles, documentation, etc.]

Code Example Extraction

User: How do I use the GitHub API? https://docs.github.com/en/rest
Agent: [Uses link-scraper with --selector to extract code examples]
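A research task often touches several pages. One option is to fetch them concurrently with `Promise.allSettled`, so a single blocked or failing URL is reported rather than aborting the whole batch. This is a hypothetical sketch that assumes Node.js 18 or later, for the global `fetch` and `AbortSignal.timeout`. The `fetchAll` name, the 15-second timeout, and the result shape are illustrative. The fetcher is injectable so the logic can be exercised offline.

```javascript
// Default fetcher: Node 18+ global fetch with a timeout (15 s is an arbitrary choice).
async function fetchHtml(url) {
  const res = await fetch(url, { signal: AbortSignal.timeout(15000) });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.text();
}

// Fetch every URL concurrently; failures are recorded per URL instead of thrown.
async function fetchAll(urls, fetcher = fetchHtml) {
  // The async wrapper turns synchronous throws from `fetcher` into rejections.
  const outcomes = await Promise.allSettled(urls.map(async (url) => fetcher(url)));
  return outcomes.map((outcome, i) =>
    outcome.status === "fulfilled"
      ? { url: urls[i], ok: true, html: outcome.value }
      : {
          url: urls[i],
          ok: false,
          error: outcome.reason instanceof Error ? outcome.reason.message : String(outcome.reason),
        }
  );
}
```

Failed entries can then be retried with browser-tools or reported to the user. The successful ones go on to extraction.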

Integration with Other Skills

  • With memory-agent: Store researched information for future reference

  • With browser-tools: Use for JavaScript-rendered pages that need a browser

  • With voice-output: Announce summaries aloud
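Handing off to browser-tools requires some rule for deciding when a plain HTTP fetch is not good enough. The sketch below is one hypothetical heuristic, not the skill's documented behavior. Everything in it is an assumption:

* The status codes that count as "blocked".
* The 50-word and 5-script thresholds.
* The client-side app root ids (`root`, `app`, `__next`).

```javascript
// Hypothetical heuristic for escalating from link-scraper to browser-tools.
// All thresholds below are illustrative assumptions.
const BLOCKED_STATUSES = new Set([403, 429]);

function shouldUseBrowser({ status, html, wordCount }) {
  if (BLOCKED_STATUSES.has(status)) {
    return { useBrowser: true, reason: `request blocked or rate-limited (HTTP ${status})` };
  }
  // An empty mount point usually means the page is rendered by client-side JavaScript.
  const emptyAppRoot = /<div\s+id=["'](?:root|app|__next)["'][^>]*>\s*<\/div>/i.test(html);
  if (emptyAppRoot) {
    return { useBrowser: true, reason: "empty client-side app root" };
  }
  // Very little extracted text alongside many scripts also suggests client-side rendering.
  const scriptCount = (html.match(/<script\b/gi) || []).length;
  if (wordCount < 50 && scriptCount >= 5) {
    return { useBrowser: true, reason: "very little text on a script-heavy page" };
  }
  return { useBrowser: false, reason: "static HTML looks sufficient" };
}
```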

Limitations

  • Cannot fetch password-protected pages

  • Some sites block scrapers (may need browser-tools as fallback)

  • Large pages may be truncated to fit within token limits

  • JavaScript-rendered content may not be available (use browser-tools)
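The truncation limit and how it is measured are not specified here. The sketch below assumes a simple word budget as a rough stand-in for model tokens. It prefers to stop at a sentence boundary and marks the cut with a hypothetical `[truncated]` suffix.

```javascript
// Hypothetical truncation helper: keep at most `maxWords` words, preferring to stop at
// a sentence boundary. Word count is only a rough proxy for model tokens.
function truncateWords(text, maxWords) {
  const words = text.split(/\s+/).filter(Boolean);
  if (words.length <= maxWords) {
    return { text: words.join(" "), truncated: false };
  }
  let cut = words.slice(0, maxWords).join(" ");
  if (!/[.!?]$/.test(cut)) {
    // Back up to the last sentence end, but only if that keeps more than half the text.
    const lastStop = Math.max(cut.lastIndexOf(". "), cut.lastIndexOf("! "), cut.lastIndexOf("? "));
    if (lastStop > cut.length / 2) cut = cut.slice(0, lastStop + 1);
  }
  return { text: `${cut} [truncated]`, truncated: true };
}
```

Returning a `truncated` flag lets the caller tell the user that the summary covers only part of the page.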

Tips

  • For articles: The scraper automatically extracts main article content

  • For documentation: Use --selector "pre code" to get code blocks

  • For lists: Use --selector "ul li" to extract list items

  • For speed: Add --no-summary for quick title/description only
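The sketch below illustrates the `--selector "pre code"` tip. It collects code blocks with cheerio when cheerio is installed and falls back to a rough regex otherwise, in the spirit of the fallback described under Setup. The function name is hypothetical. The regex path only handles `<code>` elements that sit directly inside `<pre>`, which is narrower than what the CSS selector matches.

```javascript
// Hypothetical helper: extract the text of `pre code` blocks from an HTML string.
function decodeEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&"); // must run last
}

function extractCodeBlocks(html) {
  let cheerio = null;
  try {
    cheerio = require("cheerio"); // optional dependency
  } catch {
    cheerio = null;
  }

  if (cheerio) {
    const $ = cheerio.load(html);
    // .text() returns decoded text and drops nested markup such as highlight <span>s.
    return $("pre code").map((_, el) => $(el).text()).get();
  }

  // Regex fallback: <pre><code>...</code></pre> only; strips inner tags, decodes entities.
  const blocks = [];
  const pattern = /<pre\b[^>]*>\s*<code\b[^>]*>([\s\S]*?)<\/code>\s*<\/pre>/gi;
  for (const match of html.matchAll(pattern)) {
    blocks.push(decodeEntities(match[1].replace(/<[^>]+>/g, "")));
  }
  return blocks;
}
```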

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
