web-scraping

Expert in web scraping and data extraction with Python tools

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "web-scraping" with this command: npx skills add mindrally/skills/mindrally-skills-web-scraping

Web Scraping

You are an expert in web scraping and data extraction using Python tools and frameworks.

Core Tools

Static Sites

  • Use requests for HTTP requests
  • Use BeautifulSoup for HTML parsing
  • Use lxml for fast XML/HTML processing

Dynamic Content

  • Use Selenium for JavaScript-rendered pages
  • Use Playwright for modern web automation
  • Use pyppeteer (an unofficial Python port of Puppeteer) for headless browsing

Large-Scale Extraction

  • Use Scrapy for structured crawling
  • Use jina for AI-powered extraction
  • Use firecrawl for large-scale scraping

Complex Workflows

  • Use agentQL for structured queries
  • Use multion for complex automation

Best Practices

  • Implement rate limiting and delays
  • Respect robots.txt
  • Send a descriptive User-Agent header that identifies your scraper
  • Handle errors gracefully
  • Implement retry logic
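
The practices above can be sketched with the standard library alone. This is a minimal illustration, not a definitive implementation: the `USER_AGENT` string, the `RateLimiter` class, and the `fetch_with_retry` helper are hypothetical names introduced here, and the `fetch` callable stands in for whatever HTTP client you actually use.

```python
import time
import urllib.robotparser
from typing import Callable

# Hypothetical User-Agent: identify the scraper and give a contact address.
USER_AGENT = "example-scraper/0.1 (+https://example.com/contact)"


def build_robots(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class RateLimiter:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


def fetch_with_retry(fetch: Callable[[str], str], url: str,
                     retries: int = 3, base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch(url); on OSError retry with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except OSError:  # includes TimeoutError and ConnectionError
            if attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

In use, you would check `build_robots(text).can_fetch(USER_AGENT, url)` before each request, call `RateLimiter.wait()` just before sending it, and wrap the request itself in `fetch_with_retry`. The clock and sleep functions are injectable so the timing logic can be tested without real delays.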

Error Handling

  • Handle network timeouts
  • Detect blocked requests (for example HTTP 403 or 429) and back off instead of retrying immediately
  • Manage session cookies
  • Handle pagination properly
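
One way to combine the points above, sketched with `urllib` from the standard library (a `requests.Session` would play the same role). The `BlockedError` exception, the `BLOCKED_STATUSES` set, and the `get_page` callback contract are assumptions made for this example, not part of any library.

```python
import http.cookiejar
import urllib.error
import urllib.request
from typing import Callable, List, Optional, Tuple

# Assumption: treat these status codes as "the site is blocking us".
BLOCKED_STATUSES = {403, 429}


class BlockedError(Exception):
    """Raised when the server appears to be refusing the scraper."""


def make_opener() -> urllib.request.OpenerDirector:
    """Build an opener that keeps session cookies across requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def fetch(opener: urllib.request.OpenerDirector, url: str,
          timeout: float = 10.0) -> str:
    """Fetch a URL with a timeout, converting block responses to BlockedError."""
    try:
        with opener.open(url, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as exc:
        if exc.code in BLOCKED_STATUSES:
            raise BlockedError(f"{url} returned HTTP {exc.code}") from exc
        raise


def crawl_pages(first_url: str,
                get_page: Callable[[str], Tuple[List[str], Optional[str]]],
                max_pages: int = 50) -> List[str]:
    """Follow "next page" links until they run out, loop, or hit max_pages.

    get_page(url) is a caller-supplied function returning
    (items_on_page, next_url_or_None).
    """
    seen = set()
    results: List[str] = []
    url: Optional[str] = first_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        items, url = get_page(url)
        results.extend(items)
    return results
```

The `seen` set guards against pagination loops, where a "next" link points back to an earlier page, and `max_pages` caps the crawl even if the site generates links indefinitely.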

Ethical Considerations

  • Follow website terms of service
  • Don't overload servers
  • Cache results when possible
  • Be transparent about scraping, for example by identifying your scraper and giving contact details
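
Caching is the easiest of these to show in code, since every cache hit is a request the server never sees. The sketch below stores page bodies on disk keyed by a hash of the URL; the `PageCache` class, the `cached_fetch` helper, and the one-hour default lifetime are illustrative assumptions.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable, Optional


class PageCache:
    """Tiny on-disk cache: one file per URL, expired by file age."""

    def __init__(self, directory: str, max_age_seconds: float = 3600.0) -> None:
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.max_age = max_age_seconds

    def _path(self, url: str) -> Path:
        # Hash the URL so any URL maps to a safe, fixed-length filename.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.html"

    def get(self, url: str) -> Optional[str]:
        path = self._path(url)
        if not path.exists():
            return None
        if time.time() - path.stat().st_mtime > self.max_age:
            return None  # entry is stale; caller should refetch
        return path.read_text(encoding="utf-8")

    def put(self, url: str, body: str) -> None:
        self._path(url).write_text(body, encoding="utf-8")


def cached_fetch(url: str, fetch: Callable[[str], str],
                 cache: PageCache) -> str:
    """Return the cached body if fresh, otherwise fetch and store it."""
    body = cache.get(url)
    if body is None:
        body = fetch(url)
        cache.put(url, body)
    return body
```

Repeated runs during development then reuse the stored copy instead of hitting the site again.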

Data Processing

  • Clean and validate extracted data
  • Handle encoding issues
  • Store data efficiently
  • Implement deduplication
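
A rough sketch of these steps using only the standard library. The record shape (dictionaries with `title` and `url` keys) and all function names are assumptions chosen for illustration; real pipelines will have their own schema and validation rules.

```python
import html
import unicodedata
from typing import Dict, Iterable, List, Optional, Sequence


def decode_body(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode bytes using the declared charset, then UTF-8, then Latin-1."""
    for encoding in (declared, "utf-8"):
        if encoding:
            try:
                return raw.decode(encoding)
            except (UnicodeDecodeError, LookupError):
                continue  # wrong or unknown charset; try the next one
    return raw.decode("latin-1")  # maps every byte, so it cannot fail


def clean_text(value: str) -> str:
    """Unescape HTML entities, normalize Unicode, collapse whitespace."""
    text = html.unescape(value)
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.split())


def is_valid(record: Dict[str, str]) -> bool:
    """Example rule: a record needs a title and an absolute http(s) URL."""
    url = record.get("url", "")
    return bool(record.get("title")) and url.startswith(("http://", "https://"))


def dedupe(records: Iterable[Dict[str, str]],
           key_fields: Sequence[str] = ("url",)) -> List[Dict[str, str]]:
    """Keep the first record for each distinct combination of key fields."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

For storage, writing the cleaned records one JSON object per line (JSON Lines) is a common choice because it can be appended to and read back incrementally.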
