doc-scraper

Documentation extraction and indexing. Extracts information from Markdown files and syncs it to workspace-db. Works alongside workspace-db, which handles synchronization and organization.

Safety Notice

This listing is from the official public ClawHub registry. Review SKILL.md and referenced scripts before running.

Install skill "doc-scraper" with this command: npx skills add kikikari/doc-scraper

Doc Scraper

Documentation extraction and indexing; works together with workspace-db.

How it works with workspace-db

Skill | Task | Database
workspace-db | Synchronization & organization | docs.db
doc-scraper (this skill) | Information extraction | Uses docs.db

Tasks

1. Markdown extraction

// Extracted from SKILL.md:
// - name, version, description
// - usage examples
// - configuration options

const docInfo = await docScraper.extractMarkdown({
  file: "skills/my-skill/SKILL.md",
  extract: ["title", "description", "usage", "config"]
});
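
The listing does not show how the extraction works internally. As a rough illustration only, the sketch below pulls a title, a first-paragraph description, and named "##" sections out of a Markdown string. The function name, the field-to-heading matching rule (a section whose heading starts with the field name), and the sample text are assumptions, not the actual doc-scraper implementation. It also ignores fenced code blocks, so a "# comment" line inside a fence could be mistaken for a heading.

```javascript
// Hypothetical sketch, not the real doc-scraper code.
// Splits Markdown text into a title, a description, and "##" sections.
function extractMarkdown(text, fields) {
  const parsed = { title: null, description: null, sections: {} };
  let current = null; // lower-cased name of the "##" section being collected

  for (const line of text.split(/\r?\n/)) {
    const h1 = line.match(/^#\s+(.*)$/);
    const h2 = line.match(/^##\s+(.*)$/);
    if (h1 && parsed.title === null) {
      parsed.title = h1[1].trim();
    } else if (h2) {
      current = h2[1].trim().toLowerCase();
      parsed.sections[current] = [];
    } else if (current !== null) {
      parsed.sections[current].push(line);
    } else if (parsed.title !== null && parsed.description === null && line.trim() !== "") {
      parsed.description = line.trim(); // first non-empty line after the title
    }
  }

  // Return only the requested fields; unknown fields resolve to null.
  const out = {};
  for (const field of fields) {
    if (field === "title" || field === "description") {
      out[field] = parsed[field];
    } else {
      const key = Object.keys(parsed.sections).find((k) => k.startsWith(field));
      out[field] = key ? parsed.sections[key].join("\n").trim() : null;
    }
  }
  return out;
}

// Made-up sample input for illustration.
const sample = [
  "# my-skill",
  "",
  "Indexes things.",
  "",
  "## Usage",
  "",
  "my-skill run",
  "",
  "## Configuration",
  "",
  "verbose: true",
].join("\n");

const info = extractMarkdown(sample, ["title", "description", "usage", "config"]);
console.log(info);
```

Matching by prefix lets a requested field such as "config" find a heading named "Configuration"; a real extractor would likely need a more explicit mapping.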

2. Indexing into docs.db

// Stores the extracted data in docs.db
// (workspace-db manages the database)

await docScraper.index({
  source: "skills/my-skill/SKILL.md",
  data: docInfo,
  tags: ["skill", "api"]
});
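
The schema of docs.db is not documented in this listing. To make the indexing step concrete, here is a minimal in-memory stand-in that keeps one record per source file plus a tag lookup. The class name DocIndex and its methods are hypothetical; the real skill presumably writes to the database managed by workspace-db instead.

```javascript
// Hypothetical in-memory stand-in for docs.db; not the real storage layer.
class DocIndex {
  constructor() {
    this.records = new Map(); // source path -> record
    this.byTag = new Map();   // tag -> Set of source paths
  }

  // Insert or replace the record for one source file.
  index({ source, data, tags = [] }) {
    this.remove(source); // re-indexing must not leave stale tag entries
    const uniqueTags = [...new Set(tags)];
    this.records.set(source, { source, data, tags: uniqueTags });
    for (const tag of uniqueTags) {
      if (!this.byTag.has(tag)) this.byTag.set(tag, new Set());
      this.byTag.get(tag).add(source);
    }
  }

  // Drop a record and clean up its tag entries.
  remove(source) {
    const old = this.records.get(source);
    if (!old) return;
    for (const tag of old.tags) {
      const sources = this.byTag.get(tag);
      if (!sources) continue;
      sources.delete(source);
      if (sources.size === 0) this.byTag.delete(tag);
    }
    this.records.delete(source);
  }

  // All records carrying the given tag.
  findByTag(tag) {
    const sources = this.byTag.get(tag) ?? new Set();
    return [...sources].map((s) => this.records.get(s));
  }
}

const index = new DocIndex();
index.index({ source: "skills/my-skill/SKILL.md", data: { title: "my-skill" }, tags: ["skill", "api"] });
// Indexing the same source again replaces the earlier record and its tags.
index.index({ source: "skills/my-skill/SKILL.md", data: { title: "my-skill v2" }, tags: ["skill"] });
```

Keying records by source path makes re-indexing idempotent, which matters once files are re-extracted on every change.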

3. Auto-update on changes

# Watches .md files
# Re-extracts on change
# Updates docs.db

doc-scraper watch --dir skills/ --ext .md
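
How the watch command is implemented is not stated. One plausible shape, sketched here with Node's built-in fs.watch, is to filter events by file extension and debounce them per file, since editors often emit several events for a single save. The helper names, the 200 ms delay, and the onChange callback are assumptions for illustration.

```javascript
const { watch } = require("node:fs");
const { extname, join } = require("node:path");

// True when the changed file has one of the watched extensions.
function shouldProcess(filename, extensions) {
  if (!filename) return false; // fs.watch does not always report a name
  return extensions.includes(extname(filename).toLowerCase());
}

// Collapses a burst of events for the same key into a single call.
function debouncePerKey(fn, delayMs) {
  const timers = new Map();
  return (key) => {
    clearTimeout(timers.get(key));
    timers.set(key, setTimeout(() => {
      timers.delete(key);
      fn(key);
    }, delayMs));
  };
}

// Hypothetical wiring: onChange would re-extract and re-index the file.
// Returns the FSWatcher so the caller can close() it.
function startWatching(dir, extensions, onChange) {
  const notify = debouncePerKey(onChange, 200);
  return watch(dir, { recursive: true }, (_eventType, filename) => {
    if (shouldProcess(filename, extensions)) notify(join(dir, filename));
  });
}

// Example (not started here, because a watcher keeps the process alive):
// const watcher = startWatching("skills/", [".md"], (file) => console.log("changed:", file));
```

Recursive watching with fs.watch depends on the platform and Node.js version, so a production watcher may need a fallback.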

Extraction templates

Skill documentation

# Extracted from SKILL.md:
name: "skill-name"
description: "Description"
version: "1.0.0"
category: "database"
usage_examples:
  - command: "openclaw skill"
    result: "..."

API documentation

# Extracted from API.md:
endpoints:
  - path: "/api/v1/search"
    method: "GET"
    params:
      - query: string
    response: json

System documentation

# Extracted from SYSTEM.md:
components:
  - databases:
      - docs.db
      - tree.db
cron_jobs:
  - db-maintainer: "*/30"

Workflow

SKILL.md changed
    ↓
doc-scraper detects the change
    ↓
Extracts: name, desc, usage, config
    ↓
Stores in docs.db
    ↓
workspace-db synchronizes
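
The workflow does not say how a change is recognized. A common approach, assumed here rather than taken from the skill, is to hash each file's content and skip extraction when the hash matches the last one seen, so that a save without edits does not trigger a re-index. The extract and store functions are injected placeholders for the steps in the workflow.

```javascript
const { createHash } = require("node:crypto");

// SHA-256 of the file content as a hex string.
function contentHash(text) {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Hypothetical pipeline: `extract` and `store` stand in for the
// extraction and docs.db steps of the workflow.
function makePipeline({ extract, store }) {
  const lastHash = new Map(); // source path -> hash of last processed content
  return function handleChange(source, text) {
    const hash = contentHash(text);
    if (lastHash.get(source) === hash) return false; // unchanged, skip
    store(source, extract(text));
    lastHash.set(source, hash); // remember only after a successful store
    return true;
  };
}
```

The returned function reports whether it did any work, which makes it easy to log or count skipped events.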

Usage

One-off

doc-scraper index --dir skills/ --recursive
doc-scraper index --dir docs/ --ext .md

Watch mode

# Watch continuously
doc-scraper watch --dir workspace/

# Single file
doc-scraper watch --file README.md

Search

# Search the extracted data directly
doc-scraper search --query "database"
doc-scraper search --tag "api" --format json
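
The semantics of --query and --tag are not spelled out in the listing. A minimal sketch, assuming a case-insensitive substring match on name and description and an exact match on tags, could look like this. The record shape and sample data are made up for illustration.

```javascript
// Hypothetical record shape: { source, tags, data: { name, description } }.
function search(records, { query, tag } = {}) {
  const needle = query ? query.toLowerCase() : null;
  return records.filter((rec) => {
    if (tag && !rec.tags.includes(tag)) return false;
    if (!needle) return true; // tag-only search
    const haystack = `${rec.data.name ?? ""} ${rec.data.description ?? ""}`.toLowerCase();
    return haystack.includes(needle);
  });
}

// Made-up sample records.
const records = [
  { source: "skills/a/SKILL.md", tags: ["skill", "api"], data: { name: "a", description: "Database helper" } },
  { source: "skills/b/SKILL.md", tags: ["skill"], data: { name: "b", description: "Image tools" } },
];

// Roughly what `search --query "database"` might return.
const hits = search(records, { query: "database" });

// Roughly what `search --tag "api" --format json` might print.
const apiJson = JSON.stringify(search(records, { tag: "api" }).map((r) => r.source));
console.log(hits.length, apiJson);
```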

Integration with workspace-db

// doc-scraper extracts
// workspace-db stores/organizes

const extracted = await docScraper.extract('skills/my/SKILL.md');

// Hand-off to workspace-db
await workspaceDb.syncDocument({
  id: extracted.name,
  category: extracted.category,
  data: extracted,
  source_file: 'skills/my/SKILL.md'
});

Configuration

{
  "doc-scraper": {
    "watch_dirs": ["skills/", "docs/"],
    "extensions": [".md", ".mdx"],
    "extract_headers": ["##", "###"],
    "auto_index": true,
    "workspace_db_integration": true
  }
}
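
The listing does not say which keys are required or what happens when one is missing. The sketch below shows one way a consumer might read this block: merge it over defaults and reject values of the wrong type. The default values and the loadConfig name are assumptions; only the key names come from the configuration above.

```javascript
// Key names come from the configuration above; the fallback values are assumptions.
const DEFAULTS = {
  watch_dirs: [],
  extensions: [".md"],
  extract_headers: ["##"],
  auto_index: false,
  workspace_db_integration: false,
};

// Parses the JSON text and returns a validated "doc-scraper" section.
function loadConfig(jsonText) {
  const parsed = JSON.parse(jsonText);
  const config = { ...DEFAULTS, ...(parsed["doc-scraper"] ?? {}) };

  for (const key of ["watch_dirs", "extensions", "extract_headers"]) {
    const value = config[key];
    if (!Array.isArray(value) || !value.every((v) => typeof v === "string")) {
      throw new TypeError(`"${key}" must be an array of strings`);
    }
  }
  for (const key of ["auto_index", "workspace_db_integration"]) {
    if (typeof config[key] !== "boolean") {
      throw new TypeError(`"${key}" must be a boolean`);
    }
  }
  return config;
}

const config = loadConfig('{"doc-scraper": {"extensions": [".md", ".mdx"], "auto_index": true}}');
console.log(config);
```

Failing early on a mistyped value is usually easier to debug than a watcher that silently ignores every file.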

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
