convert-doc

Smart Document Pipeline

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "convert-doc" with this command: npx skills add carlheath/ogmios/carlheath-ogmios-convert-doc

Quick Reference

Convert document (auto-caches, auto-summarizes if >100KB)

python ~/.claude/lib/document-converter.py "/path/to/file.pdf"

Force regenerate

python ~/.claude/lib/document-converter.py "/path/to/file.pdf" --force

List cached documents

python ~/.claude/lib/document-converter.py --list

Clean up cache entries older than 1 week

python ~/.claude/lib/document-converter.py --cleanup

Supported Formats

| Format | Extension | Tool | Notes |
| --- | --- | --- | --- |
| PDF | .pdf | PyMuPDF | Text extraction, page-by-page |
| Word | .docx, .doc | pandoc/python-docx | Full markdown |
| PowerPoint | .pptx, .ppt | python-pptx | Slide-by-slide with notes |
| Excel | .xlsx, .xls | openpyxl | Tables as markdown |
| RTF | .rtf | pandoc | Rich text |
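
The format table can be read as an extension-to-tool lookup. The following is a minimal sketch of that idea only; the names `CONVERTERS` and `tool_for` are hypothetical and are not taken from document-converter.py, whose actual dispatch logic is not shown here.

```python
from pathlib import Path
from typing import Optional

# Hypothetical lookup built from the table above; tool labels are copied verbatim.
CONVERTERS = {
    ".pdf": "PyMuPDF",
    ".docx": "pandoc/python-docx",
    ".doc": "pandoc/python-docx",
    ".pptx": "python-pptx",
    ".ppt": "python-pptx",
    ".xlsx": "openpyxl",
    ".xls": "openpyxl",
    ".rtf": "pandoc",
}


def tool_for(path: str) -> Optional[str]:
    """Return the tool label for a file, or None if the extension is not listed."""
    return CONVERTERS.get(Path(path).suffix.lower())
```

A caller could use `tool_for("slides.PPTX")` to check whether a file is worth sending to the converter at all; unsupported extensions return `None`.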

Output Structure

{
  "cache_path": "/path/to/cached/file.md",
  "summary_path": "/path/to/cached/file_summary.md",  // if >100KB
  "from_cache": false,
  "original_size": 26744198,
  "converted_size": 129844,
  "summary_size": 30638,
  "savings_percent": 99.5,
  "recommendation": "summary"  // "summary" or "full"
}
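
A minimal sketch of how a caller might consume this result, assuming it has already been loaded into a Python dict with the fields shown above. The helpers `choose_path` and `savings_percent` are hypothetical, not part of document-converter.py, and the savings formula is an assumption that happens to reproduce the 99.5 in the example.

```python
from typing import Optional


def choose_path(result: dict) -> str:
    """Pick the file to read, following the "recommendation" field."""
    summary: Optional[str] = result.get("summary_path")
    if result.get("recommendation") == "summary" and summary:
        return summary
    # Fall back to the full converted file when no summary is recommended or present.
    return result["cache_path"]


def savings_percent(original_size: int, converted_size: int) -> float:
    """Assumed formula: share of the original size saved by the converted file."""
    if original_size <= 0:
        return 0.0
    return round((1 - converted_size / original_size) * 100, 1)


example = {
    "cache_path": "/path/to/cached/file.md",
    "summary_path": "/path/to/cached/file_summary.md",
    "original_size": 26744198,
    "converted_size": 129844,
    "recommendation": "summary",
}
```

With the example values, `choose_path(example)` returns the summary path, and `savings_percent(26744198, 129844)` gives 99.5.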

Auto-Summary

Documents >100KB automatically get a summary version:

| Version | Purpose | Size Target |
| --- | --- | --- |
| Full | Complete content | As converted |
| Summary | Quick overview | ~30KB |

The summary preserves:

  • All headers and structure

  • First portion of each section

  • Metadata and source reference
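
The first two points above (keep every header, keep the start of each section) can be sketched as follows. This is an illustration under stated assumptions, not the converter's actual algorithm: it assumes the converted file uses `#`-style markdown headers, the function name `summarize_markdown` and the `per_section_chars` parameter are hypothetical, and it does not handle the metadata block or the ~30KB size target.

```python
def summarize_markdown(text: str, per_section_chars: int = 500) -> str:
    """Keep all header lines plus roughly the first N characters of each section."""
    out = []
    budget = per_section_chars
    for line in text.splitlines():
        if line.startswith("#"):
            # Headers are always kept and reset the per-section budget.
            out.append(line)
            budget = per_section_chars
        elif budget > 0:
            kept = line[:budget]
            out.append(kept)
            budget -= len(kept) + 1  # +1 accounts for the newline
    return "\n".join(out)
```

One known limitation of this sketch: a line starting with `#` inside a code block would be treated as a header.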

Automatic Integration

The smart-read-interceptor hook automatically triggers when you read:

  • PDF, Word, PowerPoint, Excel files

  • Any file >200KB

It will suggest:

  • Use summary - If summary exists (best for overview)

  • Use cache - If full cached version exists

  • Convert first - If no cache exists

  • Delegate - For very large files, use subagent
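
The trigger and suggestion rules above can be sketched as two small functions. This is an assumption-laden illustration, not the smart-read-interceptor hook itself: the names `should_intercept` and `suggestions` are hypothetical, 200KB is assumed to mean 200 * 1024 bytes, and because the text gives no threshold for "very large", that decision is left to the caller as a flag.

```python
from pathlib import Path
from typing import List

# Extensions for the PDF, Word, PowerPoint, and Excel formats named above.
DOCUMENT_EXTENSIONS = {".pdf", ".docx", ".doc", ".pptx", ".ppt", ".xlsx", ".xls"}
LARGE_FILE_BYTES = 200 * 1024  # assumed interpretation of ">200KB"


def should_intercept(path: str, size_bytes: int) -> bool:
    """True for listed document types or for any file above the size threshold."""
    is_document = Path(path).suffix.lower() in DOCUMENT_EXTENSIONS
    return is_document or size_bytes > LARGE_FILE_BYTES


def suggestions(has_summary: bool, has_cache: bool, very_large: bool = False) -> List[str]:
    """Return the applicable suggestions in the order the text lists them."""
    out = []
    if has_summary:
        out.append("use summary")
    if has_cache:
        out.append("use cache")
    else:
        out.append("convert first")
    if very_large:
        out.append("delegate")
    return out
```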

Subagent Delegation Pattern

For very large documents, delegate the read to a subagent with its own isolated context:

Task(
  subagent_type="Explore",
  prompt="Read and summarize key points from: /path/to/large-file.pdf. Focus on: [specific topics]. Max 500 words summary."
)

This keeps the large content out of the main context; only the short summary comes back.

Cache Location

~/.claude/cache/documents/
├── filename_hash.md           # Full converted version
├── filename_hash_summary.md   # Summary (if >100KB)
└── ...

Cache expires after 1 week. Run --cleanup to remove old files.
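
A minimal sketch of the cache naming and one-week expiry described above. The source only shows the `filename_hash.md` pattern, so the hashing scheme here (a truncated SHA-256 of the path string) is purely an assumption, and `cache_paths` and `is_expired` are hypothetical names rather than functions from document-converter.py.

```python
import hashlib
import time
from pathlib import Path
from typing import Optional, Tuple

CACHE_DIR = Path.home() / ".claude" / "cache" / "documents"
MAX_AGE_SECONDS = 7 * 24 * 60 * 60  # "expires after 1 week"


def cache_paths(source: str) -> Tuple[Path, Path]:
    """Return (full, summary) cache paths following the filename_hash pattern."""
    p = Path(source)
    # Assumed scheme: hash the path string; the real tool may hash file contents.
    digest = hashlib.sha256(str(p).encode("utf-8")).hexdigest()[:12]
    stem = f"{p.stem}_{digest}"
    return CACHE_DIR / f"{stem}.md", CACHE_DIR / f"{stem}_summary.md"


def is_expired(mtime: float, now: Optional[float] = None) -> bool:
    """True if a cache file with modification time `mtime` is older than one week."""
    current = time.time() if now is None else now
    return current - mtime > MAX_AGE_SECONDS
```

Passing `now` explicitly keeps the expiry check easy to test; in normal use it defaults to the current time.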

Real-World Results

| Document | Original | Converted | Summary | Savings |
| --- | --- | --- | --- | --- |
| Google AI Guide (PDF) | 26.7 MB | 127 KB | 30 KB | 99.9% |
| Debatt (Word) | 206 KB | 5.4 KB | | 97% |
| Övning (PowerPoint) | 7.2 MB | 3.1 KB | | 99.96% |

Workflow Examples

Reading a PDF for research

  1. User asks to analyze a PDF
  2. Hook detects: "📄 DOCUMENT FILE: .PDF"
  3. Convert: python ~/.claude/lib/document-converter.py "file.pdf"
  4. Read the summary for overview
  5. Read specific sections from full version if needed

Processing multiple documents

  1. Convert all documents first (batch): for f in *.pdf; do python ~/.claude/lib/document-converter.py "$f"; done

  2. Read summaries in main context

  3. Delegate deep analysis to subagents
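
The batch step can also be driven from Python instead of a shell loop. The sketch below assumes the converter lives at the path used throughout this page and accepts the `--force` flag shown in the Quick Reference; `build_command` and `convert_all` are hypothetical helper names, and the converter's output format is not relied on, only its exit code.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict, List

CONVERTER = Path.home() / ".claude" / "lib" / "document-converter.py"


def build_command(doc: str, force: bool = False) -> List[str]:
    """Build the converter invocation for one document."""
    cmd = [sys.executable, str(CONVERTER), str(doc)]
    if force:
        cmd.append("--force")
    return cmd


def convert_all(folder: str, pattern: str = "*.pdf") -> Dict[str, int]:
    """Run the converter on every matching file; map file name to exit code."""
    results = {}
    for doc in sorted(Path(folder).glob(pattern)):
        proc = subprocess.run(build_command(str(doc)), capture_output=True, text=True)
        results[doc.name] = proc.returncode
    return results
```

A non-zero exit code in the returned mapping flags documents worth re-running or inspecting before reading their summaries.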
