pymupdf-pdf

Fast local PDF parsing with PyMuPDF (fitz) for Markdown/JSON outputs and optional images/tables. Use when speed matters more than robustness, or as a fallback while heavier parsers are unavailable. Default to single-PDF parsing with per-document output folders.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "pymupdf-pdf" with this command: npx skills add kesslerio/pymupdf-pdf-parser-clawdbot-skill/kesslerio-pymupdf-pdf-parser-clawdbot-skill-pymupdf-pdf

PyMuPDF PDF

Overview

Parse PDFs locally using PyMuPDF for fast, lightweight extraction into Markdown by default, with optional JSON and image/table outputs in a per-document directory.

Prereqs / when to read references

If you hit import errors (PyMuPDF not installed) or Nix libstdc++ issues, read:

  • references/pymupdf-notes.md

Quick start (single PDF)

# Run from the skill directory
./scripts/pymupdf_parse.py /path/to/file.pdf \
  --format md \
  --outroot ./pymupdf-output

Options

  • --format md|json|both (default: md)
  • --images to extract images
  • --tables to extract a simple line-based table JSON (quick/rough)
  • --outroot DIR to change output root
  • --lang adds a language hint into JSON output metadata

Output conventions

  • Create ./pymupdf-output/<pdf-basename>/ by default.
  • Markdown output: output.md
  • JSON output: output.json (includes lang)
  • Images: images/ subdir
  • Tables: tables.json (rough line-based)

Notes

  • PyMuPDF is fast but less robust on complex PDFs.
  • For more robust parsing, use a heavy-duty OCR parser (e.g., MinerU) if installed.

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Automation

coding-router

No summary provided by upstream source.

Repository SourceNeeds Review
Automation

mineru-pdf

No summary provided by upstream source.

Repository SourceNeeds Review
Automation

coding-agent

No summary provided by upstream source.

Repository SourceNeeds Review
Research

academic-deep-research

No summary provided by upstream source.

Repository SourceNeeds Review