markdown

Convert documents in other formats to Markdown. Supports 14 formats (PDF, DOCX, XLSX, PPTX, HTML, CSV, EPUB, MSG, and more) through a single CLI. Use when Claude needs to read or extract text from non-Markdown files.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "markdown" with this command: npx skills add sarukas/claude-skill-markdown/sarukas-claude-skill-markdown-markdown

Markdown - Document-to-Markdown Conversion

Convert documents to Markdown for reading, analysis, and processing.

Decision Tree

User Request
|
+-- Convert file to Markdown
|   +-- Single file --> scripts/convert_to_md.py input.pdf
|   +-- With explicit output --> scripts/convert_to_md.py input.pdf output.md
|   +-- Batch directory --> scripts/convert_to_md.py -d ./folder/ -r [-t pdf docx]
|   +-- Check available formats --> scripts/convert_to_md.py --list-formats
|   +-- Check dependencies --> scripts/convert_to_md.py --check-deps [format]
|
+-- Read/analyze document content
|   +-- Convert first, then analyze the Markdown output
|
+-- XLSX with specific sheets
|   +-- scripts/convert_to_md.py data.xlsx --sheets Sheet1 Sheet2

Single File Conversion

python scripts/convert_to_md.py report.pdf
python scripts/convert_to_md.py report.pdf output.md
python scripts/convert_to_md.py data.xlsx --sheets Sheet1

By default, the output is written to the same directory as the input file, with the same base name and a .md extension.
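The default naming rule can be sketched with pathlib. This is a minimal illustration, not the script's actual code; the function name `default_output_path` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def default_output_path(input_path: str, output_path: Optional[str] = None) -> Path:
    """Return the explicit output path if given, else the input path with a .md suffix."""
    if output_path is not None:
        return Path(output_path)
    # Same directory, same base name, extension swapped for .md
    return Path(input_path).with_suffix(".md")
```

For example, `default_output_path("docs/report.pdf")` gives `docs/report.md`, while passing `"output.md"` as the second argument returns that path unchanged.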

Batch Conversion

python scripts/convert_to_md.py -d ./contracts/ -r              # All supported types, recursive
python scripts/convert_to_md.py -d ./contracts/ -t pdf docx      # Only PDF and DOCX
python scripts/convert_to_md.py -d ./contracts/ -o ./output/      # Custom output directory
python scripts/convert_to_md.py -d ./contracts/ --no-skip         # Re-convert even if .md exists
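The flags above suggest a selection step along these lines: filter by extension, optionally recurse, and skip inputs that already have a .md file next to them. The sketch below is an assumption about that behavior, not the script's implementation; `find_inputs` and its parameters are hypothetical names.

```python
from pathlib import Path
from typing import Iterable, Iterator


def find_inputs(
    directory: Path,
    extensions: Iterable[str],
    recursive: bool = False,
    skip_existing: bool = True,
) -> Iterator[Path]:
    """Yield files under `directory` whose extension is in `extensions`.

    With skip_existing=True, a file is skipped when a sibling .md with the
    same base name already exists (assumed to mirror the default behavior
    that --no-skip turns off).
    """
    wanted = {"." + ext.lower().lstrip(".") for ext in extensions}
    pattern = "**/*" if recursive else "*"
    for path in sorted(directory.glob(pattern)):
        if not path.is_file() or path.suffix.lower() not in wanted:
            continue
        if skip_existing and path.with_suffix(".md").exists():
            continue
        yield path
```

Calling `find_inputs(Path("./contracts"), ["pdf", "docx"], recursive=True)` would roughly correspond to `-d ./contracts/ -r -t pdf docx`.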

Info Commands

python scripts/convert_to_md.py --list-formats     # Show all formats + dependency status
python scripts/convert_to_md.py --check-deps        # Check all dependencies
python scripts/convert_to_md.py --check-deps pdf    # Check PDF deps only

Supported Formats

| Format | Extensions | Library | Notes |
| --- | --- | --- | --- |
| PDF | .pdf | pymupdf4llm + pdfplumber | Best table extraction, dual-engine |
| XLSX | .xlsx | openpyxl | Sheet selection, formula preservation |
| XLS | .xls | markitdown | Legacy Excel |
| DOCX | .docx | markitdown | Word documents |
| PPTX | .pptx | markitdown | PowerPoint slides |
| HTML | .html, .htm | html2text + BeautifulSoup | Table preservation |
| CSV/TSV | .csv, .tsv | stdlib csv | Auto-detect delimiter |
| EPUB | .epub | markitdown | E-books |
| MSG | .msg | markitdown | Outlook messages |
| IPYNB | .ipynb | markitdown | Jupyter notebooks |
| JSON | .json | markitdown | Structured data |
| XML | .xml | markitdown | Structured markup |
| ZIP | .zip | markitdown | Archive contents |
| Images | .jpg, .jpeg, .png, .gif, .bmp, .tiff, .webp | markitdown | OCR/description |
| Audio | .mp3, .wav | markitdown | Transcription |

14 formats, 27 extensions total.
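One way to read the format list is as a lookup from file extension to the library that handles it. The sketch below encodes only the rows listed above; the names `LIBRARY_BY_EXTENSION` and `library_for` are hypothetical, and the real script may organize its dispatch differently.

```python
from pathlib import Path

# Extension -> library, taken from the format list above.
LIBRARY_BY_EXTENSION = {
    ".pdf": "pymupdf4llm + pdfplumber",
    ".xlsx": "openpyxl",
    ".html": "html2text + BeautifulSoup",
    ".htm": "html2text + BeautifulSoup",
    ".csv": "stdlib csv",
    ".tsv": "stdlib csv",
}
# Every remaining listed extension is handled by markitdown.
for _ext in (
    ".xls", ".docx", ".pptx", ".epub", ".msg", ".ipynb", ".json", ".xml",
    ".zip", ".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".webp",
    ".mp3", ".wav",
):
    LIBRARY_BY_EXTENSION[_ext] = "markitdown"


def library_for(path: str) -> str:
    """Return the library listed for this file's extension (case-insensitive)."""
    ext = Path(path).suffix.lower()
    if ext not in LIBRARY_BY_EXTENSION:
        raise ValueError(f"Unsupported file extension: {ext or '(none)'}")
    return LIBRARY_BY_EXTENSION[ext]
```

Lowercasing the suffix means `Report.PDF` and `report.pdf` resolve to the same entry.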

Format-Specific Options

PDF

  • Dual-engine: pymupdf4llm (primary) with pdfplumber fallback for tables
  • Large files chunked automatically
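How the script chunks large PDFs is not documented here. As a rough sketch of the idea, a page count can be split into fixed-size batches of 0-based page numbers, which is the form that pymupdf4llm's `to_markdown` takes in its `pages` argument. The helper name `page_chunks` and the default chunk size of 50 are assumptions for illustration only.

```python
from typing import Iterator, List


def page_chunks(total_pages: int, chunk_size: int = 50) -> Iterator[List[int]]:
    """Yield consecutive lists of 0-based page numbers, at most chunk_size each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, total_pages, chunk_size):
        yield list(range(start, min(start + chunk_size, total_pages)))
```

Each batch could then be converted separately and the Markdown pieces joined in order.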

XLSX

  • --sheets Sheet1 Sheet2: Convert only specific sheets
  • Preserves table structure with headers

HTML

  • Strips scripts/styles, preserves tables and links
  • Handles both local files and saved web pages

CSV/TSV

  • Auto-detects delimiter (comma, tab, semicolon, pipe)
  • Outputs as Markdown table
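Delimiter detection and table output can be sketched with the standard library alone. This version uses `csv.Sniffer` restricted to the four delimiters listed above and falls back to a comma when sniffing fails; whether the actual converter uses `csv.Sniffer` is an assumption, and `csv_to_markdown` is a hypothetical name.

```python
import csv
import io


def csv_to_markdown(text: str, candidates: str = ",\t;|") -> str:
    """Convert delimited text to a Markdown table, guessing the delimiter."""
    try:
        delimiter = csv.Sniffer().sniff(text[:4096], delimiters=candidates).delimiter
    except csv.Error:
        delimiter = ","  # fallback when the sample is ambiguous

    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row):
        # Escape literal pipes and pad short rows so every line has `width` cells.
        cells = [cell.strip().replace("|", "\\|") for cell in row]
        cells += [""] * (width - len(cells))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    lines = [fmt(header), "| " + " | ".join(["---"] * width) + " |"]
    lines += [fmt(row) for row in body]
    return "\n".join(lines)
```

The first row is treated as the header, which matches the usual shape of exported CSV files but is also an assumption.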

Dependencies

Each format has its own requirements file in scripts/converters/:

# Install all dependencies
pip install -r scripts/converters/requirements-all.txt

# Or install per-format
pip install -r scripts/converters/requirements-pdf.txt
pip install -r scripts/converters/requirements-xlsx.txt
pip install -r scripts/converters/requirements-html.txt
pip install -r scripts/converters/requirements-csv.txt
pip install -r scripts/converters/requirements-markitdown.txt   # DOCX, XLS, PPTX, EPUB, MSG, etc.

Core dependencies:

  • PDF: pymupdf pymupdf4llm pdfplumber
  • XLSX: openpyxl
  • HTML: beautifulsoup4 html2text
  • CSV: stdlib (no install needed)
  • Markitdown formats: markitdown
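A dependency check along the lines of --check-deps can be approximated by asking Python whether each module is importable. The sketch below is an assumption about the approach, not the script's code; `FORMAT_MODULES` and `missing_modules` are hypothetical names, and the module names are the import names of the packages listed above (beautifulsoup4 imports as `bs4`; older PyMuPDF releases import as `fitz` rather than `pymupdf`).

```python
import importlib.util
from typing import Dict, List

# Import names per format group, based on the core dependencies listed above.
FORMAT_MODULES: Dict[str, List[str]] = {
    "pdf": ["pymupdf", "pymupdf4llm", "pdfplumber"],
    "xlsx": ["openpyxl"],
    "html": ["bs4", "html2text"],
    "csv": ["csv"],  # standard library, always present
    "markitdown": ["markitdown"],
}


def missing_modules(modules_by_format: Dict[str, List[str]] = FORMAT_MODULES) -> Dict[str, List[str]]:
    """Return {format: [modules that cannot be found]} for formats with gaps."""
    missing: Dict[str, List[str]] = {}
    for fmt, modules in modules_by_format.items():
        absent = [name for name in modules if importlib.util.find_spec(name) is None]
        if absent:
            missing[fmt] = absent
    return missing
```

An empty result means every listed module was found; otherwise the keys show which requirements file to install.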

Troubleshooting

"Unsupported file extension"

  • Run --list-formats to see the supported extensions
  • Check that the file has the correct extension

"Missing dependencies"

  • Run --check-deps [format] to see what's needed
  • Install with pip as shown above

Large PDF produces poor output

  • The converter uses a dual-engine approach; pdfplumber handles complex tables better
  • For scanned PDFs, OCR support depends on markitdown

XLSX tables look wrong

  • Try specifying --sheets to convert individual sheets
  • Very wide tables may wrap in Markdown

Verbose logging

python scripts/convert_to_md.py -v report.pdf    # Debug-level logging
python scripts/convert_to_md.py -q report.pdf    # Suppress informational output
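A common way to wire flags like these is to map them to logging levels. The sketch below assumes -v selects DEBUG, -q hides INFO messages by selecting WARNING, and INFO is the default; the script's real mapping is not documented here, and `log_level_from_args` is a hypothetical helper.

```python
import argparse
import logging
from typing import List


def log_level_from_args(argv: List[str]) -> int:
    """Map -v / -q to a logging level, ignoring all other arguments."""
    parser = argparse.ArgumentParser(add_help=False)
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-v", dest="verbose", action="store_true")
    group.add_argument("-q", dest="quiet", action="store_true")
    # parse_known_args leaves file names and other flags in the second value.
    args, _rest = parser.parse_known_args(argv)
    if args.verbose:
        return logging.DEBUG
    if args.quiet:
        return logging.WARNING
    return logging.INFO
```

The result can be passed straight to `logging.basicConfig(level=...)`.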
