MarkItDown Skill
Documentation and utilities for converting documents to Markdown using Microsoft's MarkItDown library.
Note: This skill provides documentation and a batch script. The actual conversion is done by the
markitdown CLI/library installed via pip.
When to Use
Use markitdown for:
- 📄 Fetching documentation (README, API docs)
- 🌐 Converting web pages to markdown
- 📝 Document analysis (PDFs, Word, PowerPoint)
- 🎬 YouTube transcripts
- 🖼️ Image text extraction (OCR)
- 🎤 Audio transcription
Quick Start
# Convert file to markdown
markitdown document.pdf -o output.md
# Convert URL
markitdown https://example.com/docs -o docs.md
Supported Formats
| Format | Features |
|---|---|
| PDF | Text extraction, structure |
| Word (.docx) | Headings, lists, tables |
| PowerPoint | Slides, text |
| Excel | Tables, sheets |
| Images | OCR + EXIF metadata |
| Audio | Speech transcription |
| HTML | Structure preservation |
| YouTube | Video transcription |
Installation
The skill requires Microsoft's markitdown CLI:
pip install 'markitdown[all]'
Or install specific formats only:
pip install 'markitdown[pdf,docx,pptx]'
Common Patterns
Fetch Documentation
markitdown https://github.com/user/repo/blob/main/README.md -o readme.md
Convert PDF
markitdown document.pdf -o document.md
Batch Convert
# Using included script
python ~/.openclaw/skills/markitdown/scripts/batch_convert.py docs/*.pdf -o markdown/ -v
# Or shell loop
for file in docs/*.pdf; do
  markitdown "$file" -o "${file%.pdf}.md"
done
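The shell loop above writes each Markdown file next to its source PDF. A rough Python equivalent that collects the output in a separate directory might look like the sketch below. It is not the code of the included `batch_convert.py`: the names `output_path_for`, `batch_convert`, and `markitdown_convert` are hypothetical, and `convert` can be any callable that returns Markdown text for a path.

```python
from pathlib import Path
from typing import Callable, Iterable


def output_path_for(src: Path, out_dir: Path) -> Path:
    """Map docs/report.pdf -> <out_dir>/report.md (hypothetical naming rule)."""
    return out_dir / (src.stem + ".md")


def batch_convert(
    sources: Iterable[Path],
    out_dir: Path,
    convert: Callable[[Path], str],
) -> dict[Path, str]:
    """Convert each source, write the Markdown into out_dir, and collect failures.

    Returns a mapping of failed source path -> error message, so that one
    bad file does not stop the rest of the batch.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    failures: dict[Path, str] = {}
    for src in sources:
        try:
            markdown = convert(src)
        except Exception as exc:  # conversion errors vary by format
            failures[src] = str(exc)
            continue
        output_path_for(src, out_dir).write_text(markdown, encoding="utf-8")
    return failures


def markitdown_convert(src: Path) -> str:
    """Assumed wrapper around the Python API; requires markitdown to be installed."""
    from markitdown import MarkItDown  # lazy import keeps the helpers above dependency-free

    return MarkItDown().convert(str(src)).text_content
```

A call such as `batch_convert(Path("docs").glob("*.pdf"), Path("markdown"), markitdown_convert)` would then mirror the loop, with any failed files reported in the returned dictionary.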
Python API
from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("document.pdf")
print(result.text_content)
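To get the effect of the CLI's `-o` flag from Python, the converted text can be written to a file. The following is a minimal sketch, not part of the library: `save_markdown` is a hypothetical helper, and the optional `converter` parameter is an assumption added so the function can be exercised without markitdown installed.

```python
from pathlib import Path


def save_markdown(source: str, dest: str, converter=None) -> int:
    """Convert `source` and write the resulting Markdown to `dest`.

    `converter` is any object with a MarkItDown-style `convert(source)` method
    whose result has a `text_content` attribute. If omitted, a MarkItDown
    instance is created, which requires the markitdown package.
    Returns the number of characters in the converted text.
    """
    if converter is None:
        from markitdown import MarkItDown  # lazy import: only needed for the default

        converter = MarkItDown()
    text = converter.convert(source).text_content
    out = Path(dest)
    out.parent.mkdir(parents=True, exist_ok=True)  # create missing output folders
    out.write_text(text, encoding="utf-8")
    return len(text)
```

For example, `save_markdown("document.pdf", "markdown/document.md")` would correspond to `markitdown document.pdf -o markdown/document.md`.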
Troubleshooting
"markitdown not found"
pip install 'markitdown[all]'
OCR Not Working
# Ubuntu/Debian
sudo apt-get install tesseract-ocr
# macOS
brew install tesseract
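Before reinstalling anything, it can help to confirm which pieces are actually missing. The sketch below checks for the tools named in this Troubleshooting section using only the standard library; `check_environment` and its dictionary keys are hypothetical names chosen for illustration.

```python
import importlib.util
import shutil


def check_environment() -> dict[str, bool]:
    """Report which of the tools mentioned in Troubleshooting can be found."""
    return {
        # CLI entry point installed by pip, looked up on PATH
        "markitdown_cli": shutil.which("markitdown") is not None,
        # Python package importable from the current interpreter
        "markitdown_module": importlib.util.find_spec("markitdown") is not None,
        # Tesseract binary installed via apt-get or brew
        "tesseract": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If the module is found but the CLI is not, the pip scripts directory is probably not on `PATH`, or the package was installed into a different Python environment than the one the shell uses.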
What This Skill Provides
| Component | Source |
|---|---|
| markitdown CLI | Microsoft's pip package |
| markitdown Python API | Microsoft's pip package |
| scripts/batch_convert.py | This skill (utility) |
| Documentation | This skill |
See Also
- USAGE-GUIDE.md - Detailed examples
- reference.md - Full API reference
- Microsoft MarkItDown - Upstream library