Pandoc Document Converter
Convert between Markdown, Word (.docx), HTML, and PDF with proper CJK support out of the box.
Supported Conversions
| From | To |
|---|---|
| Markdown | PDF |
| Markdown | Word (.docx) |
| Markdown | HTML |
| Word | Markdown |
| Word | PDF |
| HTML | Markdown |
| HTML | Word (.docx) |
| HTML | PDF |
PDF as input is not supported (pandoc limitation).
Scripts
| Script | Purpose | When to use |
|---|---|---|
| convert-to-pdf.sh | Optimized PDF with CJK monospace font, 11pt, 1.5cm margins | All PDF conversions (recommended) |
| fix-ascii-art.py | Pad ASCII box lines to equal width | Before Word conversion if ASCII diagrams exist |
Step-by-step Workflow
1. Identify the conversion
From the user's request, determine:
- Source file(s): path and format
- Target format: pdf, docx, md, or html
- Options: template, styling, batch mode
Verify the source file exists before proceeding.
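A minimal sketch of this identification step, assuming the format can be inferred from a lowercase file extension. The helper names detect_format and check_source are hypothetical and not part of the skill's scripts.

```shell
# Hypothetical helper: map a file extension to a format name.
# Assumes lowercase extensions; anything else is reported as "unknown".
detect_format() {
  case "${1##*.}" in
    md|markdown) echo markdown ;;
    docx)        echo docx ;;
    html|htm)    echo html ;;
    pdf)         echo pdf ;;
    *)           echo unknown ;;
  esac
}

# Hypothetical helper: fail early if the source is missing,
# or if its format cannot be used as input (PDF or unrecognized).
check_source() {
  if [ ! -f "$1" ]; then
    echo "source not found: $1" >&2
    return 1
  fi
  case "$(detect_format "$1")" in
    pdf|unknown)
      echo "unsupported input format: $1" >&2
      return 1 ;;
  esac
}
```

Calling check_source "$input" before building the command keeps a missing or unsupported file from reaching pandoc at all.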
2. Build the pandoc command
Start with the base: pandoc <input> -o <output>
Then layer on options based on the target format.
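The layering can be sketched as a small wrapper that starts from the base command and appends options according to the output extension. This is an illustration only: the function name convert and the PANDOC environment variable (a dry-run hook) are assumptions, and the options shown are a subset of those described in the sections that follow.

```shell
# Hypothetical wrapper: convert <input> <output>
# The body runs in a subshell, so its variables do not leak into the caller.
convert() (
  input=$1
  output=$2
  set -- "$input" -o "$output"   # base: pandoc <input> -o <output>
  case "${output##*.}" in
    pdf)
      set -- "$@" --pdf-engine=xelatex -V CJKmainfont="PingFang SC"
      ;;
    docx)
      set -- "$@" --reference-doc="$HOME/.agents/skills/pandoc-converter/references/reference.docx"
      ;;
    html)
      set -- "$@" --standalone
      ;;
    md)
      set -- "$@" --wrap=none
      # Extract embedded images only when the source is a Word file.
      case "$input" in
        *.docx) set -- "$@" --extract-media=./media ;;
      esac
      ;;
  esac
  # PANDOC is a hypothetical override; set PANDOC=echo to preview the arguments.
  "${PANDOC:-pandoc}" "$@"
)
```

Running PANDOC=echo convert input.md output.pdf prints the assembled arguments without invoking pandoc, which is a cheap way to review a command before executing it.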
PDF Output
Recommended: use the conversion script (includes CJK monospace font, 11pt, optimized margins):
bash ~/.agents/skills/pandoc-converter/scripts/convert-to-pdf.sh input.md
Manual setup:
pandoc input.md -o output.pdf \
--pdf-engine=xelatex \
-V CJKmainfont="PingFang SC" \
-V monofont="Sarasa Fixed SC" \
-V geometry:margin=2cm
Common variables:
-V fontsize=11pt # 11pt recommended for technical docs
-V linestretch=1.5
-V papersize=a4
-V toc=true
📚 Font details: references/fonts.md
Word Output
Recommended workflow:
# 1. Fix ASCII art alignment (if needed)
python3 ~/.agents/skills/pandoc-converter/scripts/fix-ascii-art.py input.md --check
# 2. Fix if issues found
python3 ~/.agents/skills/pandoc-converter/scripts/fix-ascii-art.py input.md
# 3. Convert with reference.docx (use $HOME: the shell does not expand ~ after "=")
pandoc input.md -o output.docx \
--reference-doc="$HOME/.agents/skills/pandoc-converter/references/reference.docx"
Built-in reference.docx includes:
- CJK font: 思源黑体 CN (Source Han Sans CN)
- English font: Times New Roman
- Code font: Sarasa Fixed SC (CJK-aware monospace)
- Table styles: Header shading, vertical center alignment
Markdown Output
pandoc input.docx -o output.md --extract-media=./media --wrap=none
HTML Output
# Standalone HTML
pandoc input.md -o output.html --standalone
# With CSS
pandoc input.md -o output.html --standalone --css=style.css
# Self-contained (embed images)
pandoc input.md -o output.html --standalone --embed-resources
HTML Input
# HTML to Markdown
pandoc input.html -o output.md --wrap=none
# HTML to PDF
pandoc input.html -o output.pdf --pdf-engine=xelatex -V CJKmainfont="PingFang SC"
3. Handle images and resources
- For Markdown with local images: use --resource-path if needed
- For Word to Markdown: always use --extract-media
- For PDF with images: xelatex handles most formats
4. Run and verify
Execute the command. Common issues:
| Issue | Solution |
|---|---|
| xelatex not found | brew install --cask mactex |
| Font not found | fc-list :lang=zh to list available fonts |
| Missing LaTeX package | tlmgr install <package> |
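Some of these failures can be caught before running a conversion. The following is a rough preflight sketch; the helper name missing_tools is hypothetical, and the tool list is an assumption based on the commands used in this document.

```shell
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check the tools this workflow relies on and report any that are absent.
missing=$(missing_tools pandoc xelatex fc-list)
if [ -n "$missing" ]; then
  # Unquoted on purpose: prints one "missing:" line per tool.
  printf 'missing: %s\n' $missing >&2
fi
```

If xelatex is reported missing, the table above gives the install command; if fc-list is present, fc-list :lang=zh shows which CJK fonts are available.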
5. Batch conversion
Use a for-loop with the same options as single-file conversion:
for f in *.md; do pandoc "$f" -o "${f%.md}.pdf" --pdf-engine=xelatex -V CJKmainfont="PingFang SC"; done
Advanced Features
The following features are documented in separate reference files:
| Feature | Description | Reference |
|---|---|---|
| Font Configuration | CJK fonts, fallback, code fonts | references/fonts.md |
| Syntax Highlighting | Code themes, language support | references/syntax-highlighting.md |
| Math | LaTeX equations, MathJax, KaTeX | references/math.md |
| PDF Features | Metadata, frontmatter, watermarks | references/pdf-features.md |
| Advanced | Citations, multi-file, GFM, Lua filters | references/advanced.md |
Common Pitfalls
- Garbled Chinese text in PDF: Always use --pdf-engine=xelatex with a CJK font
- Word styles look wrong: Use --reference-doc for custom styling
- Images missing in Markdown output: Add --extract-media
- PDF margins too tight: Add -V geometry:margin=2cm
- HTML lacks styles: Use --standalone
- HTML images not showing: Use --embed-resources to inline images
- Citations not rendering: Ensure --citeproc is included
- Math not rendering in HTML: Add --mathjax or --katex
- ASCII art misaligned in Word/PDF:
  - Run python3 ~/.agents/skills/pandoc-converter/scripts/fix-ascii-art.py input.md
  - Use convert-to-pdf.sh, which enforces a monospace font
- Code block background shows trailing spaces: reference.docx has no shading on Source Code style
Output Naming
Unless the user specifies an output path, place the output in the same directory as the input, with the same base name and the new extension.
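The naming rule can be sketched with plain parameter expansion. The helper name default_output is hypothetical; it only computes a path and does not touch the filesystem.

```shell
# Hypothetical helper: default_output <input-path> <new-extension>
# Keeps the directory and base name, replaces only the final extension.
default_output() {
  case "$1" in
    */*) dir=${1%/*}; base=${1##*/} ;;   # path with a directory part
    *)   dir=.;       base=$1 ;;         # bare file name: current directory
  esac
  printf '%s/%s.%s\n' "$dir" "${base%.*}" "$2"
}
```

For example, default_output docs/report.md pdf yields docs/report.pdf. Splitting the directory off first avoids truncating at a dot that belongs to a directory name rather than to the file.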
Safety
- Check if target file exists before overwriting
- Always quote paths in shell commands
- Only read source files and write output; never modify originals
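A minimal sketch of a guard that applies these rules before a conversion runs. The function name safe_to_write is hypothetical, and comparing the two paths as strings is a simplifying assumption (it does not resolve symlinks or relative paths).

```shell
# Hypothetical guard: safe_to_write <source> <target>
# Returns non-zero if writing the target would clobber the source
# or replace a file that already exists.
safe_to_write() {
  src=$1
  dst=$2
  if [ "$src" = "$dst" ]; then
    echo "refusing to overwrite the source: $src" >&2
    return 1
  fi
  if [ -e "$dst" ]; then
    echo "target already exists: $dst" >&2
    return 1
  fi
}

# Example use, with every path quoted:
#   safe_to_write "$input" "$output" && pandoc "$input" -o "$output"
```

When the guard fails, the caller can ask the user whether to overwrite or choose another output path instead of proceeding silently.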