ODT Document Skill
An .odt file is a ZIP package containing XML files and assets. Use high-level libraries when possible; fall back to raw XML for complex formatting, tracked changes, or precise control.
Reference files
| Topic | File |
|---|---|
| Package structure | references/odt-structure.md |
| Tracked changes | references/odt-change-tracking.md |
| Language/locale | references/odt-language.md |
| XML patterns | references/odf-xml.md |
| odfdo library | references/odfdo.md |
| ODTDocument API | references/odt-document.md |
| Schema validation | references/odf-schemas.md |
| Tools overview | references/tools.md |
Example prompts
- "Extract all text from this ODT and summarize headings and tables."
- "Apply tracked changes to replace these phrases, keeping redlines intact."
- "Set Khmer language/locale and fonts for all paragraph styles."
- "Validate this ODT package and report any missing files."
- "Unpack, edit content.xml, and repack without breaking styles."
Workflow decision tree
| Task | Approach |
|---|---|
| Read/extract text | Use pandoc or odfdo-markdown |
| Create new document | Read references/odfdo.md, use odfdo library |
| Simple edits | Read references/odfdo.md, use odfdo library |
| Tracked changes/annotations | Read references/odt-document.md, use ODTDocument API |
| Complex XML edits | Read references/odf-xml.md, edit XML directly |
Core operations
Text extraction
pandoc document.odt -o document.md # Convert to Markdown
odfdo-markdown document.odt > out.md # Alternative
Unpack/pack workflow
python scripts/unpack_odt.py <file.odt> <dir> # Unpack
# ... edit XML files ...
python scripts/pack_odt.py <dir> <file.odt> # Repack
Using ODTDocument library
Scripts using ODTDocument require PYTHONPATH set to the skill root. If you're running from a repo root, point it explicitly to the skill folder:
PYTHONPATH=/path/to/repo/skills/odt python your_script.py
# example from repo root
PYTHONPATH=skills/odt python your_script.py
Creating documents with odfdo
Read references/odfdo.md first. Basic example:
from odfdo import Document, Paragraph
doc = Document('text')
doc.body.append(Paragraph('Hello'))
doc.save('output.odt')
Raw XML editing
Read references/odf-xml.md first. Key files:
content.xml— main document contentstyles.xml— document stylesMETA-INF/manifest.xml— package manifest (update when adding files)
Tracked changes workflow
Read references/odt-change-tracking.md for structure details. Key principles:
- Batch changes: Group 3–10 related changes per batch for debugging
- Minimal marking: Only mark text that actually changes
Workflow:
- Convert to Markdown:
pandoc doc.odt -o doc.md - Identify and group changes into batches
- Use
ODTDocument.suggest_replacement()/suggest_insertion()/suggest_deletion() - Validate:
python scripts/validate_changes.py content.xml - Repack and verify in LibreOffice
Language support (Khmer, Thai, Arabic, etc.)
See references/odt-language.md. Helper script:
python scripts/set_language.py styles.xml --lang km --country KH --font "Khmer OS System"
Converting ODT to PDF/images
soffice --headless --convert-to pdf document.odt # ODT → PDF
pdftoppm -jpeg -r 150 document.pdf page # PDF → images
Dependencies
Required
- odfdo:
pip install odfdo(ODT creation/editing) - defusedxml:
pip install defusedxml(safe XML parsing)
Optional
- pandoc:
brew install pandoc(Markdown conversion) - LibreOffice:
brew install --cask libreoffice(ODT → PDF) - jing: Relax NG validator for
validate_rng.py - Poppler:
brew install poppler(pdftoppm for PDF → images)
Validation scripts
python scripts/validate_odt.py <dir> --original <file.odt> # Package validation
python scripts/validate_changes.py <dir>/content.xml # Change tracking
python scripts/validate_rng.py <dir>/content.xml # RNG schema (needs jing)
python scripts/smoke_test.py <file.odt> <work_dir> # Full roundtrip