ODT Document Skill

An .odt file is a ZIP package containing XML files and assets. Use high-level libraries when possible; fall back to raw XML for complex formatting, tracked changes, or precise control.

Reference files

Topic	File
Package structure	`references/odt-structure.md`
Tracked changes	`references/odt-change-tracking.md`
Language/locale	`references/odt-language.md`
XML patterns	`references/odf-xml.md`
odfdo library	`references/odfdo.md`
ODTDocument API	`references/odt-document.md`
Schema validation	`references/odf-schemas.md`
Tools overview	`references/tools.md`

Example prompts

"Extract all text from this ODT and summarize headings and tables."
"Apply tracked changes to replace these phrases, keeping redlines intact."
"Set Khmer language/locale and fonts for all paragraph styles."
"Validate this ODT package and report any missing files."
"Unpack, edit content.xml, and repack without breaking styles."

Workflow decision tree

Task	Approach
Read/extract text	Use `pandoc` or `odfdo-markdown`
Create new document	Read `references/odfdo.md`, use odfdo library
Simple edits	Read `references/odfdo.md`, use odfdo library
Tracked changes/annotations	Read `references/odt-document.md`, use ODTDocument API
Complex XML edits	Read `references/odf-xml.md`, edit XML directly

Core operations

Text extraction

pandoc document.odt -o document.md      # Convert to Markdown
odfdo-markdown document.odt > out.md    # Alternative

Unpack/pack workflow

python scripts/unpack_odt.py <file.odt> <dir>   # Unpack
# ... edit XML files ...
python scripts/pack_odt.py <dir> <file.odt>     # Repack
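
For orientation, the round trip can be sketched with Python's standard `zipfile` module. This is not the code in `scripts/unpack_odt.py` or `scripts/pack_odt.py`; the function names below are hypothetical. The one detail worth showing is that the ODF package format expects the `mimetype` member to come first and to be stored uncompressed.

```python
import zipfile
from pathlib import Path


def unpack_odt(odt_path, out_dir):
    """Extract every member of the ODT package into out_dir."""
    with zipfile.ZipFile(odt_path) as zf:
        zf.extractall(out_dir)


def pack_odt(src_dir, odt_path):
    """Zip src_dir into an ODT, writing 'mimetype' first and uncompressed."""
    src = Path(src_dir)
    mimetype = src / "mimetype"
    with zipfile.ZipFile(odt_path, "w", zipfile.ZIP_DEFLATED) as zf:
        if mimetype.is_file():
            # The package format wants this member first and stored as-is.
            zf.write(mimetype, "mimetype", compress_type=zipfile.ZIP_STORED)
        for path in sorted(src.rglob("*")):
            if path.is_file() and path != mimetype:
                zf.write(path, path.relative_to(src).as_posix())
```

Write the output file outside `src_dir`, otherwise the archive being written would be picked up by the directory walk.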

Using ODTDocument library

Scripts using ODTDocument require PYTHONPATH set to the skill root. If you're running from a repo root, point it explicitly to the skill folder:

PYTHONPATH=/path/to/repo/skills/odt python your_script.py
# example from repo root
PYTHONPATH=skills/odt python your_script.py

Creating documents with odfdo

Read references/odfdo.md first. Basic example:

from odfdo import Document, Paragraph

doc = Document('text')               # new, empty text (ODT) document
doc.body.append(Paragraph('Hello'))  # add one paragraph to the body
doc.save('output.odt')

Raw XML editing

Read references/odf-xml.md first. Key files:

content.xml — main document content
styles.xml — document styles
META-INF/manifest.xml — package manifest (update when adding files)
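
As an illustration of the manifest update, here is a minimal sketch that adds a `manifest:file-entry` for a new file, such as an image placed under `Pictures/`. The helper name `add_manifest_entry` is hypothetical. It uses the standard-library ElementTree so it runs anywhere; for untrusted input, parsing with `defusedxml.ElementTree` instead would match the dependency list.

```python
import xml.etree.ElementTree as ET

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
# Keep the conventional "manifest:" prefix when the tree is serialized.
ET.register_namespace("manifest", MANIFEST_NS)


def add_manifest_entry(root, full_path, media_type):
    """Append a file-entry for full_path unless one exists. Returns True if added."""
    path_attr = f"{{{MANIFEST_NS}}}full-path"
    for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
        if entry.get(path_attr) == full_path:
            return False  # already listed, nothing to do
    ET.SubElement(
        root,
        f"{{{MANIFEST_NS}}}file-entry",
        {path_attr: full_path, f"{{{MANIFEST_NS}}}media-type": media_type},
    )
    return True
```

A typical use would be to parse `META-INF/manifest.xml` from the unpacked directory, call `add_manifest_entry(root, "Pictures/logo.png", "image/png")`, and write the tree back before repacking.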

Tracked changes workflow

Read references/odt-change-tracking.md for structure details. Key principles:

Batch changes: Group 3–10 related changes per batch, so a failure can be traced to a small set of edits
Minimal marking: Only mark text that actually changes

Workflow:

1. Convert to Markdown: pandoc doc.odt -o doc.md
2. Identify and group changes into batches
3. Use ODTDocument.suggest_replacement() / suggest_insertion() / suggest_deletion()
4. Validate: python scripts/validate_changes.py content.xml
5. Repack and verify in LibreOffice
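
Between batches it can help to list the tracked changes that `content.xml` currently holds and compare the count with what the batch was supposed to add. The sketch below is independent of the ODTDocument API and of `validate_changes.py`. It assumes the standard ODF layout, where each `text:changed-region` wraps a `text:insertion`, `text:deletion`, or `text:format-change` element carrying `office:change-info`. The function name is hypothetical, and `references/odt-change-tracking.md` remains the authority for this skill.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_tracked_changes(content_xml):
    """Return one dict per tracked change found in the content.xml text."""
    root = ET.fromstring(content_xml)
    changes = []
    for region in root.iter(f"{{{NS['text']}}}changed-region"):
        for kind in ("insertion", "deletion", "format-change"):
            node = region.find(f"text:{kind}", NS)
            if node is None:
                continue
            changes.append({
                "id": region.get(f"{{{NS['text']}}}id"),
                "kind": kind,
                "author": node.findtext(
                    "office:change-info/dc:creator", default="", namespaces=NS),
                "date": node.findtext(
                    "office:change-info/dc:date", default="", namespaces=NS),
            })
    return changes
```

Pass the file contents as bytes (for example `Path("content.xml").read_bytes()`) so the XML declaration's encoding is honored.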

Language support (Khmer, Thai, Arabic, etc.)

See references/odt-language.md. Helper script:

python scripts/set_language.py styles.xml --lang km --country KH --font "Khmer OS System"
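
To show what such a change looks like at the XML level, here is a rough sketch. It is not the implementation of `scripts/set_language.py`, and the function name is hypothetical. ODF stores language and country separately for Western, Asian, and complex-script text. Khmer, Thai, and Arabic fall in the complex group, so the relevant attributes are `style:language-complex` and `style:country-complex` on `style:text-properties`. Font handling is left out, since a font also needs a matching font-face declaration.

```python
import xml.etree.ElementTree as ET

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

# (language attribute, country attribute) for each ODF script group.
LANG_ATTRS = {
    "western": (f"{{{FO_NS}}}language", f"{{{FO_NS}}}country"),
    "asian": (f"{{{STYLE_NS}}}language-asian", f"{{{STYLE_NS}}}country-asian"),
    "complex": (f"{{{STYLE_NS}}}language-complex", f"{{{STYLE_NS}}}country-complex"),
}


def set_script_language(root, lang, country, script="complex"):
    """Set language/country on every existing style:text-properties element.

    Returns the number of elements updated. Styles without a
    text-properties child are left untouched in this sketch.
    """
    lang_attr, country_attr = LANG_ATTRS[script]
    updated = 0
    for props in root.iter(f"{{{STYLE_NS}}}text-properties"):
        props.set(lang_attr, lang)
        props.set(country_attr, country)
        updated += 1
    return updated
```

Calling `set_script_language(root, "km", "KH")` on a parsed `styles.xml` root leaves any existing Western `fo:language` value in place and only adds the complex-script pair.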

Converting ODT to PDF/images

soffice --headless --convert-to pdf document.odt    # ODT → PDF
pdftoppm -jpeg -r 150 document.pdf page             # PDF → images

Dependencies

Required

odfdo: pip install odfdo (ODT creation/editing)
defusedxml: pip install defusedxml (safe XML parsing)

Optional

pandoc: brew install pandoc (Markdown conversion)
LibreOffice: brew install --cask libreoffice (ODT → PDF)
jing: RELAX NG validator, needed by validate_rng.py
Poppler: brew install poppler (pdftoppm for PDF → images)

Validation scripts

python scripts/validate_odt.py <dir> --original <file.odt>     # Package validation
python scripts/validate_changes.py <dir>/content.xml           # Change tracking
python scripts/validate_rng.py <dir>/content.xml               # RNG schema (needs jing)
python scripts/smoke_test.py <file.odt> <work_dir>             # Full roundtrip
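
As a rough idea of what a package-level check might cover, the sketch below inspects a packed file directly. It does not reproduce `scripts/validate_odt.py`; the function name and the exact set of checks are assumptions. It verifies that `mimetype` is the first member, is stored uncompressed, and holds the ODT media type, that the core files are present, and that every file listed in the manifest exists in the archive.

```python
import zipfile
import xml.etree.ElementTree as ET

ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def check_odt_package(odt_path):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    with zipfile.ZipFile(odt_path) as zf:
        infos = zf.infolist()
        names = set(zf.namelist())
        if not infos or infos[0].filename != "mimetype":
            problems.append("mimetype is not the first entry")
        elif infos[0].compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed")
        elif zf.read("mimetype") != ODT_MIMETYPE.encode("ascii"):
            problems.append("mimetype content is not the ODT media type")
        for required in ("content.xml", "META-INF/manifest.xml"):
            if required not in names:
                problems.append(f"missing {required}")
        if "META-INF/manifest.xml" in names:
            root = ET.fromstring(zf.read("META-INF/manifest.xml"))
            for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
                path = entry.get(f"{{{MANIFEST_NS}}}full-path", "")
                # Entries ending in "/" describe directories or the package root.
                if path and not path.endswith("/") and path not in names:
                    problems.append(f"manifest lists missing file: {path}")
    return problems
```

An empty list means none of these particular checks failed; it says nothing about schema validity, which is what the RNG validation covers.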
