docx

DOCX Creation, Editing, and Analysis

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "docx" with this command: npx skills add kortix-ai/kortix-registry/kortix-ai-kortix-registry-docx

DOCX Creation, Editing, and Analysis

Overview

A .docx file is a ZIP archive containing XML files.

Quick Reference

Task Approach

Read/analyze content pandoc or unpack for raw XML

Create new document Use docx-js -- see Creating New Documents below

Edit existing document Unpack -> edit XML -> repack -- see Editing Existing Documents below

Visual review Convert DOCX -> PDF -> PNGs via soffice

  • pdftoppm or scripts/render_docx.py

Converting .doc to .docx

python scripts/office/soffice.py --headless --convert-to docx document.doc

Reading Content

pandoc --track-changes=all document.docx -o output.md python scripts/office/unpack.py document.docx unpacked/

Converting to Images

python scripts/office/soffice.py --headless --convert-to pdf document.docx pdftoppm -jpeg -r 150 document.pdf page

Or use the bundled render script:

python scripts/render_docx.py /path/to/file.docx --output_dir /tmp/docx_pages

Accepting Tracked Changes

python scripts/accept_changes.py input.docx output.docx

Creating New Documents

Generate .docx files with JavaScript, then validate. Install: npm install -g docx

Setup

const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun, Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink, TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType, VerticalAlign, PageNumber, PageBreak } = require('docx');

const doc = new Document({ sections: [{ children: [/* content */] }] }); Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));

Validation

After creating the file, validate it. If validation fails, unpack, fix the XML, and repack.

python scripts/office/validate.py doc.docx

Page Size

// CRITICAL: docx-js defaults to A4, not US Letter // Always set page size explicitly sections: [{ properties: { page: { size: { width: 12240, // 8.5 inches in DXA height: 15840 // 11 inches in DXA }, margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } } }, children: [/* content */] }]

Paper Width Height Content Width (1" margins)

US Letter 12,240 15,840 9,360

A4 (default) 11,906 16,838 9,026

Landscape: pass portrait dimensions and let docx-js swap:

size: { width: 12240, height: 15840, orientation: PageOrientation.LANDSCAPE }

Styles (Override Built-in Headings)

Use Arial as default font. Keep titles black for readability.

const doc = new Document({ styles: { default: { document: { run: { font: "Arial", size: 24 } } }, paragraphStyles: [ { id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true, run: { size: 32, bold: true, font: "Arial" }, paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 } }, { id: "Heading2", name: "Heading 2", basedOn: "Normal", next: "Normal", quickFormat: true, run: { size: 28, bold: true, font: "Arial" }, paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 } }, ] }, sections: [{ children: [ new Paragraph({ heading: HeadingLevel.HEADING_1, children: [new TextRun("Title")] }), ]}] });

Lists (NEVER use unicode bullets)

// WRONG new Paragraph({ children: [new TextRun("\u2022 Item")] })

// CORRECT - use numbering config const doc = new Document({ numbering: { config: [ { reference: "bullets", levels: [{ level: 0, format: LevelFormat.BULLET, text: "\u2022", alignment: AlignmentType.LEFT, style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] }, { reference: "numbers", levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1.", alignment: AlignmentType.LEFT, style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] }, ] }, sections: [{ children: [ new Paragraph({ numbering: { reference: "bullets", level: 0 }, children: [new TextRun("Item")] }), ]}] });

Tables

CRITICAL: Set both columnWidths on the table AND width on each cell.

const border = { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" }; const borders = { top: border, bottom: border, left: border, right: border };

new Table({ width: { size: 9360, type: WidthType.DXA }, columnWidths: [4680, 4680], rows: [ new TableRow({ children: [ new TableCell({ borders, width: { size: 4680, type: WidthType.DXA }, shading: { fill: "D5E8F0", type: ShadingType.CLEAR }, // CLEAR not SOLID margins: { top: 80, bottom: 80, left: 120, right: 120 }, children: [new Paragraph({ children: [new TextRun("Cell")] })] }) ] }) ] })

Always use WidthType.DXA -- never WidthType.PERCENTAGE (breaks in Google Docs).

Images

new Paragraph({ children: [new ImageRun({ type: "png", // Required: png, jpg, jpeg, gif, bmp, svg data: fs.readFileSync("image.png"), transformation: { width: 200, height: 150 }, altText: { title: "Title", description: "Desc", name: "Name" } })] })

Page Breaks

new Paragraph({ children: [new PageBreak()] }) // Or: new Paragraph({ pageBreakBefore: true, children: [new TextRun("New page")] })

Table of Contents

new TableOfContents("Table of Contents", { hyperlink: true, headingStyleRange: "1-3" })

Headers/Footers

sections: [{ headers: { default: new Header({ children: [new Paragraph({ children: [new TextRun("Header")] })] }) }, footers: { default: new Footer({ children: [new Paragraph({ children: [new TextRun("Page "), new TextRun({ children: [PageNumber.CURRENT] })] })] }) }, children: [/* content */] }]

Critical Rules for docx-js

  • Set page size explicitly (defaults to A4)

  • Landscape: pass portrait dimensions, set orientation: PageOrientation.LANDSCAPE

  • Never use \n -- use separate Paragraph elements

  • Never use unicode bullets -- use LevelFormat.BULLET

  • PageBreak must be in Paragraph

  • ImageRun requires type

  • Always use WidthType.DXA for tables

  • Tables need dual widths: columnWidths

  • cell width
  • Use ShadingType.CLEAR , never SOLID

  • TOC requires HeadingLevel only

  • Override built-in styles with exact IDs: "Heading1", "Heading2", etc.

  • Include outlineLevel for TOC (0 for H1, 1 for H2)

Editing Existing Documents

Follow all 3 steps in order.

Step 1: Unpack

python scripts/office/unpack.py document.docx unpacked/

Step 2: Edit XML

Edit files in unpacked/word/ . See XML Reference below.

Use the Edit tool directly for string replacement. Do not write Python scripts.

Use smart quotes for new content:

<w:t>Here&#x2019;s a quote: &#x201C;Hello&#x201D;</w:t>

Entity Character

&#x2018;

' (left single)

&#x2019;

' (right single / apostrophe)

&#x201C;

" (left double)

&#x201D;

" (right double)

Adding comments: Use comment.py :

python scripts/comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;" python scripts/comment.py unpacked/ 1 "Reply text" --parent 0 python scripts/comment.py unpacked/ 0 "Text" --author "Kortix Agent"

Then add markers to document.xml (see Comments in XML Reference).

Step 3: Pack

python scripts/office/pack.py unpacked/ output.docx --original document.docx

Common Pitfalls

  • Replace entire <w:r> elements when adding tracked changes

  • Preserve <w:rPr> formatting -- copy original run's formatting into tracked change runs

XML Reference

Schema Compliance

  • Element order in <w:pPr> : <w:pStyle> , <w:numPr> , <w:spacing> , <w:ind> , <w:jc> , <w:rPr> last

  • Whitespace: Add xml:space="preserve" to <w:t> with leading/trailing spaces

  • RSIDs: Must be 8-digit hex (e.g., 00AB1234 )

Tracked Changes

Insertion:

<w:ins w:id="1" w:author="Kortix Agent" w:date="2025-01-01T00:00:00Z"> <w:r><w:t>inserted text</w:t></w:r> </w:ins>

Deletion:

<w:del w:id="2" w:author="Kortix Agent" w:date="2025-01-01T00:00:00Z"> <w:r><w:delText>deleted text</w:delText></w:r> </w:del>

Inside <w:del> : use <w:delText> instead of <w:t> .

Minimal edits -- only mark what changes:

<w:r><w:t>The term is </w:t></w:r> <w:del w:id="1" w:author="Kortix Agent" w:date="..."> <w:r><w:delText>30</w:delText></w:r> </w:del> <w:ins w:id="2" w:author="Kortix Agent" w:date="..."> <w:r><w:t>60</w:t></w:r> </w:ins> <w:r><w:t> days.</w:t></w:r>

Deleting entire paragraphs -- also mark paragraph mark as deleted:

<w:p> <w:pPr> <w:rPr> <w:del w:id="1" w:author="Kortix Agent" w:date="2025-01-01T00:00:00Z"/> </w:rPr> </w:pPr> <w:del w:id="2" w:author="Kortix Agent" w:date="2025-01-01T00:00:00Z"> <w:r><w:delText>Entire paragraph content...</w:delText></w:r> </w:del> </w:p>

Rejecting another author's insertion:

<w:ins w:author="Jane" w:id="5"> <w:del w:author="Kortix Agent" w:id="10"> <w:r><w:delText>their inserted text</w:delText></w:r> </w:del> </w:ins>

Restoring another author's deletion:

<w:del w:author="Jane" w:id="5"> <w:r><w:delText>deleted text</w:delText></w:r> </w:del> <w:ins w:author="Kortix Agent" w:id="10"> <w:r><w:t>deleted text</w:t></w:r> </w:ins>

Comments

Markers are direct children of w:p , never inside w:r :

<w:commentRangeStart w:id="0"/> <w:r><w:t>commented text</w:t></w:r> <w:commentRangeEnd w:id="0"/> <w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>

Images

  • Add image file to word/media/

  • Add relationship to word/_rels/document.xml.rels

  • Add content type to [Content_Types].xml

  • Reference in document.xml with <w:drawing>

Visual Review Workflow

  • Convert DOCX -> PDF -> PNGs

  • Use scripts/render_docx.py (requires pdf2image and Poppler)

  • After each meaningful change, re-render and inspect

  • If visual review is not possible, extract text with python-docx as fallback

Quality Expectations

  • Deliver client-ready documents: consistent typography, spacing, margins, clear hierarchy

  • Avoid formatting defects: clipped/overlapping text, broken tables, unreadable characters

  • Use ASCII hyphens only. Avoid U+2011 and other Unicode dashes

  • Re-render and inspect every page at 100% zoom before final delivery

Dependencies

  • pandoc: Text extraction

  • docx: npm install -g docx (new documents)

  • LibreOffice: PDF conversion (via scripts/office/soffice.py )

  • Poppler: pdftoppm for images

  • python-docx: Reading/analysis (pip install python-docx )

  • pdf2image: Rendering (pip install pdf2image )

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Research

openalex-paper-search

No summary provided by upstream source.

Repository SourceNeeds Review
Research

domain-research

No summary provided by upstream source.

Repository SourceNeeds Review
Research

paper-creator

No summary provided by upstream source.

Repository SourceNeeds Review
Research

deep-research

No summary provided by upstream source.

Repository SourceNeeds Review