Word Documents

Create, edit, read, and visually verify .docx files

Default Active: Yes — The docx skill is available by default and triggers automatically when you mention Word documents, .docx files, reports, memos, letters, or templates. No manual loading required.

Overview

The docx skill handles the full lifecycle of Word documents: creation from scratch, editing existing files, reading and analysis, template-based generation, tracked-change management, and mandatory visual QA verification. It is scoped exclusively to .docx files — not PDFs, spreadsheets, HWPX, or Google Docs.

The primary tool is officecli, a CLI on PATH that covers ~80% of tasks (add, set, remove, validate, query, track-change accept/reject). For tasks officecli cannot handle — tracked-change creation, OMML equations, bulk pattern matching — the skill falls back to Python OOXML scripts in scripts/.

Triggers

Pattern | Examples
File extension | .docx, Word document
Document types | Reports, memos, letters, proposals, contracts
Keywords | "Word doc", "template", "tracked changes", "table of contents"
Korean triggers | "워드 문서" (Word document), "보고서 만들어줘" (make me a report), "문서 작성해줘" (write a document for me)

Quick Reference

Task | Tool | Command Pattern | Notes
Format like existing doc | shell + officecli | cp source.docx target.docx && officecli open target.docx | Inherit styles/headers/footers
Create blank DOCX | officecli | officecli create report.docx | Start from real Office file
Add paragraph | officecli | officecli add FILE /body --type paragraph --prop text="..." | Primary write path
Edit paragraph/run | officecli | officecli set FILE "/body/p[N]" --prop ... | Exact path targeting
Read text/outline/stats | officecli | officecli view FILE text | Modes: text, annotated, outline, stats, issues, html
Query document | officecli | officecli query FILE "p[style=Heading1]" | CSS-like selectors
Template replacement | officecli | officecli set FILE / --prop find="{{X}}" --prop replace="Y" | Preserves template structure
Validation / issue scan | officecli | officecli validate FILE | Pair with view FILE issues
Accept/reject tracked changes | officecli | officecli set FILE / --prop accept-changes=all | Also reject-changes=all
Create tracked changes | Python (L3/L4) | scripts/docx_cli.py + ooxml/redline_diff.py | officecli cannot create, only accept/reject
OMML equations | Python (L4) | Unpack → inject <m:oMath> → repack | officecli cannot generate OMML
Complex anchored comments | Python (L3) | python3 scripts/comment.py IN OUT --text "..." --anchor "..." | For comments beyond officecli
PDF conversion / visual QA | soffice | soffice --headless --convert-to pdf FILE | Screenshot-based QA

"~해줘" ("please do ~") Usage Examples

"분기 보고서 만들어줘" ("Make me a quarterly report")
Creates a quarterly report document from scratch with proper heading hierarchy, table of contents, tables, and professional formatting. Uses officecli create → add structure → validate → PDF verification.
"이 문서 형식에 맞춰서 새로 작성해줘" ("Write a new one that matches this document's format")
Reference-based editing: copies the source file to inherit all styles, headers, and footers, then replaces body content while preserving the document's visual identity. See the Reference-Based Editing workflow below.
"이 워드 문서 내용 요약해줘" ("Summarize the contents of this Word document")
Reads and analyzes the document using officecli view FILE text and view FILE outline, then provides a structured summary of headings, key content, and document statistics.
"변경사항 추적해서 수정해줘" ("Edit this with tracked changes")
Creates tracked changes (redlines) via the Python OOXML scripts, since officecli can only accept or reject tracked changes, not create them. Escalates to L3/L4 automatically.
"문서 템플릿에 데이터 채워줘" ("Fill the document template with data")
Template-safe replacement using officecli set FILE / --prop find="{{X}}" --prop replace="Y" to fill placeholders while preserving the template's formatting and structure.
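
When a template has several placeholders, the find/replace form above can be generated instead of typed by hand. The following is a minimal Python sketch, not part of the skill: the helper name template_commands and the sample values are hypothetical, and it only assembles argument lists in the officecli set FILE / --prop find=... --prop replace=... form quoted above without running anything.

```python
from typing import Dict, List


def template_commands(path: str, values: Dict[str, str]) -> List[List[str]]:
    """Build one `officecli set` argv list per {{PLACEHOLDER}}.

    Hypothetical helper: it only assembles arguments in the find/replace
    form quoted in this document; nothing is executed here.
    """
    commands = []
    for key, replacement in values.items():
        placeholder = "{{" + key + "}}"
        commands.append([
            "officecli", "set", path, "/",
            "--prop", f"find={placeholder}",
            "--prop", f"replace={replacement}",
        ])
    return commands


if __name__ == "__main__":
    # Sample file name and values are illustrative only.
    for argv in template_commands("contract.docx", {"CLIENT": "Acme", "FEE": "$50M"}):
        print(argv)
```

Passing each list to subprocess.run without a shell sidesteps the quoting problems listed under Common Pitfalls, such as $ being expanded inside double quotes.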

Reference-Based Editing

When a user says "format like X.docx", "match existing style", "based on template", or provides a source file — always start from the source file. Never rebuild from scratch.

Workflow

  1. Copy the source: cp source.docx target.docx — inherits all styles, margins, numbering, headers, footers
  2. Open: officecli open target.docx — daemon starts; command returns immediately
  3. Remove body content only — keep /styles, /numbering, /header, /footer
  4. Add new paragraphs using existing style names (e.g. --prop style=Heading1) — they auto-apply

Why This Matters

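Pandoc-generated and Word-generated documents carry specific style IDs (e.g. Heading1, BodyText, FirstParagraph) that are defined inside that document. Adding a new style whose ID already exists duplicates the definition and validation fails, so reuse the existing style names instead of recreating them.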

Template Priority

  1. User-provided source file: first-class template
  2. tests/fixtures/*.docx: pre-built working examples shipped with the skill
  3. officecli create (blank document): only when nothing else applies

# CORRECT: inherit Pandoc styles
cp Assignment1.docx Assignment2.docx
officecli open Assignment2.docx
officecli remove Assignment2.docx "/body/p[1]"   # remove old body (keep styles/headers/footers)
# ... remove more paragraphs ...
officecli add Assignment2.docx /body --type paragraph --prop text="New title" --prop style=Heading1
officecli close Assignment2.docx

# WRONG: recreate styles that already exist -- validate fails
officecli add doc.docx /styles --type style --prop name=Heading1 --prop size=20pt ...

Escalation Ladder

When officecli cannot do the job, escalate in this order:

Level | When | Tool
L1 officecli high-level | Typical add/set/remove operations | officecli add/set/remove/query/view
L2 officecli raw-set | XML injection: PAGE field, fldChar, hyperlink anchor, custom attributes | officecli raw-set FILE PATH --xpath X --action A --xml ...
L3 Python script | Bulk tracked-change ops, comment add, merge runs, redline validation | python3 scripts/*.py
L4 Unpack → edit XML → repack | OMML equations, custom style injection, pattern-match editing | scripts/docx_cli.py open FILE work/ → edit XML → scripts/docx_cli.py save work/ OUT.docx

Escalation Signals

Core Workflows

Execution Model

Run commands one at a time. Do not write all commands into a shell script and execute as a single block. OfficeCLI is incremental: every add, set, and remove immediately modifies the file and returns output.

  1. One command at a time, then read the output. Check the exit code before proceeding.
  2. Non-zero exit = stop and fix immediately. Do not continue building on a broken state.
  3. Verify after structural operations. After adding a style, table, chart, or section, run get or validate before building on top of it.
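
The three rules above can be sketched as a small driver that runs one command, reads its output, and stops at the first failure. This is an illustrative Python sketch only: run_steps is a hypothetical helper, and the demo uses stand-in Python commands so it runs without officecli installed.

```python
import subprocess
import sys
from typing import List, Sequence


def run_steps(steps: Sequence[Sequence[str]]) -> List[str]:
    """Run each command on its own and stop at the first non-zero exit."""
    outputs = []
    for argv in steps:
        result = subprocess.run(list(argv), capture_output=True, text=True)
        if result.returncode != 0:
            # Non-zero exit: stop here instead of building on a broken state.
            raise RuntimeError(
                f"step failed ({result.returncode}): {' '.join(argv)}\n{result.stderr}"
            )
        outputs.append(result.stdout)  # keep the output so it can be inspected
    return outputs


if __name__ == "__main__":
    # Stand-in commands; in practice each entry would be a single
    # officecli add/set/remove call followed by a get or validate.
    demo = [
        [sys.executable, "-c", "print('step 1 ok')"],
        [sys.executable, "-c", "print('step 2 ok')"],
    ]
    for out in run_steps(demo):
        print(out, end="")
```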

Reading and Analyzing

officecli view doc.docx text                    # Full text extraction
officecli view doc.docx text --max-lines 200    # Truncated extraction
officecli view doc.docx text --start 1 --end 50 # Range extraction
officecli view doc.docx outline                 # Structure: stats, headings, headers/footers
officecli view doc.docx annotated               # Style/font/size per run, equations as LaTeX
officecli view doc.docx stats                   # Paragraph count, style/font distribution

Element Inspection

officecli get doc.docx /                        # Document root (metadata, page setup)
officecli get doc.docx /body --depth 1           # List body children
officecli get doc.docx "/body/p[1]"              # Specific paragraph
officecli get doc.docx "/body/p[1]/r[1]"         # Specific run
officecli get doc.docx "/body/tbl[1]" --depth 3  # Table structure
officecli get doc.docx /styles                   # Style definitions
officecli get doc.docx "/styles/Heading1"        # Specific style
officecli get doc.docx "/header[1]"              # Header/footer
officecli get doc.docx /numbering                # Numbering definitions
officecli get doc.docx "/body/p[1]" --json       # JSON output for scripting

CSS-like Queries

officecli query doc.docx 'paragraph[style=Heading1]'            # By style
officecli query doc.docx 'p:contains("quarterly")'              # By text content
officecli query doc.docx 'p:empty'                              # Empty paragraphs
officecli query doc.docx 'image:no-alt'                         # Images without alt text
officecli query doc.docx 'p[align=center] > r[bold=true]'      # Compound selectors
officecli query doc.docx 'paragraph[size>=24pt]'                # By size
officecli query doc.docx 'field[fieldType!=page]'               # Fields by type

Headers and Footers

Known CLI bug: --prop field=page is silently ignored in add --type footer commands. The footer is created with static text only. You must use raw-set to inject the PAGE field after creating the footer.

# Step 1. Empty footer for cover page (auto-enables differentFirstPage)
officecli add doc.docx / --type footer --prop type=first --prop text=""

# Step 2. Default footer with static "Page " text
officecli add doc.docx / --type footer --prop text="Page " \
  --prop type=default --prop align=center --prop size=9pt --prop font=Calibri

# Step 3. Inject PAGE field via raw-set
# (footer[2] = default when first-page footer also exists)
officecli raw-set doc.docx "/footer[2]" \
  --xpath "//w:p" \
  --action append \
  --xml '<w:r ...><w:fldChar w:fldCharType="begin"/></w:r>
         <w:r ...><w:instrText xml:space="preserve"> PAGE </w:instrText></w:r>
         <w:r ...><w:fldChar w:fldCharType="end"/></w:r>'

Footer index rule: When both a first-page footer and a default footer are added, the default footer is /footer[2]. If there is no first-page footer, the default footer is /footer[1]. Always verify with officecli get doc.docx "/footer[2]".
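
The index rule is easy to get wrong when scripting, so it can help to encode it once. A minimal Python sketch under that assumption: default_footer_path and verify_footer_command are hypothetical names, and the only officecli form used is the get check quoted above.

```python
from typing import List


def default_footer_path(has_first_page_footer: bool) -> str:
    """Return the default footer path under the index rule described above."""
    return "/footer[2]" if has_first_page_footer else "/footer[1]"


def verify_footer_command(doc: str, has_first_page_footer: bool) -> List[str]:
    """Build (but do not run) the `officecli get` call that confirms the footer."""
    return ["officecli", "get", doc, default_footer_path(has_first_page_footer)]


if __name__ == "__main__":
    print(verify_footer_command("doc.docx", has_first_page_footer=True))
```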

Resident Mode (Performance)

Always use open/close. The file is loaded once and written once, so every command in between avoids repeated file I/O.

officecli open doc.docx           # Load once into memory (returns IMMEDIATELY)
officecli add doc.docx ...        # All commands run in memory -- fast
officecli set doc.docx ...
officecli close doc.docx          # Write once to disk

Warning: Do NOT run officecli open as a background shell job (e.g. via run_in_background). It returns immediately and the daemon keeps running in the background on its own. Running it as a monitored background job leaves zombie processes and file locks.

Batch Mode

Execute multiple operations in a single open/save cycle:

cat <<'EOF' | officecli batch doc.docx
[
  {"command":"add","parent":"/body","type":"paragraph",
   "props":{"text":"Introduction","style":"Heading1"}},
  {"command":"add","parent":"/body","type":"paragraph",
   "props":{"text":"This report covers Q4 results.","font":"Calibri","size":"11pt"}}
]
EOF

Batch supports: add, set, get, query, remove, move, swap, view, raw, raw-set, validate.
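
Hand-written JSON is a common source of batch parse errors, so one option is to generate the payload. The sketch below is a hedged Python illustration: build_batch is a hypothetical helper, the allowed command names are copied from the list above, and the field names (command, parent, type, props) follow the heredoc example.

```python
import json
from typing import Any, Dict, List

# Commands this document lists as supported in batch mode.
BATCH_COMMANDS = {"add", "set", "get", "query", "remove", "move", "swap",
                  "view", "raw", "raw-set", "validate"}


def build_batch(operations: List[Dict[str, Any]]) -> str:
    """Serialize operations into the JSON array that is piped to `officecli batch`."""
    for op in operations:
        if op.get("command") not in BATCH_COMMANDS:
            raise ValueError(f"unsupported batch command: {op.get('command')!r}")
    # ensure_ascii=False keeps Korean text readable in the payload.
    return json.dumps(operations, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    payload = build_batch([
        {"command": "add", "parent": "/body", "type": "paragraph",
         "props": {"text": "Introduction", "style": "Heading1"}},
        {"command": "add", "parent": "/body", "type": "paragraph",
         "props": {"text": "This report covers Q4 results.",
                   "font": "Calibri", "size": "11pt"}},
    ])
    print(payload)
```

The printed payload can be fed to the CLI on standard input, for example with subprocess.run(["officecli", "batch", "doc.docx"], input=payload, text=True), which mirrors the heredoc pipe shown above.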

Subskill References

Additional detail lives in companion files loaded on demand:

Subskill | When to Use
officecli-academic-paper | Academic papers, citations, bibliography, TOC for papers
creating.md | Detailed creation recipes (new documents from scratch)
editing.md | Detailed editing guides (modify existing documents)

Decision Flow

Is the document an academic paper (thesis, journal, conference)?
  YES --> read officecli-academic-paper/SKILL.md
  NO  --> continue with main skill

User provided a source file to match?
  YES --> Reference-Based Editing + editing.md
  NO  --> creating.md

Python OOXML Scripts

When officecli reaches its limits, the skill falls back to these Python scripts:

Script | Purpose | Command
scripts/docx_cli.py | Unified Python CLI: unpack, save, validate, repair, search, TOC, chunk, comment, accept-changes, merge-runs | python3 scripts/docx_cli.py {open|save|validate|repair|text|search|...}
scripts/accept_changes.py | Accept all tracked changes | python3 scripts/accept_changes.py IN.docx OUT.docx
scripts/comment.py | Add W3C-compliant OOXML comments anchored to text | python3 scripts/comment.py IN.docx OUT.docx --text "..." --anchor "..."
scripts/ooxml/merge_runs.py | Merge adjacent runs with identical formatting | python3 scripts/ooxml/merge_runs.py unpacked/
scripts/ooxml/redline_diff.py | Validate tracked-change correctness vs original | python3 scripts/ooxml/redline_diff.py unpacked/ original.docx
scripts/ooxml/simplify_tracked.py | Simplify same-author adjacent tracked changes | python3 scripts/ooxml/simplify_tracked.py unpacked/

Reference Materials

File | Read When | Contains
references/cjk-handling.md | Korean text / East Asian font / wrapping issues | rFonts East Asian fonts, lang tags, accessibility
references/tracked-changes.md | Track changes / comments / redline work | w:ins, w:del, comments XML, script usage examples

Design Principles for Business Documents

Heading Hierarchy

# Verify heading hierarchy
officecli view report.docx outline

Color Palette

Purpose | Allowed Colors | Hex Examples
Headings, emphasis | Navy | #003366, #1B2A4A
Body accents, borders | Charcoal | #333333, #4A4A4A
Highlights, callouts | Forest green | #2E5E3F, #1A4731

Never use rainbow colors, bright primary colors, or more than 3 accent colors in a single document.

Font Selection

Script | Primary Font | Fallback
Korean | Malgun Gothic | Pretendard
English / Latin | Calibri | Aptos

Typography

Body text: 11-12pt. Headings step up: H1 = 18pt minimum (20pt preferred for long documents), H2 = 14pt bold, H3 = 12pt bold. Use 1.15x-1.5x line spacing for body text.
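
These size rules are simple enough to check mechanically before delivery. The following Python sketch is one possible encoding: typography_issues and the dictionary keys (body, h1, h2, h3) are hypothetical, and the thresholds are taken directly from the paragraph above.

```python
from typing import Dict, List


def typography_issues(sizes_pt: Dict[str, float], line_spacing: float) -> List[str]:
    """Compare font sizes (in points) and body line spacing with the guidance above."""
    issues = []
    if not 11 <= sizes_pt["body"] <= 12:
        issues.append("body text should be 11-12pt")
    if sizes_pt["h1"] < 18:
        issues.append("H1 should be at least 18pt")
    if sizes_pt["h2"] != 14:
        issues.append("H2 should be 14pt")
    if sizes_pt["h3"] != 12:
        issues.append("H3 should be 12pt")
    if not 1.15 <= line_spacing <= 1.5:
        issues.append("line spacing should be 1.15x-1.5x")
    return issues


if __name__ == "__main__":
    # Illustrative values: a 10pt body and a 16pt H1 both get flagged.
    print(typography_issues({"body": 10, "h1": 16, "h2": 14, "h3": 12}, 1.15))
```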

Content-to-Element Mapping

Content Type | Recommended Element | Why
Sequential items | Bulleted list (listStyle=bullet) | Scanning is faster than inline commas
Step-by-step process | Numbered list (listStyle=numbered) | Numbers communicate order
Comparative data | Table with header row | Columns enable side-by-side comparison
Trend data | Embedded chart (chartType=line/column) | Visual pattern recognition
Key definition | Hanging indent paragraph | Offset term from definition
Legal/contract clause | Numbered list with bookmarks | Cross-referencing via bookmarks
Mathematical content | Equation element (formula=LaTeX) | Proper OMML rendering
Citation/reference | Footnote or endnote | Keeps body text clean
Pull quote / callout | Paragraph with border + shading | Visual distinction from body
Multi-section layout | Section breaks with columns | Column control per section

Mandatory Verification

Never skip verification. After ANY DOCX creation or edit, both steps below are mandatory.

# Step 1: Structural validation
officecli validate output.docx

# Step 2: Visual PDF verification
soffice --headless --convert-to pdf --outdir /tmp output.docx
# Open/inspect PDF to confirm: formatting, tables, images, headers/footers

QA Checklist

Assume there are problems. Your job is to find them.

Issue Detection

officecli view doc.docx issues
officecli view doc.docx issues --type format
officecli view doc.docx issues --type content
officecli view doc.docx issues --type structure

Content QA

officecli view doc.docx text
officecli view doc.docx outline
officecli query doc.docx 'p:empty'
officecli query doc.docx 'image:no-alt'

# Check for leftover placeholders
officecli query doc.docx 'p:contains("lorem")'
officecli query doc.docx 'p:contains("placeholder")'

Pre-Delivery Checklist

Verification Loop

  1. Generate document
  2. Run view issues + view outline + view text + validate
  3. List issues found (if none, look again more critically)
  4. Fix issues
  5. Re-verify — one fix often creates another problem
  6. Repeat until a full pass reveals no new issues

Do not declare success until you have completed at least one fix-and-verify cycle.
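
The loop above can be sketched as a function that keeps checking and fixing until a full pass comes back clean. This is a hedged Python illustration: verification_loop, its check and fix callbacks, and the max_cycles cap are all assumptions made for the example, and in practice check would wrap the view issues, view outline, view text, and validate commands.

```python
from typing import Callable, List


def verification_loop(check: Callable[[], List[str]],
                      fix: Callable[[List[str]], None],
                      max_cycles: int = 10) -> int:
    """Repeat check -> fix -> re-check until a pass reports no issues.

    Returns the number of fix-and-verify cycles that were performed.
    """
    cycles = 0
    issues = check()
    while issues:
        if cycles >= max_cycles:
            raise RuntimeError(f"issues remain after {max_cycles} cycles: {issues}")
        fix(issues)
        cycles += 1
        issues = check()  # re-verify: one fix often creates another problem
    return cycles


if __name__ == "__main__":
    # Fake issue queue standing in for real check results.
    queue = [["empty paragraph", "image without alt text"], ["heading level skipped"], []]
    done = verification_loop(lambda: queue[0], lambda found: queue.pop(0))
    print(f"clean after {done} fix-and-verify cycles")
```

A return value of 0 means the first pass found nothing, which under the rule above is a cue to look again more critically before declaring success.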

Common Pitfalls

Pitfall | Correct Approach
--name "foo" | Use --prop name="foo"; all attributes go through --prop
Guessing property names | Run officecli help docx set paragraph --json for exact names
\n in shell strings | Use \\n for newlines in --prop text="line1\\nline2"
Hex colors with # | Use FF0000 not #FF0000 (no hash prefix)
Paths are 1-based | /body/p[1], /body/tbl[1] (XPath convention)
--index is 0-based | --index 0 = first position (array convention)
Unquoted [N] in zsh/bash | The shell glob-expands /body/p[1]; always quote paths: "/body/p[1]"
Spacing via empty paragraphs | Use spaceBefore/spaceAfter properties on paragraphs
$ in --prop text= | Use single quotes: --prop text='$50M'
--prop field=page in footer | Silently ignored. Must use raw-set to inject <w:fldChar>
Recreating styles from template | cp source.docx target.docx first; do not add styles with existing IDs
officecli open as background shell | Run in the foreground; open returns immediately and the daemon runs in the background automatically
Batch JSON parse error | Use heredoc: cat <<'EOF' | officecli batch FILE.docx
OMML equation creation | officecli cannot generate OMML; escalate to L4
listStyle on run | listStyle is a paragraph property, not a run property
Row-level bold/color/shd | Row set only supports height, header, and c1/c2/c3 text shortcuts; use cell-level set
Section vs root property names | Section uses pagewidth/pageheight (lowercase). Document root uses pageWidth/pageHeight (camelCase)
Wrong border format | Use style;size;color;space format: single;4;FF0000;1
Code block indentation via spaces | Use ind.left paragraph property (e.g. --prop ind.left=720)
chartType=pie/doughnut in LibreOffice | Slices are invisible in PDF; use column or bar instead
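
Several of these pitfalls are formatting rules that a few helpers can enforce when commands are generated from code. The Python sketch below is illustrative only: normalize_hex_color, border_spec, and body_path are hypothetical names, and they encode just the hex, border, and 1-based path conventions from the table above.

```python
import re


def normalize_hex_color(value: str) -> str:
    """Return a 6-digit hex color without the leading '#', in upper case."""
    color = value[1:] if value.startswith("#") else value
    color = color.upper()
    if not re.fullmatch(r"[0-9A-F]{6}", color):
        raise ValueError(f"not a 6-digit hex color: {value!r}")
    return color


def border_spec(style: str, size: int, color: str, space: int) -> str:
    """Join border parts in style;size;color;space order."""
    return f"{style};{size};{normalize_hex_color(color)};{space}"


def body_path(element: str, position: int) -> str:
    """Build a 1-based body path such as /body/p[1]."""
    if position < 1:
        raise ValueError("paths are 1-based; the first element is [1]")
    return f"/body/{element}[{position}]"


if __name__ == "__main__":
    print(border_spec("single", 4, "#ff0000", 1))
    print(body_path("tbl", 1))
```

Building the path as a plain string and passing it in an argument list, rather than through a shell, also avoids the glob-expansion problem with unquoted [N].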

Known Issues

Issue | Workaround
No visual preview | Unlike pptx (SVG/HTML), docx has no built-in rendering. Use view text/outline/annotated/issues for verification. Open in Word for visual check.
Track changes creation requires raw XML | OfficeCLI can accept/reject but cannot create tracked changes. Use raw-set or Python scripts (L3).
Tab stops may require raw XML | Tab stop creation is not exposed in high-level commands. Use raw-set.
Chart series cannot be added after creation | set --prop data= can only update existing series. Delete and recreate the chart.
Complex numbering definitions | listStyle=bullet/numbered covers simple cases. For multi-level lists, use numId/numLevel.
Batch intermittent failure | ~1-in-15 batch operations may fail. Retry or close/reopen file. Split large batches into 10-15 chunks.
Table-level padding produces invalid XML | Do not use set tbl[N] --prop padding=N. Use cell-level padding.top/padding.bottom.
Internal hyperlinks not supported | hyperlink only accepts absolute URIs. Use raw-set with <w:hyperlink w:anchor="...">.
Table --index positioning unreliable | --index N on add /body --type table may be ignored. Add content in desired order.
view text shows "1." for all numbered items | Display-only limitation. Rendered output in Word/LibreOffice shows correct numbers.
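
The retry-and-split workaround for intermittent batch failures can be sketched as two small helpers. This is a hedged Python illustration: chunked and run_with_retry are hypothetical names, the chunk size and attempt count are parameters chosen by the caller, and the action callback stands in for whatever actually submits one batch.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> List[List[T]]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def run_with_retry(action: Callable[[], bool], attempts: int = 3) -> bool:
    """Call action until it reports success or the attempts run out."""
    for _ in range(attempts):
        if action():
            return True
    return False


if __name__ == "__main__":
    operations = [{"command": "validate"}] * 25  # placeholder operations
    for batch in chunked(operations, 10):
        # A real action would submit `batch` and return True on exit code 0.
        ok = run_with_retry(lambda: True)
        print(len(batch), "operations:", "ok" if ok else "failed")
```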

Anti-Patterns (Must Avoid)

Tool Discovery

Always confirm syntax from help before guessing:

officecli --help
officecli help docx
officecli help docx add
officecli help docx set
officecli help docx query
officecli help docx add paragraph --json
officecli help docx set run --json

Dependencies

Tool | Purpose | Status
officecli (PATH) | Primary DOCX CLI; global install includes CJK fork | Required
dotnet | Runtime/build for officecli | Required for fork builds
python3 | Fallback scripts (scripts/*.py) | Required for L3/L4
lxml | Python XML processing for scripts/ooxml/* | Required for L3/L4
soffice | PDF conversion / .doc migration / macro workflows | Optional fallback
pdftoppm | Image-based QA after PDF render | Optional fallback

Prerequisite Check

# Required
python3 -c "import docx, lxml" || echo "MISSING: pip install python-docx lxml"

# On-demand: LibreOffice
which soffice >/dev/null 2>&1 || echo "INFO: LibreOffice not installed"

# Required: OfficeCLI (primary tool, listed as Required under Dependencies)
which officecli >/dev/null 2>&1 || echo "MISSING: OfficeCLI not installed"
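
The same checks can be run from Python when the shell is not convenient. A minimal sketch using only the standard library: missing_executables and missing_modules are hypothetical helper names, and the tool and module names in the demo are the ones checked above.

```python
import importlib.util
import shutil
from typing import Iterable, List


def missing_executables(names: Iterable[str]) -> List[str]:
    """Return the executables that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level Python modules that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for tool in missing_executables(["officecli", "soffice"]):
        print(f"not found on PATH: {tool}")
    for module in missing_modules(["docx", "lxml"]):
        print(f"Python module not installed: {module}")
```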