HWP (한글)

Create, read, and edit Korean HWP/HWPX documents via OfficeCLI

Default Active: Yes
Binary HWP: Experimental

한글 .hwp .hwpx HWP HWPX Korean documents 한컴오피스 OWPML
Primary Tool: officecli (global install, on PATH)
Fallback: Python OOXML/OWPML scripts (scripts/*.py)
Scope: HWP/HWPX only; no DOCX, PPTX, XLSX, PDF
Binary HWP: Experimental via rhwp bridge (capability-gated)
Scope rule: This skill handles .hwp and .hwpx files exclusively. If the task involves DOCX, PPTX, XLSX, or PDF, it will not activate — use the corresponding format-specific skill instead.

Quick Reference

Complete task-to-command lookup. Each row shows whether the operation is supported and which command to use.

Task | Status | Command
Format like existing .hwpx | Yes | cp source.hwpx target.hwpx && officecli open target.hwpx
Template-based create | Yes | python3 scripts/build_hwpx.py --template {base|gonmun|minutes|proposal|report}
Create new .hwpx | Yes | officecli create file.hwpx
Create from Markdown | Yes | officecli create file.hwpx --from-markdown input.md
Read / analyze .hwpx | Yes | view text|annotated|outline|stats|html|markdown|tables|forms|objects
Edit existing .hwpx | Yes | set, add, remove, move, swap
Label-based fill | Yes | set /table/fill --prop 'fill:label=val'
Form recognize | Yes | view forms --auto
Table map | Yes | view tables
Markdown export | Yes | view markdown
Equation (수식) | Yes | add --type equation --prop 'script={1 over 2}'
Object finder | Yes | view objects
Query (expanded) | Yes | query 'tc[text~=홍길동]', :has(), > combinator
Template merge | Yes | merge template.hwpx out.hwpx --data '{"key":"val"}'
Swap elements | Yes | swap file.hwpx '/p[1]' '/p[2]'
Column break | Yes | add --type columnbreak --prop cols=2
Image / floating | Yes | add --type picture --prop anchor=page --prop halign=center
Compare documents | Yes | compare a.hwpx b.hwpx (LCS diff + table compare)
Security validation | Yes | ZIP bomb, path traversal, symlink, XXE defense
Broken ZIP recovery | Yes | Corrupted HWPX auto-recovery via Local File Header scan
HTML preview | Yes | view html --browser
Watch live preview | Yes | watch file.hwpx
Validate .hwpx | Yes | validate (9-level check)
Raw XML | Yes | raw, raw-set
Watermark (image) | Yes | add --type watermark --prop src=img.png
Pattern-match editing | Python L4 | scripts/hwpx_cli.py open → pattern edit XML → save
Visual QA | Python L3 | scripts/contact_sheet.py + subagent review
New form field creation | Blocked | Source prototype exists; Hancom verification not closed
Create new .hwp (binary) | Experimental | officecli create file.hwp --json
Read/export .hwp (binary) | Experimental | officecli view file.hwp text --json + svg/png/pdf/markdown
Edit .hwp text/fields | Experimental | officecli hwp --json recipes
Export HWPX to .hwp | Experimental | officecli set input.hwpx /save-as-hwp --prop output=out.hwp --json
Convert .hwp to .hwpx | Fallback | scripts/hwp_convert.py IN.hwp OUT.hwpx

"~해줌" Usage Examples

Natural Korean prompts that trigger this skill. Just describe what you need.

공문 작성 (drafting an official letter)
"공문 양식으로 문서 만들어줌 — 문서번호, 수신, 제목 채워서"
Triggers template assembly with --template gonmun, then uses /table/fill to populate label-value fields like 문서번호, 수신, 참조, 제목.
문서 분석 (document analysis)
"이 HWPX 파일 분석해줌 — 표 구조랑 양식 알려줌"
Runs view tables, view forms --auto, and view stats to provide a structural breakdown of the document.
표 채우기 (table fill)
"이 신청서에서 성명=홍길동, 연락처=010-1234-5678 채워줌"
Uses label-based fill: officecli set form.hwpx / --prop 'fill:성명=홍길동' --prop 'fill:연락처=010-1234-5678'
회의록 작성 (writing meeting minutes)
"회의록 만들어줌 — 일시, 장소, 참석자, 안건 포함해서"
Creates from --template minutes, populates structured fields for date, location, attendees, and agenda items.
HWP 변환 (HWP conversion)
"이 .hwp 파일 읽어서 내용 보여줌"
Checks officecli hwp doctor --json for native rhwp support. If ready, uses officecli view file.hwp text --json. Otherwise falls back to scripts/hwp_reader.py.

Editing Escalation Ladder

When the primary tool cannot handle the job, the skill escalates through four levels. Lower levels are preferred; higher levels are used only when necessary.

Level | When | Tool
L1 OfficeCLI high-level | Typical add/set/remove, label-fill, view modes | officecli add/set/remove/query/view/merge
L2 OfficeCLI raw/raw-set | Direct section0.xml / header.xml tweaks | officecli raw FILE /Contents/section0.xml
L3 Python script | Bulk find/replace, template assembly, pattern-match | python3 scripts/hwpx_cli.py ... or scripts/build_hwpx.py
L4 Unpack → edit XML → repack | KICE exams, regulations, multi-file XML edit | hwpx_cli.py open → edit work/Contents/*.xml → strip lineseg → save

Escalation Signals

Core Workflows

Create & Import

# Empty document
officecli create doc.hwpx

# From Markdown (JUSTIFY alignment by default)
officecli create doc.hwpx --from-markdown input.md

# Left-aligned Markdown import
officecli create doc.hwpx --from-markdown input.md --align left

# Binary HWP via rhwp bridge
officecli create file.hwp --json

# Template merge with data
officecli merge template.hwpx out.hwpx --data '{"이름":"홍길동"}'
officecli merge template.hwpx out.hwpx --data data.json

# Template assembly (recommended for structured docs)
python3 scripts/build_hwpx.py --template gonmun --output gonmun.hwpx

View Modes

officecli view doc.hwpx text                    # line-numbered text
officecli view doc.hwpx annotated               # path + style detail
officecli view doc.hwpx outline                 # headings only
officecli view doc.hwpx stats                   # document statistics
officecli view doc.hwpx html --browser          # A4 HTML preview
officecli view doc.hwpx markdown                # GFM markdown export
officecli view doc.hwpx tables                  # table 2D grid + label map
officecli view doc.hwpx forms --auto            # CLICK_HERE + label-value auto-detect
officecli view doc.hwpx forms --auto --json     # JSON for AI pipeline
officecli view doc.hwpx objects                 # picture/field/bookmark/equation list
officecli view doc.hwpx objects --object-type field  # filter by type
officecli view doc.hwpx styles                  # charPr/paraPr styles
officecli view doc.hwpx issues                  # 9-level validation issues
Common mistake: officecli view file.hwpx without a mode argument is an error. You must always specify the mode: text, tables, markdown, etc.

Edit Operations

# Add paragraph with text and font size
officecli add doc.hwpx /section[1] --type paragraph --prop text="content" --prop fontsize=11

# Add table
officecli add doc.hwpx /section[1] --type table --prop rows=3 --prop cols=4

# Set properties (bold, alignment)
officecli set doc.hwpx '/section[1]/p[1]' --prop bold=true --prop align=CENTER

# Find and replace text
officecli set doc.hwpx / --prop find="old" --prop replace="new"

# Remove element
officecli remove doc.hwpx /section[1]/p[3]

# Swap two elements
officecli swap doc.hwpx '/p[1]' '/p[2]'

Label Fill (Table Auto-Fill)

# Fill by label (fill: prefix)
officecli set doc.hwpx / --prop 'fill:대표자=홍길동' --prop 'fill:연락처=010-1234'

# Directional fill: right (default), down, left, up
officecli set doc.hwpx / --prop 'fill:주소>down=서울시'

# Shorthand (fill: prefix optional with /table/fill path)
officecli set doc.hwpx /table/fill --prop '이름=김서준'

Query (Extended Syntax)

officecli query doc.hwpx 'p'                          # all paragraphs
officecli query doc.hwpx 'tc[text~=홍길동]'           # cell text search
officecli query doc.hwpx 'run[bold=true]'              # bold runs
officecli query doc.hwpx 'p:has(tbl)'                  # paragraphs containing tables
officecli query doc.hwpx 'tbl > tr > tc[colSpan!=1]'   # merged cells
officecli query doc.hwpx 'run[fontsize>=20]'           # 20pt+ font
officecli query doc.hwpx 'p[heading=1]'                # heading 1

Operators: =, !=, ~= (contains), >=, <=
Pseudo-selectors: :empty, :contains(text), :has(child), :first, :last
Virtual attributes: text, bold, italic, fontsize, colSpan, rowSpan, heading

Resident Mode (Live Connection)

officecli open doc.hwpx          # returns IMMEDIATELY; daemon in bg
officecli view text               # view without re-opening
officecli set '/p[1]' --prop bold=true
officecli close                   # close session
Do NOT run officecli open as a background shell job. It returns immediately and the daemon lives in the background automatically. Running it as a monitored background shell creates zombies and file locks.

Batch Mode

officecli batch doc.hwpx <<'EOF'
view text
view stats
view forms --auto
EOF

Compare

officecli compare a.hwpx b.hwpx                    # text diff (default)
officecli compare a.hwpx b.hwpx --mode outline      # heading diff
officecli compare a.hwpx b.hwpx --mode table --json  # table diff JSON

Uses LCS DP alignment (fallback greedy for >10M cells). Table similarity: dimension weight 0.3 + content weight 0.7. Page range filtering: --pages "1-3,5".
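
The alignment and scoring described above can be sketched in Python. This is a minimal illustration under stated assumptions, not OfficeCLI's actual implementation: the names lcs_length and table_similarity are hypothetical, and the way the dimension and content sub-scores are computed is assumed. Only the 0.3 / 0.7 weights come from the text.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, using the classic DP recurrence."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise keep the best neighbour.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def table_similarity(t1: List[List[str]], t2: List[List[str]]) -> float:
    """Hypothetical score: 0.3 * dimension match + 0.7 * cell-content match."""
    def ratio(m: int, n: int) -> float:
        return 1.0 if m == n else min(m, n) / max(m, n)

    cols1 = max((len(r) for r in t1), default=0)
    cols2 = max((len(r) for r in t2), default=0)
    # Assumed dimension score: average of row-count and column-count agreement.
    dimension = (ratio(len(t1), len(t2)) + ratio(cols1, cols2)) / 2

    # Assumed content score: LCS over the row-major cell sequence.
    flat1 = [c for r in t1 for c in r]
    flat2 = [c for r in t2 for c in r]
    longest = max(len(flat1), len(flat2))
    content = lcs_length(flat1, flat2) / longest if longest else 1.0
    return 0.3 * dimension + 0.7 * content
```

The DP keeps only two rows, so memory stays linear in the shorter input; a real implementation would still need the greedy fallback mentioned above for very large inputs.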

Image & Watermark

# Inline image
officecli add doc.hwpx /section[1] --type picture --prop path=/path/to/image.png

# Page-centered floating image
officecli add doc.hwpx /section[1] --type picture \
  --prop path=/path/to/image.png \
  --prop anchor=page --prop halign=center --prop valign=middle

# Watermark (opaque RGB PNG recommended)
officecli add doc.hwpx /section[1] --type watermark \
  --prop src=/path/to/watermark.png --prop bright=0 --prop contrast=0

Watch & HTML Preview

officecli watch doc.hwpx           # auto-refresh HTML on file change
officecli unwatch doc.hwpx         # stop
officecli view doc.hwpx html --browser  # one-shot A4 preview

Template Assembly System

This skill uses a base + overlay template system. Most HWPX creation for 공문 (official letter), 보고서 (report), 회의록 (meeting minutes), or 제안서 (proposal) documents should use it instead of starting from a blank officecli create document.

Available Templates

Template | Purpose | Key Fields
base | Empty HWPX skeleton | mimetype, META-INF, empty header/section
gonmun | Official letter (공문) | 문서번호, 수신, 참조, 제목, 본문
minutes | Meeting minutes (회의록) | 일시, 장소, 참석자, 안건, 결정사항
proposal | Proposal (제안서) | 제안개요, 배경, 내용, 기대효과
report | Report (보고서) | 요약, 현황, 분석, 제언

Method 1: build_hwpx.py (Recommended)

python3 scripts/build_hwpx.py --template report --output Q4Report.hwpx

# Then edit content with officecli
officecli open Q4Report.hwpx
officecli set Q4Report.hwpx /table/fill --prop '제목=2026 Q4 보고'
# ... continue editing ...
officecli close Q4Report.hwpx

Method 2: Manual Overlay (for Customization)

# 1. Copy base skeleton
cp -r templates/base/ work/

# 2. Overlay domain-specific styles
cp -r templates/gonmun/* work/Contents/

# 3. Edit header.xml and section0.xml as needed
#    Reference: reference/header-xml-guide.md, reference/section0-xml-guide.md
#    Style IDs: reference/style_id_maps.md

# 4. Repack as HWPX (ZIP with strip + minify)
python3 scripts/ooxml/pack.py work/ out.hwpx

# 5. Validate
officecli validate out.hwpx

Reference-Based Editing

When the user says "format like X.hwpx", "공문 양식처럼", "기존 보고서 스타일", or provides a source file — start from the source, do not rebuild from scratch.

Workflow

  1. Copy the source: cp source.hwpx target.hwpx — inherits header.xml (styles), section0.xml (structure), META-INF
  2. Open: officecli open target.hwpx — daemon returns immediately (do NOT run as background)
  3. Remove body paragraphs only — keep header.xml (charPr/paraPr/borderFill), META-INF, settings
  4. Add new content using existing styleidref values — they auto-apply
Why copy first? HWPX header.xml holds all style definitions (charPr, paraPr, borderFill, listItems). Rebuilding from scratch breaks styleidref cross-references, loses consistent visual conventions, fails validation, and takes 10x longer.

Template Priority

  1. User-provided source file — first-class template
  2. tests/fixtures/agentic/*.hwpx — realistic samples (gonmun, report, minutes with tables)
  3. templates/{base,gonmun,minutes,proposal,report}/ — Template Assembly system
  4. officecli create blank — only when nothing else applies

Form Recognition & Fill

4-Strategy Recognition

# | Strategy | Description
1 | Adjacent cell label-value | Table label → value detection (default)
2 | Header + data rows | Column-header recognition
3 | In-cell patterns | checkbox, keyword( ) paren-blank, (label: ) annotation
4 | KV table detection | 16 Korean keywords trigger auto-detection

3-Phase Fill Pipeline

  1. In-cell patterns: checkbox fill, paren-blank fill, annotation fill
  2. Table label-value: exact + prefix 60% matching, 4-directional (right/down/left/up)
  3. Inline paragraph: regex lookbehind for "label: value" outside tables

AI Form Fill Workflow

# Step 1: Recognize all fields
officecli view form.hwpx forms --auto --json > fields.json

# Step 2: AI maps label -> value (your logic here)

# Step 3: Fill matched fields
officecli set form.hwpx /table/fill --prop '성 명=홍길동'

Binary HWP Support (Experimental)

Native binary .hwp operations are powered by the rhwp bridge. All operations are capability-gated — always check readiness before assuming support.

Discovery Commands

# Always run these first
officecli hwp doctor --json       # runtime readiness check
officecli capabilities --json     # full capability matrix
officecli hwp --json              # current recipes and policies

Supported Operations (When Ready)

# Create
officecli create file.hwp --json

# View / Export
officecli view file.hwp text --json
officecli view file.hwp svg --page 1 --json
officecli view file.hwp png --page 1 --out /tmp/hwp-png --json
officecli view file.hwp pdf --page 1 --out out.pdf --json
officecli view file.hwp markdown --json
officecli view file.hwp thumbnail --out thumb.png --json
officecli view file.hwp info --json
officecli view file.hwp tables --section 0 --json

# Edit fields
officecli set file.hwp /field --prop name=회사명 --prop value=리지 --prop output=out.hwp --json

# Edit text
officecli set file.hwp /text --prop find=마케팅 --prop value=브릿지 --prop output=out.hwp --json

# Insert text
officecli add file.hwp /text --type paragraph --prop value=새본문 --prop output=out.hwp --json

# Edit table cell
officecli set file.hwp /table/cell --prop section=0 --prop parent-para=3 \
  --prop control=0 --prop cell=0 --prop value=오피스셀 --prop output=out.hwp --json

# Native operations (escape hatch)
officecli view file.hwp native --op get-style-list --json
officecli set file.hwp /native-op --prop op=split-paragraph --prop output=out.hwp --json

# HWPX to HWP export
officecli set input.hwpx /save-as-hwp --prop output=out.hwp --json
Policy: Prefer output mode for binary .hwp mutation: --prop output=out.hwp. Use in-place mode only for /text replacement, only when explicitly requested, and only after confirming safeInPlace.ready=true. In-place mode must include --in-place --backup --verify.

Equation Handling (수식)

HWPX equations use Hancom's proprietary script language. This is NOT MathML, NOT LaTeX, NOT OMML.

Script | Result
{1 over 2} | 1/2 (fraction)
sqrt{x} | Square root of x
x^2, x_i | Superscript, subscript
int _0 ^1 f(x)dx | Definite integral
sum _{i=1} ^n | Sigma summation
lim _{x->0} | Limit
matrix{a&b # c&d} | 2x2 matrix
# Create equation
officecli add doc.hwpx /section --type equation --prop 'script=x^2 + y^2 = r^2'

# View all equations
officecli view doc.hwpx objects --object-type equation
Equation scripts are opaque — edit only <hp:script> text, not the binary payload. Math exam docs (KICE) require <hp:equation> for every expression. Never use plain text for math.

Korean Document Design Principles

Government Form Aesthetics (한국 공공양식 미감)

Uniform Spacing Detection (균등분할)

Korean forms often use uniform character spacing for names in cells: "홍 길 동" (spaces between each character). This is a display convention, not data.
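
A minimal Python sketch of undoing this convention before matching or comparing values. The function name normalize_uniform_spacing and the regular expression are illustrative assumptions, not part of OfficeCLI or the bundled scripts. The pattern only collapses runs of single Hangul syllables separated by one space, so ordinary multi-syllable words are left alone, but a genuine sequence of one-syllable words would also be merged. Apply it only to cell values known to hold names or labels.

```python
import re

# One Hangul syllable followed by one or more "<space><Hangul syllable>" repeats,
# not adjacent to other Hangul on either side. U+3000 is the ideographic space.
_UNIFORM = re.compile(r"(?<![가-힣])[가-힣](?:[ \u3000][가-힣])+(?![가-힣])")


def normalize_uniform_spacing(text: str) -> str:
    """Collapse display-only spacing such as '홍 길 동' into '홍길동'."""
    return _UNIFORM.sub(lambda m: re.sub(r"[ \u3000]", "", m.group(0)), text)


print(normalize_uniform_spacing("성명: 홍 길 동"))
```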

Document Type Classification

Type | Key Signals | Example
exam | equation 10+, rect objects | KICE 수능/모의고사 시험지
form | table 3+, checkboxes (☐/■) | 대학 신청서, 정부 양식
regulation | ○ bullets 10+, 별첨/조항 refs, table 10+ | 운영지침, 내규, 시행세칙
report | long text, few tables | 보고서, 논문
mixed | none of above | 사업계획서
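
The signals in this table could be combined into a classifier along the following lines. This is a sketch under loud assumptions: DocSignals and classify are hypothetical names, the table does not say whether the signals for a type must all be present (the sketch requires all of them), and the thresholds for "long text" and "few tables" are made-up parameters.

```python
from dataclasses import dataclass


@dataclass
class DocSignals:
    equations: int = 0
    rects: int = 0
    tables: int = 0
    checkboxes: int = 0
    circle_bullets: int = 0  # "○" bullets
    clause_refs: int = 0     # 별첨 / 조항 references
    chars: int = 0


def classify(s: DocSignals, long_text_chars: int = 5000, few_tables: int = 2) -> str:
    """Return exam, regulation, form, report, or mixed (first matching rule wins)."""
    if s.equations >= 10 and s.rects > 0:
        return "exam"
    # Checked before "form" because its table threshold (10+) is the stricter one.
    if s.circle_bullets >= 10 and s.clause_refs > 0 and s.tables >= 10:
        return "regulation"
    if s.tables >= 3 and s.checkboxes > 0:
        return "form"
    # long_text_chars and few_tables are assumed values, not taken from the table.
    if s.chars >= long_text_chars and s.tables <= few_tables:
        return "report"
    return "mixed"
```

The counts would come from something like the stats and objects views; how they are gathered is outside this sketch.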

Mandatory Verification

Never skip verification. After ANY HWPX edit operation, always execute validation and visual checks in order.
# 1. Structural validation (MUST pass)
officecli validate output.hwpx

# 2. PDF visual verification (MUST check)
soffice --headless --convert-to pdf --outdir /tmp output.hwpx
# Verify: table positions, guide text removed, checkboxes correct,
#         merged cell text in correct row, numbers not corrupted

# 3. Visual QA via subagent (use reference/visual_qa_prompt.md)
python3 scripts/contact_sheet.py /tmp/output.pdf sheet.png

# 4. If Hancom Office available, also open .hwpx directly

Pre-Delivery Checklist

Security

Check | Limits
ZIP bomb | 1000 entries, 200 MB, 100:1 ratio
Path traversal | null byte, .., absolute path, drive letter, symlink
XXE | DtdProcessing.Prohibit
Table size | 200 cols x 10,000 rows
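
For scripts that open HWPX archives without going through officecli, the ZIP bomb and path traversal limits above could be approximated with the standard zipfile module. This is a hedged sketch: unsafe_reasons is a hypothetical helper, the limits are copied from the table, and the XXE and table-size checks are not covered here.

```python
import zipfile
from pathlib import PurePosixPath
from typing import List

MAX_ENTRIES = 1000
MAX_TOTAL_BYTES = 200 * 1024 * 1024
MAX_RATIO = 100


def unsafe_reasons(path: str) -> List[str]:
    """Return the limit violations found; an empty list means none were detected."""
    reasons = []
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        if len(infos) > MAX_ENTRIES:
            reasons.append("too many entries")
        total = sum(i.file_size for i in infos)
        packed = sum(i.compress_size for i in infos)
        if total > MAX_TOTAL_BYTES:
            reasons.append("uncompressed size over limit")
        if packed and total / packed > MAX_RATIO:
            reasons.append("compression ratio over limit")
        for i in infos:
            name = i.filename
            parts = PurePosixPath(name.replace("\\", "/")).parts
            # Unix mode lives in the high 16 bits; 0o120000 marks a symlink.
            is_symlink = ((i.external_attr >> 16) & 0o170000) == 0o120000
            # zipfile already truncates names at a null byte; that check is defensive.
            if ("\x00" in name or ".." in parts or name.startswith(("/", "\\"))
                    or (len(name) > 1 and name[1] == ":") or is_symlink):
                reasons.append(f"unsafe entry: {name!r}")
    return reasons
```

The sizes are read from the archive's central directory, so they are the declared values; a stricter check would also cap the bytes actually read during extraction.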

Reference Materials & Scripts

Reference Documents (reference/)

File | Read When | Contains
reference/hwpx-format.md | Before any direct XML edit | OWPML ZIP structure, namespaces, file layout, mimetype
reference/header-xml-guide.md | Adding/modifying charPr/paraPr/borderFill styles | How to add new styles to header.xml
reference/section0-xml-guide.md | Paragraph/table/mixed-formatting XML | XML template for section0.xml bodies
reference/style_id_maps.md | Style ID lookup for template overlay | Complete style ID index for all templates
reference/dependencies.md | First-time setup / environment check | Python/system packages needed
reference/visual_qa_prompt.md | Visual QA via subagent | Ready-to-use prompt for PDF-image inspection
reference/table_templates/*.xml | Inserting pre-built tables | 2x6, 3x3, 4x4, 5x4 grid XML fragments

Python Scripts (scripts/)

Script | Purpose | Command
hwpx_cli.py | Unified Python CLI (14+ commands) | python3 scripts/hwpx_cli.py {command} ...
build_hwpx.py | Template-based creation | python3 scripts/build_hwpx.py --template {type}
analyze_template.py | Inspect template structure | python3 scripts/analyze_template.py work/
create_document.py | Create empty or custom HWPX | python3 scripts/create_document.py OUT.hwpx
table_builder.py | Build table XML from Python objects | Used internally
page_guard.py | Detect paragraph/table/text drift | python3 scripts/page_guard.py -r ref.hwpx -o out.hwpx
contact_sheet.py | QA contact sheet (page grid image) | python3 scripts/contact_sheet.py INPUT.pdf sheet.png
validate.py | 9-level structural validation | python3 scripts/validate.py INPUT.hwpx
hwp_reader.py | Read HWP 5.0 binary (read-only) | python3 scripts/hwp_reader.py INPUT.hwp
hwp_convert.py | HWP → HWPX conversion | python3 scripts/hwp_convert.py IN.hwp OUT.hwpx
text_extract.py | Extract plain text from HWPX | python3 scripts/text_extract.py INPUT.hwpx

Pattern-Match Editing (L4 Fallback)

For complex form editing beyond officecli set/find-replace — KICE exams, multi-section regulations, fragmented text nodes.

Core Flow

# Unpack HWPX
python3 scripts/hwpx_cli.py open input.hwpx

# Edit XML files in work/Contents/
# (lineseg is stripped automatically by hwpx_cli.py)

# Repack
python3 scripts/hwpx_cli.py save output.hwpx

Key Patterns

Pattern | Description
Lineseg strip | Remove stale <hp:linesegarray> cache on every direct XML write
Checkbox substitution | Checkbox glyph replacement (☐/■), with multi-<t> node handling
Uniform-space normalization | "홍 길 동" → "홍길동" conversion
p[0] Monster | secPr + tbl + question 1 text merged in first paragraph
Equation interleaving | <t> and <equation> alternating; skip equations during text extraction

Lineseg Strip (Critical)

When editing HWPX XML directly (NOT via officecli or scripts/hwpx_cli.py), you MUST strip ALL <hp:linesegarray> elements. Stale layout cache causes characters to overlap into a single line.

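import re
# Remove self-closing elements first so the paired pattern cannot start on one
xml = re.sub(r'<(?:hp:)?linesegarray\b[^>]*/>', '', xml)
# Then remove paired elements together with their cached children
xml = re.sub(r'<(?:hp:)?linesegarray\b[^>]*>.*?</(?:hp:)?linesegarray>', '', xml, flags=re.DOTALL)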
officecli and scripts/hwpx_cli.py handle lineseg stripping automatically. This rule applies only to raw Python XML editing.

Common Pitfalls

Pitfall | Correct Approach
--props text=Hello | --prop text=Hello (always the singular --prop)
/body/p[1] path | HWPX uses /section[1]/p[1]: section-based, not body
Unquoted [N] in shell | "/section[1]/p[1]": always quote paths
fontsize omitted | --prop fontsize=11 always; prevents charPr 0 pollution
officecli view file.hwpx (no mode) | Error. Must specify: text, markdown, tables, etc.
Manual table mapping | view tables replaces manual inspection
Recreating header.xml styles from template | cp source.hwpx target.hwpx first. Read reference/style_id_maps.md
officecli open as background shell | Run in the foreground; it returns immediately and the daemon runs in the background automatically
Direct XML edit without lineseg strip | Stale cache causes text overlap. Use hwpx_cli.py or strip manually
Custom style work without reading reference/ | reference/header-xml-guide.md + reference/style_id_maps.md are mandatory

Anti-Patterns (Must Avoid)

  1. No equations in math exams = broken output — KICE docs require <hp:equation> elements
  2. No unguarded HWP binary overwrite — prefer --prop output=...; only use safe in-place when safeInPlace.ready=true
  3. No fake HWPX fallback when rhwp has a native primitive — if OfficeCLI lacks the route, report the gap and stop for approval
  4. No XML editing without lineseg strip — stale cache causes overlapping text
  5. No visible QA markers in fixed-layout exams — use screenshots or sidecar evidence instead
  6. No cross-format skill loading — this skill is .hwp/.hwpx only
  7. No rebuilding styles that exist in the template: cp first and read reference/style_id_maps.md
  8. No ignoring reference materials: header-xml-guide.md, section0-xml-guide.md, and style_id_maps.md are mandatory for custom XML work

Dependencies

Tool | Purpose | Required?
officecli (global) | Primary HWPX CLI + experimental rhwp-backed HWP bridge | Required
python3 | Fallback scripts (scripts/*.py) | Required for L3/L4
lxml | XML processing for scripts/* | Required for L3/L4
pyhwp | Legacy HWP 5.0 binary reading/conversion fallback | Required for HWP→HWPX fallback
soffice (LibreOffice) | PDF conversion + visual verification | Recommended
Java (JAVA_HOME) | H2Orestart HWP conversion engine | For HWP→HWPX only
dotnet | Build officecli from source | For builds only

Prerequisite Check

# OfficeCLI (required)
which officecli >/dev/null 2>&1 || echo "WARN: OfficeCLI not installed"

# LibreOffice (recommended, auto-installs when needed)
which soffice >/dev/null 2>&1 || echo "INFO: LibreOffice not installed"

# Python packages (optional for L3/L4; the pyhwp package is imported as hwp5)
python3 -c "import lxml; import hwp5" 2>/dev/null || echo "OPTIONAL: pip install lxml pyhwp"

# Java (for HWP conversion only)
echo "JAVA_HOME=$JAVA_HOME"

Tool Discovery

Always confirm syntax from help before guessing.

officecli --help
officecli hwp --json
officecli hwp doctor --json
officecli capabilities --json
officecli view --help
officecli set --help
python3 scripts/hwpx_cli.py --help
python3 scripts/build_hwpx.py --help

Exam XML Structure Patterns

KICE-style exam sheets require special handling due to fixed-layout constraints.

Pattern | Description | Detection
Page/Column breaks | pageBreak="1" / columnBreak="1" | Page boundary = question group boundary
p[0] Monster | secPr + colPr + title tbl + question 1 text merged | Everything in first paragraph
Equation interleaving | <t> and <equation> alternating | Skip equations during text extraction
Answer choices | Choice markers + 5 <equation> (5-choice) | Auto-detect answer paragraphs
Text fragmentation | 1-2 char <t> splits (HWP conversion) | Concatenate all text then match
2-column layout | <hp:colPr type="NEWSPAPER" colCount="2"> | Exam-specific layout
Fixed-layout rule: Do not insert QA/proof markers into the visible body of exam documents. Strings such as [CU TEMPLATE EDIT ...] or VISUAL QA inside the question body are visual hard failures. Capture before/after screenshots instead.

HWP → HWPX Conversion

Format Detection

file doc.hwpx   # "Zip archive" -> HWPX (ZIP + OWPML XML)
file doc.hwp    # "HWP Document" -> HWP 5.0 binary
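
Where the file utility is unavailable, the same distinction can be made from the leading bytes. The sketch below assumes nothing beyond the container formats: an HWPX file is a ZIP archive, and an HWP 5.0 binary is an OLE Compound File. The helper names sniff_format and detect are hypothetical, and the result is only a candidate format, since other ZIP or OLE files share the same signatures.

```python
ZIP_MAGIC = b"PK\x03\x04"
CFB_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE Compound File header


def sniff_format(head: bytes) -> str:
    """Map the leading bytes of a file to 'hwpx', 'hwp', or 'unknown'."""
    if head.startswith(ZIP_MAGIC):
        return "hwpx"  # ZIP container; other ZIP-based formats match too
    if head.startswith(CFB_MAGIC):
        return "hwp"   # OLE container; legacy Office files match too
    return "unknown"


def detect(path: str) -> str:
    """Read the first 8 bytes of the file at path and classify them."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))
```

A stricter check would go on to inspect the container contents before trusting the extension.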

Structural Differences

Aspect | Native HWPX | HWP → HWPX Converted
Text unit | Short <t> per run | Entire paragraph in one <t>
Title p[0] | secPr + tbl + content | Page number fragments <t>20</t> + <t>1</t> mixed in
Editing | Run-level precise replacement | Raw string replace or whole-paragraph swap needed

Editing Strategies for Converted Files

  1. Title: run-aware replacement — set_run_text(p0, 'old', 'new') (skip page-number runs)
  2. Body: raw string replace on serialized XML — sec0.replace(old, new)
  3. Multi-<t> cells: use ReplaceTextInCell() — concatenate all <t> → match → redistribute
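
The concatenate, match, redistribute idea in strategy 3 can be sketched with the standard library. This is not the ReplaceTextInCell() implementation; replace_across_text_nodes is a hypothetical stand-in. It assumes each <t> holds plain text only, matches elements by local name so it works with or without a namespace prefix, and redistributes by putting the whole result in the first <t>, which means the first run's character style applies to all of the replaced text.

```python
import xml.etree.ElementTree as ET


def replace_across_text_nodes(cell: ET.Element, old: str, new: str) -> bool:
    """Replace old with new even when old is split over several <t> nodes."""
    # Compare local names so "{namespace}t" and plain "t" both match.
    nodes = [el for el in cell.iter() if el.tag.rsplit("}", 1)[-1] == "t"]
    joined = "".join(n.text or "" for n in nodes)
    if not nodes or not old or old not in joined:
        return False
    replaced = joined.replace(old, new)
    # Redistribute: keep all text in the first node and empty the others.
    for n in nodes:
        n.text = ""
    nodes[0].text = replaced
    return True


cell = ET.fromstring(
    "<tc><p><run><t>홍</t></run><run><t>길</t></run><run><t>동</t></run></p></tc>"
)
replace_across_text_nodes(cell, "홍길동", "김서준")
```

When editing files this way outside officecli or hwpx_cli.py, the lineseg strip rule still applies before saving.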