docx-review
docx-review is a review-oriented DOCX extraction toolkit for Rust.
It reads .docx files directly from OOXML and produces structured JSON for:
- document blocks
- tracked changes
- comments and comment threads
- text anchors
- headers, footers, footnotes, and endnotes
The project is designed as headless infrastructure for automation, review workflows, AI pipelines, and downstream tools that need more than plain text.
What It Extracts
docx-review currently supports:
- paragraphs
- table cells as flat blocks
- tracked changes:
  - insert
  - delete
  - replacement
  - move
  - format change
- comments
- comment anchors and anchored text
- comment threading and resolved state from commentsExtended.xml
- footnotes and endnotes
- headers and footers
- list item detection by nesting level
- text spans with tracked-change markers
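As a rough illustration of the tracked-change kinds listed above, they could be modelled as a small enum. This is a sketch only: the type name `ChangeKind`, its variants, and the `affects_text` helper are hypothetical and are not taken from the docx-review-core API.

```rust
/// Hypothetical mirror of the tracked-change kinds listed above.
/// The real crate's type and variant names may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChangeKind {
    Insert,
    Delete,
    Replacement,
    Move,
    FormatChange,
}

impl ChangeKind {
    /// True when the change touches document text rather than only formatting.
    pub fn affects_text(self) -> bool {
        !matches!(self, ChangeKind::FormatChange)
    }
}
```

A consumer could use a helper like this to separate textual edits from formatting-only changes before presenting them for review.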
Workspace
- crates/core (docx-review-core) - extraction library and normalized data model
- crates/cli (docx-review) - command-line interface
Installation
Install the CLI from crates.io with cargo install docx-review.
This installs the docx-review command.
CLI
Basic extraction:
Pretty JSON:
Tracked changes only:
Comments only:
Read from stdin:
JSONL output:
Notes:
- --format jsonl with no --only emits one block JSON object per line.
- --only comments --format jsonl emits one comment per line.
- --only track-changes --format jsonl emits one tracked change per line.
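The JSONL behaviour described in these notes can be driven from Rust by spawning the CLI and splitting its output into lines. The sketch below only builds the command and splits a string. It assumes the input file is passed as a positional argument, which the notes do not confirm. The function names are illustrative, and the flags are the ones listed in the notes.

```rust
use std::process::Command;

/// Builds (but does not run) a `docx-review` invocation that should emit
/// one comment per line. Passing the input path as a positional argument
/// is an assumption; the flags come from the notes above.
pub fn comments_jsonl_command(input: &str) -> Command {
    let mut cmd = Command::new("docx-review");
    cmd.arg(input)
        .args(["--only", "comments", "--format", "jsonl"]);
    cmd
}

/// Splits JSONL text into its non-empty lines, one record per line.
/// Each returned slice is still raw JSON and needs a JSON parser.
pub fn jsonl_records(output: &str) -> Vec<&str> {
    output
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect()
}
```

Calling `.output()` on the returned command and feeding its stdout (decoded as UTF-8) to `jsonl_records` would yield one raw JSON string per comment, ready for a JSON parser of your choice.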
Track changes modes:
Useful flags:
- --no-comments
- --no-text-spans
- --include-raw-ids
- -v, -vv
Rust API
Add the crate:
[dependencies]
docx-review-core = "0.1"
use docx_review_core::extract_from_path;
With options:
use ;
Output Model
At a high level, extraction returns:
- Document.blocks - the normalized textual structure of the document
- Document.tracked_changes - review-oriented change records
- Document.comments - comments, anchors, replies, and resolved state
- Document.raw_changes - optional raw tracked changes when TrackChangesMode::Both is used
blocks are the main content surface. Comments and tracked changes are linked back to blocks by id.
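To make the id-based linking concrete, here is a minimal sketch of how a consumer might look up the comments attached to a block. The struct shapes and field names (`id`, `block_id`, `author`, `text`) are assumptions made for illustration; the actual docx-review-core types may be organised differently.

```rust
/// Hypothetical, simplified shapes. Field names are assumptions and the
/// real docx-review-core types may differ.
#[derive(Debug)]
pub struct Block {
    pub id: String,
    pub text: String,
}

#[derive(Debug)]
pub struct Comment {
    /// Id of the block this comment is anchored to (assumed field).
    pub block_id: String,
    pub author: String,
    pub text: String,
}

#[derive(Debug, Default)]
pub struct Document {
    pub blocks: Vec<Block>,
    pub comments: Vec<Comment>,
}

impl Document {
    /// Returns every comment whose `block_id` matches the given block id.
    pub fn comments_for(&self, block_id: &str) -> Vec<&Comment> {
        self.comments
            .iter()
            .filter(|c| c.block_id == block_id)
            .collect()
    }
}
```

Keeping comments in a flat list and joining on the block id, rather than nesting them inside blocks, is one way the same comment records can serve both block-centric and comment-centric views.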
Scope
The current implementation is focused on review semantics and structural extraction.
Designed for:
- review metadata extraction from real Word documents
- tracked changes and comment workflows
- structural stories outside document.xml
Not the current focus:
- editing
- image extraction
- full numbering style reconstruction
Development
Run the CLI locally: