rdocx
A pure Rust DOCX library — create, read, and modify Word documents programmatically. Additionally, render pixel-identical PDFs and export to HTML and Markdown, all from the same document object. No LibreOffice, no unoconv, no C dependencies.
Why rdocx?
Most DOCX solutions in the ecosystem shell out to LibreOffice or wrap C/C++ libraries. rdocx is written entirely in Rust, so it compiles to a single binary with zero runtime dependencies. It works everywhere Rust does — including WASM.
The core focus is DOCX: a high-level, python-docx-inspired API for building and editing Word documents with paragraphs, tables, images, headers/footers, styles, and lists. On top of that, rdocx includes a built-in layout engine that paginates your document and can render it to PDF (with font subsetting, bookmarks, and selectable text) or export to HTML and Markdown — so you get faithful output in every format without leaving Rust.
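To make the pagination step concrete, here is a minimal sketch of greedy page filling. It assumes lines have already been measured and is not rdocx's actual layout code; the function name `paginate` and the plain `f32` heights are illustrative assumptions.

```rust
/// Greedily assign lines to pages: a line that does not fit in the space
/// left on the current page starts a new one. Returns line indices per page.
/// (Hypothetical helper for illustration; not part of the rdocx API.)
fn paginate(line_heights: &[f32], page_height: f32) -> Vec<Vec<usize>> {
    let mut pages: Vec<Vec<usize>> = vec![Vec::new()];
    let mut used = 0.0_f32;
    for (i, &h) in line_heights.iter().enumerate() {
        // Break only if the current page already holds something, so a line
        // taller than the page is still placed rather than skipped.
        if used + h > page_height && !pages.last().unwrap().is_empty() {
            pages.push(Vec::new());
            used = 0.0;
        }
        pages.last_mut().unwrap().push(i);
        used += h;
    }
    pages
}
```

A real layout engine also has to handle section breaks, keep-with-next rules, and tables that split across pages, which this sketch leaves out.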
DOCX Features
- Read & write DOCX files with a high-level API
- Tables with merged cells, borders, shading, and content-based column sizing
- Images — inline and anchored, with header/footer background images
- Headers & footers with first-page support and per-section overrides
- Styles — paragraph and character styles, theme color resolution
- Lists with automatic numbering ID management
- Template engine with placeholder replacement (plain text and regex)
- TOC generation with internal hyperlinks and dot-leader tabs
- Document merging with style deduplication and numbering remapping
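As an illustration of content-based column sizing, the sketch below splits a table width across columns in proportion to each column's widest cell. It is a simplification under stated assumptions: widths are counted in characters rather than measured from shaped text, and `column_widths` is a hypothetical name, not an rdocx function.

```rust
/// Distribute `total_width` across columns in proportion to the widest
/// cell (counted in characters) found in each column.
/// (Illustrative only; a layout engine would measure shaped text instead.)
fn column_widths(rows: &[Vec<&str>], total_width: f64) -> Vec<f64> {
    let cols = rows.iter().map(|r| r.len()).max().unwrap_or(0);
    // Start at 1 so a column of empty cells does not collapse to zero width.
    let mut widest = vec![1usize; cols];
    for row in rows {
        for (i, cell) in row.iter().enumerate() {
            widest[i] = widest[i].max(cell.chars().count());
        }
    }
    let sum: usize = widest.iter().sum();
    widest
        .iter()
        .map(|&w| total_width * w as f64 / sum as f64)
        .collect()
}
```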
Output Formats
- PDF — built-in layout engine with text shaping (rustybuzz), Unicode line breaking, multi-section pagination, font subsetting, ToUnicode CMap, bookmarks, and images
- HTML — semantic mapping from OOXML with CSS styling and base64-embedded images
- Markdown — GFM-compatible output with pipe tables and formatting
- PNG — page-to-image rendering via tiny-skia rasterizer
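For readers unfamiliar with GFM pipe tables, this sketch shows the shape of the output: a header row, a delimiter row, then one row per record. It operates on plain string cells and is not rdocx's exporter; `to_pipe_table` is a hypothetical name.

```rust
/// Render rows of cells as a GFM pipe table; the first row is the header.
/// (Hypothetical helper for illustration; not part of the rdocx API.)
fn to_pipe_table(rows: &[Vec<&str>]) -> String {
    let mut out = String::new();
    for (i, row) in rows.iter().enumerate() {
        // Escape literal pipes so they do not end the cell early.
        let cells: Vec<String> = row.iter().map(|c| c.replace('|', "\\|")).collect();
        out.push_str(&format!("| {} |\n", cells.join(" | ")));
        if i == 0 {
            // GFM requires a delimiter row directly after the header.
            let dashes = vec!["---"; row.len()];
            out.push_str(&format!("| {} |\n", dashes.join(" | ")));
        }
    }
    out
}
```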
Extras
- WASM support via standalone rdocx-wasm crate
- CLI tool (rdocx-cli): inspect, convert, diff, replace, validate, render
Installation
[dependencies]
rdocx = "0.1"
To include bundled metric-compatible fonts (Carlito, Caladea, Liberation family):
[dependencies]
rdocx = { version = "0.1", features = ["bundled-fonts"] }
Quick Start
Create a document
use rdocx::Document;
let mut doc = Document::new();
doc.add_paragraph(…);
let mut para = doc.add_paragraph(…);
para.add_run(…).bold(…);
para.add_run(…);
para.add_run(…).italic(…);
doc.add_table(…);
doc.save(…).unwrap();
Read a document
use rdocx::Document;
let doc = Document::open(…).unwrap();
for para in doc.paragraphs(…) { … }
for table in doc.tables(…) { … }
Convert to PDF
use rdocx::Document;
let doc = Document::open(…).unwrap();
doc.save_pdf(…).unwrap();
// Or get bytes directly
let pdf_bytes = doc.to_pdf(…).unwrap();
Convert to HTML / Markdown
use rdocx::Document;
let doc = Document::open(…).unwrap();
let html = doc.to_html(…);
let markdown = doc.to_markdown(…);
Template replacement
use rdocx::Document;
use std::collections::HashMap;
let mut doc = Document::open(…).unwrap();
let mut replacements = HashMap::new();
replacements.insert(…, …);
replacements.insert(…, …);
doc.replace_all(…);
doc.save(…).unwrap();
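The idea behind map-driven placeholder replacement can be sketched on plain strings. This is not the rdocx implementation: the `{{key}}` placeholder syntax and the name `replace_placeholders` are assumptions made for the example, and paragraphs are modelled as `String`s rather than document nodes.

```rust
use std::collections::HashMap;

/// Replace every occurrence of each key with its value in each paragraph
/// and return how many replacements were made.
/// (Hypothetical helper; the placeholder syntax is chosen by the caller.)
fn replace_placeholders(
    paragraphs: &mut [String],
    replacements: &HashMap<&str, &str>,
) -> usize {
    let mut count = 0;
    for text in paragraphs.iter_mut() {
        // HashMap iteration order is unspecified, so values should not
        // themselves contain other keys.
        for (key, value) in replacements {
            let hits = text.matches(*key).count();
            if hits > 0 {
                *text = text.replace(*key, value);
                count += hits;
            }
        }
    }
    count
}
```

In a real DOCX file a placeholder can be split across several runs by Word's editing history, so a template engine generally has to match against the concatenated run text rather than each run in isolation.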
Merge documents
use rdocx::Document;
let mut doc = Document::open(…).unwrap();
let part2 = Document::open(…).unwrap();
doc.append_with_break(…);
doc.save(…).unwrap();
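Merging needs numbering remapping because two documents can reuse the same numbering IDs for unrelated lists. The sketch below shows one way to assign collision-free IDs. It models IDs as bare `u32` values, and `remap_numbering` is a hypothetical name rather than rdocx's internal code.

```rust
use std::collections::HashMap;

/// Map each numbering ID used by the appended document to a fresh ID that
/// does not collide with any ID already present in the base document.
/// (Illustrative sketch; real merging also rewrites the references.)
fn remap_numbering(base_ids: &[u32], appended_ids: &[u32]) -> HashMap<u32, u32> {
    // Fresh IDs start just above the largest ID in the base document.
    let mut next = base_ids.iter().copied().max().unwrap_or(0) + 1;
    let mut map = HashMap::new();
    for &id in appended_ids {
        // Each distinct source ID gets exactly one new ID.
        map.entry(id).or_insert_with(|| {
            let fresh = next;
            next += 1;
            fresh
        });
    }
    map
}
```

After building the map, every paragraph in the appended content that references an old ID would be rewritten to the new one before insertion.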
CLI
Install the CLI:
# Inspect document structure
# Extract plain text
# Convert to PDF
# Convert to HTML or Markdown
# Find and replace text
# Diff two documents
How rdocx Compares
vs. Python Libraries
| | rdocx | python-docx | docx2pdf | pypandoc |
|---|---|---|---|---|
| Create DOCX | Yes | Yes | -- | -- |
| Read DOCX | Yes | Yes | -- | -- |
| DOCX to PDF | Yes (built-in) | No | Via MS Word | Via Pandoc + LaTeX |
| DOCX to HTML | Yes (built-in) | No | No | Yes (lossy) |
| DOCX to Markdown | Yes (built-in) | No | No | Yes (lossy) |
| Layout engine | Yes | None | Delegates to Word | Delegates to LaTeX |
| External runtime | None | None (but no PDF) | MS Word required | Pandoc + LaTeX |
| Install size | 4 MB binary | ~5 MB | ~31 KB + Word | 300-650 MB |
| Runs in Docker / CI | Yes | Yes (no PDF) | No | Yes (huge image) |
| WASM / browser | Yes | No | No | No |
python-docx is the most popular DOCX library in any language (~14M PyPI downloads/month), but it has zero conversion capabilities — no PDF, no HTML, no Markdown. Users who need PDF must bolt on a separate tool like LibreOffice (~500 MB) or a commercial API. rdocx gives you the same read/write API plus built-in conversion in a single 4 MB binary.
vs. Java Libraries
| | rdocx | Apache POI | docx4j | Aspose.Words |
|---|---|---|---|---|
| Create DOCX | Yes | Yes | Yes | Yes |
| Read DOCX | Yes | Yes | Yes | Yes |
| PDF (built-in) | Yes | No | Via FOP (limited) | Yes (high fidelity) |
| HTML (built-in) | Yes | No | Yes | Yes |
| License | MIT / Apache-2.0 | Apache-2.0 | Apache-2.0 | $1,199+ |
| Total dependency size | 4 MB | 18-28 MB + JRE | 50-80 MB + JRE | 14 MB + JRE |
| Typical memory (moderate doc) | 10-50 MB | 256 MB - 1 GB | 256 MB - 2 GB | 50-300 MB |
| Cold start | < 10 ms | 2-5 sec | 2-5 sec | 2-5 sec |
| Runtime required | None | JVM (~200 MB) | JVM (~200 MB) | JVM (~200 MB) |
Java solutions carry the JVM's baseline overhead: 50-100 MB of RAM before a single document is loaded, and 2-5 second cold starts from class loading. Apache POI has no built-in PDF at all. docx4j's FOP pipeline is acknowledged by its own maintainer as limited in fidelity. Aspose has excellent PDF output but costs $1,199+ per developer. rdocx delivers comparable capabilities as a zero-dependency native binary.
vs. Other Rust Crates
| | rdocx | docx-rs | docx-rust | ooxmlsdk |
|---|---|---|---|---|
| Create DOCX | Yes | Yes | Yes | Low-level |
| Read DOCX | Yes | Yes | Yes | Low-level |
| Round-trip preservation | Yes | Limited | Limited | N/A |
| Tables, images, headers | Yes | Yes | Basic | Raw XML |
| PDF conversion | Yes | No | No | No |
| HTML / Markdown export | Yes | No | No | No |
| Layout engine | Yes | No | No | No |
| Page-to-image rendering | Yes | No | No | No |
| Template engine | Yes | No | No | No |
| Document merging | Yes | No | No | No |
| Regex find/replace | Yes | No | No | No |
| CLI tool | Yes | No | No | No |
| WASM | Yes | Yes | No | No |
docx-rs (1M+ downloads, 500+ stars) is the most popular Rust DOCX crate, but it is a read/write library only — no conversion, no layout engine, no PDF. The same is true for every other Rust DOCX crate. rdocx is the only Rust crate that combines DOCX read/write with a built-in layout engine and multi-format output (PDF, HTML, Markdown, PNG).
Resource Footprint
| Metric | rdocx (native) | Python + LibreOffice | Java (POI + FOP) |
|---|---|---|---|
| Binary / install size | 4 MB | ~500 MB | ~250 MB (JARs + JRE) |
| Memory (moderate document) | 10-50 MB | ~200-500 MB | ~300 MB - 1.5 GB |
| Cold start | < 10 ms | ~2-4 sec (LibreOffice) | ~2-5 sec (JVM) |
| Serverless / Lambda friendly | Yes | Difficult | Difficult |
| Docker image overhead | ~10 MB (musl static) | ~500 MB+ | ~250 MB+ |
| WASM compatible | Yes | No | No |
Crate Architecture
| Crate | Purpose |
|---|---|
| rdocx | High-level Document API |
| rdocx-opc | OPC/ZIP package I/O |
| rdocx-oxml | OOXML types (CT_Document, CT_PPr, CT_RPr, CT_Tbl, ...) |
| rdocx-layout | Layout engine (text shaping, line breaking, pagination) |
| rdocx-pdf | PDF rendering with font subsetting |
| rdocx-html | HTML and Markdown conversion |
| rdocx-cli | CLI binary |
| rdocx-wasm | WASM bindings (standalone, excluded from workspace) |
Minimum Supported Rust Version
1.93 (edition 2024)
License
Licensed under either of
- MIT license (LICENSE or http://opensource.org/licenses/MIT)
- Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
at your option.