dxpdf 0.1.3

A fast DOCX-to-PDF converter powered by Skia
docs.rs failed to build dxpdf-0.1.3
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.

dxpdf

A fast, lightweight DOCX-to-PDF converter written in Rust, powered by Skia.

Convert Microsoft Word .docx files to high-fidelity PDF documents — from the command line or as a Rust library. No Microsoft Office, LibreOffice, or cloud API required.

Built by nerdy.pro.

Why dxpdf?

  • Fast — converts a 24-page document in ~115ms on Apple Silicon
  • Accurate — Flutter-inspired measure→layout→paint pipeline with pixel-level fidelity
  • Standalone — no external dependencies beyond Skia; no Office installation needed
  • Cross-platform — runs on macOS, Linux, and Windows
  • Dual-use — works as a CLI tool, Rust library (use dxpdf;), or Python package (import dxpdf)

Quick Start

Install

cargo install dxpdf

Convert a file

dxpdf input.docx                  # outputs input.pdf
dxpdf input.docx -o output.pdf    # specify output path

Use as a library

let docx_bytes = std::fs::read("document.docx")?;
let pdf_bytes = dxpdf::convert(&docx_bytes)?;
std::fs::write("output.pdf", &pdf_bytes)?;

You can also inspect or transform the parsed document model:

use dxpdf::{parse, model};

let document = parse::parse(&std::fs::read("document.docx")?)?;

for block in &document.blocks {
    match block {
        model::Block::Paragraph(p) => { /* ... */ }
        model::Block::Table(t) => { /* ... */ }
    }
}

let pdf_bytes = dxpdf::convert_document(&document)?;

Use from Python

pip install dxpdf
import dxpdf

# Bytes in, bytes out
pdf_bytes = dxpdf.convert(open("input.docx", "rb").read())

# File to file
dxpdf.convert_file("input.docx", "output.pdf")

What's Supported

dxpdf handles the most common DOCX features used in real-world business documents:

Category Features
Text Bold, italic, underline, font size/family/color, character spacing, superscript/subscript, run shading
Paragraphs Alignment (left/center/right), spacing (before/after/line with auto/exact/atLeast), indentation, tab stops, paragraph borders, paragraph shading
Tables Column widths, cell margins (3-level cascade), merged cells (gridSpan + vMerge with height distribution), row heights, borders, cell shading, nested tables
Images Inline (PNG/JPEG/BMP/WebP), floating/anchored with alignment and percentage-based positioning
Styles Paragraph styles, character styles (including built-in Hyperlink), basedOn inheritance, document defaults, theme fonts
Headers/Footers Text, images, page numbers (PAGE/NUMPAGES field codes)
Lists Bullets, decimal, lower/upper letter, lower/upper roman with counter tracking
Hyperlinks Clickable PDF link annotations with URL resolution from relationships
Sections Multiple page sizes/margins, section breaks, portrait/landscape
Layout Automatic pagination, word wrapping at spaces and hyphens, line spacing modes, floating image text flow

Building from Source

Prerequisites

  • Rust toolchain (1.70+)
  • clang (required by skia-safe for building Skia bindings)
  • Linux only: libfontconfig1-dev and libfreetype-dev (e.g., sudo apt-get install -y libfontconfig1-dev libfreetype-dev)

Build

cargo build --release

The release binary will be at target/release/dxpdf.

Architecture

dxpdf follows a measure→layout→paint pipeline inspired by Flutter's rendering model:

DOCX (ZIP) → Parse → Document Model (ADT) → Measure → Layout → Paint → Skia PDF

Each layout element (paragraphs, table cells, headers/footers) goes through three phases:

  1. Measure — collect fragments, fit lines, produce draw commands with relative coordinates
  2. Layout — assign absolute positions, handle page breaks, distribute heights (e.g., vMerge spans)
  3. Paint — emit draw commands at final positions (shading → content → borders)

Modules

Module Description
model Algebraic data types representing the document tree (Document, Block, Inline, etc.)
parse DOCX ZIP extraction, event-driven XML parser, style/numbering resolution
render/layout Measure→layout→paint pipeline: fragment (shared line fitting), paragraph, table (three-pass), header_footer
render/painter Skia canvas operations for PDF output
render/fonts Font resolution: tries requested font first, falls back to metric-compatible substitutes
units OOXML unit conversions (twips, EMUs, half-points) — spec-defined constants only

Running Tests

cargo test

The test suite includes 104 unit tests and 9 integration tests covering layout, tables, lists, floats, headers/footers, hyperlinks, superscript/subscript, field codes, and end-to-end conversion.

Visual regression tests compare rendered PDFs against Word-generated references using pixel matching (see VISUAL_COMPARISON.md).

OOXML Feature Coverage

Validated against ISO 29500 (Office Open XML). 34 features fully implemented, 6 partial, 15 not implemented.

Text Formatting (w:rPr)

Feature Status
Bold, italic ✅ with toggle support
Underline ✅ font-proportional stroke width
Font size, family, color
Superscript/subscript
Character spacing
Run shading
Strikethrough
Highlighting
Caps, smallCaps
Shadow, outline, emboss, imprint
Hidden text

Paragraph Properties (w:pPr)

Feature Status
Alignment (left, center, right)
Alignment (justify) ⚠️ parsed, renders left-aligned
Spacing before/after, line spacing ✅ auto/exact/atLeast
Indentation (left, right, first-line, hanging)
Tab stops (left)
Tab stops (center, right, decimal) ⚠️ parsed, render as left
Paragraph shading
Paragraph borders ✅ with adjacent border merging
Keep with next, widow/orphan control

Styles

Feature Status
Paragraph styles, character styles
basedOn inheritance
Document defaults, theme fonts

Tables

Feature Status
Grid columns, cell widths (dxa)
Cell widths (pct, auto) ⚠️ fall back to grid
Cell margins (3-level cascade)
Merged cells (gridSpan, vMerge)
Row heights ✅ min / ⚠️ exact treated as min
Table borders (per-cell, per-table)
Border styles (single)
Border styles (double, dashed, dotted) ⚠️ render as single
Cell shading (solid)
Cell shading (patterns), vertical alignment
Nested tables

Images

Feature Status
Inline images ✅ PNG, JPEG, BMP, WebP
Floating images ✅ offset, align, wp14:pctPos
Wrap modes ✅ none/square/tight/through
VML images

Page Layout

Feature Status
Page size and orientation
Page margins (all 6)
Section breaks (nextPage)
Section breaks (continuous, even, odd) ⚠️ treated as nextPage
Multi-column, page borders, doc grid

Headers/Footers

Feature Status
Default header/footer
First page, even/odd, per-section

Lists

Feature Status
Bullet, decimal, letter, roman
Multi-level lists ⚠️ levels parsed, nesting limited

Fields

Feature Status
PAGE, NUMPAGES
Hyperlinks ✅ clickable PDF annotations
Unknown fields ✅ cached value fallback
TOC, MERGEFIELD, DATE

Other

Feature Status
Footnotes/endnotes ❌ warned
Comments, tracked changes ❌ / ⚠️
Text boxes, shapes, SmartArt, charts
RTL text, automatic hyphenation

Performance

Benchmarked on Apple M3 Max with hyperfine (20 runs, 3 warmup):

Metric Value
Mean conversion time 113 ms
Peak memory (RSS) 19 MB
Test document 3-page form with 11 tables, 2 images, 2 sections

See BENCHMARKS.md for full history.

Dependencies

Crate Purpose
quick-xml Event-driven XML parsing
zip DOCX ZIP archive reading
skia-safe PDF rendering, text measurement, link annotations
clap CLI argument parsing
thiserror Error types
log + env_logger Warnings for unsupported features (RUST_LOG=warn)
pyo3 (optional) Python bindings via maturin

Contributing

Contributions are welcome. Please open an issue before submitting large PRs.

Built by nerdy.pro.

License

MIT