docx-lite
A lightweight, fast DOCX text extraction library for Rust with minimal dependencies.
Features
- Fast - Optimized for speed with streaming XML parsing
- Lightweight - Minimal dependencies (only zip, quick-xml, and thiserror)
- Safe - Zero unsafe code
- Tables - Full support for table text extraction
- Simple API - Easy to use with both simple and advanced APIs
- Robust - Handles malformed documents gracefully
Installation
Add this to your Cargo.toml:
[dependencies]
docx-lite = "0.2.0"
Quick Start
use docx_lite::extract_text;
Advanced Usage
use ;
API
Simple API
- extract_text(path) - Extract all text from a DOCX file
- extract_text_from_bytes(bytes) - Extract text from DOCX bytes
- extract_text_from_reader(reader) - Extract text from any reader
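The three simple entry points differ only in where the input comes from. One common way to structure such an API is to funnel the path and bytes variants into the reader-based function. The sketch below assumes that layering; the function names (sketch_extract_from_reader and friends) are hypothetical, and the placeholder body simply reads UTF-8 text instead of opening a DOCX ZIP archive, so this is not docx-lite's implementation. The Read + Seek bound mirrors the zip crate's ZipArchive, which needs to seek to the archive's central directory; whether docx-lite's extract_text_from_reader requires Seek is not stated here.

```rust
use std::fs::File;
use std::io::{self, BufReader, Cursor, Read, Seek};
use std::path::Path;

/// Hypothetical reader-based core. A real extractor would open the ZIP
/// archive and stream word/document.xml; this placeholder just reads the
/// whole input as UTF-8 so the layering can be shown with std only.
pub fn sketch_extract_from_reader<R: Read + Seek>(mut reader: R) -> io::Result<String> {
    let mut text = String::new();
    reader.read_to_string(&mut text)?;
    Ok(text)
}

/// Hypothetical bytes entry point: wrap the slice in a Cursor, which
/// implements both Read and Seek, then delegate to the reader core.
pub fn sketch_extract_from_bytes(bytes: &[u8]) -> io::Result<String> {
    sketch_extract_from_reader(Cursor::new(bytes))
}

/// Hypothetical path entry point: open the file, buffer it (BufReader
/// keeps Seek when the inner reader has it), and delegate.
pub fn sketch_extract_from_path<P: AsRef<Path>>(path: P) -> io::Result<String> {
    let file = File::open(path)?;
    sketch_extract_from_reader(BufReader::new(file))
}
```

Keeping all parsing behind the reader function means the path and bytes variants stay thin wrappers, so there is only one code path to test and optimize.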
Advanced API
- parse_document(reader) - Parse a DOCX into a structured Document
- parse_document_from_path(path) - Parse a DOCX file into a structured Document
Supported Elements
- Paragraphs
- Runs (with bold, italic, underline formatting)
- Tables (with rows and cells)
- Lists (bullets and numbering) - NEW in v0.2.0
- Headers/Footers - NEW in v0.2.0
- Footnotes/Endnotes - NEW in v0.2.0
- Advanced text extraction with options
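The shape of the structured Document returned by the advanced API is not spelled out above. As a rough mental model only, the sketch below shows one way the paragraph, run, and table elements could be represented and flattened to plain text. Every type, field, and method name here (Document, Block, Paragraph, Run, Table, to_plain_text) is hypothetical and not taken from docx-lite's API; lists, headers/footers, and notes are left out for brevity, and joining table cells with tabs is an arbitrary choice for this sketch.

```rust
/// Hypothetical text run; the flags mirror the "Runs" bullet above.
#[derive(Debug, Clone, Default)]
pub struct Run {
    pub text: String,
    pub bold: bool,
    pub italic: bool,
    pub underline: bool,
}

#[derive(Debug, Clone, Default)]
pub struct Paragraph {
    pub runs: Vec<Run>,
}

impl Paragraph {
    /// Concatenate run text; formatting does not affect plain-text output.
    pub fn text(&self) -> String {
        self.runs.iter().map(|r| r.text.as_str()).collect()
    }
}

/// Hypothetical table: rows -> cells -> paragraphs inside each cell.
#[derive(Debug, Clone, Default)]
pub struct Table {
    pub rows: Vec<Vec<Vec<Paragraph>>>,
}

/// Top-level body content is either a paragraph or a table.
#[derive(Debug, Clone)]
pub enum Block {
    Paragraph(Paragraph),
    Table(Table),
}

#[derive(Debug, Clone, Default)]
pub struct Document {
    pub body: Vec<Block>,
}

impl Document {
    /// One line per paragraph; for tables, one line per row with cells
    /// separated by tabs and a cell's paragraphs joined by spaces.
    pub fn to_plain_text(&self) -> String {
        let mut lines: Vec<String> = Vec::new();
        for block in &self.body {
            match block {
                Block::Paragraph(p) => lines.push(p.text()),
                Block::Table(t) => {
                    for row in &t.rows {
                        let cells: Vec<String> = row
                            .iter()
                            .map(|cell| cell.iter().map(Paragraph::text).collect::<Vec<_>>().join(" "))
                            .collect();
                        lines.push(cells.join("\t"));
                    }
                }
            }
        }
        lines.join("\n")
    }
}
```

A model like this is what lets an "advanced" API expose formatting (for example, bold runs) while the simple API only needs the flattened string.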
Performance
docx-lite is designed for speed and efficiency:
- Streaming XML parsing (no full DOM loading)
- Minimal memory allocation
- Zero-copy where possible
- Optimized for text extraction use case
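To make "streaming XML parsing (no full DOM loading)" concrete: in WordprocessingML, visible text lives in w:t elements inside runs (w:r), which sit inside paragraphs (w:p). The deliberately simplified sketch below walks the XML once, keeps a single "inside w:t" flag, and copies character data straight into one output buffer, so no tree is ever built. It assumes the input is the already-decompressed word/document.xml using the conventional w: prefix, uses a hand-rolled std-only tag scanner in place of quick-xml's event reader, and skips entity decoding (such as &amp;), CDATA, and comments. It is an illustration of the technique, not docx-lite's parser.

```rust
/// Single-pass scan of WordprocessingML. Text is copied only from inside
/// <w:t> elements; each closing </w:p> (or empty <w:p/>) ends a line.
/// State is one boolean plus the output String; no DOM is built.
pub fn stream_text(xml: &str) -> String {
    let mut out = String::new();
    let mut in_text = false; // true while between <w:t ...> and </w:t>
    let mut rest = xml;

    while let Some(lt) = rest.find('<') {
        // Character data before the next tag belongs to the current element.
        if in_text {
            out.push_str(&rest[..lt]);
        }
        let after = &rest[lt + 1..];
        let gt = match after.find('>') {
            Some(i) => i,
            None => break, // truncated tag: stop instead of panicking
        };
        let tag = &after[..gt];
        let closing = tag.starts_with('/');
        let self_closing = tag.ends_with('/');
        // Element name: text up to the first whitespace or '/'.
        let name = tag
            .trim_start_matches('/')
            .split(|c: char| c.is_whitespace() || c == '/')
            .next()
            .unwrap_or("");

        match name {
            "w:t" if closing => in_text = false,
            "w:t" if !self_closing => in_text = true,
            "w:p" if closing || self_closing => out.push('\n'),
            _ => {} // runs, properties, tables, etc. carry no text themselves
        }
        rest = &after[gt + 1..];
    }
    out
}
```

Because text is sliced directly out of the input and appended to a single buffer, memory use grows with the extracted text rather than with the size of the XML tree.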
Why docx-lite?
Unlike other DOCX libraries in the Rust ecosystem, docx-lite:
- Compiles on modern Rust - No issues with the latest Rust versions
- Minimal dependencies - Reduces compilation time and security surface
- Production-ready - Used in production at V-Lawyer
- Focused scope - Does one thing well: text extraction
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is dual-licensed under MIT OR Apache-2.0.
Credits
Developed by the V-Lawyer team as part of our commitment to open source.