docx-lite 0.1.0

A lightweight, fast DOCX text extraction library for Rust with minimal dependencies.

Features

  • 🚀 Fast - Optimized for speed with streaming XML parsing
  • 🪶 Lightweight - Minimal dependencies (only zip, quick-xml, and thiserror)
  • 🛡️ Safe - Zero unsafe code
  • 📊 Tables - Full support for table text extraction
  • 🎯 Simple API - Easy to use, with a one-call text API and a structured API for finer control
  • 🔧 Robust - Handles malformed documents gracefully

Installation

Add this to your Cargo.toml:

[dependencies]
docx-lite = "0.1.0"

Quick Start

use docx_lite::extract_text;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let text = extract_text("document.docx")?;
    println!("{}", text);
    Ok(())
}

Advanced Usage

use docx_lite::parse_document_from_path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let doc = parse_document_from_path("document.docx")?;

    // Access paragraphs
    for paragraph in &doc.paragraphs {
        println!("Paragraph: {}", paragraph.to_text());

        // Access runs with formatting info
        for run in &paragraph.runs {
            if run.bold {
                println!("  Bold text: {}", run.text);
            }
        }
    }

    // Access tables
    for table in &doc.tables {
        for row in &table.rows {
            for cell in &row.cells {
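                // A cell may hold zero or several paragraphs, so iterate
                // instead of indexing [0], which panics on an empty cell.
                for paragraph in &cell.paragraphs {
                    println!("Cell: {}", paragraph.to_text());
                }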
            }
        }
    }

    Ok(())
}

API

Simple API

  • extract_text(path) - Extract all text from a DOCX file
  • extract_text_from_bytes(bytes) - Extract text from DOCX bytes
  • extract_text_from_reader(reader) - Extract text from any reader

Advanced API

  • parse_document(reader) - Parse DOCX into a structured Document
  • parse_document_from_path(path) - Parse DOCX file into a structured Document

Supported Elements

  • ✅ Paragraphs
  • ✅ Runs (with bold, italic, underline formatting)
  • ✅ Tables (with rows and cells)
  • ✅ Basic text extraction
  • 🚧 Lists (coming soon)
  • 🚧 Headers/Footers (coming soon)
  • 🚧 Footnotes/Endnotes (coming soon)
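
To make the run-level formatting flags concrete, here is a minimal sketch of how formatted runs could be turned into marked-up text. The `Run` and `Paragraph` structs are stand-ins that only mirror names used in this README (`runs`, `text`, `bold`, `to_text`). The `italic` and `underline` field names and the `to_marked_text` helper are assumptions made for illustration, not the crate's actual definitions.

```rust
// Stand-in types so the sketch compiles on its own.
// They are NOT the real docx-lite definitions.
#[derive(Default)]
struct Run {
    text: String,
    bold: bool,
    italic: bool,    // assumed field name
    underline: bool, // assumed field name
}

#[derive(Default)]
struct Paragraph {
    runs: Vec<Run>,
}

impl Paragraph {
    /// Concatenate the text of every run, ignoring formatting.
    fn to_text(&self) -> String {
        self.runs.iter().map(|r| r.text.as_str()).collect()
    }
}

/// Hypothetical helper: wrap italic runs in `_`, bold runs in `**`,
/// and underlined runs in `<u>` tags.
fn to_marked_text(paragraph: &Paragraph) -> String {
    let mut out = String::new();
    for run in &paragraph.runs {
        let mut piece = run.text.clone();
        if run.italic {
            piece = format!("_{}_", piece);
        }
        if run.bold {
            piece = format!("**{}**", piece);
        }
        if run.underline {
            piece = format!("<u>{}</u>", piece);
        }
        out.push_str(&piece);
    }
    out
}
```

With a paragraph made of a plain run `"Hello "` followed by a bold run `"world"`, `to_text` yields `Hello world` and `to_marked_text` yields `Hello **world**`.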

Performance

docx-lite is designed for speed and efficiency:

  • Streaming XML parsing (no full DOM loading)
  • Minimal memory allocation
  • Zero-copy where possible
  • Optimized for text extraction use case
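
The idea behind streaming extraction can be illustrated with a small standard-library sketch: walk the XML once from left to right, keep only a flag for "inside a text element", and never build a tree. This is not docx-lite's implementation (the crate uses quick-xml), and `stream_text` is a hypothetical name. It relies only on the WordprocessingML convention that text lives in `<w:t>` elements and paragraphs are `<w:p>` elements.

```rust
/// Simplified single-pass scanner: collects the text inside `<w:t>` elements
/// and emits a newline at the end of each `<w:p>` paragraph.
/// Limitations: XML entities such as `&amp;` are not decoded, and a `>`
/// inside an attribute value would confuse the tag scan.
fn stream_text(xml: &str) -> String {
    let mut out = String::new();
    let mut rest = xml;
    let mut in_text = false;

    while let Some(lt) = rest.find('<') {
        // Character data before the next tag counts only inside <w:t>.
        if in_text {
            out.push_str(&rest[..lt]);
        }
        let after_lt = &rest[lt + 1..];
        let Some(gt) = after_lt.find('>') else { break };
        let tag = &after_lt[..gt];
        // The tag name is everything up to the first whitespace.
        let name = tag.split(char::is_whitespace).next().unwrap_or("");
        match name {
            // `<w:t />` is self-closing and carries no text.
            "w:t" => in_text = !tag.ends_with('/'),
            "/w:t" => in_text = false,
            "/w:p" => out.push('\n'),
            _ => {}
        }
        rest = &after_lt[gt + 1..];
    }
    out
}
```

Apart from the output string, the scanner keeps only a boolean and a slice into the input, so its working state does not grow with document size. A real parser such as quick-xml follows the same event-driven pattern while also handling entities, namespaces, and encodings.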

Why docx-lite?

Compared with other DOCX libraries in the Rust ecosystem, docx-lite offers:

  1. Modern Rust support - Compiles without issues on the latest Rust versions
  2. Minimal dependencies - Reduces compilation time and security surface
  3. Production readiness - Used in production at V-Lawyer
  4. Focused scope - Does one thing well: text extraction

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is dual-licensed under MIT OR Apache-2.0.

Credits

Developed by the V-Lawyer team as part of our commitment to open source.