Crate litchi


Litchi - High-performance Rust library for Microsoft Office file formats

Litchi provides a unified, user-friendly API for parsing Microsoft Office documents in both legacy (OLE2) and modern (OOXML) formats. The library automatically detects file formats and provides consistent interfaces for working with documents and presentations.

§Features

  • Unified API: Work with .doc and .docx files (or .ppt and .pptx files) using the same interface
  • Format Auto-detection: No need to specify the file format; it is detected automatically
  • High Performance: Zero-copy parsing with SIMD optimizations where possible
  • Production Ready: Clean API inspired by python-docx and python-pptx
  • Type Safe: Leverages Rust’s type system for safety and correctness
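
To make the auto-detection idea concrete, here is a minimal standalone sketch of signature sniffing. It does not use or reproduce litchi's own detection code: the `Container` enum and `sniff_container` function are hypothetical names invented for this illustration. The only facts relied on are the well-known OLE2 compound file signature and the ZIP local file header signature (OOXML files are ZIP archives).

```rust
/// Container families distinguished by their leading bytes.
/// Hypothetical enum for illustration; this is not litchi's `FileFormat`.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Container {
    /// OLE2 compound file, used by legacy .doc and .ppt files.
    Ole2,
    /// ZIP archive, the container used by .docx and .pptx files.
    Zip,
    /// Neither signature matched.
    Unknown,
}

/// Signature at the start of every OLE2 compound file.
const OLE2_MAGIC: [u8; 8] = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];
/// ZIP local file header signature ("PK\x03\x04").
const ZIP_MAGIC: [u8; 4] = [0x50, 0x4B, 0x03, 0x04];

/// Classify a buffer by comparing its first bytes against known signatures.
fn sniff_container(bytes: &[u8]) -> Container {
    if bytes.starts_with(&OLE2_MAGIC) {
        Container::Ole2
    } else if bytes.starts_with(&ZIP_MAGIC) {
        Container::Zip
    } else {
        Container::Unknown
    }
}

fn main() {
    let legacy = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1, 0x00];
    let modern = b"PK\x03\x04rest-of-archive";
    println!("{:?}", sniff_container(&legacy));
    println!("{:?}", sniff_container(modern));
    println!("{:?}", sniff_container(b"plain text"));
}
```

A signature check like this only identifies the container. Telling a .docx from a .pptx, or a .doc from a .ppt, would additionally require looking inside the archive or compound file, which this sketch does not attempt.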

§Quick Start - Word Documents

use litchi::Document;

// Open any Word document (.doc or .docx) - format auto-detected
let doc = Document::open("document.doc")?;

// Extract all text
let text = doc.text()?;
println!("Document text: {}", text);

// Access paragraphs
for para in doc.paragraphs()? {
    println!("Paragraph: {}", para.text()?);

    // Access runs with formatting
    for run in para.runs()? {
        println!("  Text: {}", run.text()?);
        if run.bold()? == Some(true) {
            println!("    (bold)");
        }
    }
}

// Access tables
for table in doc.tables()? {
    println!("Table with {} rows", table.row_count()?);
    for row in table.rows()? {
        for cell in row.cells()? {
            println!("  Cell: {}", cell.text()?);
        }
    }
}

§Quick Start - PowerPoint Presentations

use litchi::Presentation;

// Open any PowerPoint presentation (.ppt or .pptx) - format auto-detected
let pres = Presentation::open("presentation.ppt")?;

// Extract all text
let text = pres.text()?;
println!("Presentation text: {}", text);

// Get slide count
println!("Total slides: {}", pres.slide_count()?);

// Access individual slides
for (i, slide) in pres.slides()?.iter().enumerate() {
    println!("Slide {}: {}", i + 1, slide.text()?);
}

§Architecture

The library is organized in layers. The high-level API, which most users should use, consists of two types:

  • Document - Unified Word document interface (.doc and .docx)
  • Presentation - Unified PowerPoint interface (.ppt and .pptx)

These automatically detect file formats and provide a consistent API.
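
One common way to build such a layer in Rust is an enum that wraps one backend per format and forwards each call. The sketch below illustrates that pattern only; it is not taken from litchi's source, and `LegacyDoc`, `ModernDoc`, and `UnifiedDoc` are hypothetical stand-ins. Storing legacy text as a single string with `\r` paragraph marks is a deliberate simplification.

```rust
/// Hypothetical legacy backend: one text stream with '\r' paragraph marks.
struct LegacyDoc {
    raw_text: String,
}

/// Hypothetical modern backend: paragraphs already split out.
struct ModernDoc {
    paragraphs: Vec<String>,
}

/// Facade type: callers use the same methods whichever backend is inside.
enum UnifiedDoc {
    Legacy(LegacyDoc),
    Modern(ModernDoc),
}

impl UnifiedDoc {
    /// Return paragraphs in a backend-independent form.
    fn paragraphs(&self) -> Vec<String> {
        match self {
            // Split the legacy stream on paragraph marks, dropping empty pieces.
            UnifiedDoc::Legacy(d) => d
                .raw_text
                .split('\r')
                .filter(|p| !p.is_empty())
                .map(str::to_owned)
                .collect(),
            // The modern backend already stores paragraphs separately.
            UnifiedDoc::Modern(d) => d.paragraphs.clone(),
        }
    }

    /// Join all paragraphs with newlines.
    fn text(&self) -> String {
        self.paragraphs().join("\n")
    }
}

fn main() {
    let legacy = UnifiedDoc::Legacy(LegacyDoc {
        raw_text: "Hello\rWorld\r".to_string(),
    });
    let modern = UnifiedDoc::Modern(ModernDoc {
        paragraphs: vec!["Hello".to_string(), "World".to_string()],
    });
    // Both backends yield the same text through the shared interface.
    assert_eq!(legacy.text(), modern.text());
    println!("{}", legacy.text());
}
```

Enum dispatch keeps the set of backends closed and avoids dynamic allocation for trait objects. A trait with one implementation per format would be an equally reasonable design.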

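§Common Types

Types shared by both formats, such as Error, Result, Length, RGBColor, PlaceholderType, ShapeType, and FileFormat, live in the common module and are re-exported at the crate root.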

§Low-Level Modules (Advanced Use)

  • ole - Direct access to OLE2 format parsers
  • ooxml - Direct access to OOXML format parsers

Most users should use the high-level API and only access low-level modules when format-specific features are needed.

§Re-exports

pub use common::Error;
pub use common::Result;
pub use document::Document;
pub use presentation::Presentation;
pub use common::Length;
pub use common::RGBColor;
pub use common::PlaceholderType;
pub use common::ShapeType;
pub use common::FileFormat;
pub use common::detect_file_format;
pub use common::detect_file_format_from_bytes;

§Modules

  • common - Common types, traits, and utilities shared across formats
  • document - Unified Word document API
  • markdown - Markdown conversion module
  • ole - OLE2 format parser (legacy .doc, .ppt files)
  • ooxml - OOXML format parser (modern .docx, .pptx files)
  • presentation - Unified PowerPoint presentation API
  • sheet - Unified Excel spreadsheet API (placeholder for future functionality)