Litchi - High-performance Rust library for Microsoft Office file formats
Litchi provides a unified, user-friendly API for parsing Microsoft Office documents in both legacy (OLE2) and modern (OOXML) formats. The library automatically detects file formats and provides consistent interfaces for working with documents and presentations.
§Features
- Unified API: Work with .doc and .docx files using the same interface
- Format Auto-detection: No need to specify file format - it’s detected automatically
- High Performance: Zero-copy parsing with SIMD optimizations where possible
- Production Ready: Clean API inspired by python-docx and python-pptx
- Type Safe: Leverages Rust’s type system for safety and correctness
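As an illustration of how format auto-detection can work in principle, the sketch below inspects the leading bytes of a file. It is not litchi's implementation: the `Container` enum and `sniff_container` function are hypothetical names introduced here. It relies only on two well-known signatures: OLE2 compound files begin with the bytes `D0 CF 11 E0 A1 B1 1A E1`, and OOXML files are ZIP packages, which normally begin with `PK\x03\x04`.

```rust
/// Container kinds distinguishable from the first bytes of a file.
/// Hypothetical type for illustration; this is not litchi's `FileFormat`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Container {
    /// OLE2 compound file (legacy .doc, .ppt).
    Ole2,
    /// ZIP archive (modern .docx and .pptx files are ZIP packages).
    Zip,
    /// Neither signature matched.
    Unknown,
}

/// Signature at the start of every OLE2 compound file.
const OLE2_MAGIC: [u8; 8] = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];

/// ZIP local file header signature ("PK\x03\x04").
const ZIP_MAGIC: [u8; 4] = [0x50, 0x4B, 0x03, 0x04];

/// Classify a byte buffer by its leading signature (hypothetical helper).
fn sniff_container(bytes: &[u8]) -> Container {
    if bytes.starts_with(&OLE2_MAGIC) {
        Container::Ole2
    } else if bytes.starts_with(&ZIP_MAGIC) {
        Container::Zip
    } else {
        Container::Unknown
    }
}
```

A signature check like this only identifies the container. Telling a Word file from a PowerPoint file takes a further look inside it, for example at the stream names in an OLE2 file or the content types listed in an OOXML package.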
§Quick Start - Word Documents
use litchi::Document;
// Open any Word document (.doc or .docx) - format auto-detected
let doc = Document::open("document.doc")?;
// Extract all text
let text = doc.text()?;
println!("Document text: {}", text);
// Access paragraphs
for para in doc.paragraphs()? {
    println!("Paragraph: {}", para.text()?);
    // Access runs with formatting
    for run in para.runs()? {
        println!(" Text: {}", run.text()?);
        if run.bold()? == Some(true) {
            println!(" (bold)");
        }
    }
}
// Access tables
for table in doc.tables()? {
    println!("Table with {} rows", table.row_count()?);
    for row in table.rows()? {
        for cell in row.cells()? {
            println!(" Cell: {}", cell.text()?);
        }
    }
}
§Quick Start - PowerPoint Presentations
use litchi::Presentation;
// Open any PowerPoint presentation (.ppt or .pptx) - format auto-detected
let pres = Presentation::open("presentation.ppt")?;
// Extract all text
let text = pres.text()?;
println!("Presentation text: {}", text);
// Get slide count
println!("Total slides: {}", pres.slide_count()?);
// Access individual slides
for (i, slide) in pres.slides()?.iter().enumerate() {
    println!("Slide {}: {}", i + 1, slide.text()?);
}
§Architecture
The library is organized into several layers:
§High-Level API (Recommended)
- Document - Unified Word document interface (.doc and .docx)
- Presentation - Unified PowerPoint interface (.ppt and .pptx)
These automatically detect file formats and provide a consistent API.
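One common way to offer a single interface over two format backends is an enum that wraps each backend and forwards calls to it. The sketch below shows that pattern in isolation. Every name in it (`LegacyDoc`, `ModernDoc`, `UnifiedDoc`) is a hypothetical stand-in, and it makes no claim about how litchi structures its `Document` type internally.

```rust
/// Stand-in for a parsed legacy (OLE2) document; hypothetical.
struct LegacyDoc {
    text: String,
}

/// Stand-in for a parsed OOXML document; hypothetical.
struct ModernDoc {
    paragraphs: Vec<String>,
}

/// A single public type that hides which backend parsed the file.
enum UnifiedDoc {
    Legacy(LegacyDoc),
    Modern(ModernDoc),
}

impl UnifiedDoc {
    /// Callers use the same method regardless of the underlying format.
    fn text(&self) -> String {
        match self {
            // The legacy stand-in already holds its text as one string.
            UnifiedDoc::Legacy(doc) => doc.text.clone(),
            // The modern stand-in joins its paragraphs with newlines.
            UnifiedDoc::Modern(doc) => doc.paragraphs.join("\n"),
        }
    }
}
```

With this shape, format-specific details stay inside the `match` arms, and calling code never branches on the file type.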
§Common Types
- common::Error - Unified error type
- common::Result - Result type alias
- common::ShapeType - Common shape types
- common::RGBColor - Color representation
- common::Length - Measurement with units
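To make "measurement with units" concrete, here is a minimal sketch of a unit-aware length type. It assumes storage in English Metric Units (EMU), the unit OOXML uses for drawing measurements, where 1 inch = 914,400 EMU and 1 point = 12,700 EMU. The `Emu` type and its methods are hypothetical and are not the API of `common::Length`.

```rust
/// Hypothetical length stored as a whole number of English Metric Units.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Emu(i64);

impl Emu {
    /// EMU per inch.
    const PER_INCH: i64 = 914_400;
    /// EMU per typographic point (1/72 inch).
    const PER_POINT: i64 = 12_700;

    /// Build a length from inches, rounding to the nearest EMU.
    fn from_inches(inches: f64) -> Self {
        Emu((inches * Self::PER_INCH as f64).round() as i64)
    }

    /// Build a length from points, rounding to the nearest EMU.
    fn from_points(points: f64) -> Self {
        Emu((points * Self::PER_POINT as f64).round() as i64)
    }

    /// Convert back to inches.
    fn inches(self) -> f64 {
        self.0 as f64 / Self::PER_INCH as f64
    }

    /// Convert back to points.
    fn points(self) -> f64 {
        self.0 as f64 / Self::PER_POINT as f64
    }
}
```

Storing an integer count of a small base unit avoids accumulating floating-point error when lengths are added or compared, and conversions to display units happen only at the edges.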
§Low-Level Modules (Advanced Use)
Most users should use the high-level API and only access low-level modules when format-specific features are needed.
Re-exports§
pub use common::Error;
pub use common::Result;
pub use document::Document;
pub use presentation::Presentation;
pub use common::Length;
pub use common::RGBColor;
pub use common::PlaceholderType;
pub use common::ShapeType;
pub use common::FileFormat;
pub use common::detect_file_format;
pub use common::detect_file_format_from_bytes;
Modules§
- common - Common types, traits, and utilities shared across formats
- document - Unified Word document API
- markdown - Markdown conversion module
- ole - OLE2 format parser (legacy .doc, .ppt files)
- ooxml - OOXML format parser (modern .docx, .pptx files)
- presentation - Unified PowerPoint presentation API
- sheet - Unified Excel spreadsheet API (placeholder for future functionality)