Litchi
A high-performance Rust library for parsing Microsoft Office file formats (OLE2 and OOXML), OpenDocument formats (ODF), and Apple iWork files. Supports .doc, .docx, .xls, .xlsx, .xlsb, .ppt, .pptx, .odt, .ods, .odp, .pages, .numbers, and .key files.
[!WARNING] ⚠️ Active Development: This library is under active development. The API may change without notice. Not recommended for production use yet.
[!NOTE] The current logo was generated by AI, and we are looking for someone to design a better one. If you are interested, please contact me via email.
Features
- Unified API - Same interface for legacy and modern formats with automatic format detection
- Microsoft Office - Parse .doc, .docx, .xls, .xlsx, .xlsb, .ppt, .pptx files
- OpenDocument - Parse .odt, .ods, .odp files (ODF format)
- Apple iWork - Parse .pages, .numbers, .key files (IWA format)
- Formula Conversion - Parse MathType and Office MathML equations and convert to LaTeX
- Markdown Conversion - Convert documents and presentations to Markdown format
- Memory Efficient - Direct byte buffer support with zero-copy parsing where possible
- High Performance - SIMD optimizations, minimal allocations, efficient memory layout
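As a rough illustration of what automatic format detection can look like, the sketch below tells legacy OLE2 files apart from ZIP-based packages (such as OOXML and ODF) by their leading signature bytes. It is not Litchi's actual implementation; `Container` and `sniff_container` are hypothetical names introduced only for this example, and a real detector would also need to inspect the container's contents to tell, say, .docx from .xlsx.

```rust
/// Container families that can be told apart from a file's first bytes.
/// Hypothetical type for illustration; not part of Litchi's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Container {
    /// OLE2 compound file, used by legacy .doc, .xls, and .ppt.
    Ole2,
    /// ZIP archive, used by OOXML and ODF packages.
    Zip,
    /// Neither signature matched.
    Unknown,
}

/// Signature found at the start of an OLE2 compound file.
const OLE2_MAGIC: [u8; 8] = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];

/// Local file header signature that begins a non-empty ZIP archive.
const ZIP_MAGIC: [u8; 4] = [b'P', b'K', 0x03, 0x04];

/// Classifies a byte buffer by its leading signature without copying it.
pub fn sniff_container(bytes: &[u8]) -> Container {
    if bytes.starts_with(&OLE2_MAGIC) {
        Container::Ole2
    } else if bytes.starts_with(&ZIP_MAGIC) {
        Container::Zip
    } else {
        Container::Unknown
    }
}
```

Because the function only borrows the buffer, it fits the byte-buffer style of parsing described above: the caller can pass the first few bytes of a memory-mapped or already-loaded file.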
Quick Start
use …;
// Microsoft Office formats - format auto-detected
let doc = …::open(…)?; // .doc or .docx
let text = doc.text()?;
let pres = …::open(…)?; // .ppt or .pptx
let slide_count = pres.slide_count()?;
// Excel spreadsheets
use …::open_xls_workbook;
let workbook = open_xls_workbook(…)?;
let worksheet = workbook.worksheet_by_name(…)?;
// OpenDocument formats (requires "odf" feature)
// Apple iWork formats (requires "iwa" feature)
// Formula conversion (requires "formula" feature)
Installation
Add to your Cargo.toml:
[dependencies]
litchi = "0.0.1"
Optional Features
By default, only Microsoft Office formats are enabled (ole and ooxml features). Enable additional features as needed:
[dependencies]
# Enable all features
litchi = { version = "0.0.1", features = ["odf", "iwa", "formula", "imgconv"] }
# Or enable specific features
litchi = { version = "0.0.1", features = ["odf"] } # OpenDocument support
litchi = { version = "0.0.1", features = ["iwa"] } # Apple iWork support
litchi = { version = "0.0.1", features = ["formula"] } # Formula parsing and LaTeX conversion
litchi = { version = "0.0.1", features = ["imgconv"] } # Image conversion support
Available Features:
- ole (default) - Legacy Office formats (.doc, .xls, .ppt)
- ooxml (default) - Modern Office formats (.docx, .xlsx, .pptx)
- odf - OpenDocument formats (.odt, .ods, .odp)
- iwa - Apple iWork formats (.pages, .numbers, .key)
- formula - MathType and Office MathML to LaTeX conversion
- imgconv - Image format conversion (EMF, WMF, PICT to PNG/JPEG/WebP)
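To make the feature-to-format mapping concrete, here is a small sketch that looks up which Cargo feature covers a given file extension, using only the pairs listed above. `required_feature` is a hypothetical helper written for this example and is not part of Litchi's API. The .xlsb extension is left out because the feature list does not say which feature covers it.

```rust
/// Returns the name of the Cargo feature that covers a file extension,
/// following the feature list in this README.
/// Hypothetical helper for illustration; not part of Litchi's API.
pub fn required_feature(extension: &str) -> Option<&'static str> {
    // Accept ".docx" as well as "docx", in any letter case.
    let normalized = extension.trim_start_matches('.').to_ascii_lowercase();
    match normalized.as_str() {
        "doc" | "xls" | "ppt" => Some("ole"),
        "docx" | "xlsx" | "pptx" => Some("ooxml"),
        "odt" | "ods" | "odp" => Some("odf"),
        "pages" | "numbers" | "key" => Some("iwa"),
        // Extensions not named in the feature list.
        _ => None,
    }
}
```

A build script or a CLI wrapper could use a lookup like this to tell users which feature to enable when they try to open a file whose format was not compiled in.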
Documentation
For detailed documentation, API reference, and examples, visit docs.rs/litchi.
Current Status
Implemented:
- ✅ Microsoft Office formats (.doc, .docx, .xls, .xlsx, .xlsb, .ppt, .pptx)
- ✅ OpenDocument formats (.odt, .ods, .odp)
- ✅ Apple iWork formats (.pages, .numbers, .key)
- ✅ Text extraction and basic formatting
- ✅ Table and shape parsing
- ✅ Formula parsing (MathType and MathML to LaTeX)
- ✅ Markdown conversion
Limitations:
- Read-only (no document creation or modification)
- Basic formatting support only
- No formula evaluation, charts, headers/footers, or embedded objects
- No support yet for advanced features such as styles, themes, comments, and revisions
See docs.rs/litchi for the complete roadmap and planned features.
License
Licensed under the Apache License, Version 2.0.
Acknowledgments
This library is built upon the work of many open-source projects. We are grateful to the following projects for their research, documentation, and reference implementations:
Microsoft Office Formats:
- Apache POI - Java library for Microsoft Office formats
- python-docx - Python library for DOCX files
- python-pptx - Python library for PPTX files
- openpyxl - Python library for XLSX files
- calamine - Rust Excel/ODS reader
- pyxlsb2 - Python XLSB parser
- xlrd - Python library for reading Excel files
OpenDocument Formats:
Apple iWork Formats:
- libetonyek - Library for Apple Keynote presentations
- iWorkFileFormat - iWork file format documentation
- pyiwa - Python iWork Archive parser
RTF Formats:
- rtf2latex2e - RTF to LaTeX converter
- RtfDomParser - RTF parser
Formula Conversion:
- plurimath - Multi-format mathematical formula converter
Image Conversion:
- libemf2svg - EMF/WMF to SVG conversion
- libwmf - Windows Metafile library
- pict2png - PICT to PNG conversion
Utilities:
Specifications:
All projects retain their original licenses.