xlsxzero
Pure-Rust Excel parser and Markdown converter for RAG systems.
Overview
xlsxzero is a high-performance, memory-efficient Rust crate that parses Excel (XLSX) files and converts them to structured Markdown. It is optimized for RAG (Retrieval-Augmented Generation) systems that need to process large Excel files efficiently.
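To make the conversion target concrete, here is a minimal, standalone sketch of rendering already-parsed rows as a GitHub Flavored Markdown table. It does not use xlsxzero's API: `escape_cell`, `render_row`, and `to_gfm_table` are hypothetical names chosen for illustration, and treating the first row as the header and mapping in-cell newlines to `<br>` are assumptions, not documented crate behavior.

```rust
/// Escape characters that would break a GFM table cell.
/// Assumption: newlines inside a cell are rendered as `<br>`.
fn escape_cell(cell: &str) -> String {
    cell.replace('|', "\\|").replace('\n', "<br>")
}

/// Render one row, padding short rows with empty cells up to `width`.
fn render_row(row: &[String], width: usize) -> String {
    let cells: Vec<String> = (0..width)
        .map(|i| row.get(i).map(|c| escape_cell(c)).unwrap_or_default())
        .collect();
    format!("| {} |", cells.join(" | "))
}

/// Render rows as a GFM table, treating the first row as the header.
fn to_gfm_table(rows: &[Vec<String>]) -> String {
    // The table is as wide as its widest row.
    let width = rows.iter().map(|r| r.len()).max().unwrap_or(0);
    if width == 0 {
        return String::new();
    }
    let mut lines = Vec::with_capacity(rows.len() + 1);
    lines.push(render_row(&rows[0], width));
    // GFM requires a delimiter row between the header and the body.
    lines.push(format!("|{}", " --- |".repeat(width)));
    for row in &rows[1..] {
        lines.push(render_row(row, width));
    }
    lines.join("\n")
}

fn main() {
    let rows: Vec<Vec<String>> = vec![
        vec!["Name".into(), "Qty".into()],
        vec!["a|b".into(), "2".into()],
        vec!["c".into()],
    ];
    println!("{}", to_gfm_table(&rows));
}
```

Padding short rows keeps every line at the same column count, which GFM needs to render the block as a table.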
Features
- Pure Rust Implementation: No dependencies on C/C++ libraries
- Streaming Architecture: Process large Excel files with minimal memory footprint
- Structured Markdown Output: Convert Excel tables to GitHub Flavored Markdown
- Cell Merging Support: Handle merged cells with multiple strategies
- Date/Time Conversion: Accurate conversion of Excel serial dates
- Formula Support: Extract cached values or formula strings
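As background for the date/time feature above, the following standalone sketch shows one way to convert an Excel serial date in the 1900 date system to an ISO 8601 string using only the standard library. It is an illustration of the general technique, not xlsxzero's implementation: `civil_from_days` and `excel_serial_to_iso` are hypothetical names, the 1904 date system is not handled, and returning `None` for serial 60 (the nonexistent 1900-02-29 that Excel counts) is an assumed policy.

```rust
/// Convert days since 1970-01-01 to (year, month, day) in the proleptic
/// Gregorian calendar, using the well-known days-to-civil algorithm.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146_096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, from March 1
    let mp = (5 * doy + 2) / 153; // month index, March = 0
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Convert a 1900-system Excel serial to "YYYY-MM-DDTHH:MM:SS".
/// Returns None for non-finite values, serials below 1, and serial 60.
fn excel_serial_to_iso(serial: f64) -> Option<String> {
    if !serial.is_finite() || serial < 1.0 {
        return None;
    }
    let mut whole = serial.floor() as i64;
    // The fractional part is the time of day; round to the nearest second.
    let mut secs = ((serial - serial.floor()) * 86_400.0).round() as i64;
    if secs == 86_400 {
        secs = 0;
        whole += 1;
    }
    if whole == 60 {
        return None; // Excel's fictitious 1900-02-29
    }
    // Serial 25569 is 1970-01-01; serials before the phantom leap day are
    // one day closer to the epoch than the later offset suggests.
    let unix_days = if whole < 60 { whole - 25_568 } else { whole - 25_569 };
    let (y, m, d) = civil_from_days(unix_days);
    Some(format!(
        "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}",
        y, m, d, secs / 3_600, (secs % 3_600) / 60, secs % 60
    ))
}

fn main() {
    for serial in [59.0, 61.0, 45_292.5] {
        println!("{serial} -> {:?}", excel_serial_to_iso(serial));
    }
}
```

The two offsets exist because Excel treats 1900 as a leap year, so every serial from 61 onward is shifted by one day relative to the real calendar.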
Installation
Add this to your Cargo.toml:
[dependencies]
xlsxzero = "0.1.0"
Quick Start
Basic Usage
use std::fs::File;
use xlsxzero::ConverterBuilder;
Custom Configuration
use std::fs::File;
use ;
Convert to String
use std::fs::File;
use xlsxzero::ConverterBuilder;
Examples
The repository includes several example programs demonstrating different use cases:
- Basic Conversion (examples/basic_conversion.rs): Simple file-to-file conversion
- Custom Configuration (examples/custom_config.rs): Using advanced configuration options
- CLI Tool (examples/cli_tool.rs): Building a command-line tool
Run an example:
cargo run --example basic_conversion
Status
This project is currently in early development. Phase I features are being implemented.
Documentation
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines and the issue list for development tasks.
API Documentation
Full API documentation is available at docs.rs/xlsxzero (when published) or by running:
cargo doc --open