docs.rs failed to build liteparse-2.0.1
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.
LiteParse
Rust library and CLI for fast, lightweight PDF and document parsing with spatial text extraction. Runs entirely locally with zero cloud dependencies.
LiteParse is also available for Node.js/TypeScript, Python, and the browser (WASM). See the project README for all options.
Installation
Add to your Cargo.toml:
[dependencies]
liteparse = "2"
Or install the CLI:
Quick Start
use ;
async
Configuration
use ;
let config = LiteParseConfig ;
let parser = new;
Parsing from Bytes
use PdfInput;
let pdf_bytes: = read?;
let result = parser.parse_input.await?;
println!;
Custom OCR Engine
Implement the OcrEngine trait to plug in your own OCR backend:
use OcrEngine;
use Arc;
let parser = new
.with_ocr_engine;
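The plug-in pattern described above can be sketched without the crate itself. The example below is a self-contained illustration, not the liteparse API: `TextRecognizer`, `ByteCountRecognizer`, `SketchParser`, `with_recognizer`, and `ocr_page` are all hypothetical names, and the real `OcrEngine` trait may define different methods and signatures. It only shows the general shape of handing a backend to a parser as a shared trait object via `Arc`.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an OCR backend trait. The real `OcrEngine`
/// trait in liteparse may have different methods and signatures.
trait TextRecognizer: Send + Sync {
    fn recognize(&self, image: &[u8]) -> Result<String, String>;
}

/// Toy backend that reports the input size instead of doing real OCR.
struct ByteCountRecognizer;

impl TextRecognizer for ByteCountRecognizer {
    fn recognize(&self, image: &[u8]) -> Result<String, String> {
        if image.is_empty() {
            return Err("empty image".to_string());
        }
        Ok(format!("<{} bytes of image data>", image.len()))
    }
}

/// Hypothetical parser that holds its backend behind a shared trait object.
struct SketchParser {
    ocr: Option<Arc<dyn TextRecognizer>>,
}

impl SketchParser {
    fn new() -> Self {
        SketchParser { ocr: None }
    }

    /// Builder-style setter, similar in shape to `with_ocr_engine`.
    fn with_recognizer(mut self, engine: Arc<dyn TextRecognizer>) -> Self {
        self.ocr = Some(engine);
        self
    }

    /// Runs the configured backend, or returns `None` if none is set.
    fn ocr_page(&self, image: &[u8]) -> Option<Result<String, String>> {
        self.ocr.as_ref().map(|engine| engine.recognize(image))
    }
}

fn main() {
    let parser = SketchParser::new().with_recognizer(Arc::new(ByteCountRecognizer));
    println!("{:?}", parser.ocr_page(&[0u8; 16]));
}
```

Keeping the backend behind `Arc<dyn Trait>` lets the same engine be shared across threads or parser instances, which is presumably why the original snippet imports `Arc`.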
Features
- tesseract (default): Built-in Tesseract OCR via tesseract-rs. Disable with default-features = false if you don't need OCR or want to use an HTTP OCR server instead.
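As a sketch, opting out of the default feature in Cargo.toml would look roughly like this, assuming the same "2" version requirement used in the installation section and no other features enabled:

```toml
[dependencies]
# Opt out of the default `tesseract` feature
liteparse = { version = "2", default-features = false }
```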
Supported Formats
- PDF (.pdf)
- Microsoft Office (.docx, .xlsx, .pptx, etc.): requires LibreOffice
- OpenDocument (.odt, .ods, .odp): requires LibreOffice
- Images (.png, .jpg, .tiff, etc.): requires ImageMagick
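To check up front which external tool a given file would need, the list above can be turned into a small lookup. This is only a sketch: `ExternalTool` and `required_tool` are hypothetical helpers, not part of liteparse, the extension table covers only the extensions named above (the "etc." entries are left out), and matching extensions case-insensitively is an assumption.

```rust
use std::path::Path;

/// External program a format depends on, per the list above.
#[derive(Debug, PartialEq)]
enum ExternalTool {
    NoneNeeded,
    LibreOffice,
    ImageMagick,
}

/// Hypothetical helper: maps a file extension to the tool it requires.
/// Returns `None` for extensions that are not in the list above.
fn required_tool(path: &Path) -> Option<ExternalTool> {
    // Assumption: extensions are compared case-insensitively.
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "pdf" => Some(ExternalTool::NoneNeeded),
        "docx" | "xlsx" | "pptx" | "odt" | "ods" | "odp" => Some(ExternalTool::LibreOffice),
        "png" | "jpg" | "tiff" => Some(ExternalTool::ImageMagick),
        _ => None,
    }
}

fn main() {
    for name in ["report.pdf", "slides.pptx", "scan.tiff", "notes.txt"] {
        println!("{name}: {:?}", required_tool(Path::new(name)));
    }
}
```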
CLI
The crate also builds the lit CLI binary:
See lit --help for all options.
License
Apache-2.0