Document parsing library for extracting text from various file formats.
This crate parses files and extracts their text content. Supported formats include PDFs, Office documents (DOCX, XLSX, PPTX), text files, and images (using OCR).
§Features
- Automatic file format detection based on content
- Support for various document types:
  - PDF documents
  - Microsoft Office formats (DOCX, XLSX, PPTX)
  - Plain text and structured text (TXT, CSV, JSON)
  - Images with text content via OCR (PNG, JPEG, WebP)
- Memory-efficient processing with minimal temporary file usage
- Consolidated error handling with descriptive error messages
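To illustrate what detection "based on content" usually means, here is a minimal sketch that inspects the leading bytes of a buffer instead of trusting a file extension. It is not this crate's implementation: the `Detected` enum and the `detect` function are hypothetical names invented for the example, and only the well-known file signatures themselves are taken as given.

```rust
/// Hypothetical format labels for this sketch; not the crate's own types.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Detected {
    Pdf,
    /// DOCX, XLSX and PPTX are all ZIP archives, so they share one signature.
    ZipContainer,
    Png,
    Jpeg,
    WebP,
    Text,
    Unknown,
}

/// Guess the format of `data` from its leading bytes (hypothetical helper).
pub fn detect(data: &[u8]) -> Detected {
    if data.starts_with(b"%PDF-") {
        Detected::Pdf
    } else if data.starts_with(b"PK\x03\x04") {
        Detected::ZipContainer
    } else if data.starts_with(b"\x89PNG\r\n\x1a\n") {
        Detected::Png
    } else if data.starts_with(&[0xFF, 0xD8, 0xFF]) {
        Detected::Jpeg
    } else if data.len() >= 12 && data.starts_with(b"RIFF") && data[8..].starts_with(b"WEBP") {
        // WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
        Detected::WebP
    } else if !data.is_empty() && std::str::from_utf8(data).is_ok() {
        // No binary signature matched; valid UTF-8 is treated as text.
        Detected::Text
    } else {
        Detected::Unknown
    }
}
```

A real detector would go further, for example by looking inside the ZIP container to tell DOCX, XLSX, and PPTX apart, or by distinguishing CSV from JSON within the text case.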
§Examples
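use parser_core::parse;
use std::fs;

// Assumes the crate's error type implements std::error::Error
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read the file into memory
    let data = fs::read("document.pdf")?;
    // Parse the bytes to extract the text
    let text = parse(&data)?;
    println!("{}", text);
    Ok(())
}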
Enums§
- ParserError
- Custom error type for the parser library.
Functions§
- parse
- Parses the given data into plain text.