Crate parser_core

Document parsing library for extracting text from various file formats.

This crate provides functionality for parsing and extracting text content from different file formats including PDFs, Office documents (DOCX, XLSX, PPTX), text files, and images (using OCR).

Features

  • Automatic file format detection based on content
  • Support for various document types:
    • PDF documents
    • Microsoft Office formats (DOCX, XLSX, PPTX)
    • Plain text and structured text (TXT, CSV, JSON)
    • Images with text content via OCR (PNG, JPEG, WebP)
  • Memory-efficient processing with minimal temporary file usage
  • Consolidated error handling with descriptive error messages
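Content-based detection, as listed above, usually means inspecting the first few bytes of the input rather than trusting a file extension. The following is a minimal sketch of that idea using only the standard library. The `Format` enum and `detect_format` function are hypothetical names made up for illustration; they are not part of this crate's API, and the crate's actual detection logic may differ.

```rust
/// Hypothetical format tag for illustration; not a parser_core type.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Format {
    Pdf,
    /// ZIP container, which DOCX, XLSX and PPTX files use.
    Zip,
    Png,
    Jpeg,
    WebP,
    Text,
    Unknown,
}

/// Guess the format from well-known leading "magic" bytes.
/// This is an assumed approach, not the crate's implementation.
fn detect_format(data: &[u8]) -> Format {
    if data.starts_with(b"%PDF-") {
        Format::Pdf
    } else if data.starts_with(b"PK\x03\x04") {
        Format::Zip
    } else if data.starts_with(b"\x89PNG\r\n\x1a\n") {
        Format::Png
    } else if data.starts_with(&[0xFF, 0xD8, 0xFF]) {
        Format::Jpeg
    } else if data.len() >= 12 && &data[0..4] == b"RIFF" && &data[8..12] == b"WEBP" {
        // WebP is a RIFF container with "WEBP" at byte offset 8.
        Format::WebP
    } else if !data.is_empty() && std::str::from_utf8(data).is_ok() {
        // Fallback: non-empty valid UTF-8 is treated as plain text.
        Format::Text
    } else {
        Format::Unknown
    }
}

fn main() {
    println!("{:?}", detect_format(b"%PDF-1.7\n"));
    println!("{:?}", detect_format(b"name,age\nAda,36\n"));
}
```

A ZIP signature alone cannot tell DOCX, XLSX and PPTX apart; a real implementation would presumably need to look inside the archive to distinguish them.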

Examples

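use parser_core::parse;
use std::fs;

// Assumes ParserError implements std::error::Error, so `?` can box both error types.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read the whole file into memory
    let data = fs::read("document.pdf")?;

    // Parse the bytes to extract text
    let text = parse(&data)?;
    println!("{}", text);
    Ok(())
}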

Enums

ParserError
Custom error type for the parser library.

Functions

parse
Parses the given data into plain text.