Crate edgeparse_core

EdgeParse Core Library

High-performance PDF-to-structured-data extraction engine. Implements a 20-stage processing pipeline for extracting text, tables, images, and semantic structure from PDF documents.

Modules

api
API module — Configuration and public types.
hybrid
Hybrid extraction mode — combines rule-based pipeline with LLM-assisted post-processing for higher-accuracy structured output.
models
Data model types for EdgeParse.
output
Output generators — TOC, JSON, Markdown, HTML, Text, CSV, Annotated PDF.
pdf
PDF loading layer — document loading, text extraction, line extraction.
pipeline
20-stage processing pipeline orchestrator.
tagged
Tagged PDF processor.
utils
Shared utility functions.

Enums

EdgePdfError
Top-level error type for EdgeParse operations.

Functions

convert
Main entry point: convert a PDF file to structured data.
convert_bytes
Convert a PDF from an in-memory byte slice to structured data.
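
The two entry points above differ only in where the PDF data comes from: a file on disk or a byte slice already in memory. This listing does not show their signatures, so the sketch below does not call edgeparse_core at all. It uses hypothetical stand-ins (`sketch_convert`, `sketch_convert_bytes`, `SketchDocument`, `SketchError`) to illustrate one common way such a pair is arranged, where the path-based function reads the file and delegates to the byte-slice function. Whether the crate is organized this way is an assumption.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for the crate's structured output (not the real type).
#[derive(Debug, PartialEq)]
pub struct SketchDocument {
    /// Number of input bytes that were handed to the converter.
    pub byte_len: usize,
    /// True when the input starts with the standard `%PDF-` file header.
    pub looks_like_pdf: bool,
}

/// Hypothetical stand-in for a top-level error enum such as `EdgePdfError`.
#[derive(Debug)]
pub enum SketchError {
    /// The file could not be read from disk.
    Io(io::Error),
    /// The input contained no bytes.
    Empty,
}

impl From<io::Error> for SketchError {
    fn from(e: io::Error) -> Self {
        SketchError::Io(e)
    }
}

/// Byte-slice entry point (hypothetical): all real work would live here.
pub fn sketch_convert_bytes(data: &[u8]) -> Result<SketchDocument, SketchError> {
    if data.is_empty() {
        return Err(SketchError::Empty);
    }
    Ok(SketchDocument {
        byte_len: data.len(),
        looks_like_pdf: data.starts_with(b"%PDF-"),
    })
}

/// Path entry point (hypothetical): reads the file, then delegates.
pub fn sketch_convert(path: impl AsRef<Path>) -> Result<SketchDocument, SketchError> {
    let data = fs::read(path)?; // io::Error converts into SketchError::Io
    sketch_convert_bytes(&data)
}
```

With this arrangement, callers that already hold the PDF in memory (for example from a network response) can skip the filesystem entirely, and both paths share a single error type. For the real signatures and return types of `convert` and `convert_bytes`, consult their individual documentation pages.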