Crate edgeparse_core

Expand description

EdgeParse Core Library

High-performance PDF-to-structured-data extraction engine. Implements a 20-stage processing pipeline for extracting text, tables, images, and semantic structure from PDF documents.

Modules§

api: API module — Configuration and public types.
hybrid: Hybrid extraction mode — combines rule-based pipeline with LLM-assisted post-processing for higher-accuracy structured output.
models: Data model types for EdgeParse.
output: Output generators — TOC, JSON, Markdown, HTML, Text, CSV, Annotated PDF.
pdf: PDF loading layer — document loading, text extraction, line extraction.
pipeline: 20-stage processing pipeline orchestrator.
tagged: Tagged PDF processor.
utils: Shared utility functions.

Enums§

EdgePdfError: Top-level error type for EdgeParse operations.

Functions§

convert: Main entry point: convert a PDF file to structured data.
convert_bytes: Convert a PDF from an in-memory byte slice to structured data.

Crate edgeparse_core

Crate edgeparse_core Copy item path

Modules§

Enums§

Functions§

Crate edgeparse_core