EdgeParse Core Library
High-performance PDF-to-structured-data extraction engine. Implements a 20-stage processing pipeline for extracting text, tables, images, and semantic structure from PDF documents.
Modules§
- api
- API module — Configuration and public types.
- hybrid
- Hybrid extraction mode — combines rule-based pipeline with LLM-assisted post-processing for higher-accuracy structured output.
- models
- Data model types for EdgeParse.
- output
- Output generators — TOC, JSON, Markdown, HTML, Text, CSV, Annotated PDF.
- PDF loading layer — document loading, text extraction, line extraction.
- pipeline
- 20-stage processing pipeline orchestrator.
- tagged
- Tagged PDF processor.
- utils
- Shared utility functions.
Enums§
- EdgePdfError
- Top-level error type for EdgeParse operations.
Functions§
- convert
- Main entry point: convert a PDF file to structured data.
- convert_bytes
- Convert a PDF from an in-memory byte slice to structured data.