edgeparse-cli-0.2.3 is not a library.
edgeparse
High-performance PDF-to-structured-data extraction CLI.
Convert PDF documents to Markdown, JSON, HTML, or plain text with a single
command. Built on top of edgeparse-core.
Installation
Usage
# Convert to Markdown (default format is JSON)
# Convert to multiple formats
# Extract specific pages
# Use XY-Cut reading order (enabled by default)
# Extract with cluster-based table detection
# Extract images externally
Features
- Multiple output formats — JSON, Markdown, HTML, plain text, DOCX, CSV
- Table detection — border-based and cluster-based detection methods
- Reading order — XY-Cut++ algorithm for correct multi-column reading order
- Image extraction — embedded base64 or external file output (PNG/JPEG)
- Content safety — filters hidden text, off-page content, watermarks
- Encrypted PDFs — password-based decryption support
- Tagged PDFs — uses PDF structure tree when available
- PII sanitization — optional personal data redaction
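
To illustrate the reading-order feature, here is a minimal sketch of the classic recursive XY-Cut idea: find the widest empty band between text blocks, split the page there, and recurse into each part. It is not edgeparse's implementation and does not cover the XY-Cut++ refinements. The function names, the `(x0, y0, x1, y1)` box layout with y growing downward, and the `min_gap` threshold are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical box layout: (x0, y0, x1, y1), with y growing downward.
Box = Tuple[float, float, float, float]


def _widest_gap(boxes: List[Box], lo: int, hi: int) -> Tuple[float, Optional[float]]:
    """Return (width, midpoint) of the widest empty band along one axis.

    `lo` and `hi` are the tuple indices of the start and end coordinates
    for that axis (0 and 2 for x, 1 and 3 for y).
    """
    spans = sorted((b[lo], b[hi]) for b in boxes)
    best_width, best_pos = 0.0, None
    reach = spans[0][1]  # furthest coordinate covered so far
    for start, end in spans[1:]:
        if start - reach > best_width:
            best_width, best_pos = start - reach, (start + reach) / 2
        reach = max(reach, end)
    return best_width, best_pos


def xy_cut(boxes: List[Box], min_gap: float = 1.0) -> List[Box]:
    """Order boxes by recursively splitting at the widest whitespace band."""
    if len(boxes) <= 1:
        return list(boxes)
    x_width, x_pos = _widest_gap(boxes, 0, 2)
    y_width, y_pos = _widest_gap(boxes, 1, 3)
    best = max(x_width, y_width)
    if best <= 0 or best < min_gap:
        # No usable gap: fall back to top-to-bottom, then left-to-right.
        return sorted(boxes, key=lambda b: (b[1], b[0]))
    # A band found on the y axis separates top from bottom; one found on
    # the x axis separates left from right (columns).
    axis, pos = (1, y_pos) if y_width >= x_width else (0, x_pos)
    first = [b for b in boxes if b[axis] < pos]
    second = [b for b in boxes if b[axis] >= pos]
    return xy_cut(first, min_gap) + xy_cut(second, min_gap)


if __name__ == "__main__":
    title = (50, 40, 550, 70)
    left_1, left_2 = (50, 100, 290, 200), (50, 210, 290, 300)
    right_1, right_2 = (310, 100, 550, 180), (310, 190, 550, 300)
    # Input deliberately shuffled; the result reads title, left column, right column.
    for box in xy_cut([right_2, left_1, title, right_1, left_2]):
        print(box)
```

On a two-column page with a full-width title, the first cut separates the title from the body, the second separates the columns, and the remaining cuts separate paragraphs within each column. A plain top-to-bottom sort would instead interleave the two columns.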
License
Apache-2.0