docs.rs failed to build edgeparse-core-0.1.0
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.
Visit the last successful build:
edgeparse-core-0.2.3
edgeparse-core
High-performance PDF-to-structured-data extraction engine.
edgeparse-core implements a 20-stage processing pipeline that extracts text,
tables, images, and semantic structure from PDF documents and produces
structured output in Markdown, JSON, HTML, or plain text.
Usage
use …;
use std::path::Path;
let config = …::default();
let doc = convert(…)?;
println!(…);
for element in &doc.kids { … }
Output Formats
Generate output in multiple formats using the output modules:
use edgeparse_core::output;
let markdown = output::…::to_markdown(…)?;
let json = output::…::to_legacy_json_string(…)?;
let html = output::…::to_html(…)?;
let text = output::…::to_text(…)?;
Features
- Tagged PDF support — uses PDF structure tree for semantic extraction
- Table detection — border-based and cluster detection methods
- Reading order — XY-Cut++ algorithm for correct reading order
- Image extraction — embedded or external image output
- Content safety — filters hidden text, off-page content, tiny text
- PII sanitization — optional personal data redaction
- Multi-column layout — automatic column detection and ordering
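To give a feel for the reading-order feature, here is a minimal sketch of the classic recursive XY-cut idea that XY-Cut++ builds on. It is not the crate's implementation: `BBox`, `widest_gap`, `xy_cut`, `reading_order`, and the `min_gap` threshold are hypothetical names chosen for illustration, and the sketch assumes a top-left origin with y growing downward.

```rust
/// Axis-aligned bounding box of a text block.
/// Assumption of this sketch: origin at top-left, y grows downward.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BBox {
    x0: f64,
    y0: f64,
    x1: f64,
    y1: f64,
}

/// Finds the widest empty gap in a 1-D projection of `(start, end)` spans.
/// Returns `(gap_width, cut_position)`, or `None` if the spans form one run.
fn widest_gap(mut spans: Vec<(f64, f64)>) -> Option<(f64, f64)> {
    spans.sort_by(|a, b| a.0.total_cmp(&b.0));
    let mut reach = spans.first()?.1; // furthest end seen so far
    let mut best: Option<(f64, f64)> = None;
    for &(start, end) in &spans[1..] {
        let gap = start - reach;
        if gap > 0.0 && best.map_or(true, |(width, _)| gap > width) {
            best = Some((gap, reach + gap / 2.0)); // cut in the middle of the gap
        }
        reach = reach.max(end);
    }
    best
}

/// Recursively splits the blocks in `idx` along the widest whitespace gap
/// and appends their indices to `out` in reading order.
fn xy_cut(boxes: &[BBox], idx: Vec<usize>, min_gap: f64, out: &mut Vec<usize>) {
    if idx.len() <= 1 {
        out.extend(idx);
        return;
    }
    let x = widest_gap(idx.iter().map(|&i| (boxes[i].x0, boxes[i].x1)).collect())
        .filter(|g| g.0 >= min_gap);
    let y = widest_gap(idx.iter().map(|&i| (boxes[i].y0, boxes[i].y1)).collect())
        .filter(|g| g.0 >= min_gap);

    // Prefer the wider gap: a vertical cut separates columns,
    // a horizontal cut separates rows.
    let (vertical, pos) = match (x, y) {
        (Some((xw, xp)), Some((yw, yp))) => {
            if xw > yw { (true, xp) } else { (false, yp) }
        }
        (Some((_, xp)), None) => (true, xp),
        (None, Some((_, yp))) => (false, yp),
        (None, None) => {
            // No usable gap: fall back to top-to-bottom, then left-to-right.
            let mut rest = idx;
            rest.sort_by(|&a, &b| {
                boxes[a].y0.total_cmp(&boxes[b].y0)
                    .then(boxes[a].x0.total_cmp(&boxes[b].x0))
            });
            out.extend(rest);
            return;
        }
    };

    // Blocks whose center lies before the cut are read first.
    let (first, second): (Vec<usize>, Vec<usize>) = idx.into_iter().partition(|&i| {
        let center = if vertical {
            (boxes[i].x0 + boxes[i].x1) / 2.0
        } else {
            (boxes[i].y0 + boxes[i].y1) / 2.0
        };
        center < pos
    });
    xy_cut(boxes, first, min_gap, out);
    xy_cut(boxes, second, min_gap, out);
}

/// Returns indices into `boxes` in estimated reading order.
fn reading_order(boxes: &[BBox], min_gap: f64) -> Vec<usize> {
    let mut out = Vec::with_capacity(boxes.len());
    xy_cut(boxes, (0..boxes.len()).collect(), min_gap, &mut out);
    out
}
```

For a page with a full-width title above two columns, the sketch first cuts horizontally below the title, then vertically between the columns, so the whole left column is read before the right one. Choosing the widest gap at each level is one simple heuristic; production algorithms add handling for headers, footers, and blocks that span columns.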
Feature Flags
| Flag | Description |
|---|---|
| hybrid | Enable Docling backend integration (requires tokio + reqwest) |
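
Assuming `hybrid` is an ordinary opt-in Cargo feature, as the table suggests, a dependency entry along these lines should enable it. The version shown is the last successful docs.rs build listed at the top of this page; adjust it to the release you actually use.

```toml
# Cargo.toml (sketch; assumes `hybrid` is off by default)
[dependencies]
edgeparse-core = { version = "0.2.3", features = ["hybrid"] }
```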
License
Apache-2.0