marque-extract 0.2.0

Document text and metadata extraction via Kreuzberg (75+ formats, OCR)
//! marque-extract — document text and metadata extraction.
//!
//! Wraps [Kreuzberg](https://github.com/kreuzberg-dev/kreuzberg), which provides a Rust core,
//! SIMD optimizations, streaming, support for 75+ formats, and OCR for scanned documents.
//!
//! NOT included in the marque-wasm build. In a WASM context, the calling application
//! is responsible for providing pre-extracted text to the engine.
//!
//! # Metadata
//! Metadata extraction runs in the same pipeline pass as text extraction.
//! Metadata issues are surfaced as `MetadataWarning`. Warnings are always reported;
//! stripping is opt-in via `ExtractionOptions::strip_metadata`.

pub mod extractor;
pub mod metadata;

pub use extractor::{ExtractedDocument, ExtractionOptions, Extractor};
pub use metadata::{MetadataField, MetadataReport, MetadataWarning};
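
The metadata policy described in the module docs (warnings always reported, stripping only on request) can be sketched as follows. This is a minimal illustration, not the crate's implementation: the struct fields, the `inspect_metadata` function, and the `is_sensitive` rule are all hypothetical stand-ins. Only the type names and the `strip_metadata` option name come from the source above.

```rust
// Hypothetical stand-ins that mirror names re-exported by the crate root.
// Their fields and the functions below are assumptions for illustration only.

/// Assumed shape of the options; only `strip_metadata` is named in the source.
#[derive(Debug, Clone, Default)]
pub struct ExtractionOptions {
    /// Opt-in: drop flagged metadata from the output. Off by default.
    pub strip_metadata: bool,
}

/// Assumed shape of a warning: which field raised it and why.
#[derive(Debug, Clone, PartialEq)]
pub struct MetadataWarning {
    pub field: String,
    pub reason: String,
}

/// Assumed result type: the metadata that survived plus every warning raised.
#[derive(Debug, Default)]
pub struct MetadataReport {
    pub fields: Vec<(String, String)>,
    pub warnings: Vec<MetadataWarning>,
}

/// Hypothetical rule: treat author-like fields as sensitive.
fn is_sensitive(name: &str) -> bool {
    matches!(name, "author" | "last_modified_by")
}

/// Walks raw key/value metadata once, recording warnings and applying the
/// opt-in stripping policy in the same pass.
pub fn inspect_metadata(raw: &[(&str, &str)], opts: &ExtractionOptions) -> MetadataReport {
    let mut report = MetadataReport::default();
    for &(name, value) in raw {
        let sensitive = is_sensitive(name);
        if sensitive {
            // Warnings are recorded regardless of `strip_metadata`.
            report.warnings.push(MetadataWarning {
                field: name.to_string(),
                reason: "may identify a person".to_string(),
            });
        }
        // A field is dropped only when it is flagged AND the caller opted in.
        if !(sensitive && opts.strip_metadata) {
            report.fields.push((name.to_string(), value.to_string()));
        }
    }
    report
}
```

With default options, a flagged field produces a warning and stays in the report. With `strip_metadata: true`, the same warning is produced but the field is removed, so callers can always audit what was found even when they choose not to keep it.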