marque_extract/lib.rs
//! marque-extract: document text and metadata extraction.
//!
//! Wraps Kreuzberg (https://github.com/kreuzberg-dev/kreuzberg):
//! Rust-core, SIMD-optimized, streaming, 75+ formats, OCR for scanned documents.
//!
//! NOT included in the marque-wasm build. In a WASM context, the calling application
//! is responsible for providing pre-extracted text to the engine.
//!
//! # Metadata
//! Metadata extraction runs in the same pipeline pass as text extraction.
//! Metadata issues are surfaced as `MetadataWarning` and are always reported;
//! stripping is opt-in via `ExtractionOptions::strip_metadata`.

pub mod extractor;
pub mod metadata;

pub use extractor::{ExtractedDocument, ExtractionOptions, Extractor};
pub use metadata::{MetadataField, MetadataReport, MetadataWarning};