Crate marque_extract

marque-extract — document text and metadata extraction.

Wraps Kreuzberg (https://github.com/kreuzberg-dev/kreuzberg): Rust-core, SIMD-optimized, streaming, 75+ formats, OCR for scanned documents.

NOT included in the marque-wasm build. In a WASM context, the calling application is responsible for providing pre-extracted text to the engine.
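One way a caller might model this split is sketched below. Everything in it is hypothetical and not part of this crate's API: `EngineInput`, `resolve_text`, and `extract_native` are illustrative names, and the native branch simply reads a UTF-8 file as a stand-in for a real extractor call. The point is only that the pre-extracted path is available on every target, while the file path is compiled out on `wasm32`.

```rust
/// Hypothetical input type: NOT part of marque-extract.
pub enum EngineInput {
    /// Text the caller has already extracted (the only option on WASM).
    PreExtracted(String),
    /// A file to extract natively; this variant does not exist on wasm32.
    #[cfg(not(target_arch = "wasm32"))]
    File(std::path::PathBuf),
}

/// Hypothetical helper: turn an input into plain text.
pub fn resolve_text(input: EngineInput) -> Result<String, String> {
    match input {
        EngineInput::PreExtracted(text) => Ok(text),
        #[cfg(not(target_arch = "wasm32"))]
        EngineInput::File(path) => extract_native(&path),
    }
}

/// Placeholder for native extraction. A real build would call into the
/// extractor here; this sketch just reads the file as UTF-8.
#[cfg(not(target_arch = "wasm32"))]
fn extract_native(path: &std::path::Path) -> Result<String, String> {
    std::fs::read_to_string(path).map_err(|e| e.to_string())
}
```

With this shape, code shared between native and WASM builds only ever constructs `EngineInput::PreExtracted`, so it compiles unchanged on both targets.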

Metadata

Metadata extraction runs in the same pipeline pass as text extraction. Metadata issues are surfaced as MetadataWarning values. Warnings are always reported; stripping is opt-in via ExtractionOptions::strip_metadata.
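The "always report, strip only on request" policy can be illustrated with a minimal, self-contained sketch. The types below are local stand-ins, not the crate's real definitions: `Options`, `Warning`, `process_metadata`, and the list of sensitive field names are all assumptions made for illustration, and the actual shapes of `ExtractionOptions` and `MetadataWarning` may differ.

```rust
/// Stand-in for a metadata warning (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Warning {
    pub field: String,
    pub value: String,
}

/// Stand-in for extraction options; only the opt-in flag is modeled.
#[derive(Debug, Clone, Copy, Default)]
pub struct Options {
    pub strip_metadata: bool,
}

/// Returns the metadata that survives plus every warning raised.
/// Warnings are produced regardless of `strip_metadata`; the flag only
/// decides whether flagged fields are removed from the output.
pub fn process_metadata(
    fields: Vec<(String, String)>,
    opts: Options,
) -> (Vec<(String, String)>, Vec<Warning>) {
    // Assumed example set of sensitive field names.
    const SENSITIVE: [&str; 3] = ["author", "last_modified_by", "company"];

    let mut kept = Vec::new();
    let mut warnings = Vec::new();
    for (name, value) in fields {
        let sensitive = SENSITIVE.contains(&name.as_str());
        if sensitive {
            warnings.push(Warning {
                field: name.clone(),
                value: value.clone(),
            });
        }
        if !(sensitive && opts.strip_metadata) {
            kept.push((name, value));
        }
    }
    (kept, warnings)
}
```

Calling `process_metadata` with `strip_metadata: false` returns every field untouched alongside the warnings; setting it to `true` yields the same warnings but drops the flagged fields from the returned metadata.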

Re-exports

pub use extractor::ExtractedDocument;
pub use extractor::ExtractionOptions;
pub use extractor::Extractor;
pub use metadata::MetadataField;
pub use metadata::MetadataReport;
pub use metadata::MetadataWarning;

Modules

extractor
Document text extraction with streaming support.
metadata
Metadata extraction and sanitization.