marque-extract — document text and metadata extraction.
Wraps Kreuzberg (https://github.com/kreuzberg-dev/kreuzberg), an extraction library with a Rust core, SIMD optimizations, streaming support, coverage of 75+ formats, and OCR for scanned documents.
This crate is NOT included in the marque-wasm build. In a WASM context, the calling application is responsible for providing pre-extracted text to the engine.
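One way to express this split in code is to gate the native extraction path behind `cfg(not(target_arch = "wasm32"))`, so WASM builds can only hand the engine plain text. The sketch below is illustrative only: `EngineInput`, `resolve_text`, and `extract_native` are hypothetical names, and `extract_native` just reads a UTF-8 file as a stand-in for the real Kreuzberg call.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical engine input: native builds may pass a file for extraction,
/// while wasm32 builds can only pass text the caller already extracted.
pub enum EngineInput {
    /// Pre-extracted text (the only variant that exists on wasm32).
    Text(String),
    /// A document on disk, extracted natively (compiled out on wasm32).
    #[cfg(not(target_arch = "wasm32"))]
    Path(PathBuf),
}

/// Placeholder for the native extraction step. It reads the file as UTF-8
/// text; it does NOT call Kreuzberg.
#[cfg(not(target_arch = "wasm32"))]
fn extract_native(path: &Path) -> std::io::Result<String> {
    std::fs::read_to_string(path)
}

/// Resolve any input to plain text before it reaches the engine.
pub fn resolve_text(input: EngineInput) -> std::io::Result<String> {
    match input {
        EngineInput::Text(text) => Ok(text),
        #[cfg(not(target_arch = "wasm32"))]
        EngineInput::Path(path) => extract_native(&path),
    }
}
```

Because the `Path` variant and its match arm are both compiled out on wasm32, a WASM caller gets a type error, not a runtime failure, if it tries to request extraction.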
Metadata
Metadata extraction runs in the same pipeline pass as text extraction.
Metadata issues are surfaced as MetadataWarning values and are always reported;
stripping the metadata itself is opt-in via ExtractionOptions::strip_metadata.
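The reporting/stripping contract can be sketched as a single pass that always collects warnings and only discards metadata when asked. This is a minimal sketch under stated assumptions: only the names `ExtractionOptions`, `strip_metadata`, and `MetadataWarning` come from the crate docs, and their fields here are guesses. `Extracted` and `extract` are hypothetical, and the "author"/"last_modified_by" check is an invented placeholder policy, not the crate's actual rules.

```rust
use std::collections::BTreeMap;

/// Options for one extraction pass. Only `strip_metadata` is named in the
/// crate docs; the rest of this shape is assumed.
#[derive(Debug, Default, Clone)]
pub struct ExtractionOptions {
    /// Opt-in: drop metadata from the output (warnings are still reported).
    pub strip_metadata: bool,
}

/// A metadata issue found during extraction (fields are hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct MetadataWarning {
    pub key: String,
    pub message: String,
}

/// Hypothetical result of one pass: text, metadata, and warnings together.
#[derive(Debug)]
pub struct Extracted {
    pub text: String,
    pub metadata: BTreeMap<String, String>,
    pub warnings: Vec<MetadataWarning>,
}

/// Hypothetical single-pass extraction over an already-parsed document.
pub fn extract(
    text: &str,
    metadata: BTreeMap<String, String>,
    options: &ExtractionOptions,
) -> Extracted {
    // Warnings are computed before any stripping, so they are always reported.
    // Placeholder policy (assumed): flag keys that identify people.
    let warnings: Vec<MetadataWarning> = metadata
        .keys()
        .filter(|k| matches!(k.as_str(), "author" | "last_modified_by"))
        .map(|k| MetadataWarning {
            key: k.clone(),
            message: format!("document carries `{k}` metadata"),
        })
        .collect();

    // Stripping happens only when the caller opts in.
    let metadata = if options.strip_metadata {
        BTreeMap::new()
    } else {
        metadata
    };

    Extracted { text: text.to_string(), metadata, warnings }
}
```

Computing warnings before the strip step is what keeps reporting independent of the option: a caller who strips metadata still learns what was there.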