marque-extract 0.1.0

Document text and metadata extraction via Kreuzberg (75+ formats, OCR)
Documentation

marque-extract — document text and metadata extraction.

Wraps Kreuzberg (https://github.com/kreuzberg-dev/kreuzberg), a streaming, SIMD-optimized extraction library with a Rust core. It handles 75+ formats and provides OCR for scanned documents.

This crate is NOT included in the marque-wasm build. In a WASM context, the calling application is responsible for providing pre-extracted text to the engine.
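The split between native and WASM builds can be pictured with a small sketch. Everything here is hypothetical: `analyze` and `analyze_file` are illustrative names, not functions of marque-extract or the engine, and the plain file read merely stands in for real extraction. The point is only that the engine entry point takes text, while the file-based front end is compiled out on `wasm32`.

```rust
use std::path::Path;

/// Hypothetical engine entry point: it only ever sees text, so it works the
/// same way in native and WASM builds. Counting words is a placeholder for
/// whatever the real engine does.
fn analyze(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Hypothetical native-only front end. A plain file read stands in for the
/// extraction step; it is compiled out when targeting wasm32.
#[cfg(not(target_arch = "wasm32"))]
fn analyze_file(path: &Path) -> std::io::Result<usize> {
    let text = std::fs::read_to_string(path)?;
    Ok(analyze(&text))
}

fn main() {
    // In a WASM build the host application would obtain this text itself
    // (for example, from its own extraction step) and pass it in directly.
    let pre_extracted = "text supplied by the caller";
    println!("{} words", analyze(pre_extracted));

    #[cfg(not(target_arch = "wasm32"))]
    {
        // Native builds can additionally start from a path.
        if let Ok(n) = analyze_file(Path::new("Cargo.toml")) {
            println!("{n} words read from file");
        }
    }
}
```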

Metadata

Metadata extraction runs in the same pipeline pass as text extraction. Metadata issues are surfaced as MetadataWarning. Warnings are always reported; stripping is opt-in via ExtractionOptions::strip_metadata.
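The "always report, strip only on request" behavior can be modeled in a few lines. This is a minimal sketch, not the crate's implementation: only the names MetadataWarning and ExtractionOptions::strip_metadata come from the documentation above. The struct fields, the `Extraction` type, `is_sensitive`, and `finish_extraction` are all assumptions made for illustration.

```rust
// Hypothetical stand-ins: only the names `MetadataWarning` and
// `ExtractionOptions::strip_metadata` appear in the crate docs. Every field,
// helper, and type shape below is an assumption for illustration.

#[derive(Debug, Clone, PartialEq)]
struct MetadataWarning {
    field: String,
    message: String,
}

#[derive(Debug, Clone, Default)]
struct ExtractionOptions {
    /// Opt-in: drop flagged metadata fields from the result.
    strip_metadata: bool,
}

#[derive(Debug)]
struct Extraction {
    text: String,
    metadata: Vec<(String, String)>,
    warnings: Vec<MetadataWarning>,
}

/// Hypothetical policy for which fields count as a metadata issue.
fn is_sensitive(field: &str) -> bool {
    matches!(field, "author" | "last_modified_by")
}

/// One pass over the metadata: a warning is recorded for every flagged field,
/// and the field is dropped only when `strip_metadata` is set.
fn finish_extraction(
    text: &str,
    raw_metadata: Vec<(String, String)>,
    options: &ExtractionOptions,
) -> Extraction {
    let mut warnings = Vec::new();
    let mut metadata = Vec::new();
    for (field, value) in raw_metadata {
        let flagged = is_sensitive(&field);
        if flagged {
            warnings.push(MetadataWarning {
                field: field.clone(),
                message: format!("document carries `{field}` metadata"),
            });
        }
        if !(flagged && options.strip_metadata) {
            metadata.push((field, value));
        }
    }
    Extraction { text: text.to_string(), metadata, warnings }
}

fn main() {
    let raw = vec![
        ("title".to_string(), "Q3 report".to_string()),
        ("author".to_string(), "jdoe".to_string()),
    ];
    let kept = finish_extraction("body", raw.clone(), &ExtractionOptions::default());
    let stripped = finish_extraction("body", raw, &ExtractionOptions { strip_metadata: true });
    for run in [&kept, &stripped] {
        println!(
            "text={:?} warnings={:?} fields_kept={}",
            run.text,
            run.warnings,
            run.metadata.len()
        );
    }
}
```

In this model both runs produce the same warning; only the second removes the flagged field from the returned metadata, which mirrors the opt-in behavior described above.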