Function parse 

pub fn parse(data: &[u8], fmt: Option<Format>) -> Result<Document>

Parse data into the unified ir::Document.

fmt forces a format; None auto-detects the format from magic bytes. The core makes no I/O assumptions: it only borrows a &[u8], so each binding can supply the bytes however it likes (Python bytes, Node Buffer, WASM Uint8Array).
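The fmt rule above can be sketched as follows. This is a minimal illustration, not the crate's actual detection code: the Format enum here is a hypothetical stand-in (the real variants are not shown on this page), and detect_format and resolve_format are helper names invented for the example. It only checks two well-known signatures at offset 0, the PDF header %PDF- and the ZIP local-file header PK\x03\x04.

```rust
/// Hypothetical stand-in for the crate's `Format`; the real variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Pdf,
    /// ZIP container (assumed here only for illustration).
    Zip,
}

/// Hypothetical helper: guess the format from the leading magic bytes.
/// Returns `None` when no known signature matches.
pub fn detect_format(data: &[u8]) -> Option<Format> {
    if data.starts_with(b"%PDF-") {
        Some(Format::Pdf)
    } else if data.starts_with(b"PK\x03\x04") {
        Some(Format::Zip)
    } else {
        None
    }
}

/// Hypothetical helper mirroring the `fmt` rule: an explicit format wins,
/// otherwise fall back to sniffing the borrowed bytes.
pub fn resolve_format(data: &[u8], fmt: Option<Format>) -> Option<Format> {
    fmt.or_else(|| detect_format(data))
}
```

Because both helpers only borrow a byte slice, the same logic works regardless of where the bytes came from (a file read, a network buffer, or memory handed over by a language binding). A production detector would likely need to be more lenient, for example by tolerating leading junk before the PDF header.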

M0 uses lopdf’s naive text extraction for PDFs (one paragraph per page, no per-character coordinates). PER-36 replaces the PDF path with a self-written content-stream interpreter that fills ir::Page::chars with precise bounding boxes.