Crate fleischwolf_pdf

PDF backend for fleischwolf.

A port of docling’s standard PDF pipeline: pdfium extracts the text layer (cells with bounding boxes) and renders page images; a discriminative ONNX stack (layout detection, table structure, OCR) classifies regions; the cells are assembled in reading order into a DoclingDocument.

Current stages: pdfium text-cell extraction and page rendering ([pdfium_backend]), plus the deterministic text and reading-order assembly ([assemble]). The layout, table-structure, and OCR ONNX stages are slated to land next, behind Pipeline.

Modules§

layout: Layout detection via the RT-DETR (docling-layout-heron) model, exported to ONNX and run with ort. A port of docling-ibm-models’ LayoutPredictor: resize the page image to 640×640, rescale pixel values to [0,1] (the heron processor has do_normalize=false, so no mean/std normalization is applied), run the model, then apply RT-DETR’s post_process_object_detection (sigmoid → top-k over query×class → center-to-corners boxes scaled to the page).
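
The post-processing chain described for the layout module can be sketched in plain Rust. This is a minimal illustration under stated assumptions, not the crate's code: the `Detection` struct, the `post_process` function, and its parameters are hypothetical names, the logits are assumed to be a row-major `num_queries × num_classes` slice, and the boxes are assumed to be normalized `(cx, cy, w, h)`.

```rust
/// One detection in page coordinates, as (x0, y0, x1, y1). Hypothetical type.
#[derive(Debug, Clone, PartialEq)]
pub struct Detection {
    pub class_id: usize,
    pub score: f32,
    pub bbox: [f32; 4],
}

fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Sketch of RT-DETR-style post-processing (hypothetical signature).
/// `logits`: num_queries * num_classes values, row-major.
/// `boxes`: num_queries * 4 values, normalized (cx, cy, w, h).
pub fn post_process(
    logits: &[f32],
    boxes: &[f32],
    num_classes: usize,
    page_w: f32,
    page_h: f32,
    top_k: usize,
    threshold: f32,
) -> Vec<Detection> {
    // Score every (query, class) pair independently with a sigmoid.
    let mut scored: Vec<(usize, f32)> = logits
        .iter()
        .enumerate()
        .map(|(i, &logit)| (i, sigmoid(logit)))
        .collect();

    // Top-k over the flattened query×class axis, highest score first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(top_k);

    scored
        .into_iter()
        .filter(|&(_, score)| score >= threshold)
        .map(|(flat, score)| {
            // Recover the query and class from the flattened index.
            let query = flat / num_classes;
            let class_id = flat % num_classes;
            let b = &boxes[query * 4..query * 4 + 4];
            let (cx, cy, w, h) = (b[0], b[1], b[2], b[3]);
            // Center format to corner format, scaled to the page size.
            Detection {
                class_id,
                score,
                bbox: [
                    (cx - w / 2.0) * page_w,
                    (cy - h / 2.0) * page_h,
                    (cx + w / 2.0) * page_w,
                    (cy + h / 2.0) * page_h,
                ],
            }
        })
        .collect()
}
```

Because the top-k runs over query×class pairs rather than over queries, a single query can contribute more than one detection if two of its classes both score highly.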

Structs§

PdfDocument: A parsed PDF: per-page text cells and page images.
PdfPage: One page’s geometry, extracted text cells, and a rendered RGB image. The image is rendered at [RENDER_SCALE] pixels per PDF point; image px = page point × scale.
Pipeline: A reusable PDF pipeline: the layout model is loaded once and reused across documents; OCR loads lazily the first time a scanned page is seen.
TextCell: A run of text with its bounding box, in PDF points with a top-left origin (pdfium’s native origin is bottom-left; we flip it to match docling’s BoundingBox(..., origin=TOPLEFT)).
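
The two coordinate conventions mentioned for PdfPage and TextCell (the bottom-left to top-left flip, and image px = page point × scale) can be illustrated with a small sketch. The names `TopLeftBox`, `flip_to_top_left`, and `to_pixels` are hypothetical, and the scale is passed as a parameter because the actual value of RENDER_SCALE is not stated here.

```rust
/// Axis-aligned box with a top-left origin, in PDF points. Hypothetical type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TopLeftBox {
    pub l: f32,
    pub t: f32,
    pub r: f32,
    pub b: f32,
}

/// Flip a bottom-left-origin box (y grows upward) to a top-left origin
/// (y grows downward). Only the y coordinates change.
pub fn flip_to_top_left(
    left: f32,
    bottom: f32,
    right: f32,
    top: f32,
    page_height: f32,
) -> TopLeftBox {
    TopLeftBox {
        l: left,
        t: page_height - top,
        r: right,
        b: page_height - bottom,
    }
}

/// Map a top-left-origin box from PDF points to image pixels:
/// image px = page point × scale.
pub fn to_pixels(b: TopLeftBox, scale: f32) -> [f32; 4] {
    [b.l * scale, b.t * scale, b.r * scale, b.b * scale]
}
```

For example, on a 792-point-tall page, a box whose top edge sits at y = 720 in bottom-left coordinates ends up with its top edge 72 points below the top of the page after the flip.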

Enums§

PdfError: Errors from the PDF backend, detailed and surfaced (never silently skipped).

Functions§

convert: Convenience one-shot conversion (loads the pipeline per call). Errors are detailed and surfaced (never silently skipped).
convert_image: Convenience one-shot image conversion (loads the pipeline per call).
convert_mets_gbs
convert_pages: Convert pre-segmented pages (image + already-known text cells, e.g. METS/hOCR scans) through the shared layout + assembly pipeline.
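
The one-shot functions above load the pipeline on every call, whereas Pipeline is described as loading the layout model once and OCR lazily. A rough sketch of that load-once, load-lazily pattern using the standard library's `OnceLock` follows. Everything here is a hypothetical stand-in (`SketchPipeline`, `LayoutModel`, `OcrModel`, `process_page`), and treating "scanned" as "no text layer" is an assumption made for illustration; the crate's actual types and detection logic may differ.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-ins for loaded ONNX sessions.
pub struct LayoutModel;
pub struct OcrModel;

/// Sketch of a reusable pipeline: layout is loaded eagerly, OCR on demand.
pub struct SketchPipeline {
    layout: LayoutModel,
    ocr: OnceLock<OcrModel>,
    // Counts OCR loads, only so the laziness can be observed.
    ocr_loads: AtomicUsize,
}

impl SketchPipeline {
    pub fn new() -> Self {
        SketchPipeline {
            layout: LayoutModel, // loaded once, up front
            ocr: OnceLock::new(), // left empty until first needed
            ocr_loads: AtomicUsize::new(0),
        }
    }

    /// Return the OCR model, initializing it on first use only.
    fn ocr(&self) -> &OcrModel {
        self.ocr.get_or_init(|| {
            self.ocr_loads.fetch_add(1, Ordering::Relaxed);
            OcrModel
        })
    }

    /// Assumption: a page without a text layer is treated as scanned.
    pub fn process_page(&self, has_text_layer: bool) -> &'static str {
        let _layout = &self.layout; // layout would run on every page
        if has_text_layer {
            "text-layer"
        } else {
            let _ocr = self.ocr(); // triggers the lazy load if needed
            "ocr"
        }
    }

    pub fn ocr_load_count(&self) -> usize {
        self.ocr_loads.load(Ordering::Relaxed)
    }
}
```

Under this pattern, a batch of born-digital PDFs never pays the OCR loading cost, while a batch containing scans pays it once rather than once per document.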