PDF backend for fleischwolf.
A port of docling’s standard PDF pipeline: pdfium extracts the text layer
(cells with bounding boxes) and renders page images; a discriminative ONNX
stack (layout detection, table structure, OCR) classifies regions; the cells
are assembled in reading order into a DoclingDocument.
Current stages: pdfium text-cell extraction + page rendering ([pdfium_backend])
and the deterministic text/reading-order assembly ([assemble]). The layout,
table-structure and OCR ONNX stages land behind Pipeline next.
Modules§
- layout
- Layout detection via the RT-DETR (docling-layout-heron) model exported to ONNX, run with ort. A port of docling-ibm-models’ LayoutPredictor: resize the page image to 640×640 and rescale to [0,1] (the heron processor has do_normalize=false), run the model, then RT-DETR post_process_object_detection (sigmoid → top-k over query×class → center-to-corners boxes scaled to the page).
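The post-processing chain named above (sigmoid, top-k over query×class, center-to-corners, scale to the page) can be sketched in plain Rust. This is an illustration under stated assumptions, not the crate's code: the `Detection` type, the `post_process` signature, the row-major tensor layout, and the `threshold` cut-off are all hypothetical names and choices made for the example.

```rust
/// One detection in page coordinates (hypothetical type, for illustration only).
#[derive(Debug, Clone, PartialEq)]
struct Detection {
    class_id: usize,
    score: f32,
    /// (left, top, right, bottom), scaled to the page size passed in.
    bbox: [f32; 4],
}

fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Assumed layout: `logits` is num_queries × num_classes, row-major;
/// `boxes` is num_queries × 4 holding normalized (cx, cy, w, h).
fn post_process(
    logits: &[f32],
    boxes: &[f32],
    num_classes: usize,
    page_w: f32,
    page_h: f32,
    top_k: usize,
    threshold: f32,
) -> Vec<Detection> {
    // Score every (query, class) pair independently with a sigmoid.
    let mut scored: Vec<(usize, f32)> = logits
        .iter()
        .map(|&logit| sigmoid(logit))
        .enumerate()
        .collect();

    // Top-k over the flattened query×class axis: sort descending, keep k.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));

    scored
        .into_iter()
        .take(top_k)
        .filter(|&(_, score)| score >= threshold)
        .map(|(flat, score)| {
            // Recover the query and class from the flat index.
            let query = flat / num_classes;
            let class_id = flat % num_classes;

            // Center-to-corners, then scale normalized coords to the page.
            let b = &boxes[query * 4..query * 4 + 4];
            let (cx, cy, w, h) = (b[0], b[1], b[2], b[3]);
            Detection {
                class_id,
                score,
                bbox: [
                    (cx - w / 2.0) * page_w,
                    (cy - h / 2.0) * page_h,
                    (cx + w / 2.0) * page_w,
                    (cy + h / 2.0) * page_h,
                ],
            }
        })
        .collect()
}
```

Because top-k runs over the flattened query×class scores, a single query can yield more than one detection if two of its classes both rank highly.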
Structs§
- PdfDocument
- A parsed PDF: per-page text cells and page images.
- PdfPage
- One page’s geometry, extracted text cells, and a rendered RGB image. The image is rendered at [RENDER_SCALE] pixels per PDF point; image px = page point × scale.
- Pipeline
- A reusable PDF pipeline: the layout model is loaded once and reused across documents; OCR loads lazily the first time a scanned page is seen.
- TextCell
- A run of text with its bounding box, in PDF points with a top-left origin
(pdfium’s native origin is bottom-left; we flip it to match docling’s
BoundingBox(..., origin=TOPLEFT)).
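The two coordinate conventions described for PdfPage and TextCell (the bottom-left to top-left flip, and image px = page point × scale) come down to a few lines of arithmetic. The sketch below is an assumption-laden illustration: `BBox`, `to_top_left`, and `to_pixels` are hypothetical names, and the scale is passed as a parameter because the value of RENDER_SCALE is not stated here.

```rust
/// Axis-aligned box with a top-left origin (hypothetical type, for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
struct BBox {
    l: f32,
    t: f32,
    r: f32,
    b: f32,
}

/// Flip a rectangle given in bottom-left-origin PDF points (y grows upward)
/// to a top-left origin (y grows downward). Only the y values change.
fn to_top_left(left: f32, bottom: f32, right: f32, top: f32, page_height: f32) -> BBox {
    BBox {
        l: left,
        // The upper edge is the one closest to the top of the page.
        t: page_height - top,
        r: right,
        b: page_height - bottom,
    }
}

/// Map a top-left-origin box from PDF points to image pixels:
/// image px = page point × scale.
fn to_pixels(bbox: BBox, scale: f32) -> BBox {
    BBox {
        l: bbox.l * scale,
        t: bbox.t * scale,
        r: bbox.r * scale,
        b: bbox.b * scale,
    }
}
```

For example, on a page 792 points tall, a rectangle with bottom 700 and top 720 in the bottom-left system ends up with t = 72 and b = 92 after the flip, so `t < b` holds as expected for a top-left origin.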
Enums§
- PdfError
- Errors from the PDF backend. Detailed and surfaced (never silently skipped).
Functions§
- convert
- Convenience one-shot conversion (loads the pipeline per call). Errors are detailed and surfaced (never silently skipped).
- convert_image
- Convenience one-shot image conversion (loads the pipeline per call).
- convert_mets_gbs
- convert_pages
- Convert pre-segmented pages (image + already-known text cells, e.g. METS/hOCR scans) through the shared layout + assembly pipeline.