Module parser

Structs

FontInfo
Font metadata resolved from a PDF font dictionary.
RawTextSegment
A single text segment extracted from a PDF content stream.

Functions

extract_text_segments_for_page
Extract raw text segments from a page’s content stream.
load_pdf
Load PDF bytes into a lopdf Document, mapping all failures to warnings.
parse_pdf
End-to-end PDF parsing: load, extract metadata, resolve fonts, extract text segments.
resolve_fonts_for_page
Resolve font dictionaries for a given page.
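
The load_pdf summary above says failures are mapped to warnings rather than returned as errors. This listing does not show the real signatures, so the following is only a minimal sketch of that pattern using the standard library. Every name in it (LoadedDoc, Warning, try_load, load_or_warn) is a hypothetical stand-in, not part of this module or of lopdf.

```rust
// Hypothetical stand-ins: the real module's types and signatures are not
// shown in this listing, so nothing here mirrors its actual API.

/// Stand-in for a loaded document (the real module loads a lopdf Document).
#[derive(Debug)]
pub struct LoadedDoc {
    pub byte_len: usize,
}

/// A non-fatal problem reported to the caller instead of an error.
#[derive(Debug, PartialEq)]
pub struct Warning(pub String);

/// Fallible loader standing in for the underlying PDF library call.
/// It only checks for the `%PDF-` header that PDF files start with.
pub fn try_load(bytes: &[u8]) -> Result<LoadedDoc, String> {
    if bytes.starts_with(b"%PDF-") {
        Ok(LoadedDoc { byte_len: bytes.len() })
    } else {
        Err("missing %PDF- header".to_string())
    }
}

/// Never propagates an error: a failure becomes a pushed warning plus `None`.
pub fn load_or_warn(bytes: &[u8], warnings: &mut Vec<Warning>) -> Option<LoadedDoc> {
    match try_load(bytes) {
        Ok(doc) => Some(doc),
        Err(msg) => {
            warnings.push(Warning(format!("failed to load PDF: {}", msg)));
            None
        }
    }
}
```

With this shape, a caller can keep processing a batch of files and report the collected warnings at the end, instead of aborting on the first unreadable input.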