
Module text



Text extraction with character-level position tracking.

Parses content stream text operators (Tj, TJ, Tm, Td, TD, T*, Tc, Tw, Tz, TL, Ts, ', ") to extract text with positional information.
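The positioning operators in this list follow the text-state rules of ISO 32000-1 (section 9.4.2): Td translates the line matrix, TD does the same and also sets the leading to -ty, T* moves to the next line using the leading set by TL, and Tm replaces both the text matrix and the line matrix. (The ' operator is T* followed by Tj.) The following is a minimal sketch of that bookkeeping, assuming a plain six-number matrix; `Matrix`, `TextState` and their methods are hypothetical names for illustration, not this module's API.

```rust
/// Hypothetical affine matrix [a b c d e f] in PDF's row-vector convention:
/// a point (x, y) maps to (a*x + c*y + e, b*x + d*y + f).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Matrix {
    pub a: f64, pub b: f64, pub c: f64, pub d: f64, pub e: f64, pub f: f64,
}

impl Matrix {
    pub const IDENTITY: Matrix = Matrix { a: 1.0, b: 0.0, c: 0.0, d: 1.0, e: 0.0, f: 0.0 };

    pub fn translate(tx: f64, ty: f64) -> Matrix {
        Matrix { e: tx, f: ty, ..Matrix::IDENTITY }
    }

    /// Returns self × other (apply `self` first, then `other`).
    pub fn mul(&self, o: &Matrix) -> Matrix {
        Matrix {
            a: self.a * o.a + self.b * o.c,
            b: self.a * o.b + self.b * o.d,
            c: self.c * o.a + self.d * o.c,
            d: self.c * o.b + self.d * o.d,
            e: self.e * o.a + self.f * o.c + o.e,
            f: self.e * o.b + self.f * o.d + o.f,
        }
    }
}

/// Hypothetical subset of the text state: text matrix (Tm), line matrix (Tlm), leading (TL).
pub struct TextState {
    pub tm: Matrix,
    pub tlm: Matrix,
    pub leading: f64,
}

impl TextState {
    /// BT resets both matrices to the identity; leading is a text-state
    /// parameter, and 0.0 is its initial value.
    pub fn begin_text() -> Self {
        TextState { tm: Matrix::IDENTITY, tlm: Matrix::IDENTITY, leading: 0.0 }
    }

    /// Td: Tm = Tlm = [1 0 0 1 tx ty] × Tlm.
    pub fn td(&mut self, tx: f64, ty: f64) {
        self.tlm = Matrix::translate(tx, ty).mul(&self.tlm);
        self.tm = self.tlm;
    }

    /// TD: sets TL = -ty, then behaves like Td.
    pub fn td_upper(&mut self, tx: f64, ty: f64) {
        self.leading = -ty;
        self.td(tx, ty);
    }

    /// T*: equivalent to `0 -TL Td`.
    pub fn t_star(&mut self) {
        let l = self.leading;
        self.td(0.0, -l);
    }

    /// TL: sets the leading.
    pub fn set_leading(&mut self, tl: f64) {
        self.leading = tl;
    }

    /// Tm: replaces both matrices (no concatenation).
    pub fn set_tm(&mut self, m: Matrix) {
        self.tm = m;
        self.tlm = m;
    }

    /// Current text-space origin in the coordinates the matrix maps into.
    pub fn origin(&self) -> (f64, f64) {
        (self.tm.e, self.tm.f)
    }
}
```

Because Td is premultiplied onto the line matrix, its offsets are expressed in text space: after a Tm that scales by 2, `10 0 Td` moves the origin 20 units.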

Structs

PositionedChar
A single character with its position on the page.
TextBlock
A block of text extracted from a page.
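Character-level positions come from advancing the text matrix after every glyph is shown. ISO 32000-1 (section 9.4.4) gives the horizontal displacement as tx = ((w0 - Tj/1000) × Tfs + Tc + Tw) × Th, where w0 is the glyph width in text space, Tj the preceding TJ array number, Tfs the font size, Tc and Tw the character and word spacing, and Th = Tz/100. Tw applies only to the single-byte character code 32, and Ts (rise) shifts the glyph vertically without changing the advance. A minimal sketch of that formula follows; `SpacingParams` and `glyph_advance` are hypothetical names, not part of this module.

```rust
/// Hypothetical bundle of the text-state parameters that affect horizontal advance.
#[derive(Clone, Copy, Debug)]
pub struct SpacingParams {
    pub font_size: f64,       // Tfs, set by Tf
    pub char_spacing: f64,    // Tc
    pub word_spacing: f64,    // Tw
    pub horiz_scale_pct: f64, // Tz, as a percentage (100 = unscaled)
}

/// Horizontal displacement tx for one glyph, per the formula above.
/// `w0_thousandths` is the glyph width in thousandths of a text-space unit
/// (the unit used by a simple font's /Widths array), `tj_adjust` is the TJ
/// number preceding the glyph (0.0 if none), and `is_single_byte_space` must
/// be true only for single-byte character code 32, the only code Tw affects.
pub fn glyph_advance(
    p: &SpacingParams,
    w0_thousandths: f64,
    tj_adjust: f64,
    is_single_byte_space: bool,
) -> f64 {
    let tw = if is_single_byte_space { p.word_spacing } else { 0.0 };
    let th = p.horiz_scale_pct / 100.0;
    ((w0_thousandths / 1000.0 - tj_adjust / 1000.0) * p.font_size + p.char_spacing + tw) * th
}
```

The result then updates the text matrix as Tm = [1 0 0 1 tx 0] × Tm before the next glyph is placed, which is where each character's position on the page comes from.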

Enums

WidthSource
Whether a text span’s width was computed from real font metrics or estimated.
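Width provenance matters because simple fonts supply widths through /FirstChar and /Widths, and a character code outside that range (or a font with no width table) leaves the extractor without real metrics. The sketch below shows one way a lookup might report where its number came from. `WidthSourceSketch`, its two variants, `glyph_width` and the fallback value are assumptions for illustration, not this module's definitions.

```rust
/// Hypothetical stand-in for the module's `WidthSource`; the real variants may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum WidthSourceSketch {
    FontMetrics,
    Estimated,
}

/// Looks up a glyph width (in thousandths of text space) from a simple font's
/// /FirstChar and /Widths entries. Codes outside the table fall back to
/// `fallback`, an assumed estimate; a spec-faithful reader would consult the
/// font descriptor's /MissingWidth before guessing.
pub fn glyph_width(code: u32, first_char: u32, widths: &[f64], fallback: f64) -> (f64, WidthSourceSketch) {
    match code.checked_sub(first_char).and_then(|i| widths.get(i as usize)) {
        Some(&w) => (w, WidthSourceSketch::FontMetrics),
        None => (fallback, WidthSourceSketch::Estimated),
    }
}
```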

Functions

extract_blocks_from_page_id
Extract text blocks from a page identified by its object ID.
extract_page_blocks
Extract text blocks from a specific page.
extract_page_text
Extract text from a specific page as a plain string.
extract_positioned_chars
Extract positioned characters from a specific page.
extract_text
Extract text blocks from all pages of a document.
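This page does not say how the page-level functions assemble positioned characters into blocks or a plain string. One common heuristic, shown here only as an illustrative sketch, groups characters whose baselines lie within a tolerance, sorts each line left to right, and inserts a space where the gap between consecutive glyphs is large. `Glyph`, `glyphs_to_text` and both tolerance parameters are hypothetical.

```rust
/// Hypothetical minimal glyph record; the fields of `PositionedChar` are not shown here.
#[derive(Clone, Debug)]
pub struct Glyph {
    pub ch: char,
    pub x: f64,
    pub y: f64,
    pub width: f64,
}

/// Clusters glyphs into lines by baseline (within `line_tol`), orders each
/// line left to right, and inserts a space where the gap exceeds `space_gap`.
pub fn glyphs_to_text(glyphs: &[Glyph], line_tol: f64, space_gap: f64) -> String {
    let mut sorted: Vec<&Glyph> = glyphs.iter().collect();
    // PDF user space has y pointing up, so higher y comes first on the page.
    sorted.sort_by(|a, b| b.y.partial_cmp(&a.y).unwrap_or(std::cmp::Ordering::Equal));

    // Start a new line whenever a glyph's baseline is too far from the current line's.
    let mut lines: Vec<Vec<&Glyph>> = Vec::new();
    for g in sorted {
        let starts_new_line = match lines.last() {
            Some(line) => (line[0].y - g.y).abs() > line_tol,
            None => true,
        };
        if starts_new_line {
            lines.push(vec![g]);
        } else {
            lines.last_mut().unwrap().push(g);
        }
    }

    let mut out = String::new();
    for (i, line) in lines.iter_mut().enumerate() {
        if i > 0 {
            out.push('\n');
        }
        line.sort_by(|a, b| a.x.partial_cmp(&b.x).unwrap_or(std::cmp::Ordering::Equal));
        // Track where the previous glyph ended to detect word gaps.
        let mut prev_end: Option<f64> = None;
        for g in line.iter() {
            if let Some(end) = prev_end {
                if g.x - end > space_gap {
                    out.push(' ');
                }
            }
            out.push(g.ch);
            prev_end = Some(g.x + g.width);
        }
    }
    out
}
```

Both tolerances are in user-space units and usually need to scale with font size; rotated or vertical text would need a different ordering.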