Text extraction with character-level position tracking.
Parses content stream text operators (Tj, TJ, Tm, Td, TD, T*, Tc, Tw, Tz, TL, Ts, ', ") to extract text with positional information.
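To illustrate how positioning operators determine where each line of text starts, here is a minimal sketch of the line-origin bookkeeping that the PDF specification defines for Td, TD, TL and T*. It is not this module's implementation: the names `TextState`, `Op` and `line_origins` are hypothetical, and the sketch assumes the text matrix is a pure translation (no scaling or rotation from Tm).

```rust
/// Simplified text-positioning state (hypothetical, for illustration only).
/// Assumes a pure-translation text matrix: no scaling or rotation.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TextState {
    line_x: f64,  // x of the current line start
    line_y: f64,  // y of the current line start
    leading: f64, // distance between baselines, set by TL or TD
}

/// The subset of text-positioning operators modeled by this sketch.
#[derive(Debug, Clone, Copy)]
enum Op {
    MoveText(f64, f64),           // tx ty Td
    MoveTextSetLeading(f64, f64), // tx ty TD
    SetLeading(f64),              // leading TL
    NextLine,                     // T*
}

impl TextState {
    fn new() -> Self {
        TextState { line_x: 0.0, line_y: 0.0, leading: 0.0 }
    }

    /// Apply one operator, following the PDF specification's definitions.
    fn apply(&mut self, op: Op) {
        match op {
            // Td: offset the line start by (tx, ty).
            Op::MoveText(tx, ty) => {
                self.line_x += tx;
                self.line_y += ty;
            }
            // TD: same as Td, but also sets the leading to -ty.
            Op::MoveTextSetLeading(tx, ty) => {
                self.leading = -ty;
                self.line_x += tx;
                self.line_y += ty;
            }
            // TL: set the leading without moving.
            Op::SetLeading(leading) => self.leading = leading,
            // T*: equivalent to `0 -leading Td`.
            Op::NextLine => self.line_y -= self.leading,
        }
    }
}

/// Return the line origin after every operator that moves the line start.
fn line_origins(ops: &[Op]) -> Vec<(f64, f64)> {
    let mut state = TextState::new();
    let mut origins = Vec::new();
    for &op in ops {
        state.apply(op);
        // TL only changes the leading, so it produces no new origin.
        if !matches!(op, Op::SetLeading(_)) {
            origins.push((state.line_x, state.line_y));
        }
    }
    origins
}
```

A full extractor would also have to track the complete text matrix (Tm), the current transformation matrix, and the glyph advances affected by Tc, Tw, Tz and Ts; the sketch only shows why operator order matters for the positions attached to extracted text.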
Structs§
- PositionedChar - A single character with its position on the page.
- TextBlock - A block of text extracted from a page.
Enums§
- WidthSource - Whether a text span’s width was computed from real font metrics or estimated.
Functions§
- extract_blocks_from_page_id - Extract text blocks from a page identified by its object ID.
- extract_page_blocks - Extract text blocks from a specific page.
- extract_page_text - Extract text from a specific page as a plain string.
- extract_positioned_chars - Extract positioned characters from a specific page.
- extract_text - Extract text blocks from all pages of a document.