Text layout: words → lines → blocks, reading order, text output.
Structs
- TextBlock - A text block: a group of lines forming a coherent paragraph or section.
- TextLine - A text line: a sequence of words on the same y-level.
- TextOptions - Options for layout-aware text extraction.
Functions
- blocks_to_text - Convert text blocks into a string.
- cluster_lines_into_blocks - Cluster text line segments into text blocks based on x-overlap and vertical proximity.
- cluster_words_into_lines - Cluster words into text lines based on y-proximity.
- sort_blocks_reading_order - Sort text blocks in natural reading order.
- split_lines_at_columns - Split text lines at large horizontal gaps to detect column boundaries.
- words_to_text - Simple (non-layout) text extraction from words.
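
The first two stages of the pipeline (grouping words by y-proximity, then splitting a line at large horizontal gaps) can be illustrated with a minimal sketch. This is not this module's implementation, and the real signatures are not shown on this page: the `Word` type, the functions `cluster_by_y`, `split_at_gaps`, and `join_text`, and the `y_tol` and `max_gap` parameters are all hypothetical names chosen for illustration. The sketch also assumes that y grows downward and that a fixed tolerance is good enough.

```rust
/// Hypothetical word box used only for this sketch; not the crate's type.
#[derive(Debug, Clone, PartialEq)]
pub struct Word {
    pub text: String,
    pub x0: f64, // left edge
    pub x1: f64, // right edge
    pub y: f64,  // vertical position (assumed to grow downward)
}

/// Group words into lines by y-proximity: a word joins the current line
/// when its `y` is within `y_tol` of that line's first word.
/// Each resulting line is sorted left to right.
pub fn cluster_by_y(mut words: Vec<Word>, y_tol: f64) -> Vec<Vec<Word>> {
    // Sort top to bottom so that members of one line are adjacent.
    words.sort_by(|a, b| a.y.total_cmp(&b.y));

    let mut lines: Vec<Vec<Word>> = Vec::new();
    for w in words {
        let same_line = lines
            .last()
            .map_or(false, |line| (w.y - line[0].y).abs() <= y_tol);
        if same_line {
            if let Some(line) = lines.last_mut() {
                line.push(w);
            }
        } else {
            lines.push(vec![w]);
        }
    }

    // Restore horizontal order within each line.
    for line in &mut lines {
        line.sort_by(|a, b| a.x0.total_cmp(&b.x0));
    }
    lines
}

/// Split one line (already sorted by `x0`) wherever the gap between
/// consecutive words exceeds `max_gap`, as a crude column detector.
pub fn split_at_gaps(line: Vec<Word>, max_gap: f64) -> Vec<Vec<Word>> {
    let mut segments: Vec<Vec<Word>> = Vec::new();
    let mut prev_x1: Option<f64> = None;
    for w in line {
        let starts_new = match prev_x1 {
            Some(x1) => w.x0 - x1 > max_gap,
            None => true, // the first word always opens a segment
        };
        prev_x1 = Some(w.x1);
        if starts_new {
            segments.push(vec![w]);
        } else if let Some(seg) = segments.last_mut() {
            seg.push(w);
        }
    }
    segments
}

/// Join the words of one line or segment with single spaces.
pub fn join_text(line: &[Word]) -> String {
    line.iter()
        .map(|w| w.text.as_str())
        .collect::<Vec<_>>()
        .join(" ")
}
```

Comparing each word against the first word of its line, rather than the previous word, keeps a slowly drifting baseline from chaining unrelated rows together; a production implementation would more likely derive the tolerance from font size than take a constant.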