Module layout

Text layout: words → lines → blocks, reading order, text output.
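As a rough illustration of the first stage (words → lines), the sketch below groups word boxes by y-proximity and then joins each line into text. Everything in it is an assumption made for illustration: the `Word` struct, the `cluster_by_y` and `lines_to_text` helpers, the `y_tol` parameter, and the convention that `y` grows downward are hypothetical and are not this module's actual types or signatures.

```rust
/// Hypothetical word box; the module's real word type is not shown on this page.
#[derive(Debug, Clone, PartialEq)]
struct Word {
    text: String,
    x0: f64, // left edge
    y: f64,  // vertical position, assumed to grow downward
}

/// Group words whose `y` is within `y_tol` of the first word of the current
/// line, then order each line left to right.
fn cluster_by_y(mut words: Vec<Word>, y_tol: f64) -> Vec<Vec<Word>> {
    // Sort top to bottom so that words on the same line end up adjacent.
    words.sort_by(|a, b| a.y.total_cmp(&b.y));

    let mut lines: Vec<Vec<Word>> = Vec::new();
    for w in words {
        // Start a new line when there is no line yet or the word is too far
        // below the first word of the current line.
        let start_new = match lines.last() {
            Some(line) => (w.y - line[0].y).abs() > y_tol,
            None => true,
        };
        if start_new {
            lines.push(vec![w]);
        } else if let Some(line) = lines.last_mut() {
            line.push(w);
        }
    }

    // Within each line, restore left-to-right order.
    for line in &mut lines {
        line.sort_by(|a, b| a.x0.total_cmp(&b.x0));
    }
    lines
}

/// Join words with spaces and lines with newlines.
fn lines_to_text(lines: &[Vec<Word>]) -> String {
    lines
        .iter()
        .map(|line| {
            line.iter()
                .map(|w| w.text.as_str())
                .collect::<Vec<_>>()
                .join(" ")
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Comparing against the first word of the line, rather than the previous word, keeps a slowly drifting baseline from chaining unrelated rows together. A real implementation may well choose a different rule.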

Structs

TextBlock
A text block: a group of lines forming a coherent paragraph or section.
TextLine
A text line: a sequence of words on the same y-level.
TextOptions
Options for layout-aware text extraction.

Functions

blocks_to_text
Convert text blocks into a string.
cluster_lines_into_blocks
Cluster text line segments into text blocks based on x-overlap and vertical proximity.
cluster_words_into_lines
Cluster words into text lines based on y-proximity.
sort_blocks_reading_order
Sort text blocks in natural reading order.
split_lines_at_columns
Split text lines at large horizontal gaps to detect column boundaries.
words_to_text
Simple (non-layout) text extraction from words.
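
The idea behind splitting lines at large horizontal gaps can be sketched as follows. This is a minimal illustration under stated assumptions, not the signature or behavior of `split_lines_at_columns`: the `Span` struct, the `split_at_gaps` helper, and the `min_gap` threshold are hypothetical names introduced only for this example, and the input line is assumed to be already sorted left to right.

```rust
/// Hypothetical horizontal extent of one word; not the module's real type.
#[derive(Debug, Clone, PartialEq)]
struct Span {
    text: String,
    x0: f64, // left edge
    x1: f64, // right edge
}

/// Split a left-to-right sorted line into segments wherever the gap between
/// one span's right edge and the next span's left edge exceeds `min_gap`.
fn split_at_gaps(line: &[Span], min_gap: f64) -> Vec<Vec<Span>> {
    let mut segments: Vec<Vec<Span>> = Vec::new();
    let mut prev_x1: Option<f64> = None;

    for span in line {
        // The first span always opens a segment; later spans open one only
        // when the gap to the previous span is wide enough.
        let starts_new = match prev_x1 {
            Some(x1) => span.x0 - x1 > min_gap,
            None => true,
        };
        if starts_new {
            segments.push(vec![span.clone()]);
        } else if let Some(seg) = segments.last_mut() {
            seg.push(span.clone());
        }
        prev_x1 = Some(span.x1);
    }
    segments
}
```

With a threshold a few times wider than a normal inter-word space, ordinary spacing stays inside one segment while a column gutter produces a split.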