Module spatial


Spatial table reconstruction from hOCR bounding box coordinates

This module provides functions to detect and reconstruct tabular data from OCR’d text by analyzing the spatial positions of words using their bounding box (bbox) coordinates.
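The idea of deriving table structure from word positions can be sketched as follows. This is a minimal illustration and not the module's actual implementation: the `Word` struct, the helpers `cluster_positions`, `nearest`, and `build_grid`, and the `tolerance` parameter are all hypothetical names chosen for this example. It clusters the left edges of words into columns and the top edges into rows, then drops each word into the nearest cell.

```rust
/// Hypothetical word record for illustration only; not the module's `HocrWord`.
struct Word {
    text: String,
    left: u32, // x coordinate of the bbox's left edge, in pixels
    top: u32,  // y coordinate of the bbox's top edge, in pixels
}

/// Sort the coordinates and group them: a value joins the current cluster when it
/// lies within `tolerance` of the previous value, otherwise it starts a new cluster.
/// Each cluster is represented by its (integer) mean.
fn cluster_positions(mut positions: Vec<u32>, tolerance: u32) -> Vec<u32> {
    positions.sort_unstable();
    let mut clusters: Vec<Vec<u32>> = Vec::new();
    for p in positions {
        // Sorted input guarantees p >= prev, so the subtraction cannot underflow.
        let joins_last = clusters
            .last()
            .and_then(|c| c.last())
            .map_or(false, |&prev| p - prev <= tolerance);
        if joins_last {
            if let Some(c) = clusters.last_mut() {
                c.push(p);
            }
        } else {
            clusters.push(vec![p]);
        }
    }
    clusters
        .iter()
        .map(|c| c.iter().sum::<u32>() / c.len() as u32)
        .collect()
}

/// Index of the cluster center closest to `value` (0 if `centers` is empty).
fn nearest(centers: &[u32], value: u32) -> usize {
    centers
        .iter()
        .enumerate()
        .min_by_key(|&(_, &c)| c.abs_diff(value))
        .map(|(i, _)| i)
        .unwrap_or(0)
}

/// Build a row-major grid of cell strings from word positions.
fn build_grid(words: &[Word], tolerance: u32) -> Vec<Vec<String>> {
    let cols = cluster_positions(words.iter().map(|w| w.left).collect(), tolerance);
    let rows = cluster_positions(words.iter().map(|w| w.top).collect(), tolerance);
    let mut grid = vec![vec![String::new(); cols.len()]; rows.len()];
    for w in words {
        let cell = &mut grid[nearest(&rows, w.top)][nearest(&cols, w.left)];
        if !cell.is_empty() {
            cell.push(' '); // several words can share one cell
        }
        cell.push_str(&w.text);
    }
    grid
}
```

A fixed pixel tolerance is the simplest choice; a real implementation would likely need to scale it with font size or image resolution.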

Structs

HocrWord
Represents a word extracted from hOCR with position and confidence information
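In hOCR, a word's position and confidence are carried in the element's `title` attribute, for example `title="bbox 36 92 96 116; x_wconf 95"`. The sketch below shows one way such a string could be parsed. The `WordBox` struct and `parse_title` function are hypothetical names for this example; the actual fields of `HocrWord` are not listed on this page and may differ.

```rust
/// Hypothetical parsed form of an hOCR `title` attribute (illustrative only).
#[derive(Debug, PartialEq)]
struct WordBox {
    x0: u32,
    y0: u32,
    x1: u32,
    y1: u32,
    /// Word confidence from `x_wconf`, if the property is present.
    confidence: Option<f32>,
}

/// Parse a title such as "bbox 36 92 96 116; x_wconf 95".
/// Returns `None` when no well-formed `bbox` property is found.
fn parse_title(title: &str) -> Option<WordBox> {
    let mut bbox: Option<(u32, u32, u32, u32)> = None;
    let mut confidence: Option<f32> = None;

    // Properties are separated by ';', each one is "name value value ...".
    for prop in title.split(';') {
        let mut parts = prop.split_whitespace();
        match parts.next() {
            Some("bbox") => {
                // All four coordinates must parse, otherwise the bbox is ignored.
                let nums: Option<Vec<u32>> =
                    parts.map(|n| n.parse::<u32>().ok()).collect();
                if let Some(&[x0, y0, x1, y1]) = nums.as_deref() {
                    bbox = Some((x0, y0, x1, y1));
                }
            }
            Some("x_wconf") => {
                confidence = parts.next().and_then(|c| c.parse::<f32>().ok());
            }
            _ => {} // other hOCR properties are skipped in this sketch
        }
    }

    let (x0, y0, x1, y1) = bbox?;
    Some(WordBox { x0, y0, x1, y1, confidence })
}
```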

Functions

detect_columns
Detect column positions from word positions
detect_rows
Detect row positions from word positions
extract_hocr_words
Extract hOCR words from a DOM tree
reconstruct_table
Reconstruct table structure from words
table_to_markdown
Convert table to markdown format
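As an illustration of the final step, a reconstructed table can be rendered as a Markdown pipe table. The sketch below assumes the table is a row-major `Vec<Vec<String>>` whose first row is the header; the function name `to_markdown` and that representation are assumptions for this example, and the real signature of `table_to_markdown` may differ.

```rust
/// Render a row-major table as a Markdown pipe table.
/// The first row is treated as the header; short rows are padded with empty cells.
fn to_markdown(table: &[Vec<String>]) -> String {
    // Use the widest row so that every rendered row has the same number of cells.
    let cols = table.iter().map(Vec::len).max().unwrap_or(0);
    if cols == 0 {
        return String::new();
    }

    let mut out = String::new();
    for (i, row) in table.iter().enumerate() {
        let cells: Vec<String> = (0..cols)
            .map(|c| {
                row.get(c)
                    // Escape pipes so cell text cannot break the column layout.
                    .map(|s| s.replace('|', "\\|"))
                    .unwrap_or_default()
            })
            .collect();
        out.push_str(&format!("| {} |\n", cells.join(" | ")));

        // The separator line goes directly under the header row.
        if i == 0 {
            out.push_str(&format!("|{}\n", " --- |".repeat(cols)));
        }
    }
    out
}
```

Padding short rows keeps the output valid even when OCR misses a cell, at the cost of silently hiding the gap.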