Spatial table reconstruction from hOCR bounding box coordinates
This module provides functions to detect and reconstruct tabular data from OCR’d text by analyzing the spatial positions of words using their bounding box (bbox) coordinates.
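The idea of recovering columns and rows from bbox coordinates can be illustrated with a simple one-dimensional gap clustering. The sketch below is only an assumption about how such detection might work, not this module's actual implementation: `WordBox`, `cluster_positions`, `column_positions`, `row_positions`, and the `tolerance` parameter are hypothetical names introduced for illustration, and the real `HocrWord` fields may differ.

```rust
/// Hypothetical stand-in for a word with an hOCR bounding box.
/// This is NOT the module's `HocrWord`; the field names are assumptions.
#[derive(Debug, Clone)]
pub struct WordBox {
    pub text: String,
    pub left: u32,
    pub top: u32,
    pub width: u32,
    pub height: u32,
}

/// Integer mean of a non-empty slice.
fn mean(values: &[u32]) -> u32 {
    (values.iter().map(|&v| v as u64).sum::<u64>() / values.len() as u64) as u32
}

/// Sort the positions, then start a new cluster whenever the gap to the
/// previous position exceeds `tolerance`. Returns each cluster's mean.
pub fn cluster_positions(mut positions: Vec<u32>, tolerance: u32) -> Vec<u32> {
    positions.sort_unstable();
    let mut centers = Vec::new();
    let mut current: Vec<u32> = Vec::new();
    for p in positions {
        if let Some(&last) = current.last() {
            // `positions` is sorted, so `p >= last` and the subtraction cannot underflow.
            if p - last > tolerance {
                centers.push(mean(&current));
                current.clear();
            }
        }
        current.push(p);
    }
    if !current.is_empty() {
        centers.push(mean(&current));
    }
    centers
}

/// Column candidates: cluster the left edges of all words.
pub fn column_positions(words: &[WordBox], tolerance: u32) -> Vec<u32> {
    cluster_positions(words.iter().map(|w| w.left).collect(), tolerance)
}

/// Row candidates: cluster the vertical centers of all words.
pub fn row_positions(words: &[WordBox], tolerance: u32) -> Vec<u32> {
    cluster_positions(
        words.iter().map(|w| w.top + w.height / 2).collect(),
        tolerance,
    )
}
```

Clustering left edges suits left-aligned columns; right-aligned or centered cells would likely need a different anchor point, and a suitable tolerance depends on the scan resolution.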
Structs§
- HocrWord - Represents a word extracted from hOCR with position and confidence information
Functions§
- detect_columns - Detect column positions from word positions
- detect_rows - Detect row positions from word positions
- extract_hocr_words - Extract hOCR words from a DOM tree
- reconstruct_table - Reconstruct table structure from words
- table_to_markdown - Convert table to markdown format
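
As a rough illustration of the final conversion step, the sketch below renders a grid of cell strings as a Markdown table. It assumes the table is a `Vec<Vec<String>>` whose first row is the header; the actual signature and table type of `table_to_markdown` are not shown on this page, so `grid_to_markdown` is a hypothetical name and may not match the module's behavior.

```rust
/// Render a grid of cells as a Markdown table, treating the first row as the header.
/// Hypothetical helper for illustration; not the module's `table_to_markdown`.
pub fn grid_to_markdown(table: &[Vec<String>]) -> String {
    if table.is_empty() {
        return String::new();
    }
    // The header row determines the column count; short rows are padded with empty cells.
    let cols = table[0].len();
    let mut out = String::new();
    for (i, row) in table.iter().enumerate() {
        out.push('|');
        for c in 0..cols {
            let cell = row.get(c).map(String::as_str).unwrap_or("");
            out.push(' ');
            // Escape pipes so cell text cannot break the table layout.
            out.push_str(&cell.replace('|', "\\|"));
            out.push_str(" |");
        }
        out.push('\n');
        if i == 0 {
            // Separator line between the header and the body.
            out.push('|');
            for _ in 0..cols {
                out.push_str(" --- |");
            }
            out.push('\n');
        }
    }
    out
}
```

Padding short rows keeps the output a valid table even when OCR misses a cell, at the cost of hiding the gap from the caller.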