hOCR 1.2 document processing.
Complete hOCR 1.2 specification support for extracting structured content from OCR documents.
§Features
- Full Element Support: All 40+ hOCR 1.2 element types
- Complete Property Parsing: All 20+ hOCR properties (bbox, baseline, fonts, etc.)
- Document Structure: Logical hierarchy (paragraphs, sections, chapters)
- Spatial Table Reconstruction: Automatic table detection from bbox coordinates
- Metadata Extraction: OCR system info, capabilities, languages
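To illustrate what "table detection from bbox coordinates" can involve, here is a minimal sketch that does not use this crate's API. It relies only on the hOCR convention that an element's `title` attribute carries semicolon-separated properties such as `bbox x0 y0 x1 y1`. The names `Rect`, `parse_bbox`, and `group_rows`, and the pixel-tolerance heuristic, are hypothetical and chosen for illustration; the crate's own `BBox` and `reconstruct_table` may work differently.

```rust
/// Hypothetical stand-in for a bounding box; not the crate's `BBox` type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    x0: u32,
    y0: u32,
    x1: u32,
    y1: u32,
}

/// Parse the `bbox x0 y0 x1 y1` property out of an hOCR `title` attribute.
/// Returns `None` if no bbox property is present or a coordinate is malformed.
fn parse_bbox(title: &str) -> Option<Rect> {
    for prop in title.split(';') {
        let mut parts = prop.split_whitespace();
        if parts.next() != Some("bbox") {
            continue;
        }
        let mut nums = parts.map(|p| p.parse::<u32>().ok());
        return Some(Rect {
            x0: nums.next()??,
            y0: nums.next()??,
            x1: nums.next()??,
            y1: nums.next()??,
        });
    }
    None
}

/// Group words into rows (assumed heuristic): a word joins the current row
/// when its vertical centre lies within `tolerance` pixels of the centre of
/// the word that started the row. Each row is then ordered left to right.
fn group_rows(mut words: Vec<(Rect, String)>, tolerance: u32) -> Vec<Vec<String>> {
    words.sort_by_key(|(r, _)| (r.y0 + r.y1) / 2);

    let mut rows: Vec<(u32, Vec<(Rect, String)>)> = Vec::new();
    for (rect, text) in words {
        let centre = (rect.y0 + rect.y1) / 2;
        let joins_last = rows
            .last()
            .map_or(false, |(c, _)| centre.abs_diff(*c) <= tolerance);
        if joins_last {
            if let Some((_, row)) = rows.last_mut() {
                row.push((rect, text));
            }
        } else {
            rows.push((centre, vec![(rect, text)]));
        }
    }

    rows.into_iter()
        .map(|(_, mut row)| {
            row.sort_by_key(|(r, _)| r.x0);
            row.into_iter().map(|(_, t)| t).collect()
        })
        .collect()
}
```

Given words parsed from `title` attributes, `group_rows(words, 5)` yields one `Vec<String>` per detected row. Column detection would need a similar pass over the x coordinates, which this sketch leaves out.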
§Modules
Re-exports§
pub use converter::convert_to_markdown;
pub use converter::convert_to_markdown_with_options;
pub use extractor::extract_hocr_document;
pub use spatial::extract_hocr_words;
pub use spatial::reconstruct_table;
pub use spatial::table_to_markdown;
pub use spatial::HocrWord;
pub use types::BBox;
pub use types::Baseline;
pub use types::HocrElement;
pub use types::HocrElementType;
pub use types::HocrMetadata;
pub use types::HocrProperties;