§Overview
A parser for the hOCR format, “an open standard for representing document layout analysis and OCR results as a subset of HTML.”
§Design
This parser uses roxmltree to parse the XHTML. It provides easy access to the embedded hOCR data through the HOCR and Element structs, as well as their “borrowed” counterparts (HOCRBorrowed, ElementBorrowed), which avoid allocating for property names.
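To illustrate the owned versus borrowed distinction, here is a minimal sketch that uses only the standard library. It assumes the hOCR convention of storing properties in a `title` attribute as semicolon-separated entries, each a name followed by whitespace-separated values. The names `PropertyBorrowed`, `Property`, `parse_title`, and `to_property` are hypothetical and are not part of this crate's API. Quoted values that contain spaces are not handled.

```rust
/// Hypothetical borrowed view of one hOCR property (not this crate's type).
/// The name and values are slices of the input string, so no text is copied.
#[derive(Debug, PartialEq)]
struct PropertyBorrowed<'a> {
    name: &'a str,
    values: Vec<&'a str>,
}

/// Hypothetical owned counterpart that can outlive the input string.
#[derive(Debug, PartialEq)]
struct Property {
    name: String,
    values: Vec<String>,
}

impl PropertyBorrowed<'_> {
    /// Copies the borrowed slices into freshly allocated `String`s.
    fn to_property(&self) -> Property {
        Property {
            name: self.name.to_string(),
            values: self.values.iter().map(|v| v.to_string()).collect(),
        }
    }
}

/// Splits a `title` attribute such as "bbox 0 0 100 50; x_wconf 93"
/// into properties that borrow from `title`. Empty entries are skipped.
fn parse_title(title: &str) -> Vec<PropertyBorrowed<'_>> {
    title
        .split(';')
        .filter_map(|entry| {
            let mut parts = entry.split_whitespace();
            let name = parts.next()?; // an entry without a name is ignored
            Some(PropertyBorrowed {
                name,
                values: parts.collect(),
            })
        })
        .collect()
}
```

The borrowed form ties each property to the lifetime of the input string, which is the usual trade-off: fewer allocations in exchange for keeping the source XML alive while the parsed data is in use.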
The parser does not fully validate that a file adheres to the hOCR specification: it checks the required metadata and the validity of hOCR element and property names, but it does not check property values.
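The level of checking described above (names are checked, values are not) could look roughly like the following sketch. It is not this crate's implementation. The name lists are a small hand-picked subset of the hOCR specification, and `NameError` and `check_names` are hypothetical names chosen for illustration.

```rust
// Small illustrative subsets of names from the hOCR specification;
// a real parser would carry the complete lists.
const KNOWN_ELEMENTS: &[&str] = &["ocr_page", "ocr_carea", "ocr_par", "ocr_line", "ocrx_word"];
const KNOWN_PROPERTIES: &[&str] = &["bbox", "baseline", "ppageno", "x_wconf", "x_size"];

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum NameError {
    UnknownElement(String),
    UnknownProperty(String),
}

/// Checks the element name (`class`) and every property name in `title`.
/// Property values are deliberately left unchecked.
fn check_names(class: &str, title: &str) -> Result<(), NameError> {
    if !KNOWN_ELEMENTS.contains(&class) {
        return Err(NameError::UnknownElement(class.to_string()));
    }
    for entry in title.split(';') {
        // The first whitespace-separated token of each entry is the name.
        if let Some(name) = entry.split_whitespace().next() {
            if !KNOWN_PROPERTIES.contains(&name) {
                return Err(NameError::UnknownProperty(name.to_string()));
            }
        }
    }
    Ok(())
}
```

Under this scheme a property such as `bbox not numbers` passes, because only the name `bbox` is inspected. Callers that need well-formed values have to validate them separately.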
Re-exports§
pub use roxmltree;
Modules§
- spec_definitions: Contains the element and property names defined in the hOCR specification.
Structs§
- Element: Represents an hOCR element.
- ElementBorrowed: Represents an hOCR element, borrowing its contents from the XML string.
- HOCR: Represents an hOCR file.
- HOCRBorrowed: Represents an hOCR file, borrowing its contents from the XML string.
Enums§
- HOCRParserError: hOCR parsing error variants.
Type Aliases§
- Result: A Result type alias using HOCRParserError instances as the error variant.