oar-ocr-core
Foundational types, model abstractions, and task-specific predictors for the OAR OCR library.
oar-ocr-core is the engine room of the OAR OCR ecosystem. It provides the core traits and implementations for high-performance OCR pipelines, featuring ONNX-based inference, specialized image processing, and a decoupled architecture designed for extensibility and speed.
Architecture
This crate implements a three-layer architecture to ensure modularity and maintainability:
- Models: Low-level wrappers around ONNX Runtime sessions, handling raw tensor input/output.
- Adapters: Traits and implementations that bridge raw model outputs to domain-specific types, handling pre- and post-processing logic.
- Tasks: Semantic contracts that define what a predictor does (e.g., "Text Detection"), ensuring consistent APIs across different model implementations.
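The layering can be sketched in plain Rust. The trait and type names here (`Model`, `Adapter`, `Task`, `Predictor`) are hypothetical and mirror the three bullets only for illustration. They are not the crate's actual API, and the toy model and adapter exist just so the sketch runs end to end.

```rust
// Hypothetical sketch of the three layers; these names are illustrative, not the crate's API.

/// Models layer: runs inference on a raw tensor (a flat f32 buffer here).
trait Model {
    fn run(&self, input: &[f32]) -> Vec<f32>;
}

/// Adapters layer: converts domain input to tensors and raw outputs to domain types.
trait Adapter {
    type Input;
    type Output;
    fn preprocess(&self, input: &Self::Input) -> Vec<f32>;
    fn postprocess(&self, raw: Vec<f32>) -> Self::Output;
}

/// Tasks layer: the semantic contract that callers program against.
trait Task {
    type Input;
    type Output;
    fn predict(&self, input: &Self::Input) -> Self::Output;
}

/// A predictor wires a model to an adapter and fulfils a task.
struct Predictor<M, A> {
    model: M,
    adapter: A,
}

impl<M: Model, A: Adapter> Task for Predictor<M, A> {
    type Input = A::Input;
    type Output = A::Output;
    fn predict(&self, input: &Self::Input) -> Self::Output {
        let tensor = self.adapter.preprocess(input);
        let raw = self.model.run(&tensor);
        self.adapter.postprocess(raw)
    }
}

// Toy implementations so the sketch runs end to end.
struct DoubleModel;
impl Model for DoubleModel {
    fn run(&self, input: &[f32]) -> Vec<f32> {
        input.iter().map(|x| x * 2.0).collect()
    }
}

struct SumAdapter;
impl Adapter for SumAdapter {
    type Input = Vec<u8>;
    type Output = f32;
    fn preprocess(&self, input: &Self::Input) -> Vec<f32> {
        // Normalize bytes to [0, 1].
        input.iter().map(|&b| b as f32 / 255.0).collect()
    }
    fn postprocess(&self, raw: Vec<f32>) -> f32 {
        raw.iter().sum()
    }
}

fn main() {
    let predictor = Predictor { model: DoubleModel, adapter: SumAdapter };
    let score = predictor.predict(&vec![255, 0, 255]);
    println!("{score}");
}
```

Because `Predictor` depends only on the `Model` and `Adapter` traits, either side can be swapped without touching code written against `Task`, which is the decoupling the three layers aim for.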
Installation
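Add oar-ocr-core to your project with `cargo add oar-ocr-core`. Optional features from the table below can be enabled with `--features`, for example `cargo add oar-ocr-core --features cuda`.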
Feature Flags
| Feature | Description |
|---|---|
| `cuda` | Enable NVIDIA CUDA execution provider |
| `tensorrt` | Enable NVIDIA TensorRT execution provider |
| `directml` | Enable DirectML execution provider (Windows) |
| `coreml` | Enable Core ML execution provider (macOS/iOS) |
| `openvino` | Enable Intel OpenVINO execution provider |
| `webgpu` | Enable WebGPU execution provider |
| `visualization` | Enable drawing utilities for debugging |
| `download-binaries` | Automatically download ONNX Runtime binaries (default) |
Quick Start
Text Detection
Detect text regions in an image using a DBNet-based model:
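```rust
// Module paths below are assumed; check the crate docs for the exact re-exports.
use oar_ocr_core::predictors::TextDetectionPredictor;
use oar_ocr_core::utils::load_image;
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Initialize the predictor (the model path is a placeholder)
    let predictor = TextDetectionPredictor::builder()
        .build(Path::new("models/text_detection.onnx"))?;

    // 2. Load and process (predict is assumed to take a batch of images)
    let image = load_image(Path::new("document.jpg"))?;
    let results = predictor.predict(vec![image])?;

    // 3. Access results (detections for the first image)
    for det in &results.detections[0] {
        println!("{det:?}");
    }
    Ok(())
}
```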
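DBNet-style detectors output a per-pixel text probability map that post-processing turns into regions. The sketch below is a deliberately simplified version of that step: it binarizes the map at a threshold and returns one axis-aligned box per 4-connected blob, dropping blobs smaller than `min_pixels`. It is not the crate's implementation; production DB post-processing typically extracts contour polygons and expands ("unclips") them, and the function name and the 0.3 threshold are illustrative.

```rust
use std::collections::VecDeque;

/// Axis-aligned box in pixel coordinates (inclusive bounds).
#[derive(Debug, PartialEq)]
struct BBox { x0: usize, y0: usize, x1: usize, y1: usize }

/// Simplified DB-style post-processing (hypothetical helper): binarize a row-major
/// probability map and return one bounding box per 4-connected region of text pixels.
fn boxes_from_prob_map(prob: &[f32], width: usize, height: usize, thresh: f32, min_pixels: usize) -> Vec<BBox> {
    let mask: Vec<bool> = prob.iter().map(|&p| p > thresh).collect();
    let mut seen = vec![false; mask.len()];
    let mut boxes = Vec::new();
    for start in 0..mask.len() {
        if !mask[start] || seen[start] {
            continue;
        }
        // Breadth-first flood fill over the connected region.
        let mut queue = VecDeque::from([start]);
        seen[start] = true;
        let (mut x0, mut y0, mut x1, mut y1) = (width, height, 0, 0);
        let mut count = 0;
        while let Some(idx) = queue.pop_front() {
            let (x, y) = (idx % width, idx / width);
            count += 1;
            x0 = x0.min(x);
            y0 = y0.min(y);
            x1 = x1.max(x);
            y1 = y1.max(y);
            let mut neighbors = Vec::with_capacity(4);
            if x > 0 { neighbors.push(idx - 1); }
            if x + 1 < width { neighbors.push(idx + 1); }
            if y > 0 { neighbors.push(idx - width); }
            if y + 1 < height { neighbors.push(idx + width); }
            for n in neighbors {
                if mask[n] && !seen[n] {
                    seen[n] = true;
                    queue.push_back(n);
                }
            }
        }
        // Discard tiny blobs, which are usually noise.
        if count >= min_pixels {
            boxes.push(BBox { x0, y0, x1, y1 });
        }
    }
    boxes
}

fn main() {
    // 6x3 toy probability map with two separate text blobs.
    let prob = [
        0.9, 0.8, 0.0, 0.0, 0.0, 0.0,
        0.7, 0.9, 0.0, 0.0, 0.6, 0.9,
        0.0, 0.0, 0.0, 0.0, 0.8, 0.7,
    ];
    let boxes = boxes_from_prob_map(&prob, 6, 3, 0.3, 2);
    println!("{boxes:?}");
}
```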
Text Recognition
Recognize text from cropped image regions:
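```rust
// Module paths are assumed; check the crate docs for the exact re-exports.
use oar_ocr_core::predictors::TextRecognitionPredictor;
use oar_ocr_core::utils::load_image;
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Dictionary and model paths are placeholders
    let predictor = TextRecognitionPredictor::builder()
        .dict_path(Path::new("models/dict.txt"))
        .build(Path::new("models/text_recognition.onnx"))?;
    let image = load_image(Path::new("word_crop.jpg"))?;
    let results = predictor.predict(vec![image])?;
    // Recognition returns results per input image (the `scores` field name is assumed)
    for (text, score) in results.texts.iter().zip(results.scores.iter()) {
        println!("{text}: {score:.2}");
    }
    Ok(())
}
```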
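The `dict_path` setting points to a model that classifies each time step over a character dictionary. Assuming a CTC-trained recognizer (common for this kind of model, but an assumption here), greedy decoding takes the arg-max class per step, collapses consecutive repeats, and drops the blank class. The helper name, the blank-at-index-0 convention, and averaging per-character probabilities into a score are all assumptions of this sketch.

```rust
/// Greedy CTC decoding sketch (hypothetical helper).
/// `probs` holds one row of class probabilities per time step.
/// Assumes class 0 is the CTC blank and class i (i >= 1) maps to `dict[i - 1]`.
fn ctc_greedy_decode(probs: &[Vec<f32>], dict: &[char]) -> (String, f32) {
    let mut text = String::new();
    let mut confidences = Vec::new();
    let mut prev: Option<usize> = None;
    for row in probs {
        // Arg-max class at this time step.
        let (best, &p) = row
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .expect("empty probability row");
        // Skip blanks and collapse consecutive repeats of the same class.
        if best != 0 && Some(best) != prev {
            if let Some(&c) = dict.get(best - 1) {
                text.push(c);
                confidences.push(p);
            }
        }
        prev = Some(best);
    }
    // Average the probabilities of the emitted characters.
    let score = if confidences.is_empty() {
        0.0
    } else {
        confidences.iter().sum::<f32>() / confidences.len() as f32
    };
    (text, score)
}

fn main() {
    let dict = ['a', 'b', 'c'];
    // Per-step arg-max: a, a, blank, a, b, which decodes to "aab".
    let probs = vec![
        vec![0.1, 0.8, 0.05, 0.05],
        vec![0.1, 0.7, 0.1, 0.1],
        vec![0.9, 0.05, 0.03, 0.02],
        vec![0.2, 0.6, 0.1, 0.1],
        vec![0.1, 0.1, 0.7, 0.1],
    ];
    let (text, score) = ctc_greedy_decode(&probs, &dict);
    println!("{text} {score:.2}");
}
```

The blank between the second and third `a` is what lets CTC emit a genuine double letter; without it, the repeats would collapse into one character.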
Layout Analysis
Analyze the structure of a document to identify titles, tables, and figures:
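```rust
// Module paths are assumed; check the crate docs for the exact re-exports.
use oar_ocr_core::predictors::LayoutDetectionPredictor;
use oar_ocr_core::domain::tasks::LayoutDetectionConfig;
use oar_ocr_core::utils::load_image;
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Model name and path are placeholders; a Default config is assumed
    let predictor = LayoutDetectionPredictor::builder()
        .model_name("layout-model-name")
        .with_config(LayoutDetectionConfig::default())
        .build(Path::new("models/layout.onnx"))?;

    let image = load_image(Path::new("page.jpg"))?;
    let results = predictor.predict(vec![image])?;

    for element in &results.elements {
        println!("{element:?}");
    }
    Ok(())
}
```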
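Consuming layout results usually means filtering by confidence and grouping by element type. This sketch uses a hypothetical `LayoutElement` struct with `label` and `score` fields as a stand-in for whatever type `results.elements` actually holds; the label strings are illustrative. It counts elements per label above a cut-off, using a `BTreeMap` so the output order is deterministic.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one detected layout element; the crate's real type may differ.
#[derive(Debug, Clone)]
struct LayoutElement {
    label: String,
    score: f32,
}

/// Count elements per label, ignoring detections below `min_score`.
fn count_by_label(elements: &[LayoutElement], min_score: f32) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for el in elements.iter().filter(|e| e.score >= min_score) {
        *counts.entry(el.label.clone()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let page = vec![
        LayoutElement { label: "title".into(), score: 0.95 },
        LayoutElement { label: "table".into(), score: 0.88 },
        LayoutElement { label: "table".into(), score: 0.91 },
        LayoutElement { label: "figure".into(), score: 0.40 },
    ];
    // With a 0.5 cut-off the low-confidence figure is dropped.
    println!("{:?}", count_by_label(&page, 0.5));
}
```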
Available Predictors
| Predictor | Description |
|---|---|
| `TextDetectionPredictor` | Locates text regions (polygons) in images. |
| `TextRecognitionPredictor` | Converts text regions into strings. |
| `LayoutDetectionPredictor` | Identifies semantic elements (Title, Table, Figure). |
| `TableStructureRecognitionPredictor` | Extracts HTML/Markdown structure from tables. |
| `TableCellDetectionPredictor` | Locates individual cells within a table. |
| `FormulaRecognitionPredictor` | Converts math formulas to LaTeX. |
| `DocumentOrientationPredictor` | Detects and corrects document rotation. |
| `DocumentRectificationPredictor` | Unwarps perspective or curved document images. |
| `SealTextDetectionPredictor` | Specialized detection for curved official stamps. |
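Orientation correction can be pictured as classification followed by an inverse rotation. This sketch assumes the classifier reports one of 0, 90, 180 or 270 degrees, interpreted as how far the page content is rotated counter-clockwise, and undoes it by rotating clockwise in 90-degree steps. The `Gray` image type and both functions are hypothetical; a real pipeline would operate on the crate's own image type.

```rust
/// Row-major single-channel image (hypothetical minimal type for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct Gray {
    width: usize,
    height: usize,
    pixels: Vec<u8>,
}

/// Rotate the image 90 degrees clockwise.
fn rotate90_cw(img: &Gray) -> Gray {
    let (w, h) = (img.width, img.height);
    let mut pixels = vec![0u8; w * h];
    for y in 0..h {
        for x in 0..w {
            // Source (x, y) lands at column h - 1 - y, row x of an image that is h pixels wide.
            pixels[x * h + (h - 1 - y)] = img.pixels[y * w + x];
        }
    }
    Gray { width: h, height: w, pixels }
}

/// Undo a detected rotation, assuming `angle_deg` is the counter-clockwise
/// rotation of the page content (0, 90, 180 or 270).
fn correct_orientation(img: &Gray, angle_deg: u32) -> Gray {
    assert!(angle_deg % 90 == 0, "expected a multiple of 90 degrees");
    let turns = (angle_deg / 90) % 4;
    let mut out = img.clone();
    for _ in 0..turns {
        out = rotate90_cw(&out);
    }
    out
}

fn main() {
    // Page [[1, 2], [3, 4]] rotated 90 degrees counter-clockwise becomes [[2, 4], [1, 3]].
    let rotated = Gray { width: 2, height: 2, pixels: vec![2, 4, 1, 3] };
    let fixed = correct_orientation(&rotated, 90);
    println!("{:?}", fixed.pixels); // [1, 2, 3, 4]
}
```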