oar-ocr-core 0.6.1

Core types and predictors for oar-ocr

oar-ocr-core

Foundational types, model abstractions, and task-specific predictors for the OAR OCR library.

oar-ocr-core underpins the OAR OCR ecosystem. It provides the core traits and implementations for OCR pipelines: ONNX-based inference, specialized image processing, and a decoupled architecture designed for extensibility and speed.

Architecture

This crate implements a three-layer architecture to ensure modularity and maintainability:

  1. Models: Low-level wrappers around ONNX Runtime sessions, handling raw tensor input/output.
  2. Adapters: Traits and implementations that bridge raw model outputs to domain-specific types, handling pre- and post-processing logic.
  3. Tasks: Semantic contracts that define what a predictor does (e.g., "Text Detection"), ensuring consistent APIs across different model implementations.
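The layering above can be sketched with plain Rust traits. The names `Task`, `RawModel`, `Adapter`, `BrightPixelTask`, `ScaleModel`, and `ThresholdAdapter` are hypothetical and are not this crate's actual API; the toy model stands in for an ONNX Runtime session. The sketch only illustrates how a task contract can stay fixed while the model behind the adapter changes.

```rust
/// Layer 3 (Task): a semantic contract that fixes input and output types.
/// Hypothetical trait, not the crate's real API.
trait Task {
    type Input;
    type Output;
}

/// Layer 1 (Model): raw tensor in, raw tensor out. A real model would wrap
/// an ONNX Runtime session.
trait RawModel {
    fn infer(&self, input: &[f32]) -> Vec<f32>;
}

/// Layer 2 (Adapter): pre-processing, inference, and post-processing that
/// turn a task input into a task output.
trait Adapter<T: Task> {
    fn run(&self, input: T::Input) -> T::Output;
}

/// Toy task: report which grayscale pixels the model considers "bright".
struct BrightPixelTask;
impl Task for BrightPixelTask {
    type Input = Vec<u8>;
    type Output = Vec<usize>;
}

/// Toy model: scales every value by `gain` and clamps the result to 1.0.
struct ScaleModel {
    gain: f32,
}
impl RawModel for ScaleModel {
    fn infer(&self, input: &[f32]) -> Vec<f32> {
        input.iter().map(|v| (v * self.gain).min(1.0)).collect()
    }
}

/// Adapter that works with any `RawModel` and fulfils `BrightPixelTask`.
struct ThresholdAdapter<M: RawModel> {
    model: M,
    threshold: f32,
}
impl<M: RawModel> Adapter<BrightPixelTask> for ThresholdAdapter<M> {
    fn run(&self, input: Vec<u8>) -> Vec<usize> {
        // Pre-process: normalize u8 pixels to [0, 1].
        let tensor: Vec<f32> = input.iter().map(|&p| p as f32 / 255.0).collect();
        // Inference on the raw tensor.
        let scores = self.model.infer(&tensor);
        // Post-process: keep the indices whose score clears the threshold.
        scores
            .iter()
            .enumerate()
            .filter_map(|(i, &s)| (s >= self.threshold).then_some(i))
            .collect()
    }
}
```

Calling `run` on a `ThresholdAdapter` built around a different `RawModel` implementation would leave the caller's code unchanged, which is the point of keeping the task contract separate from the model.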

Installation

Add oar-ocr-core to your project:

cargo add oar-ocr-core

Feature Flags

Feature            Description
cuda               Enable NVIDIA CUDA execution provider
tensorrt           Enable NVIDIA TensorRT execution provider
directml           Enable DirectML execution provider (Windows)
coreml             Enable Core ML execution provider (macOS/iOS)
openvino           Enable Intel OpenVINO execution provider
webgpu             Enable WebGPU execution provider
visualization      Enable drawing utilities for debugging
download-binaries  Automatically download ONNX Runtime binaries (default)
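Cargo features are compile-time switches, so an application can only report which providers it was built with, not toggle them at run time. The sketch below uses the standard `cfg!` macro for that. The helpers `enabled_providers` and `preferred_provider` are hypothetical, and the sketch assumes the application declares same-named features in its own Cargo.toml and forwards them to oar-ocr-core, because `cfg!` only sees the features of the crate being compiled.

```rust
/// Execution-provider feature names from the table above.
const PROVIDER_FEATURES: [&str; 6] =
    ["cuda", "tensorrt", "directml", "coreml", "openvino", "webgpu"];

/// Lists the provider features that were active when this crate was compiled.
/// Hypothetical helper: it inspects the features of the crate it lives in,
/// so the application must declare and forward same-named features.
fn enabled_providers() -> Vec<&'static str> {
    let active = [
        cfg!(feature = "cuda"),
        cfg!(feature = "tensorrt"),
        cfg!(feature = "directml"),
        cfg!(feature = "coreml"),
        cfg!(feature = "openvino"),
        cfg!(feature = "webgpu"),
    ];
    PROVIDER_FEATURES
        .iter()
        .zip(active.iter())
        .filter(|(_, on)| **on)
        .map(|(name, _)| *name)
        .collect()
}

/// Picks the first enabled provider, or the label "cpu" when none is enabled
/// (ONNX Runtime always ships a CPU execution provider).
fn preferred_provider() -> &'static str {
    enabled_providers().first().copied().unwrap_or("cpu")
}
```

In a build without any of these features, `enabled_providers()` returns an empty list and `preferred_provider()` returns `"cpu"`.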

Quick Start

Text Detection

Detect text regions in an image using a DBNet-based model:

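use oar_ocr_core::predictors::TextDetectionPredictor;
use oar_ocr_core::utils::load_image;

// `?` needs a fallible context, so the steps run inside `main`.
// Assumes the crate's error type implements `std::error::Error`.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Initialize the predictor from an ONNX model file
    let predictor = TextDetectionPredictor::builder()
        .build("pp-ocrv5_mobile_det.onnx")?;

    // 2. Load the image and run detection on a batch of one
    let image = load_image("document.jpg")?;
    let results = predictor.predict(vec![image])?;

    // 3. Access results (detections for the first image)
    for det in &results.detections[0] {
        println!("Box: {:?}, Score: {:.2}", det.bbox, det.score);
    }
    Ok(())
}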
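
DBNet-style detectors output a per-pixel text probability map that post-processing turns into regions. The sketch below is a deliberately simplified, std-only illustration of that step and is not the crate's implementation: it treats the whole map as a single region and returns an axis-aligned box, whereas real pipelines typically trace one contour per region and expand the resulting polygons. `PixelBox`, `box_from_prob_map`, and the threshold passed in are illustrative assumptions.

```rust
/// Axis-aligned box in pixel coordinates (inclusive corners).
#[derive(Debug, PartialEq)]
struct PixelBox {
    x0: usize,
    y0: usize,
    x1: usize,
    y1: usize,
}

/// Thresholds a row-major probability map and returns the box enclosing all
/// pixels at or above `thresh`, plus the mean probability inside that box.
/// Returns `None` for a malformed map or when no pixel clears the threshold.
fn box_from_prob_map(map: &[f32], width: usize, thresh: f32) -> Option<(PixelBox, f32)> {
    if width == 0 || map.len() % width != 0 {
        return None;
    }
    let mut bounds: Option<PixelBox> = None;
    for (i, &p) in map.iter().enumerate() {
        if p < thresh {
            continue;
        }
        let (x, y) = (i % width, i / width);
        // Grow the box so that it also covers this pixel.
        bounds = Some(match bounds {
            None => PixelBox { x0: x, y0: y, x1: x, y1: y },
            Some(b) => PixelBox {
                x0: b.x0.min(x),
                y0: b.y0.min(y),
                x1: b.x1.max(x),
                y1: b.y1.max(y),
            },
        });
    }
    let b = bounds?;
    // Score the box by the mean probability over every pixel it covers.
    let mut sum = 0.0f32;
    for y in b.y0..=b.y1 {
        for x in b.x0..=b.x1 {
            sum += map[y * width + x];
        }
    }
    let area = ((b.x1 - b.x0 + 1) * (b.y1 - b.y0 + 1)) as f32;
    Some((b, sum / area))
}
```

The mean-probability score plays the same role as the `score` field printed above: low-scoring boxes can be discarded before recognition.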

Text Recognition

Recognize text from cropped image regions:

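use oar_ocr_core::predictors::TextRecognitionPredictor;
use oar_ocr_core::utils::load_image;

// `?` needs a fallible context, so the steps run inside `main`.
// Assumes the crate's error type implements `std::error::Error`.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The dictionary maps model output indices to characters
    let predictor = TextRecognitionPredictor::builder()
        .dict_path("ppocrv5_dict.txt")
        .build("pp-ocrv5_mobile_rec.onnx")?;

    let image = load_image("text_line.jpg")?;
    let results = predictor.predict(vec![image])?;

    // `texts` and `scores` are parallel: one entry per input image
    for (text, score) in results.texts.iter().zip(&results.scores) {
        println!("Text: {}, Confidence: {:.2}", text, score);
    }
    Ok(())
}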
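
To show why a dictionary file is needed, here is a minimal sketch of greedy CTC decoding, a common way to turn per-timestep class scores into text for recognition models of this kind. Whether the crate decodes exactly this way is an assumption, as are the function name `ctc_greedy_decode`, the use of class 0 as the blank symbol, and the mean-of-emitted-scores confidence.

```rust
/// Greedy CTC decoding over per-timestep class scores.
/// `logits[t][c]` is the score of class `c` at timestep `t`. Class 0 is
/// assumed to be the CTC blank, and class `c > 0` maps to `dict[c - 1]`.
/// Returns the decoded text and the mean score of the emitted characters.
fn ctc_greedy_decode(logits: &[Vec<f32>], dict: &[char]) -> (String, f32) {
    let mut text = String::new();
    let mut score_sum = 0.0f32;
    let mut emitted = 0usize;
    let mut prev = 0usize; // argmax class of the previous timestep (starts as blank)

    for step in logits {
        if step.is_empty() {
            continue;
        }
        // Argmax over classes; ties resolve to the lowest class index.
        let mut best = 0usize;
        for (c, &s) in step.iter().enumerate() {
            if s > step[best] {
                best = c;
            }
        }
        // Emit only non-blank classes that differ from the previous timestep,
        // which collapses repeats such as "aa" into "a".
        if best != 0 && best != prev {
            if let Some(&ch) = dict.get(best - 1) {
                text.push(ch);
                score_sum += step[best];
                emitted += 1;
            }
        }
        prev = best;
    }

    let confidence = if emitted > 0 { score_sum / emitted as f32 } else { 0.0 };
    (text, confidence)
}
```

A blank between two identical classes keeps both characters, so the class sequence `a, a, blank, a, b` decodes to `aab` and not `ab`.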

Layout Analysis

Analyze the structure of a document to identify titles, tables, and figures:

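use oar_ocr_core::predictors::LayoutDetectionPredictor;
use oar_ocr_core::domain::LayoutDetectionConfig;
use oar_ocr_core::utils::load_image;

// `?` needs a fallible context, so the steps run inside `main`.
// Assumes the crate's error type implements `std::error::Error`.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Pair the model name with its matching default configuration
    let predictor = LayoutDetectionPredictor::builder()
        .model_name("pp-doclayoutv2")
        .with_config(LayoutDetectionConfig::with_pp_doclayoutv2_defaults())
        .build("pp-doclayoutv2.onnx")?;

    let image = load_image("page.jpg")?;
    let results = predictor.predict(vec![image])?;

    // Elements detected in the first image
    for element in &results.elements[0] {
        println!("Type: {}, Score: {:.2}", element.element_type, element.score);
    }
    Ok(())
}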
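
Detection-style layout models often emit overlapping candidates for the same region, and a common cleanup step is IoU-based non-maximum suppression (NMS). The text above does not say whether or how the crate applies it, so the sketch below is only an illustration. The `Element` struct, the function names, and the class-agnostic behavior are all assumptions.

```rust
/// Illustrative layout element with an axis-aligned box (not the crate's type).
#[derive(Debug, Clone, PartialEq)]
struct Element {
    label: &'static str,
    score: f32,
    x0: f32,
    y0: f32,
    x1: f32,
    y1: f32,
}

/// Intersection-over-union of two axis-aligned boxes, in [0, 1].
fn iou(a: &Element, b: &Element) -> f32 {
    let w = (a.x1.min(b.x1) - a.x0.max(b.x0)).max(0.0);
    let h = (a.y1.min(b.y1) - a.y0.max(b.y0)).max(0.0);
    let inter = w * h;
    let area = |e: &Element| (e.x1 - e.x0) * (e.y1 - e.y0);
    let union = area(a) + area(b) - inter;
    if union > 0.0 { inter / union } else { 0.0 }
}

/// Greedy, class-agnostic NMS: visit elements from highest to lowest score
/// and keep one only if it overlaps every kept element by at most `iou_thresh`.
fn nms(mut elements: Vec<Element>, iou_thresh: f32) -> Vec<Element> {
    elements.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut kept: Vec<Element> = Vec::new();
    for e in elements {
        if kept.iter().all(|k| iou(k, &e) <= iou_thresh) {
            kept.push(e);
        }
    }
    kept
}
```

The result is ordered by descending score. A per-class variant would compare `label` before suppressing, so that a table candidate never removes an overlapping figure candidate.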

Available Predictors

Predictor                           Description
TextDetectionPredictor              Locates text regions (polygons) in images.
TextRecognitionPredictor            Converts text regions into strings.
LayoutDetectionPredictor            Identifies semantic elements (Title, Table, Figure).
TableStructureRecognitionPredictor  Extracts HTML/Markdown structure from tables.
TableCellDetectionPredictor         Locates individual cells within a table.
FormulaRecognitionPredictor         Converts math formulas to LaTeX.
DocumentOrientationPredictor        Detects and corrects document rotation.
DocumentRectificationPredictor      Unwarps perspective or curved document images.
SealTextDetectionPredictor          Specialized detection for curved official stamps.