tesseract-ocr-static 0.1.2

This crate provides an ergonomic Rust interface to the underlying Tesseract OCR library. There are two main structs: [TextRecognizer] and [LayoutAnalyzer]. TextRecognizer recognizes text in an image and returns the text together with bounding boxes and other parameters. LayoutAnalyzer analyzes the layout without recognizing text; it should consume less memory than TextRecognizer.

Images are loaded into the library via the [Image] struct, which accepts raw RGB/RGBA pixel data; no other formats are supported. To read an image from a file in another format, first decode it with any Rust crate (e.g. image).
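
The sketch below illustrates what "raw RGB/RGBA" usually means in practice. It assumes a tightly packed, row-major buffer with 8 bits per channel, which is what the image crate's as_raw() yields; the exact requirements of [Image] are not stated here, and the helpers expected_len and pixel_offset are hypothetical and not part of this crate.

```rust
/// Number of bytes a tightly packed 8-bit image occupies
/// (3 channels for RGB, 4 for RGBA). Returns `None` on overflow.
/// Hypothetical helper, not part of tesseract-ocr-static.
fn expected_len(width: u32, height: u32, channels: usize) -> Option<usize> {
    (width as usize)
        .checked_mul(height as usize)?
        .checked_mul(channels)
}

/// Byte offset of pixel (x, y) in a row-major, tightly packed buffer.
/// Hypothetical helper, not part of tesseract-ocr-static.
fn pixel_offset(x: u32, y: u32, width: u32, channels: usize) -> usize {
    (y as usize * width as usize + x as usize) * channels
}

fn main() {
    let (width, height) = (2u32, 2u32);

    // A 2x2 all-white RGB image: 2 * 2 * 3 = 12 bytes.
    let len = expected_len(width, height, 3).expect("dimensions overflow usize");
    let mut rgb = vec![255u8; len];

    // Paint the pixel at x = 1, y = 0 black.
    let off = pixel_offset(1, 0, width, 3);
    rgb[off..off + 3].copy_from_slice(&[0, 0, 0]);

    println!("{} bytes, black pixel starts at byte {}", rgb.len(), off);
}
```

Checking a buffer's length against expected_len before constructing an image is a cheap way to catch mismatched dimensions or a wrong channel count early.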

Examples

Simple character recognition

use tesseract_ocr_static::{Image, TextRecognizer};
use image::ImageReader;

// Decode the file with the image crate and convert it to raw 8-bit RGB.
let rgb = ImageReader::open("hello.png").unwrap().decode().unwrap().into_rgb8();
// Wrap the raw pixels; Image itself does not decode any file format.
let image = Image::from_rgb(rgb.width(), rgb.height(), rgb.as_raw()).unwrap();
let mut recognizer = TextRecognizer::new().unwrap();
let results = recognizer.recognize_text(&image).unwrap();
assert_eq!("Hello world", results.get_utf8_text().as_str());

Print recognized text and corresponding bounding boxes

use tesseract_ocr_static::{Image, LayoutLevel, TextRecognizer};
use image::ImageReader;

// Decode the file with the image crate and convert it to raw 8-bit RGB.
let rgb = ImageReader::open("hello.png").unwrap().decode().unwrap().into_rgb8();
let image = Image::from_rgb(rgb.width(), rgb.height(), rgb.as_raw()).unwrap();
let mut recognizer = TextRecognizer::new().unwrap();
let results = recognizer.recognize_text(&image).unwrap();
// Walk the results one word at a time.
let mut iter = results.iter();
while let Some(word) = iter.next(LayoutLevel::Word) {
    println!(
        "Word {:?}, confidence {:.1}, bounding box {:?}",
        word.get_utf8_text(LayoutLevel::Word).as_str(),
        word.confidence(LayoutLevel::Word),
        word.bounding_box(LayoutLevel::Word),
    );
}
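
Once words and confidences have been collected from a loop like the one above, low-confidence words can be dropped before further processing. The following is a minimal sketch, assuming the confidence is a percentage in the range 0 to 100 as in Tesseract's C++ API; the RecognizedWord struct and the confident_text helper are hypothetical stand-ins and not part of this crate.

```rust
/// Hypothetical plain-data copy of one recognized word.
/// Not a type from tesseract-ocr-static.
struct RecognizedWord {
    text: String,
    /// Assumed to be a percentage in 0.0..=100.0.
    confidence: f32,
}

/// Joins, with single spaces, the words whose confidence is at least
/// `min_confidence`, preserving their original order.
fn confident_text(words: &[RecognizedWord], min_confidence: f32) -> String {
    words
        .iter()
        .filter(|w| w.confidence >= min_confidence)
        .map(|w| w.text.as_str())
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // Made-up values for illustration only.
    let words = vec![
        RecognizedWord { text: "Hello".to_string(), confidence: 96.5 },
        RecognizedWord { text: "~".to_string(), confidence: 12.0 },
        RecognizedWord { text: "world".to_string(), confidence: 91.0 },
    ];
    println!("{}", confident_text(&words, 60.0));
}
```

The threshold of 60.0 is arbitrary; a suitable value depends on the image quality and on how costly a misread word is for the application.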

Build customization

Please consult the documentation of the tesseract-ocr-static-c crate.