tesseract5-rs
High-level Rust OCR library built on top of tesseract-rs, the dariofinardi fork of
cafercangundogdu/tesseract-rs. It provides an ergonomic Ocr5Engine API with automatic
tessdata path resolution and a word-level bounding-box hierarchy, and it re-exports the
full tesseract-rs surface so callers need only one dependency.
On versioning: this crate starts at v0.1.0 because it is the first public release, not because it is incomplete or experimental. The OCR pipeline, word-level hierarchy, automatic tessdata resolution, and all re-exported
tesseract-55-rs bindings are production-ready and work correctly for their intended purpose. The 0.x prefix simply follows Rust convention for a first crates.io publication.
Why this crate exists
The dependency chain
tesseract5-rs ← you are here (high-level API)
└── tesseract-rs ← dariofinardi fork (semplifica branch)
    └── Tesseract 5.5.0 + Leptonica 1.85.0 (compiled from source at build time)
Using tesseract-rs directly is perfectly valid. tesseract5-rs adds a thin ergonomic
layer on top:
| Feature | tesseract-rs | tesseract5-rs |
|---|---|---|
| FFI bindings | ✓ | via re-export |
| Tessdata path resolution | manual | automatic |
| OcrOptions / OcrOutput structs | ✗ | ✓ |
| Ocr5Engine wrapper | ✗ | ✓ |
| Hierarchy in one call | manual | with_hierarchy: true |
| crates.io publication target | no (git-only) | yes |
Why the semplifica branch of the fork
The upstream cafercangundogdu/tesseract-rs crate (v0.1.20 on crates.io) targets
Tesseract 5.3.x and does not expose word-level positional output. The fork (semplifica branch) ships the following changes on top of upstream:
| Commit | Change |
|---|---|
| ab37bf8 | Tesseract 5.5.0 + Leptonica 1.85.0: build script bumped; newer model support and C API fixes |
| e51751b | ARM64 / Snapdragon X Elite support: correct library names and linker flags for aarch64-pc-windows-msvc |
| bbb0b1d | Per-arch build cache: %APPDATA%/tesseract-rs/<arch>/… prevents host/cross conflicts |
| 0819eb8 | dynamic-libs feature: builds Tesseract + Leptonica as DLL/.so for desktop app bundling (Tauri, etc.) |
| c981751 | TesseractHierarchy + get_hierarchy(): walks the ResultIterator and returns a nested Rust struct, TesseractHierarchy → [Block → [Paragraph → [TextLine → [Word(text, bbox, confidence)]]]], with BoundingBox, serializable via serde |
| da080d2 | UB fix in process_pages(): TessBaseAPIProcessPages returns BOOL (c_int), not char *; old code cast integer 1 to a string pointer, causing undefined behaviour |
None of these changes are available in the upstream crate. Until they are merged upstream
(or a compatible crates.io release is published), tesseract5-rs pins to this branch to
provide a stable, tested surface.
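The process_pages() fix in the table is a common FFI pitfall, so a small sketch may help. The code below does not call Tesseract: `mock_process_pages` is a hypothetical stand-in, written in Rust, for a C function that returns a BOOL. It only illustrates the safe pattern of declaring the return type as c_int and comparing it against zero, rather than treating the value as a string pointer.

```rust
use std::os::raw::c_int;

/// Hypothetical stand-in for a C API that returns BOOL (an int):
/// 1 on success, 0 on failure. Not part of tesseract-rs.
extern "C" fn mock_process_pages(input_ok: bool) -> c_int {
    if input_ok { 1 } else { 0 }
}

/// Safe wrapper: the return value is read as an integer status.
/// Declaring the return type as a `char *` and dereferencing it would
/// instead read from address 0x1, which is undefined behaviour.
fn process_pages(input_ok: bool) -> Result<(), String> {
    let status: c_int = mock_process_pages(input_ok);
    if status != 0 {
        Ok(())
    } else {
        Err(String::from("page processing failed"))
    }
}

fn main() {
    println!("{:?}", process_pages(true));
    println!("{:?}", process_pages(false));
}
```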
Installation
[dependencies]
tesseract5-rs = { git = "https://github.com/dariofinardi/Tesseract5-rs" }
Note: the first build compiles Tesseract 5.5.0 and Leptonica 1.85.0 from source (~2–4 min). Subsequent builds reuse the compiled libraries cached in
%APPDATA%/tesseract-rs/<arch>/ (Windows), ~/.tesseract-rs/<arch>/ (Linux), or ~/Library/Application Support/tesseract-rs/<arch>/ (macOS).
Optional features
| Feature | Description |
|---|---|
| dynamic-libs | Build Tesseract + Leptonica as shared libraries (.dll/.so) instead of static libs. Useful when bundling native binaries in a desktop app. |
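As a sketch, enabling the feature on a git dependency would look roughly like this in Cargo.toml. The dependency key `tesseract5-rs` is assumed from the crate name; adjust it if the package is named differently.

```toml
[dependencies]
tesseract5-rs = { git = "https://github.com/dariofinardi/Tesseract5-rs", features = ["dynamic-libs"] }
```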
Quick start
use tesseract5_rs::{…};
With word-level bounding boxes
use tesseract5_rs::{…};
let engine = Ocr5Engine::new(…)?;
let output = engine.recognize(…)?;
if let Some(hierarchy) = output.hierarchy { … }
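To make the Block → Paragraph → TextLine → Word nesting concrete, here is a minimal, self-contained sketch that walks such a tree and collects low-confidence words. The struct and field names (`Hierarchy`, `blocks`, `paragraphs`, `lines`, `words`, `bbox`, `confidence`) and the 0 to 100 confidence scale are stand-ins invented for illustration; the real TesseractHierarchy types re-exported by this crate may name or shape their fields differently.

```rust
// Stand-in types mirroring the nesting described in this README.
// They are NOT the crate's actual definitions; all field names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BBox { x1: i32, y1: i32, x2: i32, y2: i32 }

#[derive(Debug, Clone)]
struct Word { text: String, bbox: BBox, confidence: f32 }

struct TextLine { words: Vec<Word> }
struct Paragraph { lines: Vec<TextLine> }
struct Block { paragraphs: Vec<Paragraph> }
struct Hierarchy { blocks: Vec<Block> }

/// Flatten the tree in reading order and keep the words whose
/// confidence is below `threshold`.
fn low_confidence_words(h: &Hierarchy, threshold: f32) -> Vec<&Word> {
    h.blocks
        .iter()
        .flat_map(|b| b.paragraphs.iter())
        .flat_map(|p| p.lines.iter())
        .flat_map(|l| l.words.iter())
        .filter(|w| w.confidence < threshold)
        .collect()
}

/// Build a tiny one-line, two-word hierarchy by hand for demonstration.
fn sample_hierarchy() -> Hierarchy {
    let words = vec![
        Word {
            text: String::from("Hello"),
            bbox: BBox { x1: 10, y1: 10, x2: 50, y2: 30 },
            confidence: 96.5,
        },
        Word {
            text: String::from("w0rld"),
            bbox: BBox { x1: 60, y1: 10, x2: 110, y2: 30 },
            confidence: 61.0,
        },
    ];
    Hierarchy {
        blocks: vec![Block {
            paragraphs: vec![Paragraph { lines: vec![TextLine { words }] }],
        }],
    }
}

fn main() {
    let h = sample_hierarchy();
    for w in low_confidence_words(&h, 80.0) {
        println!("{:?} at {:?} ({:.1})", w.text, w.bbox, w.confidence);
    }
}
```

With the real crate, the same iterator chain would presumably start from the value held in output.hierarchy instead of the hand-built sample.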
Custom tessdata path
use tesseract5_rs::{…};
use std::path::PathBuf;
let engine = Ocr5Engine::new(…)?;
The TESSDATA_PREFIX environment variable is also honoured if tessdata_dir is not set.
API overview
Ocr5Engine
OcrOptions
OcrOutput
Re-exported types
All public types from tesseract-rs are re-exported:
TesseractAPI, TesseractHierarchy, TesseractBlock, TesseractParagraph,
TesseractTextLine, TesseractWord, BoundingBox, TesseractError, Result,
TessPageSegMode, TessPageIteratorLevel, and the remaining enums and iterators.
Tessdata path resolution
default_tessdata_dir() (also public) resolves in this order:
1. TESSDATA_PREFIX environment variable
2. Build-cache path written by tesseract-rs's build script:
   - Windows: %APPDATA%\tesseract-rs\<arch>\static\tessdata
   - Linux: ~/.tesseract-rs/<arch>/static/tessdata
   - macOS: ~/Library/Application Support/tesseract-rs/<arch>/static/tessdata
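The lookup order above can be sketched as a pure function. This is an illustration only, not the crate's implementation of default_tessdata_dir(): the function name `resolve_tessdata_dir`, the `Platform` enum, the empty-string check, the example arch strings, and passing the environment in as arguments are all assumptions made so the logic is easy to test.

```rust
use std::path::PathBuf;

/// Hypothetical platform tag; a real implementation would more likely use `cfg!`.
#[derive(Clone, Copy)]
enum Platform {
    Windows,
    Linux,
    MacOs,
}

/// Resolve a tessdata directory.
/// `tessdata_prefix`: value of TESSDATA_PREFIX, if set.
/// `base`: %APPDATA% on Windows, the home directory on Linux/macOS.
/// `arch`: the per-architecture cache folder name.
fn resolve_tessdata_dir(
    tessdata_prefix: Option<&str>,
    platform: Platform,
    base: &str,
    arch: &str,
) -> PathBuf {
    // 1. The environment variable wins when set.
    //    Treating an empty value as "unset" is an assumption of this sketch.
    if let Some(prefix) = tessdata_prefix.filter(|p| !p.is_empty()) {
        return PathBuf::from(prefix);
    }
    // 2. Otherwise fall back to the per-arch build cache.
    let mut dir = PathBuf::from(base);
    match platform {
        Platform::Windows => dir.push("tesseract-rs"),
        Platform::Linux => dir.push(".tesseract-rs"),
        Platform::MacOs => {
            dir.push("Library");
            dir.push("Application Support");
            dir.push("tesseract-rs");
        }
    }
    dir.push(arch);
    dir.push("static");
    dir.push("tessdata");
    dir
}

fn main() {
    let prefix = std::env::var("TESSDATA_PREFIX").ok();
    let dir = resolve_tessdata_dir(prefix.as_deref(), Platform::Linux, "/home/user", "x86_64");
    println!("{}", dir.display());
}
```

An explicit tessdata_dir, as described under "Custom tessdata path", would be checked before either of these steps.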
System requirements
- Rust 1.83.0+
- C++ compiler (MSVC on Windows, GCC/Clang on Linux/macOS)
- CMake ≥ 3.20
- Internet connection on first build (downloads Tesseract + Leptonica source archives)
Credits
- Tesseract OCR — Apache 2.0, Google Inc.
- cafercangundogdu/tesseract-rs — original Rust FFI bindings, MIT, Cafer Can Gündoğdu
- dariofinardi/tesseract-rs (semplifica branch): fork with Tesseract 5.5, ARM64, TesseractHierarchy, dynamic-libs, and UB fixes, by Dario Finardi