kreuzberg-tesseract
Rust bindings for Tesseract OCR with built-in compilation of Tesseract and Leptonica libraries. Provides a safe and idiomatic Rust interface to Tesseract's functionality while handling the complexity of compiling the underlying C++ libraries.
Based on the original tesseract-rs by Cafer Can Gündoğdu, this maintained version adds critical improvements for production use:
- C++17 Support: Upgraded for Tesseract 5.5.1 which requires C++17 filesystem
- Cross-Compilation: Fixed CXX compiler detection for cross-platform builds
- Architecture Validation: Validates target architecture before using cached libraries
- Windows Static Linking: Fixed MSVC static linking issues
- Build Caching: Improved caching with OUT_DIR-based cache directory
- MinGW Support: Added support for MinGW toolchains
Features
- Safe Rust bindings for Tesseract OCR
- Multiple linking options:
  - Static linking (default): Built-in compilation with no runtime dependencies
  - Dynamic linking: Link to system-installed libraries for faster builds
- Uses existing Tesseract training data (expects English data for tests)
- High-level Rust API for common OCR tasks
- Caching of compiled libraries for faster subsequent builds
- Support for multiple operating systems (Linux, macOS, Windows)
Installation
Static Linking (Default)
Static linking builds Tesseract and Leptonica from source and embeds them in your binary. No runtime dependencies required:
[dependencies]
kreuzberg-tesseract = "1.0.0-rc.1"
# or explicitly:
kreuzberg-tesseract = { version = "1.0.0-rc.1", features = ["static-linking"] }
Dynamic Linking
Dynamic linking uses system-installed Tesseract and Leptonica libraries. Faster builds, but requires libraries installed on the system:
[dependencies]
kreuzberg-tesseract = { version = "1.0.0-rc.1", features = ["dynamic-linking"], default-features = false }
System requirements for dynamic linking:
- Tesseract 5.x libraries installed (libtesseract, libleptonica)
- macOS: brew install tesseract leptonica
- Ubuntu/Debian: sudo apt-get install libtesseract-dev libleptonica-dev
- RHEL/CentOS/Fedora: sudo dnf install tesseract-devel leptonica-devel
- Windows: Install from Tesseract releases or vcpkg
Development Dependencies
For development and testing, you'll also need these dependencies:
[dev-dependencies]
image = "0.25.5"
System Requirements
For Static Linking (Default)
When building with static linking, the crate will compile Tesseract and Leptonica from source. You need:
- Rust 1.85.0 or later
- A C++ compiler (e.g., gcc, clang, MSVC on Windows)
- CMake 3.x or later
- Internet connection (for downloading Tesseract source code)
For Dynamic Linking
When using dynamic linking with system-installed libraries, you need:
- Rust 1.85.0 or later
- Tesseract 5.x and Leptonica libraries installed on your system (see Installation section)
- Internet connection (for downloading Tesseract source code)
No C++ compiler or CMake required for dynamic linking builds.
For a full development environment checklist (including optional tooling suggestions), see CONTRIBUTING.md.
Environment Variables
The following environment variables affect the build and test process:
Build Variables
- CARGO_CLEAN: If set, cleans the cache directory before building
- RUSTC_WRAPPER: If set to "sccache", enables compiler caching with sccache
- CC: Compiler selection for C code (affects Linux builds)
- HOME (Unix) or APPDATA (Windows): Used to determine the cache directory location
- TESSERACT_RS_CACHE_DIR: Optional override for the cache root. When unset or not writable, the build falls back to the default OS-specific directory; if that also fails, it automatically uses a temporary directory under the system temp folder.
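The fallback order described for TESSERACT_RS_CACHE_DIR can be sketched as a small selection function. This is an illustration only, not the crate's build script: `pick_cache_root`, the injected `is_writable` check, and the `tesseract-rs-cache` folder name are all hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Picks the first usable cache root in the documented order:
/// explicit override, then the OS default, then a folder under the temp root.
/// `is_writable` is injected so the policy can be exercised without touching the disk.
fn pick_cache_root(
    override_dir: Option<PathBuf>,
    os_default: Option<PathBuf>,
    temp_root: PathBuf,
    is_writable: impl Fn(&Path) -> bool,
) -> PathBuf {
    override_dir
        .into_iter()
        .chain(os_default)
        .find(|dir| is_writable(dir.as_path()))
        // Hypothetical folder name; the real build may choose a different one.
        .unwrap_or_else(|| temp_root.join("tesseract-rs-cache"))
}
```

In a real build script the override would likely come from `std::env::var_os("TESSERACT_RS_CACHE_DIR")` and the temp root from `std::env::temp_dir()`; both are standard library calls, but how the crate wires them together is an assumption here.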
Test Variables
- TESSDATA_PREFIX (optional): Path that overrides the default tessdata directory. If not set, the crate uses its default cache directory.
Cache and Data Directories
The crate uses the following directory structure based on your operating system:
- macOS: ~/Library/Application Support/tesseract-rs
- Linux: ~/.tesseract-rs
- Windows: %APPDATA%/tesseract-rs
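As a rough illustration of the per-OS layout above, the mapping can be written as a pure function. The name `default_cache_dir` and its parameters are hypothetical; the `os` argument is assumed to carry the values reported by `std::env::consts::OS`, and unlisted platforms return `None` because this README does not document them.

```rust
use std::path::PathBuf;

/// Maps an OS name and the relevant base directory to the documented cache path.
/// `home` stands for the HOME variable, `appdata` for APPDATA.
fn default_cache_dir(os: &str, home: Option<&str>, appdata: Option<&str>) -> Option<PathBuf> {
    match os {
        "macos" => home.map(|h| {
            PathBuf::from(h)
                .join("Library")
                .join("Application Support")
                .join("tesseract-rs")
        }),
        "linux" => home.map(|h| PathBuf::from(h).join(".tesseract-rs")),
        "windows" => appdata.map(|a| PathBuf::from(a).join("tesseract-rs")),
        // Platforms not listed in this README are left undecided.
        _ => None,
    }
}
```

Passing the base directories in as arguments, rather than reading the environment inside the function, keeps the mapping easy to check on any host.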
The cache includes:
- Compiled Tesseract and Leptonica libraries
- Third-party source code
Training data is not downloaded during the build. Provide eng.traineddata (and any other languages you need) via TESSDATA_PREFIX or your system Tesseract installation.
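Because training data must be supplied by the user, it can help to check for it before initializing OCR. The following is a minimal sketch using only the standard library; `resolve_tessdata_dir` and `missing_languages` are hypothetical helpers, and treating an empty TESSDATA_PREFIX as unset is an assumption rather than documented crate behavior.

```rust
use std::path::{Path, PathBuf};

/// Returns the tessdata directory: the TESSDATA_PREFIX value when provided,
/// otherwise the supplied fallback (for example, the default cache location).
fn resolve_tessdata_dir(prefix: Option<&str>, fallback: &Path) -> PathBuf {
    match prefix {
        // Assumption: an empty value is treated the same as an unset variable.
        Some(p) if !p.is_empty() => PathBuf::from(p),
        _ => fallback.to_path_buf(),
    }
}

/// Lists the requested languages whose `<lang>.traineddata` file is absent from `dir`.
fn missing_languages(dir: &Path, langs: &[&str]) -> Vec<String> {
    langs
        .iter()
        .copied()
        .filter(|lang| !dir.join(format!("{}.traineddata", lang)).is_file())
        .map(str::to_string)
        .collect()
}
```

A caller might pass `std::env::var("TESSDATA_PREFIX").ok().as_deref()` as `prefix` and report any names returned by `missing_languages(&dir, &["eng"])` before running the tests.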
Testing
The project includes several integration tests that verify OCR functionality. To run the tests:
- Ensure you have the required test dependencies:
  [dev-dependencies]
  image = "0.25.9"
- Run the tests:
  cargo test
Note: Make sure eng.traineddata is available in your tessdata directory before running tests. If TESSDATA_PREFIX is not set, the tests look in the default cache location. You can point the tests at a custom tessdata directory by setting:
# Linux/macOS
export TESSDATA_PREFIX=/path/to/custom/tessdata
# Windows (PowerShell)
$env:TESSDATA_PREFIX="C:\path\to\custom\tessdata"
Available test cases:
- OCR on English sample images
- Error handling and invalid input coverage
Test images are sourced from the shared test_documents/ directory in the repository:
- images/test_hello_world.png: Simple English text
- tables/simple_table.png: Basic table with English headers
Usage
Here's a basic example of how to use tesseract-rs:
use std::path::PathBuf;
use std::error::Error;
use kreuzberg_tesseract::TesseractAPI;
Advanced Usage
The API provides additional functionality for more complex OCR tasks, including thread-safe operations:
use kreuzberg_tesseract::TesseractAPI;
use std::sync::Arc;
use std::thread;
use std::error::Error;
// Helper function to get tessdata directory
// Helper function to load test image
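The general shape of sharing one engine across worker threads with `Arc` and `thread` can be sketched as follows. `StubEngine` and `recognize_all` are hypothetical stand-ins written for this illustration; they are not the crate's `TesseractAPI`, whose methods are not shown here. The pattern also assumes the shared type is `Send + Sync`; if it were not, it would need to be wrapped in a `Mutex` or each thread would need its own instance.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for an OCR engine; NOT the crate's `TesseractAPI`.
struct StubEngine {
    language: String,
}

impl StubEngine {
    /// Pretends to recognize text; a real engine would run OCR here.
    fn recognize(&self, image_name: &str) -> String {
        format!("[{}] text from {}", self.language, image_name)
    }
}

/// Spawns one worker thread per image, each holding a cheap `Arc` clone of the engine.
fn recognize_all(engine: Arc<StubEngine>, images: Vec<String>) -> Vec<String> {
    let handles: Vec<_> = images
        .into_iter()
        .map(|image| {
            let engine = Arc::clone(&engine);
            thread::spawn(move || engine.recognize(&image))
        })
        .collect();
    // Joining in spawn order keeps results aligned with the input order.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Cloning the `Arc` only bumps a reference count, so every thread works against the same underlying engine value.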
Building
Static Linking (Default)
With static linking, the crate will automatically download and compile Tesseract and Leptonica during the build process. This may take some time on the first build (5-10 minutes), but subsequent builds will use the cached libraries.
To clean the cache and force a rebuild:
CARGO_CLEAN=1 cargo build
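The clean-then-reuse behavior can be pictured with a short sketch. It is not the crate's actual build script: `cache_is_complete`, `prepare_cache`, and any artifact file names passed to them are hypothetical, and the real build may track its cache differently.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns true when every expected artifact already exists in `cache_dir`.
fn cache_is_complete(cache_dir: &Path, artifacts: &[&str]) -> bool {
    artifacts.iter().all(|name| cache_dir.join(name).is_file())
}

/// Wipes the cache when a clean was requested, recreates the directory,
/// and reports whether the libraries need to be rebuilt.
fn prepare_cache(cache_dir: &Path, artifacts: &[&str], clean_requested: bool) -> io::Result<bool> {
    if clean_requested && cache_dir.exists() {
        fs::remove_dir_all(cache_dir)?;
    }
    fs::create_dir_all(cache_dir)?;
    Ok(!cache_is_complete(cache_dir, artifacts))
}
```

Since CARGO_CLEAN is described as taking effect whenever it is set, a caller could derive `clean_requested` from `std::env::var_os("CARGO_CLEAN").is_some()`.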
Dynamic Linking
With dynamic linking, the build is much faster (seconds instead of minutes) since it only links against system-installed libraries:
cargo build --no-default-features --features dynamic-linking
Note: Dynamic linking requires Tesseract and Leptonica to be installed on your system (see Installation section).
Documentation
For more detailed information, please check the API documentation.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgements
This project is based on the original tesseract-rs by Cafer Can Gündoğdu. We are grateful for the foundational work that made this project possible.
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Quick Start for Contributors
- Fork and clone the repository
- Install uv and set up git hooks
- Make your changes following our commit message format
- Run tests: cargo test
- Submit a Pull Request
Our commit messages follow the Conventional Commits specification.
Acknowledgements
This project uses Tesseract OCR and Leptonica. We are grateful to the maintainers and contributors of these projects.