tesseract-rs
tesseract-rs is a Rust binding for Tesseract OCR with built-in compilation of the Tesseract and Leptonica libraries. This project aims to provide a safe and idiomatic Rust interface to Tesseract's functionality while handling the complexity of compiling the underlying C++ libraries.
Features
- Safe Rust bindings for Tesseract OCR
- Built-in compilation of Tesseract and Leptonica
- Automatic download of Tesseract training data (English and Turkish)
- High-level Rust API for common OCR tasks
- Caching of compiled libraries for faster subsequent builds
- Support for multiple operating systems (Linux, macOS, Windows)
Installation
Add this to your Cargo.toml:
[dependencies]
tesseract-rs = { version = "0.1.19", features = ["build-tesseract"] }
For development and testing, you'll also need these dependencies:
[]
= "0.25.5"
= "0.25.0"
System Requirements
To build this crate, you need:
- A C++ compiler (e.g., gcc, clang)
- CMake
- Internet connection (for downloading Tesseract training data)
- Rust 1.83.0 or later
Environment Variables
The following environment variables affect the build and test process:
Build Variables
- CARGO_CLEAN: If set, cleans the cache directory before building
- RUSTC_WRAPPER: If set to "sccache", enables compiler caching with sccache
- CC: Compiler selection for C code (affects Linux builds)
- HOME (Unix) or APPDATA (Windows): Used to determine the cache directory location
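As a rough illustration of how a build step could interpret the build variables above, here is a minimal sketch. The `BuildEnv` struct and `read_build_env` function are hypothetical names made up for this example and are not part of tesseract-rs; the crate's real build script may behave differently, for example in how it treats empty values.

```rust
/// Build-time settings derived from the build variables listed above.
/// Hypothetical type for illustration; not part of tesseract-rs.
#[derive(Debug, PartialEq)]
pub struct BuildEnv {
    /// True when CARGO_CLEAN is set to any value.
    pub clean_cache: bool,
    /// True only when RUSTC_WRAPPER is exactly "sccache".
    pub use_sccache: bool,
    /// The C compiler named by CC, if one was given.
    pub cc: Option<String>,
}

/// Reads the settings through `lookup`, which maps a variable name to its value.
/// Taking a closure instead of calling `std::env::var` directly keeps the
/// function easy to test without touching the process environment.
pub fn read_build_env<F>(lookup: F) -> BuildEnv
where
    F: Fn(&str) -> Option<String>,
{
    BuildEnv {
        clean_cache: lookup("CARGO_CLEAN").is_some(),
        use_sccache: lookup("RUSTC_WRAPPER").as_deref() == Some("sccache"),
        // Assumption: an empty CC value is treated the same as an unset one.
        cc: lookup("CC").filter(|value| !value.is_empty()),
    }
}
```

To read the real process environment, pass `|name| std::env::var(name).ok()` as the lookup.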
Test Variables
- TESSDATA_PREFIX (Optional): Path to override the default tessdata directory. If not set, the crate will use its default cache directory.
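The override rule for TESSDATA_PREFIX can be sketched as a small helper. The function name `resolve_tessdata_dir` is hypothetical, and treating an empty value as unset is an assumption of this sketch rather than documented crate behavior.

```rust
use std::path::{Path, PathBuf};

/// Picks the tessdata directory: the TESSDATA_PREFIX value when provided,
/// otherwise the `tessdata` subdirectory of the crate's cache directory.
/// Hypothetical helper for illustration; not part of tesseract-rs.
pub fn resolve_tessdata_dir(prefix: Option<&str>, cache_dir: &Path) -> PathBuf {
    match prefix {
        // Assumption: a blank value is treated as if the variable were unset.
        Some(p) if !p.trim().is_empty() => PathBuf::from(p),
        _ => cache_dir.join("tessdata"),
    }
}
```

A caller would typically pass `std::env::var("TESSDATA_PREFIX").ok().as_deref()` as the first argument.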
Cache and Data Directories
The crate uses the following directory structure based on your operating system:
- macOS: ~/Library/Application Support/tesseract-rs
- Linux: ~/.tesseract-rs
- Windows: %APPDATA%/tesseract-rs
The cache includes:
- Compiled Tesseract and Leptonica libraries
- Downloaded training data (eng.traineddata, tur.traineddata) in the tessdata subdirectory
- Third-party source code
The training data files are automatically downloaded and placed in the appropriate tessdata subdirectory during the build process. You don't need to manually set up the tessdata directory unless you want to use a custom location.
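The per-platform cache locations described in this section could be computed along these lines. This is a minimal sketch: the `Os` enum and `cache_dir` function are hypothetical names, and the home and APPDATA values are passed in explicitly rather than read from the environment so the mapping is easy to check.

```rust
use std::path::PathBuf;

/// Platforms covered by the directory layout in this section.
/// Hypothetical type for illustration; not part of tesseract-rs.
#[derive(Clone, Copy, Debug)]
pub enum Os {
    MacOs,
    Linux,
    Windows,
}

/// Returns the cache directory for `os`, or `None` when the variable that
/// platform depends on (HOME on Unix, APPDATA on Windows) is missing.
pub fn cache_dir(os: Os, home: Option<&str>, appdata: Option<&str>) -> Option<PathBuf> {
    match os {
        Os::MacOs => home.map(|h| {
            PathBuf::from(h)
                .join("Library")
                .join("Application Support")
                .join("tesseract-rs")
        }),
        Os::Linux => home.map(|h| PathBuf::from(h).join(".tesseract-rs")),
        Os::Windows => appdata.map(|a| PathBuf::from(a).join("tesseract-rs")),
    }
}
```

Joining `tessdata` onto the returned path gives the default location of the downloaded training data.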
Testing
The project includes several integration tests that verify OCR functionality. To run the tests:
- Ensure you have the required test dependencies:
[] = "0.25.5" = "0.25.0"
- Run the tests:
Note: Setting TESSDATA_PREFIX is optional. If not set, the tests will use the default tessdata directory in the cache location. If you want to use a custom tessdata directory, you can set it:
# Linux/macOS
export TESSDATA_PREFIX=/path/to/custom/tessdata
# Windows (PowerShell)
$env:TESSDATA_PREFIX="C:\path\to\custom\tessdata"
Available test cases:
- test_multiple_languages_with_lstm: Tests the LSTM engine with multiple languages
- test_ocr_on_real_image: Tests OCR on a sample English text image
- test_multiple_languages: Tests recognition of mixed English and Turkish text
- test_digit_recognition: Tests digit-only recognition with a whitelist
- test_error_handling: Tests error cases and invalid inputs
Test images are located in the tests/test_images/ directory:
- sample_text.png: English text sample
- multilang_sample.png: Mixed English and Turkish text
- Additional test images can be added to this directory
Usage
Here's a basic example of how to use tesseract-rs:
use std::path::PathBuf;
use std::error::Error;
use tesseract_rs::TesseractAPI;
Advanced Usage
The API provides additional functionality for more complex OCR tasks, including thread-safe operations:
use tesseract_rs::TesseractAPI;
use std::sync::Arc;
use std::thread;
use std::error::Error;
// Helper function to get tessdata directory
// Helper function to load test image
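The `Arc` and `thread` imports above suggest the usual pattern of sharing one handle across worker threads. The sketch below shows that pattern only. `FakeOcr` is a stand-in defined here for illustration and is NOT the tesseract-rs `TesseractAPI`; its `recognize` method returns a label instead of performing OCR.

```rust
use std::sync::Arc;
use std::thread;

/// Stand-in for an OCR handle. Hypothetical type; NOT `TesseractAPI`.
pub struct FakeOcr {
    language: String,
}

impl FakeOcr {
    pub fn new(language: &str) -> Self {
        FakeOcr { language: language.to_string() }
    }

    /// Pretends to recognize text and returns a label instead of OCR output.
    pub fn recognize(&self, image_name: &str) -> String {
        format!("[{}] {}", self.language, image_name)
    }
}

/// Spawns one thread per image, each holding its own `Arc` clone of the
/// shared handle, then joins the threads in spawn order so the results
/// line up with the input order.
pub fn recognize_in_parallel(ocr: Arc<FakeOcr>, images: Vec<String>) -> Vec<String> {
    let handles: Vec<_> = images
        .into_iter()
        .map(|image| {
            let ocr = Arc::clone(&ocr);
            thread::spawn(move || ocr.recognize(&image))
        })
        .collect();

    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}
```

Sharing through `Arc` alone only works when the handle's methods take `&self` and the type is `Send + Sync`; whether that holds for the real API should be checked against the API documentation.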
Building
The crate will automatically download and compile Tesseract and Leptonica during the build process. This may take some time on the first build, but subsequent builds will use the cached libraries.
To clean the cache and force a rebuild:
CARGO_CLEAN=1 cargo build
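For intuition, the effect of CARGO_CLEAN can be pictured as a guard that removes the cache directory before the build continues. This is a minimal sketch under that assumption; `clean_cache_if_requested` is a hypothetical helper, not the crate's actual build code.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Removes `cache_dir` and everything in it when `cargo_clean` is set.
/// Returns `Ok(true)` if the directory was removed, and `Ok(false)` if the
/// variable was unset or the directory did not exist.
/// Hypothetical helper for illustration; not part of tesseract-rs.
pub fn clean_cache_if_requested(cargo_clean: Option<&str>, cache_dir: &Path) -> io::Result<bool> {
    if cargo_clean.is_none() || !cache_dir.exists() {
        return Ok(false);
    }
    fs::remove_dir_all(cache_dir)?;
    Ok(true)
}
```

After a clean like this, the next build has to download and compile everything again, which is why the first build afterwards is slow.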
Documentation
For more detailed information, please check the API documentation.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contribution
Contributions are welcome! Please feel free to submit a Pull Request.
Acknowledgements
This project uses Tesseract OCR and Leptonica. We are grateful to the maintainers and contributors of these projects.