surya 0.1.0

Surya is a multilingual document OCR toolkit, original implementation in Python and PyTorch

surya-rs

Rust implementation of surya, a multilingual document OCR toolkit. The implementation is based on a modified version of Segformer.

How to build and install

Set up the Rust toolchain if you haven't already:

# visit https://rustup.rs/ for more detailed information
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Build and install the binary:

# run this on a mac with M1/2/3 chip (enables Metal acceleration)
cargo install --path . --features=cli,metal --bin surya
# run this on any other machine
cargo install --path . --features=cli --bin surya

The built binary does not bundle the weights file; it downloads the weights via the HuggingFace Hub API instead. Once downloaded, the weights file is cached in the HuggingFace cache directory.
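
To check where the cached weights end up, the sketch below computes the expected cache folder for a model repo. It assumes the default Hugging Face hub cache layout ($HF_HOME/hub if HF_HOME is set, otherwise ~/.cache/huggingface/hub, with one models--<org>--<name> folder per repo); whether surya uses exactly this layout is an assumption, and the helper names hub_cache_root, model_cache_dir and default_model_cache_dir are hypothetical, not part of this crate.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Hub cache root under the assumed Hugging Face convention:
/// `$HF_HOME/hub` when HF_HOME is set, else `<home>/.cache/huggingface/hub`.
fn hub_cache_root(hf_home: Option<&str>, home_dir: &str) -> PathBuf {
    let base = match hf_home {
        Some(dir) => PathBuf::from(dir),
        None => PathBuf::from(home_dir).join(".cache").join("huggingface"),
    };
    base.join("hub")
}

/// Folder for one model repo: the `/` in the repo id becomes `--`
/// and the name is prefixed with `models--` (assumed layout).
fn model_cache_dir(root: &Path, repo_id: &str) -> PathBuf {
    root.join(format!("models--{}", repo_id.replace('/', "--")))
}

/// Convenience wrapper that reads the real environment.
/// Returns None when no home directory variable is set.
fn default_model_cache_dir(repo_id: &str) -> Option<PathBuf> {
    let hf_home = env::var("HF_HOME").ok();
    let home = env::var("HOME")
        .or_else(|_| env::var("USERPROFILE"))
        .ok()?;
    let root = hub_cache_root(hf_home.as_deref(), &home);
    Some(model_cache_dir(&root, repo_id))
}
```

Calling default_model_cache_dir("vikp/line_detector") gives the folder to inspect or delete when you want to force a fresh download of the default model.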

Run surya --help (or -h) to see the available options:

❯ surya --help
Surya is a multilingual document OCR toolkit, original implementation in Python and PyTorch

Usage: surya [OPTIONS] --image <IMAGE>

Options:
      --image <IMAGE>                path to image
      --model-repo <MODEL_REPO>      model's hugging face repo [default: vikp/line_detector]
      --weights-name <WEIGHTS_NAME>  model's weights name [default: model.safetensors]
      --device-type <DEVICE_TYPE>    [default: cpu] [possible values: cpu, gpu, metal]
  -h, --help                         Print help
  -V, --version                      Print version

Set the RUST_LOG environment variable to control the logging level:

export RUST_LOG=info # or debug, warn, etc.
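
If you drive the binary from another Rust program, the same options and logging variable can be set through std::process::Command. This is a minimal sketch that assumes surya is already installed and on PATH; it uses only the --image and --device-type flags from the help output above, and the helper name surya_command is hypothetical.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) a `surya` invocation for one image.
/// `device_type` should be one of the values listed in the help
/// output (cpu, gpu, metal); `log_level` is passed through RUST_LOG.
fn surya_command(image: &Path, device_type: &str, log_level: &str) -> Command {
    let mut cmd = Command::new("surya");
    cmd.arg("--image")
        .arg(image)
        .arg("--device-type")
        .arg(device_type)
        // Equivalent to `export RUST_LOG=<level>` for this child process only.
        .env("RUST_LOG", log_level);
    cmd
}
```

Calling .status() or .output() on the returned Command actually runs the binary; setting the variable per process this way avoids changing the logging level of the parent program.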

Library

This library is also published as a crate for other Rust projects to use.