uroman-rs
A blazingly fast, self-contained Rust reimplementation of the uroman universal romanizer.
Overview
uroman-rs is a complete rewrite of the original uroman (Universal Romanizer) in Rust. It provides high-speed, accurate romanization for a vast number of languages and writing systems, faithfully reproducing the behavior of the original implementation.
This project is licensed under the Apache License 2.0. As a reimplementation, it respects and includes the license of the original uroman software. For full details, please refer to the License section below.
✨ Features
- Blazing Fast Performance: Approximately 22 times faster than the standard Python version, making it ideal for large-scale data processing (see Benchmark).
- 📦 Self-Contained: A pure Rust implementation with no dependency on external runtimes like Python or Perl. It compiles to a single, portable binary.
- 🎯 High Fidelity: A true drop-in replacement for the original uroman, passing its comprehensive test suite.
- 🧰 Rich Output Formats: Supports multiple output formats, including simple strings (str) and structured JSON data with character offsets (edges), alternatives (alts), and all lattice paths (lattice).
- 🔧 Versatile: Use it as a standalone command-line interface (CLI) tool or as a library in your own Rust applications.
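To make the idea of edges, alternatives, and lattice paths more concrete, here is a minimal, hypothetical Rust sketch. The Edge struct and the lattice_paths function are illustrative names, not uroman's actual types or JSON schema. The sketch assumes that each edge covers a character range [start, end) of the input, that an alternative romanization is simply a second edge over the same span, and that a lattice path is any chain of edges covering the whole input without gaps.

```rust
/// Hypothetical edge type (NOT uroman's real schema): the romanization
/// `text` of the input characters in the half-open range [start, end).
#[derive(Debug)]
struct Edge {
    start: usize,
    end: usize,
    text: String,
}

impl Edge {
    fn new(start: usize, end: usize, text: &str) -> Self {
        Edge { start, end, text: text.to_string() }
    }
}

/// Enumerate every path through the lattice that covers character
/// positions 0..len without gaps, returning each path's concatenated
/// romanization in sorted order.
fn lattice_paths(edges: &[Edge], len: usize) -> Vec<String> {
    let mut results = Vec::new();
    // Depth-first search over (current position, romanization so far).
    let mut stack = vec![(0usize, String::new())];
    while let Some((pos, acc)) = stack.pop() {
        if pos == len {
            results.push(acc);
            continue;
        }
        // Follow every edge starting here; `end > start` guarantees progress.
        for e in edges.iter().filter(|e| e.start == pos && e.end > e.start) {
            let mut next = acc.clone();
            next.push_str(&e.text);
            stack.push((e.end, next));
        }
    }
    results.sort();
    results
}

fn main() {
    // A two-character input: two alternatives for the first character,
    // one edge for the second, and one edge spanning both characters.
    let edges = vec![
        Edge::new(0, 1, "x"),
        Edge::new(0, 1, "y"), // alternative for the same span
        Edge::new(1, 2, "z"),
        Edge::new(0, 2, "w"), // multi-character edge
    ];
    for path in lattice_paths(&edges, 2) {
        println!("{path}");
    }
}
```

Running main prints w, xz, and yz, one per line: each alternative for the first character combines with z, and the spanning edge forms a complete path on its own.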
A Note on Romanization Logic and Limitations
uroman-rs is a high-fidelity reimplementation of the original uroman and passes its comprehensive test suite. This means its romanization logic, including its strengths and limitations, is identical to the original implementation created by NLP researchers.
The original authors provide excellent documentation on the specific behaviors of the romanizer. To use uroman-rs effectively, we recommend reviewing these details, especially concerning:
- Reversibility: Details on whether the romanization process can be reliably reversed.
- Known Limitations: Important information about cases where uroman may not perform as expected.
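One general reason a romanization may not be reversible is that it can map different inputs to the same output. The toy Rust sketch below illustrates this; its table, romanize, and is_injective are hypothetical and unrelated to uroman's actual data or code. It assumes a simple character-by-character mapping in which Cyrillic and Greek letters romanize to the same Latin letters, so the original script cannot be recovered from the output alone.

```rust
use std::collections::{HashMap, HashSet};

/// Toy character table (hypothetical, NOT uroman's data): Cyrillic and
/// Greek letters that romanize to the same Latin letters.
fn toy_table() -> HashMap<char, &'static str> {
    HashMap::from([('к', "k"), ('а', "a"), ('κ', "k"), ('α', "a")])
}

/// Romanize character by character; unknown characters pass through.
fn romanize(input: &str, table: &HashMap<char, &'static str>) -> String {
    let mut out = String::new();
    for c in input.chars() {
        match table.get(&c) {
            Some(latin) => out.push_str(latin),
            None => out.push(c),
        }
    }
    out
}

/// A table can only be inverted unambiguously if no two source characters
/// share the same romanization (i.e. the mapping is injective).
fn is_injective(table: &HashMap<char, &'static str>) -> bool {
    let mut seen = HashSet::new();
    table.values().all(|v| seen.insert(*v))
}

fn main() {
    let table = toy_table();
    let cyrillic = romanize("ка", &table);
    let greek = romanize("κα", &table);
    println!("{cyrillic} {greek} injective={}", is_injective(&table));
}
```

Both inputs become ka, and is_injective reports false, so a reverse lookup on ka has two equally valid answers.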
📦 Installation
The uroman-rs project is available as a crate named uroman. You can use it both as a command-line tool and as a library in your Rust projects.
As a Command-Line Tool
To install the uroman-rs command-line tool, run the following:
This will install the executable as uroman-rs on your system.
As a Library
To use uroman as a library, add it to your project's dependencies.
Usage
Command-Line Interface (CLI)
uroman-rs can be used directly from your terminal.
Show sample conversions: See examples of how various scripts are romanized.
View all options: Display the help message for a full list of commands and flags.
Use in REPL: Run uroman-rs without any arguments to process input line by line. Press Ctrl+D to exit.
>> こんにちは、世界！
>> αΊα¨ααα
>> ()
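The REPL behavior described above (read a line, romanize it, print the result, stop at end of input) is a standard read-eval-print loop. The following is a minimal sketch of such a loop, not uroman-rs's actual CLI code: run_repl is a hypothetical name, and the uppercase closure is only a placeholder where a real romanization call would go.

```rust
use std::io::{self, BufRead, Write};

/// Prompt with ">> ", read one line, write `romanize(line)`, and repeat
/// until the reader reports end of input (Ctrl+D on a Unix terminal).
fn run_repl<R: BufRead, W: Write>(
    mut input: R,
    mut output: W,
    romanize: impl Fn(&str) -> String,
) -> io::Result<()> {
    let mut line = String::new();
    loop {
        write!(output, ">> ")?;
        output.flush()?;
        line.clear();
        // read_line returns Ok(0) at end of input.
        if input.read_line(&mut line)? == 0 {
            break;
        }
        // Strip the trailing newline (and carriage return on Windows input).
        let text = line.trim_end_matches(|c: char| c == '\n' || c == '\r');
        writeln!(output, "{}", romanize(text))?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    // Placeholder transformation standing in for real romanization.
    run_repl(stdin.lock(), io::stdout(), |s: &str| s.to_uppercase())
}
```

Taking any BufRead and Write instead of hard-coding stdin and stdout keeps the loop testable with in-memory buffers.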
As a Library
Here is a basic example to get you started.
// `Uroman::new()` is an infallible operation.
// It doesn't return a `Result`, so no error handling is needed.
let uroman = Uroman::new();
let romanized_string/*: String*/ = uroman..to_output_string;
assert_eq!;
println!;
For more advanced use cases, including file processing and generating detailed JSON output, please see the code in the examples/ directory.
Benchmark
uroman-rs offers a dramatic performance improvement over the standard Python implementation. To provide a fair and robust comparison, we used the hyperfine benchmarking tool to measure the total execution time of each implementation when romanizing the same test file.
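For readers unfamiliar with the "mean ± σ" notation in the results, the sketch below computes the same two summary statistics over repeated timed runs. It is a simplified, in-process analogue, not hyperfine itself: hyperfine times whole process invocations and has its own options for warmup and run counts. The σ here is assumed to be the sample standard deviation, and time_runs and mean_sd are hypothetical helper names.

```rust
use std::time::Instant;

/// Time `runs` executions of `f`, returning each wall-clock duration in ms.
fn time_runs<F: FnMut()>(runs: usize, mut f: F) -> Vec<f64> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed().as_secs_f64() * 1000.0
        })
        .collect()
}

/// Mean and sample standard deviation (n - 1 denominator) of the timings.
fn mean_sd(samples: &[f64]) -> (f64, f64) {
    assert!(samples.len() >= 2, "need at least two samples");
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var.sqrt())
}

fn main() {
    let timings = time_runs(10, || {
        // Hypothetical workload standing in for one romanization run.
        let s: String = "abc".repeat(10_000);
        assert_eq!(s.len(), 30_000);
    });
    let (mean, sd) = mean_sd(&timings);
    println!("{mean:.3} ms ± {sd:.3} ms");
}
```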
Test Environment
- CPU: Intel(R) Core(TM) i7-14700
- OS: WSL2 Ubuntu 24.04
- Tool: hyperfine v1.18.0
- Test File: multi-script.txt from the original uroman repository
Results
| Implementation | Mean Time (± σ) | Performance |
|---|---|---|
| uroman-rs (This project) | 99.3 ms ± 3.6 ms | ~22x Faster |
| uroman.py (via uv run) | 2180 ms ± 26 ms | Baseline |
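The ~22x figure is the ratio of the two mean times, 2180 ms / 99.3 ms ≈ 21.95. The sketch below reproduces that arithmetic and, assuming the two measurements are independent, propagates their standard deviations with the usual first-order rule for a quotient, (σ_r/r)² ≈ (σ_a/a)² + (σ_b/b)². The speedup function is a hypothetical helper, not part of either project.

```rust
/// Ratio of two independent mean timings and its first-order propagated
/// standard deviation: (σ_r / r)^2 ≈ (σ_a / a)^2 + (σ_b / b)^2.
fn speedup(baseline_ms: f64, baseline_sd: f64, fast_ms: f64, fast_sd: f64) -> (f64, f64) {
    let r = baseline_ms / fast_ms;
    let rel = ((baseline_sd / baseline_ms).powi(2) + (fast_sd / fast_ms).powi(2)).sqrt();
    (r, r * rel)
}

fn main() {
    // Mean ± σ values from the Results table.
    let (r, sd) = speedup(2180.0, 26.0, 99.3, 3.6);
    println!("speedup ≈ {:.1} ± {:.1}", r, sd);
}
```

With the table's values it prints speedup ≈ 22.0 ± 0.8, consistent with the ~22x reported.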
License
This project is licensed under the Apache License, Version 2.0.
Acknowledgements
uroman-rs is a Rust implementation of the original uroman software by Ulf Hermjakob. As such, it is a derivative work and includes the original license notice in the NOTICE file.
Please be aware that any academic publication of projects using uroman-rs should acknowledge the use of the original uroman software as specified in its license. For details, please see the NOTICE file.