# text-processing-rs
Inverse Text Normalization (ITN) for Rust — convert spoken-form ASR output to written form.
## What it does
Example inputs (spoken form) and the written form they normalize to:
| Spoken form (input) | Written form (output) |
|---|---|
| two hundred thirty two | 232 |
| five dollars and fifty cents | $5.50 |
| january fifth twenty twenty five | January 5, 2025 |
| quarter past two pm | 02:15 p.m. |
| one point five billion dollars | $1.5 billion |
| seventy two degrees fahrenheit | 72 °F |
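
To give a feel for what the first row involves, here is a minimal, self-contained sketch of cardinal parsing. It is an illustration only and is not this crate's implementation, which follows the NeMo grammars; the names `parse_cardinal`, `small_number`, and `scale` are hypothetical. It handles plain cardinals only, so year-style input such as "twenty twenty five" is out of scope.

```rust
/// Value of a number word below one hundred, or None if unrecognized.
/// (Hypothetical helper for this sketch.)
fn small_number(word: &str) -> Option<u64> {
    let value = match word {
        "zero" => 0, "one" => 1, "two" => 2, "three" => 3, "four" => 4,
        "five" => 5, "six" => 6, "seven" => 7, "eight" => 8, "nine" => 9,
        "ten" => 10, "eleven" => 11, "twelve" => 12, "thirteen" => 13,
        "fourteen" => 14, "fifteen" => 15, "sixteen" => 16,
        "seventeen" => 17, "eighteen" => 18, "nineteen" => 19,
        "twenty" => 20, "thirty" => 30, "forty" => 40, "fifty" => 50,
        "sixty" => 60, "seventy" => 70, "eighty" => 80, "ninety" => 90,
        _ => return None,
    };
    Some(value)
}

/// Multiplier for a scale word that closes a three-digit group.
fn scale(word: &str) -> Option<u64> {
    match word {
        "thousand" => Some(1_000),
        "million" => Some(1_000_000),
        "billion" => Some(1_000_000_000),
        _ => None,
    }
}

/// Parse a spoken-form English cardinal such as "two hundred thirty two".
/// Returns None on an unknown word, empty input, or arithmetic overflow.
fn parse_cardinal(text: &str) -> Option<u64> {
    let mut total: u64 = 0; // groups already closed by a scale word
    let mut current: u64 = 0; // group currently being built
    let mut seen_word = false;

    for raw in text.split_whitespace() {
        let word = raw.to_ascii_lowercase();
        if word == "and" {
            continue; // filler, as in "one thousand and five"
        }
        if word == "hundred" {
            // A bare "hundred" counts as "one hundred".
            current = current.max(1).checked_mul(100)?;
        } else if let Some(multiplier) = scale(&word) {
            let group = current.max(1).checked_mul(multiplier)?;
            total = total.checked_add(group)?;
            current = 0;
        } else {
            current = current.checked_add(small_number(&word)?)?;
        }
        seen_word = true;
    }

    if seen_word {
        total.checked_add(current)
    } else {
        None
    }
}
```

With this sketch, `parse_cardinal("two hundred thirty two")` returns `Some(232)`, and any unrecognized word makes the whole parse return `None`.
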
## Usage
### Rust
```rust
use text_processing_rs::normalize;
let result = normalize("two hundred");
assert_eq!(result, "200");
let result = normalize("five dollars and fifty cents");
assert_eq!(result, "$5.50");
```
### Swift
```swift
import NemoTextProcessing
let result = NemoTextProcessing.normalize("two hundred")
// result is "200"
let money = NemoTextProcessing.normalize("five dollars and fifty cents")
// money is "$5.50"
```
## Compatibility
**98.6% compatible** with the original NeMo test suite (1200/1217 tests passing).
| Category | Tests passing |
|---|---|
| Cardinal numbers | 100% |
| Ordinal numbers | 100% |
| Decimal numbers | 100% |
| Money | 100% |
| Measurements | 100% |
| Dates | 100% |
| Time | 97% |
| Electronic (email/URL) | 96% |
| Telephone/IP | 96% |
| Whitelist terms | 100% |
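
A pass rate like the ones above can be computed by running a normalizer over `(spoken, expected)` pairs. The following is a rough sketch under that assumption; `count_passing` and `pass_rate_percent` are hypothetical names, not part of this crate or of the NeMo test suite, and the normalizer is passed in as a closure so the snippet stays self-contained.

```rust
/// Run `normalize` over (spoken, expected) pairs and return (passed, total).
/// Hypothetical helper for illustration only.
fn count_passing<F>(cases: &[(&str, &str)], normalize: F) -> (usize, usize)
where
    F: Fn(&str) -> String,
{
    let passed = cases
        .iter()
        .filter(|&&(input, expected)| normalize(input) == expected)
        .count();
    (passed, cases.len())
}

/// Pass rate as a percentage; an empty suite is reported as 0.0.
fn pass_rate_percent(passed: usize, total: usize) -> f64 {
    if total == 0 {
        return 0.0;
    }
    100.0 * passed as f64 / total as f64
}
```

For the totals quoted above, `pass_rate_percent(1200, 1217)` is about 98.6. In a real run, the closure would wrap the crate's `normalize` function, for example `|s| normalize(s)`.
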
## Features
- Cardinal and ordinal number conversion
- Decimal numbers with scale words (million, billion)
- Currency formatting (USD, with scale words)
- Measurements including temperature (°C, °F, K) and data rates (gbps)
- Date parsing (multiple formats)
- Time parsing with AM/PM and timezone preservation
- Email and URL normalization
- Phone numbers, IP addresses, and Social Security numbers (SSN)
- Case preservation for proper nouns and abbreviations
## Building
### Rust
```bash
cargo build
cargo test
```
### Swift (XCFramework)
```bash
# Install Rust targets
rustup target add aarch64-apple-darwin x86_64-apple-darwin
rustup target add aarch64-apple-ios aarch64-apple-ios-sim
# Build XCFramework
./build-xcframework.sh
```
Output:
- `output/NemoTextProcessing.xcframework` - Add to Xcode project
- `output/NemoTextProcessing.swift` - Swift wrapper
## License
Apache 2.0
## Acknowledgments
This project is a Rust implementation based on the inverse text normalization grammars from [NVIDIA NeMo Text Processing](https://github.com/NVIDIA/NeMo-text-processing). All credit for the original algorithms and test cases goes to the NVIDIA NeMo team.