text-processing-rs 0.1.0

Repository: FluidInference/text-processing-rs
Owner: Alex-Wengg

text-processing-rs

Inverse Text Normalization (ITN) for Rust — convert spoken-form ASR output to written form.

What it does

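It converts spoken-form output from automatic speech recognition (ASR) into written form, covering numbers, currency, dates, times, measurements, and more. For example: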

Input (spoken form)                 Output (written form)
two hundred thirty two              232
five dollars and fifty cents        $5.50
january fifth twenty twenty five    January 5, 2025
quarter past two pm                 02:15 p.m.
one point five billion dollars      $1.5 billion
seventy two degrees fahrenheit      72 °F
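To make the first row concrete, here is a minimal sketch of how cardinal number words can be folded into an integer. It is a toy that assumes lowercase, whitespace-separated input and a fixed English vocabulary. `parse_cardinal` is a hypothetical helper for illustration only. It is not part of the text-processing-rs API, and the crate itself is based on NeMo's grammars rather than on a loop like this.

```rust
/// Hypothetical toy parser: folds English cardinal words such as
/// "two hundred thirty two" into an integer. Returns None for empty
/// input or for any word outside its small vocabulary.
fn parse_cardinal(text: &str) -> Option<u64> {
    let mut total: u64 = 0; // completed groups (thousands, millions, ...)
    let mut current: u64 = 0; // group under construction, 0..=999
    let mut seen_word = false;

    for word in text.split_whitespace() {
        let unit: Option<u64> = match word {
            "zero" => Some(0), "one" => Some(1), "two" => Some(2), "three" => Some(3),
            "four" => Some(4), "five" => Some(5), "six" => Some(6), "seven" => Some(7),
            "eight" => Some(8), "nine" => Some(9), "ten" => Some(10), "eleven" => Some(11),
            "twelve" => Some(12), "thirteen" => Some(13), "fourteen" => Some(14),
            "fifteen" => Some(15), "sixteen" => Some(16), "seventeen" => Some(17),
            "eighteen" => Some(18), "nineteen" => Some(19), "twenty" => Some(20),
            "thirty" => Some(30), "forty" => Some(40), "fifty" => Some(50),
            "sixty" => Some(60), "seventy" => Some(70), "eighty" => Some(80),
            "ninety" => Some(90),
            _ => None,
        };
        if let Some(n) = unit {
            current += n; // "thirty" then "two" -> 32
        } else if word == "hundred" {
            current *= 100; // "two hundred" -> 200
        } else {
            let scale: u64 = match word {
                "thousand" => 1_000,
                "million" => 1_000_000,
                "billion" => 1_000_000_000,
                _ => return None, // not a cardinal word
            };
            total += current * scale; // close the current group
            current = 0;
        }
        seen_word = true;
    }
    if seen_word { Some(total + current) } else { None }
}
```

This sketch shows why dates need their own rules. A year read aloud as "twenty twenty five" is summed to 45 here, not 2025, so a plain cardinal pass cannot handle the date row in the table above.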

Usage

Rust

use text_processing_rs::normalize;

fn main() {
    // Cardinal number: spoken words -> digits
    let result = normalize("two hundred");
    assert_eq!(result, "200");

    // Currency: dollars and cents -> "$X.YY"
    let result = normalize("five dollars and fifty cents");
    assert_eq!(result, "$5.50");
}

Swift

import NemoTextProcessing

let result = NemoTextProcessing.normalize("two hundred")
// result is "200"

let money = NemoTextProcessing.normalize("five dollars and fifty cents")
// money is "$5.50"

Compatibility

The crate passes 1200 of the 1217 tests (98.6%) in the original NeMo test suite. Pass rates by category:

Category                  Pass rate
Cardinal numbers          100%
Ordinal numbers           100%
Decimal numbers           100%
Money                     100%
Measurements              100%
Dates                     100%
Time                      97%
Electronic (email/URL)    96%
Telephone/IP              96%
Whitelist terms           100%

Features

  • Cardinal and ordinal number conversion
  • Decimal numbers with scale words (million, billion)
  • Currency formatting (USD, with scale words)
  • Measurements including temperature (°C, °F, K) and data rates (gbps)
  • Date parsing (multiple formats)
  • Time parsing with AM/PM and timezone preservation
  • Email and URL normalization
  • Phone numbers, IP addresses, SSN
  • Case preservation for proper nouns and abbreviations
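Time parsing matches phrase patterns, not bare numbers. The sketch below reproduces the "quarter past two pm" example from the table at the top. It handles only the "quarter past" and "half past" forms, the hour words one through twelve, and an optional trailing "am" or "pm". `normalize_time` is a hypothetical helper for illustration, not the crate's API or its grammar.

```rust
/// Hypothetical toy time normalizer for phrases such as
/// "quarter past two pm". Returns None for any other shape.
fn normalize_time(text: &str) -> Option<String> {
    const HOURS: [&str; 12] = [
        "one", "two", "three", "four", "five", "six",
        "seven", "eight", "nine", "ten", "eleven", "twelve",
    ];
    let words: Vec<&str> = text.split_whitespace().collect();

    // Split off an optional am/pm suffix and map it to the written form.
    let (body, suffix) = match words.last() {
        Some(&"am") => (&words[..words.len() - 1], Some("a.m.")),
        Some(&"pm") => (&words[..words.len() - 1], Some("p.m.")),
        _ => (&words[..], None),
    };

    // Expected shape: <offset word> <offset word> <hour word>
    let (offset, hour_word) = match body {
        [a, b, h] => (format!("{a} {b}"), *h),
        _ => return None,
    };
    let hour = HOURS.iter().position(|w| *w == hour_word)? as u32 + 1;
    let minute = match offset.as_str() {
        "quarter past" => 15,
        "half past" => 30,
        _ => return None,
    };

    // Zero-pad to HH:MM and re-attach the suffix if present.
    let time = format!("{hour:02}:{minute:02}");
    Some(match suffix {
        Some(s) => format!("{time} {s}"),
        None => time,
    })
}
```

A complete grammar also has to cover "quarter to", spoken minutes, "o'clock", and time zones. This sketch returns None for all of those.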

Building

Rust

cargo build
cargo test

Swift (XCFramework)

# Install Rust targets
rustup target add aarch64-apple-darwin x86_64-apple-darwin
rustup target add aarch64-apple-ios aarch64-apple-ios-sim

# Build XCFramework
./build-xcframework.sh

Output:

  • output/NemoTextProcessing.xcframework - Add to Xcode project
  • output/NemoTextProcessing.swift - Swift wrapper

License

Apache 2.0

Acknowledgments

This project is a Rust implementation based on the inverse text normalization grammars from NVIDIA NeMo Text Processing. All credit for the original algorithms and test cases goes to the NVIDIA NeMo team.