rwer

A modern Rust crate for Word Error Rate (WER), Character Error Rate (CER), and related ASR evaluation metrics.

Features

WER (Word Error Rate): (S + D + I) / N
CER (Character Error Rate): Same formula at Unicode grapheme cluster level
MER (Match Error Rate): (S + D + I) / (H + S + D + I)
WIP (Word Information Preserved): (H/N) * (H/(H+S+D+I))
WIL (Word Information Lost): 1 - WIP
Transform pipeline for text preprocessing (lowercase, remove punctuation, etc.)
Chinese word segmentation via jieba-rs for word-level WER (optional feature)
Alignment visualization with error frequency analysis

Quick Start

use rwer::{cer, wer};

let reference = "the cat sat on the mat";
let hypothesis = "the cat sat on a mat";

println!("WER: {:.2}%", wer(reference, hypothesis) * 100.0);
println!("CER: {:.2}%", cer(reference, hypothesis) * 100.0);

All Metrics at Once

use rwer::{process_words, visualize_alignment};

let output = process_words("the cat sat", "the dog sat");
println!("{output}");
println!("{}", visualize_alignment(&output));

Output:

WER:  16.67%
MER:  16.67%
WIP:  0.7778
WIL:  0.2222
Hits: 4  Sub: 1  Del: 0  Ins: 0
REF: the cat sat
HYP: the dog sat

Transform Pipeline

use rwer::{wer, Compose, ToLower, RemovePunctuation, Transform};

let pipeline: Box<dyn Transform> = Box::new(Compose::new(vec![
    Box::new(ToLower),
    Box::new(RemovePunctuation),
]));

let ref_text = pipeline.transform("Hello, World!");
let hyp_text = pipeline.transform("hello world");
assert!(wer(&ref_text, &hyp_text) < 1e-10);

Available Transforms

Transform	Description
`ToLower`	Convert to lowercase
`ToUpper`	Convert to uppercase
`Strip`	Strip leading/trailing whitespace
`RemovePunctuation`	Remove Unicode punctuation
`RemoveMultipleSpaces`	Collapse consecutive spaces
`RemoveWhitespace`	Remove all whitespace
`SubstituteWords`	Replace whole words via a map
`RemoveSpecificWords`	Remove specified words
`ExpandCommonEnglishContractions`	Expand contractions (e.g., "don't" -> "do not")
`ToSimplified`	Convert Traditional Chinese to Simplified Chinese (`chinese-variant` feature)
`ToTraditional`	Convert Simplified Chinese to Traditional Chinese (`chinese-variant` feature)

Chinese Word-Level WER

Note: Character-level metrics (CER) work with Chinese text out of the box — no feature flag needed.

Chinese word segmentation via jieba-rs is enabled by default. If you want to disable it:

[dependencies]
rwer = { version = "0.1", default-features = false }

use rwer::chinese_wer;

let result = chinese_wer("今天天气真好", "今天天气很棒");
println!("Chinese WER: {:.2}%", result * 100.0);

You can also use the tokenizer directly:

use rwer::ChineseTokenizer;

let tokenizer = ChineseTokenizer::new();
let words = tokenizer.cut("我们中出了一个叛徒");
println!("{:?}", words);

Chinese Variant Normalization

When comparing ASR outputs that may use different Chinese scripts (Traditional vs Simplified), enable the chinese-variant feature:

[dependencies]
rwer = { version = "0.1", features = ["chinese-variant"] }

use rwer::{ToSimplified, Compose, Transform, wer};

// Normalize both texts to Simplified before comparison
let pipeline = Compose::new(vec![Box::new(ToSimplified)]);
let ref_text = pipeline.transform("繁體中文");
let hyp_text = pipeline.transform("简体中文");
assert_eq!(wer(&ref_text, &hyp_text), 0.0);

CLI usage:

rwer -s "繁體中文測試" "简体中文测试"

CLI

Enable the cli feature:

[dependencies]
rwer = { version = "0.1", features = ["cli"] }

# Install
cargo install rwer --all-features

# Basic WER
rwer "the cat sat on the mat" "the cat sat on a mat"

# CER mode
rwer --character "hello" "helo"

# Show alignment
rwer --alignment "the cat sat" "the dog sat"

# All metrics
rwer --all "the cat sat" "the dog sat"

# With normalization
rwer --lowercase --remove-punctuation "Hello, World!" "hello world"

Error Analysis

use rwer::{collect_error_counts, process_words};

let output = process_words("the cat sat on the mat", "a cat stood on a mat");
let errors = collect_error_counts(&output);

println!("Substitutions: {:?}", errors.substitutions);
println!("Insertions: {:?}", errors.insertions);
println!("Deletions: {:?}", errors.deletions);

Feature Flags

Feature	Description	Dependencies
`chinese-word`	Chinese word segmentation for word-level WER (default)	`jieba-rs`
`chinese-variant`	Traditional/Simplified Chinese conversion	`zhconv`
`cli`	CLI binary	`clap`, `serde`, `serde_json`

Benchmarks

cargo bench

Acknowledgments

jiwer — API design and architecture reference for WER/CER metrics
jieba-rs — Chinese word segmentation
zhconv — Traditional/Simplified Chinese conversion

License

Licensed under MIT.

rwer 0.1.1