rwer 0.1.2

A modern Rust crate for WER, CER, and related ASR evaluation metrics

rwer

A modern Rust crate for Word Error Rate (WER), Character Error Rate (CER), and related ASR evaluation metrics.

Features

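  • WER (Word Error Rate): (S + D + I) / N, where S, D, and I are the substitution, deletion, and insertion counts, H is the number of hits (correct words), and N = H + S + D is the number of reference words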
  • CER (Character Error Rate): Same formula at Unicode grapheme cluster level
  • MER (Match Error Rate): (S + D + I) / (H + S + D + I)
  • WIP (Word Information Preserved): (H/N) * (H/(H+S+D+I))
  • WIL (Word Information Lost): 1 - WIP
  • Transform pipeline for text preprocessing (lowercase, remove punctuation, etc.)
  • Chinese word segmentation via jieba-rs for word-level WER (optional feature)
  • Alignment visualization with error frequency analysis
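
The formulas above can be checked with a small standalone sketch. It uses only the standard library and is not rwer's implementation: `Counts`, `Metrics`, `align_counts`, and `metrics` are hypothetical names introduced here, and the alignment is plain Levenshtein dynamic programming with a backtrace. It assumes a non-empty reference.

```rust
/// Edit-operation counts along one optimal alignment (hypothetical type, not rwer's).
#[derive(Debug, PartialEq)]
struct Counts {
    hits: usize,
    subs: usize,
    dels: usize,
    ins: usize,
}

/// The four rates from the feature list (hypothetical type, not rwer's).
#[derive(Debug)]
struct Metrics {
    wer: f64,
    mer: f64,
    wip: f64,
    wil: f64,
}

/// Align two token sequences with Levenshtein DP, then walk back through the
/// table to count hits, substitutions, deletions, and insertions.
fn align_counts<T: PartialEq>(reference: &[T], hypothesis: &[T]) -> Counts {
    let (n, m) = (reference.len(), hypothesis.len());
    // dist[i][j] = edit distance between reference[..i] and hypothesis[..j]
    let mut dist = vec![vec![0usize; m + 1]; n + 1];
    for i in 0..=n {
        dist[i][0] = i;
    }
    for j in 0..=m {
        dist[0][j] = j;
    }
    for i in 1..=n {
        for j in 1..=m {
            let cost = if reference[i - 1] == hypothesis[j - 1] { 0 } else { 1 };
            dist[i][j] = (dist[i - 1][j - 1] + cost)
                .min(dist[i - 1][j] + 1)
                .min(dist[i][j - 1] + 1);
        }
    }
    let mut c = Counts { hits: 0, subs: 0, dels: 0, ins: 0 };
    let (mut i, mut j) = (n, m);
    while i > 0 || j > 0 {
        if i > 0 && j > 0 && reference[i - 1] == hypothesis[j - 1]
            && dist[i][j] == dist[i - 1][j - 1]
        {
            c.hits += 1;
            i -= 1;
            j -= 1;
        } else if i > 0 && j > 0 && dist[i][j] == dist[i - 1][j - 1] + 1 {
            c.subs += 1;
            i -= 1;
            j -= 1;
        } else if i > 0 && dist[i][j] == dist[i - 1][j] + 1 {
            c.dels += 1;
            i -= 1;
        } else {
            c.ins += 1;
            j -= 1;
        }
    }
    c
}

/// Apply the formulas exactly as written in the feature list.
/// Assumes the reference is non-empty (otherwise N is zero).
fn metrics(c: &Counts) -> Metrics {
    let (h, s, d, i) = (c.hits as f64, c.subs as f64, c.dels as f64, c.ins as f64);
    let n = h + s + d; // reference length
    let total = h + s + d + i;
    let wip = (h / n) * (h / total);
    Metrics {
        wer: (s + d + i) / n,
        mer: (s + d + i) / total,
        wip,
        wil: 1.0 - wip,
    }
}

fn main() {
    let reference: Vec<&str> = "the cat sat".split_whitespace().collect();
    let hypothesis: Vec<&str> = "the dog sat".split_whitespace().collect();
    let counts = align_counts(&reference, &hypothesis);
    let m = metrics(&counts);
    println!("{counts:?}");
    println!("WER {:.4}  MER {:.4}  WIP {:.4}  WIL {:.4}", m.wer, m.mer, m.wip, m.wil);
}
```

Because `align_counts` is generic over the token type, passing `Vec<char>` instead of `Vec<&str>` gives a character-level rate. That is only an approximation of CER as described above, since `char` is a Unicode scalar value rather than a grapheme cluster.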

Quick Start

use rwer::{cer, wer};

let reference = "the cat sat on the mat";
let hypothesis = "the cat sat on a mat";

println!("WER: {:.2}%", wer(reference, hypothesis) * 100.0);
println!("CER: {:.2}%", cer(reference, hypothesis) * 100.0);
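
As a hand check of the numbers this example should print: the hypothesis differs from the reference by one substituted word ("the" became "a") out of six reference words. The CER line assumes that spaces count as characters, which is an assumption about the crate's behavior rather than something stated in this README; under that assumption the reference has 22 characters and turning "the" into "a" costs three character edits.

```latex
\[
\mathrm{WER} = \frac{S + D + I}{N} = \frac{1 + 0 + 0}{6} \approx 0.1667 \quad (16.67\%)
\]
\[
\mathrm{CER} = \frac{3}{22} \approx 0.1364 \quad (13.64\%)
\qquad \text{(assuming spaces are counted as characters)}
\]
```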

All Metrics at Once

use rwer::{process_words, visualize_alignment};

let output = process_words("the cat sat", "the dog sat");
println!("{output}");
println!("{}", visualize_alignment(&output));

Output:

WER:  33.33%
MER:  33.33%
WIP:  0.4444
WIL:  0.5556
Hits: 2  Sub: 1  Del: 0  Ins: 0
REF: the cat sat
HYP: the dog sat

Transform Pipeline

use rwer::{wer, Compose, ToLower, RemovePunctuation, Transform};

let pipeline: Box<dyn Transform> = Box::new(Compose::new(vec![
    Box::new(ToLower),
    Box::new(RemovePunctuation),
]));

let ref_text = pipeline.transform("Hello, World!");
let hyp_text = pipeline.transform("hello world");
assert!(wer(&ref_text, &hyp_text) < 1e-10);
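
To make the composition idea concrete, here is a minimal standalone sketch of how a chain of text transforms can be modeled. It does not reproduce rwer's `Transform` or `Compose` definitions: `TextTransform`, `Lower`, `DropPunctuation`, `Chain`, and `default_chain` are hypothetical names, and the punctuation step handles ASCII only so that the example needs nothing beyond the standard library.

```rust
/// Hypothetical stand-in for a normalization step; not rwer's actual trait.
trait TextTransform {
    fn apply(&self, text: &str) -> String;
}

/// Lowercases the input.
struct Lower;

/// Drops ASCII punctuation only (a simplification for this sketch).
struct DropPunctuation;

impl TextTransform for Lower {
    fn apply(&self, text: &str) -> String {
        text.to_lowercase()
    }
}

impl TextTransform for DropPunctuation {
    fn apply(&self, text: &str) -> String {
        text.chars().filter(|c| !c.is_ascii_punctuation()).collect()
    }
}

/// Runs its steps left to right, feeding each output into the next step.
struct Chain {
    steps: Vec<Box<dyn TextTransform>>,
}

impl TextTransform for Chain {
    fn apply(&self, text: &str) -> String {
        self.steps
            .iter()
            .fold(text.to_string(), |acc, step| step.apply(&acc))
    }
}

/// Build the lowercase-then-strip-punctuation chain used in this sketch.
fn default_chain() -> Chain {
    let steps: Vec<Box<dyn TextTransform>> = vec![Box::new(Lower), Box::new(DropPunctuation)];
    Chain { steps }
}

fn main() {
    let chain = default_chain();
    // Both inputs normalize to the same string, so a word-level comparison sees no errors.
    println!("{}", chain.apply("Hello, World!"));
    println!("{}", chain.apply("hello world"));
}
```

Since the chain itself implements the same trait as its steps, a chain can be nested inside another chain. Step order matters in general: a step that matches specific words, for example, behaves differently before and after lowercasing.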

Available Transforms

Transform                        Description
ToLower                          Convert to lowercase
ToUpper                          Convert to uppercase
Strip                            Strip leading/trailing whitespace
RemovePunctuation                Remove Unicode punctuation
RemoveMultipleSpaces             Collapse consecutive spaces
RemoveWhitespace                 Remove all whitespace
SubstituteWords                  Replace whole words via a map
RemoveSpecificWords              Remove specified words
ExpandCommonEnglishContractions  Expand contractions (e.g., "don't" -> "do not")
ToSimplified                     Convert Traditional Chinese to Simplified Chinese (chinese-variant feature)
ToTraditional                    Convert Simplified Chinese to Traditional Chinese (chinese-variant feature)
ChineseWordSegment               Segment Chinese text into words via jieba (chinese-word feature)

Chinese Word-Level WER

Note: Character-level metrics (CER) work with Chinese text out of the box; no feature flag is needed.

Chinese word segmentation via jieba-rs (the chinese-word feature) is enabled by default. To disable it, turn off default features:

[dependencies]
rwer = { version = "0.1", default-features = false }

The recommended approach is to use ChineseWordSegment as a transform in the pipeline:

use rwer::{ChineseWordSegment, Compose, Transform, process_words, visualize_alignment};

let pipeline = Compose::new(vec![Box::new(ChineseWordSegment::new())]);

let ref_text = pipeline.transform("今天天气真好");
let hyp_text = pipeline.transform("今天天气很棒");

let output = process_words(&ref_text, &hyp_text);
println!("{output}");
println!("{}", visualize_alignment(&output));
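
The reason segmentation matters can be seen with the standard library alone. The sketch below assumes that word-level metrics split text on whitespace, which is typical of WER tooling but not confirmed here for rwer. The segmented string is hand-written for illustration; a real segmenter such as jieba may split it differently. `word_tokens` and `char_tokens` are hypothetical helpers.

```rust
/// Split on whitespace, as word-level metrics typically do (assumption).
fn word_tokens(text: &str) -> Vec<&str> {
    text.split_whitespace().collect()
}

/// Split into Unicode scalar values; for these Han characters each one is
/// also a single grapheme cluster.
fn char_tokens(text: &str) -> Vec<char> {
    text.chars().collect()
}

fn main() {
    let raw = "今天天气真好";
    // Hand-segmented for illustration only; not produced by jieba.
    let segmented = "今天 天气 真 好";

    // Unsegmented Chinese has no spaces, so it is a single "word".
    println!("raw words:       {:?}", word_tokens(raw));
    // After segmentation there are several words to align.
    println!("segmented words: {:?}", word_tokens(segmented));
    // Character-level comparison needs no segmentation.
    println!("characters:      {:?}", char_tokens(raw));
}
```

With a single token per sentence, a word-level error rate can only be all or nothing for that sentence, which is why a segmentation step is placed before the word-level comparison.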

You can combine Chinese segmentation with other transforms:

use rwer::{ChineseWordSegment, ToSimplified, Compose, Transform, process_words};

let pipeline = Compose::new(vec![
    Box::new(ToSimplified),
    Box::new(ChineseWordSegment::new()),
]);

let ref_text = pipeline.transform("今天天氣真好");
let hyp_text = pipeline.transform("今天天气很棒");
let output = process_words(&ref_text, &hyp_text);
println!("WER: {:.2}%", output.wer * 100.0);

Chinese Variant Normalization

When comparing ASR outputs that may use different Chinese scripts (Traditional vs Simplified), enable the chinese-variant feature:

[dependencies]
rwer = { version = "0.1", features = ["chinese-variant"] }

use rwer::{ToSimplified, Compose, Transform, wer};

// Normalize both texts to Simplified before comparison
let pipeline = Compose::new(vec![Box::new(ToSimplified)]);
// The same sentence in Traditional (reference) and Simplified (hypothesis) script
let ref_text = pipeline.transform("今天天氣真好");
let hyp_text = pipeline.transform("今天天气真好");
assert_eq!(wer(&ref_text, &hyp_text), 0.0);

CLI usage:

rwer -s "繁體中文測試" "简体中文测试"

CLI

Enable the cli feature:

[dependencies]
rwer = { version = "0.1", features = ["cli"] }

# Install
cargo install rwer --all-features

# Basic WER
rwer "the cat sat on the mat" "the cat sat on a mat"

# CER mode
rwer --character "hello" "helo"

# Show alignment
rwer --alignment "the cat sat" "the dog sat"

# All metrics
rwer --all "the cat sat" "the dog sat"

# With normalization
rwer --lowercase --remove-punctuation "Hello, World!" "hello world"

Error Analysis

use rwer::{collect_error_counts, process_words};

let output = process_words("the cat sat on the mat", "a cat stood on a mat");
let errors = collect_error_counts(&output);

println!("Substitutions: {:?}", errors.substitutions);
println!("Insertions: {:?}", errors.insertions);
println!("Deletions: {:?}", errors.deletions);
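
The kind of tally this produces can be sketched independently of the crate. The example below is a standalone illustration, not rwer's data model: `Op`, `ErrorTally`, `tally`, and `example_ops` are hypothetical names, and the alignment for the sentence pair above is written out by hand rather than computed.

```rust
use std::collections::HashMap;

/// One step of a word alignment (hypothetical type, not rwer's).
enum Op<'a> {
    Hit,
    Sub { reference: &'a str, hypothesis: &'a str },
    Del(&'a str),
    Ins(&'a str),
}

/// Frequency of each distinct error (hypothetical type, not rwer's).
#[derive(Default, Debug)]
struct ErrorTally {
    substitutions: HashMap<(String, String), usize>,
    insertions: HashMap<String, usize>,
    deletions: HashMap<String, usize>,
}

/// Count how often each substitution pair, inserted word, and deleted word occurs.
fn tally(ops: &[Op<'_>]) -> ErrorTally {
    let mut t = ErrorTally::default();
    for op in ops {
        match op {
            Op::Hit => {}
            Op::Sub { reference, hypothesis } => {
                let key = ((*reference).to_string(), (*hypothesis).to_string());
                *t.substitutions.entry(key).or_insert(0) += 1;
            }
            Op::Del(word) => *t.deletions.entry((*word).to_string()).or_insert(0) += 1,
            Op::Ins(word) => *t.insertions.entry((*word).to_string()).or_insert(0) += 1,
        }
    }
    t
}

/// Hand-written alignment of "the cat sat on the mat" against "a cat stood on a mat".
fn example_ops() -> Vec<Op<'static>> {
    vec![
        Op::Sub { reference: "the", hypothesis: "a" },
        Op::Hit,
        Op::Sub { reference: "sat", hypothesis: "stood" },
        Op::Hit,
        Op::Sub { reference: "the", hypothesis: "a" },
        Op::Hit,
    ]
}

fn main() {
    let t = tally(&example_ops());
    println!("Substitutions: {:?}", t.substitutions);
    println!("Insertions: {:?}", t.insertions);
    println!("Deletions: {:?}", t.deletions);
}
```

Grouping by the (reference, hypothesis) pair rather than by position is what exposes systematic confusions, such as "the" being recognized as "a" twice in this pair.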

Feature Flags

Feature          Description                                             Dependencies
chinese-word     Chinese word segmentation for word-level WER (default)  jieba-rs
chinese-variant  Traditional/Simplified Chinese conversion               zhconv
cli              CLI binary                                              clap, serde, serde_json

Benchmarks

cargo bench

Acknowledgments

  • jiwer — API design and architecture reference for WER/CER metrics
  • jieba-rs — Chinese word segmentation
  • zhconv — Traditional/Simplified Chinese conversion

License

Licensed under MIT.