rwer
A modern Rust crate for Word Error Rate (WER), Character Error Rate (CER), and related ASR evaluation metrics.
Features
- WER (Word Error Rate): (S + D + I) / N
- CER (Character Error Rate): Same formula at Unicode grapheme cluster level
- MER (Match Error Rate): (S + D + I) / (H + S + D + I)
- WIP (Word Information Preserved): (H/N) * (H/(H+S+D+I))
- WIL (Word Information Lost): 1 - WIP
- Transform pipeline for text preprocessing (lowercase, remove punctuation, etc.)
- Chinese word segmentation via jieba-rs for word-level WER (optional feature, enabled by default)
- Alignment visualization with error frequency analysis
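To make the formulas above concrete, here is a minimal sketch that evaluates them from raw alignment counts. It does not use this crate's API: the `Counts` struct and its methods are hypothetical names for illustration, and it assumes N is the reference length, i.e. N = H + S + D, with a non-empty reference.

```rust
/// Hypothetical container for alignment counts (not a type from this crate).
#[derive(Debug, Clone, Copy)]
struct Counts {
    hits: usize,          // H
    substitutions: usize, // S
    deletions: usize,     // D
    insertions: usize,    // I
}

impl Counts {
    /// N: every reference word is either a hit, a substitution, or a deletion.
    fn reference_len(&self) -> f64 {
        (self.hits + self.substitutions + self.deletions) as f64
    }

    /// S + D + I
    fn errors(&self) -> f64 {
        (self.substitutions + self.deletions + self.insertions) as f64
    }

    /// WER = (S + D + I) / N
    fn wer(&self) -> f64 {
        self.errors() / self.reference_len()
    }

    /// MER = (S + D + I) / (H + S + D + I)
    fn mer(&self) -> f64 {
        self.errors() / (self.hits as f64 + self.errors())
    }

    /// WIP = (H / N) * (H / (H + S + D + I)), as written in the feature list.
    fn wip(&self) -> f64 {
        let h = self.hits as f64;
        (h / self.reference_len()) * (h / (h + self.errors()))
    }

    /// WIL = 1 - WIP
    fn wil(&self) -> f64 {
        1.0 - self.wip()
    }
}

fn main() {
    let c = Counts { hits: 8, substitutions: 1, deletions: 1, insertions: 1 };
    println!("WER: {:.2}%", c.wer() * 100.0);
    println!("MER: {:.2}%", c.mer() * 100.0);
    println!("WIP: {:.4}", c.wip());
    println!("WIL: {:.4}", c.wil());
}
```

Note that WER can exceed 100% when there are many insertions, whereas MER, WIP, and WIL always stay within [0, 1].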
Quick Start
use ;
let reference = "the cat sat on the mat";
let hypothesis = "the cat sat on a mat";
println!;
println!;
All Metrics at Once
use ;
let output = process_words;
println!;
println!;
Output:
WER: 16.67%
MER: 16.67%
WIP: 0.7778
WIL: 0.2222
Hits: 4 Sub: 1 Del: 0 Ins: 0
REF: the cat sat
HYP: the dog sat
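For readers curious where the hit, substitution, deletion, and insertion counts come from, the following is a rough sketch of a word-level Levenshtein alignment with a backtrace. It is an illustration only and not this crate's implementation; `Counts` and `align_counts` are hypothetical names, and tokenization is assumed to be plain whitespace splitting.

```rust
/// Hypothetical result type for this sketch (not a type from this crate).
#[derive(Debug, PartialEq)]
struct Counts {
    hits: usize,
    substitutions: usize,
    deletions: usize,
    insertions: usize,
}

/// Align two whitespace-tokenized strings and count the edit operations.
fn align_counts(reference: &str, hypothesis: &str) -> Counts {
    let r: Vec<&str> = reference.split_whitespace().collect();
    let h: Vec<&str> = hypothesis.split_whitespace().collect();
    let (n, m) = (r.len(), h.len());

    // dist[i][j] = edit distance between r[..i] and h[..j].
    let mut dist = vec![vec![0usize; m + 1]; n + 1];
    for i in 0..=n {
        dist[i][0] = i;
    }
    for j in 0..=m {
        dist[0][j] = j;
    }
    for i in 1..=n {
        for j in 1..=m {
            let cost = if r[i - 1] == h[j - 1] { 0 } else { 1 };
            dist[i][j] = (dist[i - 1][j - 1] + cost)
                .min(dist[i - 1][j] + 1)
                .min(dist[i][j - 1] + 1);
        }
    }

    // Walk back from the bottom-right corner, classifying each step.
    let mut c = Counts { hits: 0, substitutions: 0, deletions: 0, insertions: 0 };
    let (mut i, mut j) = (n, m);
    while i > 0 || j > 0 {
        if i > 0 && j > 0 && r[i - 1] == h[j - 1] && dist[i][j] == dist[i - 1][j - 1] {
            c.hits += 1;
            i -= 1;
            j -= 1;
        } else if i > 0 && j > 0 && dist[i][j] == dist[i - 1][j - 1] + 1 {
            c.substitutions += 1;
            i -= 1;
            j -= 1;
        } else if i > 0 && dist[i][j] == dist[i - 1][j] + 1 {
            c.deletions += 1; // reference word missing from the hypothesis
            i -= 1;
        } else {
            c.insertions += 1; // extra word in the hypothesis
            j -= 1;
        }
    }
    c
}

fn main() {
    let c = align_counts("the cat sat on the mat", "the cat sat on a mat");
    println!(
        "Hits: {} Sub: {} Del: {} Ins: {}",
        c.hits, c.substitutions, c.deletions, c.insertions
    );
}
```

When several alignments have the same minimal cost, the order of the branches in the backtrace decides which one is reported; a real implementation may break ties differently.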
Transform Pipeline
use ;
let pipeline: = Boxnew;
let ref_text = pipeline.transform;
let hyp_text = pipeline.transform;
assert!;
Available Transforms
| Transform | Description |
|---|---|
| ToLower | Convert to lowercase |
| ToUpper | Convert to uppercase |
| Strip | Strip leading/trailing whitespace |
| RemovePunctuation | Remove Unicode punctuation |
| RemoveMultipleSpaces | Collapse consecutive spaces |
| RemoveWhitespace | Remove all whitespace |
| SubstituteWords | Replace whole words via a map |
| RemoveSpecificWords | Remove specified words |
| ExpandCommonEnglishContractions | Expand contractions (e.g., "don't" -> "do not") |
| ToSimplified | Convert Traditional Chinese to Simplified Chinese (chinese-variant feature) |
| ToTraditional | Convert Simplified Chinese to Traditional Chinese (chinese-variant feature) |
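The idea of chaining transforms can be sketched with a small trait and a composing wrapper. This is a standalone illustration, not the crate's own types: the `Transform` trait, the `Compose` struct, and `default_pipeline` below are hypothetical, only four transforms are shown, and punctuation removal is limited to ASCII so the example needs nothing beyond the standard library.

```rust
/// Hypothetical trait for this sketch; the crate's own trait may differ.
trait Transform {
    fn transform(&self, text: &str) -> String;
}

struct ToLower;
struct RemovePunctuation;
struct RemoveMultipleSpaces;
struct Strip;

impl Transform for ToLower {
    fn transform(&self, text: &str) -> String {
        text.to_lowercase()
    }
}

impl Transform for RemovePunctuation {
    // Assumption: ASCII punctuation only (the table above says Unicode).
    fn transform(&self, text: &str) -> String {
        text.chars().filter(|c| !c.is_ascii_punctuation()).collect()
    }
}

impl Transform for RemoveMultipleSpaces {
    // Collapse each run of spaces into a single space.
    fn transform(&self, text: &str) -> String {
        let mut out = String::with_capacity(text.len());
        let mut prev_space = false;
        for c in text.chars() {
            if c == ' ' {
                if !prev_space {
                    out.push(c);
                }
                prev_space = true;
            } else {
                out.push(c);
                prev_space = false;
            }
        }
        out
    }
}

impl Transform for Strip {
    fn transform(&self, text: &str) -> String {
        text.trim().to_string()
    }
}

/// Applies each step in order, feeding the output of one into the next.
struct Compose {
    steps: Vec<Box<dyn Transform>>,
}

impl Transform for Compose {
    fn transform(&self, text: &str) -> String {
        self.steps
            .iter()
            .fold(text.to_string(), |acc, step| step.transform(&acc))
    }
}

fn default_pipeline() -> Compose {
    Compose {
        steps: vec![
            Box::new(ToLower),
            Box::new(RemovePunctuation),
            Box::new(RemoveMultipleSpaces),
            Box::new(Strip),
        ],
    }
}

fn main() {
    let pipeline = default_pipeline();
    let ref_text = pipeline.transform("  Hello,   World! ");
    let hyp_text = pipeline.transform("hello world");
    assert_eq!(ref_text, hyp_text);
    println!("{ref_text}");
}
```

Order matters in such a pipeline: removing punctuation can leave double spaces behind, so collapsing spaces and stripping should come after it.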
Chinese Word-Level WER
Note: Character-level metrics (CER) work with Chinese text out of the box — no feature flag needed.
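As a rough illustration of why character-level scoring needs no segmenter, the sketch below computes an approximate CER directly over `char`s. It is not the crate's implementation: `edit_distance` and `approx_cer` are hypothetical names, and the code assumes each Unicode scalar value is one character, which matches grapheme clusters for typical Chinese text but not for combining sequences or some emoji. It also assumes a non-empty reference.

```rust
/// Levenshtein distance over any comparable tokens, using two rolling rows.
fn edit_distance<T: PartialEq>(a: &[T], b: &[T]) -> usize {
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0usize; b.len() + 1];
    for (i, x) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, y) in b.iter().enumerate() {
            let cost = if x == y { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // substitution or match
                .min(prev[j + 1] + 1)      // deletion
                .min(curr[j] + 1);         // insertion
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Approximate CER: edit distance over chars divided by reference length.
fn approx_cer(reference: &str, hypothesis: &str) -> f64 {
    let r: Vec<char> = reference.chars().collect();
    let h: Vec<char> = hypothesis.chars().collect();
    edit_distance(&r, &h) as f64 / r.len() as f64
}

fn main() {
    // One substituted character out of six reference characters.
    let cer = approx_cer("今天天气很好", "今天天汽很好");
    println!("CER: {:.2}%", cer * 100.0);
}
```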
Chinese word segmentation via jieba-rs is enabled by default. If you want to disable it:
[dependencies]
rwer = { version = "0.1", default-features = false }
use chinese_wer;
let result = chinese_wer;
println!;
You can also use the tokenizer directly:
use ChineseTokenizer;
let tokenizer = new;
let words = tokenizer.cut;
println!;
Chinese Variant Normalization
When comparing ASR outputs that may use different Chinese scripts (Traditional vs Simplified), enable the chinese-variant feature:
[dependencies]
rwer = { version = "0.1", features = ["chinese-variant"] }
use ;
// Normalize both texts to Simplified before comparison
let pipeline = new;
let ref_text = pipeline.transform;
let hyp_text = pipeline.transform;
assert_eq!;
CLI usage:
CLI
Enable the cli feature:
[dependencies]
rwer = { version = "0.1", features = ["cli"] }
# Install
# Basic WER
# CER mode
# Show alignment
# All metrics
# With normalization
Error Analysis
use ;
let output = process_words;
let errors = collect_error_counts;
println!;
println!;
println!;
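Error frequency analysis boils down to tallying which words were substituted, deleted, or inserted across an alignment. The sketch below shows that tallying step on its own, under the assumption that an alignment is available as a list of operations; the `Op` enum, `ErrorCounts` struct, and `count_errors` function are hypothetical names and do not reflect this crate's actual types.

```rust
use std::collections::HashMap;

/// Hypothetical alignment step (not a type from this crate).
#[derive(Debug)]
enum Op<'a> {
    Hit(&'a str),
    Substitution { reference: &'a str, hypothesis: &'a str },
    Deletion(&'a str),
    Insertion(&'a str),
}

/// Frequency tables for each error type.
#[derive(Debug, Default)]
struct ErrorCounts<'a> {
    substitutions: HashMap<(&'a str, &'a str), usize>,
    deletions: HashMap<&'a str, usize>,
    insertions: HashMap<&'a str, usize>,
}

/// Tally how often each substitution pair, deleted word, and inserted word occurs.
fn count_errors<'a>(ops: &[Op<'a>]) -> ErrorCounts<'a> {
    let mut counts = ErrorCounts::default();
    for op in ops {
        match op {
            Op::Hit(_) => {} // correct words are not errors
            Op::Substitution { reference, hypothesis } => {
                *counts
                    .substitutions
                    .entry((*reference, *hypothesis))
                    .or_insert(0) += 1;
            }
            Op::Deletion(word) => {
                *counts.deletions.entry(*word).or_insert(0) += 1;
            }
            Op::Insertion(word) => {
                *counts.insertions.entry(*word).or_insert(0) += 1;
            }
        }
    }
    counts
}

fn main() {
    let ops = [
        Op::Substitution { reference: "the", hypothesis: "a" },
        Op::Hit("cat"),
        Op::Hit("sat"),
        Op::Deletion("on"),
        Op::Substitution { reference: "the", hypothesis: "a" },
        Op::Hit("mat"),
        Op::Insertion("uh"),
    ];
    let errors = count_errors(&ops);
    println!("Substitutions: {:?}", errors.substitutions);
    println!("Deletions: {:?}", errors.deletions);
    println!("Insertions: {:?}", errors.insertions);
}
```

Aggregating these tables over a whole test set highlights systematic confusions, such as a function word that is repeatedly misrecognized.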
Feature Flags
| Feature | Description | Dependencies |
|---|---|---|
| chinese-word | Chinese word segmentation for word-level WER (default) | jieba-rs |
| chinese-variant | Traditional/Simplified Chinese conversion | zhconv |
| cli | CLI binary | clap, serde, serde_json |
Benchmarks
Acknowledgments
- jiwer — API design and architecture reference for WER/CER metrics
- jieba-rs — Chinese word segmentation
- zhconv — Traditional/Simplified Chinese conversion
License
Licensed under MIT.