piper-phoneme-streaming
piper-phoneme-streaming is a high-performance Rust library for streaming Text-to-Phoneme (G2P) conversion. It is designed to integrate with streaming Text-to-Speech (TTS) engines such as Piper and other espeak-ng-based engines.
What Problems Does It Solve?
Typical G2P (Grapheme-to-Phoneme) approaches wait for a full sentence or paragraph before converting text to phonemes. In real-time or streaming TTS applications, this introduces unacceptable latency.
piper-phoneme-streaming addresses this by:
- Streaming natively: Processing text character-by-character and yielding phonemes as soon as there is enough context (e.g., at word boundaries).
- Dynamic Language Detection: Seamlessly handling mixed-language input on the fly. It can automatically detect language boundaries (e.g., mixing English and Vietnamese) and switch phonemization strategies mid-sentence without interrupting the stream.
- Accurate Text Normalization: Built-in strategies to expand abbreviations, dates, numbers, and acronyms sequentially before phonemization.
- espeak-ng Parity: Interprets espeak-ng's binary phoneme table and dictionary formats directly to ensure that the generated phonemes match exactly what Piper and other espeak-ng-based models expect.
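The word-boundary streaming described above can be sketched as a small push buffer: characters arrive in fragments of any size, and a word is released only once a following boundary proves it is complete. `WordChunker`, `push`, and `finish` are hypothetical names for illustration, not the library's API, and the sketch drops punctuation for brevity, whereas a real G2P front end keeps it for intonation.

```rust
/// Hypothetical word-boundary buffer (illustrative only, not the library's
/// API): characters are pushed as they arrive, and a word is released only
/// once a following boundary proves it is complete.
#[derive(Default)]
pub struct WordChunker {
    pending: String,
}

impl WordChunker {
    /// Pushes a text fragment of any size and returns the words it completed.
    /// Punctuation is treated as a boundary and dropped for brevity;
    /// apostrophes stay inside words so "don't" is kept whole.
    pub fn push(&mut self, fragment: &str) -> Vec<String> {
        let mut ready = Vec::new();
        for ch in fragment.chars() {
            let is_boundary =
                ch.is_whitespace() || (ch.is_ascii_punctuation() && ch != '\'');
            if is_boundary {
                // Close the current word, if there is one.
                if !self.pending.is_empty() {
                    ready.push(std::mem::take(&mut self.pending));
                }
            } else {
                self.pending.push(ch);
            }
        }
        ready
    }

    /// Flushes the last word at end of input, where no boundary follows it.
    pub fn finish(&mut self) -> Option<String> {
        if self.pending.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.pending))
        }
    }
}
```

Each returned word can be handed to a per-word phonemizer immediately, so the added latency is bounded by one word rather than one sentence. For example, pushing `"Hel"` returns nothing, and pushing `"lo wor"` next returns `["Hello"]`.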
How It Works
The library uses a push-based architecture built around StreamingG2P. Text is pushed in as it arrives and passes through these stages:
- Text Expansion & Normalization: Input characters are processed by TextExpand, which expands numbers, money amounts, and common abbreviations incrementally.
- Language Detection: If multiple languages are enabled, dynamic heuristics detect the language of incoming text batches on the fly.
- Word Phonemizer: WordPhonemizer matches the normalized text against the appropriate language's dictionary and runtime rules from embedded espeak-ng data.
- Sentence Upgrade: StreamingSentencePhonemeUpgrade applies sentence-level syntax rules, stress assignments, and intonation corrections before finalizing the phoneme token stream.
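The order of the four stages can be illustrated with a toy pipeline. This is a sketch under stated assumptions, not the library's implementation: `expand`, `detect`, `phonemize`, `upgrade`, `Lang`, and `phonemize_sentence` are hypothetical stand-ins for TextExpand, the language detector, WordPhonemizer, and StreamingSentencePhonemeUpgrade. Each stage is deliberately trivial, the language heuristic misclassifies unaccented Vietnamese words such as "xin", and the sketch processes one complete sentence rather than streaming incrementally.

```rust
use std::collections::HashMap;

/// Hypothetical language tag for this sketch; the library's types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Lang {
    En,
    Vi,
}

/// Stage 1, normalization (toy): spell out bare integers 0..=10 in English.
fn expand(word: &str) -> String {
    const NUMBERS: [&str; 11] = [
        "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten",
    ];
    match word.parse::<usize>() {
        Ok(n) if n < NUMBERS.len() => NUMBERS[n].to_string(),
        _ => word.to_string(),
    }
}

/// Stage 2, language detection (crude heuristic): a non-ASCII letter is
/// taken as a sign of Vietnamese; everything else defaults to English.
fn detect(word: &str) -> Lang {
    if word.chars().any(|c| c.is_alphabetic() && !c.is_ascii()) {
        Lang::Vi
    } else {
        Lang::En
    }
}

/// Stage 3, word phonemizer (toy): per-language dictionary lookup; words
/// without an entry come back in brackets instead of going through rules.
fn phonemize(lexicon: &HashMap<(Lang, String), String>, lang: Lang, word: &str) -> String {
    match lexicon.get(&(lang, word.to_lowercase())) {
        Some(ipa) => ipa.clone(),
        None => format!("[{}]", word),
    }
}

/// Stage 4, sentence upgrade (toy): join the word phonemes and append the
/// IPA global-rise mark when the sentence is a question.
fn upgrade(words: &[String], is_question: bool) -> String {
    let mut out = words.join(" ");
    if is_question {
        out.push_str(" ↗");
    }
    out
}

/// Runs the four stages over one complete sentence, in order.
pub fn phonemize_sentence(lexicon: &HashMap<(Lang, String), String>, sentence: &str) -> String {
    let trimmed = sentence.trim();
    let is_question = trimmed.ends_with('?');
    let words: Vec<String> = trimmed
        .split(|c: char| c.is_whitespace() || c.is_ascii_punctuation())
        .filter(|w| !w.is_empty())
        .map(|w| {
            let normalized = expand(w);
            let lang = detect(&normalized);
            phonemize(lexicon, lang, &normalized)
        })
        .collect();
    upgrade(&words, is_question)
}
```

The caller supplies the lexicon, for example by mapping `(Lang::En, "hello")` to an IPA string. With such a lexicon, `"Hello 5?"` passes through all four stages: `5` is expanded to `five`, both words are tagged English and looked up, and the question mark triggers the rising-intonation mark.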
Usage Examples
Streaming Conversion
The streaming API enables progressive consumption of text.
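The consumption pattern can be sketched with a hypothetical `StreamingConverter` trait standing in for the library's streaming API; its real type names and method signatures may differ. A producer thread sends text fragments the way an LLM emits tokens, and the consumer forwards each finalized phoneme chunk as soon as it is returned. `ToyConverter` is an assumption for the demo: it wraps each completed word in slashes in place of real phonemization.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the library's streaming API; the real type
/// names and method signatures may differ.
pub trait StreamingConverter {
    /// Feeds a text fragment and returns phoneme chunks that are already final.
    fn push(&mut self, text: &str) -> Vec<String>;
    /// Signals end of input and returns whatever was still buffered.
    fn finish(&mut self) -> Vec<String>;
}

/// Toy converter: "phonemizes" a word by wrapping it in slashes once a
/// following space shows that the word is complete.
#[derive(Default)]
pub struct ToyConverter {
    buf: String,
}

impl StreamingConverter for ToyConverter {
    fn push(&mut self, text: &str) -> Vec<String> {
        self.buf.push_str(text);
        // Everything up to the last space consists of complete words.
        match self.buf.rfind(' ') {
            Some(i) => {
                let done: String = self.buf.drain(..=i).collect();
                done.split_whitespace().map(|w| format!("/{}/", w)).collect()
            }
            None => Vec::new(),
        }
    }

    fn finish(&mut self) -> Vec<String> {
        let rest = std::mem::take(&mut self.buf);
        rest.split_whitespace().map(|w| format!("/{}/", w)).collect()
    }
}

/// Forwards each finalized chunk to `sink` as soon as the converter yields it,
/// then flushes once the producer closes the channel.
pub fn pump<C: StreamingConverter>(
    conv: &mut C,
    chunks: mpsc::Receiver<String>,
    mut sink: impl FnMut(String),
) {
    // Iterating a Receiver blocks until a chunk arrives or all senders hang up.
    for chunk in chunks {
        for phonemes in conv.push(&chunk) {
            sink(phonemes);
        }
    }
    for phonemes in conv.finish() {
        sink(phonemes);
    }
}

/// Demo: a producer thread emits text piece by piece, as an LLM might.
pub fn demo() -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    let producer = thread::spawn(move || {
        for piece in ["Hel", "lo wo", "rld"] {
            tx.send(piece.to_string()).unwrap();
        }
        // `tx` is dropped when this closure returns, closing the channel.
    });
    let mut out = Vec::new();
    pump(&mut ToyConverter::default(), rx, |p| out.push(p));
    producer.join().unwrap();
    out
}
```

In a real TTS setup, `sink` would hand each chunk to the voice model, so audio for "Hello" can start before "world" has finished arriving.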
Normal Conversion
If streaming is not required, you can use the full conversion API, which converts the entire input and returns the complete result at once.
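One way to relate full conversion to a push-based core is to treat it as a single push of the whole input followed by a flush. The sketch below assumes that relationship. It uses a hypothetical `Toy` converter rather than the library's API, and it checks the property that makes the relationship useful: splitting the input into different chunks should not change the output.

```rust
/// Hypothetical streaming converter (a toy, not the library's API): it wraps
/// each word in slashes once a following space shows the word is complete.
#[derive(Default)]
pub struct Toy {
    buf: String,
}

impl Toy {
    pub fn push(&mut self, text: &str) -> Vec<String> {
        self.buf.push_str(text);
        match self.buf.rfind(' ') {
            // Everything up to the last space consists of complete words.
            Some(i) => {
                let done: String = self.buf.drain(..=i).collect();
                done.split_whitespace().map(|w| format!("/{}/", w)).collect()
            }
            None => Vec::new(),
        }
    }

    pub fn finish(&mut self) -> Vec<String> {
        let rest = std::mem::take(&mut self.buf);
        rest.split_whitespace().map(|w| format!("/{}/", w)).collect()
    }
}

/// Full conversion: one push of the whole input, then a flush.
pub fn convert_all(text: &str) -> Vec<String> {
    let mut conv = Toy::default();
    let mut out = conv.push(text);
    out.extend(conv.finish());
    out
}

/// Streaming conversion of the same text delivered in arbitrary chunks.
pub fn convert_chunked(chunks: &[&str]) -> Vec<String> {
    let mut conv = Toy::default();
    let mut out: Vec<String> = chunks.iter().flat_map(|c| conv.push(c)).collect();
    out.extend(conv.finish());
    out
}
```

Under this assumption, full conversion needs no separate code path, and comparing `convert_all` with `convert_chunked` over several chunkings of the same text is a cheap regression check for chunk-boundary bugs.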
Adding to Your Project
Add the dependency to your Cargo.toml:
[dependencies]
= { = "..." } # Or specify version if published