# piper-phoneme-streaming
`piper-phoneme-streaming` is a high-performance Rust library for streaming grapheme-to-phoneme (G2P) conversion. It is designed to integrate with streaming Text-to-Speech (TTS) engines such as [Piper](https://github.com/rhasspy/piper) and other engines built on `espeak-ng`.
## What Problems Does It Solve?
Typical G2P (Grapheme-to-Phoneme) approaches wait for a full sentence or paragraph before converting text to phonemes. In real-time or streaming TTS applications, this introduces unacceptable latency.
`piper-phoneme-streaming` addresses this by:
- **Streaming natively:** Processing text character-by-character and yielding phonemes as soon as there is enough context (e.g., at word boundaries).
- **Dynamic Language Detection:** Handling mixed-language input (e.g., English mixed with Vietnamese) by detecting language boundaries as text arrives and switching phonemization strategies mid-sentence without interrupting the stream.
- **Accurate Text Normalization:** Expanding abbreviations, dates, numbers, and acronyms with built-in strategies, in stream order, before phonemization.
- **`espeak-ng` Parity:** Reading `espeak-ng`'s binary phoneme tables and dictionary formats directly, so that the generated phonemes match exactly what Piper and other `espeak-ng`-based models expect.
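To make "enough context" concrete: the simplest boundary rule is to hold characters until whitespace or punctuation closes a word, then release the whole word. The sketch below is a hypothetical illustration of that idea only; `WordBuffer` and its methods are invented for this example and are not the crate's API. It also discards punctuation, which a real engine would keep for prosody.

```rust
/// Hypothetical word-boundary buffer: holds characters until whitespace or
/// ASCII punctuation closes a word, then releases the whole word.
#[derive(Default)]
struct WordBuffer {
    pending: String,
}

impl WordBuffer {
    /// Pushes one character and returns a finished word if `ch` ends one.
    /// Boundary characters themselves are discarded in this toy version.
    fn push(&mut self, ch: char) -> Option<String> {
        if ch.is_whitespace() || ch.is_ascii_punctuation() {
            self.take_pending()
        } else {
            self.pending.push(ch);
            None
        }
    }

    /// Called at end of stream to release a trailing word with no boundary.
    fn finish(&mut self) -> Option<String> {
        self.take_pending()
    }

    /// Moves the buffered word out, or returns `None` if nothing is buffered.
    fn take_pending(&mut self) -> Option<String> {
        if self.pending.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.pending))
        }
    }
}

fn main() {
    let mut buf = WordBuffer::default();
    let mut words = Vec::new();
    for ch in "Xin chào thế giới".chars() {
        // A word becomes available as soon as the following space arrives.
        if let Some(word) = buf.push(ch) {
            words.push(word);
        }
    }
    // The last word has no trailing boundary, so it only appears on finish.
    words.extend(buf.finish());
    println!("{:?}", words);
}
```

The latency cost of this rule is at most one word: each word is released on the character that follows it, and only the final word waits for the end of the stream.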
## How It Works
The library uses a push-based architecture built around `StreamingG2P`:
1. **Text Expansion & Normalization:** `TextExpand` processes incoming characters, expanding numbers, money amounts, and common abbreviations incrementally.
2. **Language Detection:** If multiple languages are enabled, heuristics detect the language of incoming text as it arrives.
3. **Word Phonemizer:** The `WordPhonemizer` matches the normalized text against the appropriate language's dictionary and runtime rules from embedded `espeak-ng` data.
4. **Sentence Upgrade:** `StreamingSentencePhonemeUpgrade` applies sentence-level syntax rules, stress assignments, and intonation corrections before finalizing the phoneme token stream.
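The stages above can be pictured as a chain in which every push flows through each stage in turn and only output that is ready comes out the other end. The following is a minimal sketch under that assumption: the `Stage` trait, the `Pipeline` type, and the two toy stages are hypothetical stand-ins for `TextExpand` and `WordPhonemizer`, and the phoneme strings in the toy lexicon are illustrative, not real `espeak-ng` output.

```rust
/// Hypothetical stage interface for a push-based pipeline.
trait Stage {
    /// Accepts one item and returns whatever output is ready right now.
    fn push(&mut self, item: String) -> Vec<String>;

    /// Flushes buffered state at end of stream; stateless stages keep the default.
    fn finish(&mut self) -> Vec<String> {
        Vec::new()
    }
}

/// Toy stand-in for text expansion: one abbreviation, one digit, lowercasing.
struct Expand;

impl Stage for Expand {
    fn push(&mut self, word: String) -> Vec<String> {
        match word.as_str() {
            "Dr" => vec!["doctor".to_string()],
            "2" => vec!["two".to_string()],
            _ => vec![word.to_lowercase()],
        }
    }
}

/// Toy stand-in for the word phonemizer: a two-entry illustrative lexicon.
struct Lexicon;

impl Stage for Lexicon {
    fn push(&mut self, word: String) -> Vec<String> {
        let phonemes = match word.as_str() {
            "doctor" => "dˈɑːktɚ".to_string(),
            "two" => "tˈuː".to_string(),
            // A real phonemizer would fall back to letter-to-sound rules here.
            other => format!("<{other}>"),
        };
        vec![phonemes]
    }
}

/// Runs each pushed item through every stage in order.
struct Pipeline {
    stages: Vec<Box<dyn Stage>>,
}

impl Pipeline {
    fn push(&mut self, item: String) -> Vec<String> {
        let mut items = vec![item];
        for stage in self.stages.iter_mut() {
            items = items.into_iter().flat_map(|i| stage.push(i)).collect();
        }
        items
    }

    /// Flushes stages front to back, feeding each stage's leftovers downstream.
    fn finish(&mut self) -> Vec<String> {
        let mut carried: Vec<String> = Vec::new();
        for stage in self.stages.iter_mut() {
            let mut next: Vec<String> =
                carried.into_iter().flat_map(|i| stage.push(i)).collect();
            next.extend(stage.finish());
            carried = next;
        }
        carried
    }
}

fn main() {
    let mut pipeline = Pipeline {
        stages: vec![
            Box::new(Expand) as Box<dyn Stage>,
            Box::new(Lexicon) as Box<dyn Stage>,
        ],
    };
    let mut out = Vec::new();
    for word in ["Dr", "2", "Zed"] {
        out.extend(pipeline.push(word.to_string()));
    }
    out.extend(pipeline.finish());
    println!("{:?}", out);
}
```

Keeping `finish` separate from `push` is what lets a stage that needs lookahead (such as sentence-level stress or intonation) hold items back during the stream and still emit them once the input ends.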
## Usage Examples
### Streaming Conversion
The streaming API consumes text incrementally and returns phonemes as soon as they are available.
```rust
use piper_phoneme_streaming::{Language, StreamingG2P};

fn main() {
    // Initialize the engine with the supported languages
    let g2p = StreamingG2P::with_languages(
        &[Language::English, Language::Vietnamese],
        Language::English,
    )
    .unwrap();

    // Create a new streaming session (it keeps state across pushed chunks)
    let mut session = g2p.new_session();

    let text = "Hello world. Xin chào thế giới.";

    // Push characters individually or in chunks
    for ch in text.chars() {
        let output = g2p.push_text(&mut session, &ch.to_string()).unwrap();
        for phoneme in output {
            print!("{}", phoneme.token);
        }
    }

    // Flush any remaining buffered phonemes once the stream ends
    let tail = g2p.finish(&mut session).unwrap();
    for phoneme in tail {
        print!("{}", phoneme.token);
    }
    println!();
}
```
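Pushing multi-character chunks instead of single characters is closer to how text arrives from a network connection or a language model. When you cut a `&str` into chunks yourself, every cut must fall on a char boundary, because slicing inside a multi-byte character such as `à` panics. The helper below is a hypothetical sketch (it is not part of the crate) that splits on boundaries; each chunk would then be passed to `push_text` in the same way as the single characters in the example above.

```rust
/// Hypothetical helper (not part of the crate): splits `text` into chunks of at
/// most `max_bytes` bytes, never cutting a multi-byte UTF-8 character.
fn char_safe_chunks(text: &str, max_bytes: usize) -> Vec<&str> {
    // Any char is at most 4 bytes, so this guarantees each chunk makes progress.
    assert!(max_bytes >= 4, "max_bytes must be at least 4");
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < text.len() {
        let mut end = (start + max_bytes).min(text.len());
        // Step back until `end` sits on a char boundary; `&text[start..end]`
        // would panic otherwise.
        while !text.is_char_boundary(end) {
            end -= 1;
        }
        chunks.push(&text[start..end]);
        start = end;
    }
    chunks
}

fn main() {
    let text = "Hello world. Xin chào thế giới.";
    for chunk in char_safe_chunks(text, 5) {
        // In a real program each chunk would go to the session via `push_text`.
        println!("{:?}", chunk);
    }
}
```

If the source delivers raw bytes rather than `&str`, the same concern applies one level earlier: incomplete trailing bytes have to be buffered until the rest of the character arrives before the chunk can be converted to a string.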
### Full (Non-Streaming) Conversion
If streaming is not required, the full conversion API converts an entire input in a single call.
```rust
use piper_phoneme_streaming::{FullG2p, Language};

fn main() {
    let g2p = FullG2p::new(Language::English).unwrap();
    let out = g2p.g2p("Hello world!").unwrap();
    // Join the phoneme tokens into a single string
    let out_str: String = out.iter().map(|t| t.token.to_string()).collect();
    println!("{}", out_str);
}
```
## Adding to Your Project
Add the dependency to your `Cargo.toml`:
```toml
[dependencies]
piper-phoneme-streaming = { path = "..." } # Or specify a version once published
```