piper-phoneme-streaming 0.1.1

# piper-phoneme-streaming

`piper-phoneme-streaming` is a high-performance Rust library for streaming grapheme-to-phoneme (G2P) conversion. It is designed to integrate with streaming Text-to-Speech (TTS) engines such as [Piper](https://github.com/rhasspy/piper) and others based on `espeak-ng`.

## What Problems Does It Solve?

Typical G2P (Grapheme-to-Phoneme) approaches wait for a full sentence or paragraph before converting text to phonemes. In real-time or streaming TTS applications, this introduces unacceptable latency.

`piper-phoneme-streaming` addresses this by:
- **Native Streaming:** Processes text character by character and yields phonemes as soon as there is enough context (e.g., at word boundaries).
- **Dynamic Language Detection:** Handles mixed-language input on the fly. It detects language boundaries (e.g., between English and Vietnamese) and switches phonemization strategies mid-sentence without interrupting the stream.
- **Accurate Text Normalization:** Built-in strategies expand abbreviations, dates, numbers, and acronyms sequentially before phonemization.
- **`espeak-ng` Parity:** Interprets `espeak-ng`'s binary phoneme table and dictionary formats directly to ensure that generated phonemes match what Piper and other `espeak-ng`-based models expect.
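
The word-boundary buffering mentioned in the first bullet can be illustrated with a minimal, self-contained sketch. `WordBuffer`, `push_str`, and `finish` are hypothetical names invented for this illustration and are not part of the crate's API. The sketch only releases completed words, whereas the library itself yields phonemes.

```rust
/// Hypothetical illustration only: not part of the piper-phoneme-streaming API.
/// Buffers pushed text and releases a word once a boundary character arrives.
#[derive(Default)]
pub struct WordBuffer {
    pending: String,
}

impl WordBuffer {
    /// Push a chunk of any size; returns the words completed by this chunk.
    pub fn push_str(&mut self, chunk: &str) -> Vec<String> {
        let mut done = Vec::new();
        for ch in chunk.chars() {
            if ch.is_whitespace() || matches!(ch, '.' | ',' | '!' | '?' | ';' | ':') {
                // A boundary closes the pending word, if there is one.
                if !self.pending.is_empty() {
                    done.push(std::mem::take(&mut self.pending));
                }
            } else {
                self.pending.push(ch);
            }
        }
        done
    }

    /// End of stream: release whatever is still buffered.
    pub fn finish(&mut self) -> Option<String> {
        if self.pending.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.pending))
        }
    }
}
```

Pushing `"Hel"`, `"lo wor"`, and `"ld"` in turn returns nothing, then `["Hello"]`, then nothing; calling `finish()` afterwards releases `"world"`. A caller never has to wait for the end of a sentence, only for the next boundary.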

## How It Works

The library uses a push-based architecture centered on `StreamingG2P`:
1. **Text Expansion & Normalization:** Input characters are processed by `TextExpand`, which expands numbers, money amounts, and common abbreviations incrementally.
2. **Language Detection:** If multiple languages are enabled, dynamic heuristics detect the language of incoming text batches on the fly.
3. **Word Phonemizer:** The `WordPhonemizer` matches the normalized text against the appropriate language's dictionary and runtime rules from embedded `espeak-ng` data.
4. **Sentence Upgrade:** `StreamingSentencePhonemeUpgrade` applies sentence-level syntax rules, stress assignments, and intonation corrections before finalizing the phoneme token stream.
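
To make the ordering of the first two stages concrete, here is a toy sketch. Every name in it (`Lang`, `expand`, `detect`, `process`) is a hypothetical stand-in, and the logic is deliberately simplistic: it does not reflect how `TextExpand` or the library's language heuristics are actually implemented, and the phonemization and sentence-level stages are omitted.

```rust
/// Hypothetical stand-ins for the stages above; none of these names come from the crate.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Lang {
    English,
    Vietnamese,
}

/// Stage 1 (toy): expand a few single digits into words; everything else passes through.
pub fn expand(word: &str) -> String {
    match word {
        "1" => "one".to_string(),
        "2" => "two".to_string(),
        "3" => "three".to_string(),
        other => other.to_string(),
    }
}

/// Stage 2 (toy heuristic): tag a word as Vietnamese if it contains any
/// non-ASCII character; otherwise fall back to the default language.
pub fn detect(word: &str, default: Lang) -> Lang {
    if word.chars().any(|c| !c.is_ascii()) {
        Lang::Vietnamese
    } else {
        default
    }
}

/// Runs the toy stages in order for each whitespace-separated word.
pub fn process(text: &str, default: Lang) -> Vec<(Lang, String)> {
    text.split_whitespace()
        .map(|w| {
            let expanded = expand(w);
            let lang = detect(&expanded, default);
            (lang, expanded)
        })
        .collect()
}
```

Normalization runs before detection here so that the language tag is assigned to the expanded text, which is what a phonemizer would ultimately consume. A real detector would need far more context than a single word.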

## Usage Examples

### Streaming Conversion
The streaming API accepts text incrementally and returns phonemes as soon as they become available.

```rust
use piper_phoneme_streaming::{StreamingG2P, Language};

fn main() {
    // Initialize the engine with supported languages
    let g2p = StreamingG2P::with_languages(
        &[Language::English, Language::Vietnamese], 
        Language::English
    ).unwrap();

    // Create a new streaming session (maintains state across pushed chunks)
    let mut session = g2p.new_session();
    let text = "Hello world. Xin chào thế giới.";

    // Push characters individually or in chunks
    for ch in text.chars() {
        let output = g2p.push_text(&mut session, &ch.to_string()).unwrap();
        for phoneme in output {
            print!("{}", phoneme.token);
        }
    }

    // Flush any remaining buffered phonemes once the stream ends
    let tail = g2p.finish(&mut session).unwrap();
    for phoneme in tail {
        print!("{}", phoneme.token);
    }
}
```

### Normal Conversion
If streaming is not required, you can use the full conversion API to convert the entire input at once.

```rust
use piper_phoneme_streaming::{FullG2p, Language};

fn main() {
    let g2p = FullG2p::new(Language::English).unwrap();
    let out = g2p.g2p("Hello world!").unwrap();
    
    let out_str: String = out.iter().map(|t| t.token).collect();
    println!("{}", out_str);
}
```

## Adding to Your Project

Add the dependency to your `Cargo.toml`:

```toml
[dependencies]
piper-phoneme-streaming = { path = "..." } # Or specify version if published
```