piper-phoneme-streaming 0.1.1

A high-performance Rust library for streaming Text-to-Phoneme (G2P) conversion.
# Dynamic Language Detection — Implementation Plan

> Supersedes `docs/dynamic_language.md` with concrete, step-by-step implementation details.

## Design Decision: Parallel Language Queue

The original doc proposes adding `Language` fields to `ExpandUnit` variants. We choose a **parallel language queue** instead — `ExpandUnit` stays unchanged, `TextExpand` tracks language in a parallel `VecDeque<Language>` and returns `(ExpandUnit, Language)` tuples. This means **zero changes to any `ExpandTask` implementation** (all 9 tasks across EN + VI).
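Here is a minimal, self-contained sketch of the parallel-queue idea. `Language` and `ExpandUnit` are simplified stand-ins for the crate's real types. `LangQueue` is a hypothetical name used only for illustration. Units and languages are always pushed and popped together, so the two queues cannot drift apart. Tasks only ever see the unit queue.

```rust
use std::collections::VecDeque;

// Simplified stand-ins for the crate's real types (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Language {
    English,
    Vietnamese,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ExpandUnit {
    Word(String),
    Number(String),
    Mark(char),
}

/// Hypothetical helper: two queues kept in lockstep.
#[derive(Default)]
pub struct LangQueue {
    units: VecDeque<ExpandUnit>,
    langs: VecDeque<Language>,
}

impl LangQueue {
    /// Appends a unit together with its language tag.
    pub fn push_back(&mut self, unit: ExpandUnit, lang: Language) {
        self.units.push_back(unit);
        self.langs.push_back(lang);
        debug_assert_eq!(self.units.len(), self.langs.len());
    }

    /// Pops the front unit together with its language.
    pub fn pop_front(&mut self) -> Option<(ExpandUnit, Language)> {
        let unit = self.units.pop_front()?;
        let lang = self
            .langs
            .pop_front()
            .expect("units and langs must stay in lockstep");
        Some((unit, lang))
    }

    /// Tasks only see this queue, so `ExpandTask` signatures stay unchanged.
    pub fn units(&self) -> &VecDeque<ExpandUnit> {
        &self.units
    }
}
```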

---

## PR 1: Internal Plumbing (no behavior change, no new dependency)

All existing tests must pass identically. Single-language behavior is unchanged.

### 1.1 `text_expand.rs` — Struct & Constructor

```rust
use std::collections::{HashMap, VecDeque};

pub struct TextExpand {
    // `Language` is used as a map key and copied out of `input_langs`,
    // so it must be `Copy + Eq + Hash`.
    tasks_by_lang: HashMap<Language, Vec<Box<dyn ExpandTask>>>,

    // Language detection (None = single-language, skip detection)
    detector: Option<Box<dyn LanguageDetector>>,
    current_language: Language,
    context_window: VecDeque<String>,   // at most CONTEXT_WINDOW_SIZE (5) words

    // Parallel language tracking: input_langs[i] is the language of input_units[i];
    // output_units carries the language inline in each tuple.
    input_units: VecDeque<ExpandUnit>,
    input_langs: VecDeque<Language>,
    output_units: VecDeque<(ExpandUnit, Language)>,

    // Tokenizer state (unchanged)
    buffer: String,
    buffer_is_number: bool,
}
```

> **Why `Option<Box<dyn LanguageDetector>>`?** PR 1 adds the small crate-internal `LanguageDetector` trait (its definition is listed under 3.2), so this struct compiles without `lingua`. PR 3 plugs in the real `lingua`-backed detector.

Constructors:

```rust
/// Single-language (backward compat). No detection overhead.
pub fn with_language(language: Language) -> Self {
    let mut tasks_by_lang = HashMap::new();
    tasks_by_lang.insert(language, get_tasks_for_language(language));
    Self {
        tasks_by_lang,
        detector: None,
        current_language: language,
        context_window: VecDeque::new(),
        input_units: VecDeque::new(),
        input_langs: VecDeque::new(),
        output_units: VecDeque::new(),
        buffer: String::new(),
        buffer_is_number: false,
    }
}

/// Multi-language with an injected detector. `pub(crate)` because
/// `LanguageDetector` is crate-private; PR 3 (see 3.5) replaces this with a
/// public two-argument `with_languages` that builds the `lingua` detector itself.
pub(crate) fn with_languages(
    languages: &[Language],
    default_language: Language,
    detector: Box<dyn LanguageDetector>,
) -> Self { ... }

/// Test-only: explicit task list, no detection. Keep for existing tests.
pub fn new(tasks: Vec<Box<dyn ExpandTask>>) -> Self { ... }
```

### 1.2 `push` / `finish` Signature Change

```rust
// Before:
pub fn push(&mut self, ch: char) -> Option<ExpandUnit>
pub fn finish(&mut self) -> Option<ExpandUnit>

// After:
pub fn push(&mut self, ch: char) -> Option<(ExpandUnit, Language)>
pub fn finish(&mut self) -> Option<(ExpandUnit, Language)>
```

### 1.3 `flush_buffer` — Language Assignment

```rust
fn flush_buffer(&mut self) {
    if self.buffer.is_empty() { return; }
    let content = std::mem::take(&mut self.buffer);

    let lang = if self.buffer_is_number {
        self.current_language               // numbers inherit
    } else {
        self.detect_language(&content)      // words get detected
    };

    let unit = if self.buffer_is_number {
        ExpandUnit::Number(content)
    } else {
        ExpandUnit::Word(content)
    };
    self.input_units.push_back(unit);
    self.input_langs.push_back(lang);
}
```

In `process_char`, marks (whitespace and punctuation) also inherit the current language:

```rust
self.input_units.push_back(ExpandUnit::Mark(ch));
self.input_langs.push_back(self.current_language);

// Sentence boundaries clear context window
if matches!(ch, '.' | '?' | '!') {
    self.context_window.clear();
}
```

**PR 1 stub:** `detect_language` just returns `self.current_language` (no actual detection yet).
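The assignment rule in this step can be modeled in isolation. Numbers and marks inherit `current_language`, and only words go through detection. This sketch is hypothetical. `Assigner` and `TokenKind` stand in for the relevant slice of `TextExpand`, and the detector is an injected closure rather than the real `LanguageDetector` trait. The closure returns `Some(lang)` only when it is confident enough to switch. With the PR 1 stub, it would always return `None`.

```rust
// Stand-in for the crate's `Language` (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Language {
    English,
    Vietnamese,
}

/// Hypothetical token classification mirroring Word / Number / Mark.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenKind {
    Word,
    Number,
    Mark,
}

/// Hypothetical model of the language-assignment part of `TextExpand`.
pub struct Assigner<F: FnMut(&str) -> Option<Language>> {
    current_language: Language,
    detect: F, // Some(lang) = switch to lang; None = keep the current language
}

impl<F: FnMut(&str) -> Option<Language>> Assigner<F> {
    pub fn new(default: Language, detect: F) -> Self {
        Self { current_language: default, detect }
    }

    /// Returns the language tag for one token.
    pub fn assign(&mut self, kind: TokenKind, text: &str) -> Language {
        if kind == TokenKind::Word {
            // Only words may move `current_language`.
            if let Some(lang) = (self.detect)(text) {
                self.current_language = lang;
            }
        }
        // Numbers and marks inherit whatever is current.
        self.current_language
    }
}
```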

### 1.4 `try_expand` — Task Routing + Language Inheritance

```rust
fn try_expand(&mut self, is_final: bool) {
    'outer: while !self.input_units.is_empty() {
        let front_lang = self.input_langs[0];
        let tasks = self.tasks_by_lang.get(&front_lang)
            .map(|v| v.as_slice())
            .unwrap_or(&[]);

        for task in tasks {
            match task.expand(&self.input_units) {
                Some(ExpandResult::Maybe) => {
                    if !is_final { break 'outer; }
                }
                Some(ExpandResult::Replace(n, new_units)) => {
                    debug_assert!(n > 0);
                    // Pop n units + langs
                    for _ in 0..n {
                        self.input_units.pop_front();
                        self.input_langs.pop_front();
                    }
                    // Prepend replacements with INHERITED language
                    for unit in new_units.into_iter().rev() {
                        self.input_units.push_front(unit);
                        self.input_langs.push_front(front_lang);
                    }
                    continue 'outer;
                }
                None => {}
            }
        }

        // No task matched — emit with language
        if let Some(unit) = self.input_units.pop_front() {
            let lang = self.input_langs.pop_front()
                .unwrap_or(self.current_language);
            self.output_units.push_back((unit, lang));
        }
    }
}
```

**Invariant:** `input_units.len() == input_langs.len()` must always hold.
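The `Replace` branch is where the invariant is easiest to break, so isolating it in a helper may be worthwhile. This is a sketch under assumptions. `replace_front` is a hypothetical free function, whereas the plan inlines this logic in `try_expand`. Units are plain `String`s to keep it self-contained. The helper pops `n` entries from both queues and prepends the replacements, tagged with the language of the original front unit. It also checks the invariant in debug builds.

```rust
use std::collections::VecDeque;

// Stand-in for the crate's `Language` (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Language {
    English,
    Vietnamese,
}

/// Hypothetical helper mirroring the `Replace(n, new_units)` branch of `try_expand`.
/// Units are `String`s here instead of `ExpandUnit` (simplification).
pub fn replace_front(
    units: &mut VecDeque<String>,
    langs: &mut VecDeque<Language>,
    n: usize,
    new_units: Vec<String>,
) {
    debug_assert!(n > 0 && n <= units.len());
    // Replacements inherit the language of the unit at the front of the queue.
    let front_lang = langs[0];
    for _ in 0..n {
        units.pop_front();
        langs.pop_front();
    }
    // Push in reverse so the replacements end up in their original order.
    for unit in new_units.into_iter().rev() {
        units.push_front(unit);
        langs.push_front(front_lang);
    }
    debug_assert_eq!(units.len(), langs.len(), "parallel queues out of sync");
}
```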

### 1.5 `TextUnit::Word` Gets Language

```rust
// Before:
pub enum TextUnit {
    Word(String),
    Space,
    ClauseBoundary(char),
    Punctuation(char),
}

// After:
pub enum TextUnit {
    Word(String, Language),
    Space,
    ClauseBoundary(char),
    Punctuation(char),
}
```

Replace `From<ExpandUnit> for TextUnit` with:

```rust
impl TextUnit {
    pub fn from_expand_unit(unit: ExpandUnit, language: Language) -> Self {
        match unit {
            ExpandUnit::Word(s) | ExpandUnit::Number(s) => TextUnit::Word(s, language),
            ExpandUnit::Mark(c) if c.is_whitespace() => TextUnit::Space,
            ExpandUnit::Mark(c) if matches!(c, ',' | '.' | '!' | '?' | ';' | ':') => {
                TextUnit::ClauseBoundary(c)
            }
            ExpandUnit::Mark(c) => TextUnit::Punctuation(c),
        }
    }
}
```

### 1.6 `semantic.rs` — Update Pattern Match

```rust
impl SentenceUnit {
    pub fn from_text_unit(
        unit: TextUnit,
        phonemizer: &WordPhonemizer,
    ) -> crate::error::Result<Self> {
        match unit {
            TextUnit::Word(word, _lang) => {
                // Caller picks the right phonemizer; we just destructure
                Ok(SentenceUnit::Word(phonemizer.phonemize_word(&word)?))
            }
            TextUnit::Space => Ok(SentenceUnit::Space),
            TextUnit::ClauseBoundary(ch) => Ok(SentenceUnit::ClauseBoundary(ch)),
            TextUnit::Punctuation(ch) => Ok(SentenceUnit::Punctuation(ch)),
        }
    }
}
```

### 1.7 Callers Update

**`g2p/full.rs`:**
```rust
// TextUnit::from(unit) → TextUnit::from_expand_unit(unit, lang)
if let Some((unit, lang)) = expander.push(ch) {
    let text_unit = TextUnit::from_expand_unit(unit, lang);
    let su = SentenceUnit::from_text_unit(text_unit, &self.word_phonemizer)?;
    sentence_units.push(su);
}
```

**`g2p/streaming.rs`:** Same pattern.

**`tests/common/mod.rs`:**
```rust
pub fn collect_units(text: &str) -> Vec<TextUnit> {
    let mut expander = TextExpand::new(vec![]);
    let mut units = Vec::new();
    for ch in text.chars() {
        if let Some((unit, lang)) = expander.push(ch) {
            units.push(TextUnit::from_expand_unit(unit, lang));
        }
    }
    while let Some((unit, lang)) = expander.finish() {
        units.push(TextUnit::from_expand_unit(unit, lang));
    }
    units
}
```

Tests that match `TextUnit::Word(s)` become `TextUnit::Word(s, _)`.

### 1.8 Internal Tests Update

`text_expand.rs` unit tests (`run_test`, `test_text_expand_cases_en/vi`) need updating to handle `(ExpandUnit, Language)` return type. The expected values remain the same; just unwrap the tuple.

---

## PR 2: Multi-Language G2P Layer

### 2.1 `FullG2p`

```rust
pub struct FullG2p {
    phonemizers: HashMap<Language, WordPhonemizer>,
    sentence_upgrades: HashMap<Language, FullSentencePhonemeUpgrade>,
    default_language: Language,
    languages: Vec<Language>,
}

impl FullG2p {
    /// Single-language (backward compat)
    pub fn new(language: Language) -> Result<Self> {
        Self::with_languages(&[language], language)
    }

    /// Multi-language
    pub fn with_languages(languages: &[Language], default: Language) -> Result<Self> {
        let mut phonemizers = HashMap::new();
        let mut sentence_upgrades = HashMap::new();
        for &lang in languages {
            phonemizers.insert(lang, WordPhonemizer::new(lang)?);
            sentence_upgrades.insert(lang, FullSentencePhonemeUpgrade::new(lang)?);
        }
        Ok(Self { phonemizers, sentence_upgrades, default_language: default, languages: languages.to_vec() })
    }
}
```

The `g2p` method picks the right phonemizer per word:

```rust
if let Some((unit, lang)) = expander.push(ch) {
    let text_unit = TextUnit::from_expand_unit(unit, lang);
    let phonemizer = &self.phonemizers[&lang];
    let su = SentenceUnit::from_text_unit(text_unit, phonemizer)?;
    sentence_units.push(su);
}
```

Use `default_language` for sentence upgrade (prosody). This is safe because:
- Vietnamese stress is handled per-word via `WordPhoneme.language` in the renderer
- English stress promotion only applies to English words
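Indexing with `self.phonemizers[&lang]` panics if the expander ever yields a language that was not passed to `with_languages`. A defensive alternative is to fall back to `default_language`. This is a hedged sketch. `PhonemizerSet` and `phonemizer_for` are hypothetical names, and a generic `P` stands in for `WordPhonemizer`. The generic `L` stands in for `Language`. The sketch also refuses a `default` that has no entry, which the plan's constructor does not currently check.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical per-language lookup; `P` stands in for `WordPhonemizer`.
pub struct PhonemizerSet<L, P> {
    by_lang: HashMap<L, P>,
    default_language: L,
}

impl<L: Copy + Eq + Hash, P> PhonemizerSet<L, P> {
    /// Returns `None` if `default_language` has no entry, so lookups cannot panic.
    pub fn new(by_lang: HashMap<L, P>, default_language: L) -> Option<Self> {
        if by_lang.contains_key(&default_language) {
            Some(Self { by_lang, default_language })
        } else {
            None
        }
    }

    /// Per-word lookup, falling back to the default language instead of panicking.
    pub fn phonemizer_for(&self, lang: L) -> &P {
        self.by_lang
            .get(&lang)
            .unwrap_or_else(|| &self.by_lang[&self.default_language])
    }
}
```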

### 2.2 `StreamingG2P`

Same pattern: `HashMap<Language, WordPhonemizer>`, per-word lookup.

### 2.3 Renderer Fix (`sentence_upgrade/mod.rs`)

The `Renderer` currently uses `self.language` for Vietnamese-specific logic. Fix to use per-word language:

```rust
// Line ~302 — change:
if self.language == Language::Vietnamese {
// To:
if word.language == Language::Vietnamese {

// Line ~316 — change:
self.language == Language::English,
// To:
word.language == Language::English,
```

Also: `phdata.select_table_by_name(word.language.as_str())` must be called per word in the renderer, same pattern as `tests/common/mod.rs:57`.
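Calling `select_table_by_name` for every word is simple, but it repeats work whenever consecutive words share a language, which is the common case. One possible refinement, sketched here under assumptions, is to switch tables only when the language changes. `TableSelector` and the `select_table` closure are hypothetical stand-ins for the renderer's state and for `phdata.select_table_by_name(word.language.as_str())`.

```rust
// Stand-in for the crate's `Language` (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Language {
    English,
    Vietnamese,
}

/// Hypothetical renderer-side cache of the active phoneme table.
pub struct TableSelector<F: FnMut(Language)> {
    active: Option<Language>,
    select_table: F, // stands in for `phdata.select_table_by_name(...)`
}

impl<F: FnMut(Language)> TableSelector<F> {
    pub fn new(select_table: F) -> Self {
        Self { active: None, select_table }
    }

    /// Call before rendering each word; switches only when the language changes.
    pub fn ensure(&mut self, lang: Language) {
        if self.active != Some(lang) {
            (self.select_table)(lang);
            self.active = Some(lang);
        }
    }
}
```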

### 2.4 Tests

- All existing single-language tests pass (regression)
- New test: `FullG2p::with_languages(&[EN, VI], EN)` phonemizes English text correctly
- New test: `FullG2p::with_languages(&[EN, VI], VI)` phonemizes Vietnamese text correctly
- No mixed-text test yet (detection isn't wired)

---

## PR 3: `lingua` Integration

### 3.1 Dependency

```toml
[dependencies]
lingua = { version = "1.6", default-features = false, features = ["english", "vietnamese"] }
```

### 3.2 Internal LanguageDetector Trait

```rust
// In text_expand.rs (or a new file src/lang_detect.rs)
pub(crate) trait LanguageDetector: Send + Sync {
    fn detect(&self, context: &str) -> Option<(Language, f64)>;
}
```
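`TextExpand` depends only on this trait, so tests can exercise the detection path without `lingua`. Below is a hedged sketch of a deterministic test double. `ScriptDetector` is hypothetical, and its rule is a toy heuristic for tests, not real language detection. A word counts as Vietnamese if it contains any non-ASCII letter, and confidence is the share of context words that agree with the winner. `Language` is a stand-in for the crate's enum.

```rust
// Stand-in for the crate's `Language` (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Language {
    English,
    Vietnamese,
}

// Same shape as the trait above.
pub trait LanguageDetector: Send + Sync {
    fn detect(&self, context: &str) -> Option<(Language, f64)>;
}

/// Hypothetical test double: a word "looks Vietnamese" if it has a non-ASCII letter.
pub struct ScriptDetector;

impl LanguageDetector for ScriptDetector {
    fn detect(&self, context: &str) -> Option<(Language, f64)> {
        let words: Vec<&str> = context.split_whitespace().collect();
        if words.is_empty() {
            return None;
        }
        let vi = words
            .iter()
            .filter(|w| w.chars().any(|c| c.is_alphabetic() && !c.is_ascii()))
            .count() as f64;
        let share_vi = vi / words.len() as f64;
        // Confidence = fraction of context words that agree with the winner.
        if share_vi >= 0.5 {
            Some((Language::Vietnamese, share_vi))
        } else {
            Some((Language::English, 1.0 - share_vi))
        }
    }
}
```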

### 3.3 `lingua` Implementation

```rust
pub(crate) struct LinguaDetector {
    detector: lingua::LanguageDetector,
}

impl LinguaDetector {
    pub fn new(languages: &[Language]) -> Self {
        let lingua_langs: Vec<lingua::Language> = languages.iter()
            .map(|l| match l {
                Language::English => lingua::Language::English,
                Language::Vietnamese => lingua::Language::Vietnamese,
            })
            .collect();
        let detector = lingua::LanguageDetectorBuilder::from_languages(&lingua_langs)
            .with_minimum_relative_distance(0.25)
            .build();
        Self { detector }
    }
}

impl LanguageDetector for LinguaDetector {
    fn detect(&self, context: &str) -> Option<(Language, f64)> {
        // lingua returns Vec<(lingua::Language, f64)>, sorted by descending confidence.
        let confidences = self.detector.compute_language_confidence_values(context);
        confidences.first().map(|&(language, value)| {
            let lang = match language {
                lingua::Language::English => Language::English,
                lingua::Language::Vietnamese => Language::Vietnamese,
                // Not expected: the detector is built from our languages only.
                _ => Language::English,
            };
            (lang, value)
        })
    }
}
```

### 3.4 Detection Algorithm in `TextExpand`

```rust
const CONTEXT_WINDOW_SIZE: usize = 5;
const HYSTERESIS_THRESHOLD: f64 = 0.20;

fn detect_language(&mut self, word: &str) -> Language {
    let detector = match &self.detector {
        Some(d) => d,
        None => return self.current_language,   // single-language fast path
    };

    // Update context window
    self.context_window.push_back(word.to_string());
    if self.context_window.len() > CONTEXT_WINDOW_SIZE {
        self.context_window.pop_front();
    }

    let context: String = self.context_window.iter()
        .map(|s| s.as_str())
        .collect::<Vec<_>>()
        .join(" ");

    if let Some((top_lang, top_confidence)) = detector.detect(&context) {
        if top_lang != self.current_language
            && top_confidence > 0.5 + HYSTERESIS_THRESHOLD
        {
            self.current_language = top_lang;
        }
    }

    self.current_language
}
```
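The switching rule reduces to a small state machine. `current_language` changes only when a different language wins with confidence above `0.5 + HYSTERESIS_THRESHOLD` (0.70). The sketch below isolates that rule, replacing the detector with a pre-recorded sequence of `(language, confidence)` results so the behavior is easy to check. `Hysteresis` is a hypothetical name, the constant matches the plan, and `Language` is a stand-in.

```rust
// Stand-in for the crate's `Language` (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Language {
    English,
    Vietnamese,
}

const HYSTERESIS_THRESHOLD: f64 = 0.20;

/// Hypothetical isolation of the switching rule from `detect_language`.
pub struct Hysteresis {
    current: Language,
}

impl Hysteresis {
    pub fn new(default: Language) -> Self {
        Self { current: default }
    }

    /// Feeds one detector result (None = nothing detected) and returns the
    /// language assigned to the word.
    pub fn update(&mut self, detected: Option<(Language, f64)>) -> Language {
        if let Some((lang, confidence)) = detected {
            if lang != self.current && confidence > 0.5 + HYSTERESIS_THRESHOLD {
                self.current = lang;
            }
        }
        self.current
    }
}
```

The asymmetry is the point: once a language is current, a competing language needs the same strong margin to take over. Borderline results around 0.5 therefore cannot make the tag flicker between words.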

### 3.5 Wire It Up

```rust
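pub fn with_languages(languages: &[Language], default: Language) -> Self {
    // Units tagged with a language that has no task list get no expansion.
    debug_assert!(languages.contains(&default));
    let mut tasks_by_lang = HashMap::new();
    for &lang in languages {
        tasks_by_lang.insert(lang, get_tasks_for_language(lang));
    }
    let detector = LinguaDetector::new(languages);
    Self {
        tasks_by_lang,
        detector: Some(Box::new(detector)),
        current_language: default,
        context_window: VecDeque::new(),
        input_units: VecDeque::new(),
        input_langs: VecDeque::new(),
        output_units: VecDeque::new(),
        buffer: String::new(),
        buffer_is_number: false,
    }
}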
```

### 3.6 Tests

```rust
#[test]
fn detects_english_words() {
    let mut expander = TextExpand::with_languages(
        &[Language::English, Language::Vietnamese],
        Language::English,
    );
    // Push "hello world" → both words detected as English
}

#[test]
fn detects_vietnamese_words() {
    // Push "xin chào bạn" → all words detected as Vietnamese
}

#[test]
fn switches_language_mid_sentence() {
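    // Push "Hello, tôi tên là John" → EN, VI, VI, VI, EN
    // Caveat: "tôi tên là" is still in the 5-word window when "John" arrives, so
    // hysteresis will likely keep it VI; treat the final EN as a tuning target.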
}

#[test]
fn numbers_inherit_current_language() {
    // Push "giá 100 đồng" → 100 inherits Vietnamese
}

#[test]
fn sentence_boundary_resets_context() {
    // Push "Xin chào. Hello world" → VI then EN after boundary
}
```
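The test bodies above are placeholders. One way to fill them is a small helper that drives an expander and keeps only the languages attached to word-like units, so expectations read as a plain `Vec<Language>`. This sketch is hypothetical. The `Expander` trait and the `word_languages` function exist only for illustration and abstract over `TextExpand::push`/`finish`. `Language` and `ExpandUnit` are simplified stand-ins.

```rust
// Simplified stand-ins for the crate's types (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Language {
    English,
    Vietnamese,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ExpandUnit {
    Word(String),
    Number(String),
    Mark(char),
}

/// Hypothetical abstraction over `TextExpand::push` / `finish`.
pub trait Expander {
    fn push(&mut self, ch: char) -> Option<(ExpandUnit, Language)>;
    fn finish(&mut self) -> Option<(ExpandUnit, Language)>;
}

/// Feeds `text` through the expander and returns the language of every
/// word or number, in order. Marks (spaces, punctuation) are skipped.
pub fn word_languages<E: Expander>(expander: &mut E, text: &str) -> Vec<Language> {
    let mut out = Vec::new();
    let mut keep = |(unit, lang): (ExpandUnit, Language)| {
        if !matches!(unit, ExpandUnit::Mark(_)) {
            out.push(lang);
        }
    };
    for ch in text.chars() {
        if let Some(item) = expander.push(ch) {
            keep(item);
        }
    }
    while let Some(item) = expander.finish() {
        keep(item);
    }
    out
}
```

With such a helper, `numbers_inherit_current_language` becomes a single assertion that the result for "giá 100 đồng" is three `Vietnamese` entries.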

### 3.7 E2E Fixtures

Add `tests/fixtures/mixed.jsonl` with mixed EN/VI sentences and expected phoneme output. Test with `FullG2p::with_languages`.

---

## PR 4: Polish

- **Feature gate:** `lingua` behind cargo feature `lang-detect` (optional)
- **Benchmarks:** Measure detection overhead per word
- **Threshold tuning:** Empirical testing with real mixed-text TTS inputs
- **Doc update:** Update `CLAUDE.md` architecture section, remove `docs/dynamic_language.md`
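For the feature gate, one option is to make detector construction conditional. Builds without `lang-detect` then fall back to `detector: None`, the single-language fast path. This is a sketch under assumptions. `make_detector` is a hypothetical helper, and `Language`/`LanguageDetector` are stand-ins for the crate's own items. On the Cargo side, the sketch assumes `lingua` is marked `optional = true` and that `lang-detect = ["dep:lingua"]` is declared under `[features]`.

```rust
// Stand-ins for the crate's items (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Language {
    English,
    Vietnamese,
}

pub trait LanguageDetector: Send + Sync {
    fn detect(&self, context: &str) -> Option<(Language, f64)>;
}

// Assumed Cargo.toml:
//   lingua = { version = "1.6", default-features = false,
//              features = ["english", "vietnamese"], optional = true }
//   [features]
//   lang-detect = ["dep:lingua"]

/// Feature on: build the `lingua`-backed detector from 3.3.
#[cfg(feature = "lang-detect")]
pub fn make_detector(languages: &[Language]) -> Option<Box<dyn LanguageDetector>> {
    Some(Box::new(LinguaDetector::new(languages)))
}

/// Feature off: no detector, so `detect_language` takes its fast path and
/// every word keeps the default language.
#[cfg(not(feature = "lang-detect"))]
pub fn make_detector(_languages: &[Language]) -> Option<Box<dyn LanguageDetector>> {
    None
}
```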

---

## Risks & Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| `lingua` binary size (~2-5 MB for 2 langs) | Larger binary | Feature-gate in PR 4 |
| Short ambiguous words ("a", "la") | Wrong language detection | Context window + hysteresis |
| `PhonemeData` table switching per word | Performance regression | Profile; tables are small lookups |
| `promote_clauses` with mixed phonemes | Corrupted stress bytes | Only promote English words (check `word.language`) |
| Breaking `push/finish` signature | All callers must update | PR 1 does this atomically |
| Parallel queue invariant violation | Panic/wrong language | `debug_assert!(input_units.len() == input_langs.len())` |

---

## File Change Summary

| File | PR 1 | PR 2 | PR 3 |
|------|------|------|------|
| `Cargo.toml` | | | add `lingua` |
| `src/text_expand.rs` | struct, push/finish, try_expand, flush_buffer, TextUnit | | detect_language, with_languages |
| `src/semantic.rs` | TextUnit::Word pattern | | |
| `src/g2p/full.rs` | caller update | HashMap phonemizers, with_languages | |
| `src/g2p/streaming.rs` | caller update | HashMap phonemizers, with_languages | |
| `src/sentence_upgrade/mod.rs` | | Renderer per-word language | |
| `src/expand_tasks/**` | no change | no change | no change |
| `tests/common/mod.rs` | caller update | | |
| `tests/e2e.rs` | | | mixed fixtures |