Module tts


Text-to-Speech (TTS) module (GH-133).

Provides TTS primitives for:

Neural TTS synthesis
Mel spectrogram generation from text
Vocoder integration (HiFi-GAN, WaveGlow, etc.)
Multi-speaker synthesis

§Architecture

Text → Text Processing → Acoustic Model → Mel Spectrogram → Vocoder → Audio
             ↓                                    ↑
       Phoneme/Grapheme                   [Speaker Embedding]
       Encoding                           [Prosody Control]
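The diagram reads as a chain of three transformations. Below is a minimal sketch using toy stand-ins. `encode_graphemes`, `acoustic_model`, and `vocoder` are hypothetical names, not aprender APIs. The 5 frames per token and the 256-sample hop length are arbitrary assumptions. The sketch only shows the data shapes between stages. Text becomes a token sequence. The acoustic model emits one `n_mels`-wide frame per time step, and in this sketch the speaker embedding is mixed in at that point. The vocoder then turns each frame into `hop_length` samples.

```rust
// Toy stand-ins for each stage of the diagram. None of these functions exist in
// aprender; they only show what data flows between the stages.

/// Text processing: map each letter or space to an integer id (crude grapheme encoding).
fn encode_graphemes(text: &str) -> Vec<u32> {
    text.chars()
        .filter(|c| c.is_alphabetic() || *c == ' ')
        .map(|c| c.to_ascii_lowercase() as u32)
        .collect()
}

/// Acoustic model: emit `frames_per_token` mel frames of `n_mels` bins for every token.
/// A real model predicts these values; here each bin mixes the token id with the
/// speaker embedding to show where speaker conditioning can enter. `speaker` must be non-empty.
fn acoustic_model(
    tokens: &[u32],
    speaker: &[f32],
    n_mels: usize,
    frames_per_token: usize,
) -> Vec<Vec<f32>> {
    let mut mel = Vec::with_capacity(tokens.len() * frames_per_token);
    for &t in tokens {
        for _ in 0..frames_per_token {
            let frame: Vec<f32> = (0..n_mels)
                .map(|m| t as f32 / 128.0 + speaker[m % speaker.len()])
                .collect();
            mel.push(frame);
        }
    }
    mel
}

/// Vocoder: turn each mel frame into `hop_length` audio samples
/// (here simply the squashed mean of the frame, repeated).
fn vocoder(mel: &[Vec<f32>], hop_length: usize) -> Vec<f32> {
    mel.iter()
        .flat_map(|frame| {
            let mean = frame.iter().sum::<f32>() / frame.len() as f32;
            std::iter::repeat(mean.tanh()).take(hop_length)
        })
        .collect()
}

fn main() {
    let speaker_embedding = [0.1_f32, -0.1];
    let tokens = encode_graphemes("Hi there");
    let mel = acoustic_model(&tokens, &speaker_embedding, 80, 5);
    let audio = vocoder(&mel, 256);
    println!("{} tokens -> {} mel frames -> {} samples", tokens.len(), mel.len(), audio.len());
}
```

Running it prints `8 tokens -> 40 mel frames -> 10240 samples`. The output length is always frames × hop length, which is why the vocoder's hop must match the one used to compute the mel spectrogram.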

§Example

use aprender::speech::tts::{TtsConfig, SpeechSynthesizer, SynthesisRequest};

// Default configuration: 22,050 Hz output audio and 80 mel bins per frame.
let config = TtsConfig::default();
assert_eq!(config.sample_rate, 22050);
assert_eq!(config.n_mels, 80);
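The two defaults asserted above fix the mel spectrogram's height (80 bins) and the audio rate. How many frames a given duration needs also depends on the hop length, meaning the number of audio samples between successive frames. The defaults shown do not include a hop length. The sketch below assumes 256 samples, a common choice at 22,050 Hz, which gives about 86.1 frames per second. `frames_for_duration` is a hypothetical helper, not part of the module.

```rust
/// Number of mel frames that cover `seconds` of audio.
/// `hop_length` (audio samples between successive frames) is an assumed
/// parameter here and must be greater than zero.
fn frames_for_duration(seconds: f32, sample_rate: u32, hop_length: u32) -> usize {
    let samples = (seconds * sample_rate as f32).round() as u64;
    let hop = u64::from(hop_length);
    // Ceiling division so a final partial hop still gets its own frame.
    ((samples + hop - 1) / hop) as usize
}

fn main() {
    // sample_rate and n_mels match the asserted defaults; 256 is an assumed hop length.
    let (sample_rate, n_mels, hop_length) = (22_050_u32, 80_usize, 256_u32);
    let frames = frames_for_duration(2.0, sample_rate, hop_length);
    // An acoustic model's output for 2 s of speech: `frames` rows of `n_mels` bins.
    let mel = vec![vec![0.0_f32; n_mels]; frames];
    println!(
        "{} frames/s, {} frames x {} bins",
        sample_rate as f32 / hop_length as f32,
        mel.len(),
        n_mels
    );
}
```

For 2 seconds of audio this yields 44,100 samples and therefore 173 frames of 80 bins each.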

§Supported Models

Tacotron2-style (attention-based)
FastSpeech2-style (non-autoregressive)
VITS-style (end-to-end variational)
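The autoregressive/non-autoregressive split comes down to how frames are produced. A Tacotron2-style model generates mel frames one at a time, using attention to decide which input token each new frame corresponds to. A FastSpeech2-style model instead predicts a duration (in frames) for every phoneme and expands the encoder output up front, so all frames can be decoded in parallel. These docs do not say how this module's `FastSpeech2Synthesizer` performs that expansion. The sketch below shows the generic "length regulator" idea plus duration scaling for speaking-rate control. `length_regulate` and `scale_durations` are hypothetical names.

```rust
/// Length regulator in the FastSpeech style: repeat each phoneme's hidden vector
/// `durations[i]` times (durations are in mel frames), so the expanded sequence
/// already has the length of the target mel spectrogram.
/// Returns None when the two inputs disagree in length.
fn length_regulate(hidden: &[Vec<f32>], durations: &[usize]) -> Option<Vec<Vec<f32>>> {
    if hidden.len() != durations.len() {
        return None;
    }
    let total: usize = durations.iter().sum();
    let mut expanded = Vec::with_capacity(total);
    for (h, &d) in hidden.iter().zip(durations) {
        for _ in 0..d {
            expanded.push(h.clone());
        }
    }
    Some(expanded)
}

/// Speaking-rate control by rescaling durations: `rate` > 1.0 speaks faster.
/// Returns None for non-positive rates.
fn scale_durations(durations: &[usize], rate: f32) -> Option<Vec<usize>> {
    if rate <= 0.0 {
        return None;
    }
    Some(durations.iter().map(|&d| (d as f32 / rate).round() as usize).collect())
}

fn main() {
    // Three phonemes with 2-dimensional hidden states (toy sizes).
    let hidden = vec![vec![1.0_f32, 0.0], vec![0.0, 1.0], vec![0.5, 0.5]];
    let durations = [2_usize, 3, 1];
    if let Some(frames) = length_regulate(&hidden, &durations) {
        println!("{} phonemes -> {} frames", hidden.len(), frames.len());
    }
    println!("{:?}", scale_durations(&durations, 2.0));
}
```

Because the output length is known before decoding, there is no attention-based stopping decision to get wrong, which is one reason non-autoregressive models avoid word skipping and repetition.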

§References

Wang, Y., et al. (2017). Tacotron: Towards End-to-End Speech Synthesis.
Ren, Y., et al. (2020). FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.
Kim, J., et al. (2021). Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech.

§PMAT Compliance

Zero unwrap() calls
All public APIs return Result<T, E> where fallible
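The two compliance rules work together: fallible functions return `Result`, and callers propagate failures with `?` instead of calling `unwrap()`. The module's actual error type is not shown in these docs. The sketch below uses a hypothetical `TtsError` enum and a hypothetical `prepare` function to show the pattern.

```rust
use std::fmt;

/// Hypothetical error type; the module's real error type is not listed here.
#[derive(Debug, PartialEq)]
enum TtsError {
    EmptyText,
    InvalidConfig(&'static str),
}

impl fmt::Display for TtsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TtsError::EmptyText => write!(f, "input text is empty"),
            TtsError::InvalidConfig(why) => write!(f, "invalid config: {why}"),
        }
    }
}

impl std::error::Error for TtsError {}

/// Reject configurations that would make later stages divide by zero or produce nothing.
fn validate(sample_rate: u32, n_mels: usize) -> Result<(), TtsError> {
    if sample_rate == 0 {
        return Err(TtsError::InvalidConfig("sample_rate must be > 0"));
    }
    if n_mels == 0 {
        return Err(TtsError::InvalidConfig("n_mels must be > 0"));
    }
    Ok(())
}

/// Fallible entry point: every failure is propagated with `?`, never `unwrap()`.
fn prepare(text: &str, sample_rate: u32, n_mels: usize) -> Result<String, TtsError> {
    validate(sample_rate, n_mels)?;
    let trimmed = text.trim();
    if trimmed.is_empty() {
        return Err(TtsError::EmptyText);
    }
    Ok(trimmed.to_lowercase())
}

fn main() {
    match prepare("  Hello  ", 22_050, 80) {
        Ok(text) => println!("ready: {text}"),
        Err(e) => eprintln!("error: {e}"),
    }
}
```

Implementing `std::error::Error` lets callers box these errors alongside errors from other crates.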

Structs§

AlignmentInfo: Alignment information between text and audio.
FastSpeech2Synthesizer: FastSpeech2-style TTS synthesizer.
HifiGanVocoder: HiFi-GAN vocoder.
SynthesisRequest: A synthesis request with text and optional controls.
SynthesisResult: Synthesis result containing audio and metadata.
TtsConfig: TTS configuration.
VitsSynthesizer: VITS-style end-to-end TTS.
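`SynthesisRequest` is described as text plus optional controls. The listing does not name its fields or constructors. The sketch below uses a hypothetical `RequestSketch` with assumed `speaker_id` and `speed` controls to show one idiomatic way to model that: `Option` fields set through consuming builder methods. The fallback defaults (speaker 0, speed 1.0) are also assumptions.

```rust
/// Hypothetical request type illustrating "text plus optional controls".
/// The real SynthesisRequest fields and methods are not listed in these docs.
#[derive(Debug, Clone, PartialEq)]
struct RequestSketch {
    text: String,
    speaker_id: Option<u32>,
    speed: Option<f32>,
}

impl RequestSketch {
    fn new(text: impl Into<String>) -> Self {
        Self { text: text.into(), speaker_id: None, speed: None }
    }

    /// Builder-style setter: pick a speaker for multi-speaker models.
    fn with_speaker(mut self, id: u32) -> Self {
        self.speaker_id = Some(id);
        self
    }

    /// Builder-style setter: speaking-rate multiplier (1.0 = normal).
    fn with_speed(mut self, speed: f32) -> Self {
        self.speed = Some(speed);
        self
    }

    /// Unset controls fall back to assumed defaults at synthesis time.
    fn effective_speaker(&self) -> u32 {
        self.speaker_id.unwrap_or(0)
    }

    fn effective_speed(&self) -> f32 {
        self.speed.unwrap_or(1.0)
    }
}

fn main() {
    let plain = RequestSketch::new("Good morning.");
    let styled = RequestSketch::new("Good morning.").with_speaker(3).with_speed(1.25);
    println!("{:?} speed={}", plain.text, plain.effective_speed());
    println!("speaker={} speed={}", styled.effective_speaker(), styled.effective_speed());
}
```

Using `Option` keeps "not set" distinct from any particular value, so a synthesizer can apply its own defaults instead of guessing whether 1.0 was chosen deliberately.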

Traits§

SpeechSynthesizer: Trait for speech synthesis.
Vocoder: Trait for neural vocoders (mel spectrogram to audio).
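A vocoder trait lets HiFi-GAN, WaveGlow, or a test double be swapped behind the same interface. The real `Vocoder` trait's methods are not shown in this listing. The sketch below uses a hypothetical `MelVocoder` trait, a `SilenceVocoder` test double, and a `pick_vocoder` function that selects an implementation by name at runtime through `Box<dyn MelVocoder>`.

```rust
/// Hypothetical vocoder trait: mel frames in, waveform samples out.
/// The real `Vocoder` trait's method names and signatures are not shown in these docs.
trait MelVocoder {
    /// Audio samples produced per mel frame.
    fn hop_length(&self) -> usize;
    /// Convert `mel` (one Vec of mel bins per frame) into audio samples.
    fn vocode(&self, mel: &[Vec<f32>]) -> Vec<f32>;
}

/// Trivial implementation that returns silence of the right length.
/// Handy as a test double for code that only cares about shapes.
struct SilenceVocoder {
    hop: usize,
}

impl MelVocoder for SilenceVocoder {
    fn hop_length(&self) -> usize {
        self.hop
    }
    fn vocode(&self, mel: &[Vec<f32>]) -> Vec<f32> {
        vec![0.0; mel.len() * self.hop]
    }
}

/// The trait is object-safe, so the vocoder can be chosen at runtime
/// (e.g. from a config string) and stored behind `Box<dyn MelVocoder>`.
fn pick_vocoder(name: &str) -> Option<Box<dyn MelVocoder>> {
    let vocoder: Box<dyn MelVocoder> = match name {
        "silence" => Box::new(SilenceVocoder { hop: 256 }),
        _ => return None,
    };
    Some(vocoder)
}

/// Duration in seconds of whatever the chosen vocoder produces.
fn audio_seconds(vocoder: &dyn MelVocoder, mel: &[Vec<f32>], sample_rate: u32) -> f32 {
    vocoder.vocode(mel).len() as f32 / sample_rate as f32
}

fn main() {
    let mel = vec![vec![0.0_f32; 80]; 100];
    match pick_vocoder("silence") {
        Some(v) => println!("{:.3} s of audio", audio_seconds(&*v, &mel, 22_050)),
        None => eprintln!("unknown vocoder"),
    }
}
```

Static dispatch through generics would avoid the heap allocation and vtable call. However, a trait object is the simpler choice when the vocoder is named in configuration rather than known at compile time.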

Functions§

estimate_duration: Estimate synthesis duration from text.
normalize_text: Normalize text for TTS (lowercase, expand abbreviations, etc.).
split_sentences: Split text into sentences for chunked synthesis.
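The helper functions listed above are only named here, not specified. Below is a hypothetical sketch of two of them under simple assumptions. Sentence boundaries are terminal punctuation followed by whitespace, and duration comes from a fixed words-per-minute rate. `naive_split_sentences` and `rough_duration_secs` are illustrative names, and the module's own `split_sentences` and `estimate_duration` may use different rules.

```rust
/// Naive sentence splitter: break after '.', '!' or '?' when followed by
/// whitespace or end of input. Abbreviations like "Dr." will still split.
fn naive_split_sentences(text: &str) -> Vec<String> {
    let mut sentences = Vec::new();
    let mut current = String::new();
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        current.push(c);
        let boundary =
            matches!(c, '.' | '!' | '?') && chars.peek().map_or(true, |n| n.is_whitespace());
        if boundary {
            let sentence = current.trim();
            if !sentence.is_empty() {
                sentences.push(sentence.to_string());
            }
            current.clear();
        }
    }
    // Keep any trailing text that lacks terminal punctuation.
    let rest = current.trim();
    if !rest.is_empty() {
        sentences.push(rest.to_string());
    }
    sentences
}

/// Rough duration estimate from word count at a given speaking rate.
/// Returns None for a non-positive rate instead of dividing by zero.
fn rough_duration_secs(text: &str, words_per_minute: f32) -> Option<f32> {
    if words_per_minute <= 0.0 {
        return None;
    }
    let words = text.split_whitespace().count() as f32;
    Some(words / words_per_minute * 60.0)
}

fn main() {
    let text = "Hello there. The value is 3.14! Is that right?";
    for sentence in naive_split_sentences(text) {
        // 150 words per minute is an assumed conversational rate, not a module constant.
        if let Some(secs) = rough_duration_secs(&sentence, 150.0) {
            println!("{secs:.2}s  {sentence}");
        }
    }
}
```

Chunking by sentence keeps each synthesis call short, and because "3.14" is not followed by whitespace, the decimal point does not trigger a split.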
