Module tts

Text-to-Speech (TTS) module (GH-133).

Provides TTS primitives for:

  • Neural TTS synthesis
  • Mel spectrogram generation from text
  • Vocoder integration (HiFi-GAN, WaveGlow, etc.)
  • Multi-speaker synthesis
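Multi-speaker synthesis usually conditions the acoustic model on a learned per-speaker vector. The sketch below shows one common variant: the speaker embedding is looked up by ID and added to every encoder frame. Some models concatenate the embedding instead. The function name, the lookup-table layout, and the additive choice are all assumptions for illustration; this module's actual conditioning mechanism is not documented here.

```rust
/// Hypothetical helper (not part of aprender): add the embedding for
/// `speaker_id` to every encoder output frame. Returns `None` if the ID is
/// out of range or a frame's width differs from the embedding width.
fn condition_on_speaker(
    encoder_out: &[Vec<f32>],
    speaker_table: &[Vec<f32>],
    speaker_id: usize,
) -> Option<Vec<Vec<f32>>> {
    let emb = speaker_table.get(speaker_id)?;
    encoder_out
        .iter()
        .map(|frame| {
            if frame.len() != emb.len() {
                return None;
            }
            // Element-wise sum of the frame and the speaker embedding.
            Some(frame.iter().zip(emb).map(|(h, s)| h + s).collect::<Vec<f32>>())
        })
        .collect()
}

fn main() {
    let encoder_out: Vec<Vec<f32>> = vec![vec![0.1, 0.2], vec![0.3, 0.4]];
    let speaker_table: Vec<Vec<f32>> = vec![vec![1.0, 1.0], vec![-1.0, 0.5]];
    match condition_on_speaker(&encoder_out, &speaker_table, 1) {
        Some(frames) => println!("{frames:?}"),
        None => eprintln!("unknown speaker id or dimension mismatch"),
    }
}
```

Returning `Option` rather than panicking keeps the sketch in line with the no-`unwrap()` policy stated under PMAT Compliance.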

Architecture

Text → Text Processing → Acoustic Model → Mel Spectrogram → Vocoder → Audio
             ↓                                    ↑
       Phoneme/Grapheme                   [Speaker Embedding]
       Encoding                           [Prosody Control]
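The diagram can be read as three composable stages. The following sketch models them as generic traits with toy implementations so the data flow (text to token IDs, IDs to `frames x n_mels` mel frames, frames to samples) is explicit. The trait names `TextFrontend`, `AcousticModel`, and `MelVocoder`, the fixed frames-per-token duration, and the `hop_length` of 256 samples per frame are assumptions for illustration, not this module's API.

```rust
// Hypothetical stage traits mirroring the architecture diagram.
trait TextFrontend {
    fn encode(&self, text: &str) -> Vec<u32>;
}
trait AcousticModel {
    /// Returns mel frames laid out as `[frame][mel_bin]`.
    fn mel(&self, ids: &[u32]) -> Vec<Vec<f32>>;
}
trait MelVocoder {
    fn audio(&self, mel: &[Vec<f32>]) -> Vec<f32>;
}

/// Grapheme encoding: one token per Unicode scalar value.
struct CharFrontend;
impl TextFrontend for CharFrontend {
    fn encode(&self, text: &str) -> Vec<u32> {
        text.chars().map(|c| c as u32).collect()
    }
}

/// Toy acoustic model: a fixed number of all-zero mel frames per token.
struct ConstantDuration {
    frames_per_token: usize,
    n_mels: usize,
}
impl AcousticModel for ConstantDuration {
    fn mel(&self, ids: &[u32]) -> Vec<Vec<f32>> {
        vec![vec![0.0; self.n_mels]; ids.len() * self.frames_per_token]
    }
}

/// Toy vocoder: emits `hop_length` silent samples per mel frame.
struct SilentVocoder {
    hop_length: usize,
}
impl MelVocoder for SilentVocoder {
    fn audio(&self, mel: &[Vec<f32>]) -> Vec<f32> {
        vec![0.0; mel.len() * self.hop_length]
    }
}

/// Runs text through all three stages in order.
fn synthesize<F: TextFrontend, A: AcousticModel, V: MelVocoder>(
    frontend: &F,
    acoustic: &A,
    vocoder: &V,
    text: &str,
) -> Vec<f32> {
    let ids = frontend.encode(text);
    let mel = acoustic.mel(&ids);
    vocoder.audio(&mel)
}

fn main() {
    let audio = synthesize(
        &CharFrontend,
        &ConstantDuration { frames_per_token: 5, n_mels: 80 },
        &SilentVocoder { hop_length: 256 },
        "hi",
    );
    // 2 tokens x 5 frames x 256 samples per frame
    println!("{} samples", audio.len());
}
```

Because the stages are independent, a real acoustic model or vocoder can be swapped in without touching the others, which is the separation the diagram implies.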

Example

use aprender::speech::tts::TtsConfig;

let config = TtsConfig::default();
assert_eq!(config.sample_rate, 22050);
assert_eq!(config.n_mels, 80);
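The defaults above imply 80 mel bands over a 22050 Hz signal. As a rough illustration of what `n_mels` controls, this sketch computes the `n_mels + 2` band-edge frequencies of a triangular mel filterbank, evenly spaced on the HTK mel scale (mel = 2595 · log10(1 + f / 700)). The HTK variant and the upper edge at the Nyquist frequency (sample_rate / 2) are assumptions; which mel formula and frequency range this module actually uses is not stated here, and many TTS setups cap the upper edge at 8000 Hz instead.

```rust
/// HTK-style mel scale.
fn hz_to_mel(hz: f64) -> f64 {
    2595.0 * (1.0 + hz / 700.0).log10()
}

fn mel_to_hz(mel: f64) -> f64 {
    700.0 * (10f64.powf(mel / 2595.0) - 1.0)
}

/// Returns `n_mels + 2` edge frequencies in Hz, evenly spaced in mel between
/// `f_min` and `f_max`. Band `i` rises from edge `i` to a peak at edge `i + 1`
/// and falls to zero at edge `i + 2`.
fn mel_band_edges(n_mels: usize, f_min: f64, f_max: f64) -> Vec<f64> {
    let (m_min, m_max) = (hz_to_mel(f_min), hz_to_mel(f_max));
    let step = (m_max - m_min) / (n_mels + 1) as f64;
    (0..n_mels + 2)
        .map(|i| mel_to_hz(m_min + step * i as f64))
        .collect()
}

fn main() {
    let sample_rate = 22050.0;
    let edges = mel_band_edges(80, 0.0, sample_rate / 2.0);
    println!(
        "{} edges, first {:.1} Hz, last {:.1} Hz",
        edges.len(),
        edges[0],
        edges[edges.len() - 1]
    );
}
```

Low bands end up a few tens of Hz wide while the top bands span hundreds of Hz, which is why 80 mel bins are enough to summarize a full linear spectrum for the acoustic model.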

Supported Models

  • Tacotron2-style (attention-based)
  • FastSpeech2-style (non-autoregressive)
  • VITS-style (end-to-end variational)
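What makes the FastSpeech2 style non-autoregressive is that it predicts a duration (in mel frames) for each phoneme and then expands the phoneme sequence to mel length in one step, instead of attending frame by frame as Tacotron2 does. The sketch below shows that expansion step, usually called a length regulator. The function name and the `Vec<Vec<f32>>` layout are assumptions for illustration, not `FastSpeech2Synthesizer` internals.

```rust
/// Hypothetical length regulator: repeat each phoneme's hidden vector
/// `durations[i]` times so the output length equals the mel frame count.
fn length_regulate(
    hidden: &[Vec<f32>],
    durations: &[usize],
) -> Result<Vec<Vec<f32>>, String> {
    if hidden.len() != durations.len() {
        return Err(format!(
            "{} hidden states but {} durations",
            hidden.len(),
            durations.len()
        ));
    }
    let total: usize = durations.iter().sum();
    let mut frames = Vec::with_capacity(total);
    for (h, &d) in hidden.iter().zip(durations) {
        // A duration of 0 drops the phoneme from the output entirely.
        for _ in 0..d {
            frames.push(h.clone());
        }
    }
    Ok(frames)
}

fn main() {
    let hidden: Vec<Vec<f32>> = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    match length_regulate(&hidden, &[2, 3]) {
        Ok(frames) => println!("{} mel frames", frames.len()),
        Err(e) => eprintln!("error: {e}"),
    }
}
```

Scaling every predicted duration before this step is one simple way to change speaking rate at inference time.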

References

  • Wang, Y., et al. (2017). Tacotron: Towards End-to-End Speech Synthesis.
  • Ren, Y., et al. (2020). FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.
  • Kim, J., et al. (2021). Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech.

PMAT Compliance

  • Zero unwrap() calls
  • All public APIs return Result<T, E> where fallible
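In practice this policy means fallible steps propagate errors with `?` and callers decide how to handle them, rather than calling `unwrap()`. The sketch below illustrates the pattern with a hypothetical `TtsError` enum and validation helper; the module's real error type and validation rules are not shown on this page.

```rust
use std::fmt;

/// Hypothetical error type for illustration only.
#[derive(Debug, PartialEq)]
enum TtsError {
    EmptyText,
    InvalidSampleRate(u32),
}

impl fmt::Display for TtsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TtsError::EmptyText => write!(f, "input text is empty"),
            TtsError::InvalidSampleRate(sr) => write!(f, "invalid sample rate: {sr}"),
        }
    }
}

impl std::error::Error for TtsError {}

fn validate_request(text: &str, sample_rate: u32) -> Result<(), TtsError> {
    if text.trim().is_empty() {
        return Err(TtsError::EmptyText);
    }
    if sample_rate == 0 {
        return Err(TtsError::InvalidSampleRate(sample_rate));
    }
    Ok(())
}

/// Number of output samples for `seconds` of audio; validation errors
/// propagate with `?` instead of panicking.
fn expected_samples(text: &str, sample_rate: u32, seconds: f32) -> Result<usize, TtsError> {
    validate_request(text, sample_rate)?;
    Ok((seconds * sample_rate as f32).round() as usize)
}

fn main() {
    match expected_samples("Hello.", 22050, 1.5) {
        Ok(n) => println!("{n} samples"),
        Err(e) => eprintln!("synthesis rejected: {e}"),
    }
}
```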

Structs

  • AlignmentInfo: Alignment information between text and audio.
  • FastSpeech2Synthesizer: FastSpeech2-style TTS synthesizer.
  • HifiGanVocoder: HiFi-GAN vocoder.
  • SynthesisRequest: A synthesis request with text and optional controls.
  • SynthesisResult: Synthesis result containing audio and metadata.
  • TtsConfig: TTS configuration.
  • VitsSynthesizer: VITS-style end-to-end TTS.
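Text-to-audio alignment is often represented as an attention matrix with one row per mel frame and one column per text token. One common way to turn such a matrix into per-token durations (used, for example, to extract training targets from an attention-based teacher for FastSpeech-style models) is to count, for each token, how many frames attend to it most strongly. The helper below sketches that idea; its name and the `[mel_frame][text_token]` layout are assumptions, and the actual fields of `AlignmentInfo` are not documented here.

```rust
/// Hypothetical helper: per-token durations from an attention alignment
/// matrix laid out as `[mel_frame][text_token]`.
fn durations_from_alignment(attn: &[Vec<f32>], n_tokens: usize) -> Vec<usize> {
    let mut durations = vec![0usize; n_tokens];
    for frame in attn {
        // Index of the token this frame attends to most strongly.
        let best = frame
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i);
        if let Some(i) = best {
            if i < n_tokens {
                durations[i] += 1;
            }
        }
    }
    durations
}

fn main() {
    let attn: Vec<Vec<f32>> = vec![
        vec![0.9, 0.1],
        vec![0.8, 0.2],
        vec![0.3, 0.7],
    ];
    // Two frames attend mostly to token 0, one frame to token 1.
    println!("{:?}", durations_from_alignment(&attn, 2));
}
```

The durations always sum to the number of frames, so they can be fed directly into a length regulator.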

Traits

  • SpeechSynthesizer: Trait for speech synthesis.
  • Vocoder: Trait for neural vocoders (mel spectrogram to audio).

Functions

  • estimate_duration: Estimate synthesis duration from text.
  • normalize_text: Normalize text for TTS (lowercase, expand abbreviations, etc.).
  • split_sentences: Split text into sentences for chunked synthesis.
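To show how these three helpers fit together in a chunked front end, here are deliberately naive stand-ins. They are prefixed `naive_` because they are not the module's implementations: the abbreviation table, the punctuation-based sentence splitter, and the 150 words-per-minute speaking rate are all assumptions chosen for illustration.

```rust
/// Naive stand-in for `normalize_text`: lowercase and expand a tiny,
/// assumed abbreviation table.
fn naive_normalize_text(text: &str) -> String {
    const ABBREVIATIONS: [(&str, &str); 3] =
        [("dr.", "doctor"), ("mr.", "mister"), ("st.", "street")];
    text.split_whitespace()
        .map(|word| {
            let lower = word.to_lowercase();
            ABBREVIATIONS
                .iter()
                .find(|(abbr, _)| *abbr == lower)
                .map(|(_, full)| full.to_string())
                .unwrap_or(lower)
        })
        .collect::<Vec<_>>()
        .join(" ")
}

/// Naive stand-in for `split_sentences`: split after '.', '!' or '?'.
fn naive_split_sentences(text: &str) -> Vec<String> {
    let mut sentences = Vec::new();
    let mut current = String::new();
    for c in text.chars() {
        current.push(c);
        if matches!(c, '.' | '!' | '?') {
            let s = current.trim();
            if !s.is_empty() {
                sentences.push(s.to_string());
            }
            current.clear();
        }
    }
    let rest = current.trim();
    if !rest.is_empty() {
        sentences.push(rest.to_string());
    }
    sentences
}

/// Naive stand-in for `estimate_duration`: seconds at a fixed speaking rate.
fn naive_estimate_duration(text: &str, words_per_minute: f32) -> f32 {
    let words = text.split_whitespace().count() as f32;
    words / words_per_minute * 60.0
}

fn main() {
    let text = "Dr. Smith arrived. He was late!";
    // Normalize first so the period in "Dr." does not end a sentence.
    let normalized = naive_normalize_text(text);
    for sentence in naive_split_sentences(&normalized) {
        let secs = naive_estimate_duration(&sentence, 150.0);
        println!("{sentence:?} ~ {secs:.2}s");
    }
}
```

The ordering matters: splitting before normalizing would cut "Dr. Smith arrived." into two chunks at the abbreviation's period.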