svara
svara (Sanskrit: स्वर — voice / tone / musical note) — Formant and vocal synthesis for Rust.
Complete formant-based vocal synthesis pipeline: dual glottal source models (Rosenberg B + LF), SOA-vectorized formant filtering, 48 phonemes, prosodic control, look-ahead coarticulation, and spectral analysis. Built on hisab for math and naad for DSP primitives.
Features
- Dual glottal models: Rosenberg B polynomial + Liljencrants-Fant (LF) with Rd voice quality parameter
- SOA formant filter: Structure-of-arrays biquad bank (MAX_FORMANTS=8) with compiler auto-vectorization — 2x faster than scalar
- 48 phonemes: 15 vowels, 5 diphthongs, 6 plosives, 9 fricatives, 3 nasals, 4 approximants, 2 affricates, glottal stop, tap/flap, silence
- Hillenbrand formant data: Per-vowel frequencies and bandwidths from Hillenbrand et al. (1995)
- Vocal tract: Parallel formant bank + nasal coupling (place-dependent) + subglottal resonance + lip radiation + source-filter interaction + DC blocking + gain normalization
- Prosody: Monotone cubic f0 contours, 4 intonation patterns, stress, Catmull-Rom interpolation
- Coarticulation: Look-ahead onset, sigmoid crossfades, per-phoneme resistance coefficients (Recasens DAC), F2 locus equations
- Voice profiles: Male/female/child presets with f0-dependent bandwidth scaling, vibrato, builder pattern
- Spectral analysis: FFT-based spectrum, formant estimation, band energy, compensated RMS
- Performance: ~1,000x real-time, f64 biquad coefficients, no_std compatible, all types Send + Sync
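To make the structure-of-arrays idea from the feature list concrete, here is a minimal sketch of a parallel resonator bank. It is not svara's implementation: the `FormantBank` type, its field names, and the coefficient formulas are assumptions chosen for illustration, and each lane is a two-pole resonator (a special case of a biquad). Only the constant `MAX_FORMANTS = 8` comes from the list above. The sketch uses `std` for `exp` and `cos`, so it does not reflect the crate's no_std path.

```rust
use std::f64::consts::PI;

/// Lane count taken from the feature list (MAX_FORMANTS=8).
const MAX_FORMANTS: usize = 8;

/// Hypothetical structure-of-arrays bank: one array per coefficient and per
/// state variable, so the per-sample loop walks contiguous f64 lanes.
struct FormantBank {
    b0: [f64; MAX_FORMANTS],
    a1: [f64; MAX_FORMANTS],
    a2: [f64; MAX_FORMANTS],
    y1: [f64; MAX_FORMANTS],
    y2: [f64; MAX_FORMANTS],
}

impl FormantBank {
    /// Build from (center frequency in Hz, bandwidth in Hz) pairs.
    /// Lanes beyond `formants.len()` keep zero coefficients and stay silent.
    fn new(formants: &[(f64, f64)], sample_rate: f64) -> Self {
        let mut bank = FormantBank {
            b0: [0.0; MAX_FORMANTS],
            a1: [0.0; MAX_FORMANTS],
            a2: [0.0; MAX_FORMANTS],
            y1: [0.0; MAX_FORMANTS],
            y2: [0.0; MAX_FORMANTS],
        };
        for (i, &(freq, bandwidth)) in formants.iter().take(MAX_FORMANTS).enumerate() {
            // Pole radius from bandwidth, pole angle from center frequency.
            let r = (-PI * bandwidth / sample_rate).exp();
            let theta = 2.0 * PI * freq / sample_rate;
            bank.a1[i] = -2.0 * r * theta.cos();
            bank.a2[i] = r * r;
            // Rough input scaling (an assumption, not a calibrated gain).
            bank.b0[i] = 1.0 - r;
        }
        bank
    }

    /// Run one input sample through every lane and sum the lane outputs.
    /// Each lane computes y[n] = b0*x[n] - a1*y[n-1] - a2*y[n-2].
    fn process(&mut self, x: f64) -> f64 {
        let mut sum = 0.0;
        for i in 0..MAX_FORMANTS {
            let y = self.b0[i] * x - self.a1[i] * self.y1[i] - self.a2[i] * self.y2[i];
            self.y2[i] = self.y1[i];
            self.y1[i] = y;
            sum += y;
        }
        sum
    }
}
```

The fixed-length loop over plain arrays, with no dependency between lanes, is the shape a compiler can auto-vectorize. Whether it does, and how much faster the result is, depends on the target and optimization flags.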
Quick Start
use *;
let voice = new_male;
let samples = synthesize_phoneme.unwrap;
let mut seq = new;
seq.push;
seq.push;
seq.push;
let audio = seq.render.unwrap;
Feature Flags
| Flag | Default | Description |
|---|---|---|
| std | Yes | Standard library. Disable for no_std + alloc |
| naad-backend | Yes | Use naad for oscillators and filters |
| logging | No | Structured logging via tracing-subscriber |
Architecture
GlottalSource (Rosenberg/LF) → VocalTract → Output
                                   │
                     ┌─────────────┼──────────────┐
                     │             │              │
               FormantFilter     Nasal      Lip Radiation
               (SOA biquad      Coupling    + Subglottal
               bank, 8-wide)   (place-dep)  + Interaction
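The signal flow in the diagram can be sketched end to end in a few lines. This is an illustration under stated assumptions, not svara's API: `rosenberg_pulse`, `Resonator`, `LipRadiation`, and `render` are hypothetical names, the pulse is a polynomial shape in the style of Rosenberg's model B, a single two-pole resonator stands in for the whole formant bank, and the serial ordering of the stages is a simplification. Nasal coupling, subglottal resonance, and source-filter interaction are omitted.

```rust
use std::f64::consts::PI;

/// Polynomial glottal pulse in the style of Rosenberg's model B.
/// `phase` is the position within one pitch period, in [0, 1).
/// `open` and `close` are the opening and closing fractions of the period
/// (illustrative parameters, not the crate's).
fn rosenberg_pulse(phase: f64, open: f64, close: f64) -> f64 {
    if phase < open {
        let t = phase / open;
        3.0 * t * t - 2.0 * t * t * t // smooth rise from 0 to 1
    } else if phase < open + close {
        let t = (phase - open) / close;
        1.0 - t * t // faster fall from 1 back to 0
    } else {
        0.0 // closed phase
    }
}

/// Single two-pole resonator standing in for the formant filter stage.
struct Resonator {
    b0: f64,
    a1: f64,
    a2: f64,
    y1: f64,
    y2: f64,
}

impl Resonator {
    fn new(freq: f64, bandwidth: f64, sample_rate: f64) -> Self {
        let r = (-PI * bandwidth / sample_rate).exp();
        let theta = 2.0 * PI * freq / sample_rate;
        Resonator { b0: 1.0 - r, a1: -2.0 * r * theta.cos(), a2: r * r, y1: 0.0, y2: 0.0 }
    }

    fn process(&mut self, x: f64) -> f64 {
        let y = self.b0 * x - self.a1 * self.y1 - self.a2 * self.y2;
        self.y2 = self.y1;
        self.y1 = y;
        y
    }
}

/// First-difference lip radiation stage: y[n] = x[n] - alpha * x[n-1].
struct LipRadiation {
    alpha: f64,
    x1: f64,
}

impl LipRadiation {
    fn process(&mut self, x: f64) -> f64 {
        let y = x - self.alpha * self.x1;
        self.x1 = x;
        y
    }
}

/// Render `n` samples: glottal source -> resonator -> lip radiation.
/// The formant frequency, bandwidth, and alpha are placeholder values.
fn render(f0: f64, sample_rate: f64, n: usize) -> Vec<f64> {
    let mut formant = Resonator::new(700.0, 100.0, sample_rate);
    let mut lips = LipRadiation { alpha: 0.97, x1: 0.0 };
    let mut phase = 0.0;
    let mut out = Vec::with_capacity(n);
    for _ in 0..n {
        let source = rosenberg_pulse(phase, 0.4, 0.16);
        out.push(lips.process(formant.process(source)));
        phase += f0 / sample_rate;
        if phase >= 1.0 {
            phase -= 1.0;
        }
    }
    out
}
```

Calling `render(120.0, 44100.0, 4410)` would produce a tenth of a second of a buzzy, vowel-like tone at 120 Hz. A fuller model would replace the single resonator with the parallel bank and add the remaining tract stages.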
Consumers
- dhvani — AGNOS audio engine
- vansh — Voice AI shell (TTS/STT)
- prani — Creature vocal synthesis (depends on svara)
License
GPL-3.0-only