rust-tts-wrapper
Cross-platform text-to-speech (TTS) wrapper with a C ABI. Mirrors js-tts-wrapper and swift-tts-wrapper.
Engines (21 total)
| Engine | Type | Credentials | Streaming | Voice List | Word Boundaries | Speech Markdown |
|---|---|---|---|---|---|---|
| System (speech-dispatcher) | Local | None | — | — | Estimated | — |
| Sherpa-ONNX | Local (191 models) | None | Simulated* | Speakers | Estimated | — |
| Azure | Cloud | Key + Region | Chunked | API | Estimated | Platform-aware |
| Google Cloud | Cloud | API Key | Chunked | API | Real (v1beta1 timepoints) | Platform-aware |
| OpenAI | Cloud | API Key | Chunked | — | Estimated | Platform-aware |
| ElevenLabs | Cloud | API Key | Chunked | API | Estimated | Platform-aware |
| Cartesia | Cloud | API Key | Chunked | API | Estimated | Platform-aware |
| Deepgram | Cloud | API Key | Chunked | — | Estimated | Platform-aware |
| PlayHT | Cloud | API Key + User ID | Chunked | — | Estimated | Platform-aware |
| Fish Audio | Cloud | API Key | Chunked | — | Estimated | Platform-aware |
| Hume AI | Cloud | API Key | Chunked | — | Estimated | Platform-aware |
| Mistral | Cloud | API Key | Chunked | — | Estimated | Platform-aware |
| Murf | Cloud | API Key | Chunked | — | Estimated | Platform-aware |
| Resemble AI | Cloud | API Key | Chunked | — | Estimated | Platform-aware |
| Unreal Speech | Cloud | API Key | Chunked | — | Estimated | Platform-aware |
| UpliftAI | Cloud | API Key | Chunked | — | Estimated | Platform-aware |
| Amazon Polly | Cloud | Key + Secret + Region | Chunked | — | Estimated | Platform-aware |
| IBM Watson | Cloud | Key + Region + Instance | Chunked | — | Estimated | Platform-aware |
| Wit.ai | Cloud | Token | Chunked | — | Estimated | Platform-aware |
| xAI | Cloud | API Key | Chunked | — | Estimated | Platform-aware |
| ModelsLab | Cloud | API Key | Chunked | — | Estimated | Platform-aware |
- Streaming: Cloud engines stream audio in 8 KB chunks via the on_audio callback. Sherpa-ONNX delivers all audio at once after synthesis (*simulated).
- Voice List: Engines marked "API" can enumerate voices from the provider's API.
- Word Boundaries: Google returns real timing via v1beta1 timepoints with SSML marks. All others use word-length-adjusted estimation (150 WPM baseline, configurable).
- Speech Markdown: Auto-detected and converted to platform-specific SSML via speechmarkdown-rust. Azure gets Microsoft SSML, Google gets Assistant SSML, others get Alexa SSML.
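The estimation used for engines without real timepoints can be sketched roughly as follows. This is an illustration only, not the crate's implementation: the names `Boundary` and `estimate_boundaries` are hypothetical, and the weighting rule (each word receives a share of the total duration proportional to its character count) is an assumption. Only the 150 WPM baseline comes from the description above.

```rust
/// One estimated word boundary, in seconds from the start of the utterance.
/// (Hypothetical type for this sketch.)
#[derive(Debug, Clone, PartialEq)]
pub struct Boundary {
    pub word: String,
    pub start_s: f64,
    pub end_s: f64,
}

/// Estimate word timings for `text` at `wpm` words per minute.
/// Longer words get a proportionally larger slice of the total duration.
pub fn estimate_boundaries(text: &str, wpm: f64) -> Vec<Boundary> {
    let words: Vec<&str> = text.split_whitespace().collect();
    if words.is_empty() || wpm <= 0.0 {
        return Vec::new();
    }
    // Total duration implied by the speaking rate.
    let total_s = words.len() as f64 * 60.0 / wpm;
    // Non-zero because every whitespace-separated word has at least one char.
    let total_chars: usize = words.iter().map(|w| w.chars().count()).sum();

    let mut out = Vec::with_capacity(words.len());
    let mut cursor = 0.0;
    for w in words {
        let share = w.chars().count() as f64 / total_chars as f64;
        let dur = total_s * share;
        out.push(Boundary {
            word: w.to_string(),
            start_s: cursor,
            end_s: cursor + dur,
        });
        cursor += dur;
    }
    out
}
```

Calling `estimate_boundaries("hi there", 150.0)` yields two boundaries spanning 0.8 seconds in total, with the longer word taking the larger share.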
Rust API
TtsEngine Trait
Callback Types
pub type OnAudioCallback<'a> = &'a mut dyn FnMut;
pub type OnBoundaryCallback<'a> = &'a mut dyn FnMut; // word, start_s, end_s
pub type OnStartCallback<'a> = &'a mut dyn FnMut;
pub type OnEndCallback<'a> = &'a mut dyn FnMut;
pub type OnErrorCallback<'a> = &'a mut dyn FnMut;
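As a rough illustration of how chunked delivery to an audio callback might look, the sketch below feeds a finished buffer to a callback in 8 KB pieces. The callback signature `FnMut(&[u8])`, the helper name `deliver_in_chunks`, and the reading of "8 KB" as 8192 bytes are all assumptions made for this example; the argument lists of the callback types above are not reproduced here.

```rust
/// Chunk size used for delivery; 8 KB is taken here to mean 8192 bytes.
const CHUNK_SIZE: usize = 8 * 1024;

/// Feed `audio` to `on_audio` in fixed-size chunks; the final chunk may be
/// shorter. Returns the number of chunks delivered.
/// (Hypothetical helper; the callback signature is an assumption.)
pub fn deliver_in_chunks(audio: &[u8], on_audio: &mut dyn FnMut(&[u8])) -> usize {
    let mut count = 0;
    for chunk in audio.chunks(CHUNK_SIZE) {
        on_audio(chunk);
        count += 1;
    }
    count
}
```

A caller would typically pass a closure that appends each chunk to a playback queue or file. Taking the callback as `&mut dyn FnMut` keeps the helper non-generic, which matches the borrowed-trait-object style of the type aliases above.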
Core Types
Utility Functions
// Word boundary estimation (matches Swift WordTimingEstimator)
;
;
// Speech Markdown preprocessing
;
// Gender normalization
;
Factory
;
;
;
C API
All functions are extern "C", #[no_mangle]:
| Function | Description |
|---|---|
| tts_create(engine_id, credentials_json) | Create engine, returns opaque tts_ctx* |
| tts_destroy(ctx) | Free engine context |
| tts_speak(ctx, text) | Speak async, returns 0/-1 |
| tts_speak_sync(ctx, text) | Speak sync (blocking) |
| tts_stop(ctx) | Stop speech |
| tts_pause(ctx) | Pause in-progress speech |
| tts_resume(ctx) | Resume paused speech |
| tts_synth_to_bytes(ctx, text, out_bytes, out_len) | Synth to buffer (returns 0/-1) |
| tts_free_bytes(bytes, len) | Free buffer from tts_synth_to_bytes |
| tts_get_voices(ctx, out_voices, out_count) | Get voice list |
| tts_free_voices(voices, count) | Free voice array |
| tts_set_voice(ctx, voice_id) | Set voice |
| tts_set_rate(ctx, rate) | Set rate (1.0 = normal) |
| tts_set_pitch(ctx, pitch) | Set pitch (1.0 = normal) |
| tts_set_volume(ctx, volume) | Set volume (1.0 = normal) |
| tts_set_on_audio(ctx, cb, userdata) | Set streaming audio callback |
| tts_set_on_boundary(ctx, cb, userdata) | Set word boundary callback |
| tts_get_engine_count() | Count registered engines |
| tts_get_engines(out_engines) | Get engine descriptors |
| tts_free_engine_info(engines, count) | Free engine info |
| tts_get_last_error() | Get last error message |
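The pairing of a buffer-returning function with a matching free function, as in tts_synth_to_bytes and tts_free_bytes, follows a common Rust FFI ownership pattern. The sketch below shows one way that pattern can be written. The names `demo_synth_to_bytes` and `demo_free_bytes` are hypothetical, the "synthesis" is a placeholder, and nothing here is taken from the crate's source.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;
use std::ptr;

/// Hypothetical stand-in for a synth-to-buffer export. Writes a heap buffer
/// and its length through the out-pointers; returns 0 on success, -1 on error.
/// A real export would also carry `#[no_mangle]`; it is left off here so the
/// sketch compiles unchanged regardless of edition.
pub extern "C" fn demo_synth_to_bytes(
    text: *const c_char,
    out_bytes: *mut *mut u8,
    out_len: *mut usize,
) -> i32 {
    if text.is_null() || out_bytes.is_null() || out_len.is_null() {
        return -1;
    }
    // SAFETY: the caller promises `text` is a valid NUL-terminated string.
    let n = unsafe { CStr::from_ptr(text) }.to_bytes().len();
    // Placeholder "audio": one zero byte per input byte.
    let buf: Box<[u8]> = vec![0u8; n].into_boxed_slice();
    let len = buf.len();
    // Ownership moves to the caller until the free function is called.
    let raw = Box::into_raw(buf) as *mut u8;
    // SAFETY: both out-pointers were checked for null above.
    unsafe {
        *out_bytes = raw;
        *out_len = len;
    }
    0
}

/// Hypothetical counterpart that reclaims a buffer handed out above.
/// `len` must be the exact length reported alongside the pointer.
pub extern "C" fn demo_free_bytes(bytes: *mut u8, len: usize) {
    if bytes.is_null() {
        return;
    }
    // SAFETY: pointer and length come from `Box::into_raw` on a `Box<[u8]>`.
    unsafe {
        drop(Box::from_raw(ptr::slice_from_raw_parts_mut(bytes, len)));
    }
}
```

Passing the length back to the free function is what lets Rust rebuild the original boxed slice, which is presumably why tts_free_bytes takes both `bytes` and `len`.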
C Example
void
void
int
Rust Example
use ;
let engine = create_engine.unwrap;
// Simple speak
engine.speak.unwrap;
// With callbacks
let mut audio_cb = ;
let mut boundary_cb = ;
engine.speak_sync.unwrap;
// With SpeakOptions
let opts = SpeakOptions ;
engine.speak_with_options.unwrap;
// Synth to bytes
let audio = engine.synth_to_bytes.unwrap;
// Get voices
for v in engine.get_voices.unwrap
// Check credentials
assert!;
Build
Features
- system: speech-dispatcher (Linux system TTS)
- cloud: all 19 cloud engines via HTTP + speechmarkdown-rust + base64
- sherpaonnx: Sherpa-ONNX offline TTS (191 models)
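Cargo features like these are usually consumed through conditional compilation. The sketch below shows one way code could report which engine families were compiled in. The function name `compiled_engine_families` is hypothetical; only the three feature names come from the list above.

```rust
/// Names of the engine families compiled into this build.
/// `cfg!` evaluates at compile time to `true` or `false` for each Cargo
/// feature, so disabled branches are trivially dead code.
/// (Hypothetical helper, not part of the documented API.)
pub fn compiled_engine_families() -> Vec<&'static str> {
    let mut out = Vec::new();
    if cfg!(feature = "system") {
        out.push("system");
    }
    if cfg!(feature = "cloud") {
        out.push("cloud");
    }
    if cfg!(feature = "sherpaonnx") {
        out.push("sherpaonnx");
    }
    out
}
```

When compiled outside a crate that declares these features, the function returns an empty list. For code that must not be compiled at all without a feature, such as a module that links a native library, the attribute form `#[cfg(feature = "...")]` is the usual choice.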
Lint & Test
Bindings
Python (bindings/python/tts_wrapper.py)
=
.NET (bindings/dotnet/TtsClient.cs)
using TtsWrapper;
var client = new TtsClient("openai", new() { {"apiKey", "your-key"} });
client.SetVoice("alloy");
client.SetRate(1.0f);
client.SetPitch(1.0f);
client.SetVolume(1.0f);
client.SpeakSync("Hello world");
client.Stop();
Swift (bindings/swift/TtsClient.swift)
let client = TTSClient(engineId: "openai", credentials: ["apiKey": "your-key"])
client.setVoice("alloy")
client.setRate(1.0)
client.speakSync("Hello world")
client.stop()
Architecture
                   TtsEngine (trait)
                           |
          +----------------+----------------+
          |                |                |
    SystemEngine      CloudEngine   SherpaOnnxEngine
      (speech-         (19 cloud       (191 local
     dispatcher)      providers)         models)
Cloud engines use provider-specific CloudConfig:
- Azure: SSML XML body with prosody tags, XML escaping
- Google: JSON body with base64 audio, v1beta1 timepoint support
- All others: Standard JSON bodies
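To make the Azure case more concrete, here is a minimal sketch of XML escaping plus an SSML body with a prosody wrapper. The function names `xml_escape` and `build_ssml` are hypothetical, and rendering the rate as a signed percentage is an assumption of this example rather than a description of what the crate sends.

```rust
/// Escape the five XML special characters so arbitrary text is safe
/// inside an SSML element or attribute value.
pub fn xml_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Build an SSML document with a prosody wrapper (hypothetical helper).
/// `rate` uses the 1.0 = normal convention and is rendered as a signed
/// percentage, e.g. 1.5 becomes "+50%"; that mapping is an assumption.
pub fn build_ssml(text: &str, voice: &str, lang: &str, rate: f32) -> String {
    let pct = ((rate - 1.0) * 100.0).round() as i32;
    format!(
        "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"{}\">\
         <voice name=\"{}\"><prosody rate=\"{:+}%\">{}</prosody></voice></speak>",
        xml_escape(lang),
        xml_escape(voice),
        pct,
        xml_escape(text)
    )
}
```

Escaping the voice and language values as well as the text keeps a stray quote or ampersand in any field from producing malformed XML.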
Sherpa-ONNX Models
191 models from the bundled merged_models.json registry. Models are loaded from ~/.rust-tts-wrapper/sherpaonnx/.
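Resolving that directory in code might look like the sketch below. The function names are hypothetical, and expanding `~` through the HOME environment variable is an assumption that holds on Unix-like systems; the layout beneath the directory is not described here and is left out.

```rust
use std::path::{Path, PathBuf};

/// Join the model directory named above onto a given home directory.
/// (Hypothetical helper for illustration.)
pub fn sherpa_model_root(home: &Path) -> PathBuf {
    home.join(".rust-tts-wrapper").join("sherpaonnx")
}

/// Resolve the directory from the HOME environment variable, if it is set.
/// Returns `None` when HOME is missing rather than guessing a fallback.
pub fn sherpa_model_root_from_env() -> Option<PathBuf> {
    std::env::var_os("HOME").map(|h| sherpa_model_root(Path::new(&h)))
}
```

Keeping the path join separate from the environment lookup makes the first function easy to test with any base directory.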
License
MIT