§Whispr
A general-purpose Rust library for audio AI services — text-to-speech, speech-to-text, and audio-to-audio transformations.
§Overview
Whispr provides a clean, ergonomic API for working with audio AI services. It’s designed to be provider-agnostic, though OpenAI is currently the primary supported provider.
§Current Status
- ✅ OpenAI Audio API — Full support for TTS, STT, and audio-to-audio
- ✅ Realtime API — WebSocket-based real-time audio conversations (feature: realtime)
- 🔮 Future — Provider abstraction to support multiple backends (ElevenLabs, Azure, Google Cloud, etc.)
§Quick Start
use whispr::{Client, Voice};

#[tokio::main]
async fn main() -> Result<(), whispr::Error> {
    let client = Client::from_env()?; // reads OPENAI_API_KEY

    // Text to Speech
    let audio = client
        .speech()
        .text("Hello, world!")
        .voice(Voice::Nova)
        .generate()
        .await?;

    std::fs::write("hello.mp3", &audio)?;
    Ok(())
}
§Features
§Text-to-Speech
Convert text to natural-sounding audio with multiple voices and customization options.
use whispr::{Client, TtsModel, Voice, AudioFormat, prompts};

let client = Client::from_env()?;

let audio = client
    .speech()
    .text("Welcome to whispr!")
    .voice(Voice::Nova)
    .model(TtsModel::Gpt4oMiniTts)
    .format(AudioFormat::Mp3)
    .speed(1.0)
    .instructions(prompts::FITNESS_COACH) // voice personality
    .generate()
    .await?;
§Speech-to-Text
Transcribe audio files to text with optional language hints.
let result = client
    .transcription()
    .file("recording.mp3").await?
    .language("en")
    .transcribe()
    .await?;

println!("Transcription: {}", result.text);
§Audio-to-Audio
Transcribe audio and generate new speech in one call.
let (transcription, audio) = client.audio_to_audio("input.mp3").await?;

println!("Said: {}", transcription.text);
std::fs::write("output.mp3", &audio)?;
§Realtime API (requires realtime feature)
Low-latency, bidirectional audio streaming for voice agents.
use whispr::realtime::{RealtimeClient, RealtimeConfig, RealtimeVoice, ServerEvent};

let client = RealtimeClient::from_env()?;

let config = RealtimeConfig::default()
    .with_voice(RealtimeVoice::Alloy)
    .with_instructions("You are a helpful assistant.");

let mut session = client.connect(config).await?;

// Receive server events as they stream in; audio arrives as incremental deltas
while let Some(event) = session.next_event().await? {
    match event {
        ServerEvent::ResponseOutputAudioDelta { delta, .. } => {
            // Play or buffer the audio delta
        }
        _ => {}
    }
}
§Roadmap
Whispr is designed to be a general-purpose audio AI library. The current implementation focuses on OpenAI, but the architecture will evolve to support multiple providers:
- Provider trait abstraction (see the sketch after this list)
- ElevenLabs support
- Azure Cognitive Services support
- Google Cloud Text-to-Speech support
- Local model support (e.g., Coqui TTS)
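To make the direction concrete, here is one hypothetical shape the provider trait could take. None of these names exist in whispr today; this is a sketch under the assumption that every backend exposes speech synthesis and transcription over raw audio bytes.

// Hypothetical sketch: none of these items are part of whispr's current API.

/// A provider-agnostic interface that each backend (OpenAI,
/// ElevenLabs, Azure, ...) could implement.
trait AudioProvider {
    type Error;

    /// Synthesize speech from text, returning encoded audio bytes.
    async fn synthesize(&self, text: &str, voice: &str) -> Result<Vec<u8>, Self::Error>;

    /// Transcribe encoded audio bytes to text.
    async fn transcribe(&self, audio: &[u8]) -> Result<String, Self::Error>;
}

Under this design, the existing Client would become the OpenAI implementation of the trait, and new backends would slot in behind the same interface.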
Modules§
- prompts
- Example voice instruction prompts for use with gpt-4o-mini-tts.
Structs§
- Client
- A client for interacting with the OpenAI Audio API.
- SpeechBuilder
- A builder for text-to-speech requests.
- TranscriptionBuilder
- A builder for speech-to-text (transcription) requests.
- TranscriptionResponse
- Response from the transcription API.
Enums§
- AudioFormat
- Available audio output formats for text-to-speech.
- Error
- The main error type for this crate.
- TranscriptionModel
- Available transcription (speech-to-text) models.
- TtsModel
- Available text-to-speech models.
- Voice
- Available voices for text-to-speech.
Type Aliases§
- Result
- Result type alias using the crate's Error type.
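As a small usage sketch, the alias lets downstream code propagate crate errors concisely. This assumes Error converts from std::io::Error, which the Quick Start example already relies on; save_clip is a hypothetical helper, not part of the crate.

fn save_clip(audio: &[u8]) -> whispr::Result<()> {
    std::fs::write("clip.mp3", audio)?; // io::Error converts into whispr::Error
    Ok(())
}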