# whispr
A general-purpose Rust library for audio AI services — text-to-speech, speech-to-text, and audio-to-audio transformations.
## Overview
Whispr provides a clean, ergonomic API for working with audio AI services. It's designed to be provider-agnostic, though OpenAI is currently the primary supported provider.
## Current Status
- ✅ OpenAI Audio API — Full support for TTS, STT, and audio-to-audio
- 🔮 Future — Provider abstraction to support multiple backends (ElevenLabs, Azure, Google Cloud, etc.)
## Installation
```toml
[dependencies]
whispr = "0.1"
tokio = { version = "1", features = ["full"] }
```
## Quick Start
```rust
use whispr::Client;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Reads OPENAI_API_KEY from the environment.
    let client = Client::from_env()?;

    let audio = client
        .speech()
        .text("Hello from whispr!")
        .generate()
        .await?;

    std::fs::write("hello.mp3", &audio)?;
    Ok(())
}
```
## Features
### Text to Speech
Convert text to natural-sounding audio with multiple voices and customization options.
```rust
use whispr::{AudioFormat, Client, Model, Voice};

let client = Client::from_env()?;

let audio = client
    .speech()
    .text("Welcome back! Let's get started.")
    .voice(Voice::Nova)
    .model(Model::Gpt4oMiniTts)
    .format(AudioFormat::Mp3)
    .speed(1.0)
    .instructions("Speak in a warm, upbeat tone.") // Voice personality (gpt-4o-mini-tts only)
    .generate()
    .await?;

std::fs::write("output.mp3", &audio)?;
```
**Available Voices:** Alloy, Ash, Ballad, Coral, Echo, Fable, Nova, Onyx, Sage, Shimmer, Verse
**Available Models:**
- `Gpt4oMiniTts` — Latest model with instruction support
- `Tts1` — Optimized for speed
- `Tts1Hd` — Optimized for quality
### Speech to Text
Transcribe audio files to text with optional language hints.
```rust
let result = client
    .transcription()
    .file("recording.mp3").await?
    .language("en")
    .transcribe()
    .await?;

println!("{}", result.text);
```
From bytes (useful for recorded audio):
```rust
// `record_audio` stands in for your own capture code returning WAV bytes.
let wav_data: Vec<u8> = record_audio();

let result = client
    .transcription()
    .bytes(wav_data)
    .transcribe()
    .await?;
```
### Audio to Audio
Transcribe audio and generate new speech in one call — useful for voice transformation, translation, or processing pipelines.
```rust
// Arguments are illustrative: input audio path and target voice.
let (text, audio) = client.audio_to_audio("input.wav", Voice::Nova).await?;

println!("{text}");
std::fs::write("output.mp3", &audio)?;
```
### Streaming
For real-time applications, stream audio as it's generated:
```rust
use futures::StreamExt;

let mut stream = client
    .speech()
    .text("This audio is streamed as it is generated.")
    .generate_stream()
    .await?;

while let Some(chunk) = stream.next().await {
    let bytes = chunk?;
    // Feed `bytes` to your audio sink as each chunk arrives.
}
```
### Voice Prompts
The `prompts` module includes pre-built voice personalities for common use cases:
```rust
use whispr::prompts;

client.speech()
    .text("Let's start with a gentle warm-up.")
    .model(Model::Gpt4oMiniTts)
    .instructions(prompts::FITNESS_COACH)
    .generate()
    .await?;
```
Available prompts: `FITNESS_COACH`, `MEDITATION_GUIDE`, `STORYTELLER`, `NEWS_ANCHOR`, `FRIENDLY_ASSISTANT`, and more.
## Configuration
### Environment Variable
The simplest setup — set `OPENAI_API_KEY` in your environment:
```rust
let client = Client::from_env()?;
```
### Direct API Key
```rust
let client = Client::new("sk-...");
```
### Custom Configuration
```rust
use whispr::{Client, ClientConfig};

let config = ClientConfig::new("sk-...")
    .with_base_url("https://my-proxy.example.com/v1")
    .with_organization("org-...")
    .with_project("proj-...");

let client = Client::with_config(config);
```
## Roadmap
Whispr is designed to be a general-purpose audio AI library. The current implementation focuses on OpenAI, but the architecture will evolve to support multiple providers:
```text
whispr/
├── providers/
│   ├── openai/      # Current implementation
│   ├── elevenlabs/  # Planned
│   ├── azure/       # Planned
│   └── google/      # Planned
└── traits/          # Provider-agnostic interfaces
```
Planned features:
- Provider trait abstraction (see the sketch below)
- ElevenLabs support
- Azure Cognitive Services support
- Google Cloud Text-to-Speech support
- Local model support (e.g., Coqui TTS)
- Automatic provider fallback
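
To make the provider abstraction concrete, here is a minimal sketch of what such a trait could look like. The trait name `AudioProvider`, its method signatures, and the use of the `async_trait` crate are illustrative assumptions, not the final API:

```rust
use std::error::Error;

/// Hypothetical provider-agnostic interface. Names and signatures are
/// illustrative; the real trait will be shaped by the providers above.
#[async_trait::async_trait]
pub trait AudioProvider {
    /// Synthesize speech from `text`, returning encoded audio bytes.
    async fn text_to_speech(&self, text: &str)
        -> Result<Vec<u8>, Box<dyn Error + Send + Sync>>;

    /// Transcribe encoded audio into text.
    async fn speech_to_text(&self, audio: &[u8])
        -> Result<String, Box<dyn Error + Send + Sync>>;
}
```

An OpenAI implementation of this trait could wrap the current client, and automatic fallback could then be a thin wrapper that tries providers in order.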
## License
MIT License — see LICENSE for details.