# whispr
A general-purpose Rust library for audio AI services — text-to-speech, speech-to-text, and audio-to-audio transformations.
[crates.io](https://crates.io/crates/whispr) · [docs.rs](https://docs.rs/whispr) · [MIT license](https://opensource.org/licenses/MIT)
## Overview
Whispr provides a clean, ergonomic API for working with audio AI services. It's designed to be provider-agnostic, though **OpenAI is currently the primary supported provider**.
### Current Status
- ✅ **OpenAI Audio API** — Full support for TTS, STT, and audio-to-audio
- 🔮 **Future** — Provider abstraction to support multiple backends (ElevenLabs, Azure, Google Cloud, etc.)
## Installation
```toml
[dependencies]
whispr = "0.1"
tokio = { version = "1", features = ["full"] }
```
## Quick Start
```rust
use whispr::{Client, Voice};

#[tokio::main]
async fn main() -> Result<(), whispr::Error> {
    let client = Client::from_env()?; // reads OPENAI_API_KEY

    // Text to Speech
    let audio = client
        .speech()
        .text("Hello, world!")
        .voice(Voice::Nova)
        .generate()
        .await?;

    std::fs::write("hello.mp3", &audio)?;
    Ok(())
}
```
## Features
### Text to Speech
Convert text to natural-sounding audio with multiple voices and customization options.
```rust
use whispr::{Client, TtsModel, Voice, AudioFormat, prompts};

let client = Client::from_env()?;

let audio = client
    .speech()
    .text("Welcome to whispr!")
    .voice(Voice::Nova)
    .model(TtsModel::Gpt4oMiniTts)
    .format(AudioFormat::Mp3)
    .speed(1.0)
    .instructions(prompts::FITNESS_COACH) // Voice personality (gpt-4o-mini-tts only)
    .generate()
    .await?;

std::fs::write("output.mp3", &audio)?;
```
**Available Voices:** `Alloy`, `Ash`, `Ballad`, `Coral`, `Echo`, `Fable`, `Nova`, `Onyx`, `Sage`, `Shimmer`, `Verse`
**Available Models:**
- `Gpt4oMiniTts` — Latest model with instruction support
- `Tts1` — Optimized for speed
- `Tts1Hd` — Optimized for quality
### Speech to Text
Transcribe audio files to text with optional language hints.
```rust
let result = client
    .transcription()
    .file("recording.mp3").await?
    .language("en")
    .transcribe()
    .await?;

println!("Transcription: {}", result.text);
```
**From bytes (useful for recorded audio):**
```rust
let wav_data: Vec<u8> = record_audio();

let result = client
    .transcription()
    .bytes(wav_data, "recording.wav")
    .transcribe()
    .await?;
```
### Audio to Audio
Transcribe audio and generate new speech in one call — useful for voice transformation, translation, or processing pipelines.
```rust
let (transcription, audio) = client.audio_to_audio("input.mp3").await?;
println!("Said: {}", transcription.text);
std::fs::write("output.mp3", &audio)?;
```
### Streaming
For real-time applications, stream audio as it's generated:
```rust
use futures::StreamExt;

let mut stream = client
    .speech()
    .text("This is a longer text that will be streamed...")
    .generate_stream()
    .await?;

while let Some(chunk) = stream.next().await {
    let bytes = chunk?;
    // Process audio chunk in real-time
}
```
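If you want to persist the audio as it arrives, one straightforward pattern is to append each chunk to a file. A minimal sketch, assuming each stream item is a `Result` wrapping raw audio bytes that deref to `&[u8]` (matching the `chunk?` pattern above):

```rust
use futures::StreamExt;
use std::io::Write;

let mut file = std::fs::File::create("streamed.mp3")?;

let mut stream = client
    .speech()
    .text("Streaming straight to disk...")
    .generate_stream()
    .await?;

// Append each chunk as soon as it is produced; the file is complete
// once the stream ends.
while let Some(chunk) = stream.next().await {
    file.write_all(&chunk?)?;
}
```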
## Voice Prompts
The `prompts` module includes pre-built voice personalities for common use cases:
```rust
use whispr::prompts;

client.speech()
    .text("Let's get moving!")
    .model(TtsModel::Gpt4oMiniTts)
    .instructions(prompts::FITNESS_COACH)
    .generate()
    .await?;
```
Available prompts: `FITNESS_COACH`, `MEDITATION_GUIDE`, `STORYTELLER`, `NEWS_ANCHOR`, `FRIENDLY_ASSISTANT`, and more.
## Configuration
### Environment Variable
The simplest setup — set `OPENAI_API_KEY` in your environment:
```rust
let client = Client::from_env()?;
```
### Direct API Key
```rust
let client = Client::new("sk-...");
```
### Custom Configuration
```rust
use whispr::client::ClientConfig;

let config = ClientConfig::new("sk-...")
    .with_base_url("https://custom-endpoint.com/v1")
    .with_organization("org-...")
    .with_project("proj-...");

let client = Client::with_config(config);
```
## Roadmap
Whispr is designed to be a general-purpose audio AI library. The current implementation focuses on OpenAI, but the architecture will evolve to support multiple providers:
```
whispr/
├── providers/
│   ├── openai/      # Current implementation
│   ├── elevenlabs/  # Planned
│   ├── azure/       # Planned
│   └── google/      # Planned
└── traits/          # Provider-agnostic interfaces
```
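To give a sense of where the trait layer is headed, here is a rough, non-final sketch of what a provider-agnostic TTS trait could look like. Nothing below exists in the crate today; the trait name, method, and request type are illustrative only.

```rust
// Hypothetical shapes: illustrative only, not part of the current whispr API.
pub struct SpeechRequest {
    pub text: String,
    pub voice: String,
}

// Requires Rust 1.75+ (async fn in traits).
pub trait TtsProvider {
    type Error;

    /// Synthesize speech and return the encoded audio bytes.
    async fn synthesize(&self, request: SpeechRequest) -> Result<Vec<u8>, Self::Error>;
}
```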
**Planned features:**
- [ ] Provider trait abstraction
- [ ] ElevenLabs support
- [ ] Azure Cognitive Services support
- [ ] Google Cloud Text-to-Speech support
- [ ] Local model support (e.g., Coqui TTS)
- [ ] Automatic provider fallback
## License
MIT License — see [LICENSE](LICENSE) for details.