# whispr
A general-purpose voice crate for text-to-speech, speech-to-text, and audio-to-audio transformations. It also supports realtime conversations.
[crates.io](https://crates.io/crates/whispr)
[docs.rs](https://docs.rs/whispr)
[License: MIT](https://opensource.org/licenses/MIT)
## Overview
Whispr provides an ergonomic API for working with audio AI services. It is designed to be provider-agnostic, though only OpenAI is currently implemented.
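
To illustrate what "provider-agnostic" can mean in practice, here is a minimal sketch of a trait-based design. Every name in it (`SpeechProvider`, `EchoProvider`, `speak`) is hypothetical and chosen for illustration; none of them is taken from whispr's actual API, and the crate's internals may be organized differently.

```rust
// Hypothetical sketch: these names are NOT part of whispr's real API.

/// What a provider-agnostic speech backend could expose.
pub trait SpeechProvider {
    /// Short provider name, e.g. "openai".
    fn name(&self) -> &str;

    /// Turn text into encoded audio bytes.
    fn synthesize(&self, text: &str) -> Result<Vec<u8>, String>;
}

/// Stand-in backend that returns the UTF-8 bytes of the input
/// instead of real audio, so the sketch runs without a network.
pub struct EchoProvider;

impl SpeechProvider for EchoProvider {
    fn name(&self) -> &str {
        "echo"
    }

    fn synthesize(&self, text: &str) -> Result<Vec<u8>, String> {
        if text.is_empty() {
            return Err("text must not be empty".to_string());
        }
        Ok(text.as_bytes().to_vec())
    }
}

/// Caller code depends only on the trait, so backends can be swapped.
pub fn speak(provider: &dyn SpeechProvider, text: &str) -> Result<Vec<u8>, String> {
    provider.synthesize(text)
}
```

With this shape, adding a second backend would mean adding one more `impl SpeechProvider` without touching the calling code.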
## Installation
```toml
[dependencies]
whispr = "0.1"
tokio = { version = "1", features = ["full"] }
```
## Quick Start
```rust
use whispr::{Client, Voice};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Reads the API key from the OPENAI_API_KEY environment variable.
    let client = Client::from_env()?;

    // Text to speech: the result is the encoded audio.
    let audio = client
        .speech()
        .text("Hello, world!")
        .voice(Voice::Nova)
        .generate()
        .await?;

    std::fs::write("hello.mp3", &audio)?;
    Ok(())
}
```
## Features
### Text to Speech
Convert text to natural-sounding audio with multiple voices and customization options.
```rust
use whispr::{prompts, AudioFormat, Client, TtsModel, Voice};

let client = Client::from_env()?;
let audio = client
    .speech()
    .text("Welcome to whispr!")
    .voice(Voice::Nova)
    .model(TtsModel::Gpt4oMiniTts)
    .format(AudioFormat::Mp3)
    .speed(1.0)
    .instructions(prompts::FITNESS_COACH) // Voice personality (gpt-4o-mini-tts only)
    .generate()
    .await?;

std::fs::write("output.mp3", &audio)?;
```
**Available Voices:** `Alloy`, `Ash`, `Ballad`, `Coral`, `Echo`, `Fable`, `Nova`, `Onyx`, `Sage`, `Shimmer`, `Verse`
**Available Models:**
- `Gpt4oMiniTts` — Latest model with instruction support
- `Tts1` — Optimized for speed
- `Tts1Hd` — Optimized for quality
### Speech to Text
Transcribe audio files to text with optional language hints.
```rust
let result = client
    .transcription()
    .file("recording.mp3").await?
    .language("en")
    .transcribe()
    .await?;

println!("Transcription: {}", result.text);
```
**From bytes (useful for recorded audio):**
```rust
// `record_audio` stands in for your own capture code.
let wav_data: Vec<u8> = record_audio();

let result = client
    .transcription()
    .bytes(wav_data, "recording.wav")
    .transcribe()
    .await?;
```
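
If your capture code produces raw PCM samples rather than a finished WAV file, the samples need a container header before they can be sent as `recording.wav`. The following is a minimal sketch using only the standard library, assuming 16-bit mono PCM; the helper name `pcm16_mono_to_wav` is hypothetical and not part of whispr.

```rust
/// Wrap 16-bit mono PCM samples in a minimal 44-byte WAV (RIFF) header.
/// Hypothetical helper for illustration; not provided by whispr.
pub fn pcm16_mono_to_wav(samples: &[i16], sample_rate: u32) -> Vec<u8> {
    let data_len = (samples.len() * 2) as u32; // 2 bytes per sample
    let mut wav = Vec::with_capacity(44 + samples.len() * 2);

    // RIFF chunk descriptor
    wav.extend_from_slice(b"RIFF");
    wav.extend_from_slice(&(36 + data_len).to_le_bytes()); // bytes after this field
    wav.extend_from_slice(b"WAVE");

    // "fmt " sub-chunk describing the sample layout
    wav.extend_from_slice(b"fmt ");
    wav.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size for PCM
    wav.extend_from_slice(&1u16.to_le_bytes()); // audio format 1 = PCM
    wav.extend_from_slice(&1u16.to_le_bytes()); // channels = 1 (mono)
    wav.extend_from_slice(&sample_rate.to_le_bytes());
    wav.extend_from_slice(&(sample_rate * 2).to_le_bytes()); // byte rate
    wav.extend_from_slice(&2u16.to_le_bytes()); // block align (bytes per frame)
    wav.extend_from_slice(&16u16.to_le_bytes()); // bits per sample

    // "data" sub-chunk with the little-endian samples
    wav.extend_from_slice(b"data");
    wav.extend_from_slice(&data_len.to_le_bytes());
    for sample in samples {
        wav.extend_from_slice(&sample.to_le_bytes());
    }
    wav
}
```

The returned `Vec<u8>` can then be passed to `.bytes(wav, "recording.wav")` as in the example above.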
### Audio to Audio
Transcribe audio and generate new speech in one call — useful for voice transformation, translation, or processing pipelines.
```rust
let (transcription, audio) = client.audio_to_audio("input.mp3").await?;
println!("Said: {}", transcription.text);
std::fs::write("output.mp3", &audio)?;
```
### Streaming
For real-time applications, stream audio as it's generated:
```rust
use futures::StreamExt;

let mut stream = client
    .speech()
    .text("This is a longer text that will be streamed...")
    .generate_stream()
    .await?;

while let Some(chunk) = stream.next().await {
    let bytes = chunk?;
    // Process audio chunk in real-time
}
```
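
One simple way to process the chunks is to append each one to a buffer and stop at the first error. The sketch below shows that pattern with a plain iterator standing in for the async stream, so it runs without a network connection or an async runtime. The function name `collect_chunks` is hypothetical, and the chunk type is assumed to be `Vec<u8>`; the stream's real item type may differ.

```rust
/// Collect fallible audio chunks into one buffer, stopping at the first error.
/// Hypothetical helper: a synchronous stand-in for the streaming loop above.
pub fn collect_chunks<I, E>(chunks: I) -> Result<Vec<u8>, E>
where
    I: IntoIterator<Item = Result<Vec<u8>, E>>,
{
    let mut audio = Vec::new();
    for chunk in chunks {
        // `?` propagates the first error and discards the partial buffer.
        audio.extend_from_slice(&chunk?);
    }
    Ok(audio)
}
```

In the async loop, the same body would sit inside `while let Some(chunk) = stream.next().await`. For long outputs, writing each chunk to a file or audio device as it arrives avoids holding the whole clip in memory.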
## Prompts
The `prompts` module includes pre-built voice personalities for common use cases:
```rust
use whispr::{prompts, TtsModel};

client
    .speech()
    .text("Let's get moving!")
    .model(TtsModel::Gpt4oMiniTts)
    .instructions(prompts::FITNESS_COACH)
    .generate()
    .await?;
```
## License
MIT License — see [LICENSE](LICENSE) for details.