whispr

A general-purpose Rust library for audio AI services — text-to-speech, speech-to-text, and audio-to-audio transformations.

Overview

Whispr provides a clean, ergonomic API for working with audio AI services. It's designed to be provider-agnostic, though OpenAI is currently the only supported provider.

Current Status

  • ✅ OpenAI Audio API — Full support for TTS, STT, and audio-to-audio
  • 🔮 Future — Provider abstraction to support multiple backends (ElevenLabs, Azure, Google Cloud, etc.)

Installation

[dependencies]
whispr = "0.1"
tokio = { version = "1", features = ["full"] }

Quick Start

use whispr::{Client, Voice};

#[tokio::main]
async fn main() -> Result<(), whispr::Error> {
    let client = Client::from_env()?; // reads OPENAI_API_KEY

    // Text to Speech
    let audio = client
        .speech()
        .text("Hello, world!")
        .voice(Voice::Nova)
        .generate()
        .await?;

    std::fs::write("hello.mp3", &audio)?;
    Ok(())
}

Features

Text to Speech

Convert text to natural-sounding audio with multiple voices and customization options.

use whispr::{Client, TtsModel, Voice, AudioFormat, prompts};

let client = Client::from_env()?;

let audio = client
    .speech()
    .text("Welcome to whispr!")
    .voice(Voice::Nova)
    .model(TtsModel::Gpt4oMiniTts)
    .format(AudioFormat::Mp3)
    .speed(1.0)
    .instructions(prompts::FITNESS_COACH) // Voice personality (gpt-4o-mini-tts only)
    .generate()
    .await?;

std::fs::write("output.mp3", &audio)?;

Available Voices: Alloy, Ash, Ballad, Coral, Echo, Fable, Nova, Onyx, Sage, Shimmer, Verse

Available Models:

  • Gpt4oMiniTts — Latest model with instruction support
  • Tts1 — Optimized for speed
  • Tts1Hd — Optimized for quality

Speech to Text

Transcribe audio files to text with optional language hints.

let result = client
    .transcription()
    .file("recording.mp3").await?
    .language("en")
    .transcribe()
    .await?;

println!("Transcription: {}", result.text);

From bytes (useful for recorded audio):

// `record_audio()` stands in for your own capture code that returns WAV-encoded bytes.
let wav_data: Vec<u8> = record_audio();

let result = client
    .transcription()
    .bytes(wav_data, "recording.wav")
    .transcribe()
    .await?;

Audio to Audio

Transcribe audio and generate new speech in one call — useful for voice transformation, translation, or processing pipelines.

let (transcription, audio) = client.audio_to_audio("input.mp3").await?;

println!("Said: {}", transcription.text);
std::fs::write("output.mp3", &audio)?;

Streaming

For real-time applications, stream audio as it's generated (the example uses StreamExt from the futures crate):

use futures::StreamExt;

let mut stream = client
    .speech()
    .text("This is a longer text that will be streamed...")
    .generate_stream()
    .await?;

while let Some(chunk) = stream.next().await {
    let bytes = chunk?;
    // Process audio chunk in real-time
}
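
For example, the loop above could append each chunk to a file as it arrives; this sketch assumes the chunk type dereferences to a byte slice:

use std::io::Write;

let mut file = std::fs::File::create("streamed.mp3")?;
while let Some(chunk) = stream.next().await {
    let bytes = chunk?;
    // Flush audio to disk incrementally instead of buffering the whole response.
    file.write_all(&bytes)?;
}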

Voice Prompts

The prompts module includes pre-built voice personalities for common use cases:

use whispr::{prompts, TtsModel};

client.speech()
    .text("Let's get moving!")
    .model(TtsModel::Gpt4oMiniTts)
    .instructions(prompts::FITNESS_COACH)
    .generate()
    .await?;

Available prompts: FITNESS_COACH, MEDITATION_GUIDE, STORYTELLER, NEWS_ANCHOR, FRIENDLY_ASSISTANT, and more.
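
If none of the pre-built prompts fit, you can write your own instruction text; this sketch assumes .instructions() accepts an arbitrary string, like the prompt constants:

// Hypothetical custom instruction; assumes `.instructions()` takes any string.
client.speech()
    .text("Your order has shipped and should arrive on Thursday.")
    .model(TtsModel::Gpt4oMiniTts)
    .instructions("Speak as a calm, concise customer support agent.")
    .generate()
    .await?;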

Configuration

Environment Variable

The simplest setup — set OPENAI_API_KEY in your environment:

let client = Client::from_env()?;

Direct API Key

let client = Client::new("sk-...");

Custom Configuration

use whispr::client::ClientConfig;

let config = ClientConfig::new("sk-...")
    .with_base_url("https://custom-endpoint.com/v1")
    .with_organization("org-...")
    .with_project("proj-...");

let client = Client::with_config(config);

Roadmap

Whispr is designed to be a general-purpose audio AI library. The current implementation focuses on OpenAI, but the architecture will evolve to support multiple providers:

whispr/
├── providers/
│   ├── openai/      # Current implementation
│   ├── elevenlabs/  # Planned
│   ├── azure/       # Planned
│   └── google/      # Planned
└── traits/          # Provider-agnostic interfaces

Planned features:

  • Provider trait abstraction (see the sketch after this list)
  • ElevenLabs support
  • Azure Cognitive Services support
  • Google Cloud Text-to-Speech support
  • Local model support (e.g., Coqui TTS)
  • Automatic provider fallback
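
As a rough sketch of where the provider-agnostic interfaces could go (hypothetical; none of these names exist in the crate yet), the traits might look something like:

// Hypothetical sketch only; trait and method names are illustrative, not part of whispr today.
pub trait TtsProvider {
    type Error;

    /// Synthesize speech for `text` with the given voice, returning encoded audio bytes.
    async fn synthesize(&self, text: &str, voice: &str) -> Result<Vec<u8>, Self::Error>;
}

pub trait SttProvider {
    type Error;

    /// Transcribe encoded audio bytes into text.
    async fn transcribe(&self, audio: &[u8]) -> Result<String, Self::Error>;
}

Provider implementations (OpenAI today; ElevenLabs, Azure, Google later) would then plug in behind interfaces like these, with the client selecting a backend or falling back between them.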

License

MIT License — see LICENSE for details.