Crate whispr

§Whispr

A general-purpose Rust library for audio AI services — text-to-speech, speech-to-text, and audio-to-audio transformations.

§Overview

Whispr provides a clean, ergonomic API for working with audio AI services. It’s designed to be provider-agnostic, though OpenAI is currently the primary supported provider.

§Current Status

  • OpenAI Audio API — Full support for TTS, STT, and audio-to-audio
  • Realtime API — WebSocket-based real-time audio conversations (feature: realtime)
  • 🔮 Future — Provider abstraction to support multiple backends (ElevenLabs, Azure, Google Cloud, etc.)

§Quick Start

use whispr::{Client, Voice};

#[tokio::main]
async fn main() -> Result<(), whispr::Error> {
    let client = Client::from_env()?; // reads OPENAI_API_KEY

    // Text to Speech
    let audio = client
        .speech()
        .text("Hello, world!")
        .voice(Voice::Nova)
        .generate()
        .await?;

    std::fs::write("hello.mp3", &audio)?;
    Ok(())
}
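
For larger programs, the crate's exported Result alias (listed under Type Aliases below) shortens signatures; an equivalent skeleton:

use whispr::Client;

#[tokio::main]
async fn main() -> whispr::Result<()> {
    // Same setup as above, using the crate's Result alias in the signature.
    let _client = Client::from_env()?;
    Ok(())
}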

§Features

§Text-to-Speech

Convert text to natural-sounding audio with multiple voices and customization options.

use whispr::{Client, TtsModel, Voice, AudioFormat, prompts};

let client = Client::from_env()?;

let audio = client
    .speech()
    .text("Welcome to whispr!")
    .voice(Voice::Nova)
    .model(TtsModel::Gpt4oMiniTts)
    .format(AudioFormat::Mp3)
    .speed(1.0)
    .instructions(prompts::FITNESS_COACH) // Voice personality
    .generate()
    .await?;
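
generate() returns the encoded audio bytes directly, so persisting the result is a single write. A runnable sketch that also varies the speed setting (the 1.25 value is just an illustration; the accepted bounds aren't documented on this page):

use whispr::{AudioFormat, Client, Voice};

#[tokio::main]
async fn main() -> Result<(), whispr::Error> {
    let client = Client::from_env()?;

    // Same text as above, delivered slightly faster.
    let audio = client
        .speech()
        .text("Welcome to whispr!")
        .voice(Voice::Nova)
        .format(AudioFormat::Mp3)
        .speed(1.25) // 1.0 is normal pace
        .generate()
        .await?;

    std::fs::write("welcome-fast.mp3", &audio)?;
    Ok(())
}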

§Speech-to-Text

Transcribe audio files to text with optional language hints.

let result = client
    .transcription()
    .file("recording.mp3").await?
    .language("en")
    .transcribe()
    .await?;

println!("Transcription: {}", result.text);
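
Because a fresh builder is created per request, transcribing a batch of files is ordinary control flow; a sketch (file names are placeholders):

let client = Client::from_env()?;

// Transcribe several recordings in sequence.
for path in ["part1.mp3", "part2.mp3", "part3.mp3"] {
    let result = client
        .transcription()
        .file(path).await?
        .language("en")
        .transcribe()
        .await?;
    println!("{path}: {}", result.text);
}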

§Audio-to-Audio

Transcribe audio and generate new speech in one call.

let (transcription, audio) = client.audio_to_audio("input.mp3").await?;
println!("Said: {}", transcription.text);
std::fs::write("output.mp3", &audio)?;
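
Conceptually, audio_to_audio chains the two builders shown above; a hand-rolled equivalent under that assumption (the crate's actual internals may differ):

let client = Client::from_env()?;

// Step 1: transcribe the input audio.
let transcription = client
    .transcription()
    .file("input.mp3").await?
    .transcribe()
    .await?;

// Step 2: synthesize the transcribed text back into speech.
let audio = client
    .speech()
    .text(&transcription.text)
    .generate()
    .await?;

std::fs::write("output.mp3", &audio)?;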

§Realtime API (requires realtime feature)

Low-latency, bidirectional audio streaming for voice agents.

use whispr::realtime::{RealtimeClient, RealtimeConfig, RealtimeVoice, ServerEvent};

let client = RealtimeClient::from_env()?;

let config = RealtimeConfig::default()
    .with_voice(RealtimeVoice::Alloy)
    .with_instructions("You are a helpful assistant.");

let mut session = client.connect(config).await?;

// Send audio and receive responses in real-time
while let Some(event) = session.next_event().await? {
    match event {
        ServerEvent::ResponseOutputAudioDelta { delta, .. } => {
            // Play the audio delta
        }
        _ => {}
    }
}
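
Audio arrives as a stream of deltas, so a player typically buffers fragments as they come in. A sketch that collects them (the payload type of delta, raw bytes or base64 text, isn't specified on this page, so the buffer simply stores whatever the variant carries):

let mut session = client.connect(config).await?;

// Accumulate streamed audio fragments as they arrive.
let mut fragments = Vec::new();

while let Some(event) = session.next_event().await? {
    match event {
        ServerEvent::ResponseOutputAudioDelta { delta, .. } => {
            fragments.push(delta);
        }
        _ => {}
    }
}

println!("received {} audio fragments", fragments.len());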

§Roadmap

Whispr is designed to be a general-purpose audio AI library. The current implementation focuses on OpenAI, but the architecture will evolve to support multiple providers:

  • Provider trait abstraction (see the sketch after this list)
  • ElevenLabs support
  • Azure Cognitive Services support
  • Google Cloud Text-to-Speech support
  • Local model support (e.g., Coqui TTS)
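
As a purely illustrative sketch of the first roadmap item, a provider abstraction might take roughly this shape (nothing like it exists in the crate today; the trait name and signatures are invented):

// Hypothetical: not part of whispr's current API.
// Uses native async-fn-in-trait (Rust 1.75+).
trait Provider {
    async fn synthesize(&self, text: &str) -> whispr::Result<Vec<u8>>;
    async fn transcribe(&self, audio: &[u8]) -> whispr::Result<String>;
}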

Modules§

prompts
Example voice instruction prompts for use with gpt-4o-mini-tts.

Structs§

Client
A client for interacting with the OpenAI Audio API.
SpeechBuilder
A builder for text-to-speech requests.
TranscriptionBuilder
A builder for speech-to-text (transcription) requests.
TranscriptionResponse
Response from the transcription API.

Enums§

AudioFormat
Available audio output formats for text-to-speech.
Error
The main error type for this crate.
TranscriptionModel
Available transcription (speech-to-text) models.
TtsModel
Available text-to-speech models.
Voice
Available voices for text-to-speech.

Type Aliases§

Result
Result type alias using the crate’s Error type.