whispr 0.2.0 - Docs.rs

# whispr

A general-purpose voice <-> text crate for text-to-speech, speech-to-text, and audio-to-audio transformations. It also supports realtime conversations.

[![Crates.io](https://img.shields.io/crates/v/whispr.svg)](https://crates.io/crates/whispr)
[![Documentation](https://docs.rs/whispr/badge.svg)](https://docs.rs/whispr)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Overview

Whispr provides a clean, ergonomic API for working with audio AI services. It's designed to be provider-agnostic, though only OpenAI is currently implemented.

## Installation

```toml
[dependencies]
whispr = "0.2"
tokio = { version = "1", features = ["full"] }
```

## Quick Start

```rust
use whispr::{Client, Voice};

#[tokio::main]
async fn main() -> Result<(), whispr::Error> {
    let client = Client::from_env()?; // reads OPENAI_API_KEY

    // Text to Speech
    let audio = client
        .speech()
        .text("Hello, world!")
        .voice(Voice::Nova)
        .generate()
        .await?;

    std::fs::write("hello.mp3", &audio)?;
    Ok(())
}
```
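Since `Client::from_env()` reads `OPENAI_API_KEY`, it can help to check the variable up front and fail with a clear message. The following is a minimal sketch using only the standard library; `validate_api_key` and `api_key_from_env` are hypothetical helpers for illustration and are not part of whispr.

```rust
use std::env;

/// Hypothetical preflight helper (not part of whispr): validates a raw
/// environment value and returns it with surrounding whitespace removed.
fn validate_api_key(raw: Option<String>) -> Result<String, String> {
    match raw {
        None => Err("OPENAI_API_KEY is not set".to_string()),
        Some(v) if v.trim().is_empty() => Err("OPENAI_API_KEY is empty".to_string()),
        Some(v) => Ok(v.trim().to_string()),
    }
}

/// Reads OPENAI_API_KEY from the process environment and validates it.
/// A value that is not valid Unicode is treated the same as a missing one.
fn api_key_from_env() -> Result<String, String> {
    validate_api_key(env::var("OPENAI_API_KEY").ok())
}
```

Calling `api_key_from_env()` at the top of `main` and printing the error before constructing the client keeps configuration problems separate from API errors.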

## Features

### Text to Speech

Convert text to natural-sounding audio with multiple voices and customization options.

```rust
use whispr::{Client, TtsModel, Voice, AudioFormat, prompts};

let client = Client::from_env()?;

let audio = client
    .speech()
    .text("Welcome to whispr!")
    .voice(Voice::Nova)
    .model(TtsModel::Gpt4oMiniTts)
    .format(AudioFormat::Mp3)
    .speed(1.0)
    .instructions(prompts::FITNESS_COACH) // Voice personality (gpt-4o-mini-tts only)
    .generate()
    .await?;

std::fs::write("output.mp3", &audio)?;
```

**Available Voices:** `Alloy`, `Ash`, `Ballad`, `Coral`, `Echo`, `Fable`, `Nova`, `Onyx`, `Sage`, `Shimmer`, `Verse`

**Available Models:**
- `Gpt4oMiniTts` — Latest model with instruction support
- `Tts1` — Optimized for speed
- `Tts1Hd` — Optimized for quality
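
When the voice comes from user input (a CLI flag or config file, for example), it may be useful to normalize the name before mapping it to a `Voice` variant. The sketch below copies the voice names from the list above by hand; `KNOWN_VOICES` and `canonical_voice` are hypothetical and not part of whispr, and the list may drift from the crate's actual enum.

```rust
/// Voice names as listed in this README (copied by hand, not read from whispr).
const KNOWN_VOICES: [&str; 11] = [
    "Alloy", "Ash", "Ballad", "Coral", "Echo", "Fable", "Nova", "Onyx", "Sage", "Shimmer",
    "Verse",
];

/// Hypothetical helper: returns the canonical spelling of a user-supplied
/// voice name, ignoring ASCII case and surrounding whitespace.
fn canonical_voice(input: &str) -> Option<&'static str> {
    let wanted = input.trim();
    KNOWN_VOICES
        .iter()
        .copied()
        .find(|v| v.eq_ignore_ascii_case(wanted))
}
```

The returned string can then be matched against the `Voice` variants in application code, with `None` reported as an unknown voice.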

### Speech to Text

Transcribe audio files to text with optional language hints.

```rust
let result = client
    .transcription()
    .file("recording.mp3").await?
    .language("en")
    .transcribe()
    .await?;

println!("Transcription: {}", result.text);
```

**From bytes (useful for recorded audio):**

```rust
let wav_data: Vec<u8> = record_audio();

let result = client
    .transcription()
    .bytes(wav_data, "recording.wav")
    .transcribe()
    .await?;
```
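
The `record_audio()` call above is left undefined. If the recording source yields raw PCM samples, they need a WAV header before being passed as `"recording.wav"`. Here is a minimal sketch, assuming 16-bit mono PCM input; `pcm16_mono_to_wav` is a hypothetical helper and not part of whispr.

```rust
/// Hypothetical helper: wraps 16-bit mono PCM samples in a minimal
/// 44-byte RIFF/WAVE header. Inputs larger than u32::MAX bytes are not handled.
fn pcm16_mono_to_wav(samples: &[i16], sample_rate: u32) -> Vec<u8> {
    let channels: u16 = 1;
    let bits_per_sample: u16 = 16;
    let block_align: u16 = channels * bits_per_sample / 8; // bytes per frame
    let byte_rate: u32 = sample_rate * u32::from(block_align);
    let data_len: u32 = (samples.len() * 2) as u32;

    let mut wav = Vec::with_capacity(44 + samples.len() * 2);
    // RIFF chunk descriptor
    wav.extend_from_slice(b"RIFF");
    wav.extend_from_slice(&(36 + data_len).to_le_bytes());
    wav.extend_from_slice(b"WAVE");
    // "fmt " sub-chunk
    wav.extend_from_slice(b"fmt ");
    wav.extend_from_slice(&16u32.to_le_bytes()); // sub-chunk size for PCM
    wav.extend_from_slice(&1u16.to_le_bytes()); // audio format 1 = PCM
    wav.extend_from_slice(&channels.to_le_bytes());
    wav.extend_from_slice(&sample_rate.to_le_bytes());
    wav.extend_from_slice(&byte_rate.to_le_bytes());
    wav.extend_from_slice(&block_align.to_le_bytes());
    wav.extend_from_slice(&bits_per_sample.to_le_bytes());
    // "data" sub-chunk with little-endian samples
    wav.extend_from_slice(b"data");
    wav.extend_from_slice(&data_len.to_le_bytes());
    for s in samples {
        wav.extend_from_slice(&s.to_le_bytes());
    }
    wav
}
```

The resulting `Vec<u8>` has the same type as `wav_data` in the snippet above, so it could be handed to `.bytes(...)` directly.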

### Audio to Audio

Transcribe audio and generate new speech in one call — useful for voice transformation, translation, or processing pipelines.

```rust
let (transcription, audio) = client.audio_to_audio("input.mp3").await?;

println!("Said: {}", transcription.text);
std::fs::write("output.mp3", &audio)?;
```

### Streaming

For real-time applications, stream audio as it's generated:

```rust
use futures::StreamExt; // requires the `futures` crate in Cargo.toml

let mut stream = client
    .speech()
    .text("This is a longer text that will be streamed...")
    .generate_stream()
    .await?;

while let Some(chunk) = stream.next().await {
    let bytes = chunk?;
    // Process audio chunk in real-time
}
```
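
One common way to process chunks is to write each one to a file or socket as it arrives, instead of buffering the whole response. The sketch below shows that pattern with a synchronous iterator standing in for the async stream, so it runs with the standard library alone; `write_chunks` is a hypothetical helper, and the chunk type (`io::Result<Vec<u8>>`) is an assumption, not whispr's actual stream item type.

```rust
use std::io::{self, Write};

/// Hypothetical helper: writes each chunk to `out` as it arrives and returns
/// the total number of bytes written. Stops at the first failed chunk and
/// returns its error; chunks written before the failure stay in `out`.
fn write_chunks<W, I>(chunks: I, out: &mut W) -> io::Result<usize>
where
    W: Write,
    I: IntoIterator<Item = io::Result<Vec<u8>>>,
{
    let mut total = 0;
    for chunk in chunks {
        let bytes = chunk?;
        out.write_all(&bytes)?;
        total += bytes.len();
    }
    out.flush()?;
    Ok(total)
}
```

In the async loop above, the body of `write_chunks` corresponds to calling `write_all` on the destination inside the `while let` loop.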

## Prompts

The `prompts` module includes pre-built voice personalities for common use cases:

```rust
use whispr::{prompts, TtsModel};

client.speech()
    .text("Let's get moving!")
    .model(TtsModel::Gpt4oMiniTts)
    .instructions(prompts::FITNESS_COACH)
    .generate()
    .await?;
```

## License

MIT License — see [LICENSE](LICENSE) for details.