# whispr
A general-purpose Rust library for audio AI services — text-to-speech, speech-to-text, and audio-to-audio transformations.
[crates.io](https://crates.io/crates/whispr) · [docs.rs](https://docs.rs/whispr) · [MIT license](https://opensource.org/licenses/MIT)
## Overview
Whispr provides a clean, ergonomic API for working with audio AI services. It's designed to be provider-agnostic, though **OpenAI is currently the primary supported provider**.
### Current Status
- ✅ **OpenAI Audio API** — Full support for TTS, STT, and audio-to-audio
- 🔮 **Future** — Provider abstraction to support multiple backends (ElevenLabs, Azure, Google Cloud, etc.)
## Installation
```toml
[dependencies]
whispr = "0.1"
tokio = { version = "1", features = ["full"] }
```
## Quick Start
```rust
use whispr::{Client, Voice};

#[tokio::main]
async fn main() -> Result<(), whispr::Error> {
    let client = Client::from_env()?; // reads OPENAI_API_KEY

    // Text to Speech
    let audio = client
        .speech()
        .text("Hello, world!")
        .voice(Voice::Nova)
        .generate()
        .await?;

    std::fs::write("hello.mp3", &audio)?;
    Ok(())
}
```
## Features
### Text to Speech
Convert text to natural-sounding audio with multiple voices and customization options.
```rust
use whispr::{Client, TtsModel, Voice, AudioFormat, prompts};

let client = Client::from_env()?;

let audio = client
    .speech()
    .text("Welcome to whispr!")
    .voice(Voice::Nova)
    .model(TtsModel::Gpt4oMiniTts)
    .format(AudioFormat::Mp3)
    .speed(1.0)
    .instructions(prompts::FITNESS_COACH) // Voice personality (gpt-4o-mini-tts only)
    .generate()
    .await?;

std::fs::write("output.mp3", &audio)?;
```
**Available Voices:** `Alloy`, `Ash`, `Ballad`, `Coral`, `Echo`, `Fable`, `Nova`, `Onyx`, `Sage`, `Shimmer`, `Verse`
**Available Models:**
- `Gpt4oMiniTts` — Latest model with instruction support
- `Tts1` — Optimized for speed
- `Tts1Hd` — Optimized for quality
### Speech to Text
Transcribe audio files to text with optional language hints.
```rust
let result = client
    .transcription()
    .file("recording.mp3").await?
    .language("en")
    .transcribe()
    .await?;

println!("Transcription: {}", result.text);
```
**From bytes (useful for recorded audio):**
```rust
let wav_data: Vec<u8> = record_audio();

let result = client
    .transcription()
    .bytes(wav_data, "recording.wav")
    .transcribe()
    .await?;
```
### Audio to Audio
Transcribe audio and generate new speech in one call — useful for voice transformation, translation, or processing pipelines.
```rust
let (transcription, audio) = client.audio_to_audio("input.mp3").await?;
println!("Said: {}", transcription.text);
std::fs::write("output.mp3", &audio)?;
```
### Streaming
For real-time applications, stream audio as it's generated:
```rust
use futures::StreamExt;

let mut stream = client
    .speech()
    .text("This is a longer text that will be streamed...")
    .generate_stream()
    .await?;

while let Some(chunk) = stream.next().await {
    let bytes = chunk?;
    // Process audio chunk in real-time
}
```
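If you want to persist the audio as it arrives, one straightforward pattern is to append each chunk to a file. A minimal sketch, assuming each stream item is a `Result` wrapping raw audio bytes that deref to `&[u8]` (matching the `chunk?` pattern above):

```rust
use futures::StreamExt;
use std::io::Write;

let mut file = std::fs::File::create("streamed.mp3")?;

let mut stream = client
    .speech()
    .text("Streaming straight to disk...")
    .generate_stream()
    .await?;

// Append each chunk as soon as it is produced; the file is complete
// once the stream ends.
while let Some(chunk) = stream.next().await {
    file.write_all(&chunk?)?;
}
```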
## Voice Prompts
The `prompts` module includes pre-built voice personalities for common use cases:
```rust
use whispr::prompts;

client.speech()
    .text("Let's get moving!")
    .model(TtsModel::Gpt4oMiniTts)
    .instructions(prompts::FITNESS_COACH)
    .generate()
    .await?;
```
Available prompts: `FITNESS_COACH`, `MEDITATION_GUIDE`, `STORYTELLER`, `NEWS_ANCHOR`, `FRIENDLY_ASSISTANT`, and more.
## Configuration
### Environment Variable
The simplest setup — set `OPENAI_API_KEY` in your environment:
```rust
let client = Client::from_env()?;
```
### Direct API Key
```rust
let client = Client::new("sk-...");
```
### Custom Configuration
```rust
use whispr::client::ClientConfig;

let config = ClientConfig::new("sk-...")
    .with_base_url("https://custom-endpoint.com/v1")
    .with_organization("org-...")
    .with_project("proj-...");

let client = Client::with_config(config);
```
## Roadmap
Whispr is designed to be a general-purpose audio AI library. The current implementation focuses on OpenAI, but the architecture will evolve to support multiple providers:
```
whispr/
├── providers/
│   ├── openai/      # Current implementation
│   ├── elevenlabs/  # Planned
│   ├── azure/       # Planned
│   └── google/      # Planned
└── traits/          # Provider-agnostic interfaces
```
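To give a sense of where the trait layer is headed, here is a rough, non-final sketch of what a provider-agnostic TTS trait could look like. Nothing below exists in the crate today; the trait name, method, and request type are illustrative only.

```rust
// Hypothetical shapes: illustrative only, not part of the current whispr API.
pub struct SpeechRequest {
    pub text: String,
    pub voice: String,
}

// Requires Rust 1.75+ (async fn in traits).
pub trait TtsProvider {
    type Error;

    /// Synthesize speech and return the encoded audio bytes.
    async fn synthesize(&self, request: SpeechRequest) -> Result<Vec<u8>, Self::Error>;
}
```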
**Planned features:**
- [ ] Provider trait abstraction
- [ ] ElevenLabs support
- [ ] Azure Cognitive Services support
- [ ] Google Cloud Text-to-Speech support
- [ ] Local model support (e.g., Coqui TTS)
- [ ] Automatic provider fallback
## License
MIT License — see [LICENSE](LICENSE) for details.