livespeech-sdk 0.1.13

# LiveSpeech SDK for Rust

[![Crates.io](https://img.shields.io/crates/v/livespeech-sdk.svg)](https://crates.io/crates/livespeech-sdk)
[![Documentation](https://docs.rs/livespeech-sdk/badge.svg)](https://docs.rs/livespeech-sdk)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A Rust SDK for real-time speech-to-speech AI conversations.

## Features

- đŸŽ™ī¸ **Real-time Voice Conversations** - Natural, low-latency voice interactions
- 🌐 **Multi-language Support** - Korean, English, Japanese, Chinese, and more
- 🔊 **Streaming Audio** - Send and receive audio in real-time
- âšī¸ **Barge-in Support** - Interrupt AI mid-speech by talking or programmatically
- 🔄 **Auto-reconnection** - Automatic recovery from network issues

## Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
livespeech-sdk = "0.1"
tokio = { version = "1.35", features = ["full"] }
```

## Quick Start (5 minutes)

```rust
use livespeech_sdk::{Config, LiveSpeechClient, LiveSpeechEvent, Region, SessionConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Create client
    let config = Config::builder()
        .region(Region::ApNortheast2)
        .api_key("your-api-key")
        .build()?;
    let client = LiveSpeechClient::new(config);

    // 2. Handle events (only 4 essential events!)
    // `audio_player` stands for your application's playback queue (not shown here).
    let mut events = client.subscribe();
    tokio::spawn(async move {
        while let Ok(event) = events.recv().await {
            match event {
                // Play AI audio
                LiveSpeechEvent::Audio(e) => {
                    audio_player.queue(&e.data);  // PCM16 @ 24kHz
                }
                // User interrupted - CLEAR BUFFER!
                LiveSpeechEvent::Interrupted(_) => {
                    audio_player.clear();
                }
                // AI finished speaking
                LiveSpeechEvent::TurnComplete(_) => {
                    println!("AI finished");
                }
                // Handle errors
                LiveSpeechEvent::Error(e) => {
                    eprintln!("Error: {}", e.message);
                }
                _ => {}
            }
        }
    });

    // 3. Connect and start
    client.connect().await?;
    client.start_session(Some(SessionConfig::new("You are a helpful assistant."))).await?;

    // 4. Send audio
    // `audio_chunks` stands for your captured microphone audio, already split into chunks.
    client.audio_start().await?;
    for chunk in audio_chunks {
        client.send_audio_chunk(&chunk).await?;  // PCM16 @ 16kHz
    }
    client.audio_end().await?;

    // 5. Cleanup
    client.end_session().await?;
    client.disconnect().await;
    Ok(())
}
```

---

# Core API

Everything you need for basic voice conversations.

## Methods

| Method | Description |
|--------|-------------|
| `connect()` | Establish connection |
| `disconnect()` | Close connection |
| `start_session(config)` | Start conversation with system prompt |
| `end_session()` | End conversation |
| `send_audio_chunk(data)` | Send PCM16 audio (16kHz) |

## Events

| Event | Description | Action Required |
|-------|-------------|-----------------|
| `Audio` | AI's audio output | Play audio (PCM16 @ 24kHz) |
| `TurnComplete` | AI finished speaking | Ready for next input |
| `Interrupted` | User barged in | **Clear audio buffer!** |
| `Error` | Error occurred | Handle/log error |

### âš ī¸ Critical: Handle `Interrupted`

When the user speaks while the AI is responding, **you must clear your audio buffer**:

```rust
LiveSpeechEvent::Interrupted(_) => {
    audio_player.clear();  // Stop buffered audio immediately
    audio_player.stop();
}
```

Without this, 2-3 seconds of buffered audio continues playing after the user interrupts.
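
The `audio_player` used in these snippets is application code. As a rough illustration of what `queue`, `clear`, and `stop` could look like, here is a minimal in-memory sketch. The `AudioPlayer` type and all of its methods are hypothetical and are not part of `livespeech-sdk`; a real player would hand the chunks to an audio device.

```rust
use std::collections::VecDeque;

/// Hypothetical playback queue (NOT part of livespeech-sdk).
#[derive(Default)]
pub struct AudioPlayer {
    /// PCM16 chunks received from `Audio` events but not yet played.
    pending: VecDeque<Vec<u8>>,
    /// Whether chunks may currently be handed to the audio device.
    playing: bool,
}

impl AudioPlayer {
    pub fn new() -> Self {
        Self::default()
    }

    /// Buffer one chunk and (re)enable playback.
    pub fn queue(&mut self, chunk: &[u8]) {
        self.pending.push_back(chunk.to_vec());
        self.playing = true;
    }

    /// Drop everything that has not been played yet.
    pub fn clear(&mut self) {
        self.pending.clear();
    }

    /// Stop handing out chunks until new audio is queued.
    pub fn stop(&mut self) {
        self.playing = false;
    }

    /// Next chunk for the device callback, if playback is enabled.
    pub fn next_chunk(&mut self) -> Option<Vec<u8>> {
        if !self.playing {
            return None;
        }
        self.pending.pop_front()
    }

    /// Total number of buffered bytes still waiting to be played.
    pub fn buffered_bytes(&self) -> usize {
        self.pending.iter().map(Vec::len).sum()
    }
}
```

With this shape, calling `clear()` followed by `stop()` in the `Interrupted` arm leaves `buffered_bytes()` at zero, so no stale AI audio is left to play.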

## Audio Format

| Direction | Format | Sample Rate |
|-----------|--------|-------------|
| Input (mic) | PCM16 | 16,000 Hz |
| Output (AI) | PCM16 | 24,000 Hz |
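
To size buffers and chunks, it helps to turn these rates into byte counts. The sketch below assumes mono audio (the table does not state a channel count) and uses two hypothetical helpers, `pcm16_bytes` and `split_into_chunks`, that are not part of the SDK.

```rust
/// Bytes of mono PCM16 audio covering `millis` milliseconds at `sample_rate` Hz.
/// Assumption: one channel, 2 bytes per sample.
pub fn pcm16_bytes(sample_rate: u32, millis: u32) -> usize {
    // Compute whole samples first so the result is always sample-aligned (even).
    let samples = sample_rate as u64 * millis as u64 / 1000;
    (samples * 2) as usize
}

/// Split captured PCM16 bytes into chunks of roughly `millis` milliseconds each.
/// The last chunk may be shorter.
pub fn split_into_chunks(pcm: &[u8], sample_rate: u32, millis: u32) -> Vec<&[u8]> {
    // Guard against a zero chunk size, which `chunks` does not accept.
    let size = pcm16_bytes(sample_rate, millis).max(2);
    pcm.chunks(size).collect()
}
```

Under these assumptions, 100 ms of input at 16,000 Hz is 3,200 bytes and 100 ms of output at 24,000 Hz is 4,800 bytes. The chunk duration itself is a design choice: shorter chunks lower latency at the cost of more sends.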

## Configuration

```rust
let config = Config::builder()
    .region(Region::ApNortheast2)    // Required
    .api_key("your-api-key")          // Required
    .build()?;

let session = SessionConfig::new("You are a helpful assistant.")
    .with_language("ko-KR");          // Optional: ko-KR, en-US, ja-JP, etc.
```

---

# Advanced API

Optional features for power users.

## Additional Methods

| Method | Description |
|--------|-------------|
| `audio_start()` / `audio_end()` | Manual audio stream control |
| `interrupt()` | Explicitly stop AI response (for Stop button) |
| `send_system_message(text)` | Inject context during conversation |
| `send_tool_response(id, result)` | Reply to function calls |
| `update_user_id(user_id)` | Migrate guest to authenticated user |

## Additional Events

| Event | Description |
|-------|-------------|
| `Connected` / `Disconnected` | Connection lifecycle |
| `SessionStarted` / `SessionEnded` | Session lifecycle |
| `Ready` | Session ready for audio |
| `UserTranscript` | User's speech transcribed |
| `Response` | AI's response text |
| `ToolCall` | AI wants to call a function |
| `UserIdUpdated` | Guest-to-user migration complete |

---

## Explicit Interrupt (Stop Button)

For UI "Stop" buttons or programmatic control:

```rust
// User clicks Stop button
client.interrupt().await?;
```

Note: Voice barge-in works automatically via Gemini's voice activity detection (VAD). Use `interrupt()` only when you need explicit control.

---

## System Messages

Inject text context during live sessions (game events, app state, etc.):

```rust
// AI responds immediately
client.send_system_message("User completed level 5. Congratulate them!").await?;

// Context only, no response
client.send_system_message_with_options("User is browsing", false).await?;
```

> Requires an active live session (call `audio_start()` first). Messages are limited to 500 characters.
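
If your messages are built from dynamic data, you may want to enforce the length limit before sending. The helper below is a sketch under two assumptions: that the limit counts Unicode characters (it could instead count bytes), and that truncating is acceptable for your use case. `clamp_system_message` and `MAX_SYSTEM_MESSAGE_CHARS` are hypothetical names, not SDK items.

```rust
/// Assumed limit, taken from the note above; counted here in Unicode characters.
pub const MAX_SYSTEM_MESSAGE_CHARS: usize = 500;

/// Return at most the first `MAX_SYSTEM_MESSAGE_CHARS` characters of `text`,
/// cutting on a character boundary so multi-byte text stays valid UTF-8.
pub fn clamp_system_message(text: &str) -> &str {
    match text.char_indices().nth(MAX_SYSTEM_MESSAGE_CHARS) {
        // `byte_idx` is where the first character past the limit starts.
        Some((byte_idx, _)) => &text[..byte_idx],
        // The text is already within the limit.
        None => text,
    }
}
```

The result can be passed straight to `send_system_message`, for example `client.send_system_message(clamp_system_message(&msg)).await?`.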

---

## Function Calling (Tool Use)

Let the AI call functions in your app:

### 1. Define Tools

```rust
let tools = vec![Tool {
    name: "get_price".to_string(),
    description: "Gets product price by ID".to_string(),
    parameters: Some(FunctionParameters {
        r#type: "OBJECT".to_string(),
        properties: serde_json::json!({
            "productId": { "type": "string" }
        }),
        required: vec!["productId".to_string()],
    }),
}];

let session = SessionConfig::new("You are helpful.")
    .with_tools(tools);
```

### 2. Handle ToolCall Events

```rust
LiveSpeechEvent::ToolCall(e) => {
    let result = match e.name.as_str() {
        "get_price" => {
            let price = lookup_price(&e.args["productId"]);
            serde_json::json!({ "price": price })
        }
        _ => serde_json::json!({ "error": "Unknown" })
    };
    client.send_tool_response(&e.id, result).await.ok();
}
```
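
One way to keep the handler above small is to move the routing into plain Rust that knows nothing about the SDK, so it can be unit tested without a connection. The sketch below is only an illustration: `ToolRouter`, `ToolOutcome`, and the demo catalog are hypothetical, and prices are stored in cents to avoid floating-point comparisons.

```rust
use std::collections::HashMap;

/// Hypothetical result of handling one tool call (NOT an SDK type).
#[derive(Debug, PartialEq)]
pub enum ToolOutcome {
    Price { cents: u32 },
    UnknownProduct,
    UnknownTool,
}

/// Hypothetical router that maps tool names to application logic.
pub struct ToolRouter {
    prices_in_cents: HashMap<&'static str, u32>,
}

impl ToolRouter {
    /// Build a router with a tiny made-up catalog for demonstration.
    pub fn with_demo_catalog() -> Self {
        let prices_in_cents = HashMap::from([("sku-1", 999), ("sku-2", 2450)]);
        Self { prices_in_cents }
    }

    /// Route a call by tool name; `product_id` is the already-extracted argument.
    pub fn dispatch(&self, tool_name: &str, product_id: Option<&str>) -> ToolOutcome {
        match tool_name {
            "get_price" => match product_id.and_then(|id| self.prices_in_cents.get(id)) {
                Some(&cents) => ToolOutcome::Price { cents },
                None => ToolOutcome::UnknownProduct,
            },
            _ => ToolOutcome::UnknownTool,
        }
    }
}
```

In the event handler you would extract the argument (for example with `e.args["productId"].as_str()`), call `dispatch`, and convert the outcome into the JSON value passed to `send_tool_response`.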

---

## Conversation Memory

Enable persistent memory across sessions:

```rust
let config = Config::builder()
    .region(Region::ApNortheast2)
    .api_key("your-api-key")
    .user_id("user-123")  // Enables memory
    .build()?;
```

| Mode | Memory |
|------|--------|
| With `user_id` | Permanent (entities, summaries) |
| Without `user_id` | Session only (guest) |

### Guest-to-User Migration

```rust
// User logs in during session
client.update_user_id("authenticated-user-123").await?;

// Listen for confirmation (this arm belongs inside the `match` in your event loop)
LiveSpeechEvent::UserIdUpdated(e) => {
    println!("Migrated {} messages", e.migrated_messages);
}
```

---

## AI Speaks First

Have the AI open the conversation instead of waiting for the user:

```rust
let session = SessionConfig::new("Greet the customer warmly.")
    .with_ai_speaks_first(true);

client.start_session(Some(session)).await?;
client.audio_start().await?;  // AI speaks immediately
```

---

## Session Options

| Option | Default | Description |
|--------|---------|-------------|
| `prePrompt` | - | System prompt (the string passed to `SessionConfig::new`) |
| `language` | `"en-US"` | Language code |
| `pipeline_mode` | `Live` | `Live` (~300ms) or `Composed` (~1-2s) |
| `ai_speaks_first` | `false` | AI initiates (Live mode only) |
| `allow_harm_category` | `false` | Disable safety filters |
| `tools` | `[]` | Function definitions |

---

## Audio Utilities

```rust
use livespeech_sdk::{float32_to_int16, int16_to_bytes, wrap_pcm_in_wav};

let pcm = float32_to_int16(&float_samples);
let bytes = int16_to_bytes(&pcm);
let wav = wrap_pcm_in_wav(&bytes, 16000, 1, 16);
```
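
For readers who want to see what a float-to-PCM16 conversion typically involves, here is a standalone sketch. It is not the SDK's implementation: the function names are hypothetical, and the choices made here (clamping to [-1.0, 1.0], scaling by `i16::MAX`, little-endian byte order as used in WAV files) are assumptions about a common approach.

```rust
/// Convert float samples in [-1.0, 1.0] to 16-bit integers.
/// Out-of-range input is clamped rather than wrapped.
pub fn f32_to_i16_sketch(samples: &[f32]) -> Vec<i16> {
    samples
        .iter()
        .map(|&s| (s.clamp(-1.0, 1.0) * i16::MAX as f32).round() as i16)
        .collect()
}

/// Serialize 16-bit samples as little-endian bytes (2 bytes per sample).
pub fn i16_to_le_bytes_sketch(samples: &[i16]) -> Vec<u8> {
    samples.iter().flat_map(|s| s.to_le_bytes()).collect()
}
```

Clamping matters because microphone pipelines occasionally produce samples slightly outside the nominal range, and a plain cast after scaling would otherwise distort them.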

---

## Error Handling

```rust
match client.connect().await {
    Ok(()) => println!("Connected"),
    Err(LiveSpeechError::ConnectionTimeout) => eprintln!("Timed out"),
    Err(LiveSpeechError::NotConnected) => eprintln!("Not connected"),
    Err(e) => eprintln!("Error: {}", e),
}
```

---

## Regions

| Region | Code |
|--------|------|
| Seoul (Korea) | `Region::ApNortheast2` |

## License

MIT