LiveSpeech SDK for Rust

A Rust SDK for real-time speech-to-speech AI conversations.

Installation

Add to your Cargo.toml:

[dependencies]
livespeech-sdk = "0.1"
tokio = { version = "1.35", features = ["full"] }

Quick Start

use livespeech_sdk::{Config, LiveSpeechClient, Region, SessionConfig, PipelineMode};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create configuration using region
    let config = Config::builder()
        .region(Region::ApNortheast2)  // Asia Pacific (Seoul)
        .api_key("your-api-key")
        .build()?;

    // Create client
    let client = LiveSpeechClient::new(config);

    // Set up event handlers
    client.on_user_transcript(|text| {
        println!("You said: {}", text);
    }).await;

    client.on_transcript(|text, is_final| {
        println!("AI: {} (final: {})", text, is_final);
    }).await;

    client.on_audio(|audio_data| {
        // Play audio through speakers
    }).await;

    // Connect and start session
    client.connect().await?;
    
    // Uses Gemini Live by default
    let session_config = SessionConfig::new("You are a helpful assistant.");
    client.start_session(session_config).await?;

    // Start streaming and send audio
    client.audio_start().await?;

    // `audio_data` stands in for PCM bytes captured from your microphone
    let audio_data: Vec<u8> = Vec::new();
    client.send_audio_chunk(&audio_data).await?;

    Ok(())
}
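
Note that this example returns immediately after sending one chunk; a real application keeps the task alive while audio streams in both directions (see Events below for a receive loop).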

Pipeline Modes

The SDK supports two pipeline modes for audio processing:

Live Mode (Default)

Uses Gemini 2.5 Flash Live API for end-to-end audio conversation:

  • Lower latency - Direct audio-to-audio processing
  • Natural conversation - Built-in voice activity detection
  • Real-time transcription - Both user and AI speech transcribed

let session_config = SessionConfig::new("You are a helpful assistant.")
    .with_pipeline_mode(PipelineMode::Live);  // Default, can be omitted

Composed Mode

Uses separate STT + LLM + TTS services for more customization:

  • More control - Separate services for each step
  • Custom voices - Use different TTS voices
  • Text responses - Access to intermediate text responses

let session_config = SessionConfig::new("You are a helpful assistant.")
    .with_pipeline_mode(PipelineMode::Composed);
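
A composed session typically pairs with the on_response handler (documented under Event Handlers below), since intermediate text responses are only emitted in this mode. A minimal sketch:

let session_config = SessionConfig::new("You are a helpful assistant.")
    .with_pipeline_mode(PipelineMode::Composed);

// Text responses are only emitted in composed mode
client.on_response(|text, is_final| {
    println!("AI response: {} (final: {})", text, is_final);
}).await;

client.start_session(session_config).await?;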

API Reference

Regions

The SDK provides built-in region support, so you don't need to remember endpoint URLs:

Region           Variant                 Location
ap-northeast-2   Region::ApNortheast2    Asia Pacific (Seoul)
us-west-2        Region::UsWest2         US West (Oregon), coming soon

Config

Use the builder pattern to create configuration:

let config = Config::builder()
    .region(Region::ApNortheast2)    // Required
    .api_key("...")                  // Required
    .connection_timeout(Duration::from_secs(30))
    .auto_reconnect(true)
    .max_reconnect_attempts(5)
    .reconnect_delay(Duration::from_secs(1))
    .debug(false)
    .build()?;

LiveSpeechClient

Methods

Method                      Description
connect()                   Connect to the server
disconnect()                Disconnect from the server
start_session(config)       Start a conversation session
end_session()               End the current session
audio_start()               Begin streaming audio (see Quick Start)
send_audio_chunk(data)      Send a chunk of streamed audio
send_audio(data, format)    Send audio data to be transcribed
connection_state()          Get the current connection state
is_connected()              Check whether the client is connected
has_active_session()        Check whether a session is active
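
Putting these together, a typical session lifecycle looks like the sketch below. It assumes end_session() and disconnect() are async like the other calls, and uses the streaming calls from the Quick Start:

client.connect().await?;
assert!(client.is_connected());

let session_config = SessionConfig::new("You are a helpful assistant.");
client.start_session(session_config).await?;

// `audio_data` stands in for captured PCM bytes
client.audio_start().await?;
client.send_audio_chunk(&audio_data).await?;

client.end_session().await?;
client.disconnect().await?;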

Event Handlers

// User transcript handler (user's speech)
client.on_user_transcript(|text| {
    println!("You said: {}", text);
}).await;

// Transcript handler (AI's speech transcription in live mode)
client.on_transcript(|text, is_final| {
    println!("AI transcript: {} (final: {})", text, is_final);
}).await;

// Response handler (AI text response in composed mode)
client.on_response(|text, is_final| {
    println!("AI response: {} (final: {})", text, is_final);
}).await;

// Audio handler (audio bytes)
client.on_audio(|audio_data| {
    // Process audio
}).await;

// Error handler (ErrorEvent)
client.on_error(|event| {
    eprintln!("Error: {}", event.message);
}).await;
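
Handlers should stay cheap; a common pattern is to hand data off to a channel and do the heavy lifting (such as playback) on a separate task. A sketch using tokio's mpsc, assuming on_audio passes the raw audio bytes:

use tokio::sync::mpsc;

let (tx, mut rx) = mpsc::unbounded_channel::<Vec<u8>>();

// Forward bytes out of the handler; the sender is cheap to call repeatedly
client.on_audio(move |audio_data| {
    let _ = tx.send(audio_data.to_vec());
}).await;

tokio::spawn(async move {
    while let Some(chunk) = rx.recv().await {
        // Feed `chunk` to your audio output (e.g., cpal or rodio)
    }
});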

SessionConfig

// Simple creation (uses Live mode by default)
let config = SessionConfig::new("You are a helpful assistant.");

// Builder pattern for more options
let config = SessionConfig::new("You are a helpful assistant.")
    .with_language("ko-KR")
    .with_pipeline_mode(PipelineMode::Composed);

// Empty config (uses defaults)
let config = SessionConfig::empty();

Option          Type             Default   Description
pre_prompt      Option<String>   None      System prompt for the AI
language        Option<String>   None      Language code (e.g., "ko-KR")
pipeline_mode   PipelineMode     Live      Audio processing mode

AudioFormat

pub enum AudioFormat {
    Pcm16,  // 16-bit PCM (default)
    Opus,   // Opus encoded
    Wav,    // WAV format
}
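
The format is passed alongside the bytes in send_audio (see the methods table above). A minimal sketch, assuming wav holds a complete WAV file (for example, from wrap_pcm_in_wav below):

client.send_audio(&wav, AudioFormat::Wav).await?;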

Audio Utilities

use livespeech_sdk::{
    encode_to_base64,
    decode_from_base64,
    float32_to_int16,
    int16_to_float32,
    wrap_pcm_in_wav,
    AudioEncoder,
};

// Convert f32 samples to i16 PCM (assumed to return Vec<i16>)
let pcm = float32_to_int16(&float_samples);

// Serialize the samples to little-endian bytes, then wrap the raw PCM
// in a WAV header (16 kHz sample rate, 1 channel, 16 bits per sample)
let pcm_bytes: Vec<u8> = pcm.iter().flat_map(|s| s.to_le_bytes()).collect();
let wav = wrap_pcm_in_wav(&pcm_bytes, 16000, 1, 16);

// Use AudioEncoder for convenience
let encoder = AudioEncoder::new();
let base64 = encoder.encode(&audio_bytes);
let decoded = encoder.decode(&base64)?;
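
The remaining helpers cover the receive path. A sketch decoding a base64 payload (base64_audio here is a placeholder) back into f32 samples for playback, assuming little-endian 16-bit PCM:

// Decode base64 audio received from the server
let pcm_bytes = decode_from_base64(&base64_audio)?;

// Reinterpret the bytes as i16 samples, then convert to f32
let samples: Vec<i16> = pcm_bytes
    .chunks_exact(2)
    .map(|b| i16::from_le_bytes([b[0], b[1]]))
    .collect();
let float_samples = int16_to_float32(&samples);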

Events

All events can be received through the event channel:

let mut events = client.events().await;

while let Some(event) = events.recv().await {
    match event {
        LiveSpeechEvent::Connected(e) => println!("Connected: {}", e.connection_id),
        LiveSpeechEvent::UserTranscript(e) => println!("You said: {}", e.text),
        LiveSpeechEvent::Transcript(e) => println!("AI transcript: {}", e.text),
        LiveSpeechEvent::Response(e) => println!("AI response: {}", e.text),
        LiveSpeechEvent::Audio(_audio) => { /* play the audio bytes */ },
        LiveSpeechEvent::Ready(_) => println!("Ready for audio"),
        LiveSpeechEvent::TurnComplete(_) => println!("AI finished speaking"),
        LiveSpeechEvent::Error(e) => eprintln!("Error: {}", e.message),
        _ => {}
    }
}
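
Because events() returns a receiver, the loop can run on its own task while the main task keeps capturing and sending audio:

let mut events = client.events().await;

tokio::spawn(async move {
    while let Some(event) = events.recv().await {
        // Match on LiveSpeechEvent variants as above
    }
});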

Error Handling

The SDK uses a custom error type:

use livespeech_sdk::{LiveSpeechError, Result};

match client.connect().await {
    Ok(()) => println!("Connected!"),
    Err(LiveSpeechError::ConnectionTimeout) => eprintln!("Timeout"),
    Err(LiveSpeechError::AuthenticationFailed(msg)) => eprintln!("Auth failed: {}", msg),
    Err(e) => eprintln!("Error: {}", e),
}
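
When auto_reconnect is disabled, a manual retry loop works too. A sketch that retries only timeouts (the attempt limit and delay here are arbitrary):

use std::time::Duration;

let mut attempts = 0;
loop {
    match client.connect().await {
        Ok(()) => break,
        Err(LiveSpeechError::ConnectionTimeout) if attempts < 3 => {
            attempts += 1;
            tokio::time::sleep(Duration::from_secs(1)).await;
        }
        Err(e) => return Err(e.into()),
    }
}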

License

MIT