livespeech-sdk 0.1.8


LiveSpeech SDK for Rust

A Rust SDK for real-time speech-to-speech AI conversations.

Features

  • 🎙️ Real-time Voice Conversations - Natural, low-latency voice interactions
  • 🌐 Multi-language Support - Korean, English, Japanese, Chinese, and more
  • 🔊 Streaming Audio - Send and receive audio in real-time
  • 📝 Live Transcription - Get transcriptions of both user and AI speech
  • 🔄 Auto-reconnection - Automatic recovery from network issues

Installation

Add to your Cargo.toml:

[dependencies]
livespeech-sdk = "0.1"
tokio = { version = "1.35", features = ["full"] }

Quick Start

use livespeech_sdk::{Config, LiveSpeechClient, LiveSpeechEvent, Region, SessionConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create client
    let config = Config::builder()
        .region(Region::ApNortheast2)
        .api_key("your-api-key")
        .build()?;

    let client = LiveSpeechClient::new(config);

    // Subscribe to events
    let mut events = client.subscribe();
    tokio::spawn(async move {
        while let Ok(event) = events.recv().await {
            match event {
                LiveSpeechEvent::Ready(_) => println!("Ready for audio"),
                LiveSpeechEvent::UserTranscript(e) => println!("You: {}", e.text),
                LiveSpeechEvent::Response(e) => println!("AI: {}", e.text),
                LiveSpeechEvent::Audio(e) => {
                    // Play audio: e.data (PCM16), e.sample_rate (24,000 Hz)
                }
                LiveSpeechEvent::TurnComplete(_) => println!("AI finished"),
                LiveSpeechEvent::Error(e) => eprintln!("Error: {}", e.message),
                _ => {}
            }
        }
    });

    // Connect and start session
    client.connect().await?;
    client.start_session(Some(SessionConfig::new("You are a helpful assistant."))).await?;

    // Stream audio (PCM16, 16 kHz mono)
    let audio_data: Vec<u8> = Vec::new(); // placeholder: your captured microphone bytes
    client.audio_start().await?;
    client.send_audio_chunk(&audio_data).await?;
    client.audio_end().await?;

    // Cleanup
    client.end_session().await?;
    client.disconnect().await;

    Ok(())
}

Audio Flow

connect() → start_session() → audio_start() → send_audio_chunk()* → audio_end() → end_session()
                                     ↓
                           send_system_message() (optional, during live session)
                           send_tool_response() (when toolCall received)
| Step | Description |
|------|-------------|
| `connect()` | Establish connection |
| `start_session(config)` | Start conversation with optional system prompt |
| `audio_start()` | Begin audio streaming |
| `send_audio_chunk(data)` | Send PCM16 audio (call multiple times) |
| `send_system_message(msg)` | Inject context or trigger AI response (optional) |
| `send_tool_response(id, result)` | Send function result back to AI (after `ToolCall`) |
| `audio_end()` | End streaming, triggers AI response |
| `end_session()` | End conversation |
| `disconnect()` | Close connection |

Configuration

let config = Config::builder()
    .region(Region::ApNortheast2)    // Required: Seoul region
    .api_key("your-api-key")          // Required: Your API key
    .user_id("user-123")              // Optional: Enable conversation memory
    .auto_reconnect(true)             // Auto-reconnect on disconnect
    .debug(false)                     // Enable debug logging
    .build()?;

let session = SessionConfig::new("You are a helpful assistant.")
    .with_language("ko-KR")           // Language: ko-KR, en-US, ja-JP, etc.
    .with_pipeline_mode(PipelineMode::Live)  // 'live' (default) or 'composed'
    .with_ai_speaks_first(false)      // AI speaks first (live mode only)
    .with_allow_harm_category(false)  // Disable safety filtering (use with caution)
    .with_tools(vec![tool]);          // Function calling

Session Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `prePrompt` | `&str` | - | System prompt for the AI assistant |
| `language` | `&str` | `"en-US"` | Language code (e.g., `ko-KR`, `ja-JP`) |
| `pipeline_mode` | `PipelineMode` | `Live` | Audio processing mode |
| `ai_speaks_first` | `bool` | `false` | AI initiates conversation (live mode only) |
| `allow_harm_category` | `bool` | `false` | Disable content safety filtering |
| `tools` | `Vec<Tool>` | `vec![]` | Function definitions for AI to call |

Pipeline Modes

| Mode | Latency | Description |
|------|---------|-------------|
| `Live` | Lower (~300 ms) | Direct audio-to-audio via Live API |
| `Composed` | Higher (~1-2 s) | Separate STT → LLM → TTS pipeline |

AI Speaks First

When with_ai_speaks_first(true) is set, the AI immediately speaks a greeting based on your prePrompt:

let session = SessionConfig::new(
    "You are a customer service agent. Greet the customer warmly and ask how you can help."
)
.with_ai_speaks_first(true);

client.start_session(Some(session)).await?;
client.audio_start().await?;  // AI greeting plays immediately

⚠️ Note: Only works with PipelineMode::Live

Content Safety

By default, the LLM applies content safety filtering. Set with_allow_harm_category(true) to disable it:

let session = SessionConfig::new("You are an assistant.")
    .with_allow_harm_category(true);  // ⚠️ Disables all safety filters

⚠️ Warning: Only use in controlled environments where content moderation is handled by other means.

Function Calling (Tool Use)

Define functions that the AI can call during conversation. When the AI decides to call a function, you receive a ToolCall event and must respond with send_tool_response().

Define Tools

use livespeech_sdk::{Tool, FunctionParameters};

let tools = vec![
    Tool {
        name: "open_login".to_string(),
        description: "Opens Google Login popup when user wants to sign in".to_string(),
        parameters: Some(FunctionParameters {
            r#type: "OBJECT".to_string(),
            properties: serde_json::json!({}),
            required: vec![],
        }),
    },
    Tool {
        name: "get_price".to_string(),
        description: "Gets product price by ID".to_string(),
        parameters: Some(FunctionParameters {
            r#type: "OBJECT".to_string(),
            properties: serde_json::json!({
                "productId": { "type": "string", "description": "Product ID" }
            }),
            required: vec!["productId".to_string()],
        }),
    },
];

let session = SessionConfig::new("You are a helpful assistant. Use tools when appropriate.")
    .with_tools(tools);

client.start_session(Some(session)).await?;

Handle Tool Calls

let mut events = client.subscribe();

tokio::spawn(async move {
    while let Ok(event) = events.recv().await {
        if let LiveSpeechEvent::ToolCall(e) = event {
            println!("AI wants to call: {}", e.name);
            println!("With arguments: {:?}", e.args);
            
            let result = match e.name.as_str() {
                "open_login" => {
                    show_login_modal();
                    serde_json::json!({ "success": true })
                }
                "get_price" => {
                    let product_id = e.args["productId"].as_str().unwrap_or("");
                    let price = get_product_price(product_id);
                    serde_json::json!({ "price": price, "currency": "USD" })
                }
                _ => serde_json::json!({ "error": "Unknown function" })
            };
            
            client.send_tool_response(&e.id, result).await.ok();
        }
    }
});

⚠️ Note: Function calling only works with PipelineMode::Live

System Messages

During an active live session, you can inject text messages to the AI using send_system_message(). This is useful for:

  • Game events ("User completed level 5, congratulate them!")
  • App state changes ("User opened the cart with 3 items")
  • Timer/engagement triggers ("User has been quiet, engage them")
  • External data updates ("Weather changed to rainy")

Usage

// Simple usage - AI responds immediately
client.send_system_message("User just completed level 5. Congratulate them!").await?;

// With trigger_response option - context only, no immediate response
client.send_system_message_with_options("User is browsing the cart", false).await?;

Methods

| Method | Description |
|--------|-------------|
| `send_system_message(text)` | Send message, AI responds immediately |
| `send_system_message_with_options(text, trigger_response)` | Send with explicit trigger option |

Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `text` | `&str` | Yes | - | Message text (max 500 chars) |
| `trigger_response` | `bool` | No | `true` | AI responds immediately if `true` |

⚠️ Note: Requires an active live session (audio_start() must have been called). Only works with PipelineMode::Live.

Conversation Memory

When you provide a user_id, the SDK enables persistent conversation memory:

  • Entity Memory: AI remembers facts shared in previous sessions (names, preferences, relationships)
  • Session Summaries: Recent conversation summaries are available to the AI
  • Cross-Session: Memory persists across sessions for the same user_id

// With memory (authenticated user)
let config = Config::builder()
    .region(Region::ApNortheast2)
    .api_key("your-api-key")
    .user_id("user-123")  // Enables conversation memory
    .build()?;

// Without memory (guest)
let config = Config::builder()
    .region(Region::ApNortheast2)
    .api_key("your-api-key")
    // No user_id = guest mode, no persistent memory
    .build()?;

| Mode | Memory Persistence | Use Case |
|------|--------------------|----------|
| With `user_id` | Permanent | Authenticated users |
| Without `user_id` | Session only | Guests, anonymous users |

Events

| Event | Description | Key Fields |
|-------|-------------|------------|
| `Connected` | Connection established | `connection_id` |
| `Disconnected` | Connection closed | `reason` |
| `SessionStarted` | Session created | `session_id` |
| `Ready` | Ready for audio input | - |
| `UserTranscript` | Your speech transcribed | `text` |
| `Response` | AI's response text | `text`, `is_final` |
| `Audio` | AI's audio output | `data`, `sample_rate` |
| `TurnComplete` | AI finished speaking | - |
| `ToolCall` | AI wants to call a function | `id`, `name`, `args` |
| `Error` | Error occurred | `code`, `message` |

Event Subscription

let mut events = client.subscribe();

tokio::spawn(async move {
    while let Ok(event) = events.recv().await {
        match event {
            LiveSpeechEvent::UserTranscript(e) => {
                println!("You said: {}", e.text);
            }
            LiveSpeechEvent::Response(e) => {
                println!("AI: {} (final: {})", e.text, e.is_final);
            }
            LiveSpeechEvent::Audio(e) => {
                // e.data: Vec<u8> - PCM16 audio
                // e.sample_rate: u32 - 24000 Hz
                play_audio(&e.data, e.sample_rate);
            }
            LiveSpeechEvent::TurnComplete(_) => {
                println!("AI finished responding");
            }
            LiveSpeechEvent::ToolCall(e) => {
                // e.id: String - use with send_tool_response
                // e.name: String - function name
                // e.args: Value - function arguments
                let result = handle_tool_call(&e.name, &e.args);
                client.send_tool_response(&e.id, result).await.ok();
            }
            LiveSpeechEvent::Error(e) => {
                eprintln!("Error [{:?}]: {}", e.code, e.message);
            }
            _ => {}
        }
    }
});

Convenience Handlers

// AI's text response
client.on_response(|text, _is_final| {
    println!("AI: {}", text);
}).await;

// AI's audio output
client.on_audio(|audio_data| {
    play_audio(audio_data);
}).await;

// Error handling
client.on_error(|error| {
    eprintln!("Error: {}", error.message);
}).await;

Audio Format

Input (Your Microphone)

| Property | Value |
|----------|-------|
| Format | PCM16 (16-bit signed, little-endian) |
| Sample Rate | 16,000 Hz |
| Channels | 1 (Mono) |
| Chunk Size | ~3200 bytes (100 ms) |
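The ~3200-byte chunk size follows from the format itself: 16,000 samples/s × 2 bytes/sample × 1 channel × 0.1 s = 3,200 bytes. A small helper (our own convenience function, not part of livespeech-sdk) that computes the buffer size for any chunk duration:

```rust
/// Bytes needed to hold `ms` milliseconds of PCM16 audio.
/// Hypothetical helper for sizing capture buffers; not part of the SDK.
fn pcm16_chunk_bytes(sample_rate: u32, channels: u32, ms: u32) -> usize {
    // 2 bytes per sample (16-bit), scaled by duration in milliseconds.
    (sample_rate as usize * channels as usize * 2 * ms as usize) / 1000
}

fn main() {
    // 100 ms of 16 kHz mono input audio, matching the table above.
    assert_eq!(pcm16_chunk_bytes(16_000, 1, 100), 3200);
    // The same duration of 24 kHz AI output occupies 4800 bytes.
    assert_eq!(pcm16_chunk_bytes(24_000, 1, 100), 4800);
}
```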

Output (AI Response)

| Property | Value |
|----------|-------|
| Format | PCM16 (16-bit signed, little-endian) |
| Sample Rate | 24,000 Hz |
| Channels | 1 (Mono) |
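Not every audio device can open a 24 kHz stream, so you may need to resample the AI output before playback. A minimal linear-interpolation sketch (illustrative only; a dedicated resampling crate such as rubato gives far better quality):

```rust
/// Naive linear-interpolation resampler for mono i16 PCM.
/// Illustrative sketch, not part of the SDK; quality is adequate for
/// experiments but a proper resampler should be used in production.
fn resample_linear(input: &[i16], from_hz: u32, to_hz: u32) -> Vec<i16> {
    if input.is_empty() {
        return Vec::new();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    (0..out_len)
        .map(|i| {
            // Map the output index back to a fractional input position.
            let pos = i as f64 * from_hz as f64 / to_hz as f64;
            let idx = pos as usize;
            let frac = pos - idx as f64;
            let a = input[idx] as f64;
            let b = input[(idx + 1).min(input.len() - 1)] as f64;
            (a + (b - a) * frac) as i16
        })
        .collect()
}

fn main() {
    // Upsampling 24 kHz AI output to 48 kHz doubles the sample count.
    let out = resample_linear(&[0, 100, 200], 24_000, 48_000);
    assert_eq!(out.len(), 6);
    assert_eq!(out[1], 50); // interpolated midpoint between 0 and 100
}
```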

Audio Utilities

use livespeech_sdk::{
    encode_to_base64, decode_from_base64,
    float32_to_int16, int16_to_float32,
    int16_to_bytes, bytes_to_int16,
    wrap_pcm_in_wav, AudioEncoder,
};

// Convert f32 samples to PCM16 bytes
let float_samples: Vec<f32> = vec![0.0; 1600]; // e.g. 100 ms of captured 16 kHz audio
let pcm_samples = float32_to_int16(&float_samples);
let pcm_bytes = int16_to_bytes(&pcm_samples);

// Create WAV file
let wav = wrap_pcm_in_wav(&pcm_bytes, 16000, 1, 16);
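For reference, the conventions behind these helpers are simple. Below is a self-contained sketch of the two conversions (our own implementation, assuming the common clamp-and-scale mapping and little-endian byte order; the SDK's versions may differ in edge-case handling):

```rust
/// f32 samples in [-1.0, 1.0] to i16, clamping out-of-range values.
/// Sketch of the usual convention; not the SDK's actual implementation.
fn float32_to_int16_sketch(samples: &[f32]) -> Vec<i16> {
    samples
        .iter()
        .map(|&s| (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16)
        .collect()
}

/// i16 samples to little-endian PCM16 bytes, as the wire format expects.
fn int16_to_bytes_sketch(samples: &[i16]) -> Vec<u8> {
    samples.iter().flat_map(|s| s.to_le_bytes()).collect()
}

fn main() {
    // Out-of-range input (2.0) is clamped rather than wrapped.
    let pcm = float32_to_int16_sketch(&[0.0, 1.0, -1.0, 2.0]);
    assert_eq!(pcm, vec![0, 32767, -32767, 32767]);
    // Little-endian layout: low byte first.
    let bytes = int16_to_bytes_sketch(&[1i16, -1]);
    assert_eq!(bytes, vec![1, 0, 255, 255]);
}
```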

Error Handling

match client.connect().await {
    Ok(()) => println!("Connected"),
    Err(LiveSpeechError::ConnectionTimeout) => eprintln!("Timed out"),
    Err(LiveSpeechError::NotConnected) => eprintln!("Not connected"),
    Err(e) => eprintln!("Error: {}", e),
}

Regions

| Region | Code | Location |
|--------|------|----------|
| Asia Pacific (Seoul) | `Region::ApNortheast2` | Korea |

License

MIT