livespeech-sdk 0.1.8


LiveSpeech SDK for Rust

A Rust SDK for real-time speech-to-speech AI conversations.

Features

  • 🎙️ Real-time Voice Conversations - Natural, low-latency voice interactions
  • 🌐 Multi-language Support - Korean, English, Japanese, Chinese, and more
  • 🔊 Streaming Audio - Send and receive audio in real-time
  • 📝 Live Transcription - Get transcriptions of both user and AI speech
  • 🔄 Auto-reconnection - Automatic recovery from network issues

Installation

Add to your Cargo.toml:

[dependencies]
livespeech-sdk = "0.1"
tokio = { version = "1.35", features = ["full"] }

Quick Start

use livespeech_sdk::{Config, LiveSpeechClient, LiveSpeechEvent, Region, SessionConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create client
    let config = Config::builder()
        .region(Region::ApNortheast2)
        .api_key("your-api-key")
        .build()?;

    let client = LiveSpeechClient::new(config);

    // Subscribe to events
    let mut events = client.subscribe();
    tokio::spawn(async move {
        while let Ok(event) = events.recv().await {
            match event {
                LiveSpeechEvent::Ready(_) => println!("Ready for audio"),
                LiveSpeechEvent::UserTranscript(e) => println!("You: {}", e.text),
                LiveSpeechEvent::Response(e) => println!("AI: {}", e.text),
                LiveSpeechEvent::Audio(e) => {
                    // Play audio: e.data (PCM16), e.sample_rate (24,000 Hz)
                }
                LiveSpeechEvent::TurnComplete(_) => println!("AI finished"),
                LiveSpeechEvent::Error(e) => eprintln!("Error: {}", e.message),
                _ => {}
            }
        }
    });

    // Connect and start session
    client.connect().await?;
    client.start_session(Some(SessionConfig::new("You are a helpful assistant."))).await?;

    // Stream audio (PCM16, 16 kHz mono)
    let audio_data: Vec<u8> = Vec::new(); // placeholder: your captured microphone bytes
    client.audio_start().await?;
    client.send_audio_chunk(&audio_data).await?;
    client.audio_end().await?;

    // Cleanup
    client.end_session().await?;
    client.disconnect().await;

    Ok(())
}

Audio Flow

connect() → start_session() → audio_start() → send_audio_chunk()* → audio_end() → end_session()
                                     ↓
                           send_system_message() (optional, during live session)
                           send_tool_response() (when toolCall received)
| Step | Description |
|------|-------------|
| `connect()` | Establish connection |
| `start_session(config)` | Start conversation with optional system prompt |
| `audio_start()` | Begin audio streaming |
| `send_audio_chunk(data)` | Send PCM16 audio (call multiple times) |
| `send_system_message(msg)` | Inject context or trigger AI response (optional) |
| `send_tool_response(id, result)` | Send function result back to AI (after `ToolCall`) |
| `audio_end()` | End streaming, triggers AI response |
| `end_session()` | End conversation |
| `disconnect()` | Close connection |

Configuration

let config = Config::builder()
    .region(Region::ApNortheast2)    // Required: Seoul region
    .api_key("your-api-key")          // Required: Your API key
    .user_id("user-123")              // Optional: Enable conversation memory
    .auto_reconnect(true)             // Auto-reconnect on disconnect
    .debug(false)                     // Enable debug logging
    .build()?;

let session = SessionConfig::new("You are a helpful assistant.")
    .with_language("ko-KR")           // Language: ko-KR, en-US, ja-JP, etc.
    .with_pipeline_mode(PipelineMode::Live)  // 'live' (default) or 'composed'
    .with_ai_speaks_first(false)      // AI speaks first (live mode only)
    .with_allow_harm_category(false)  // Disable safety filtering (use with caution)
    .with_tools(vec![tool]);          // Function calling

Session Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `prePrompt` | `&str` | - | System prompt for the AI assistant |
| `language` | `&str` | `"en-US"` | Language code (e.g., `ko-KR`, `ja-JP`) |
| `pipeline_mode` | `PipelineMode` | `Live` | Audio processing mode |
| `ai_speaks_first` | `bool` | `false` | AI initiates conversation (live mode only) |
| `allow_harm_category` | `bool` | `false` | Disable content safety filtering |
| `tools` | `Vec<Tool>` | `vec![]` | Function definitions for AI to call |

Pipeline Modes

| Mode | Latency | Description |
|------|---------|-------------|
| `Live` | Lower (~300 ms) | Direct audio-to-audio via Live API |
| `Composed` | Higher (~1-2 s) | Separate STT → LLM → TTS pipeline |

AI Speaks First

When with_ai_speaks_first(true) is set, the AI immediately speaks a greeting based on your prePrompt:

let session = SessionConfig::new(
    "You are a customer service agent. Greet the customer warmly and ask how you can help."
)
.with_ai_speaks_first(true);

client.start_session(Some(session)).await?;
client.audio_start().await?;  // AI greeting plays immediately

⚠️ Note: Only works with PipelineMode::Live

Content Safety

By default, the LLM applies content safety filtering. Set with_allow_harm_category(true) to disable it:

let session = SessionConfig::new("You are an assistant.")
    .with_allow_harm_category(true);  // ⚠️ Disables all safety filters

⚠️ Warning: Only use in controlled environments where content moderation is handled by other means.

Function Calling (Tool Use)

Define functions that the AI can call during conversation. When the AI decides to call a function, you receive a ToolCall event and must respond with send_tool_response().

Define Tools

use livespeech_sdk::{Tool, FunctionParameters};

let tools = vec![
    Tool {
        name: "open_login".to_string(),
        description: "Opens Google Login popup when user wants to sign in".to_string(),
        parameters: Some(FunctionParameters {
            r#type: "OBJECT".to_string(),
            properties: serde_json::json!({}),
            required: vec![],
        }),
    },
    Tool {
        name: "get_price".to_string(),
        description: "Gets product price by ID".to_string(),
        parameters: Some(FunctionParameters {
            r#type: "OBJECT".to_string(),
            properties: serde_json::json!({
                "productId": { "type": "string", "description": "Product ID" }
            }),
            required: vec!["productId".to_string()],
        }),
    },
];

let session = SessionConfig::new("You are a helpful assistant. Use tools when appropriate.")
    .with_tools(tools);

client.start_session(Some(session)).await?;

Handle Tool Calls

let mut events = client.subscribe();

tokio::spawn(async move {
    while let Ok(event) = events.recv().await {
        if let LiveSpeechEvent::ToolCall(e) = event {
            println!("AI wants to call: {}", e.name);
            println!("With arguments: {:?}", e.args);
            
            let result = match e.name.as_str() {
                "open_login" => {
                    show_login_modal();
                    serde_json::json!({ "success": true })
                }
                "get_price" => {
                    let product_id = e.args["productId"].as_str().unwrap_or("");
                    let price = get_product_price(product_id);
                    serde_json::json!({ "price": price, "currency": "USD" })
                }
                _ => serde_json::json!({ "error": "Unknown function" })
            };
            
            client.send_tool_response(&e.id, result).await.ok();
        }
    }
});

⚠️ Note: Function calling only works with PipelineMode::Live

System Messages

During an active live session, you can inject text messages to the AI using send_system_message(). This is useful for:

  • Game events ("User completed level 5, congratulate them!")
  • App state changes ("User opened the cart with 3 items")
  • Timer/engagement triggers ("User has been quiet, engage them")
  • External data updates ("Weather changed to rainy")

Usage

// Simple usage - AI responds immediately
client.send_system_message("User just completed level 5. Congratulate them!").await?;

// With trigger_response option - context only, no immediate response
client.send_system_message_with_options("User is browsing the cart", false).await?;

Methods

| Method | Description |
|--------|-------------|
| `send_system_message(text)` | Send message, AI responds immediately |
| `send_system_message_with_options(text, trigger_response)` | Send with explicit trigger option |

Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `text` | `&str` | Yes | - | Message text (max 500 chars) |
| `trigger_response` | `bool` | No | `true` | AI responds immediately if `true` |

⚠️ Note: Requires an active live session (audio_start() must have been called). Only works with PipelineMode::Live.

Conversation Memory

When you provide a user_id, the SDK enables persistent conversation memory:

  • Entity Memory: AI remembers facts shared in previous sessions (names, preferences, relationships)
  • Session Summaries: Recent conversation summaries are available to the AI
  • Cross-Session: Memory persists across sessions for the same user_id

// With memory (authenticated user)
let config = Config::builder()
    .region(Region::ApNortheast2)
    .api_key("your-api-key")
    .user_id("user-123")  // Enables conversation memory
    .build()?;

// Without memory (guest)
let config = Config::builder()
    .region(Region::ApNortheast2)
    .api_key("your-api-key")
    // No user_id = guest mode, no persistent memory
    .build()?;

| Mode | Memory Persistence | Use Case |
|------|--------------------|----------|
| With `user_id` | Permanent | Authenticated users |
| Without `user_id` | Session only | Guests, anonymous users |

Events

| Event | Description | Key Fields |
|-------|-------------|------------|
| `Connected` | Connection established | `connection_id` |
| `Disconnected` | Connection closed | `reason` |
| `SessionStarted` | Session created | `session_id` |
| `Ready` | Ready for audio input | - |
| `UserTranscript` | Your speech transcribed | `text` |
| `Response` | AI's response text | `text`, `is_final` |
| `Audio` | AI's audio output | `data`, `sample_rate` |
| `TurnComplete` | AI finished speaking | - |
| `ToolCall` | AI wants to call a function | `id`, `name`, `args` |
| `Error` | Error occurred | `code`, `message` |

Event Subscription

let mut events = client.subscribe();

tokio::spawn(async move {
    while let Ok(event) = events.recv().await {
        match event {
            LiveSpeechEvent::UserTranscript(e) => {
                println!("You said: {}", e.text);
            }
            LiveSpeechEvent::Response(e) => {
                println!("AI: {} (final: {})", e.text, e.is_final);
            }
            LiveSpeechEvent::Audio(e) => {
                // e.data: Vec<u8> - PCM16 audio
                // e.sample_rate: u32 - 24000 Hz
                play_audio(&e.data, e.sample_rate);
            }
            LiveSpeechEvent::TurnComplete(_) => {
                println!("AI finished responding");
            }
            LiveSpeechEvent::ToolCall(e) => {
                // e.id: String - use with send_tool_response
                // e.name: String - function name
                // e.args: Value - function arguments
                let result = handle_tool_call(&e.name, &e.args);
                client.send_tool_response(&e.id, result).await.ok();
            }
            LiveSpeechEvent::Error(e) => {
                eprintln!("Error [{:?}]: {}", e.code, e.message);
            }
            _ => {}
        }
    }
});

Convenience Handlers

// AI's text response
client.on_response(|text, _is_final| {
    println!("AI: {}", text);
}).await;

// AI's audio output
client.on_audio(|audio_data| {
    play_audio(audio_data);
}).await;

// Error handling
client.on_error(|error| {
    eprintln!("Error: {}", error.message);
}).await;

Audio Format

Input (Your Microphone)

| Property | Value |
|----------|-------|
| Format | PCM16 (16-bit signed, little-endian) |
| Sample Rate | 16,000 Hz |
| Channels | 1 (Mono) |
| Chunk Size | ~3200 bytes (100 ms) |
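The ~3200-byte chunk size follows from the format itself: 16,000 samples/s × 2 bytes/sample × 1 channel × 0.1 s = 3,200 bytes. A small helper (our own convenience function, not part of livespeech-sdk) that computes the buffer size for any chunk duration:

```rust
/// Bytes needed to hold `ms` milliseconds of PCM16 audio.
/// Hypothetical helper for sizing capture buffers; not part of the SDK.
fn pcm16_chunk_bytes(sample_rate: u32, channels: u32, ms: u32) -> usize {
    // 2 bytes per sample (16-bit), scaled by duration in milliseconds.
    (sample_rate as usize * channels as usize * 2 * ms as usize) / 1000
}

fn main() {
    // 100 ms of 16 kHz mono input audio, matching the table above.
    assert_eq!(pcm16_chunk_bytes(16_000, 1, 100), 3200);
    // The same duration of 24 kHz AI output occupies 4800 bytes.
    assert_eq!(pcm16_chunk_bytes(24_000, 1, 100), 4800);
}
```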

Output (AI Response)

| Property | Value |
|----------|-------|
| Format | PCM16 (16-bit signed, little-endian) |
| Sample Rate | 24,000 Hz |
| Channels | 1 (Mono) |
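Not every audio device can open a 24 kHz stream, so you may need to resample the AI output before playback. A minimal linear-interpolation sketch (illustrative only; a dedicated resampling crate such as rubato gives far better quality):

```rust
/// Naive linear-interpolation resampler for mono i16 PCM.
/// Illustrative sketch, not part of the SDK; quality is adequate for
/// experiments but a proper resampler should be used in production.
fn resample_linear(input: &[i16], from_hz: u32, to_hz: u32) -> Vec<i16> {
    if input.is_empty() {
        return Vec::new();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    (0..out_len)
        .map(|i| {
            // Map the output index back to a fractional input position.
            let pos = i as f64 * from_hz as f64 / to_hz as f64;
            let idx = pos as usize;
            let frac = pos - idx as f64;
            let a = input[idx] as f64;
            let b = input[(idx + 1).min(input.len() - 1)] as f64;
            (a + (b - a) * frac) as i16
        })
        .collect()
}

fn main() {
    // Upsampling 24 kHz AI output to 48 kHz doubles the sample count.
    let out = resample_linear(&[0, 100, 200], 24_000, 48_000);
    assert_eq!(out.len(), 6);
    assert_eq!(out[1], 50); // interpolated midpoint between 0 and 100
}
```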

Audio Utilities

use livespeech_sdk::{
    encode_to_base64, decode_from_base64,
    float32_to_int16, int16_to_float32,
    int16_to_bytes, bytes_to_int16,
    wrap_pcm_in_wav, AudioEncoder,
};

// Convert f32 samples to PCM16 bytes
let float_samples: Vec<f32> = vec![0.0; 1600]; // e.g. 100 ms of captured 16 kHz audio
let pcm_samples = float32_to_int16(&float_samples);
let pcm_bytes = int16_to_bytes(&pcm_samples);

// Create WAV file
let wav = wrap_pcm_in_wav(&pcm_bytes, 16000, 1, 16);
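For reference, the conventions behind these helpers are simple. Below is a self-contained sketch of the two conversions (our own implementation, assuming the common clamp-and-scale mapping and little-endian byte order; the SDK's versions may differ in edge-case handling):

```rust
/// f32 samples in [-1.0, 1.0] to i16, clamping out-of-range values.
/// Sketch of the usual convention; not the SDK's actual implementation.
fn float32_to_int16_sketch(samples: &[f32]) -> Vec<i16> {
    samples
        .iter()
        .map(|&s| (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16)
        .collect()
}

/// i16 samples to little-endian PCM16 bytes, as the wire format expects.
fn int16_to_bytes_sketch(samples: &[i16]) -> Vec<u8> {
    samples.iter().flat_map(|s| s.to_le_bytes()).collect()
}

fn main() {
    // Out-of-range input (2.0) is clamped rather than wrapped.
    let pcm = float32_to_int16_sketch(&[0.0, 1.0, -1.0, 2.0]);
    assert_eq!(pcm, vec![0, 32767, -32767, 32767]);
    // Little-endian layout: low byte first.
    let bytes = int16_to_bytes_sketch(&[1i16, -1]);
    assert_eq!(bytes, vec![1, 0, 255, 255]);
}
```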

Error Handling

match client.connect().await {
    Ok(()) => println!("Connected"),
    Err(LiveSpeechError::ConnectionTimeout) => eprintln!("Timed out"),
    Err(LiveSpeechError::NotConnected) => eprintln!("Not connected"),
    Err(e) => eprintln!("Error: {}", e),
}

Regions

| Region | Code | Location |
|--------|------|----------|
| Asia Pacific (Seoul) | `Region::ApNortheast2` | Korea |

License

MIT