livespeech-sdk 0.1.7


LiveSpeech SDK for Rust


A Rust SDK for real-time speech-to-speech AI conversations.

Features

  • 🎙️ Real-time Voice Conversations - Natural, low-latency voice interactions
  • 🌐 Multi-language Support - Korean, English, Japanese, Chinese, and more
  • 🔊 Streaming Audio - Send and receive audio in real-time
  • 📝 Live Transcription - Get transcriptions of both user and AI speech
  • 🔄 Auto-reconnection - Automatic recovery from network issues

Installation

Add to your Cargo.toml:

[dependencies]
livespeech-sdk = "0.1"
tokio = { version = "1.35", features = ["full"] }

Quick Start

use livespeech_sdk::{Config, LiveSpeechClient, LiveSpeechEvent, Region, SessionConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create client
    let config = Config::builder()
        .region(Region::ApNortheast2)
        .api_key("your-api-key")
        .build()?;

    let client = LiveSpeechClient::new(config);

    // Subscribe to events
    let mut events = client.subscribe();
    tokio::spawn(async move {
        while let Ok(event) = events.recv().await {
            match event {
                LiveSpeechEvent::Ready(_) => println!("Ready for audio"),
                LiveSpeechEvent::UserTranscript(e) => println!("You: {}", e.text),
                LiveSpeechEvent::Response(e) => println!("AI: {}", e.text),
                LiveSpeechEvent::Audio(e) => {
                    // Play audio: e.data (PCM16), e.sample_rate (24000Hz)
                }
                LiveSpeechEvent::TurnComplete(_) => println!("AI finished"),
                LiveSpeechEvent::Error(e) => eprintln!("Error: {}", e.message),
                _ => {}
            }
        }
    });

    // Connect and start session
    client.connect().await?;
    client.start_session(Some(SessionConfig::new("You are a helpful assistant."))).await?;

    // Stream audio (PCM16 mono, 16 kHz)
    let audio_data: Vec<u8> = vec![0u8; 3200]; // placeholder: 100 ms of silence; use real mic input
    client.audio_start().await?;
    client.send_audio_chunk(&audio_data).await?;
    client.audio_end().await?;

    // Cleanup
    client.end_session().await?;
    client.disconnect().await;

    Ok(())
}

Audio Flow

connect() → start_session() → audio_start() → send_audio_chunk()* → audio_end() → end_session()
                                     ↓
                           send_system_message() (optional, during live session)

| Step | Description |
|------|-------------|
| connect() | Establish connection |
| start_session(config) | Start conversation with optional system prompt |
| audio_start() | Begin audio streaming |
| send_audio_chunk(data) | Send PCM16 audio (call multiple times) |
| send_system_message(msg) | Inject context or trigger AI response (optional) |
| audio_end() | End streaming, triggers AI response |
| end_session() | End conversation |
| disconnect() | Close connection |

Configuration

let config = Config::builder()
    .region(Region::ApNortheast2)    // Required: Seoul region
    .api_key("your-api-key")          // Required: Your API key
    .user_id("user-123")              // Optional: Enable conversation memory
    .auto_reconnect(true)             // Auto-reconnect on disconnect
    .debug(false)                     // Enable debug logging
    .build()?;

let session = SessionConfig::new("You are a helpful assistant.")
    .with_language("ko-KR")           // Language: ko-KR, en-US, ja-JP, etc.
    .with_pipeline_mode(PipelineMode::Live)  // 'live' (default) or 'composed'
    .with_ai_speaks_first(false)      // AI speaks first (live mode only)
    .with_allow_harm_category(false); // Disable safety filtering (use with caution)

Session Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| prePrompt | &str | - | System prompt for the AI assistant |
| language | &str | "en-US" | Language code (e.g., ko-KR, ja-JP) |
| pipeline_mode | PipelineMode | Live | Audio processing mode |
| ai_speaks_first | bool | false | AI initiates conversation (live mode only) |
| allow_harm_category | bool | false | Disable content safety filtering |

Pipeline Modes

| Mode | Latency | Description |
|------|---------|-------------|
| Live | Lower (~300 ms) | Direct audio-to-audio via Live API |
| Composed | Higher (~1-2 s) | Separate STT → LLM → TTS pipeline |

AI Speaks First

When ai_speaks_first is enabled, the AI immediately speaks a greeting based on your prePrompt:

let session = SessionConfig::new(
    "You are a customer service agent. Greet the customer warmly and ask how you can help."
)
.with_ai_speaks_first(true);

client.start_session(Some(session)).await?;
client.audio_start().await?;  // AI greeting plays immediately

⚠️ Note: Only works with PipelineMode::Live

Content Safety

By default, the LLM applies content safety filtering. Set allow_harm_category(true) to disable it:

let session = SessionConfig::new("You are an assistant.")
    .with_allow_harm_category(true);  // ⚠️ Disables all safety filters

⚠️ Warning: Only use in controlled environments where content moderation is handled by other means.

System Messages

During an active live session, you can inject text messages to the AI using send_system_message(). This is useful for:

  • Game events ("User completed level 5, congratulate them!")
  • App state changes ("User opened the cart with 3 items")
  • Timer/engagement triggers ("User has been quiet, engage them")
  • External data updates ("Weather changed to rainy")

Usage

// Simple usage - AI responds immediately
client.send_system_message("User just completed level 5. Congratulate them!").await?;

// With trigger_response option - context only, no immediate response
client.send_system_message_with_options("User is browsing the cart", false).await?;

Methods

| Method | Description |
|--------|-------------|
| send_system_message(text) | Send message; AI responds immediately |
| send_system_message_with_options(text, trigger_response) | Send with explicit trigger option |

Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| text | &str | Yes | - | Message text (max 500 chars) |
| trigger_response | bool | No | true | AI responds immediately if true |

⚠️ Note: Requires an active live session (audio_start() must have been called). Only works with PipelineMode::Live.
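Given the 500-character limit above, it can be worth clamping messages before sending. A minimal sketch (clamp_system_message is a hypothetical helper, not an SDK API) that trims on a character boundary so multi-byte text such as Korean is never split mid-character:

```rust
/// Hypothetical helper (not part of the SDK): trim a system message to the
/// documented 500-character limit without splitting a multi-byte character.
fn clamp_system_message(text: &str, max_chars: usize) -> &str {
    match text.char_indices().nth(max_chars) {
        // Byte index of the (max_chars + 1)-th character marks the cut point.
        Some((byte_idx, _)) => &text[..byte_idx],
        None => text, // already within the limit
    }
}

fn main() {
    let long = "가".repeat(600);
    let clamped = clamp_system_message(&long, 500);
    assert_eq!(clamped.chars().count(), 500);
}
```

You would then pass the clamped text to send_system_message() as usual.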

Conversation Memory

When you provide a user_id, the SDK enables persistent conversation memory:

  • Entity Memory: AI remembers facts shared in previous sessions (names, preferences, relationships)
  • Session Summaries: Recent conversation summaries are available to the AI
  • Cross-Session: Memory persists across sessions for the same user_id

// With memory (authenticated user)
let config = Config::builder()
    .region(Region::ApNortheast2)
    .api_key("your-api-key")
    .user_id("user-123")  // Enables conversation memory
    .build()?;

// Without memory (guest)
let config = Config::builder()
    .region(Region::ApNortheast2)
    .api_key("your-api-key")
    // No user_id = guest mode, no persistent memory
    .build()?;

| Mode | Memory Persistence | Use Case |
|------|--------------------|----------|
| With user_id | Permanent | Authenticated users |
| Without user_id | Session only | Guests, anonymous users |

Events

| Event | Description | Key Fields |
|-------|-------------|------------|
| Connected | Connection established | connection_id |
| Disconnected | Connection closed | reason |
| SessionStarted | Session created | session_id |
| Ready | Ready for audio input | - |
| UserTranscript | Your speech transcribed | text |
| Response | AI's response text | text, is_final |
| Audio | AI's audio output | data, sample_rate |
| TurnComplete | AI finished speaking | - |
| Error | Error occurred | code, message |

Event Subscription

let mut events = client.subscribe();

tokio::spawn(async move {
    while let Ok(event) = events.recv().await {
        match event {
            LiveSpeechEvent::UserTranscript(e) => {
                println!("You said: {}", e.text);
            }
            LiveSpeechEvent::Response(e) => {
                println!("AI: {} (final: {})", e.text, e.is_final);
            }
            LiveSpeechEvent::Audio(e) => {
                // e.data: Vec<u8> - PCM16 audio
                // e.sample_rate: u32 - 24000 Hz
                play_audio(&e.data, e.sample_rate);
            }
            LiveSpeechEvent::TurnComplete(_) => {
                println!("AI finished responding");
            }
            LiveSpeechEvent::Error(e) => {
                eprintln!("Error [{:?}]: {}", e.code, e.message);
            }
            _ => {}
        }
    }
});

Convenience Handlers

// AI's text response
client.on_response(|text, is_final| {
    println!("AI: {}", text);
}).await;

// AI's audio output
client.on_audio(|audio_data| {
    play_audio(audio_data);
}).await;

// Error handling
client.on_error(|error| {
    eprintln!("Error: {}", error.message);
}).await;

Audio Format

Input (Your Microphone)

| Property | Value |
|----------|-------|
| Format | PCM16 (16-bit signed, little-endian) |
| Sample Rate | 16,000 Hz |
| Channels | 1 (Mono) |
| Chunk Size | ~3200 bytes (100 ms) |
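The ~3200-byte chunk size follows directly from the format: samples per second × bytes per sample × chunk duration. A quick sanity check in plain Rust (no SDK required):

```rust
/// Bytes needed for one audio chunk of `chunk_ms` milliseconds.
fn chunk_bytes(sample_rate: usize, bytes_per_sample: usize, chunk_ms: usize) -> usize {
    sample_rate * bytes_per_sample * chunk_ms / 1000
}

fn main() {
    // Input side: 16 kHz mono PCM16, 100 ms chunks → 3200 bytes.
    assert_eq!(chunk_bytes(16_000, 2, 100), 3200);
    // Output side for comparison: 24 kHz → 4800 bytes per 100 ms.
    assert_eq!(chunk_bytes(24_000, 2, 100), 4800);
}
```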

Output (AI Response)

| Property | Value |
|----------|-------|
| Format | PCM16 (16-bit signed, little-endian) |
| Sample Rate | 24,000 Hz |
| Channels | 1 (Mono) |

Audio Utilities

use livespeech_sdk::{
    encode_to_base64, decode_from_base64,
    float32_to_int16, int16_to_float32,
    int16_to_bytes, bytes_to_int16,
    wrap_pcm_in_wav, AudioEncoder,
};

// Convert f32 samples to PCM16 bytes
let pcm_samples = float32_to_int16(&float_samples);
let pcm_bytes = int16_to_bytes(&pcm_samples);

// Create WAV file
let wav = wrap_pcm_in_wav(&pcm_bytes, 16000, 1, 16);
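For reference, the float-to-PCM16 conversion these helpers perform boils down to scaling [-1.0, 1.0] samples to i16 and emitting little-endian bytes. A standalone sketch of the same math (the exact rounding/clamping behavior of float32_to_int16 is an assumption; prefer the SDK helpers in real code):

```rust
/// Scale [-1.0, 1.0] f32 samples to i16, clamping out-of-range input.
fn float32_to_int16_sketch(samples: &[f32]) -> Vec<i16> {
    samples
        .iter()
        .map(|&s| (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16)
        .collect()
}

/// PCM16 on the wire is little-endian, per the Audio Format tables above.
fn int16_to_bytes_sketch(samples: &[i16]) -> Vec<u8> {
    samples.iter().flat_map(|s| s.to_le_bytes()).collect()
}

fn main() {
    let pcm = float32_to_int16_sketch(&[0.0, 1.0, -1.0]);
    assert_eq!(pcm, vec![0, 32767, -32767]);
    let bytes = int16_to_bytes_sketch(&pcm);
    assert_eq!(bytes.len(), 6); // 2 bytes per sample
}
```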

Error Handling

match client.connect().await {
    Ok(()) => println!("Connected"),
    Err(LiveSpeechError::ConnectionTimeout) => eprintln!("Timed out"),
    Err(LiveSpeechError::NotConnected) => eprintln!("Not connected"),
    Err(e) => eprintln!("Error: {}", e),
}
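If auto_reconnect is disabled and you retry connect() yourself, capped exponential backoff is the usual pattern. The schedule itself is plain arithmetic; the base and cap values below are illustrative, not SDK defaults:

```rust
use std::time::Duration;

/// Illustrative backoff schedule: 500 ms base, doubling per attempt, capped at 8 s.
fn backoff_delay(attempt: u32) -> Duration {
    let base_ms: u64 = 500;
    let cap_ms: u64 = 8_000;
    // Cap the shift so the multiplier cannot overflow for large attempt counts.
    let delay = base_ms.saturating_mul(1u64 << attempt.min(10));
    Duration::from_millis(delay.min(cap_ms))
}

fn main() {
    assert_eq!(backoff_delay(0), Duration::from_millis(500));
    assert_eq!(backoff_delay(2), Duration::from_millis(2_000));
    assert_eq!(backoff_delay(63), Duration::from_millis(8_000)); // capped
}
```

In a retry loop you would sleep for backoff_delay(attempt) (e.g. via tokio::time::sleep) between connect() attempts.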

Regions

| Region | Code | Location |
|--------|------|----------|
| Asia Pacific (Seoul) | Region::ApNortheast2 | Korea |

License

MIT