# LiveSpeech SDK for Rust

A Rust SDK for real-time speech-to-speech AI conversations.
## Features
- 🎙️ Real-time Voice Conversations - Natural, low-latency voice interactions
- 🌐 Multi-language Support - Korean, English, Japanese, Chinese, and more
- 🔊 Streaming Audio - Send and receive audio in real-time
- 📝 Live Transcription - Get transcriptions of both user and AI speech
- 🔄 Auto-reconnection - Automatic recovery from network issues
## Installation

Add this to your `Cargo.toml`:
```toml
[dependencies]
livespeech-sdk = "0.1"
tokio = { version = "1.35", features = ["full"] }
```
## Quick Start
```rust
use livespeech_sdk::{Config, LiveSpeechClient, LiveSpeechEvent, Region, SessionConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = Config::builder()
        .region(Region::ApNortheast2)
        .api_key("your-api-key")
        .build()?;

    let client = LiveSpeechClient::new(config);

    // Subscribe before connecting so no events are missed.
    let mut events = client.subscribe();
    tokio::spawn(async move {
        while let Ok(event) = events.recv().await {
            match event {
                LiveSpeechEvent::Ready(_) => println!("Ready for audio"),
                LiveSpeechEvent::UserTranscript(e) => println!("You: {}", e.text),
                LiveSpeechEvent::Response(e) => println!("AI: {}", e.text),
                LiveSpeechEvent::Audio(_e) => {
                    // Play back the AI's PCM16 audio (24 kHz, mono) here.
                }
                LiveSpeechEvent::TurnComplete(_) => println!("AI finished"),
                LiveSpeechEvent::Error(e) => eprintln!("Error: {}", e.message),
                _ => {}
            }
        }
    });

    client.connect().await?;
    client.start_session(Some(SessionConfig::new("You are a helpful assistant."))).await?;

    client.audio_start().await?;
    let audio_data: Vec<u8> = Vec::new(); // replace with PCM16 bytes from your microphone (16 kHz, mono)
    client.send_audio_chunk(&audio_data).await?;
    client.audio_end().await?;

    client.end_session().await?;
    client.disconnect().await;
    Ok(())
}
```
## Audio Flow

```text
connect() → start_session() → audio_start() → send_audio_chunk()* → audio_end() → end_session() → disconnect()
```
| Step | Description |
|------|-------------|
| connect() | Establish connection |
| start_session(config) | Start conversation with optional system prompt |
| audio_start() | Begin audio streaming |
| send_audio_chunk(data) | Send PCM16 audio (call multiple times) |
| audio_end() | End streaming, triggers AI response |
| end_session() | End conversation |
| disconnect() | Close connection |
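Between audio_start() and audio_end() you typically stream captured audio in small chunks. A minimal sketch, assuming `client` is already connected with an active session, `pcm_bytes` holds 16 kHz mono PCM16 audio, and send_audio_chunk accepts a byte slice (~3,200 bytes is roughly 100 ms of audio):

```rust
use std::time::Duration;

// Sketch: stream an in-memory PCM16 buffer as ~100 ms chunks.
async fn stream_buffer(
    client: &livespeech_sdk::LiveSpeechClient,
    pcm_bytes: &[u8],
) -> Result<(), Box<dyn std::error::Error>> {
    client.audio_start().await?;
    for chunk in pcm_bytes.chunks(3200) {
        client.send_audio_chunk(chunk).await?;
        // Pacing is optional; it simply mimics live microphone capture.
        tokio::time::sleep(Duration::from_millis(100)).await;
    }
    client.audio_end().await?; // triggers the AI response
    Ok(())
}
```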
## Configuration
```rust
let config = Config::builder()
    .region(Region::ApNortheast2)   // service region
    .api_key("your-api-key")        // your LiveSpeech API key
    .auto_reconnect(true)           // recover automatically from network drops
    .debug(false)                   // debug logging
    .build()?;

let session = SessionConfig::new("You are a helpful assistant.")
    .with_language("ko-KR");
```
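To keep credentials out of source control, you can read the API key from an environment variable instead of hard-coding it. A minimal sketch; LIVESPEECH_API_KEY is just an example name, not a variable the SDK reads on its own:

```rust
// Read the key at startup and fail fast if it is missing.
let api_key = std::env::var("LIVESPEECH_API_KEY")
    .expect("set LIVESPEECH_API_KEY to your LiveSpeech API key");

let config = Config::builder()
    .region(Region::ApNortheast2)
    .api_key(&api_key)
    .build()?;
```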
## Events
| Event | Description | Key Fields |
|-------|-------------|------------|
| Connected | Connection established | connection_id |
| Disconnected | Connection closed | reason |
| SessionStarted | Session created | session_id |
| Ready | Ready for audio input | - |
| UserTranscript | Your speech transcribed | text |
| Response | AI's response text | text, is_final |
| Audio | AI's audio output | data, sample_rate |
| TurnComplete | AI finished speaking | - |
| Error | Error occurred | code, message |
### Event Subscription
```rust
let mut events = client.subscribe();

tokio::spawn(async move {
    while let Ok(event) = events.recv().await {
        match event {
            LiveSpeechEvent::UserTranscript(e) => {
                println!("You said: {}", e.text);
            }
            LiveSpeechEvent::Response(e) => {
                println!("AI: {} (final: {})", e.text, e.is_final);
            }
            LiveSpeechEvent::Audio(e) => {
                // play_audio is your own playback routine.
                play_audio(&e.data, e.sample_rate);
            }
            LiveSpeechEvent::TurnComplete(_) => {
                println!("AI finished responding");
            }
            LiveSpeechEvent::Error(e) => {
                eprintln!("Error [{:?}]: {}", e.code, e.message);
            }
            _ => {}
        }
    }
});
```
### Convenience Handlers
```rust
// Register lightweight callbacks instead of matching on raw events.
client.on_response(|text, is_final| {
    println!("AI: {} (final: {})", text, is_final);
}).await;

client.on_audio(|audio_data| {
    play_audio(audio_data);
}).await;

client.on_error(|error| {
    eprintln!("Error: {}", error.message);
}).await;
```
## Audio Format

### Input (Your Microphone)
| Property | Value |
|----------|-------|
| Format | PCM16 (16-bit signed, little-endian) |
| Sample Rate | 16,000 Hz |
| Channels | 1 (Mono) |
| Chunk Size | ~3,200 bytes (100 ms) |
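The chunk size follows directly from the format: 16,000 samples/s × 2 bytes per sample × 0.1 s = 3,200 bytes per 100 ms chunk.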
### Output (AI Response)
| Property | Value |
|----------|-------|
| Format | PCM16 (16-bit signed, little-endian) |
| Sample Rate | 24,000 Hz |
| Channels | 1 (Mono) |
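Audio events carry this output as raw bytes, so before playback you usually convert them back to i16 samples. A minimal sketch, assuming e.data holds the raw PCM16 bytes and bytes_to_int16 (see Audio Utilities below) turns them into samples:

```rust
use livespeech_sdk::{bytes_to_int16, LiveSpeechEvent};

// Inside your event loop:
if let LiveSpeechEvent::Audio(e) = event {
    // e.data: PCM16 bytes, e.sample_rate: 24,000 Hz for AI output.
    let samples = bytes_to_int16(&e.data);
    // Feed `samples` at e.sample_rate into the audio output of your choice.
}
```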
## Audio Utilities
```rust
use livespeech_sdk::{
    encode_to_base64, decode_from_base64,
    float32_to_int16, int16_to_float32,
    int16_to_bytes, bytes_to_int16,
    wrap_pcm_in_wav, AudioEncoder,
};

// float_samples: f32 samples in [-1.0, 1.0], e.g. captured from your microphone.
let pcm_samples = float32_to_int16(&float_samples);
let pcm_bytes = int16_to_bytes(&pcm_samples);
let wav = wrap_pcm_in_wav(&pcm_bytes, 16000, 1, 16);
```
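The same helper works for the AI side: to save a reply for inspection, you can collect the bytes from Audio events and wrap them at the output sample rate. A sketch, assuming response_bytes holds the concatenated PCM16 bytes of one reply (24 kHz, mono):

```rust
// Wrap the raw PCM16 reply in a WAV header and write it to disk.
let wav = wrap_pcm_in_wav(&response_bytes, 24000, 1, 16);
std::fs::write("ai_reply.wav", &wav).expect("failed to write WAV file");
```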
## Error Handling
```rust
use livespeech_sdk::LiveSpeechError;

match client.connect().await {
    Ok(()) => println!("Connected"),
    Err(LiveSpeechError::ConnectionTimeout) => eprintln!("Timed out"),
    Err(LiveSpeechError::NotConnected) => eprintln!("Not connected"),
    Err(e) => eprintln!("Error: {}", e),
}
```
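A timed-out connection attempt can simply be retried. A sketch of a small retry loop with back-off, assuming connect() may be called again after a failure and that the surrounding function returns Result<(), Box<dyn std::error::Error>>:

```rust
use std::time::Duration;

// Try to connect up to three times, backing off after each timeout.
let mut attempts = 0;
loop {
    match client.connect().await {
        Ok(()) => break,
        Err(LiveSpeechError::ConnectionTimeout) if attempts < 2 => {
            attempts += 1;
            eprintln!("Connection timed out, retrying ({attempts}/3)...");
            tokio::time::sleep(Duration::from_secs(1)).await;
        }
        Err(e) => return Err(e.into()),
    }
}
```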
## Regions
| Region | Code | Location |
|--------|------|----------|
| Asia Pacific (Seoul) | Region::ApNortheast2 | Korea |
## License
MIT