# LiveSpeech SDK for Rust
A Rust SDK for real-time speech-to-speech AI conversations.
## Features
- 🎙️ Real-time Voice Conversations - Natural, low-latency voice interactions
- 🌐 Multi-language Support - Korean, English, Japanese, Chinese, and more
- 🔊 Streaming Audio - Send and receive audio in real-time
- 📝 Live Transcription - Get transcriptions of both user and AI speech
- 🔄 Auto-reconnection - Automatic recovery from network issues
## Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
# Crate name assumed from the SDK name; check the published crate.
livespeech = "0.1"
tokio = { version = "1.35", features = ["full"] }
```
## Quick Start

```rust
// Reconstructed sketch; crate and type names are illustrative.
use livespeech::LiveSpeechClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut client = LiveSpeechClient::new(config); // `config`: see "Configuration" below
    client.connect().await?;
    Ok(())
}
```
## Audio Flow

```text
connect() → start_session() → audio_start() → send_audio_chunk()* → audio_end() → end_session()
                                     ↓
                send_system_message() (optional, during live session)
```
| Step | Description |
|---|---|
| `connect()` | Establish connection |
| `start_session(config)` | Start conversation with optional system prompt |
| `audio_start()` | Begin audio streaming |
| `send_audio_chunk(data)` | Send PCM16 audio (call multiple times) |
| `send_system_message(msg)` | Inject context or trigger AI response (optional) |
| `audio_end()` | End streaming, triggers AI response |
| `end_session()` | End conversation |
| `disconnect()` | Close connection |
## Configuration

```rust
// Builder type names and argument values are illustrative.
let config = LiveSpeechConfig::builder()
    .region(Region::ApNortheast2)  // Required: Seoul region
    .api_key("YOUR_API_KEY")       // Required: Your API key
    .user_id("user-123")           // Optional: Enable conversation memory
    .auto_reconnect(true)          // Auto-reconnect on disconnect
    .debug(true)                   // Enable debug logging
    .build()?;

let session = SessionConfig::new()
    .with_language("ko-KR")                  // Language: ko-KR, en-US, ja-JP, etc.
    .with_pipeline_mode(PipelineMode::Live)  // 'live' (default) or 'composed'
    .with_ai_speaks_first(true)              // AI speaks first (live mode only)
    .with_allow_harm_category(true);         // Disable safety filtering (use with caution)
```
### Session Options

| Option | Type | Default | Description |
|---|---|---|---|
| `prePrompt` | `&str` | - | System prompt for the AI assistant |
| `language` | `&str` | `"en-US"` | Language code (e.g., `ko-KR`, `ja-JP`) |
| `pipeline_mode` | `PipelineMode` | `Live` | Audio processing mode |
| `ai_speaks_first` | `bool` | `false` | AI initiates conversation (live mode only) |
| `allow_harm_category` | `bool` | `false` | Disable content safety filtering |
### Pipeline Modes

| Mode | Latency | Description |
|---|---|---|
| `Live` | Lower (~300ms) | Direct audio-to-audio via Live API |
| `Composed` | Higher (~1-2s) | Separate STT → LLM → TTS pipeline |
### AI Speaks First

When `ai_speaks_first(true)` is set, the AI immediately speaks a greeting based on your `prePrompt`:

```rust
// Type names are illustrative.
let session = SessionConfig::new()
    .with_ai_speaks_first(true);

client.start_session(session).await?;
client.audio_start().await?; // AI greeting plays immediately
```

> ⚠️ **Note:** Only works with `PipelineMode::Live`.
### Content Safety

By default, the LLM applies content safety filtering. Set `allow_harm_category(true)` to disable it:

```rust
// Type names are illustrative.
let session = SessionConfig::new()
    .with_allow_harm_category(true); // ⚠️ Disables all safety filters
```

> ⚠️ **Warning:** Only use this in controlled environments where content moderation is handled by other means.
## System Messages

During an active live session, you can inject text messages to the AI using `send_system_message()`. This is useful for:
- Game events ("User completed level 5, congratulate them!")
- App state changes ("User opened the cart with 3 items")
- Timer/engagement triggers ("User has been quiet, engage them")
- External data updates ("Weather changed to rainy")
### Usage

```rust
// Argument values are illustrative.
// Simple usage - AI responds immediately
client.send_system_message("User completed level 5, congratulate them!").await?;

// With trigger_response option - context only, no immediate response
client.send_system_message_with_options("User opened the cart with 3 items", false).await?;
```
### Methods

| Method | Description |
|---|---|
| `send_system_message(text)` | Send message, AI responds immediately |
| `send_system_message_with_options(text, trigger_response)` | Send with explicit trigger option |
### Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `text` | `&str` | Yes | - | Message text (max 500 chars) |
| `trigger_response` | `bool` | No | `true` | AI responds immediately if `true` |
> ⚠️ **Note:** Requires an active live session (`audio_start()` must have been called). Only works with `PipelineMode::Live`.
## Conversation Memory

When you provide a `user_id`, the SDK enables persistent conversation memory:
- Entity Memory: AI remembers facts shared in previous sessions (names, preferences, relationships)
- Session Summaries: Recent conversation summaries are available to the AI
- Cross-Session: Memory persists across sessions for the same `user_id`
```rust
// Builder type names and argument values are illustrative.

// With memory (authenticated user)
let config = LiveSpeechConfig::builder()
    .region(Region::ApNortheast2)
    .api_key("YOUR_API_KEY")
    .user_id("user-123") // Enables conversation memory
    .build()?;

// Without memory (guest)
let config = LiveSpeechConfig::builder()
    .region(Region::ApNortheast2)
    .api_key("YOUR_API_KEY")
    // No user_id = guest mode, no persistent memory
    .build()?;
```
| Mode | Memory Persistence | Use Case |
|---|---|---|
| With `user_id` | Permanent | Authenticated users |
| Without `user_id` | Session only | Guests, anonymous users |
## Events

| Event | Description | Key Fields |
|---|---|---|
| `Connected` | Connection established | `connection_id` |
| `Disconnected` | Connection closed | `reason` |
| `SessionStarted` | Session created | `session_id` |
| `Ready` | Ready for audio input | - |
| `UserTranscript` | Your speech transcribed | `text` |
| `Response` | AI's response text | `text`, `is_final` |
| `Audio` | AI's audio output | `data`, `sample_rate` |
| `TurnComplete` | AI finished speaking | - |
| `Error` | Error occurred | `code`, `message` |
### Event Subscription

```rust
// Sketch; the receiver and event types are illustrative.
let mut events = client.subscribe();

tokio::spawn(async move {
    while let Some(event) = events.recv().await {
        // Handle each event as it arrives
    }
});
```
### Convenience Handlers

```rust
// Callback signatures are illustrative.
// AI's text response
client.on_response(|text| println!("AI: {text}")).await;

// AI's audio output
client.on_audio(|data| play_audio(&data)).await;

// Error handling
client.on_error(|err| eprintln!("Error: {err}")).await;
```
## Audio Format

### Input (Your Microphone)
| Property | Value |
|---|---|
| Format | PCM16 (16-bit signed, little-endian) |
| Sample Rate | 16,000 Hz |
| Channels | 1 (Mono) |
| Chunk Size | ~3200 bytes (100ms) |
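The ~3200-byte chunk size follows directly from the format above; a quick sanity check:

```rust
fn main() {
    // 16,000 samples/s * 1 channel * 2 bytes/sample = 32,000 bytes per second
    let bytes_per_second = 16_000 * 1 * 2;
    // A 100 ms chunk is one tenth of that: 3,200 bytes
    let chunk_size = bytes_per_second / 10;
    assert_eq!(chunk_size, 3_200);
    println!("{chunk_size}");
}
```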
### Output (AI Response)
| Property | Value |
|---|---|
| Format | PCM16 (16-bit signed, little-endian) |
| Sample Rate | 24,000 Hz |
| Channels | 1 (Mono) |
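Since the output is fixed at 24 kHz mono PCM16, the playback duration of a received buffer can be derived from its byte length. A small sketch (the helper name is ours, not part of the SDK):

```rust
/// Playback duration in milliseconds of a PCM16 mono buffer.
fn pcm16_duration_ms(num_bytes: usize, sample_rate: u32) -> f64 {
    let samples = num_bytes as f64 / 2.0; // 2 bytes per 16-bit sample
    samples / sample_rate as f64 * 1000.0
}

fn main() {
    // 48,000 bytes at 24 kHz = 24,000 samples = exactly 1 second
    assert_eq!(pcm16_duration_ms(48_000, 24_000), 1000.0);
}
```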
## Audio Utilities

```rust
// Module path and argument values are illustrative.
use livespeech::audio::{float32_to_int16, int16_to_bytes, wrap_pcm_in_wav};

// Convert f32 samples to PCM16 bytes
let pcm_samples = float32_to_int16(&float_samples);
let pcm_bytes = int16_to_bytes(&pcm_samples);

// Create WAV file
let wav = wrap_pcm_in_wav(&pcm_bytes, 16_000, 1);
```
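If you need the conversions without the SDK helpers, the standard PCM mapping is straightforward; a minimal sketch of what `float32_to_int16` and `int16_to_bytes` presumably do (our own implementations, not the SDK's):

```rust
/// Map f32 samples in [-1.0, 1.0] to 16-bit signed PCM.
fn float32_to_int16(samples: &[f32]) -> Vec<i16> {
    samples
        .iter()
        .map(|&s| (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16)
        .collect()
}

/// Serialize i16 samples as little-endian PCM16 bytes.
fn int16_to_bytes(samples: &[i16]) -> Vec<u8> {
    samples.iter().flat_map(|s| s.to_le_bytes()).collect()
}

fn main() {
    let pcm = float32_to_int16(&[0.0, 1.0, -1.0]);
    assert_eq!(pcm, vec![0, 32767, -32767]);
    assert_eq!(int16_to_bytes(&pcm), vec![0, 0, 255, 127, 1, 128]);
}
```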
## Error Handling

```rust
// Error variants are illustrative.
match client.connect().await {
    Ok(()) => println!("Connected"),
    Err(e) => eprintln!("Connection failed: {e}"),
}
```
## Regions

| Region | Code | Location |
|---|---|---|
| Asia Pacific (Seoul) | `Region::ApNortheast2` | Korea |
## License
MIT