§VUIKit - Voice User Interface Toolkit
A Rust library for building real-time voice user interfaces with streaming support for Voice Activity Detection (VAD), Speech-to-Text (STT), and Text-to-Speech (TTS).
§Features
- Stream-first: All operations are async stream-based for real-time processing
- Low latency: Designed for minimal delay in voice interactions
- Pluggable backends: Easy to swap different VAD/STT/TTS implementations
- Real-time: Supports incremental results and continuous streaming
- Channel-based: Components provide tokio channels for continuous processing
- Multi-capability: Backends can provide multiple capabilities (e.g., Whisper = VAD+STT)
§Installation
Add VUIKit to your Cargo.toml:
[dependencies]
vuikit = "0.1.0"
futures = "0.3"
tokio = { version = "1.0", features = ["sync", "macros", "rt"] }
§Architecture
┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│ Your App         │    │ Components       │    │ Backends         │
├──────────────────┤    ├──────────────────┤    ├──────────────────┤
│ • Send audio     │───▶│ • Channel mgmt   │───▶│ • VAD algorithm  │
│ • Receive events │◀───│ • Lifecycle      │◀───│ • STT model      │
│ • Business logic │    │ • Error handling │    │ • TTS engine     │
└──────────────────┘    └──────────────────┘    └──────────────────┘
VUIKit provides two levels of abstraction:
- Backends: Trait definitions for implementing VAD/STT/TTS logic
- Components: Wrapper abstractions that provide channel-based continuous processing
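The split between the two levels can be illustrated without VUIKit itself. The following is a minimal sketch, assuming a synchronous analog built on std threads and std::sync::mpsc channels rather than tokio. The Backend trait, Event enum, EnergyBackend, Component struct, and spawn_component function are hypothetical stand-ins and are not part of VUIKit's API.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

// Hypothetical stand-ins for illustration; not VUIKit's real types.
#[derive(Debug, PartialEq)]
enum Event {
    Voice,
    Silence,
}

type Chunks = Box<dyn Iterator<Item = Vec<f32>> + Send>;
type Events = Box<dyn Iterator<Item = Event> + Send>;

// Backend level: pure processing logic over a stream of audio chunks.
trait Backend: Send + 'static {
    fn process(self, chunks: Chunks) -> Events;
}

struct EnergyBackend {
    threshold: f32,
}

impl Backend for EnergyBackend {
    fn process(self, chunks: Chunks) -> Events {
        Box::new(chunks.map(move |chunk| {
            // Mean squared amplitude; max(1) avoids dividing by zero.
            let energy = chunk.iter().map(|s| s * s).sum::<f32>() / chunk.len().max(1) as f32;
            if energy > self.threshold {
                Event::Voice
            } else {
                Event::Silence
            }
        }))
    }
}

// Component level: owns the channels and the background worker.
struct Component {
    audio_tx: Sender<Vec<f32>>,
    event_rx: Receiver<Event>,
    worker: thread::JoinHandle<()>,
}

fn spawn_component<B: Backend>(backend: B) -> Component {
    let (audio_tx, audio_rx) = channel::<Vec<f32>>();
    let (event_tx, event_rx) = channel::<Event>();
    let worker = thread::spawn(move || {
        // The receiver's iterator ends once every sender has been dropped.
        for event in backend.process(Box::new(audio_rx.into_iter())) {
            if event_tx.send(event).is_err() {
                break; // the application dropped its receiver
            }
        }
    });
    Component { audio_tx, event_rx, worker }
}

// The application only sees channels: send audio in, read events out.
fn run_demo() -> Vec<Event> {
    let Component { audio_tx, event_rx, worker } =
        spawn_component(EnergyBackend { threshold: 0.01 });
    audio_tx.send(vec![0.0; 4]).unwrap();
    audio_tx.send(vec![0.5; 4]).unwrap();
    drop(audio_tx); // closing the input lets the worker finish
    worker.join().unwrap();
    let events: Vec<Event> = event_rx.iter().collect();
    events
}
```

In this sketch the backend knows nothing about channels or threads, and the wrapper knows nothing about the detection algorithm, which is the same separation the diagram describes.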
§Quick Start
§Stream-based Processing (Low-level)
Direct backend usage for one-time processing:
use vuikit::backend::vad::{VadBackend, VadEvent};
use vuikit::core::audio::AudioChunk;
use vuikit::core::error::VuiResult;
use futures::{Stream, StreamExt, stream};
use std::pin::Pin;

// Implement your VAD backend
struct MyVadBackend;

impl VadBackend for MyVadBackend {
    type VadStream = Pin<Box<dyn Stream<Item = VuiResult<VadEvent>> + Send>>;

    fn process_stream<S>(self, audio_stream: S) -> Self::VadStream
    where
        S: Stream<Item = VuiResult<AudioChunk>> + Send + Unpin + 'static,
    {
        Box::pin(audio_stream.map(|chunk| {
            match chunk {
                Ok(audio_chunk) => Ok(VadEvent::Silence { audio_chunk }),
                Err(e) => Err(e),
            }
        }))
    }
}

fn get_microphone_stream() -> impl Stream<Item = VuiResult<AudioChunk>> + Send + Unpin {
    stream::empty()
}

#[tokio::main]
async fn main() -> VuiResult<()> {
    let vad = MyVadBackend;
    let audio_stream = get_microphone_stream();
    let mut vad_events = vad.process_stream(audio_stream);

    while let Some(event) = vad_events.next().await {
        match event? {
            VadEvent::VoiceStarted { confidence, .. } => {
                println!("Voice detected! Confidence: {}", confidence);
            }
            VadEvent::VoiceEnded => {
                println!("Voice stopped");
            }
            _ => {}
        }
    }
    Ok(())
}
§Channel-based Continuous Processing (High-level)
Components for long-running voice applications:
use vuikit::components::vad::VadComponent;
use vuikit::backend::vad::{VadBackend, VadEvent};
use vuikit::core::audio::AudioChunk;
use vuikit::core::error::VuiResult;
use futures::{Stream, StreamExt, stream};
use std::pin::Pin;

// Hidden backend implementation for the example
struct MyVadBackend;

impl VadBackend for MyVadBackend {
    type VadStream = Pin<Box<dyn Stream<Item = VuiResult<VadEvent>> + Send>>;

    fn process_stream<S>(self, audio_stream: S) -> Self::VadStream
    where
        S: Stream<Item = VuiResult<AudioChunk>> + Send + Unpin + 'static,
    {
        Box::pin(audio_stream.map(|chunk| {
            match chunk {
                Ok(audio_chunk) => Ok(VadEvent::Silence { audio_chunk }),
                Err(e) => Err(e),
            }
        }))
    }
}

async fn capture_audio() -> AudioChunk {
    AudioChunk::new(vec![0.0; 512], 16000, 1)
}

fn start_recording() {
    println!("Started recording");
}

fn stop_recording() {
    println!("Stopped recording");
}

#[tokio::main]
async fn main() {
    // Wrap your backend in a component
    let (vad_component, mut vad_channels) = VadComponent::new(MyVadBackend);

    // Start component in background
    tokio::spawn(async move {
        vad_component.run().await.unwrap();
    });

    // Send audio continuously
    for _ in 0..3 { // Limited loop for example
        let audio_chunk = capture_audio().await;
        vad_channels.audio_tx.send(audio_chunk).unwrap();

        // Handle events as they arrive
        while let Ok(event) = vad_channels.event_rx.try_recv() {
            match event {
                VadEvent::VoiceStarted { .. } => start_recording(),
                VadEvent::VoiceEnded => stop_recording(),
                _ => {}
            }
        }
    }
}
§Core Concepts
§Backends vs Components
- Backends: Implement the actual VAD/STT/TTS algorithms
  - Use for one-time processing
  - Direct stream handling
  - Full control over processing pipeline
  - Consume self to avoid borrowing issues
- Components: Wrapper abstractions around backends
  - Use for long-running applications
  - Channel-based architecture
  - Background processing tasks
  - Automatic lifecycle management
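The point about consuming self can be made concrete with a small std-only sketch. It assumes a hypothetical VoicedCounter type with a process method that mirrors the by-value receiver of process_stream. None of these names come from VUIKit, and a plain Iterator stands in for an async Stream.

```rust
use std::thread;

// Hypothetical stand-in for a stateful backend; not VUIKit's real trait.
struct VoicedCounter {
    voiced_frames: usize,
    threshold: f32,
}

impl VoicedCounter {
    // Taking `self` by value moves the backend's state into the returned
    // iterator, so the result borrows nothing and can be `'static`.
    fn process<I>(mut self, energies: I) -> Box<dyn Iterator<Item = usize> + Send>
    where
        I: Iterator<Item = f32> + Send + 'static,
    {
        Box::new(energies.map(move |energy| {
            if energy > self.threshold {
                self.voiced_frames += 1;
            }
            // Running count of frames above the threshold so far.
            self.voiced_frames
        }))
    }
}

fn running_counts() -> Vec<usize> {
    let backend = VoicedCounter { voiced_frames: 0, threshold: 0.1 };
    let stream = backend.process(vec![0.0, 0.5, 0.05, 0.9].into_iter());
    // The stream owns its state, so it can be moved to another thread.
    thread::spawn(move || stream.collect::<Vec<usize>>())
        .join()
        .unwrap()
}
```

Had process taken &mut self instead, the returned iterator would hold a borrow of the backend and could not be handed to a spawned thread or task without extra synchronization.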
§Backend Traits
VUIKit provides three main backend traits:
use vuikit::backend::vad::VadBackend;
use vuikit::backend::stt::SttBackend;
use vuikit::backend::tts::TtsBackend;
// These traits are already defined in VUIKit:
// VadBackend, SttBackend, TtsBackend
// See the actual trait definitions in the API documentation
§Multi-capability Backends
Some backends provide multiple capabilities (e.g., OpenAI Whisper does both VAD and STT):
use vuikit::backend::vad::VadBackend;
use vuikit::backend::stt::SttBackend;
struct WhisperBackend {
    // Backend-specific fields would go here
}

// A backend can implement multiple traits:
// impl VadBackend for WhisperBackend { /* VAD implementation */ }
// impl SttBackend for WhisperBackend { /* STT implementation */ }
§Audio Data
All audio processing uses the AudioChunk type:
use vuikit::core::audio::AudioChunk;
let audio = AudioChunk::new(
    vec![0.1, 0.2, 0.3], // samples as f32
    16000,               // sample rate
    1,                   // channels
);
§Error Handling
VUIKit reports failures through the VuiError enum, which fallible operations return via the VuiResult alias:
use vuikit::core::error::{VuiError, VuiResult};
fn handle_result(result: VuiResult<()>) {
    match result {
        Ok(_) => println!("Success"),
        Err(VuiError::BackendError(msg)) => eprintln!("Backend error: {}", msg),
        Err(VuiError::StreamError(msg)) => eprintln!("Stream error: {}", msg),
        Err(VuiError::InvalidAudioFormat(msg)) => eprintln!("Audio format error: {}", msg),
        Err(VuiError::ConfigError(msg)) => eprintln!("Config error: {}", msg),
    }
}
§Example Backend Implementation
Here's a simple VAD backend implementation:
use vuikit::backend::vad::{VadBackend, VadEvent};
use vuikit::core::{audio::AudioChunk, error::VuiResult};
use futures::{Stream, StreamExt};
use std::pin::Pin;
pub struct SimpleVadBackend {
    threshold: f32,
}

impl SimpleVadBackend {
    pub fn new(threshold: f32) -> Self {
        Self { threshold }
    }
}

impl VadBackend for SimpleVadBackend {
    type VadStream = Pin<Box<dyn Stream<Item = VuiResult<VadEvent>> + Send>>;

    fn process_stream<S>(self, audio_stream: S) -> Self::VadStream
    where
        S: Stream<Item = VuiResult<AudioChunk>> + Send + Unpin + 'static,
    {
        Box::pin(audio_stream.map(move |chunk_result| {
            let chunk = chunk_result?;
            // Simple energy-based VAD: mean of the squared samples.
            // An empty chunk gives 0.0 / 0.0 = NaN, which never compares
            // greater than the threshold, so it is reported as silence.
            let energy: f32 = chunk.samples.iter()
                .map(|&sample| sample * sample)
                .sum::<f32>() / chunk.samples.len() as f32;
            if energy > self.threshold {
                Ok(VadEvent::VoiceStarted {
                    // energy > threshold in this branch, so with a positive
                    // threshold this ratio exceeds 1 and clamps to 1.0.
                    confidence: (energy / self.threshold).min(1.0),
                    audio_chunk: chunk,
                })
            } else {
                Ok(VadEvent::Silence { audio_chunk: chunk })
            }
        }))
    }
}
§When to Use What
§Use Backends Directly When:
- One-time processing tasks
- Custom stream handling
- Full control over processing pipeline
- Building your own abstractions
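As one illustration of custom stream handling, per-chunk voiced/unvoiced decisions can be collapsed into start and end transitions before they reach application code. This is a minimal sketch under stated assumptions: the Edge enum and the transitions function are hypothetical helpers, not VUIKit items, and a plain Iterator of booleans stands in for a stream of VAD events.

```rust
// Hypothetical edge type for illustration; VUIKit's own VadEvent differs.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Edge {
    Started,
    Ended,
}

// Emit an edge only when the voiced state changes between chunks.
fn transitions<I: Iterator<Item = bool>>(voiced: I) -> impl Iterator<Item = Edge> {
    let mut active = false;
    voiced.filter_map(move |is_voiced| {
        let edge = match (active, is_voiced) {
            (false, true) => Some(Edge::Started),
            (true, false) => Some(Edge::Ended),
            _ => None, // no change, nothing to report
        };
        active = is_voiced;
        edge
    })
}

fn demo_transitions() -> Vec<Edge> {
    let per_chunk = vec![false, true, true, false, false, true];
    transitions(per_chunk.into_iter()).collect()
}
```

The same shape carries over to an async stream adapter, where the state would live in the closure passed to a combinator instead of an iterator adapter.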
§Use Components When:
- Long-running voice applications (chatbots, assistants)
- Channel-based architecture
- Background processing tasks
- Automatic lifecycle management
§Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
§License
This project is licensed under the MIT License - see the LICENSE file for details.
§API Examples
§Stream-based Processing (Low-level)
use vuikit::backend::vad::{VadBackend, VadEvent};
use vuikit::backend::stt::{SttBackend, TranscriptionEvent};
use vuikit::backend::tts::TtsBackend;
use vuikit::core::audio::AudioChunk;
use vuikit::core::error::VuiResult;
use futures::{Stream, StreamExt, stream};
use std::pin::Pin;
// Create backend instances
let vad = MyVadBackend::new();
let stt = MySttBackend::new();
// Process audio stream with VAD (consumes the backend)
let audio_stream = get_microphone_stream();
let _vad_events = vad.process_stream(audio_stream);
§Channel-based Continuous Processing (High-level)
Components wrap backends to provide channel-based continuous processing:
use vuikit::components::vad::VadComponent;
use vuikit::components::stt::SttComponent;
use vuikit::components::tts::TtsComponent;
use vuikit::backend::vad::{VadBackend, VadEvent};
use vuikit::backend::stt::{SttBackend, TranscriptionEvent};
use vuikit::backend::tts::TtsBackend;
use vuikit::core::audio::AudioChunk;
use vuikit::core::error::VuiResult;
use futures::{Stream, StreamExt, stream};
use std::pin::Pin;
// Components wrap your chosen backends
let (vad_component, mut vad_channels) = VadComponent::new(MyVadBackend::new());
let (stt_component, stt_channels) = SttComponent::new(MySttBackend::new());
let (tts_component, tts_channels) = TtsComponent::new(MyTtsBackend::new());
// Start wrapper components in background tasks
tokio::spawn(async move { vad_component.run().await });
tokio::spawn(async move { stt_component.run().await });
tokio::spawn(async move { tts_component.run().await });
// Now feed data through channels continuously
for _ in 0..3 {
    // Send audio when available
    let audio_chunk = get_audio_chunk();
    let _ = vad_channels.audio_tx.send(audio_chunk);

    // Process events as they arrive
    if let Ok(_vad_event) = vad_channels.event_rx.try_recv() {
        // Handle voice activity...
    }
}
// Components keep running until explicitly stopped
§Modules
- backend: Backend traits for VUIKit components.
- components: High-level components that wrap backends with continuous channel-based processing.
- core: Core types and utilities for VUIKit.