Crate vuikit

§VUIKit - Voice User Interface Toolkit

A Rust library for building real-time voice user interfaces with streaming support for Voice Activity Detection (VAD), Speech-to-Text (STT), and Text-to-Speech (TTS).

§Features

  • 🚀 Stream-first: All operations are async stream-based for real-time processing
  • ⚡ Low latency: Designed for minimal delay in voice interactions
  • 🔌 Pluggable backends: Easy to swap different VAD/STT/TTS implementations
  • 📡 Real-time: Supports incremental results and continuous streaming
  • 🧩 Channel-based: Components provide tokio channels for continuous processing
  • 🎯 Multi-capability: Backends can provide multiple capabilities (e.g., Whisper = VAD+STT)

§Installation

Add VUIKit to your Cargo.toml:

[dependencies]
vuikit = "0.1.0"
futures = "0.3"
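# "rt-multi-thread" is required by the default #[tokio::main] flavor used in the examples
tokio = { version = "1.0", features = ["sync", "macros", "rt", "rt-multi-thread"] }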

§Architecture

┌──────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Your App       │    │   Components     │    │   Backends      │
├──────────────────┤    ├──────────────────┤    ├─────────────────┤
│ • Send audio     │───▶│ • Channel mgmt   │───▶│ • VAD algorithm │
│ • Receive events │◀───│ • Lifecycle      │◀───│ • STT model     │
│ • Business logic │    │ • Error handling │    │ • TTS engine    │
└──────────────────┘    └──────────────────┘    └─────────────────┘

VUIKit provides two levels of abstraction:

  • Backends: Trait definitions for implementing VAD/STT/TTS logic
  • Components: Wrapper abstractions that provide channel-based continuous processing
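The split between the two levels can be illustrated without VUIKit itself. The following is a minimal sketch using only the standard library: `std::thread` and `std::sync::mpsc` stand in for tokio tasks and tokio channels, and every name in it (`Backend`, `PeakBackend`, `spawn_component`, `run_demo`) is hypothetical rather than part of the VUIKit API.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a backend: turns one audio chunk into one event.
trait Backend: Send + 'static {
    fn process(&mut self, samples: Vec<f32>) -> &'static str;
}

/// Toy backend that labels a chunk by its peak amplitude.
struct PeakBackend {
    threshold: f32,
}

impl Backend for PeakBackend {
    fn process(&mut self, samples: Vec<f32>) -> &'static str {
        let peak = samples.iter().fold(0.0_f32, |max, s| max.max(s.abs()));
        if peak > self.threshold { "voice" } else { "silence" }
    }
}

/// Hypothetical component: owns the backend and exposes only channel endpoints.
fn spawn_component<B: Backend>(
    mut backend: B,
) -> (Sender<Vec<f32>>, Receiver<&'static str>, JoinHandle<()>) {
    let (audio_tx, audio_rx) = channel::<Vec<f32>>();
    let (event_tx, event_rx) = channel::<&'static str>();
    let handle = thread::spawn(move || {
        // The loop ends once every audio sender has been dropped.
        for samples in audio_rx {
            if event_tx.send(backend.process(samples)).is_err() {
                break; // the application dropped its event receiver
            }
        }
    });
    (audio_tx, event_rx, handle)
}

/// Feeds chunks through the component and collects the resulting events.
fn run_demo(chunks: Vec<Vec<f32>>) -> Vec<&'static str> {
    let (audio_tx, event_rx, handle) = spawn_component(PeakBackend { threshold: 0.5 });
    for chunk in chunks {
        audio_tx.send(chunk).expect("component thread is alive");
    }
    drop(audio_tx); // closing the input channel lets the worker exit
    handle.join().expect("component thread panicked");
    event_rx.iter().collect()
}
```

In this sketch the application never calls the backend directly; it only sends chunks and reads events, which is the role the component layer plays in the diagram above.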

§Quick Start

§Stream-based Processing (Low-level)

Direct backend usage for one-time processing:

use vuikit::backend::vad::{VadBackend, VadEvent};
use vuikit::core::audio::AudioChunk;
use vuikit::core::error::VuiResult;
use futures::{Stream, StreamExt, stream};
use std::pin::Pin;

// Implement your VAD backend (this placeholder reports every chunk as silence)
struct MyVadBackend;

impl VadBackend for MyVadBackend {
    type VadStream = Pin<Box<dyn Stream<Item = VuiResult<VadEvent>> + Send>>;

    fn process_stream<S>(self, audio_stream: S) -> Self::VadStream
    where S: Stream<Item = VuiResult<AudioChunk>> + Send + Unpin + 'static
    {
        Box::pin(audio_stream.map(|chunk| {
            match chunk {
                Ok(audio_chunk) => Ok(VadEvent::Silence { audio_chunk }),
                Err(e) => Err(e),
            }
        }))
    }
}

// Placeholder audio source that yields no chunks; replace with real microphone capture
fn get_microphone_stream() -> impl Stream<Item = VuiResult<AudioChunk>> + Send + Unpin {
    stream::empty()
}

#[tokio::main]
async fn main() -> VuiResult<()> {
    let vad = MyVadBackend;
    let audio_stream = get_microphone_stream();
    let mut vad_events = vad.process_stream(audio_stream);

    while let Some(event) = vad_events.next().await {
        match event? {
            VadEvent::VoiceStarted { confidence, .. } => {
                println!("Voice detected! Confidence: {}", confidence);
            }
            VadEvent::VoiceEnded => {
                println!("Voice stopped");
            }
            _ => {}
        }
    }
    Ok(())
}

§Channel-based Continuous Processing (High-level)

Components for long-running voice applications:

use vuikit::components::vad::VadComponent;
use vuikit::backend::vad::{VadBackend, VadEvent};
use vuikit::core::audio::AudioChunk;
use vuikit::core::error::VuiResult;
use futures::{Stream, StreamExt, stream};
use std::pin::Pin;

// Same placeholder backend as in the stream-based example
struct MyVadBackend;

impl VadBackend for MyVadBackend {
    type VadStream = Pin<Box<dyn Stream<Item = VuiResult<VadEvent>> + Send>>;

    fn process_stream<S>(self, audio_stream: S) -> Self::VadStream
    where S: Stream<Item = VuiResult<AudioChunk>> + Send + Unpin + 'static
    {
        Box::pin(audio_stream.map(|chunk| {
            match chunk {
                Ok(audio_chunk) => Ok(VadEvent::Silence { audio_chunk }),
                Err(e) => Err(e),
            }
        }))
    }
}

async fn capture_audio() -> AudioChunk {
    AudioChunk::new(vec![0.0; 512], 16000, 1)
}

fn start_recording() {
    println!("Started recording");
}

fn stop_recording() {
    println!("Stopped recording");
}

#[tokio::main]
async fn main() {
    // Wrap your backend in a component
    let (vad_component, mut vad_channels) = VadComponent::new(MyVadBackend);

    // Start component in background
    tokio::spawn(async move {
        vad_component.run().await.unwrap();
    });

    // Send audio continuously
    for _ in 0..3 {  // Limited loop for example
        let audio_chunk = capture_audio().await;
        vad_channels.audio_tx.send(audio_chunk).unwrap();

        // Drain the events that have arrived so far (try_recv does not wait)
        while let Ok(event) = vad_channels.event_rx.try_recv() {
            match event {
                VadEvent::VoiceStarted { .. } => start_recording(),
                VadEvent::VoiceEnded => stop_recording(),
                _ => {}
            }
        }
    }
}

§Core Concepts

§Backends vs Components

  • Backends: Implement the actual VAD/STT/TTS algorithms

    • Use for one-time processing
    • Direct stream handling
    • Full control over processing pipeline
    • Consume self to avoid borrowing issues
  • Components: Wrapper abstractions around backends

    • Use for long-running applications
    • Channel-based architecture
    • Background processing tasks
    • Automatic lifecycle management
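The "consume self" point can be shown with a small sketch. It is an assumption-laden illustration: a plain `Iterator` stands in for an async `Stream`, and `ThresholdBackend` and `build_pipeline` are hypothetical names, not VUIKit types. Because `process` takes `self` by value, the backend moves into the returned iterator, which therefore borrows nothing from the caller.

```rust
/// Hypothetical backend holding configuration that the output needs while it runs.
struct ThresholdBackend {
    threshold: f32,
}

impl ThresholdBackend {
    /// Taking `self` by value moves the backend into the returned iterator,
    /// so the result is `'static` and can be boxed, stored, or sent elsewhere.
    fn process<I>(self, input: I) -> Box<dyn Iterator<Item = bool> + Send>
    where
        I: Iterator<Item = f32> + Send + 'static,
    {
        Box::new(input.map(move |level| level > self.threshold))
    }
}

/// The backend is created and consumed inside this function, yet the
/// returned iterator stays valid after the function returns.
fn build_pipeline(levels: Vec<f32>) -> Box<dyn Iterator<Item = bool> + Send> {
    let backend = ThresholdBackend { threshold: 0.5 };
    backend.process(levels.into_iter())
}
```

Had `process` taken `&self`, the boxed iterator would borrow the local `backend` and could not be returned from `build_pipeline`.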

§Backend Traits

VUIKit provides three main backend traits:

use vuikit::backend::vad::VadBackend;
use vuikit::backend::stt::SttBackend;
use vuikit::backend::tts::TtsBackend;

// These traits are already defined in VUIKit:
// VadBackend, SttBackend, TtsBackend
// See the actual trait definitions in the API documentation

§Multi-capability Backends

Some backends provide multiple capabilities (e.g., OpenAI Whisper does both VAD and STT):

use vuikit::backend::vad::VadBackend;
use vuikit::backend::stt::SttBackend;

struct WhisperBackend {
    // Backend-specific fields would go here
}

// A backend can implement multiple traits:
// impl VadBackend for WhisperBackend { /* VAD implementation */ }
// impl SttBackend for WhisperBackend { /* STT implementation */ }
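As a rough, self-contained illustration of the same pattern, the sketch below defines two stand-in capability traits and one type that implements both. The traits `DetectsVoice` and `Transcribes`, the type `DualBackend`, and the helper `transcribe_if_voice` are all hypothetical. Unlike the VUIKit traits shown earlier, they borrow `&self` instead of consuming the backend, purely to keep the example short.

```rust
/// Hypothetical stand-ins for two capability traits.
trait DetectsVoice {
    fn is_voice(&self, samples: &[f32]) -> bool;
}

trait Transcribes {
    fn transcribe(&self, samples: &[f32]) -> String;
}

/// One backend type that offers both capabilities.
struct DualBackend {
    threshold: f32,
}

impl DetectsVoice for DualBackend {
    fn is_voice(&self, samples: &[f32]) -> bool {
        samples.iter().any(|s| s.abs() > self.threshold)
    }
}

impl Transcribes for DualBackend {
    fn transcribe(&self, samples: &[f32]) -> String {
        // Placeholder: a real model would decode speech here.
        format!("<{} samples>", samples.len())
    }
}

/// Generic code can require exactly the capabilities it needs.
fn transcribe_if_voice<B>(backend: &B, samples: &[f32]) -> Option<String>
where
    B: DetectsVoice + Transcribes,
{
    if backend.is_voice(samples) {
        Some(backend.transcribe(samples))
    } else {
        None
    }
}
```

The trait bound `DetectsVoice + Transcribes` is what lets a single backend value satisfy code that needs both roles.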

§Audio Data

All audio processing uses the AudioChunk type:

use vuikit::core::audio::AudioChunk;

let audio = AudioChunk::new(
    vec![0.1, 0.2, 0.3], // samples as f32
    16000,               // sample rate
    1                    // channels
);
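The three constructor arguments are enough to derive how much audio a chunk holds. The sketch below works this out on a stand-in struct; `Chunk`, `frames`, and `duration` are hypothetical names, and the sketch assumes interleaved samples, which the documentation above does not state.

```rust
use std::time::Duration;

/// Hypothetical stand-in mirroring the three constructor arguments above.
struct Chunk {
    samples: Vec<f32>,
    sample_rate: u32,
    channels: u16,
}

impl Chunk {
    /// Number of frames, assuming interleaved samples
    /// (one sample per channel in each frame).
    fn frames(&self) -> usize {
        // `max(1)` guards against a zero channel count.
        self.samples.len() / usize::from(self.channels.max(1))
    }

    /// Duration implied by the frame count; assumes `sample_rate` is non-zero.
    fn duration(&self) -> Duration {
        Duration::from_secs_f64(self.frames() as f64 / f64::from(self.sample_rate))
    }
}
```

Under these assumptions, a mono chunk of 512 samples at 16000 Hz covers 512 / 16000 s, or about 32 ms, which is a typical order of magnitude for VAD frame sizes.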

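§Error Handling

Fallible operations return VuiResult<T>, whose error type VuiError has variants for backend, stream, audio-format, and configuration failures: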

use vuikit::core::error::{VuiError, VuiResult};

fn handle_result(result: VuiResult<()>) {
    match result {
        Ok(_) => println!("Success"),
        Err(VuiError::BackendError(msg)) => eprintln!("Backend error: {}", msg),
        Err(VuiError::StreamError(msg)) => eprintln!("Stream error: {}", msg),
        Err(VuiError::InvalidAudioFormat(msg)) => eprintln!("Audio format error: {}", msg),
        Err(VuiError::ConfigError(msg)) => eprintln!("Config error: {}", msg),
    }
}
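For readers unfamiliar with this style, the sketch below shows how an error enum of this shape combines with the `?` operator. It defines its own stand-in types (`PipelineError`, `PipelineResult`) and hypothetical helpers (`validate`, `mean_energy`); it does not reproduce how `VuiError` is actually defined.

```rust
use std::fmt;

/// Hypothetical error enum shaped like two of the variants matched above.
#[derive(Debug, PartialEq)]
enum PipelineError {
    InvalidAudioFormat(String),
    Backend(String),
}

impl fmt::Display for PipelineError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PipelineError::InvalidAudioFormat(msg) => write!(f, "invalid audio format: {}", msg),
            PipelineError::Backend(msg) => write!(f, "backend error: {}", msg),
        }
    }
}

impl std::error::Error for PipelineError {}

type PipelineResult<T> = Result<T, PipelineError>;

/// Rejects obviously unusable audio parameters.
fn validate(sample_rate: u32, channels: u16) -> PipelineResult<()> {
    if sample_rate == 0 {
        return Err(PipelineError::InvalidAudioFormat("sample rate must be non-zero".into()));
    }
    if channels == 0 {
        return Err(PipelineError::InvalidAudioFormat("channel count must be non-zero".into()));
    }
    Ok(())
}

/// Mean squared amplitude; `?` forwards any validation error to the caller.
fn mean_energy(samples: &[f32], sample_rate: u32, channels: u16) -> PipelineResult<f32> {
    validate(sample_rate, channels)?;
    if samples.is_empty() {
        return Err(PipelineError::Backend("empty chunk".into()));
    }
    Ok(samples.iter().map(|s| s * s).sum::<f32>() / samples.len() as f32)
}
```

Callers can then match on the specific variant, as `handle_result` does above, or print the error through its `Display` implementation.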

§Example Backend Implementation

Here's a simple VAD backend implementation:

use vuikit::backend::vad::{VadBackend, VadEvent};
use vuikit::core::{audio::AudioChunk, error::VuiResult};
use futures::{Stream, StreamExt};
use std::pin::Pin;

pub struct SimpleVadBackend {
    threshold: f32,
}

impl SimpleVadBackend {
    pub fn new(threshold: f32) -> Self {
        Self { threshold }
    }
}

impl VadBackend for SimpleVadBackend {
    type VadStream = Pin<Box<dyn Stream<Item = VuiResult<VadEvent>> + Send>>;

    fn process_stream<S>(self, audio_stream: S) -> Self::VadStream
    where S: Stream<Item = VuiResult<AudioChunk>> + Send + Unpin + 'static
    {
        Box::pin(audio_stream.map(move |chunk_result| {
            let chunk = chunk_result?;

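            // Simple energy-based VAD: mean squared amplitude of the chunk.
            // Note: an empty chunk yields NaN here, which never compares as
            // greater than the threshold and is therefore reported as silence.
            let energy: f32 = chunk.samples.iter()
                .map(|&sample| sample * sample)
                .sum::<f32>() / chunk.samples.len() as f32;

            if energy > self.threshold {
                Ok(VadEvent::VoiceStarted {
                    // In this branch energy exceeds the threshold, so for a
                    // positive threshold the ratio saturates at 1.0.
                    confidence: (energy / self.threshold).min(1.0),
                    audio_chunk: chunk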
                })
            } else {
                Ok(VadEvent::Silence { audio_chunk: chunk })
            }
        }))
    }
}
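The backend above classifies each chunk independently, so it reports a start event for every loud chunk and never reports an end. One common refinement is to track state across chunks and emit events only on transitions, with a short "hangover" so brief pauses do not end an utterance. The sketch below illustrates that idea with standard-library types only; `Edge`, `EdgeDetector`, and the `hangover` parameter are hypothetical and not part of VUIKit.

```rust
/// Hypothetical edge-triggered events: emitted on state changes only.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Edge {
    Started,
    Ended,
}

/// Tracks whether speech is active across consecutive chunks.
struct EdgeDetector {
    threshold: f32,
    hangover: u32,  // quiet chunks tolerated before reporting the end
    quiet_run: u32, // consecutive quiet chunks seen while active
    active: bool,
}

impl EdgeDetector {
    fn new(threshold: f32, hangover: u32) -> Self {
        Self { threshold, hangover, quiet_run: 0, active: false }
    }

    /// Mean squared amplitude; an empty chunk counts as silence.
    fn energy(samples: &[f32]) -> f32 {
        if samples.is_empty() {
            return 0.0;
        }
        samples.iter().map(|s| s * s).sum::<f32>() / samples.len() as f32
    }

    /// Feeds one chunk and returns an event only when the state changes.
    fn push(&mut self, samples: &[f32]) -> Option<Edge> {
        let loud = Self::energy(samples) > self.threshold;
        match (self.active, loud) {
            (false, true) => {
                self.active = true;
                self.quiet_run = 0;
                Some(Edge::Started)
            }
            (true, true) => {
                self.quiet_run = 0;
                None
            }
            (true, false) => {
                self.quiet_run += 1;
                if self.quiet_run > self.hangover {
                    self.active = false;
                    self.quiet_run = 0;
                    Some(Edge::Ended)
                } else {
                    None
                }
            }
            (false, false) => None,
        }
    }
}
```

A detector like this could sit inside a backend's stream adapter, with `push` called once per incoming chunk; how that maps onto `VadEvent` is left open here.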

§When to Use What

§Use Backends Directly When:

  • One-time processing tasks
  • Custom stream handling
  • Full control over processing pipeline
  • Building your own abstractions

§Use Components When:

  • Long-running voice applications (chatbots, assistants)
  • Channel-based architecture
  • Background processing tasks
  • Automatic lifecycle management

§Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

§License

This project is licensed under the MIT License - see the LICENSE file for details.


§API Examples

§Stream-based Processing (Low-level)

use vuikit::backend::vad::{VadBackend, VadEvent};
use vuikit::backend::stt::{SttBackend, TranscriptionEvent};
use vuikit::backend::tts::TtsBackend;
use vuikit::core::audio::AudioChunk;
use vuikit::core::error::VuiResult;
use futures::{Stream, StreamExt, stream};
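use std::pin::Pin;

// Create backend instances. `MyVadBackend`, `MySttBackend`, and
// `get_microphone_stream()` are placeholders for your own types and audio source.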
let vad = MyVadBackend::new();
let stt = MySttBackend::new();

// Process audio stream with VAD (consumes the backend)
let audio_stream = get_microphone_stream();
let _vad_events = vad.process_stream(audio_stream);

§Channel-based Continuous Processing (High-level)

Components wrap backends to provide channel-based continuous processing:

use vuikit::components::vad::VadComponent;
use vuikit::components::stt::SttComponent;
use vuikit::components::tts::TtsComponent;
use vuikit::backend::vad::{VadBackend, VadEvent};
use vuikit::backend::stt::{SttBackend, TranscriptionEvent};
use vuikit::backend::tts::TtsBackend;
use vuikit::core::audio::AudioChunk;
use vuikit::core::error::VuiResult;
use futures::{Stream, StreamExt, stream};
use std::pin::Pin;

// Components wrap your chosen backends. The `My*Backend` types and
// `get_audio_chunk()` are placeholders for your own implementations.
let (vad_component, mut vad_channels) = VadComponent::new(MyVadBackend::new());
let (stt_component, stt_channels) = SttComponent::new(MySttBackend::new());
let (tts_component, tts_channels) = TtsComponent::new(MyTtsBackend::new());

// Start wrapper components in background tasks
tokio::spawn(async move { vad_component.run().await });
tokio::spawn(async move { stt_component.run().await });
tokio::spawn(async move { tts_component.run().await });

// Now feed data through channels continuously
for _ in 0..3 {
    // Send audio when available
    let audio_chunk = get_audio_chunk();
    let _ = vad_channels.audio_tx.send(audio_chunk);

    // Process events as they arrive
    if let Ok(_vad_event) = vad_channels.event_rx.try_recv() {
        // Handle voice activity...
    }
}
// Components keep running until explicitly stopped

Modules§

backend
Backend traits for VUIKit components.
components
High-level components that wrap backends with continuous channel-based processing.
core
Core types and utilities for VUIKit.