car-voice 0.9.0

Voice I/O capability for CAR — mic capture, VAD, listener/speaker traits
car-voice

Voice I/O capability for the Common Agent Runtime (CAR).

What it does

Provides channel-neutral microphone capture, voice activity detection (VAD), speech-to-text (STT), text-to-speech (TTS), and audio playback. Any CAR-based agent or channel (CLI, GUI, IDE plug-in) can consume this crate without pulling in a UI shell.

Module map

Module                           Purpose
config                           VoiceConfig: provider selection, VAD tuning, mode
error                            VoiceError
events                           VoiceEvent enum: SpeechStart / SpeechEnd / Transcript / BargeIn
stt                              SttProvider trait
tts                              Speaker trait + raw playback helper
provider                         Factories that build STT/TTS from config
elevenlabs_stt / elevenlabs_tts  ElevenLabs cloud providers
whisper_cpp_stt                  In-process Whisper STT via whisper.cpp (Metal on Apple Silicon)
local_tts                        Local OpenAI-compatible TTS (MLX-Whisper, mlx-audio Kokoro/Qwen3-TTS)
listener                         Listener trait + cross-platform CpalListener
voice_processing_listener        macOS VoiceProcessingIO listener: hardware AEC, AGC, barge-in
voice_audio_mixer                Software mixer feeding the VPIO bus 0 reference signal so AEC has something to subtract
vad                              Energy-based VAD with adaptive noise floor + runtime threshold boost
enrollment                       Speaker voiceprint enrollment + per-segment role classification
narration                        TARS-style commentary helpers (pure functions)
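
To make the vad row concrete, here is a minimal sketch of what an energy-based VAD with an adaptive noise floor and a runtime threshold boost can look like. It is not the crate's implementation: EnergyVad, SketchEvent, every field name, and every constant below are hypothetical and chosen only for illustration. The two event variants loosely mirror the SpeechStart / SpeechEnd names listed for the events module.

```rust
/// Hypothetical event type for this sketch; not car-voice's VoiceEvent.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SketchEvent {
    SpeechStart,
    SpeechEnd,
}

/// Illustrative energy-based VAD. All names and defaults are assumptions.
struct EnergyVad {
    noise_floor: f32,  // running estimate of background RMS
    ratio: f32,        // speech when rms > noise_floor * ratio * boost
    boost: f32,        // runtime multiplier, e.g. raised while TTS is playing
    alpha: f32,        // smoothing factor for the noise-floor update
    hangover: u32,     // consecutive quiet frames required before SpeechEnd
    quiet_frames: u32, // quiet frames seen since the last loud frame
    in_speech: bool,
}

impl EnergyVad {
    fn new() -> Self {
        Self {
            noise_floor: 0.01,
            ratio: 3.0,
            boost: 1.0,
            alpha: 0.05,
            hangover: 3,
            quiet_frames: 0,
            in_speech: false,
        }
    }

    /// Raise (or reset) the threshold at runtime; never drops below 1.0.
    fn set_boost(&mut self, boost: f32) {
        self.boost = boost.max(1.0);
    }

    /// Root-mean-square energy of one frame of samples in [-1.0, 1.0].
    fn rms(frame: &[f32]) -> f32 {
        if frame.is_empty() {
            return 0.0;
        }
        (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt()
    }

    /// Feed one frame; returns an event only on a state transition.
    fn process(&mut self, frame: &[f32]) -> Option<SketchEvent> {
        let rms = Self::rms(frame);
        let threshold = self.noise_floor * self.ratio * self.boost;

        if rms > threshold {
            self.quiet_frames = 0;
            if !self.in_speech {
                self.in_speech = true;
                return Some(SketchEvent::SpeechStart);
            }
        } else if self.in_speech {
            // Quiet frame during speech: wait out the hangover before ending.
            self.quiet_frames += 1;
            if self.quiet_frames >= self.hangover {
                self.in_speech = false;
                self.quiet_frames = 0;
                return Some(SketchEvent::SpeechEnd);
            }
        } else {
            // Quiet frame outside speech: adapt the noise floor toward it,
            // clamped so the threshold never collapses to zero.
            self.noise_floor += self.alpha * (rms - self.noise_floor);
            self.noise_floor = self.noise_floor.max(1e-4);
        }
        None
    }
}
```

In this sketch a caller would pass fixed-size frames of f32 samples to process and react to the returned transitions. Calling set_boost with a value above 1.0 while the agent is speaking makes the detector harder to trigger, which is one plausible way to reduce false barge-ins from residual echo.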

Where it fits

car-voice is the foundation for car-meeting (multi-source meeting capture) and for the WebSocket voice.* methods. The speech runtime's install, doctor, and smoke commands live under car speech (see car-cli).