car-voice

Voice I/O capability for the Common Agent Runtime (CAR).

What it does

Channel-neutral microphone capture, voice activity detection, speech-to-text, text-to-speech, and audio playback. Any CAR-based agent or channel (CLI, GUI, IDE plug-in) can consume this crate without pulling in a UI shell.
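In practice, "channel-neutral" means a channel holds only trait objects for STT and TTS, so the same turn-handling code can run behind a CLI, GUI, or IDE plug-in. The sketch below is minimal and rests on hypothetical signatures. The `transcribe` and `speak` methods, the 16-bit PCM input, and the helpers `handle_turn`, `FixedStt`, and `RecordingSpeaker` are illustrative and are not this crate's actual API. The real `SttProvider`, `Speaker`, and `VoiceError` may look different.

```rust
use std::fmt;

/// Hypothetical stand-in for the crate's `VoiceError`.
#[derive(Debug)]
pub enum VoiceError {
    Provider(String),
}

impl fmt::Display for VoiceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            VoiceError::Provider(msg) => write!(f, "provider error: {msg}"),
        }
    }
}

impl std::error::Error for VoiceError {}

/// Hypothetical STT shape: mono 16-bit PCM in, transcript out.
pub trait SttProvider {
    fn transcribe(&self, pcm: &[i16], sample_rate: u32) -> Result<String, VoiceError>;
}

/// Hypothetical TTS shape: text in, audio played as a side effect.
pub trait Speaker {
    fn speak(&mut self, text: &str) -> Result<(), VoiceError>;
}

/// Channel-side logic that depends only on the traits, never on a UI shell.
pub fn handle_turn(
    stt: &dyn SttProvider,
    speaker: &mut dyn Speaker,
    pcm: &[i16],
    sample_rate: u32,
    reply: impl Fn(&str) -> String,
) -> Result<String, VoiceError> {
    let transcript = stt.transcribe(pcm, sample_rate)?;
    let answer = reply(&transcript);
    speaker.speak(&answer)?;
    Ok(answer)
}

/// Test double: returns a fixed transcript, errors on empty audio.
struct FixedStt(&'static str);

impl SttProvider for FixedStt {
    fn transcribe(&self, pcm: &[i16], _sample_rate: u32) -> Result<String, VoiceError> {
        if pcm.is_empty() {
            return Err(VoiceError::Provider("no audio".into()));
        }
        Ok(self.0.to_string())
    }
}

/// Test double: records what would have been spoken.
#[derive(Default)]
struct RecordingSpeaker {
    spoken: Vec<String>,
}

impl Speaker for RecordingSpeaker {
    fn speak(&mut self, text: &str) -> Result<(), VoiceError> {
        self.spoken.push(text.to_string());
        Ok(())
    }
}

fn main() {
    let stt = FixedStt("what time is it");
    let mut speaker = RecordingSpeaker::default();
    let pcm = vec![0i16; 1600]; // 100 ms of silence at 16 kHz
    let answer = handle_turn(&stt, &mut speaker, &pcm, 16_000, |q| format!("You asked: {q}"))
        .expect("turn should succeed");
    println!("{answer}");
}
```

Because the channel sees only `&dyn SttProvider` and `&mut dyn Speaker`, swapping a cloud provider for an in-process one (or for a test double, as here) requires no change to channel code.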

Module map

| Module | Purpose |
|---|---|
| `config` | `VoiceConfig`: provider selection, VAD tuning, and mode |
| `error` | `VoiceError`, the crate's error type |
| `events` | `VoiceEvent` enum: `SpeechStart` / `SpeechEnd` / `Transcript` / `BargeIn` |
| `stt` | `SttProvider` trait |
| `tts` | `Speaker` trait plus a raw-playback helper |
| `provider` | Factories that build STT and TTS providers from config |
| `elevenlabs_stt` / `elevenlabs_tts` | ElevenLabs cloud STT and TTS providers |
| `whisper_cpp_stt` | In-process Whisper STT via whisper.cpp (Metal-accelerated on Apple Silicon) |
| `local_tts` | Local OpenAI-compatible speech backends (MLX-Whisper, mlx-audio Kokoro / Qwen3-TTS) |
| `listener` | `Listener` trait plus the cross-platform `CpalListener` |
| `voice_processing_listener` | macOS `VoiceProcessingIO` listener: hardware AEC, AGC, and barge-in |
| `voice_audio_mixer` | Software mixer that feeds the VPIO bus 0 reference signal, so AEC has playback audio to subtract |
| `vad` | Energy-based VAD with an adaptive noise floor and a runtime threshold boost |
| `enrollment` | Speaker voiceprint enrollment and per-segment role classification |
| `narration` | TARS-style commentary helpers (pure functions) |
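The `vad` entry above describes an energy-based detector with an adaptive noise floor and a runtime threshold boost. A small frame-by-frame detector can illustrate that idea. This minimal sketch assumes mono `f32` frames of 10 ms at 16 kHz and uses hypothetical tuning parameters (`ratio`, `alpha`, `hangover_frames`). `EnergyVad` and `VadEvent` are illustrative stand-ins, not this crate's types, and the real detector may work differently.

```rust
/// Illustrative stand-in for the two VAD-related `VoiceEvent` variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VadEvent {
    SpeechStart,
    SpeechEnd,
}

/// Lower bound so the floor never collapses to zero in digital silence.
const MIN_FLOOR: f32 = 1e-4;

/// Hypothetical energy-based VAD: frame RMS compared against an adaptive floor.
pub struct EnergyVad {
    noise_floor: f32,     // running estimate of background RMS
    ratio: f32,           // speech must exceed noise_floor * ratio
    boost: f32,           // runtime multiplier on the threshold (>= 1.0)
    alpha: f32,           // exponential smoothing factor for the floor
    hangover_frames: u32, // consecutive quiet frames required before SpeechEnd
    quiet_run: u32,
    in_speech: bool,
}

impl EnergyVad {
    pub fn new(initial_floor: f32, ratio: f32, alpha: f32, hangover_frames: u32) -> Self {
        Self {
            noise_floor: initial_floor.max(MIN_FLOOR),
            ratio,
            boost: 1.0,
            alpha,
            hangover_frames: hangover_frames.max(1),
            quiet_run: 0,
            in_speech: false,
        }
    }

    /// Raise (or reset to 1.0) the threshold multiplier at runtime.
    pub fn set_boost(&mut self, boost: f32) {
        self.boost = boost.max(1.0);
    }

    /// Root-mean-square energy of one frame.
    fn rms(frame: &[f32]) -> f32 {
        if frame.is_empty() {
            return 0.0;
        }
        let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
        (sum_sq / frame.len() as f32).sqrt()
    }

    /// Feed one frame; returns an event only on a speech/non-speech transition.
    pub fn process(&mut self, frame: &[f32]) -> Option<VadEvent> {
        let energy = Self::rms(frame);
        let base = self.noise_floor * self.ratio;

        // Adapt the floor only on frames below the unboosted threshold,
        // so neither speech nor boosted-away audio inflates it.
        if energy <= base {
            self.noise_floor =
                ((1.0 - self.alpha) * self.noise_floor + self.alpha * energy).max(MIN_FLOOR);
        }

        let loud = energy > base * self.boost;
        match (self.in_speech, loud) {
            (false, true) => {
                self.in_speech = true;
                self.quiet_run = 0;
                Some(VadEvent::SpeechStart)
            }
            (true, true) => {
                self.quiet_run = 0;
                None
            }
            (true, false) => {
                // Hangover: wait for several quiet frames before ending speech.
                self.quiet_run += 1;
                if self.quiet_run >= self.hangover_frames {
                    self.in_speech = false;
                    self.quiet_run = 0;
                    Some(VadEvent::SpeechEnd)
                } else {
                    None
                }
            }
            (false, false) => None,
        }
    }
}

fn main() {
    let mut vad = EnergyVad::new(0.01, 3.0, 0.05, 3);
    let quiet = vec![0.01_f32; 160]; // 10 ms at 16 kHz
    let loud = vec![0.5_f32; 160];
    let frames: [&[f32]; 6] = [&quiet, &loud, &loud, &quiet, &quiet, &quiet];
    for (i, frame) in frames.iter().enumerate() {
        if let Some(ev) = vad.process(frame) {
            println!("frame {i}: {ev:?}");
        }
    }
}
```

In this sketch, only frames under the unboosted threshold update the noise floor. Raising the boost, for example while TTS audio is playing (one plausible use of a runtime boost, assumed here), therefore suppresses triggers without letting loud playback audio drag the floor upward.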

Where it fits

Serves as the foundation for `car-meeting` (multi-source meeting capture) and the WebSocket `voice.*` methods. The speech-runtime `install`, `doctor`, and `smoke` commands live under `car speech` (see `car-cli`).

car-voice 0.13.0