# car-voice
Voice I/O capability for [Common Agent Runtime](https://github.com/Parslee-ai/car).
## What it does
Provides channel-neutral microphone capture, voice activity detection (VAD), speech-to-text (STT), text-to-speech (TTS), and audio playback. Any CAR-based agent or channel (CLI, GUI, IDE plug-in) can consume this crate without pulling in a UI shell.
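To give a feel for the shape of the API, here is a minimal sketch of how a consumer might see one utterance flow through the crate. The `VoiceEvent` variants come from the module map below; the `SttProvider` method signature, the `EchoStt` stub, and `run_utterance` are illustrative assumptions, not the crate's real API.

```rust
/// Events emitted while listening (variant names from the `events` module;
/// payloads here are an assumption).
#[derive(Debug, PartialEq)]
enum VoiceEvent {
    SpeechStart,
    SpeechEnd,
    Transcript(String),
    BargeIn,
}

/// Hypothetical, simplified shape of the `SttProvider` trait:
/// audio samples in, text out.
trait SttProvider {
    fn transcribe(&self, samples: &[f32]) -> Result<String, String>;
}

/// Stub backend standing in for ElevenLabs or whisper.cpp.
struct EchoStt;
impl SttProvider for EchoStt {
    fn transcribe(&self, samples: &[f32]) -> Result<String, String> {
        Ok(format!("{} samples heard", samples.len()))
    }
}

/// Turn one utterance into the event sequence a consumer would observe.
fn run_utterance(stt: &impl SttProvider, samples: &[f32]) -> Vec<VoiceEvent> {
    let mut events = vec![VoiceEvent::SpeechStart];
    events.push(VoiceEvent::SpeechEnd);
    if let Ok(text) = stt.transcribe(samples) {
        events.push(VoiceEvent::Transcript(text));
    }
    events
}
```

A real listener would emit these events asynchronously from a capture thread; the synchronous sketch only shows the ordering a consumer should expect (`SpeechStart`, `SpeechEnd` once the VAD closes the segment, then `Transcript`).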
## Module map
| Module | Contents |
| --- | --- |
| `config` | `VoiceConfig` — provider selection, VAD tuning, mode |
| `error` | `VoiceError` |
| `events` | `VoiceEvent` enum — `SpeechStart` / `SpeechEnd` / `Transcript` / `BargeIn` |
| `stt` | `SttProvider` trait |
| `tts` | `Speaker` trait + raw playback helper |
| `provider` | Factories that build STT/TTS from config |
| `elevenlabs_stt` / `elevenlabs_tts` | ElevenLabs cloud providers |
| `whisper_cpp_stt` | In-process Whisper STT via whisper.cpp (Metal on Apple Silicon) |
| `local_tts` | Local OpenAI-compatible TTS (MLX-Whisper, mlx-audio Kokoro/Qwen3-TTS) |
| `listener` | `Listener` trait + cross-platform `CpalListener` |
| `voice_processing_listener` | macOS `VoiceProcessingIO` listener — hardware AEC, AGC, barge-in |
| `voice_audio_mixer` | Software mixer that feeds the VPIO bus 0 reference signal so the AEC has something to subtract |
| `vad` | Energy-based VAD with adaptive noise floor + runtime threshold boost |
| `enrollment` | Speaker voiceprint enrollment + per-segment role classification |
| `narration` | TARS-style commentary helpers, pure functions |
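The `vad` row above describes an energy-based detector with an adaptive noise floor and a runtime threshold boost. A self-contained sketch of that idea follows; the struct name, initial floor, smoothing constant, and boost factor are illustrative assumptions, not the crate's actual tuning.

```rust
/// Minimal energy-based VAD sketch: a frame is speech when its RMS energy
/// exceeds the adaptive noise floor times a boost factor. Quiet frames pull
/// the floor toward their own energy, so the threshold tracks changing
/// background noise. All constants here are placeholders.
struct EnergyVad {
    noise_floor: f32, // exponentially smoothed background-energy estimate
    boost: f32,       // runtime multiplier over the noise floor
    alpha: f32,       // smoothing factor for floor adaptation
}

impl EnergyVad {
    fn new() -> Self {
        Self { noise_floor: 0.01, boost: 3.0, alpha: 0.05 }
    }

    /// RMS energy of one frame of samples.
    fn energy(frame: &[f32]) -> f32 {
        let sum: f32 = frame.iter().map(|s| s * s).sum();
        (sum / frame.len() as f32).sqrt()
    }

    /// Classify one frame; adapt the noise floor only on non-speech frames
    /// so sustained speech does not inflate the threshold.
    fn is_speech(&mut self, frame: &[f32]) -> bool {
        let e = Self::energy(frame);
        let speaking = e > self.noise_floor * self.boost;
        if !speaking {
            self.noise_floor += self.alpha * (e - self.noise_floor);
        }
        speaking
    }
}
```

Adapting the floor only during silence is the usual design choice for this family of detectors: it lets the threshold follow fans and room tone without the speaker's own voice raising the bar mid-utterance.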
## Where it fits
Foundation for `car-meeting` (multi-source meeting capture) and the WebSocket `voice.*` methods. Speech runtime install / doctor / smoke commands live in `car speech` (see `car-cli`).