brainwires-audio
Audio I/O, speech-to-text, and text-to-speech for the Brainwires Agent Framework.
Overview
brainwires-audio provides a unified interface for audio capture, playback, speech recognition, and speech synthesis. It abstracts over hardware backends (CPAL), cloud APIs (OpenAI Whisper / TTS), and local inference (whisper-rs) behind common traits, so agents can speak and listen without caring about the underlying implementation.
Design principles:
- Trait-driven: AudioCapture, AudioPlayback, SpeechToText, and TextToSpeech are all traits, so backends are swappable
- Hardware-agnostic: CPAL handles cross-platform audio device access (Linux, macOS, Windows)
- Cloud + local: OpenAI APIs for zero-setup, local Whisper for offline/private deployments
- WAV + FLAC: built-in WAV encode/decode via hound, lossless FLAC encode/decode via flacenc/claxon (all pure Rust)
- Ring-buffered: AudioRingBuffer for lock-free streaming between capture and processing
┌──────────────────────────────────────────────────────────┐
│                     brainwires-audio                     │
│                                                          │
│  ┌────────────┐     ┌────────────┐     ┌─────────────┐   │
│  │ Hardware   │     │ Capture    │     │ Playback    │   │
│  │ CpalCapture│────>│ (trait)    │     │ (trait)     │   │
│  │ CpalPlay   │     │ AudioBuffer│     │ AudioBuffer │   │
│  └────────────┘     └──────┬─────┘     └──────┬──────┘   │
│                            │                  │          │
│                            v                  v          │
│  ┌────────────┐     ┌────────────┐     ┌─────────────┐   │
│  │ WAV/FLAC   │     │ STT        │     │ TTS         │   │
│  │ encode/    │     │ (trait)    │     │ (trait)     │   │
│  │ decode     │     │ Transcript │     │ Voice       │   │
│  └────────────┘     └──────┬─────┘     └──────┬──────┘   │
│                            │                  │          │
│                   ┌────────┴────────┐  ┌──────┴──────┐   │
│                   │ API Backends    │  │ API Backend │   │
│                   │ OpenAiStt       │  │ OpenAiTts   │   │
│                   │ WhisperStt      │  │             │   │
│                   └─────────────────┘  └─────────────┘   │
└──────────────────────────────────────────────────────────┘
Flow: Hardware -> Capture/Playback -> STT/TTS -> API/Local backends
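To make the trait-driven flow concrete, here is a minimal sketch of how code can depend on the traits alone while backends are swapped underneath. The trait signatures and the `FakeCloudStt`, `FakeLocalStt`, and `EchoTts` types are hypothetical stand-ins written for this illustration; the crate's real traits are likely async and will differ in detail.

```rust
// Hypothetical, simplified stand-ins for the crate's traits. These are
// synchronous and exist only to illustrate the swappable-backend pattern.
trait SpeechToText {
    fn transcribe(&self, pcm: &[u8]) -> String;
}

trait TextToSpeech {
    fn synthesize(&self, text: &str) -> Vec<u8>;
}

/// Stand-in for a cloud STT backend (hypothetical).
struct FakeCloudStt;
/// Stand-in for a local STT backend (hypothetical).
struct FakeLocalStt;
/// Stand-in TTS backend that returns the UTF-8 bytes of the text as "audio".
struct EchoTts;

impl SpeechToText for FakeCloudStt {
    fn transcribe(&self, pcm: &[u8]) -> String {
        format!("cloud transcript of {} bytes", pcm.len())
    }
}

impl SpeechToText for FakeLocalStt {
    fn transcribe(&self, pcm: &[u8]) -> String {
        format!("local transcript of {} bytes", pcm.len())
    }
}

impl TextToSpeech for EchoTts {
    fn synthesize(&self, text: &str) -> Vec<u8> {
        text.as_bytes().to_vec()
    }
}

/// The agent logic sees only the traits, never a concrete backend.
fn listen_and_reply(stt: &dyn SpeechToText, tts: &dyn TextToSpeech, pcm: &[u8]) -> Vec<u8> {
    let heard = stt.transcribe(pcm);
    tts.synthesize(&format!("You said: {}", heard))
}

fn main() {
    let pcm = [0u8; 320];
    // Swap backends without touching listen_and_reply.
    let backends: Vec<Box<dyn SpeechToText>> =
        vec![Box::new(FakeCloudStt), Box::new(FakeLocalStt)];
    for stt in &backends {
        let reply = listen_and_reply(stt.as_ref(), &EchoTts, &pcm);
        println!("{}", String::from_utf8_lossy(&reply));
    }
}
```

Because `listen_and_reply` takes `&dyn` references, choosing between a cloud and a local backend becomes a construction-time decision rather than a change to the calling code.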
Quick Start
Add to your Cargo.toml:
[dependencies]
brainwires-audio = "0.1"
Capture audio, save as WAV, and transcribe:
use ;
async
Features
| Feature | Default | Description |
|---|---|---|
| native | Yes | Hardware audio via CPAL, cloud APIs via reqwest, async streaming. Includes flac. |
| flac | Yes (via native) | FLAC lossless encode (flacenc) and decode (claxon); pure Rust, no system deps |
| local-stt | No | Local speech-to-text via whisper-rs; requires Whisper GGML model weights on disk |
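# Lightweight: no hardware or network deps (WAV encode/decode + traits only)
[dependencies]
brainwires-audio = { version = "0.3", default-features = false }
# Default + local Whisper STT
[dependencies]
brainwires-audio = { version = "0.3", features = ["local-stt"] }
# FLAC only, no hardware
[dependencies]
brainwires-audio = { version = "0.3", default-features = false, features = ["flac"] }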
Architecture
AudioBuffer
In-memory audio data with format metadata.
| Field | Type | Description |
|---|---|---|
| data | Vec<u8> | Raw PCM samples (little-endian bytes) |
| config | AudioConfig | Sample rate, channel count, sample format |
Methods: from_pcm(data, config), duration_secs(), num_frames(), is_empty()
AudioConfig
| Field | Type | Default | Description |
|---|---|---|---|
| sample_rate | u32 | 16000 | Samples per second |
| channels | u16 | 1 | Mono (1) or stereo (2) |
| sample_format | SampleFormat | I16 | I16 or F32 |
Presets: AudioConfig::speech() (16 kHz mono i16), AudioConfig::cd_quality() (44.1 kHz stereo i16), AudioConfig::high_quality() (48 kHz stereo f32)
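The relationship between AudioConfig and the AudioBuffer methods can be sketched with a few lines of arithmetic. The types below are local re-creations based only on the tables above, not the crate's actual definitions; in particular, `bytes_per_sample` and `bytes_per_frame` are helper names assumed for this sketch, and counting a frame as one sample per channel is an assumption about what num_frames() means.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum SampleFormat {
    I16,
    F32,
}

impl SampleFormat {
    /// Size of one sample in bytes (assumed helper, not from the crate docs).
    fn bytes_per_sample(self) -> usize {
        match self {
            SampleFormat::I16 => 2,
            SampleFormat::F32 => 4,
        }
    }
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct AudioConfig {
    sample_rate: u32,
    channels: u16,
    sample_format: SampleFormat,
}

impl AudioConfig {
    // Presets mirroring the values listed in the README.
    fn speech() -> Self {
        Self { sample_rate: 16_000, channels: 1, sample_format: SampleFormat::I16 }
    }
    fn cd_quality() -> Self {
        Self { sample_rate: 44_100, channels: 2, sample_format: SampleFormat::I16 }
    }
    fn high_quality() -> Self {
        Self { sample_rate: 48_000, channels: 2, sample_format: SampleFormat::F32 }
    }
    /// One frame holds one sample for every channel.
    fn bytes_per_frame(&self) -> usize {
        self.channels as usize * self.sample_format.bytes_per_sample()
    }
}

struct AudioBuffer {
    data: Vec<u8>,
    config: AudioConfig,
}

impl AudioBuffer {
    fn from_pcm(data: Vec<u8>, config: AudioConfig) -> Self {
        Self { data, config }
    }
    fn num_frames(&self) -> usize {
        self.data.len() / self.config.bytes_per_frame()
    }
    fn duration_secs(&self) -> f64 {
        self.num_frames() as f64 / self.config.sample_rate as f64
    }
    fn is_empty(&self) -> bool {
        self.data.is_empty()
    }
}

fn main() {
    // One second of silence at the speech preset: 16000 frames * 2 bytes each.
    let buf = AudioBuffer::from_pcm(vec![0u8; 32_000], AudioConfig::speech());
    println!(
        "{} frames, {:.2} s, empty: {}",
        buf.num_frames(),
        buf.duration_secs(),
        buf.is_empty()
    );
    println!("CD frame size: {} bytes", AudioConfig::cd_quality().bytes_per_frame());
    println!("HQ frame size: {} bytes", AudioConfig::high_quality().bytes_per_frame());
}
```

Under these assumptions, one second of audio costs 32,000 bytes with the speech preset and 384,000 bytes with the high-quality preset (48,000 frames of 8 bytes), which is a useful figure when sizing buffers.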
AudioCapture (trait)
Implementations: CpalCapture (native feature)
AudioPlayback (trait)
Implementations: CpalPlayback (native feature)
SpeechToText (trait)
Implementations: OpenAiStt (native), WhisperStt (local-stt)
TextToSpeech (trait)
Implementations: OpenAiTts (native)
Transcript
| Field | Type | Description |
|---|---|---|
| text | String | Full transcription text |
| segments | Vec<TranscriptSegment> | Timed segments with start/end in seconds |
| language | Option<String> | Detected language code |
| duration_secs | Option<f64> | Audio duration |
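As an illustration of how the timed segments might be consumed, the sketch below formats each segment as a captioned line. The `Transcript` struct follows the table above, but the `TranscriptSegment` field names (`text`, `start`, `end`) and the helpers `format_segments` and `sample_transcript` are assumptions made for this example.

```rust
/// Field names here are assumed; the README only says segments carry
/// start/end times in seconds.
struct TranscriptSegment {
    text: String,
    start: f64,
    end: f64,
}

struct Transcript {
    text: String,
    segments: Vec<TranscriptSegment>,
    language: Option<String>,
    duration_secs: Option<f64>,
}

/// Render each segment as "[start - end] text" with two decimal places.
fn format_segments(t: &Transcript) -> Vec<String> {
    t.segments
        .iter()
        .map(|s| format!("[{:.2}s - {:.2}s] {}", s.start, s.end, s.text))
        .collect()
}

/// Hand-built transcript used only for demonstration.
fn sample_transcript() -> Transcript {
    Transcript {
        text: "Hello there. How are you?".to_string(),
        segments: vec![
            TranscriptSegment { text: "Hello there.".to_string(), start: 0.0, end: 1.5 },
            TranscriptSegment { text: "How are you?".to_string(), start: 1.5, end: 3.25 },
        ],
        language: Some("en".to_string()),
        duration_secs: Some(3.25),
    }
}

fn main() {
    let t = sample_transcript();
    println!("full text: {}", t.text);
    // Optional fields may be absent, so fall back to placeholders.
    println!("language: {}", t.language.as_deref().unwrap_or("unknown"));
    println!("duration: {:.2}s", t.duration_secs.unwrap_or(0.0));
    for line in format_segments(&t) {
        println!("{}", line);
    }
}
```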
AudioDevice
| Field | Type | Description |
|---|---|---|
| id | String | Platform-specific device identifier |
| name | String | Human-readable display name |
| direction | DeviceDirection | Input or Output |
| is_default | bool | Whether this is the system default device |
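One common use of these fields is choosing which device to open. The following sketch re-creates `AudioDevice` locally from the table above and shows one possible selection policy: prefer the system default for a direction, otherwise take the first match. The `pick_device` helper and the device IDs are hypothetical and not part of the crate.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum DeviceDirection {
    Input,
    Output,
}

#[derive(Debug, Clone)]
struct AudioDevice {
    id: String,
    name: String,
    direction: DeviceDirection,
    is_default: bool,
}

/// Prefer the default device for `direction`; fall back to the first
/// device with that direction. Returns None if there is no match.
fn pick_device(devices: &[AudioDevice], direction: DeviceDirection) -> Option<&AudioDevice> {
    devices
        .iter()
        .find(|d| d.direction == direction && d.is_default)
        .or_else(|| devices.iter().find(|d| d.direction == direction))
}

/// Made-up device list for demonstration only.
fn sample_devices() -> Vec<AudioDevice> {
    let dev = |id: &str, name: &str, direction, is_default| AudioDevice {
        id: id.to_string(),
        name: name.to_string(),
        direction,
        is_default,
    };
    vec![
        dev("mic-1", "USB Microphone", DeviceDirection::Input, false),
        dev("mic-2", "Built-in Microphone", DeviceDirection::Input, true),
        dev("spk-1", "HDMI Output", DeviceDirection::Output, false),
    ]
}

fn main() {
    let devices = sample_devices();
    for direction in [DeviceDirection::Input, DeviceDirection::Output] {
        match pick_device(&devices, direction) {
            Some(d) => println!("{:?}: {} ({})", direction, d.name, d.id),
            None => println!("{:?}: no device found", direction),
        }
    }
}
```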
AudioRingBuffer
Lock-free ring buffer for streaming audio between producer (capture) and consumer (processing) threads. Fixed capacity, overwrites oldest samples when full.
Methods: new(config, duration_secs), push(bytes), read_all(), duration_secs(), clear(), is_full()
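The overwrite-oldest behavior can be illustrated with a deliberately simple, single-threaded sketch. This is not the crate's lock-free implementation: it uses a `VecDeque` and takes a raw byte capacity, and the `capacity_bytes` helper is an assumption about how a capacity could be derived from a config and a duration.

```rust
use std::collections::VecDeque;

/// Single-threaded illustration of a fixed-capacity, overwrite-oldest buffer.
/// The real AudioRingBuffer is described as lock-free; this sketch is not.
struct RingBuffer {
    buf: VecDeque<u8>,
    capacity: usize,
}

impl RingBuffer {
    fn new(capacity: usize) -> Self {
        Self { buf: VecDeque::with_capacity(capacity), capacity }
    }

    /// Append bytes, dropping the oldest ones once the buffer is full.
    fn push(&mut self, bytes: &[u8]) {
        if self.capacity == 0 {
            return;
        }
        for &b in bytes {
            if self.buf.len() == self.capacity {
                self.buf.pop_front();
            }
            self.buf.push_back(b);
        }
    }

    /// Copy out the current contents, oldest first.
    fn read_all(&self) -> Vec<u8> {
        self.buf.iter().copied().collect()
    }

    fn is_full(&self) -> bool {
        self.buf.len() == self.capacity
    }

    fn clear(&mut self) {
        self.buf.clear();
    }
}

/// Bytes needed to hold `secs` seconds of interleaved PCM (assumed helper).
fn capacity_bytes(sample_rate: u32, channels: u16, bytes_per_sample: usize, secs: f64) -> usize {
    let frames = (sample_rate as f64 * secs) as usize;
    frames * channels as usize * bytes_per_sample
}

fn main() {
    let mut rb = RingBuffer::new(4);
    rb.push(&[1, 2, 3]);
    rb.push(&[4, 5, 6]); // 1 and 2 are overwritten
    println!("{:?} full={}", rb.read_all(), rb.is_full());
    rb.clear();
    println!("after clear: {:?}", rb.read_all());
    println!("0.5 s of 16 kHz mono i16 = {} bytes", capacity_bytes(16_000, 1, 2, 0.5));
}
```

A fixed capacity keeps memory bounded when the consumer falls behind; the trade-off is that the oldest audio is silently lost, so the duration should be chosen to cover the longest expected processing stall.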
Usage Examples
Record and Save as WAV or FLAC
use ;
let capture = new;
let config = speech;
let buffer = capture.record.await?;
// Save as WAV
let wav = encode_wav?;
write?;
// Save as FLAC (smaller, lossless)
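For readers curious about what the WAV step produces, here is a std-only sketch that wraps 16-bit PCM in the canonical 44-byte RIFF/WAVE header. The crate itself delegates WAV handling to hound, so the `encode_wav_pcm16` function below is a hypothetical illustration of the file format, not the crate's encode path.

```rust
/// Wrap 16-bit little-endian PCM in a canonical 44-byte RIFF/WAVE header.
/// Hypothetical helper for illustration; not part of brainwires-audio.
fn encode_wav_pcm16(pcm: &[u8], sample_rate: u32, channels: u16) -> Vec<u8> {
    let bits_per_sample: u16 = 16;
    let block_align: u16 = channels * (bits_per_sample / 8); // bytes per frame
    let byte_rate: u32 = sample_rate * block_align as u32; // bytes per second
    let data_len = pcm.len() as u32;

    let mut out = Vec::with_capacity(44 + pcm.len());
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes()); // file size minus 8
    out.extend_from_slice(b"WAVE");
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    out.extend_from_slice(&1u16.to_le_bytes()); // format tag 1 = integer PCM
    out.extend_from_slice(&channels.to_le_bytes());
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&byte_rate.to_le_bytes());
    out.extend_from_slice(&block_align.to_le_bytes());
    out.extend_from_slice(&bits_per_sample.to_le_bytes());
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    out.extend_from_slice(pcm);
    out
}

fn main() {
    // One second of silence at 16 kHz mono, 2 bytes per sample.
    let pcm = vec![0u8; 32_000];
    let wav = encode_wav_pcm16(&pcm, 16_000, 1);
    println!("WAV size: {} bytes (44-byte header + {} data bytes)", wav.len(), pcm.len());
}
```

The resulting bytes can be written to disk with `std::fs::write`. FLAC, by contrast, compresses the same samples losslessly, which is why it yields smaller files.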
Load and Play Audio (WAV or FLAC)
use ;
// WAV
let buffer = decode_wav?;
// FLAC
let buffer = decode_flac?;
let playback = new;
playback.play.await?;
Text-to-Speech
use ;
let tts = new;
let audio = tts.synthesize.await?;
let playback = new;
playback.play.await?;
Listing Audio Devices
use ;
let capture = new;
for dev in capture.list_devices?
let playback = new;
for dev in playback.list_devices?
Local Whisper STT
use ;
Examples
Run the included examples:
# Record 5 seconds to WAV (default)
# Record 10 seconds to FLAC
# Play a WAV or FLAC file
Integration with Brainwires
Use via the brainwires facade crate:
[dependencies]
brainwires = { version = "0.3", features = ["audio"] }
Or depend on brainwires-audio directly for standalone audio support without the rest of the framework.
License
Licensed under the MIT License. See LICENSE for details.