Streaming ASR trait surface for voice pipelines, intended to wrap one or more speech-to-text backends behind a common Rust API. Same pattern as `wavekat-vad` and `wavekat-turn`.
> [!WARNING]
> Scaffold release. This crate ships only the trait shape and a scripted-event
> mock backend so downstream consumers can wire integration tests against the
> contract. No real ASR backends are bundled yet; the trait may iterate before
> the first one lands. Pin to an exact patch version.
## What's included

| Item | Feature flag |
|---|---|
| `StreamingAsr` trait, `TranscriptEvent`, `Channel`, `AsrError` | always |
| `MockAsr`: scripted partials → final, paired with an `mpsc::Receiver` | `mock` |
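Since the mock backend sits behind a feature flag, a consumer's `Cargo.toml` might enable it roughly like this (crate name and version are placeholders; the exact-pin syntax follows the warning above):

```toml
[dev-dependencies]
# Placeholder name and version; pin to an exact patch version as advised above.
wavekat-asr = { version = "=0.1.0", features = ["mock"] }
```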
## Quick start

```rust
use wavekat_asr::{AudioFrame, Channel, StreamingAsr, TranscriptEvent};
use wavekat_asr::MockAsr; // requires the `mock` feature

// Script the events the mock will emit, then drive it like a real backend.
// (Variant and parameter names may shift while the crate is a scaffold.)
let script = vec![
    TranscriptEvent::Partial { text: "hell".into(), channel: Channel::Caller },
    TranscriptEvent::Final { text: "hello".into(), channel: Channel::Caller },
];
let (mut asr, rx) = MockAsr::new(script);

let samples = vec![0i16; 320]; // 20 ms of 16 kHz mono PCM
let frame = AudioFrame::new(samples, 16_000);

asr.push_audio(frame, Channel::Caller).unwrap();
asr.finish().unwrap();

for event in rx.try_iter() {
    println!("{event:?}");
}
```
## Architecture
The crate exposes one trait — StreamingAsr — and one event enum —
TranscriptEvent. The trait keeps the surface that consumers see as
small as possible; backends will own their own resampling, network
state, and tokenizer.
```
AudioFrame ──▶ push_audio(frame, channel) ──▶ ┌───────────┐
                                              │  Backend  │
end of call ─▶ finish() ─────────────────────▶│           │
                                              │           │
TranscriptEvent ◀─────────────────────────────┤           │
  on mpsc::Receiver                           └───────────┘
```
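Pulling the table and diagram together, the surface they imply can be sketched roughly as follows. This is a hypothetical reconstruction for orientation only; field and variant names are assumptions, not the crate's real definitions.

```rust
/// Hypothetical sketch of the crate's surface, inferred from the table
/// and diagram above; the real definitions may differ.
#[derive(Debug, Clone)]
pub struct AudioFrame {
    pub samples: Vec<i16>,
    pub sample_rate_hz: u32,
}

impl AudioFrame {
    pub fn new(samples: Vec<i16>, sample_rate_hz: u32) -> Self {
        Self { samples, sample_rate_hz }
    }
}

/// Which side of the call a frame or transcript belongs to (names assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Channel { Caller, Agent }

#[derive(Debug)]
pub enum AsrError { Backend(String) }

/// Partials stream in as audio arrives; a final flushes on `finish()`.
#[derive(Debug, Clone)]
pub enum TranscriptEvent {
    Partial { text: String, channel: Channel },
    Final { text: String, channel: Channel },
}

/// The one trait backends implement. Sync by design: events come back
/// on an `mpsc::Receiver` handed out at construction time.
pub trait StreamingAsr {
    fn push_audio(&mut self, frame: AudioFrame, channel: Channel) -> Result<(), AsrError>;
    fn finish(&mut self) -> Result<(), AsrError>;
}
```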
Why a sync push + receiver pair, rather than async fn? The daemon that
will consume this (wavekat-voice) already runs an event loop and fans
events out over SSE; matching that shape avoids forcing a tokio runtime
through the trait. Backends that need their own runtime will spawn one
internally.
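The "backend spawns its own runtime internally" pattern can be sketched with a plain worker thread standing in for whatever runtime a real backend would own. Everything here (`ThreadedBackend`, `Event`) is illustrative, not part of the crate:

```rust
use std::sync::mpsc;
use std::thread;

// Stand-in for TranscriptEvent.
enum Event { Partial(String), Final(String) }

struct ThreadedBackend {
    tx_audio: mpsc::Sender<Option<Vec<i16>>>, // None signals end of call
    worker: Option<thread::JoinHandle<()>>,
}

impl ThreadedBackend {
    /// Returns the sync handle plus the receiver events arrive on,
    /// mirroring the push + receiver pairing described above.
    fn new() -> (Self, mpsc::Receiver<Event>) {
        let (tx_audio, rx_audio) = mpsc::channel::<Option<Vec<i16>>>();
        let (tx_event, rx_event) = mpsc::channel();
        // The worker thread stands in for a backend's private runtime
        // (tokio, a gRPC stream, ...). push_audio itself stays sync.
        let worker = thread::spawn(move || {
            let mut total = 0usize;
            while let Ok(Some(samples)) = rx_audio.recv() {
                total += samples.len();
                let _ = tx_event.send(Event::Partial(format!("{total} samples")));
            }
            let _ = tx_event.send(Event::Final(format!("{total} samples total")));
        });
        (Self { tx_audio, worker: Some(worker) }, rx_event)
    }

    fn push_audio(&self, samples: Vec<i16>) {
        let _ = self.tx_audio.send(Some(samples));
    }

    fn finish(&mut self) {
        let _ = self.tx_audio.send(None);
        if let Some(w) = self.worker.take() {
            let _ = w.join();
        }
    }
}
```

Joining the worker in `finish()` guarantees the final event has been sent before the call returns, so the consumer's event loop can drain the receiver deterministically.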
## License

Apache-2.0. See LICENSE.