Crate kalosm_sound

§Kalosm Sound

Kalosm Sound is a collection of audio models and utilities for the Kalosm framework. It supports several voice activity detection models, and provides utilities for transcribing audio into text.

§Sound Streams

Models in Kalosm Sound work with any AsyncSource. You can use MicInput::stream to stream audio from the microphone, or use any synchronous audio source that implements rodio::Source, such as an MP3 or WAV file.

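You can transform audio streams with the extension traits listed under Traits, for example VoiceActivityDetectorExt::voice_activity_stream, VoiceActivityStreamExt::rechunk_voice_activity, and AsyncSourceTranscribeExt::transcribe.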

§Voice Activity Detection

Voice activity detection (VAD) models detect when someone is speaking in an audio stream. The simplest way to use a VAD model is to create an audio stream and call VoiceActivityDetectorExt::voice_activity_stream, which yields each audio chunk together with the probability that it contains speech:

use kalosm::sound::*;
#[tokio::main]
async fn main() {
    // Get the default microphone input
    let mic = MicInput::default();
    // Stream the audio from the microphone
    let stream = mic.stream();
    // Detect voice activity in the audio stream
    let mut vad = stream.voice_activity_stream();
    while let Some(input) = vad.next().await {
        println!("Probability: {}", input.probability);
    }
}
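Once each chunk carries a probability, a common next step is to keep only the chunks that are likely to be speech, which is the idea the VoiceActivityFilterStream description refers to. The following is a minimal, std-only sketch of that idea, not Kalosm's implementation: `ScoredChunk` and `filter_voiced` are hypothetical names introduced here for illustration, and the threshold value is an assumption you would tune for your audio.

```rust
/// Hypothetical stand-in for an audio chunk tagged with a VAD probability.
/// This is NOT a Kalosm type; it only mirrors the "samples + probability" shape.
#[derive(Debug, Clone, PartialEq)]
struct ScoredChunk {
    /// Estimated probability (0.0 to 1.0) that the chunk contains speech.
    probability: f32,
    /// Mono audio samples for this chunk.
    samples: Vec<f32>,
}

/// Keep only the chunks whose speech probability is strictly above `threshold`.
fn filter_voiced(chunks: Vec<ScoredChunk>, threshold: f32) -> Vec<ScoredChunk> {
    chunks
        .into_iter()
        .filter(|chunk| chunk.probability > threshold)
        .collect()
}
```

Calling `filter_voiced(chunks, 0.5)` on chunks with probabilities 0.1, 0.9 and 0.6 keeps the last two and preserves their order. Filtering each chunk independently is simple, but it can cut words apart when a single chunk scores low, which is one reason to smooth the probabilities over several chunks instead.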

Kalosm also provides VoiceActivityStreamExt::rechunk_voice_activity, which collects consecutive audio samples with a high VAD probability into larger chunks of speech. This can be useful for applications like speech recognition, where context between consecutive audio samples is important.

use kalosm::sound::*;
use rodio::Source;
#[tokio::main]
async fn main() {
    // Get the default microphone input
    let mic = MicInput::default();
    // Stream the audio from the microphone
    let stream = mic.stream();
    // Chunk the audio into chunks of speech
    let vad = stream.voice_activity_stream();
    let mut audio_chunks = vad.rechunk_voice_activity();
    // Print the chunks as they are streamed in
    while let Some(input) = audio_chunks.next().await {
        println!("New voice activity chunk with duration {:?}", input.total_duration());
    }
}
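To make the rechunking idea concrete, here is a minimal, std-only sketch of grouping chunks by a rolling average of their VAD probability, as the VoiceActivityRechunkerStream description suggests. It is an illustration under stated assumptions, not Kalosm's implementation: `ScoredChunk`, `rechunk`, the `window` size, and the `threshold` are all hypothetical names and parameters chosen for this example.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for an audio chunk tagged with a VAD probability.
/// This is NOT a Kalosm type.
#[derive(Debug, Clone)]
struct ScoredChunk {
    probability: f32,
    samples: Vec<f32>,
}

/// Merge consecutive chunks into utterances for as long as the rolling
/// average of the last `window` probabilities stays above `threshold`.
fn rechunk(chunks: &[ScoredChunk], window: usize, threshold: f32) -> Vec<Vec<f32>> {
    assert!(window > 0, "window must contain at least one chunk");
    let mut recent: VecDeque<f32> = VecDeque::with_capacity(window);
    let mut current: Vec<f32> = Vec::new();
    let mut utterances = Vec::new();

    for chunk in chunks {
        // Slide the window forward by one chunk.
        if recent.len() == window {
            recent.pop_front();
        }
        recent.push_back(chunk.probability);
        let average = recent.iter().sum::<f32>() / recent.len() as f32;

        if average > threshold {
            // Still inside speech: append the samples to the open utterance.
            current.extend_from_slice(&chunk.samples);
        } else if !current.is_empty() {
            // The average dropped: close the open utterance.
            utterances.push(std::mem::take(&mut current));
        }
    }
    // Flush an utterance that was still open when the input ended.
    if !current.is_empty() {
        utterances.push(current);
    }
    utterances
}
```

With a window of 2 and a threshold of 0.5, a brief dip in one chunk's probability does not necessarily end an utterance, because the average lags behind individual values. The same lag means the first chunk after a silence can be dropped, so the window size trades robustness against responsiveness.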

§Transcription

You can use the Whisper model to transcribe audio into text. Kalosm can transcribe any AsyncSource into a transcription stream with the AsyncSourceTranscribeExt::transcribe method:

use kalosm::sound::*;
#[tokio::main]
async fn main() {
    // Get the default microphone input
    let mic = MicInput::default();
    // Stream the audio from the microphone
    let stream = mic.stream();
    // Transcribe the audio into text with the default Whisper model
    let mut transcribe = stream.transcribe(Whisper::new().await.unwrap());
    // Print the text as it is streamed in
    transcribe.to_std_out().await.unwrap();
}

Re-exports§

pub use dasp;
pub use rodio;

Structs§

ChunkedTranscriptionTask
A chunked audio transcription task which can be streamed from a Whisper model.
DenoisedStream
A stream of SamplesBuffers with voice activity detection information
MicInput
A microphone input.
MicStream
A stream of audio data from the microphone.
ParseWhisperLanguageError
Error that reports the unsupported value
ParseWhisperSourceError
Error that reports the unsupported value
ResampledAsyncSource
A resampled async audio source
Segment
A transcribed segment of audio.
TokenChunkRef
A reference to a utf8 token chunk in a segment.
TranscriptionTask
A transcription task which can be streamed from a Whisper model.
VoiceActivityDetectorOutput
The output of a crate::VoiceActivityDetectorStream
VoiceActivityDetectorStream
A stream of SamplesBuffers with voice activity detection information
VoiceActivityFilterStream
A stream of audio chunks that have a voice activity probability above a given threshold
VoiceActivityRechunkerStream
A stream of audio chunks with a voice activity probability rolling average above a given threshold
Whisper
A quantized whisper audio transcription model.
WhisperBuilder
A builder with configuration for a Whisper model.

Enums§

FileSource
A source for a file, either from Hugging Face or a local path
ModelLoadingProgress
The progress of starting a model
WhisperLanguage
A language whisper can use
WhisperSource
The source whisper model to use.

Traits§

AsyncSource
A streaming audio source for single-channel audio. This trait is automatically implemented for all types that implement rodio::Source.
AsyncSourceTranscribeExt
An extension trait for AsyncSource that integrates with crate::Whisper.
DenoisedExt
An extension trait for audio streams for denoising. Based on the nnnoiseless crate.
TranscribeChunkedAudioStreamExt
An extension trait to transcribe pre-chunked audio streams
VoiceActivityDetectorExt
An extension trait for audio streams that adds voice activity detection information. Based on the voice_activity_detector crate.
VoiceActivityStreamExt
An extension trait for audio streams with voice activity detection information