# axonml-audio

## Overview

`axonml-audio` provides audio signal-processing and dataset utilities for the AxonML framework: waveform transforms (spectrograms, MFCC, resampling, time stretch, pitch shift, augmentation), synthetic datasets, and an `AudioSeq2SeqDataset` for noise-reduction-style source/target pairs. FFT-based transforms use `rustfft` (O(n log n)).
## Features

- **Resampling** — `Resample` with linear interpolation between arbitrary sample rates.
- **Mel spectrogram** — `MelSpectrogram::new(sample_rate)` (defaults: `n_fft=2048`, `hop=512`, `n_mels=128`) or `with_params`; rustfft-backed.
- **MFCC** — `MFCC::new(sample_rate, n_mfcc)` for cepstral features.
- **Time stretching** — `TimeStretch::new(rate)` changes duration; preserves shape when `rate = 1.0`.
- **Pitch shifting** — `PitchShift::new(semitones)`.
- **Noise augmentation** — `AddNoise::new(snr_db)` with a configurable signal-to-noise ratio.
- **Audio normalization** — `NormalizeAudio::new()` peak-normalizes to max |amplitude| = 1.
- **Silence trimming** — `TrimSilence::new(threshold_db)` or `TrimSilence::default_threshold()`.
- **Classification datasets** — generic `AudioClassificationDataset` plus synthetic `SyntheticCommandDataset` (small/medium/large), `SyntheticMusicDataset` (small/medium), `SyntheticSpeakerDataset` (small/medium). Labels are class-index tensors of shape `[1]` (`CrossEntropyLoss`-compatible).
- **Sequence-to-sequence datasets** — `AudioSeq2SeqDataset` for source/target waveform pairs, with a `noise_reduction_task` constructor.
## Modules

| Module | Description |
|---|---|
| `transforms` | `Resample`, `MelSpectrogram`, `MFCC`, `TimeStretch`, `PitchShift`, `AddNoise`, `NormalizeAudio`, `TrimSilence` (all implement `axonml_data::Transform`) |
| `datasets` | `AudioClassificationDataset`, `SyntheticCommandDataset`, `SyntheticMusicDataset`, `SyntheticSpeakerDataset`, `AudioSeq2SeqDataset` |
## Usage

Add to your `Cargo.toml`:

```toml
[dependencies]
axonml-audio = "0.6.1"
```
### Loading Audio Datasets

```rust
use axonml_audio::datasets::*;

// Synthetic command dataset (e.g., "yes"/"no"/"stop")
let dataset = SyntheticCommandDataset::small();  // 100 samples, 10 classes, 16 kHz, 0.5 s
let dataset = SyntheticCommandDataset::medium(); // 1000 samples, 10 classes
let dataset = SyntheticCommandDataset::large();  // 10000 samples, 35 classes

// Music genre / speaker presets
let music = SyntheticMusicDataset::small();      // multiple genres
let speakers = SyntheticSpeakerDataset::small();

let (waveform, label) = dataset.get(0).unwrap();
// waveform: [n_samples] float; label: [1] class index
```
### Mel Spectrogram

```rust
use axonml_audio::transforms::MelSpectrogram;
use axonml_data::Transform;

let mel = MelSpectrogram::new(16_000); // defaults: n_fft=2048, hop=512, n_mels=128
let spectrogram = mel.apply(waveform);
assert_eq!(spectrogram.shape()[0], 128); // n_mels

// Custom parameters:
let mel = MelSpectrogram::with_params(16_000, 1024, 256, 64); // n_fft=1024, hop=256, n_mels=64
```
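For intuition, the mel bins are centered on a warped frequency axis that is denser at low frequencies. Below is a minimal standalone sketch of the common HTK mel mapping; the exact mel variant `axonml-audio` uses is not specified in this README, so treat the formula as illustrative.

```rust
// Hz <-> mel conversion (HTK formula). Illustration only; which mel
// variant axonml-audio implements is an assumption here.
fn hz_to_mel(hz: f32) -> f32 {
    2595.0 * (1.0 + hz / 700.0).log10()
}

fn mel_to_hz(mel: f32) -> f32 {
    700.0 * (10f32.powf(mel / 2595.0) - 1.0)
}

fn main() {
    // 128 bins span 0 Hz to Nyquist (8 kHz at a 16 kHz sample rate),
    // equally spaced in mel, hence denser at low frequencies in Hz.
    let hi = hz_to_mel(8000.0);
    let step = hi / 128.0;
    println!("first band edge ~ {:.1} Hz", mel_to_hz(step));

    // Round-trip sanity check
    assert!((mel_to_hz(hz_to_mel(440.0)) - 440.0).abs() < 0.1);
}
```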
### MFCC Feature Extraction

```rust
use axonml_audio::transforms::MFCC;
use axonml_data::Transform;

let mfcc = MFCC::new(16_000, 13); // sample_rate, n_mfcc
let coefficients = mfcc.apply(waveform);
assert_eq!(coefficients.shape()[0], 13); // n_mfcc
```
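MFCCs are conventionally computed as the DCT-II of log-mel energies, keeping the first `n_mfcc` coefficients to compact the spectral envelope. The sketch below shows that final step with a naive O(n·k) DCT; the DCT variant and normalization `axonml-audio` uses are assumptions.

```rust
// Naive DCT-II over log-mel energies -- the classic "cepstral" step
// behind MFCCs. Unnormalized; illustration only.
use std::f32::consts::PI;

fn dct_ii(input: &[f32], n_out: usize) -> Vec<f32> {
    let n = input.len() as f32;
    (0..n_out)
        .map(|k| {
            input
                .iter()
                .enumerate()
                .map(|(i, &x)| x * (PI / n * (i as f32 + 0.5) * k as f32).cos())
                .sum()
        })
        .collect()
}

fn main() {
    let log_mel = vec![1.0f32; 8]; // flat (constant) log-mel frame
    let c = dct_ii(&log_mel, 3);
    // A constant input puts all energy in coefficient 0.
    assert!((c[0] - 8.0).abs() < 1e-4);
    assert!(c[1].abs() < 1e-4);
    println!("{:?}", c);
}
```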
### Audio Resampling

```rust
use axonml_audio::transforms::Resample;
use axonml_data::Transform;

let resample = Resample::new(44_100, 16_000); // orig_freq, new_freq
let resampled = resample.apply(waveform);
// Output length = floor(input_len * new_freq / orig_freq)
```
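The README describes `Resample` as linear interpolation; here is a self-contained sketch of that technique and the output-length formula above. Function names are ours, not the crate's API.

```rust
// Linear-interpolation resampling sketch (not axonml-audio's internals).
// Each output sample i maps back to fractional input position
// i * orig_freq / new_freq and blends the two neighboring samples.
fn resample_linear(input: &[f32], orig_freq: u32, new_freq: u32) -> Vec<f32> {
    let out_len = (input.len() as u64 * new_freq as u64 / orig_freq as u64) as usize;
    let ratio = orig_freq as f64 / new_freq as f64;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * ratio;
            let idx = pos as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            let b = *input.get(idx + 1).unwrap_or(&a); // clamp at the end
            a + (b - a) * frac
        })
        .collect()
}

fn main() {
    let input: Vec<f32> = (0..441).map(|i| i as f32).collect();
    let out = resample_linear(&input, 44_100, 16_000);
    // floor(441 * 16_000 / 44_100) = 160
    assert_eq!(out.len(), 160);
    println!("{} -> {} samples", input.len(), out.len());
}
```

Linear interpolation is cheap but offers no anti-aliasing filter, which is the usual trade-off versus windowed-sinc resamplers.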
### Audio Augmentation

```rust
use axonml_audio::transforms::{AddNoise, PitchShift, TimeStretch};
use axonml_data::Transform;

let noisy = AddNoise::new(20.0).apply(waveform);       // SNR in dB
let stretched = TimeStretch::new(1.2).apply(waveform); // stretch rate
let shifted = PitchShift::new(2.0).apply(waveform);    // semitones
```
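On how an `snr_db` value maps to a noise level: with SNR defined on RMS amplitudes, the noise RMS is `signal_rms / 10^(snr_db / 20)`. A standalone illustration of that arithmetic follows (not the crate's internals, whose RNG and scaling are not shown in this README).

```rust
// SNR (dB) -> noise scale, the arithmetic behind an AddNoise-style
// augmentation. Illustration only.
fn rms(x: &[f32]) -> f32 {
    (x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32).sqrt()
}

fn noise_rms_for_snr(signal_rms: f32, snr_db: f32) -> f32 {
    signal_rms / 10f32.powf(snr_db / 20.0)
}

fn main() {
    let signal = vec![0.5f32; 1000];
    let s_rms = rms(&signal);
    // At 20 dB SNR the noise RMS is one tenth of the signal RMS.
    let n_rms = noise_rms_for_snr(s_rms, 20.0);
    assert!((n_rms - s_rms / 10.0).abs() < 1e-6);
    println!("signal rms {:.3}, noise rms at 20 dB SNR {:.3}", s_rms, n_rms);
}
```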
### Normalization and Trimming

```rust
use axonml_audio::transforms::{NormalizeAudio, TrimSilence};
use axonml_data::Transform;

let normalized = NormalizeAudio::new().apply(waveform);        // peak to 1.0
let trimmed = TrimSilence::new(-40.0).apply(waveform);         // threshold in dB
let trimmed = TrimSilence::default_threshold().apply(waveform);
```
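Both transforms reduce to simple amplitude math: peak normalization divides by max |x|, and a dB threshold such as -40 dB corresponds to a linear amplitude of 10^(-40/20) = 0.01. A standalone sketch under those assumptions (our names, not the crate's API):

```rust
// Peak normalization and dB -> linear amplitude, the math behind
// NormalizeAudio and a TrimSilence threshold. Illustration only.
fn peak_normalize(x: &mut [f32]) {
    let peak = x.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    if peak > 0.0 {
        for v in x.iter_mut() {
            *v /= peak;
        }
    }
}

fn db_to_amplitude(db: f32) -> f32 {
    10f32.powf(db / 20.0)
}

fn main() {
    let mut x = vec![0.1f32, -0.5, 0.25];
    peak_normalize(&mut x);
    assert!((x[1] + 1.0).abs() < 1e-6); // the peak sample is now -1.0

    // -40 dB corresponds to a linear amplitude of 0.01
    assert!((db_to_amplitude(-40.0) - 0.01).abs() < 1e-6);
    println!("{:?}", x);
}
```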
### Full Audio Processing Pipeline

```rust
use axonml_audio::datasets::*;
use axonml_audio::transforms::*;
use axonml_data::DataLoader;

let dataset = SyntheticCommandDataset::medium();
let loader = DataLoader::new(dataset, 32).shuffle(true); // batch size illustrative

let resample = Resample::new(16_000, 8_000);
let normalize = NormalizeAudio::new();
let mel = MelSpectrogram::with_params(8_000, 1024, 256, 64);

for batch in loader.iter() {
    // resample -> normalize -> mel-spectrogram each waveform in the batch
}
```
### Sequence-to-Sequence Audio Tasks

```rust
use axonml_audio::datasets::AudioSeq2SeqDataset;

// Noise reduction (noisy -> clean pairs)
let dataset = AudioSeq2SeqDataset::noise_reduction_task(100, 16_000); // parameters illustrative
let (noisy, clean) = dataset.get(0).unwrap();
assert_eq!(noisy.shape(), clean.shape());

// Or bring your own source/target waveforms:
let ds = AudioSeq2SeqDataset::new(sources, targets);
```
## Tests

Run the crate's test suite with `cargo test`.
## License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Last updated: 2026-04-16 (v0.6.1)