# axonml-audio

## Overview
axonml-audio provides audio processing functionality for the AxonML framework. It includes signal processing transforms for spectrograms and feature extraction, audio augmentation techniques, and datasets for audio classification, speech recognition, and music genre classification.
## Features
- Resampling - Sample rate conversion using linear interpolation
- Mel Spectrogram - Compute mel-scaled spectrograms with configurable FFT size, hop length, and mel bins
- MFCC - Mel-frequency cepstral coefficients for speech and audio feature extraction
- Time Stretching - Speed up or slow down audio without changing pitch
- Pitch Shifting - Change pitch without altering duration
- Noise Augmentation - Add Gaussian noise with configurable SNR for data augmentation
- Audio Normalization - Peak normalization to maximum amplitude
- Silence Trimming - Remove silence from the beginning and end of an audio clip
- Synthetic Datasets - Command recognition, music genre, and speaker identification datasets
## Modules

| Module | Description |
|---|---|
| `transforms` | Audio signal processing transforms (`Resample`, `MelSpectrogram`, `MFCC`, `TimeStretch`, `PitchShift`, `AddNoise`, `NormalizeAudio`, `TrimSilence`) |
| `datasets` | Audio dataset implementations (`SyntheticCommandDataset`, `SyntheticMusicDataset`, `SyntheticSpeakerDataset`, `AudioSeq2SeqDataset`) |
## Usage

Add to your `Cargo.toml`:

```toml
[dependencies]
axonml-audio = "0.1.0"
```
### Loading Audio Datasets

```rust
use axonml_audio::datasets::*;

// Synthetic command dataset (like "yes", "no", "stop")
let dataset = SyntheticCommandDataset::small();  // 100 samples, 10 classes
let dataset = SyntheticCommandDataset::medium(); // 1000 samples
let dataset = SyntheticCommandDataset::large();  // 10000 samples, 35 classes

// Music genre dataset
let music = SyntheticMusicDataset::small();      // 5 genres

// Speaker identification dataset
let speakers = SyntheticSpeakerDataset::small(); // 5 speakers

// Get a sample
let (waveform, label) = dataset.get(0).unwrap();
println!("waveform length: {}", waveform.len());
println!("label: {}", label);
```
### Mel Spectrogram

```rust
use axonml_audio::transforms::MelSpectrogram;

// Default parameters
let mel = MelSpectrogram::new(16000); // 16kHz sample rate

// Custom parameters: sample rate, FFT size, hop length, mel bins (illustrative values)
let mel = MelSpectrogram::with_params(16000, 1024, 256, 80);

let spectrogram = mel.apply(&waveform);
assert_eq!(spectrogram.len(), 80); // n_mels
```
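Under the hood, a mel spectrogram pools FFT bins through triangular filters spaced evenly on the mel scale. The conversion below uses the common HTK-style formula m = 2595 * log10(1 + f/700); it is a generic sketch of the idea with made-up helper names, not code taken from axonml-audio.

```rust
// Generic illustration of the mel scale, not axonml-audio internals.

/// Convert a frequency in Hz to mels (HTK-style formula).
fn hz_to_mel(hz: f32) -> f32 {
    2595.0 * (1.0 + hz / 700.0).log10()
}

/// Inverse mapping: mels back to Hz.
fn mel_to_hz(mel: f32) -> f32 {
    700.0 * (10f32.powf(mel / 2595.0) - 1.0)
}

fn main() {
    // Place n_mels + 2 points evenly on the mel scale up to Nyquist; these become
    // the corner frequencies of the triangular mel filters.
    let (sample_rate, n_mels) = (16_000.0_f32, 8);
    let max_mel = hz_to_mel(sample_rate / 2.0);
    let corners: Vec<f32> = (0..=n_mels + 1)
        .map(|i| mel_to_hz(max_mel * i as f32 / (n_mels + 1) as f32))
        .collect();
    println!("filter corner frequencies (Hz): {:?}", corners);
}
```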
### MFCC Feature Extraction

```rust
use axonml_audio::transforms::MFCC;

let mfcc = MFCC::new(16000, 13); // 16kHz, 13 coefficients

let (waveform, _label) = dataset.get(0).unwrap();
let coefficients = mfcc.apply(&waveform);
assert_eq!(coefficients.len(), 13); // n_mfcc
```
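MFCCs are conventionally obtained by taking a DCT-II of the log mel filterbank energies and keeping the lowest coefficients. The sketch below shows only that final step on one precomputed log-mel frame; the function name and input values are invented for illustration, so treat it as a generic outline rather than axonml-audio's implementation.

```rust
// Generic illustration of the DCT-II step behind MFCCs, not axonml-audio internals.
use std::f32::consts::PI;

/// DCT-II of a log-mel energy frame, keeping the first `n_mfcc` coefficients.
fn mfcc_from_log_mel(log_mel: &[f32], n_mfcc: usize) -> Vec<f32> {
    let n = log_mel.len() as f32;
    (0..n_mfcc)
        .map(|k| {
            log_mel
                .iter()
                .enumerate()
                .map(|(m, &e)| e * (PI * k as f32 * (m as f32 + 0.5) / n).cos())
                .sum::<f32>()
        })
        .collect()
}

fn main() {
    // Pretend these are log energies from an 8-band mel filterbank for one frame.
    let log_mel = [0.2, 0.5, 1.0, 0.8, 0.3, 0.1, 0.05, 0.02];
    let coeffs = mfcc_from_log_mel(&log_mel, 4);
    println!("first 4 MFCCs: {:?}", coeffs);
}
```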
### Audio Resampling

```rust
use axonml_audio::transforms::Resample;

// Resample from 22050Hz to 16000Hz
let resample = Resample::new(22050, 16000);
let resampled = resample.apply(&waveform);
// New length is proportional to the sample rate ratio
```
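As noted under Features, resampling here is linear interpolation: each output sample lands at a fractional position in the input and is blended from its two nearest neighbours. A minimal, framework-independent sketch of that idea (the function name is hypothetical):

```rust
// Generic linear-interpolation resampler, not axonml-audio internals.

/// Resample a waveform by linearly interpolating between neighbouring input samples.
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    let ratio = from_hz as f32 / to_hz as f32;
    let out_len = (input.len() as f32 / ratio).round() as usize;
    (0..out_len)
        .map(|i| {
            let pos = i as f32 * ratio;                  // fractional position in the input
            let idx = pos.floor() as usize;
            let frac = pos - idx as f32;
            let a = input[idx.min(input.len() - 1)];
            let b = input[(idx + 1).min(input.len() - 1)];
            a + (b - a) * frac                           // blend the two neighbours
        })
        .collect()
}

fn main() {
    let input: Vec<f32> = (0..22050).map(|i| (i as f32 * 0.01).sin()).collect();
    let output = resample_linear(&input, 22050, 16000);
    println!("{} samples -> {} samples", input.len(), output.len());
}
```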
### Audio Augmentation

```rust
use axonml_audio::transforms::{AddNoise, PitchShift, TimeStretch};

// Add Gaussian noise with 20dB SNR
let add_noise = AddNoise::new(20.0);
let noisy = add_noise.apply(&waveform);

// Time stretch (speed up 1.5x)
let stretch = TimeStretch::new(1.5);
let stretched = stretch.apply(&waveform);

// Pitch shift up 2 semitones
let shift = PitchShift::new(2.0);
let shifted = shift.apply(&waveform);
```
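Noise injection at a target SNR comes down to scaling the noise so that signal power divided by noise power equals 10^(SNR/10). The sketch below is a self-contained illustration of that arithmetic (it rolls its own xorshift plus Box-Muller Gaussian sampler so it needs no extra crates); it is not the `AddNoise` implementation.

```rust
// Generic illustration of SNR-based noise mixing, not axonml-audio internals.

/// Mix Gaussian noise into `signal` so the result has roughly `snr_db` signal-to-noise ratio.
fn add_noise_snr(signal: &[f32], snr_db: f32) -> Vec<f32> {
    let signal_power: f32 =
        signal.iter().map(|x| x * x).sum::<f32>() / signal.len() as f32;
    // Required noise power: P_noise = P_signal / 10^(SNR/10)
    let noise_power = signal_power / 10f32.powf(snr_db / 10.0);
    let noise_std = noise_power.sqrt();

    // Tiny xorshift PRNG + Box-Muller transform for standard-normal samples.
    let mut state: u64 = 0x9E3779B97F4A7C15;
    let mut uniform = move || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        (state >> 11) as f32 / (1u64 << 53) as f32
    };

    signal
        .iter()
        .map(|&x| {
            let (u1, u2) = (uniform().max(1e-12), uniform());
            let gauss = (-2.0 * u1.ln()).sqrt()
                * (2.0 * std::f32::consts::PI * u2).cos();
            x + gauss * noise_std
        })
        .collect()
}

fn main() {
    let clean: Vec<f32> = (0..16000).map(|i| (i as f32 * 0.05).sin()).collect();
    let noisy = add_noise_snr(&clean, 20.0); // 20 dB SNR
    println!("first noisy samples: {:?}", &noisy[..4]);
}
```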
### Audio Normalization and Trimming

```rust
use axonml_audio::transforms::{NormalizeAudio, TrimSilence};

// Normalize to peak amplitude of 1.0
let normalize = NormalizeAudio::new(1.0);
let normalized = normalize.apply(&waveform);

// Trim silence below -60dB
let trim = TrimSilence::new(-60.0);
let trimmed = trim.apply(&waveform);
```
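Both operations are simple to reason about: peak normalization rescales so the loudest sample hits the target amplitude, and silence trimming drops leading and trailing samples whose magnitude stays below the threshold (a -60 dB threshold corresponds to an amplitude of 10^(-60/20) = 0.001). A standalone sketch with hypothetical function names:

```rust
// Generic illustration of peak normalization and silence trimming, not axonml-audio internals.

/// Scale so the largest absolute sample equals `peak`.
fn normalize_peak(samples: &[f32], peak: f32) -> Vec<f32> {
    let max = samples.iter().fold(0.0_f32, |m, &x| m.max(x.abs()));
    if max == 0.0 {
        return samples.to_vec();
    }
    samples.iter().map(|&x| x / max * peak).collect()
}

/// Drop leading/trailing samples quieter than `threshold_db` relative to full scale.
fn trim_silence(samples: &[f32], threshold_db: f32) -> Vec<f32> {
    let threshold = 10f32.powf(threshold_db / 20.0); // -60 dB -> 0.001
    let start = samples.iter().position(|x| x.abs() >= threshold).unwrap_or(0);
    let end = samples
        .iter()
        .rposition(|x| x.abs() >= threshold)
        .map_or(start, |i| i + 1);
    samples[start..end].to_vec()
}

fn main() {
    let audio = vec![0.0, 0.0001, 0.3, -0.6, 0.2, 0.0002, 0.0];
    let trimmed = trim_silence(&audio, -60.0);
    println!("trimmed:    {:?}", trimmed);
    println!("normalized: {:?}", normalize_peak(&trimmed, 1.0));
}
```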
### Full Audio Processing Pipeline

```rust
use axonml_audio::datasets::*;
use axonml_audio::transforms::*;
use axonml_data::DataLoader;

// Create dataset and dataloader
let dataset = SyntheticCommandDataset::medium();
let loader = DataLoader::new(dataset, 32).shuffle(true); // batch size 32 (example value)

// Define transforms
let resample = Resample::new(22050, 16000);
let normalize = NormalizeAudio::new(1.0);
let mel = MelSpectrogram::with_params(16000, 1024, 256, 80);

// Process batches
for batch in loader.iter() {
    // resample -> normalize -> mel spectrogram, then feed the features to a model
}
```
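For quick experiments outside the DataLoader, the same staging can be expressed as ordinary functions chained over a `Vec<f32>`. The sketch below shows one way to compose such a chain; the `Transform` type alias and the two stage bodies are illustrative and not part of the crate.

```rust
// Generic transform-chaining sketch, not the axonml-audio pipeline API.

// A transform is just a function from one waveform to another.
type Transform = Box<dyn Fn(Vec<f32>) -> Vec<f32>>;

/// Compose a list of stages into a single function applied left to right.
fn pipeline(stages: Vec<Transform>) -> impl Fn(Vec<f32>) -> Vec<f32> {
    move |mut audio| {
        for stage in &stages {
            audio = stage(audio);
        }
        audio
    }
}

fn main() {
    // Two toy stages: remove DC offset, then peak-normalize.
    let stages: Vec<Transform> = vec![
        Box::new(|a: Vec<f32>| {
            let mean = a.iter().sum::<f32>() / a.len() as f32;
            a.iter().map(|x| x - mean).collect::<Vec<f32>>()
        }),
        Box::new(|a: Vec<f32>| {
            let peak = a.iter().fold(0.0_f32, |m, &x| m.max(x.abs()));
            a.iter().map(|x| x / peak.max(1e-12)).collect::<Vec<f32>>()
        }),
    ];
    let process = pipeline(stages);
    println!("{:?}", process(vec![0.5, 0.7, 0.1, 0.3]));
}
```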
### Sequence-to-Sequence Audio Tasks

```rust
use axonml_audio::datasets::AudioSeq2SeqDataset;

// Noise reduction dataset (noisy -> clean pairs)
let dataset = AudioSeq2SeqDataset::noise_reduction_task();
let (noisy, clean) = dataset.get(0).unwrap();
assert_eq!(noisy.len(), clean.len());
```
## Tests

Run the test suite:

```bash
cargo test -p axonml-audio
```
## License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.