stft-rs 0.5.1

Simple, streaming-friendly, no_std compliant STFT implementation with mel spectrogram support
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
# Mel Spectrogram Module

The `mel` module provides mel-scale frequency analysis, commonly used in speech recognition and audio processing. It converts linear frequency STFT bins into perceptually-motivated mel-scale bins.

## Quick Start

```rust
use stft_rs::prelude::*;

// Create STFT config
let stft_config = StftConfigF32::default_4096();
let stft = BatchStftF32::new(stft_config);

// Create mel spectrogram processor with defaults (80 mel bins, Slaney scale)
let mel_config = MelConfigF32::default();
let mel_proc = BatchMelSpectrogramF32::new(44100.0, 4096, &mel_config);

// Process audio to mel spectrogram
let signal: Vec<f32> = vec![0.0; 44100]; // 1 second @ 44.1kHz
let spectrum = stft.process(&signal);
let mel_spec = mel_proc.process(&spectrum);

// Convert to log-mel (dB scale)
let log_mel = mel_spec.to_db(None, None);

// Add delta features for speech recognition
let with_deltas = log_mel.with_deltas(Some(2));
// Result: 240 features per frame (80 mel + 80 delta + 80 delta-delta)
```

## Features

- **Mel Scale Variants**: HTK and Slaney (librosa-compatible, default)
- **80 Mel Bins Default**: Standard for speech/Whisper models
- **Log-Mel Conversion**: Convert to dB scale with configurable range
- **Delta Features**: First and second derivatives for speech recognition
- **Power/Magnitude**: Choose between power or magnitude spectrum
- **Batch & Streaming**: Process full spectrograms or individual frames
- **Multi-Channel**: Works with existing multi-channel STFT support
- **Zero-Copy Options**: Efficient processing with pre-allocated buffers

## Configuration

### MelConfig

```rust
use stft_rs::prelude::*;

// Default: 80 mels, Slaney scale, power spectrum
let config = MelConfigF32::default();

// Custom configuration
let config = MelConfigF32 {
    n_mels: 128,                      // Number of mel bands
    fmin: 0.0,                        // Minimum frequency (Hz)
    fmax: Some(8000.0),              // Maximum frequency (Hz), None = sr/2
    mel_scale: MelScale::Slaney,     // HTK or Slaney
    norm: MelNorm::Slaney,           // None or Slaney (area normalization)
    use_power: true,                 // true = power, false = magnitude
};
```

### Mel Scales

#### Slaney (Default)
- Linear below 1kHz, logarithmic above
- Compatible with librosa
- Better perceptual spacing

```rust
let config = MelConfigF32 {
    mel_scale: MelScale::Slaney,
    ..Default::default()
};
```

#### HTK
- Logarithmic throughout: `2595 * log10(1 + hz/700)`
- Used in HTK/Kaldi speech tools

```rust
let config = MelConfigF32 {
    mel_scale: MelScale::Htk,
    ..Default::default()
};
```

## Batch Processing

Process entire audio files at once:

```rust
use stft_rs::prelude::*;

let stft_config = StftConfigF32::default_4096();
let stft = BatchStftF32::new(stft_config);

let mel_config = MelConfigF32 {
    n_mels: 80,
    fmin: 0.0,
    fmax: Some(8000.0),
    mel_scale: MelScale::Slaney,
    norm: MelNorm::Slaney,
    use_power: true,
};

let mel_proc = BatchMelSpectrogramF32::new(44100.0, 4096, &mel_config);

// Process audio
let signal: Vec<f32> = vec![0.0; 44100];
let spectrum = stft.process(&signal);
let mel_spec = mel_proc.process(&spectrum);

// Or directly to dB scale
let log_mel = mel_proc.process_db(&spectrum, None, None);

println!("Mel spectrogram: {} frames x {} bins",
         mel_spec.num_frames, mel_spec.n_mels);
```

## Streaming Processing

Process audio frames in real-time:

```rust
use stft_rs::prelude::*;

let stft_config = StftConfigF32::default_4096();
let mut stft = StreamingStftF32::new(stft_config);

let mel_config = MelConfigF32::default();
let mel_proc = StreamingMelSpectrogramF32::new(44100.0, 4096, &mel_config);

// Process chunks
let chunk: Vec<f32> = vec![0.0; 8192];
let frames = stft.push_samples(&chunk);

// Convert each frame to mel
for frame in frames {
    let mel_frame = mel_proc.process_frame(&frame);
    // mel_frame: Vec<f32> of length 80
    println!("Mel features: {:?}", &mel_frame[..5]); // First 5 bins
}
```

### Zero-Copy Streaming

For real-time applications with minimal allocations:

```rust
let mut mel_buffer = vec![0.0f32; 80]; // Reuse this buffer

for frame in frames {
    let n = mel_proc.process_frame_into(&frame, &mut mel_buffer);
    // Process mel_buffer[..n] here
}
```

## MelSpectrum Operations

### Log-Mel (dB Scale)

Convert to logarithmic scale for better dynamic range:

```rust
let mel_spec = mel_proc.process(&spectrum);

// With defaults: amin=1e-10, top_db=80
let log_mel = mel_spec.to_db(None, None);

// Custom parameters
let log_mel = mel_spec.to_db(
    Some(1e-8),   // Minimum threshold to avoid log(0)
    Some(100.0)   // Maximum dB range
);
```

### Delta Features

Compute time derivatives for speech recognition:

```rust
let mel_spec = mel_proc.process(&spectrum);

// Delta (first derivative)
let delta = mel_spec.delta(Some(2));  // width=2 frames on each side

// Delta-delta (second derivative)
let delta_delta = mel_spec.delta_delta(Some(2));

// Concatenated features: [mel, delta, delta-delta]
let features = mel_spec.with_deltas(Some(2));
// Result: 240 features (80*3) per frame
```

**Delta Formula:**
```
delta[t] = Σ(n * (x[t+n] - x[t-n])) / (2 * Σ(n²))
where n ranges from 1 to width
```

### Accessing Data

```rust
let mel_spec = mel_proc.process(&spectrum);

// Access individual values
let value = mel_spec.get(frame_idx, mel_bin);
mel_spec.set(frame_idx, mel_bin, new_value);

// Access full frames
let frame = mel_spec.frame(frame_idx);  // &[f32]
let frame_mut = mel_spec.frame_mut(frame_idx);  // &mut [f32]

// Iterate over data
for frame_idx in 0..mel_spec.num_frames {
    let frame = mel_spec.frame(frame_idx);
    // Process frame...
}
```

### Custom Processing

```rust
// Apply function to all values
mel_spec.apply(|frame, mel_bin, value| {
    // Return modified value
    value * gain_factor
});
```

## Multi-Channel Audio

Process multi-channel audio by processing each channel separately:

```rust
use stft_rs::prelude::*;

let stft_config = StftConfigF32::default_4096();
let stft = BatchStftF32::new(stft_config);
let mel_config = MelConfigF32::default();
let mel_proc = BatchMelSpectrogramF32::new(44100.0, 4096, &mel_config);

// Process stereo
let left = vec![0.0f32; 44100];
let right = vec![0.0f32; 44100];
let channels = vec![left, right];

let spectra = stft.process_multichannel(&channels);

// Convert each channel to mel
let mel_spectra: Vec<MelSpectrumF32> = spectra
    .iter()
    .map(|spec| mel_proc.process(spec))
    .collect();

println!("Left channel: {} frames x {} mels",
         mel_spectra[0].num_frames, mel_spectra[0].n_mels);
println!("Right channel: {} frames x {} mels",
         mel_spectra[1].num_frames, mel_spectra[1].n_mels);
```

## Common Use Cases

### Speech Recognition (Whisper-style)

```rust
// Whisper uses 80 mel bins, 0-8kHz range, log-mel
let mel_config = MelConfigF32 {
    n_mels: 80,
    fmin: 0.0,
    fmax: Some(8000.0),
    mel_scale: MelScale::Slaney,
    norm: MelNorm::Slaney,
    use_power: true,
};

let mel_proc = BatchMelSpectrogramF32::new(16000.0, 400, &mel_config);
let mel_spec = mel_proc.process(&spectrum);
let log_mel = mel_spec.to_db(Some(1e-10), Some(80.0));

// For Whisper, you'd typically pad/trim to 3000 frames (30 seconds @ 50fps)
```

### Music Information Retrieval

```rust
// More mel bins, full frequency range
let mel_config = MelConfigF32 {
    n_mels: 128,
    fmin: 0.0,
    fmax: None,  // Use full Nyquist range
    mel_scale: MelScale::Slaney,
    norm: MelNorm::Slaney,
    use_power: true,
};

let mel_proc = BatchMelSpectrogramF32::new(44100.0, 4096, &mel_config);
let mel_spec = mel_proc.process(&spectrum);
let log_mel = mel_spec.to_db(None, None);
```

### Speaker Recognition

```rust
// Delta features for speaker characteristics
let mel_config = MelConfigF32::default();
let mel_proc = BatchMelSpectrogramF32::new(16000.0, 512, &mel_config);

let mel_spec = mel_proc.process(&spectrum);
let log_mel = mel_spec.to_db(None, None);

// Add deltas for speaker-specific dynamics
let features = log_mel.with_deltas(Some(2));
// Now: 240 features per frame for speaker modeling
```

## Type Aliases

For cleaner code, use type aliases:

```rust
use stft_rs::prelude::*;

// F32 aliases
let config = MelConfigF32::default();
let mel_proc = BatchMelSpectrogramF32::new(44100.0, 4096, &config);
let mel_spec: MelSpectrumF32 = mel_proc.process(&spectrum);

// F64 aliases
let config = MelConfigF64::default();
let mel_proc = BatchMelSpectrogramF64::new(44100.0, 4096, &config);
let mel_spec: MelSpectrumF64 = mel_proc.process(&spectrum);

// Available aliases:
// - MelConfigF32, MelConfigF64
// - MelFilterbankF32, MelFilterbankF64
// - MelSpectrumF32, MelSpectrumF64
// - BatchMelSpectrogramF32, BatchMelSpectrogramF64
// - StreamingMelSpectrogramF32, StreamingMelSpectrogramF64
```

## Performance Tips

1. **Reuse Buffers**: Use `process_frame_into()` in streaming mode to avoid allocations
2. **Batch Processing**: Process full files at once for better throughput
3. **Power vs Magnitude**: Power spectrum (default) is faster than magnitude
4. **Mel Bin Count**: Fewer bins = faster processing (but less frequency resolution)
5. **Rayon Parallelism**: Multi-channel processing is parallelized by default

## Implementation Details

### Mel Filterbank

- Triangular filters in mel-frequency domain
- Sparse matrix representation for efficiency
- Configurable normalization (Slaney area normalization recommended)
- Filters are precomputed at creation time

### Frequency Mappings

**Slaney Scale:**
- Linear: `mel = (hz - 0) / (200/3)` for hz < 1000
- Log: `mel = 15.0 + 27.0 * ln(hz/1000) / ln(6.4)` for hz >= 1000

**HTK Scale:**
- `mel = 2595 * log10(1 + hz/700)`

### Memory Layout

`MelSpectrum` stores data in row-major order:
```
[frame0_mel0, frame0_mel1, ..., frame0_mel79,
 frame1_mel0, frame1_mel1, ..., frame1_mel79,
 ...]
```

This layout is cache-efficient for frame-by-frame processing.

## Comparison with librosa

The default settings match librosa's mel spectrogram:

```python
# librosa (Python)
import librosa
mel_spec = librosa.feature.melspectrogram(
    y=signal, sr=44100, n_fft=4096, hop_length=1024,
    n_mels=80, fmin=0.0, fmax=8000.0,
    htk=False, norm='slaney', power=2.0
)
log_mel = librosa.power_to_db(mel_spec, ref=np.max, top_db=80)
```

```rust
// stft-rs (Rust)
let mel_config = MelConfigF32 {
    n_mels: 80,
    fmin: 0.0,
    fmax: Some(8000.0),
    mel_scale: MelScale::Slaney,  // htk=False
    norm: MelNorm::Slaney,         // norm='slaney'
    use_power: true,               // power=2.0
};

let mel_proc = BatchMelSpectrogramF32::new(44100.0, 4096, &mel_config);
let mel_spec = mel_proc.process(&spectrum);
let log_mel = mel_spec.to_db(None, Some(80.0));
```

## License

[MIT]