//! Audio processing and acoustic features for speech recognition.
//!
//! This module extracts the following acoustic features from raw audio:
//!
//! - **Mel Filterbank Features**: Perceptually-motivated frequency representation
//! - **MFCC**: Mel-frequency cepstral coefficients (classic ASR features)
//! - **Spectrogram**: Raw power/magnitude spectrum
//! - **Log-Mel**: Log-compressed mel spectrum (neural model input)
//!
//! # Overview
//!
//! ```text
//! ┌────────────────────────────────────────────────────────────────────────────┐
//! │                            Acoustic Processing                             │
//! ├────────────────────────────────────────────────────────────────────────────┤
//! │                                                                            │
//! │  ┌──────────────────────────────────────────────────────────────────────┐  │
//! │  │                       Audio Feature Extraction                       │  │
//! │  │                                                                      │  │
//! │  │  Raw Audio ─► Pre-emphasis ─► Framing ─► Windowing ─► FFT            │  │
//! │  │      ─► Power Spectrum ─► Mel Filterbank ─► Log ─► (DCT)             │  │
//! │  │                                                                      │  │
//! │  │  Features:                                                           │  │
//! │  │    • FeatureExtractor: Batch extraction                              │  │
//! │  │    • StreamingFeatureExtractor: Real-time extraction                 │  │
//! │  │    • MelFilterbank: Triangular filter bank                           │  │
//! │  │                                                                      │  │
//! │  └──────────────────────────────────────────────────────────────────────┘  │
//! │                                                                            │
//! │  Integration with lling-llang:                                             │
//! │    • Features feed into AcousticModel implementations                      │
//! │    • CTC decoder consumes frame posteriors                                 │
//! │    • ASR cascade (H∘C∘L∘G) uses emission probabilities                     │
//! │                                                                            │
//! └────────────────────────────────────────────────────────────────────────────┘
//! ```
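//!
//! The first three stages of the pipeline above (pre-emphasis, framing,
//! windowing) can be sketched as follows. This is a minimal illustration under
//! stated assumptions, not this module's implementation: the helper names
//! `pre_emphasis`, `frame_signal`, and `hamming_window` are hypothetical, and
//! the Hamming window is an assumed choice that the diagram does not specify.
//!
//! ```rust
//! // Hypothetical helper: first-order pre-emphasis, y[n] = x[n] - coeff * x[n-1].
//! // The sample before the start of the signal is treated as zero.
//! fn pre_emphasis(samples: &[f32], coeff: f32) -> Vec<f32> {
//!     let mut out = Vec::with_capacity(samples.len());
//!     let mut prev = 0.0f32;
//!     for &s in samples {
//!         out.push(s - coeff * prev);
//!         prev = s;
//!     }
//!     out
//! }
//!
//! // Hypothetical helper: split a signal into overlapping frames.
//! // A trailing partial frame is dropped rather than zero-padded.
//! fn frame_signal(samples: &[f32], frame_len: usize, hop: usize) -> Vec<Vec<f32>> {
//!     if frame_len == 0 || hop == 0 || samples.len() < frame_len {
//!         return Vec::new();
//!     }
//!     (0..=samples.len() - frame_len)
//!         .step_by(hop)
//!         .map(|start| samples[start..start + frame_len].to_vec())
//!         .collect()
//! }
//!
//! // Hypothetical helper: symmetric Hamming window of the given length.
//! fn hamming_window(len: usize) -> Vec<f32> {
//!     if len < 2 {
//!         return vec![1.0; len];
//!     }
//!     let denom = (len - 1) as f32;
//!     (0..len)
//!         .map(|n| 0.54 - 0.46 * (2.0 * std::f32::consts::PI * n as f32 / denom).cos())
//!         .collect()
//! }
//! ```
//!
//! Under these assumptions, one second of 16 kHz audio split with
//! `frame_signal(&audio, 400, 160)` yields 98 frames; each frame would then be
//! multiplied element-wise by `hamming_window(400)` before the FFT.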
//!
//! # Example
//!
//! ```ignore
//! use libgrammstein::acoustic::{FeatureExtractor, FeatureConfig};
//!
//! // Create feature extractor for 16kHz audio
//! let config = FeatureConfig::default();
//! let extractor = FeatureExtractor::new(config);
//!
//! // Load audio (mono, 16kHz)
//! let audio: Vec<f32> = load_audio("speech.wav");
//!
//! // Extract 40-dim mel filterbank features
//! let filterbank = extractor.extract_filterbank(&audio);
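//! // `first()` avoids a panic when the audio is too short to yield a frame
//! let dims = filterbank.first().map_or(0, |frame| frame.len());
//! println!("Extracted {} frames of {} dimensions", filterbank.len(), dims);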
//!
//! // Extract 13-dim MFCC
//! let mfcc = extractor.extract_mfcc(&audio);
//! println!("MFCC: {} frames", mfcc.len());
//! ```
//!
//! # Streaming Example
//!
//! ```ignore
//! use libgrammstein::acoustic::{StreamingFeatureExtractor, FeatureConfig};
//!
//! let mut streaming = StreamingFeatureExtractor::new(FeatureConfig::default());
//!
//! // Process audio in chunks (e.g., from microphone).
//! // `read_audio_chunk` is a placeholder assumed to return `None` at end of stream.
//! while let Some(chunk) = read_audio_chunk() {
//!     streaming.add_samples(&chunk);
//!
//!     // Extract available frames
//!     let features = streaming.extract_filterbank();
//!     if !features.is_empty() {
//!         process_features(&features);
//!     }
//! }
//!
//! // Flush remaining audio at end of stream
//! let final_features = streaming.flush_filterbank();
//! ```
//!
//! # Feature Types
//!
//! | Feature Type | Dimensions | Use Case |
//! |--------------|------------|----------|
//! | **Filterbank** | 40-80 | Neural acoustic models (Conformer, Whisper) |
//! | **MFCC** | 13-39 | GMM-HMM systems, some neural models |
//! | **Log-Mel** | 40-80 | Transformer models, streaming ASR |
//! | **Spectrogram** | FFT/2+1 | Visualization, debugging |
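//!
//! The dimensions in the table follow from simple arithmetic. A real-valued
//! FFT of size `N` has `N/2 + 1` non-redundant bins, and MFCCs are obtained by
//! applying a DCT to the log-mel energies and keeping the first few
//! coefficients (39-dimensional vectors are commonly 13 static coefficients
//! plus their first and second derivatives). The sketch below illustrates
//! both; `spectrum_bins` and `dct_ii` are hypothetical names, and the DCT is
//! left unnormalized, whereas real toolkits usually add scaling and liftering.
//!
//! ```rust
//! // Hypothetical helper: number of non-redundant bins for a real-input FFT.
//! fn spectrum_bins(fft_size: usize) -> usize {
//!     fft_size / 2 + 1
//! }
//!
//! // Hypothetical helper: unnormalized DCT-II of one frame of log-mel
//! // energies, keeping at most `num_ceps` leading coefficients.
//! fn dct_ii(log_mel: &[f32], num_ceps: usize) -> Vec<f32> {
//!     let n = log_mel.len() as f32;
//!     (0..num_ceps.min(log_mel.len()))
//!         .map(|k| {
//!             log_mel
//!                 .iter()
//!                 .enumerate()
//!                 .map(|(i, &x)| {
//!                     x * (std::f32::consts::PI * k as f32 * (i as f32 + 0.5) / n).cos()
//!                 })
//!                 .sum::<f32>()
//!         })
//!         .collect()
//! }
//! ```
//!
//! For example, a 512-point FFT gives 257 spectrogram bins, and
//! `dct_ii(&log_mel_frame, 13)` reduces a 40-band log-mel frame to 13 cepstral
//! coefficients.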
//!
//! # Configuration
//!
//! Common configurations:
//!
//! - `FeatureConfig::default()`: 16kHz wideband speech
//! - `FeatureConfig::telephony()`: 8kHz narrowband (phone)
//! - `FeatureConfig::music()`: 44.1kHz high-fidelity
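//!
//! The sample rate determines how many samples a frame of a given duration
//! spans. The sketch below assumes a 25 ms window and a 10 ms hop, which are
//! common ASR conventions; the values actually used by each preset above may
//! differ. `ms_to_samples` is a hypothetical helper, not part of this module.
//!
//! ```rust
//! // Hypothetical helper: convert a duration in milliseconds to a sample
//! // count at the given sample rate, rounding to the nearest sample.
//! fn ms_to_samples(sample_rate_hz: u32, ms: f32) -> usize {
//!     (sample_rate_hz as f32 * ms / 1000.0).round() as usize
//! }
//! ```
//!
//! Under these assumptions a 25 ms window spans 400 samples at 16 kHz and 200
//! samples at 8 kHz, and a 10 ms hop at 44.1 kHz spans 441 samples.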
//!
//! # References
//!
//! - Davis & Mermelstein (1980) - MFCC
//! - Stevens et al. (1937) - Mel scale
//! - Povey et al. (2011) - Kaldi speech recognition toolkit
pub use ;
pub use ;