audio_samples 1.0.12

A typed audio processing library for Rust that treats audio as a first-class, invariant-preserving object rather than an unstructured numeric buffer.
# Plotting + Educational — Roadmap

## Architecture decision: two new crates

The advanced features in this roadmap — streaming playback with a synced playhead, WASM
recomputation, A/B difference spectrograms, filter Bode plots — pull in dependencies and
capabilities that have no place in the core `audio_samples` crate. Equally, the educational
layer depends on rich plotting, which depends on the core. The correct shape is a dependency
chain across the ecosystem:

```
audio_samples_io       audio_samples_streaming
       │                        │
       └────────────┬───────────┘
              audio_samples          (core: types, traits, algorithms)
         audio_samples_plotting      (all plot types, themes, DSP overlays)
         audio_samples_education     (educational HTML documents, WASM, playback)
```

`audio_samples_plotting` depends on `audio_samples` for `AudioSamples<T>`, `StandardSample`,
and the algorithm traits. It owns every plot type, theme, and DSP-overlay computation.

`audio_samples_education` depends on `audio_samples_plotting` for waveform/spectrogram/filter
rendering and on `audio_samples` for the explain-chain primitives. It will eventually depend on
`audio_samples_io` for inline base64 audio encoding and `audio_samples_streaming` for Web Audio
playback.

`audio_samples` drops the `plotting` and `educational` features entirely after migration. Users
who want visualization: `audio_samples_plotting`. Users who want the educational layer:
`audio_samples_education`.

---

## Phase 0 — Crate extraction

**Goal:** Establish the two new crates and migrate without breaking the existing public API.

### 0.1 — Create `audio_samples_plotting`

- New crate in the workspace: `crates/audio_samples_plotting/`
- Move `src/operations/plotting/` → `audio_samples_plotting/src/`
- Move the `AudioPlotting` trait from `audio_samples::operations::traits` into
  `audio_samples_plotting`
- The plotting implementation leaves `audio_samples`; existing users of the `plotting` feature
  migrate to the new crate
- Until the next major version, `audio_samples` keeps an optional `plotting` feature that is
  only a thin compatibility shim re-exporting `audio_samples_plotting`

### 0.2 — Create `audio_samples_education`

- New crate in the workspace: `crates/audio_samples_education/`
- Move `src/educational/` and `src/educational/processing.rs` → `audio_samples_education/src/`
- Move the `Explainable`, `ExplainMode`, `Explaining` re-exports
- `audio_samples` drops the `educational` feature
- Same thin shim strategy as above for compatibility

### 0.3 — Feature flags in the new crates

`audio_samples_plotting` feature flags:
- `transforms` — spectrogram types (mel, gammatone, CQT, chroma)
- `filters` — filter response plots (Bode, pole-zero, group delay)
- `static-plots` — static image export via `plotly_static`
- `html-view` — `.show()` opens in browser
- `wasm` — WASM-compatible rendering path (no filesystem, no browser-open)

`audio_samples_education` feature flags:
- `plotting` (required, on by default) — waveform/spectrogram panels per step
- `audio-playback` — inline base64 audio + Web Audio synced playhead
- `wasm-dsp` — compile DSP kernels to WASM for live parameter recomputation
- `offline` — inline CDN dependencies for offline use

---

## Phase 1 — Plotting primitives: scale and audio-native foundations

**Goal:** Make the plotting layer correct before making it pretty. Everything in this phase
is a prerequisite for phases 2–5.

### 1.1 — Peak/envelope decimation (highest priority of the entire roadmap)

A 3-minute stereo file at 48 kHz is ~8.6 M samples per channel (~17 M in total). Handing that to
Plotly freezes the browser or silently renders garbage. The existing `decimate_waveform` with
LTTB is insufficient: LTTB (largest-triangle-three-buckets) is designed for line charts and can
drop transient peaks. The correct algorithm for waveforms is **min/max binning**: divide the
signal into N bins (N = output pixel columns, ~2000 for a 1920-wide plot), and for each bin
record `(min, max, rms)`. Draw each bin as a vertical segment from min to max; overlay an RMS
envelope. At that resolution the result is visually indistinguishable from the full-resolution
draw, and because every bin keeps its true extremes, no transient peak is lost.

- Replace `decimate_waveform` with `envelope_decimate(signal, n_bins) -> Vec<(f64, f64, f64)>`
  returning `(min, max, rms)` per bin
- Render each bin as a filled area trace (min–max) with an RMS line on top
- Make this automatic and zoom-aware: hook into Plotly's `plotly_relayout` event (via embedded
  JS), detect `xaxis.range` changes, recompute bins for the visible window, and push new trace
  data via `Plotly.restyle`. This makes zoom reveal more detail without ever re-sending the full
  signal to the DOM.
- Apply to both standalone waveforms and the educational before/after panels
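
The min/max binning described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: it takes the `envelope_decimate(signal, n_bins)` signature from the bullet above and assumes a mono `&[f64]` signal and integer bin edges that tile the signal exactly once.

```rust
/// Summarise `signal` into at most `n_bins` bins of `(min, max, rms)`.
pub fn envelope_decimate(signal: &[f64], n_bins: usize) -> Vec<(f64, f64, f64)> {
    if signal.is_empty() || n_bins == 0 {
        return Vec::new();
    }
    // Never produce more bins than samples, so every bin is non-empty.
    let n_bins = n_bins.min(signal.len());
    let mut out = Vec::with_capacity(n_bins);
    for b in 0..n_bins {
        // Integer bin edges: consecutive bins share an edge and cover every sample.
        let start = b * signal.len() / n_bins;
        let end = (b + 1) * signal.len() / n_bins;
        let bin = &signal[start..end];

        let mut min = f64::INFINITY;
        let mut max = f64::NEG_INFINITY;
        let mut sum_sq = 0.0_f64;
        for &x in bin {
            min = min.min(x);
            max = max.max(x);
            sum_sq += x * x;
        }
        out.push((min, max, (sum_sq / bin.len() as f64).sqrt()));
    }
    out
}
```

For the zoom-aware path, the same function would be called on the slice covering the visible `xaxis.range` rather than on the whole signal.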

### 1.2 — Design system: `PlotTheme` and baked Plotly defaults

- `pub enum PlotTheme { Midnight, Slate, Amber, Light, Custom(PlotThemeConfig) }`
- `PlotThemeConfig`: bg, surface, border, accent, accent2, text, text_muted, channel_colors
- Default theme: `Midnight` — matching the educational document so `.show()` and the
  educational doc share a visual language
- All `create_*_plot` functions accept a `PlotTheme` and apply it to the Plotly `Layout`
  (paper_bgcolor, plot_bgcolor, font color/family, gridcolor, linecolor, tickcolor)
- Define a canonical per-channel color palette per theme:
  `[#637aff, #38bdf8, #34d399, #f59e0b, #f87171, ...]` — the same values already in the
  educational CSS, now the single source of truth
- Wire up `FontSizes`: apply `font_sizes.title` to layout title, `font_sizes.axis_labels` to
  axis titles, `font_sizes.ticks` to tick fonts
- `Layout::auto_size(true)` on all plots by default

### 1.3 — Time axis formatting

Raw float seconds are the wrong tick format for audio. The time axis must:
- Format ticks as `mm:ss.mmm` for durations ≥ 1 s, `ss.mmm` for < 1 s
- Offer a toggle between wall-clock time and sample-index
- Use sufficiently fine tick density to be useful at both full-view and zoomed-in resolutions
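
A possible tick formatter for the rules above is sketched below. `TimeAxisUnit` and `format_tick` are hypothetical names introduced here for illustration; only the `mm:ss.mmm` / `ss.mmm` formats and the wall-clock vs sample-index toggle come from the list.

```rust
/// Hypothetical toggle between wall-clock time and sample index.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TimeAxisUnit {
    WallClock,
    SampleIndex,
}

/// Format one tick located at `seconds` on an axis spanning `total_duration` seconds.
pub fn format_tick(
    seconds: f64,
    total_duration: f64,
    sample_rate: f64,
    unit: TimeAxisUnit,
) -> String {
    match unit {
        TimeAxisUnit::SampleIndex => format!("{}", (seconds * sample_rate).round() as u64),
        TimeAxisUnit::WallClock => {
            // Work in whole milliseconds so 59.9996 s rolls over to 01:00.000
            // instead of printing "00:60.000".
            let total_ms = (seconds.max(0.0) * 1000.0).round() as u64;
            let (secs, ms) = (total_ms / 1000, total_ms % 1000);
            if total_duration >= 1.0 {
                format!("{:02}:{:02}.{:03}", secs / 60, secs % 60, ms)
            } else {
                format!("{:02}.{:03}", secs, ms)
            }
        }
    }
}
```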

### 1.4 — dBFS y-axis for waveform

`AudioStatistics` already has dBFS utilities. The waveform params should expose:
```rust
pub enum AmplitudeScale { Linear, DbFs }
```
When `DbFs`: convert samples via `20 * log10(|sample|)`, clamp at a configurable floor
(default -120 dBFS), label the axis accordingly. Useful for anything where dynamic range
matters more than waveform shape.
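
As a sketch of the conversion, assuming samples are normalised so that full scale is 1.0 (the helper names `to_dbfs` and `scale_sample` are illustrative, not existing API):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AmplitudeScale {
    Linear,
    DbFs,
}

/// Convert one sample to dBFS, clamped at `floor_db` (e.g. -120.0).
pub fn to_dbfs(sample: f64, floor_db: f64) -> f64 {
    let magnitude = sample.abs();
    if magnitude == 0.0 {
        // log10(0) is -inf; digital silence sits on the floor instead.
        return floor_db;
    }
    (20.0 * magnitude.log10()).max(floor_db)
}

/// Apply the selected y-axis scale to one sample.
pub fn scale_sample(sample: f64, scale: AmplitudeScale, floor_db: f64) -> f64 {
    match scale {
        AmplitudeScale::Linear => sample,
        AmplitudeScale::DbFs => to_dbfs(sample, floor_db),
    }
}
```

Note that the dBFS view discards polarity: `0.5` and `-0.5` both map to about -6.02 dBFS.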

### 1.5 — Spectrogram quality

The existing spectrogram is underspecified on the axes most important to its legibility:

**Frequency axis options.** A linear-frequency spectrogram is close to useless for music and
speech: all the perceptually relevant structure is crammed into the bottom eighth of the axis.
Add a `FreqAxisScale` option:
```rust
pub enum FreqAxisScale { Linear, Log, Mel, Bark, Erb }
```
This is separate from the existing `SpectrogramType` (which controls the magnitude encoding and
spectrogram algorithm). `FreqAxisScale` controls only how the axis is labelled and the bins are
warped for display.
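
To illustrate what "warping the bins for display" means, here is a minimal sketch that maps a frequency to a normalised axis position. `AxisWarp` is a reduced, hypothetical stand-in for `FreqAxisScale` that covers only three of the variants, and the mel formula is the HTK variant, which is an assumption; the roadmap does not say which mel definition the crate uses.

```rust
/// Reduced stand-in for `FreqAxisScale` (Bark and ERB omitted in this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AxisWarp {
    Linear,
    Log,
    Mel,
}

/// HTK-style mel scale (assumed variant).
pub fn hz_to_mel(hz: f64) -> f64 {
    2595.0 * (1.0 + hz / 700.0).log10()
}

/// Inverse of `hz_to_mel`.
pub fn mel_to_hz(mel: f64) -> f64 {
    700.0 * (10f64.powf(mel / 2595.0) - 1.0)
}

/// Position of `hz` on the axis, from 0.0 at `f_min` to 1.0 at `f_max`.
/// `Log` requires `f_min > 0`.
pub fn axis_position(hz: f64, f_min: f64, f_max: f64, warp: AxisWarp) -> f64 {
    let w = |f: f64| match warp {
        AxisWarp::Linear => f,
        AxisWarp::Log => f.ln(),
        AxisWarp::Mel => hz_to_mel(f),
    };
    (w(hz) - w(f_min)) / (w(f_max) - w(f_min))
}
```

On a 0 to 8 kHz axis, 1 kHz sits at 12.5 % of the height with `Linear` but at roughly 35 % with `Mel`, which is the extra room for low-frequency structure that the paragraph above asks for.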

**Magnitude-to-color mapping.** Spectrograms must default to dB scale with:
- Explicit reference level (0 dBFS by convention)
- Configurable floor (default -80 dB)
- Top-dB clamp (default 0 dB)
- Perceptually uniform colormap (magma or viridis) as the default — not Plotly's default
  colorscale

The dynamic range clamp is the single biggest determinant of whether a spectrogram is readable.

**Colorbar with dB units.** The colorbar title should show "dB (re 0 dBFS)" with the floor and
ceiling values at the ends.
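
The reference level, floor, and top-dB clamp combine into a single mapping from linear magnitude to a colormap index. A minimal sketch, assuming non-negative linear magnitudes and a hypothetical function name:

```rust
/// Map a linear magnitude to a colormap position in [0.0, 1.0].
///
/// `reference` is the magnitude treated as 0 dB, `floor_db` and `top_db`
/// are the clamp limits (e.g. -80.0 and 0.0).
pub fn magnitude_to_color_index(
    magnitude: f64,
    reference: f64,
    floor_db: f64,
    top_db: f64,
) -> f64 {
    debug_assert!(floor_db < top_db, "floor must lie below the ceiling");
    // A zero magnitude yields -inf dB, which the clamp pins to the floor.
    let db = 20.0 * (magnitude / reference).log10();
    (db.clamp(floor_db, top_db) - floor_db) / (top_db - floor_db)
}
```

With the defaults from the list above (floor -80 dB, ceiling 0 dB), a bin at -40 dB lands in the middle of the colormap, and everything quieter than -80 dB shares the darkest colour.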

### 1.6 — Filter analysis plots

Section 4.3 (educational) plans to add explain texts for `butterworth_lowpass/highpass`,
`chebyshev_i`, and `apply_iir_filter`. The natural visualization for a filter is not a
before/after waveform; it is the frequency response. These plotting primitives must exist in
`audio_samples_plotting` before 4.3 can be meaningful.

Required plot types:
- **`FilterResponsePlot`** — magnitude Bode plot (dB vs Hz, log x-axis) and phase Bode plot
  (degrees vs Hz), optionally combined in a two-row subplot
- **`PoleZeroPlot`** — unit circle with poles (×) and zeros (○) on the complex plane
- **`ImpulseResponsePlot`** — time-domain impulse/step response
- **`GroupDelayPlot`** — group delay (samples or ms) vs Hz

Compute from `IirFilter` coefficients directly: frequency-sample the transfer function
`H(e^jω)` at 1024 points across [0, Nyquist].
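
Frequency-sampling the transfer function can be sketched as below. The layout of `IirFilter` is not shown in this roadmap, so the sketch assumes the coefficients are available as plain slices `b` (numerator) and `a` (denominator) of `H(z) = Σ b_k z^-k / Σ a_k z^-k`; the function names are illustrative.

```rust
use std::f64::consts::PI;

/// Evaluate Σ c_k · e^{-jωk}, returned as (real, imaginary).
fn eval_poly(coeffs: &[f64], omega: f64) -> (f64, f64) {
    coeffs
        .iter()
        .enumerate()
        .fold((0.0_f64, 0.0_f64), |(re, im), (k, &c)| {
            let phase = omega * k as f64;
            (re + c * phase.cos(), im - c * phase.sin())
        })
}

/// Sample H(e^{jω}) at `n_points` frequencies across [0, Nyquist].
/// Returns `(frequency_hz, magnitude_db, phase_degrees)` per point.
pub fn frequency_response(
    b: &[f64],
    a: &[f64],
    sample_rate: f64,
    n_points: usize,
) -> Vec<(f64, f64, f64)> {
    // Divisor that places the last point exactly on Nyquist (ω = π).
    let last = n_points.saturating_sub(1).max(1) as f64;
    (0..n_points)
        .map(|i| {
            let omega = PI * i as f64 / last;
            let (nr, ni) = eval_poly(b, omega);
            let (dr, di) = eval_poly(a, omega);
            // Complex division (nr + j·ni) / (dr + j·di).
            let den = dr * dr + di * di;
            let hr = (nr * dr + ni * di) / den;
            let hi = (ni * dr - nr * di) / den;
            let mag_db = 20.0 * hr.hypot(hi).log10();
            let phase_deg = hi.atan2(hr).to_degrees();
            (omega / (2.0 * PI) * sample_rate, mag_db, phase_deg)
        })
        .collect()
}
```

For the one-pole low-pass `y[n] = 0.5·x[n] + 0.5·y[n-1]` (`b = [0.5]`, `a = [1.0, -0.5]`), this gives 0 dB at DC and about -9.54 dB at Nyquist, since `|H| = 0.5 / 1.5` there.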

### 1.7 — CQT and chromagram as distinct plot types

`chromagram` and `mel_spectrogram` are listed in Section 4.3 as operations to add explain texts
for, but they are not generic spectrograms. A chromagram has 12 pitch classes on the y-axis, a
circular pitch structure, and different colormap requirements (the pitch class axis wraps). A
CQT has logarithmically spaced frequency bins with a fixed number of bins per octave.

- **`ChromagramPlot`** — heatmap with pitch-class labels (C, C#, D, …, B), y-axis circularity
  implied by label order, viridis default colormap
- Add `CqtSpectrogram` as a `SpectrogramType` variant with the correct bin layout
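
The two axis layouts can be sketched directly. This assumes the usual geometric CQT spacing `f_k = f_min · 2^(k / bins_per_octave)` and equal temperament with A4 = 440 Hz for the pitch-class mapping; the function names are illustrative only.

```rust
/// Pitch-class labels for the chromagram y-axis, in display order.
pub const PITCH_CLASSES: [&str; 12] = [
    "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B",
];

/// Centre frequency of each CQT bin: a fixed number of bins per octave,
/// so bins are logarithmically spaced.
pub fn cqt_center_frequencies(f_min: f64, bins_per_octave: usize, n_bins: usize) -> Vec<f64> {
    (0..n_bins)
        .map(|k| f_min * 2f64.powf(k as f64 / bins_per_octave as f64))
        .collect()
}

/// Index into `PITCH_CLASSES` for a frequency (A4 = 440 Hz assumed).
pub fn pitch_class(freq_hz: f64) -> usize {
    let midi = 69.0 + 12.0 * (freq_hz / 440.0).log2();
    // `rem_euclid` keeps the index in 0..12 even for negative MIDI numbers,
    // which is where the wrap-around of the pitch-class axis comes from.
    (midi.round() as i64).rem_euclid(12) as usize
}
```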

### 1.8 — Annotations and labeled regions primitive

A reusable primitive used by onset/beat/segment visualization across all plot types:

```rust
pub struct TimeRegion {
    pub start: f64,            // seconds
    pub end: Option<f64>,      // None = point marker
    pub label: Option<String>,
    pub color: Option<String>,
}
```

All plot types that have a time axis accept `Vec<TimeRegion>` via a shared `add_regions` method.
This replaces the ad-hoc `add_vline`/`add_shaded_region` per-type duplication and is the
primitive for onset/beat/segment annotations on spectrograms (currently missing).
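
One way a shared `add_regions` could interpret the struct is sketched below. `RegionShape` and `region_shapes` are hypothetical helpers, and treating a degenerate span (`end <= start`) as a point marker is an assumption rather than a documented rule.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct TimeRegion {
    pub start: f64,       // seconds
    pub end: Option<f64>, // None = point marker
    pub label: Option<String>,
    pub color: Option<String>,
}

/// Hypothetical drawing primitive a plot backend would consume.
#[derive(Debug, Clone, PartialEq)]
pub enum RegionShape {
    /// Vertical line at time `x` (onsets, beats).
    Line { x: f64 },
    /// Shaded band from `x0` to `x1` (segments).
    Band { x0: f64, x1: f64 },
}

pub fn region_shapes(regions: &[TimeRegion]) -> Vec<RegionShape> {
    regions
        .iter()
        .map(|r| match r.end {
            // A span whose end lies after its start becomes a shaded band.
            Some(end) if end > r.start => RegionShape::Band { x0: r.start, x1: end },
            // No end, or a degenerate one, falls back to a point marker.
            _ => RegionShape::Line { x: r.start },
        })
        .collect()
}
```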

---

## Phase 2 — API simplification

**Goal:** Common cases need zero config; power cases stay possible.

### 2.1 — Zero-arg shortcuts on `AudioSamples`

```rust
audio.plot()          // -> AudioSampleResult<WaveformPlot>
audio.spectrogram()   // -> AudioSampleResult<SpectrogramPlot>
audio.spectrum()      // -> AudioSampleResult<MagnitudeSpectrumPlot>
```

Convenience wrappers on the existing trait methods with `PlotTheme::Midnight` and sensible
per-type defaults. No new behaviour, just less friction.

### 2.2 — Auto-computing overlay methods on `WaveformPlot`

Current pattern requires two steps: compute then add. Add high-level companions:

```rust
plot.with_rms_envelope(&audio)                       // sensible default window/hop
plot.with_rms_envelope_params(&audio, window, hop)   // explicit params
plot.with_peak_envelope(&audio)
plot.with_zcr_overlay(&audio)
plot.with_onset_markers(&audio, &config)             // compute + add
plot.with_beat_markers(&audio, &config)
```

The low-level `add_*` methods remain for users who pre-compute their own data.
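
The compute step that `with_rms_envelope_params` would fold in might look like the sketch below. It assumes a mono `&[f64]` signal and timestamps each value at the centre of its window; the free function `rms_envelope` is a hypothetical name, and no default window or hop is implied.

```rust
/// Windowed RMS, returned as `(time_seconds, rms)` pairs, one per hop.
pub fn rms_envelope(
    signal: &[f64],
    sample_rate: f64,
    window: usize,
    hop: usize,
) -> Vec<(f64, f64)> {
    if window == 0 || hop == 0 || signal.len() < window {
        return Vec::new();
    }
    signal
        .windows(window) // every full-length frame
        .step_by(hop)    // keep only frames that start on a hop boundary
        .enumerate()
        .map(|(i, frame)| {
            let mean_sq = frame.iter().map(|x| x * x).sum::<f64>() / window as f64;
            // Frame `i` starts at sample `i * hop`; report its centre time.
            let t = (i * hop) as f64 / sample_rate + window as f64 / (2.0 * sample_rate);
            (t, mean_sq.sqrt())
        })
        .collect()
}
```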

### 2.3 — Replace `CompositePlot` iframe approach

The current implementation base64-encodes each sub-plot into an `<iframe>`, meaning sub-plots
are not interactive together (no shared hover, no linked zoom). Replace with Plotly's native
subplot system: each plot's traces are added to a shared `Plot` with grid/domain positioning.
`PlotComponent::requires_shared_x_axis()` already exists to guide this.
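
Domain positioning for stacked sub-plots reduces to splitting the paper coordinate range [0, 1] into rows. A minimal sketch, assuming equal row heights and a uniform gap (`row_domains` is a hypothetical helper; the resulting pairs would be assigned to each y-axis `domain` in the Plotly layout):

```rust
/// Vertical domains `(y0, y1)` for `n_rows` stacked plots, top row first.
/// `gap` is the empty fraction of the paper height left between rows.
pub fn row_domains(n_rows: usize, gap: f64) -> Vec<(f64, f64)> {
    if n_rows == 0 {
        return Vec::new();
    }
    let height = (1.0 - gap * (n_rows as f64 - 1.0)) / n_rows as f64;
    (0..n_rows)
        .map(|i| {
            // Plotly's paper coordinates grow upwards, so row 0 ends at 1.0.
            let y1 = 1.0 - i as f64 * (height + gap);
            ((y1 - height).max(0.0), y1)
        })
        .collect()
}
```

Rows that share a time axis (waveform and spectrogram, for example) would then reference the same x-axis while keeping their own y-axis domain, which is what makes hover and zoom linked.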

### 2.4 — Analysis dashboard

Built on the fixed `CompositePlot`:
```rust
audio.analysis_dashboard()  // -> AudioSampleResult<CompositePlot>
```
Default layout: waveform (with min/max envelope from 1.1) on top, mel-dB spectrogram in the
middle, magnitude spectrum at the bottom. Shared x (time) axis for waveform and spectrogram.

### 2.5 — Clean up reserved/unused fields

- `line_style: Option<String>` — implement (Plotly: solid/dash/dot/dashdot) or remove
- `window_type` in `MagnitudeSpectrumParams` — wire up for FFT windowing or remove
- `frame_position` — implement frame-based spectrum or remove
- Static export `width`/`height`/`scale` — expose as config on `PlotUtils::save`

---

## Phase 3 — More audio-native plot types

**Goal:** Cover the standard analysis views a serious audio library is expected to have.

### 3.1 — A/B and difference views

- **Waveform overlay**: render two `AudioSamples` on the same axes with distinct colors, a
  `difference` trace (A − B), and a legend. Entry point:
  `WaveformPlot::compare(audio_a, audio_b, params)`
- **Difference spectrogram**: compute both spectrograms, subtract in dB, render with a
  diverging colormap (blue=A louder, red=B louder, white=equal). Used in educational
  before/after at the spectral level.
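
A sketch of the dB subtraction, assuming both spectrograms are flattened to equally sized slices of linear magnitudes and share a dB floor. The function names are hypothetical; the sign convention (positive = A louder) follows the bullet above.

```rust
/// Per-cell difference in dB between two magnitude spectrograms.
/// Positive values mean A is louder, negative values mean B is louder.
pub fn db_difference(mag_a: &[f64], mag_b: &[f64], floor_db: f64) -> Vec<f64> {
    // Clamp at the floor so silence in both inputs compares as equal.
    let to_db = |m: f64| if m > 0.0 { (20.0 * m.log10()).max(floor_db) } else { floor_db };
    mag_a
        .iter()
        .zip(mag_b)
        .map(|(&a, &b)| to_db(a) - to_db(b))
        .collect()
}

/// Symmetric colour range so that 0 dB maps to the centre of a diverging colormap.
pub fn symmetric_range(diff_db: &[f64]) -> (f64, f64) {
    let max_abs = diff_db.iter().fold(0.0_f64, |m, d| m.max(d.abs()));
    (-max_abs, max_abs)
}
```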

### 3.2 — Loudness / metering over time

- LUFS momentary (400 ms), short-term (3 s), integrated (full file) per EBU R128
- True-peak (4× oversampled) per ITU-R BS.1770
- Loudness range (LRA)
- Plot type: `LoudnessMeterPlot` — time series of momentary and short-term LUFS with
  integrated as a horizontal reference line; dBTP overlay optional

### 3.3 — Stereo field visualization

- **`GoniometerPlot`** — L vs R Lissajous scatter, updated as a rolling window; reveals
  stereo width, phase issues, and mono compatibility
- **Inter-channel correlation over time** — windowed Pearson r between L and R; +1 = mono,
  0 = uncorrelated, -1 = out-of-phase
- **Mid/side decomposition view** — M = (L+R)/2, S = (L−R)/2, plotted as two waveforms
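
The correlation and mid/side formulas above can be sketched as follows, assuming de-interleaved `&[f64]` channels. For the over-time view, `correlation` would be applied per window; the function names are illustrative.

```rust
/// Pearson correlation between two channels; `None` when either channel
/// is constant (e.g. silent), where r is undefined.
pub fn correlation(left: &[f64], right: &[f64]) -> Option<f64> {
    let n = left.len().min(right.len());
    if n == 0 {
        return None;
    }
    let (l, r) = (&left[..n], &right[..n]);
    let mean_l = l.iter().sum::<f64>() / n as f64;
    let mean_r = r.iter().sum::<f64>() / n as f64;

    let (mut cov, mut var_l, mut var_r) = (0.0_f64, 0.0_f64, 0.0_f64);
    for (&a, &b) in l.iter().zip(r) {
        cov += (a - mean_l) * (b - mean_r);
        var_l += (a - mean_l).powi(2);
        var_r += (b - mean_r).powi(2);
    }
    let denom = (var_l * var_r).sqrt();
    if denom == 0.0 { None } else { Some(cov / denom) }
}

/// Mid/side decomposition: M = (L + R) / 2, S = (L - R) / 2.
pub fn mid_side(left: &[f64], right: &[f64]) -> (Vec<f64>, Vec<f64>) {
    left.iter()
        .zip(right)
        .map(|(&l, &r)| ((l + r) / 2.0, (l - r) / 2.0))
        .unzip()
}
```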

### 3.4 — Spectrogram overlays

Overlaying MIR output on a spectrogram is where analysis output becomes legible:
- f0/pitch track (`add_pitch_track`)
- Onset and beat markers (via `TimeRegion` from 1.8)
- Formant tracks (F1, F2, F3)
- Harmonic series lines from a detected f0

### 3.5 — Phase spectrogram and group delay

- Phase spectrogram: angle of complex STFT bins, rendered with a cyclic colormap (HSV/twilight)
- Instantaneous frequency: derivative of phase, more readable than raw phase
- Group delay: `−dφ/dω`, rendered as a separate heatmap or line overlay

---

## Phase 4 — Educational structural improvements

**Goal:** Make `audio_samples_education` robust, extensible, and honest about its dependencies.

### 4.1 — Replace raw-string parsing with `ExplanationData`

The current approach of encoding `[operation: Name]\n[formula: LaTeX]` into a plain string then
parsing with `strip_prefix` is the most fragile part of the system. Replace with:

```rust
pub struct ExplanationData {
    pub operation: String,
    pub formula_latex: Option<String>,
    pub prose: String,
    pub visual_type: VisualType,
    pub code: Option<String>,          // exact Rust call for this step
}

pub enum VisualType {
    Waveform,
    Spectrogram,
    FrequencyResponse,   // Bode plot — requires audio_samples_plotting filter plots
    Chromagram,          // requires ChromagramPlot
    Spectrum,            // magnitude spectrum overlay
    Difference,          // A/B difference view
    None,
}
```

`VisualType` is now wide enough for all operations targeted in 4.3. Each `explain_*` function
returns `ExplanationData` directly. The renderer consumes structured data, not parsed text.
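
As an illustration of an `explain_*` function returning structured data, here is a hedged sketch for `fade_in`. The function name `explain_fade_in`, its `duration_s` parameter, the linear-fade formula, and the `audio.fade_in(...)` call string are all assumptions made for the example; the real `fade_in` signature is not shown in this roadmap.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum VisualType {
    Waveform,
    Spectrogram,
    FrequencyResponse,
    Chromagram,
    Spectrum,
    Difference,
    None,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ExplanationData {
    pub operation: String,
    pub formula_latex: Option<String>,
    pub prose: String,
    pub visual_type: VisualType,
    pub code: Option<String>,
}

/// Hypothetical explain function: no string encoding, no `strip_prefix` parsing.
pub fn explain_fade_in(duration_s: f64) -> ExplanationData {
    ExplanationData {
        operation: "fade_in".to_string(),
        // Linear fade over the first N samples (assumed curve).
        formula_latex: Some(r"y[n] = x[n] \cdot \min\left(1, \frac{n}{N}\right)".to_string()),
        prose: format!(
            "Ramps the gain from 0 to 1 over the first {duration_s} s to avoid a click at the start."
        ),
        visual_type: VisualType::Waveform,
        // Placeholder call string; the real argument list may differ.
        code: Some(format!("audio.fade_in({duration_s})")),
    }
}
```

The renderer can then `match` on `visual_type` to pick a panel, with the compiler flagging any variant it forgets to handle.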

### 4.2 — Remove the unsafe pointer cast

`render_visual_block` casts `*const dyn ExplainDisplay as *const AudioSamplesVisual`. This is
unsound. Fix by making `Explanation::visual` a concrete `Option<AudioSamplesVisual>` rather than
`Box<dyn ExplainDisplay>`. Since `audio_samples_education` will own both the `explainable`
integration and `AudioSamplesVisual`, this is straightforward.

### 4.3 — Extend explain texts to more operations

With `ExplanationData` (4.1) and the plotting primitives (Phase 1) in place, covering new
operations is mechanical:

- `AudioIirFiltering`: `butterworth_lowpass/highpass/bandpass`, `chebyshev_i`,
  `apply_iir_filter` → `VisualType::FrequencyResponse` (Bode) as the visual
- `AudioEditing`: `trim`, `pad`, `fade_in`, `fade_out`, `concatenate` → `VisualType::Waveform`
- `AudioTransforms`: `stft` → `VisualType::Spectrogram`; `mel_spectrogram` →
  `VisualType::Spectrogram`; `chromagram` → `VisualType::Chromagram`

### 4.4 — Statistics comparison block per step

Each card shows a before/after table below the formula:

```
           Before    After
Peak:      0.80      1.00
RMS:       0.32      0.40
Duration:  2.00 s    2.00 s
```

For spectral operations, also show spectral centroid and bandwidth.
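
The scalar rows of that table could be computed as in the sketch below, assuming a mono `&[f64]` buffer. `StepStats`, `step_stats`, and `delta_summary` are hypothetical names; the crate's existing `AudioStatistics` utilities would presumably be used instead.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct StepStats {
    pub peak: f64,
    pub rms: f64,
    pub duration_s: f64,
}

pub fn step_stats(samples: &[f64], sample_rate: f64) -> StepStats {
    let peak = samples.iter().fold(0.0_f64, |m, x| m.max(x.abs()));
    let rms = if samples.is_empty() {
        0.0
    } else {
        (samples.iter().map(|x| x * x).sum::<f64>() / samples.len() as f64).sqrt()
    };
    StepStats { peak, rms, duration_s: samples.len() as f64 / sample_rate }
}

/// One-line relative change, e.g. "peak +25%, RMS +25%".
pub fn delta_summary(before: &StepStats, after: &StepStats) -> String {
    // Percent change relative to the "before" value; 0 when there is no baseline.
    let pct = |b: f64, a: f64| if b != 0.0 { (a - b) / b * 100.0 } else { 0.0 };
    format!(
        "peak {:+.0}%, RMS {:+.0}%",
        pct(before.peak, after.peak),
        pct(before.rms, after.rms)
    )
}
```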

### 4.5 — Show-the-code per step

The `code: Option<String>` field in `ExplanationData` (4.1) drives a per-card code block with
a copy button. Shows the exact `audio_samples` Rust call that produced the step. Makes the
document a reproduction recipe, not just an illustration.

### 4.6 — Spectral before/after overlay

The scalar stats table (4.4) cannot convey *where* in frequency an operation acted.
Add an optional magnitude-spectrum overlay card: before and after spectra on the same axes,
difference spectrum highlighted. Driven by `VisualType::Spectrum` and `VisualType::Difference`.

### 4.7 — Hover-definition glossary

Wrap DSP vocabulary (windowing, leakage, Nyquist, dBFS, LUFS, etc.) in `<span class="gloss">`
elements. A small JS tooltip shows a one-sentence definition on hover. The glossary is defined
once in the template; term highlighting is automatic via a lookup over known terms.
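
The lookup over known terms could run at document generation time, as in this sketch. Doing it in Rust rather than in the page's JS, the `data-def` attribute, and the function name are all assumptions; definitions are assumed to contain no double quotes, and only single-word terms are matched.

```rust
use std::collections::HashMap;

/// Wrap every known glossary term in `<span class="gloss">`.
/// Keys of `glossary` are lowercase terms, values are one-sentence definitions.
pub fn wrap_glossary_terms(text: &str, glossary: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(text.len());
    let mut word = String::new();
    // A trailing sentinel space flushes the final word; it is removed again below.
    for ch in text.chars().chain(std::iter::once(' ')) {
        if ch.is_alphanumeric() {
            word.push(ch);
            continue;
        }
        if !word.is_empty() {
            // Whole-word, case-insensitive lookup, so "window" never matches
            // inside "windowing".
            match glossary.get(word.to_lowercase().as_str()) {
                Some(def) => out.push_str(&format!(
                    "<span class=\"gloss\" data-def=\"{def}\">{word}</span>"
                )),
                None => out.push_str(&word),
            }
            word.clear();
        }
        out.push(ch);
    }
    out.pop(); // drop the sentinel space
    out
}
```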

---

## Phase 5 — Educational UI and rich features

**Goal:** Turn the document from a static snapshot into a learning tool.

### 5.1 — Embedded audio playback with synced playhead (highest-value educational feature)

Encode the `AudioSamples` at each step as a base64 WAV `<audio>` element inlined in the HTML.
Wire a `timeupdate` listener to sweep a vertical playhead cursor across the waveform and
spectrogram panels in sync with playback. The before/after comparison cards become before/after
audio the user can A/B by ear. This is worth more than the sidebar, collapsible cards, and
linked brushing combined.

Depends on: `audio_samples_io` for WAV encoding (base64 inline data URI).
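
To make the inline data URI concrete, the sketch below hand-rolls a mono 16-bit PCM WAV header and a base64 encoder. This is a stand-in for illustration only: in the roadmap the encoding comes from `audio_samples_io`, and all three function names here are hypothetical.

```rust
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 with `=` padding.
pub fn base64_encode(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = ((chunk[0] as u32) << 16) | (b1 << 8) | b2;
        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { B64[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Minimal 44-byte RIFF/WAVE header followed by mono 16-bit PCM data.
pub fn wav_bytes_mono_i16(samples: &[i16], sample_rate: u32) -> Vec<u8> {
    let data_len = (samples.len() * 2) as u32;
    let mut w = Vec::with_capacity(44 + samples.len() * 2);
    w.extend_from_slice(b"RIFF");
    w.extend_from_slice(&(36 + data_len).to_le_bytes()); // RIFF chunk size
    w.extend_from_slice(b"WAVEfmt ");
    w.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    w.extend_from_slice(&1u16.to_le_bytes()); // format tag: PCM
    w.extend_from_slice(&1u16.to_le_bytes()); // channels: mono
    w.extend_from_slice(&sample_rate.to_le_bytes());
    w.extend_from_slice(&(sample_rate * 2).to_le_bytes()); // byte rate
    w.extend_from_slice(&2u16.to_le_bytes()); // block align
    w.extend_from_slice(&16u16.to_le_bytes()); // bits per sample
    w.extend_from_slice(b"data");
    w.extend_from_slice(&data_len.to_le_bytes());
    for s in samples {
        w.extend_from_slice(&s.to_le_bytes());
    }
    w
}

/// Value for the `src` attribute of an inline `<audio>` element.
pub fn wav_data_uri(samples: &[i16], sample_rate: u32) -> String {
    format!("data:audio/wav;base64,{}", base64_encode(&wav_bytes_mono_i16(samples, sample_rate)))
}
```

Base64 inflates the payload by one third, so long steps at high sample rates make the HTML large; that cost is worth weighing when deciding which steps get inline audio.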

### 5.2 — Step navigation sidebar

For chains of 5+ operations: a sticky left sidebar with step names and operation types. Clicking
jumps to the card. The timeline structure is already in place; the sidebar is ~30 lines of JS.

### 5.3 — Collapsible cards

Cards expand/collapse by clicking the header. Collapsed state shows step number, operation
name, and a 1-line stats delta (`peak +25%, RMS +25%`). Makes long documents scannable.

### 5.4 — CSS/JS injection API

```rust
pub struct ExplainConfig {
    pub title: String,
    pub default_theme: PlotTheme,
    pub custom_css: Option<String>,
    pub custom_js: Option<String>,
}
```

`custom_css` overrides any CSS variable, color, or layout. Changing the accent colour or font
requires 2 lines. `ExplainConfig::new(title)` is the zero-friction entry point.
Replace `render_explanation_document(explanations, title)` with
`render_explanation_document(explanations, &ExplainConfig)`.
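
A sketch of the zero-friction constructor follows. `PlotTheme` is reduced to its unit variants to keep the example self-contained, `with_css` is a hypothetical builder method, and the `--accent` CSS variable name is an assumption about the template.

```rust
/// Reduced stand-in for the real `PlotTheme` (`Custom` omitted).
#[derive(Debug, Clone, PartialEq)]
pub enum PlotTheme {
    Midnight,
    Slate,
    Amber,
    Light,
}

#[derive(Debug, Clone)]
pub struct ExplainConfig {
    pub title: String,
    pub default_theme: PlotTheme,
    pub custom_css: Option<String>,
    pub custom_js: Option<String>,
}

impl ExplainConfig {
    /// Only the title is required; everything else takes a default.
    pub fn new(title: impl Into<String>) -> Self {
        Self {
            title: title.into(),
            default_theme: PlotTheme::Midnight,
            custom_css: None,
            custom_js: None,
        }
    }

    /// Hypothetical builder method for the CSS override.
    pub fn with_css(mut self, css: impl Into<String>) -> Self {
        self.custom_css = Some(css.into());
        self
    }
}
```

Changing the accent colour would then read `ExplainConfig::new("Demo").with_css(":root { --accent: #f59e0b; }")`, assuming the template exposes its colours as CSS variables.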

### 5.5 — Linked brush/selection

Dragging a region on the "before" waveform highlights the same time range on the "after"
waveform and seeks the audio playback to that region. ~50 lines of Plotly JS event handling.

### 5.6 — Precomputed parameter sweeps (cheap WASM alternative)

Render the operation at several parameter values and present as small multiples or a slider over
precomputed frames. Example: window-size sweep for STFT (128/512/2048/8192 samples) shown as
four side-by-side spectrograms. Most of the pedagogical value of live WASM at a fraction of the
complexity. This is the recommended stepping stone.

---

## Phase 6 — Ambitious / long-term

### 6.1 — WASM live recomputation

Compile DSP kernels to WASM (`wasm-bindgen`). Embed parameter controls (sliders, dropdowns) in
the educational card. Changing a slider reruns the operation in-browser and redraws the plots
without a Rust rebuild. This turns a static explainer into an interactive instrument.

Architecturally: DSP kernels in `audio_samples` are pure functions on slices — they compile to
WASM cleanly. `audio_samples_education` with the `wasm-dsp` feature links the WASM binary into
the HTML at document generation time.

### 6.2 — Offline self-contained output

`render_explanation_document` already produces CDN-linked HTML. Add an `ExplainConfig::offline`
flag that fetches and inlines KaTeX, Plotly, and fonts at document generation time. Works in
combination with 5.1 since audio is already inlined.

### 6.3 — Export formats

- PDF export via headless Chrome/Chromium (for reports and papers)
- PNG export of individual cards (for slides)
- Jupyter notebook export (`.ipynb`) so the explanation chain becomes a reproducible notebook

---

## Dependency order

```
Phase 0  (crate extraction)
   │
   ├──► Phase 1  (plotting primitives — scale, theme, spectrogram quality, filter plots, chroma)
   │         │
   │         └──► Phase 2  (API simplification, analysis dashboard)
   │                   │
   │                   └──► Phase 3  (more plot types: A/B, loudness, stereo field, phase)
   │
   └──► Phase 4  (educational structure — depends on Phase 1 for VisualType primitives)
             └──► Phase 5  (educational UI — depends on Phase 4 for ExplainConfig/ExplanationData)
                       └──► Phase 6  (WASM, offline, export — depends on Phase 5)
```

Phases 1 and 4 both depend on Phase 0 (crate extraction). Beyond that, most of Phase 4 (the
`ExplanationData` refactor, the pointer-cast fix, the stats and code blocks) does not need Phase 1
and can proceed in parallel once the crate boundaries are established. Phase 3 requires Phase 2.
Phase 5 requires Phase 4. Phase 6 requires Phase 5.

The exceptions are the Phase 4 items that render Phase 1 primitives. The filter and chromagram
plot types (1.6, 1.7) must land before the 4.3 operations that use them:
`VisualType::FrequencyResponse` and `VisualType::Chromagram` in 4.1 are hollow until those
primitives exist. The decimation work (1.1) must land before the educational before/after panels
are rendered for long signals. Do not start Phase 4 operations that depend on these items until
the corresponding Phase 1 items are complete.