audio_samples 0.10.5

A typed audio processing library for Rust that treats audio as a first-class, invariant-preserving object rather than an unstructured numeric buffer.
# AudioSamples Architecture

This document provides detailed information on how the audio_samples crate is architected from its building blocks up to the highest-level API.
The architecture follows core principles of **type safety**, **zero-allocation efficiency**, **trait-based composition**, and **modular feature design**.

## Core Design Principles

### 1. Type Safety Through Strong Typing

All audio data is strongly typed with the sample format (`i16`, `I24`, `i32`, `f32`, `f64`), ensuring mathematical operations are performed with appropriate precision and range.
The type system prevents common audio processing errors like mixing incompatible sample formats.

### 2. Zero-Allocation Efficiency

The library leverages `ndarray`'s view system to enable zero-allocation access patterns wherever possible.
Operations prefer in-place modifications and views over copying data.
The library adds several wrappers around `ndarray` arrays, `MonoRepr`/`MultiRepr` and `MonoData`/`MultiData`, which handle owned and borrowed data automatically.
End users should never have to deal with these wrappers directly.
The top layer of the wrappers is the `AudioSamples<'_, T: AudioSample>` struct.
It is the user's entry point, and its `impl` blocks dispatch to the appropriate backend wrapper.

### 3. Trait-Based Composition

Functionality is organized into focused, composable traits rather than monolithic implementations.
Each trait handles a specific aspect of audio processing with clear separation of concerns.
Traits also allow for easy feature-gating.

### 4. Metadata Integration

Audio samples are always paired with essential metadata (sample rate, channel layout) to prevent common audio processing errors and enable automatic format conversions.

### 5. Feature-Gated Modularity

The library uses cargo features extensively to keep dependencies minimal, allowing users to enable only the functionality they need.

## Building Blocks

### AudioSample (./traits.rs)

The `AudioSample` trait is the foundation of the entire type system.
It defines the interface for all supported audio sample formats:

**Supported Types:**

- `i16`: 16-bit signed integer samples (most common for audio files)
- `I24`: 24-bit signed integer samples (professional audio) -- **Always use the re-export of `I24` from the crate.**
- `i32`: 32-bit signed integer samples (high precision)
- `f32`: 32-bit floating-point samples (normalized -1.0 to 1.0)
- `f64`: 64-bit floating-point samples (highest precision)

**Key Requirements:**

- Standard arithmetic operations (`Add`, `Sub`, `Mul`, `Div`)
- Memory safety guarantees (`NoUninit` for safe byte serialization)
- Numeric operations (`Num`, `Zero`, `One`, `Signed`)
- Serialization support (`Serialize`, `Deserialize`)
- Constants for range information (`MAX`, `MIN`, `BITS`)

**API Contracts:**

- All sample types must provide consistent arithmetic behavior
- Byte serialization must be safe and deterministic
- Range constants must accurately represent the format's dynamic range
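
As a rough illustration of the requirements above, the sketch below defines a hypothetical `Sample` trait with arithmetic bounds and range constants. It is not the crate's `AudioSample` trait: the name, the reduced set of bounds, and the choice of `±1.0` as the float range are all assumptions made for the example.

```rust
use std::ops::{Add, Div, Mul, Sub};

/// Hypothetical stand-in for a sample-format trait (not the crate's `AudioSample`).
pub trait Sample:
    Copy
    + PartialOrd
    + Add<Output = Self>
    + Sub<Output = Self>
    + Mul<Output = Self>
    + Div<Output = Self>
{
    /// Largest value of the format's nominal range.
    const MAX: Self;
    /// Smallest value of the format's nominal range.
    const MIN: Self;
    /// Storage width in bits.
    const BITS: u8;
}

impl Sample for i16 {
    const MAX: Self = i16::MAX;
    const MIN: Self = i16::MIN;
    const BITS: u8 = 16;
}

impl Sample for f32 {
    // Assumption: floats report the normalized audio range, not f32::MAX.
    const MAX: Self = 1.0;
    const MIN: Self = -1.0;
    const BITS: u8 = 32;
}

/// Distance between a peak value and the format's positive full scale.
pub fn headroom<T: Sample>(peak: T) -> T {
    T::MAX - peak
}
```

Because `headroom` is generic over the trait, the same code works for integer and float formats while each format keeps its own notion of full scale.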

### ConvertTo (./traits.rs)

The `ConvertTo<T>` trait provides audio-aware conversions between different sample formats with proper scaling:

**Conversion Behavior:**

- **Integer ↔ Integer**: Bit-shift scaling to preserve full dynamic range
- **Integer ↔ Float**: Normalized scaling (-1.0 to 1.0 for floats)
- **Float ↔ Float**: Direct casting with precision conversion
- **I24 Special Handling**: Custom methods for 24-bit operations
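
The scaling rules above can be sketched with plain functions. This is a minimal illustration, assuming a 16-bit shift between `i16` and `i32` and a divisor of 32768 for integer-to-float; the function names are hypothetical and the crate's own `ConvertTo` implementations may round or scale differently.

```rust
/// Integer -> integer: shift left so that full scale maps to full scale.
pub fn i16_to_i32(s: i16) -> i32 {
    (s as i32) << 16
}

/// Integer -> integer (narrowing): drop the low 16 bits.
pub fn i32_to_i16(s: i32) -> i16 {
    (s >> 16) as i16
}

/// Integer -> float: normalize into [-1.0, 1.0).
pub fn i16_to_f32(s: i16) -> f32 {
    s as f32 / 32768.0
}

/// Float -> integer: clamp to [-1.0, 1.0], then scale and round.
/// Scaling by 32767 keeps +1.0 representable; -1.0 maps to -32767.
pub fn f32_to_i16(s: f32) -> i16 {
    (s.clamp(-1.0, 1.0) * 32767.0).round() as i16
}
```

Note the asymmetry in the float-to-integer direction: whether to scale by 32767 or 32768 is a design choice, and this sketch picks the variant that never overflows.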

**Design Patterns:**

- Returns `T`, designed not to fail.
- Uses macro-generated implementations for consistency and to cut down on manual code.
- Maintains mathematical precision across format boundaries
- Handles edge cases like range overflows gracefully

**API Contracts:**

- Conversions must preserve audio dynamic range proportionally
- Round-trip conversions should minimize precision loss
- Error cases must be clearly documented and handled

### CastInto, CastFrom and Castable (./traits.rs)

The casting trait family provides raw numeric conversions without audio-specific scaling:

**Purpose:**

- Raw numeric casting for non-audio operations
- Direct type conversions without range normalization
- Performance-critical paths where scaling is not needed

**Design Patterns:**

Sometimes you just need to cast an int to a float and back again.
These traits **DO NOT** perform any audio-specific scaling/conversions.

- `CastFrom<S>`: Cast from source type to Self
- `CastInto<T>`: Cast self into target type
- `Castable`: Marker trait for types that can cast to all audio formats

**API Contracts:**

- Casting preserves numeric values without audio scaling
- Out-of-range values are clamped to target type's limits
- No error handling: input is assumed to be well-formed.

A failure in one of these casts indicates a serious bug elsewhere rather than a recoverable condition.
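
For contrast with audio-aware conversion, the sketch below shows what a raw cast does using Rust's built-in `as` operator. The function names are hypothetical; they only illustrate the "no scaling, clamp at the limits" behaviour described above, not the crate's `CastFrom`/`CastInto` implementations.

```rust
/// Raw float -> integer cast: the numeric value is kept, not rescaled.
/// `as` truncates toward zero, saturates at the integer limits, and maps NaN to 0.
pub fn cast_f32_to_i16(x: f32) -> i16 {
    x as i16
}

/// Raw integer -> float cast: 1000 becomes 1000.0, not 1000 / 32768.
pub fn cast_i16_to_f32(x: i16) -> f32 {
    x as f32
}
```

A normalized sample such as `0.9` therefore becomes `0` under a raw cast, which is why these casts are kept separate from audio conversions.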

### AudioSamples<'_, T: AudioSample> (repr.rs)

The main data container that combines audio samples with essential metadata:

```rust
pub struct AudioSamples<'a, T: AudioSample> {
    pub data: AudioData<'a, T>,
    pub sample_rate: u32,
    pub layout: ChannelLayout,
}
```

**Key Features:**

- Generic over any `AudioSample` type
- Lifetime parameter `'a` enables zero-copy views
- Always includes sample rate and channel layout
- Provides uniform interface for mono and multi-channel audio

**Memory Layout:**

- Mono audio: 1D arrays via `MonoData<'a, T>`
- Multi-channel audio: 2D arrays via `MultiData<'a, T>` with channels as rows
- Both support borrowed and owned data

**API Contracts:**

- Sample rate must be positive
- Channel layout must match data dimensions
- Lifetime safety is ensured by Rust's borrow checker.
  At the user level of the API, lifetimes should not be a concern unless the program is explicitly managing buffer reuse.
- Metadata consistency maintained across operations
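
One way to picture a container that holds either borrowed or owned samples and enforces these contracts at construction is the sketch below. It uses `std::borrow::Cow` over an interleaved slice instead of `ndarray`, and the type `Clip` with all of its methods is hypothetical, not the crate's `AudioSamples`.

```rust
use std::borrow::Cow;
use std::num::NonZeroU32;

/// Hypothetical container: interleaved samples plus validated metadata.
#[derive(Debug)]
pub struct Clip<'a> {
    samples: Cow<'a, [f32]>,
    sample_rate: NonZeroU32,
    channels: usize,
}

impl<'a> Clip<'a> {
    /// Accepts a borrowed slice or an owned `Vec`; rejects inconsistent metadata.
    pub fn new(
        samples: impl Into<Cow<'a, [f32]>>,
        sample_rate: u32,
        channels: usize,
    ) -> Result<Self, String> {
        let samples = samples.into();
        let sample_rate = NonZeroU32::new(sample_rate)
            .ok_or_else(|| "sample rate must be positive".to_string())?;
        if channels == 0 || samples.is_empty() || samples.len() % channels != 0 {
            return Err("sample count must be a non-zero multiple of the channel count".into());
        }
        Ok(Self { samples, sample_rate, channels })
    }

    pub fn sample_rate(&self) -> u32 {
        self.sample_rate.get()
    }

    /// Number of frames (one sample per channel).
    pub fn frames(&self) -> usize {
        self.samples.len() / self.channels
    }

    pub fn is_borrowed(&self) -> bool {
        matches!(self.samples, Cow::Borrowed(_))
    }

    /// Detach from the source buffer by copying only if the data is borrowed.
    pub fn into_owned(self) -> Clip<'static> {
        Clip {
            samples: Cow::Owned(self.samples.into_owned()),
            sample_rate: self.sample_rate,
            channels: self.channels,
        }
    }
}
```

Validating once in the constructor means later operations can rely on a positive sample rate and a consistent channel count without rechecking.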

### AudioSamples Iteration (iterators.rs)

Provides multiple iteration patterns for efficient audio processing:

**Iterator Types:**

- `frames()`: Iterate by frames (one sample from each channel)
- `channels()`: Iterate by complete channels
- `windows(size, hop)`: Windowed iteration with configurable overlap
- Support for different padding modes: `Zero`, `None`, `Skip`

**Design Patterns:**

- Zero-allocation views where possible
- Configurable windowing for FFT and analysis operations
- Type-safe iterator adaptors
- Memory-efficient streaming for large audio files

**API Contracts:**

- Iterator stability guaranteed for immutable operations
- Window boundaries handled consistently
- Padding modes clearly defined and documented
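
The windowing behaviour can be sketched over a plain mono slice. The semantics chosen here for the three padding modes are assumptions for illustration (`Zero` pads a short trailing window with zeros, `None` yields it as-is, `Skip` drops it); the crate's documentation is the authority on what the modes actually do. The function and enum below are hypothetical.

```rust
/// Hypothetical padding modes for the final, incomplete window.
#[derive(Clone, Copy)]
pub enum Padding {
    /// Pad the short window with zeros up to `size`.
    Zero,
    /// Yield the short window unchanged.
    None,
    /// Stop at the first window that does not fit.
    Skip,
}

/// Collect windows of `size` samples, advancing by `hop` samples each step.
pub fn windows(signal: &[f32], size: usize, hop: usize, padding: Padding) -> Vec<Vec<f32>> {
    assert!(size > 0 && hop > 0, "size and hop must be non-zero");
    let mut out = Vec::new();
    let mut start = 0;
    while start < signal.len() {
        let end = (start + size).min(signal.len());
        let mut window = signal[start..end].to_vec();
        if window.len() < size {
            match padding {
                Padding::Zero => window.resize(size, 0.0),
                Padding::None => {}
                Padding::Skip => break,
            }
        }
        out.push(window);
        start += hop;
    }
    out
}
```

This version copies each window for simplicity. When no padding is needed, `signal.windows(size).step_by(hop)` from the standard library gives the same full windows as borrowed slices without allocating.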

### AudioSamples Conversion (conversions.rs)

Implements the `AudioTypeConversion` trait for safe type transformations on `AudioSamples`:

**In-Domain Conversions:**

- `to_format<O>()`: Borrows original, returns new type
- `to_type<O>()`: Consumes original, returns new type
- Convenience methods: `as_f32()`, `as_i16()`, `as_i24()`, etc.
- Uses `ConvertTo` trait for audio-aware scaling

**Out-of-Domain Conversions:**

- `cast_as<O>()`: Borrows original, raw numeric casting
- `cast_to<O>()`: Consumes original, raw numeric casting
- Uses `CastFrom` trait for direct numeric conversion

**API Contracts:**

- Clear distinction between audio-aware and raw conversions
- Lifetime management ensures memory safety
- Type bounds enforce conversion compatibility

### Utilities (./utils)

Provides supporting functionality organized by purpose:

- `generation.rs`: Signal generation (sine waves, noise, etc.)
- `detection.rs`: Feature detection algorithms
- `comparison.rs`: Audio comparison and similarity metrics

In the future, this may become a more general `algorithms` module.

**Design Patterns:**

- Pure functions where possible
- Consistent error handling patterns
- Performance-optimized implementations
- Feature-gated advanced functionality

### Errors (./error.rs)

Comprehensive error handling with specific error types:

```rust
pub enum AudioSampleError {
    ConversionError(String, String, String, String),
    InvalidRange(String),
    InvalidParameter(String),
    DimensionMismatch(String),
    InvalidInput { msg: String },
    ProcessingError { msg: String },
    FeatureNotEnabled { feature: String },
    ArrayLayoutError { message: String },
    OptionError { message: String },
    BorrowedDataError { message: String },
    InternalError(String),
}
```

The error type still needs review.
Error messages should be made consistent, and some variants may be split into more granular errors.

**Error Handling Strategy:**

- Specific error types for different failure modes
- Rich context information in error messages
- Integration with `thiserror` for ergonomic handling
- `AudioSampleResult<T>` type alias for consistency
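
The pattern of a dedicated error enum plus a result alias can be sketched with the standard library alone. The example implements `Display` and `Error` by hand, which is roughly the boilerplate that `thiserror` derives; `SketchError`, `SketchResult`, and `checked_gain` are hypothetical names and carry only two of the variants listed above.

```rust
use std::fmt;

/// Hypothetical, reduced error type in the style of the enum above.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    InvalidParameter(String),
    FeatureNotEnabled { feature: String },
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::InvalidParameter(msg) => write!(f, "invalid parameter: {msg}"),
            SketchError::FeatureNotEnabled { feature } => {
                write!(f, "feature '{feature}' is not enabled")
            }
        }
    }
}

impl std::error::Error for SketchError {}

/// Result alias so every fallible function shares one error type.
pub type SketchResult<T> = Result<T, SketchError>;

/// Multiply samples by `gain`, rejecting non-finite gains before touching the data.
pub fn checked_gain(samples: &mut [f32], gain: f32) -> SketchResult<()> {
    if !gain.is_finite() {
        return Err(SketchError::InvalidParameter(format!(
            "gain must be finite, got {gain}"
        )));
    }
    for s in samples.iter_mut() {
        *s *= gain;
    }
    Ok(())
}
```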

## Trait Extensions

### The `operations` Module (./operations)

The operations module organizes audio processing functionality into focused, composable traits.
Each trait addresses a specific domain of audio processing:

#### Core Organization

- **`traits.rs`**: Trait definitions and type bounds
- **Implementation files**: Named after traits (e.g., `AudioChannelOps` → `channels.rs`)
- **Supporting types**: Enums and structs in `types.rs`
- **Feature gating**: Advanced functionality behind cargo features

#### `AudioStatistics` (statistics.rs)

**Purpose**: Statistical analysis operations for audio data.

**Why use it?**: Essential for audio analysis, level monitoring, and processing decisions.

**How to use it?**:

```rust
let peak = audio.peak();          // Returns T directly
let rms: f64 = audio.rms();      // Returns f64
let crossings = audio.zero_crossings(); // Signal analysis
```

**Key Operations**:

- `peak()`, `min_sample()`, `max_sample()`: Level measurements
- `mean()`, `variance()`, `std_dev()`: Statistical measures
- `rms()`: Perceptually relevant loudness measure
- `zero_crossings()`, `zero_crossing_rate()`: Periodicity analysis
- `autocorrelation()`, `cross_correlation()`: Correlation analysis (requires `fft`)
- `spectral_centroid()`: Brightness measure (requires `fft`)

**API Contracts**:

- Relies on the `AudioSamples` invariant that audio is never empty, so these operations can always return a value.
- Generic over float types for numerical operations
- Consistent behavior across mono and multi-channel audio
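
For reference, the basic level measures reduce to a few lines over a mono slice. This is a standalone sketch with hypothetical free functions, not the trait methods themselves, and it assumes non-empty input, matching the invariant mentioned above.

```rust
/// Largest absolute sample value.
pub fn peak(x: &[f32]) -> f32 {
    x.iter().fold(0.0_f32, |max, s| max.max(s.abs()))
}

/// Root mean square, accumulated in f64. Assumes `x` is non-empty.
pub fn rms(x: &[f32]) -> f64 {
    let sum_sq: f64 = x.iter().map(|&s| (s as f64) * (s as f64)).sum();
    (sum_sq / x.len() as f64).sqrt()
}

/// Number of sign changes between consecutive samples (zero counts as positive).
pub fn zero_crossings(x: &[f32]) -> usize {
    x.windows(2)
        .filter(|pair| (pair[0] >= 0.0) != (pair[1] >= 0.0))
        .count()
}
```

Without the non-empty guarantee, `rms` would divide by zero and return NaN, which is the kind of case the invariant lets the API avoid.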

#### `AudioProcessing` (processing.rs)

**Purpose**: Core signal processing operations with fluent builder API.

**Why use it?**: Fundamental audio modifications like normalization, scaling, filtering.

**How to use it?**:

```rust
// Individual operations
audio.normalize(-1.0, 1.0, NormalizationMethod::Peak)?;
audio.scale(0.8)?;

// Fluent builder API
audio.processing()
    .normalize(-1.0, 1.0, NormalizationMethod::Peak)
    .scale(0.8)
    .clip(-0.5, 0.5)
    .apply()?;
```

**Key Operations**:

- `normalize()`: Multiple normalization strategies
- `scale()`, `gain()`: Amplitude adjustments
- `clip()`: Hard limiting to prevent overflows
- `remove_dc_offset()`: DC bias removal
- `fade_in()`, `fade_out()`: Envelope operations

**ProcessingBuilder Pattern**:

- Chains operations efficiently
- Validates parameters before application
- Atomic application (all or nothing)
- Memory-efficient operation sequencing
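
A builder with "validate first, then apply" semantics might look like the following sketch. It queues operations, checks every parameter before modifying any sample, and only then runs the queue in order. `ProcessingSketch` and its two operations are hypothetical and much smaller than the crate's `ProcessingBuilder`.

```rust
/// Queued operation with its parameters.
enum Op {
    Scale(f32),
    Clip(f32, f32),
}

/// Hypothetical builder that records operations against a mutable buffer.
pub struct ProcessingSketch<'a> {
    target: &'a mut [f32],
    ops: Vec<Op>,
}

impl<'a> ProcessingSketch<'a> {
    pub fn new(target: &'a mut [f32]) -> Self {
        Self { target, ops: Vec::new() }
    }

    pub fn scale(mut self, factor: f32) -> Self {
        self.ops.push(Op::Scale(factor));
        self
    }

    pub fn clip(mut self, lo: f32, hi: f32) -> Self {
        self.ops.push(Op::Clip(lo, hi));
        self
    }

    /// Validate every queued operation, then apply them in order.
    /// If validation fails, the buffer is left untouched.
    pub fn apply(self) -> Result<(), String> {
        let Self { target, ops } = self;
        for op in &ops {
            match op {
                Op::Scale(f) if !f.is_finite() => {
                    return Err(format!("scale factor must be finite, got {f}"));
                }
                // `!(lo <= hi)` also rejects NaN bounds.
                Op::Clip(lo, hi) if !(lo <= hi) => {
                    return Err(format!("invalid clip range [{lo}, {hi}]"));
                }
                _ => {}
            }
        }
        for op in &ops {
            for s in target.iter_mut() {
                *s = match op {
                    Op::Scale(f) => *s * *f,
                    Op::Clip(lo, hi) => (*s).clamp(*lo, *hi),
                };
            }
        }
        Ok(())
    }
}
```

Separating validation from execution is what makes the all-or-nothing guarantee cheap: no rollback is needed because nothing is written until every parameter has passed.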

#### `AudioTransforms` (transforms.rs)

**Purpose**: Frequency-domain analysis and transformations (requires `fft`).

**Why use it?**: Spectral analysis, filtering, and frequency-domain processing.

**How to use it?**:

```rust
let spectrum = audio.fft()?;
let stft = audio.stft(window_size, hop_size)?;
let filtered = audio.spectral_filter(cutoff_freq)?;
```

**Key Operations**:

- `fft()`, `ifft()`: Fast Fourier Transform and inverse
- `stft()`, `istft()`: Short-Time Fourier Transform
- `spectrogram()`: Time-frequency representation
- `spectral_filter()`: Frequency domain filtering
- `phase_vocoder()`: Time/pitch manipulation

**Performance Considerations**:

- Uses `RustFFT` by default
- Optional Intel MKL backend (`mkl` feature)
- Real-valued FFT optimization for audio signals
- Memory-efficient windowing operations
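
To make the real-valued optimization concrete: a real signal of length `n` has a conjugate-symmetric spectrum, so only the first `n / 2 + 1` bins carry information. The sketch below computes those bins with a naive O(n²) DFT. It is an illustration of the output shape only, not an FFT and not how `RustFFT` works internally; `dft_magnitudes` is a hypothetical name.

```rust
use std::f64::consts::PI;

/// Magnitudes of the non-redundant DFT bins (0 ..= n/2) of a real signal.
/// Naive O(n^2) evaluation of the DFT definition, for illustration only.
pub fn dft_magnitudes(x: &[f64]) -> Vec<f64> {
    let n = x.len();
    (0..n / 2 + 1)
        .map(|k| {
            let (mut re, mut im) = (0.0_f64, 0.0_f64);
            for (t, &v) in x.iter().enumerate() {
                let angle = -2.0 * PI * (k * t) as f64 / n as f64;
                re += v * angle.cos();
                im += v * angle.sin();
            }
            (re * re + im * im).sqrt()
        })
        .collect()
}
```

For a cosine that completes exactly one cycle over eight samples, all of the energy lands in bin 1 with magnitude `n / 2 = 4`.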

#### `AudioEditing` (editing.rs)

**Purpose**: Time-domain editing and manipulation operations.

**Why use it?**: Audio arrangement, timing modifications, and content editing.

**How to use it?**:

```rust
let trimmed = audio.trim_start_end(1.0, 2.0)?;  // Remove first 1s, last 2s
let padded = audio.pad(PadSide::Both, 0.5)?;   // Add 0.5s silence
let reversed = audio.reverse()?;                // Reverse audio
let combined = audio1.concatenate(&audio2)?;   // Join audio
```

**Key Operations**:

- `trim()`, `trim_start_end()`: Remove audio segments
- `pad()`: Add silence or repeat edge samples
- `reverse()`: Reverse audio timeline
- `concatenate()`: Join multiple audio clips
- `repeat()`: Loop audio content

**API Contracts**:

- Time-based operations accept seconds or samples
- Sample rate consistency enforced across operations
- Memory-efficient implementation using views where possible
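
The relationship between seconds, samples, and views can be sketched as follows. The helper converts a duration to a sample count using the sample rate, and the trim returns a sub-slice of the input rather than a copy. Both functions are hypothetical and operate on a mono slice; rounding to the nearest sample is an assumption.

```rust
/// Convert a duration in seconds to a whole number of samples (rounded).
pub fn seconds_to_samples(seconds: f64, sample_rate: u32) -> usize {
    (seconds * sample_rate as f64).round() as usize
}

/// Drop `start_s` seconds from the front and `end_s` seconds from the back.
/// Returns a borrowed view, or `None` if nothing would remain.
pub fn trim_start_end(
    samples: &[f32],
    sample_rate: u32,
    start_s: f64,
    end_s: f64,
) -> Option<&[f32]> {
    let start = seconds_to_samples(start_s, sample_rate);
    let end = seconds_to_samples(end_s, sample_rate);
    if start + end >= samples.len() {
        return None;
    }
    Some(&samples[start..samples.len() - end])
}
```

Returning `None` instead of an empty slice keeps the result consistent with a container that never holds empty audio.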

#### `AudioChannelOps` (channels.rs)

**Purpose**: Channel manipulation and spatial audio operations.

**Why use it?**: Mono/stereo conversion, channel mixing, spatial processing.

**How to use it?**:

```rust
let mono = stereo_audio.to_mono(MonoConversionMethod::Average)?;
let stereo = mono_audio.to_stereo(StereoConversionMethod::Duplicate)?;
let extracted = multi_audio.extract_channel(0)?;
```

**Key Operations**:

- `to_mono()`: Multiple mono conversion strategies
- `to_stereo()`: Stereo expansion from mono
- `extract_channel()`: Extracts a specific channel, allocating a new `AudioSamples` for it.
- `borrow_channel()`: Borrows a specific channel without allocating; the returned `AudioSamples` is a view into the original data.
- `mix_channels()`: Custom channel mixing
- `swap_channels()`: Channel reordering

**Conversion Methods**:

- **Mono**: `Average`, `Left`, `Right`, `Sum`, `WeightedSum`
- **Stereo**: `Duplicate`, `Spread`, `Custom`
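
Two of the simplest strategies, `Average` and `Duplicate`, can be sketched on planar data (one `Vec` per channel). The functions are hypothetical and assume all channels have the same length; they are meant to show the arithmetic, not the crate's implementation.

```rust
/// Average all channels into one. Assumes every channel has the same length.
pub fn to_mono_average(channels: &[Vec<f32>]) -> Vec<f32> {
    let count = channels.len() as f32;
    let frames = channels.first().map_or(0, Vec::len);
    (0..frames)
        .map(|i| channels.iter().map(|ch| ch[i]).sum::<f32>() / count)
        .collect()
}

/// Copy a mono signal into identical left and right channels.
pub fn to_stereo_duplicate(mono: &[f32]) -> [Vec<f32>; 2] {
    [mono.to_vec(), mono.to_vec()]
}
```

Averaging keeps the result within the original range, whereas a plain sum can exceed full scale when channels are correlated.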

#### `AudioIirFiltering` (iir_filtering.rs)

**Purpose**: Infinite Impulse Response filter implementations.

**Why use it?**: Real-time filtering, tone shaping, frequency response control.

**How to use it?**:

```rust
let filtered = audio.lowpass_filter(cutoff_hz, q_factor)?;
let shaped = audio.highpass_filter(cutoff_hz, q_factor)?;
```

**Filter Types**:

- `lowpass_filter()`, `highpass_filter()`: Basic frequency separation
- `bandpass_filter()`, `bandstop_filter()`: Band-limited filtering
- `butterworth_filter()`: Smooth response filters
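
A cutoff-plus-Q low-pass is commonly realized as a second-order (biquad) section. The sketch below uses the widely published Audio EQ Cookbook low-pass coefficients in direct form I. Whether the crate uses this exact design is an assumption; the `Biquad` type here is hypothetical.

```rust
use std::f64::consts::PI;

/// Second-order IIR section (direct form I) with normalized coefficients.
pub struct Biquad {
    b0: f64,
    b1: f64,
    b2: f64,
    a1: f64,
    a2: f64,
    x1: f64,
    x2: f64,
    y1: f64,
    y2: f64,
}

impl Biquad {
    /// Low-pass coefficients from the Audio EQ Cookbook formulas.
    pub fn lowpass(sample_rate: f64, cutoff_hz: f64, q: f64) -> Self {
        let w0 = 2.0 * PI * cutoff_hz / sample_rate;
        let alpha = w0.sin() / (2.0 * q);
        let cos_w0 = w0.cos();
        let a0 = 1.0 + alpha;
        Self {
            b0: (1.0 - cos_w0) / 2.0 / a0,
            b1: (1.0 - cos_w0) / a0,
            b2: (1.0 - cos_w0) / 2.0 / a0,
            a1: -2.0 * cos_w0 / a0,
            a2: (1.0 - alpha) / a0,
            x1: 0.0,
            x2: 0.0,
            y1: 0.0,
            y2: 0.0,
        }
    }

    /// Process one sample, updating the two-sample input and output history.
    pub fn process(&mut self, x: f64) -> f64 {
        let y = self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
            - self.a1 * self.y1
            - self.a2 * self.y2;
        self.x2 = self.x1;
        self.x1 = x;
        self.y2 = self.y1;
        self.y1 = y;
        y
    }
}
```

The filter keeps only four values of state, which is why IIR designs suit real-time use: each output sample costs a fixed handful of multiplications regardless of the cutoff.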

#### `AudioParametricEq` (parametric_eq.rs)

**Purpose**: Multi-band parametric equalization.

**Why use it?**: Tone shaping, frequency response correction, creative EQ.

**How to use it?**:

```rust
let eq = ParametricEq::new()
    .add_band(EqBand::new(1000.0, 2.0, 3.0)) // +3dB at 1kHz
    .add_band(EqBand::new(5000.0, 1.0, -6.0)); // -6dB at 5kHz

let equalized = audio.apply_parametric_eq(&eq)?;
```

**Key Features**:

- Multiple simultaneous frequency bands
- Independent gain, frequency, and Q controls
- Real-time parameter updates
- Efficient cascaded biquad implementation

#### `AudioDynamicRange` (dynamic_range.rs)

**Purpose**: Dynamic range processing (compression, limiting, expansion).

**Why use it?**: Level control, dynamics processing, mastering operations.

**How to use it?**:

```rust
let compressed = audio.compressor(CompressorConfig::new(
    -12.0, // threshold
    4.0,   // ratio
    0.003, // attack
    0.1,   // release
))?;

let limited = audio.limiter(LimiterConfig::new(-1.0, 0.001, 0.05))?;
```

**Processor Types**:

- `compressor()`: Reduce dynamic range above threshold
- `limiter()`: Hard limiting to prevent peaks
- `expander()`: Increase dynamic range below threshold
- `gate()`: Remove low-level noise
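
The core of a compressor is its static gain curve: below the threshold the level is unchanged, and above it every `ratio` dB of input produces 1 dB of output. The sketch below implements only that curve, assuming levels in dB relative to full scale and ignoring attack and release smoothing. Both function names are hypothetical.

```rust
/// Static compressor curve in the dB domain (no attack/release smoothing).
pub fn compressed_level_db(input_db: f64, threshold_db: f64, ratio: f64) -> f64 {
    if input_db <= threshold_db {
        input_db
    } else {
        threshold_db + (input_db - threshold_db) / ratio
    }
}

/// Apply the static curve to one linear sample, preserving its sign.
pub fn compress_sample(x: f64, threshold_db: f64, ratio: f64) -> f64 {
    if x == 0.0 {
        return 0.0; // log10(0) is -inf; silence stays silence
    }
    let input_db = 20.0 * x.abs().log10();
    let output_db = compressed_level_db(input_db, threshold_db, ratio);
    x.signum() * 10f64.powf(output_db / 20.0)
}
```

With a threshold of -12 dB and a ratio of 4, an input at -4 dB is 8 dB over the threshold and comes out 2 dB over it, at -10 dB. A limiter is the same curve with a very high ratio.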

#### Audio Resampling (resampling.rs)

**Purpose**: High-quality sample rate conversion (requires `resampling`).

**Why use it?**: Sample rate conversion, format compatibility, anti-aliasing.

**How to use it?**:

```rust
let resampled = audio.resample(48000, ResamplingQuality::VeryHigh)?;
```

**Quality Levels**:

- `Fast`: Quick conversion with acceptable quality
- `Medium`: Balanced quality and performance
- `High`: High-quality anti-aliasing
- `VeryHigh`: Maximum quality for critical applications
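
To show what sample rate conversion does at its simplest, the sketch below resamples a mono slice by linear interpolation. This is deliberately the low end of the quality scale: it applies no anti-aliasing filter and is not what a high-quality resampler does. The function name is hypothetical.

```rust
/// Resample by linear interpolation between neighbouring input samples.
/// No anti-aliasing filter is applied, so downsampling can alias.
pub fn resample_linear(input: &[f32], from_rate: u32, to_rate: u32) -> Vec<f32> {
    if input.is_empty() || from_rate == 0 || to_rate == 0 {
        return Vec::new();
    }
    // Output length scales with the rate ratio (integer division truncates).
    let out_len = (input.len() as u64 * to_rate as u64 / from_rate as u64) as usize;
    let step = from_rate as f64 / to_rate as f64;
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step; // fractional position in the input
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)]; // hold the last sample at the edge
            a + (b - a) * frac
        })
        .collect()
}
```

Doubling the rate inserts one interpolated value between each pair of input samples; halving it simply keeps every second sample, which is where a proper low-pass stage would be needed.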

### Plotting (./operations/plotting)

**Purpose**: Comprehensive audio visualization capabilities (requires `plotting`).

**Architecture**:

- **Composable API**: Build complex plots from simple elements
- **Builder Pattern**: Fluent configuration of plot appearance
- **Multiple Backends**: Plotly for interactive plots, static generation support

**Core Components**:

#### `PlotComposer` (composer.rs)

Orchestrates the creation of complex, multi-element plots:

```rust
let plot = PlotComposer::new()
    .add_waveform(&audio, "Waveform")
    .add_spectrogram(&audio, window_size)
    .add_onsets(&onset_times)
    .set_layout(layout_config)
    .build()?;
```

#### Plotting Elements (elements.rs)

- `Waveform`: Time-domain amplitude plots
- `Spectrogram`: Time-frequency representations
- `Spectrum`: Frequency-domain magnitude plots
- `OnsetMarkers`: Event detection visualization
- `BeatMarkers`: Tempo and rhythm visualization
- `PitchContour`: Fundamental frequency tracking

#### Styling System (builders.rs)

- `ColorPalette`: Consistent color schemes
- `LineStyle`: Customizable line appearance
- `MarkerStyle`: Point and event markers
- `LayoutConfig`: Plot layout and formatting

**Feature Integration**:

- Automatic sample rate handling for time axis
- Frequency axis scaling for spectral plots
- Real-time plot updates for streaming audio
- Export capabilities (PNG, SVG, HTML)

**API Contracts**:

- Consistent time/frequency axis handling
- Memory-efficient plot generation
- Thread-safe plot composition
- Graceful fallbacks for missing features

## API Design Contracts

### Error Handling Strategy

1. **Simple operations never fail**: Basic getters return values directly
2. **Complex operations return Results**: Type conversions, processing operations
3. **Rich error context**: Specific error types with detailed messages
4. **Graceful degradation**: Optional features disabled cleanly

### Memory Management

1. **Zero-allocation views**: Borrow AudioSamples when possible
2. **In-place operations**: Prefer modification over copying
3. **Owned data when needed**: Automatic conversion from borrowed to owned
4. **Memory safety**: Rust's ownership system prevents data races

### Type System Guarantees

1. **Sample format safety**: Strong typing prevents format mix-ups
2. **Lifetime correctness**: Borrowed data cannot outlive its source
3. **Feature consistency**: Trait bounds enforce feature requirements
4. **Conversion validity**: Type-safe conversions with error handling

### Performance Characteristics

1. **Predictable allocation**: Clearly documented allocation behavior
2. **SIMD optimization**: Automatic vectorization where beneficial
3. **Cache efficiency**: ndarray layouts optimized for memory access
4. **Scalable algorithms**: Linear complexity for core operations