Stratum DSP

stratum-dsp 1.0.0 · MIT OR Apache-2.0

Professional-grade audio analysis engine for DJ applications — pure Rust, zero FFI dependencies, production-ready BPM detection, key detection, and beat tracking.

Features

  • Tempo (BPM) detection with confidence scoring
    • Dual tempogram approach (FFT + autocorrelation) based on Grosche et al. (2012)
    • Multi-resolution escalation for metrical-level disambiguation
    • 87.7% accuracy within ±2 BPM on real-world DJ tracks (155 Beatport/ZipDJ tracks)
  • Key detection with musical notation + numerical DJ notation (1A/1B)
    • Chroma-based analysis with Krumhansl-Kessler template matching
    • Supports both major/minor keys
  • Beat grid generation with stability scoring
    • HMM-based beat tracking with tempo drift correction
  • Preprocessing: Peak/RMS/LUFS normalization (ITU-R BS.1770-4) + silence trimming
  • Batch processing: parallel analysis using one worker per CPU core minus one (7.7× throughput improvement)
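The numerical DJ notation referenced above is the Camelot-style wheel popularized by Mixed In Key, where the "A" ring is minor and the "B" ring is major. As an illustration only (this table is not part of stratum-dsp's API), a partial key-to-wheel mapping looks like:

```rust
/// Partial Camelot wheel mapping (standard convention: "A" = minor,
/// "B" = major). Illustrative only; the full wheel has 24 entries.
fn camelot(key: &str) -> Option<&'static str> {
    match key {
        "C major" => Some("8B"),
        "A minor" => Some("8A"),
        "G major" => Some("9B"),
        "E minor" => Some("9A"),
        "B major" => Some("1B"),
        "A-flat minor" => Some("1A"),
        _ => None,
    }
}

fn main() {
    assert_eq!(camelot("C major"), Some("8B"));
    assert_eq!(camelot("A minor"), Some("8A"));
    println!("C major -> {}", camelot("C major").unwrap());
}
```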

Installation

From crates.io

Add to your Cargo.toml:

[dependencies]
stratum-dsp = "1.0"

From Git (Development)

Add to your Cargo.toml:

[dependencies]
stratum-dsp = { git = "https://github.com/HLLMR/stratum-dsp" }

Note: The API is stable for core features (BPM/key/beat-grid). Advanced tuning parameters may change in future versions.

Quick Start

Library Usage

use stratum_dsp::{analyze_audio, compute_confidence, AnalysisConfig};

fn main() -> Result<(), stratum_dsp::AnalysisError> {
    // Load your audio samples (mono, f32 in [-1, 1])
    let samples: Vec<f32> = vec![]; // Your audio data
    let sample_rate = 44_100;

    // Analyze
    let result = analyze_audio(&samples, sample_rate, AnalysisConfig::default())?;
    let conf = compute_confidence(&result);

    println!("BPM: {:.2} (confidence: {:.2})", result.bpm, conf.bpm_confidence);
    println!("Key: {} ({})", result.key.name(), result.key.numerical()); // e.g., "C major (8B)", "A minor (8A)"
    println!("Beat grid stability: {:.2}", result.grid_stability);
    Ok(())
}
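analyze_audio expects mono samples, so interleaved stereo from a decoder must be downmixed first. A minimal sketch in plain Rust (independent of any decoding crate):

```rust
/// Downmix interleaved stereo f32 samples to mono by averaging the
/// left and right channels. The result stays in [-1, 1] if the input
/// does, which is what the analysis entry point expects.
fn stereo_to_mono(interleaved: &[f32]) -> Vec<f32> {
    interleaved
        .chunks_exact(2)
        .map(|frame| 0.5 * (frame[0] + frame[1]))
        .collect()
}

fn main() {
    let stereo = [0.5_f32, -0.5, 1.0, 0.0]; // L, R, L, R
    let mono = stereo_to_mono(&stereo);
    assert_eq!(mono, vec![0.0, 0.5]);
    println!("{:?}", mono);
}
```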

CLI Examples

Single file analysis:

cargo build --release --example analyze_file

target/release/examples/analyze_file --json <audio_file>

Batch processing (parallel):

cargo build --release --example analyze_batch

target/release/examples/analyze_batch --jobs 7 <file1> <file2> ...

Validation Results

Stratum DSP has been validated on real-world DJ tracks from Beatport and ZipDJ (155 tracks with verified ground truth):

Metric                   Result
BPM accuracy (±2 BPM)    87.7% (136/155 tracks)
BPM accuracy (±5 BPM)    88.4% (137/155 tracks)
BPM accuracy (±10 BPM)   89.0% (138/155 tracks)
BPM MAE                  6.08 BPM
Key accuracy             72.1% exact match vs ground truth (68 tracks); matches Mixed In Key

Dataset: 155 verified DJ tracks (Beatport/ZipDJ) with ground truth BPM/key from vendor tags (pre-MIK snapshot).
Reference baseline: Mixed-in-Key (TAG) achieves 98.1% ±2 BPM and 72.1% key accuracy on the same dataset.

Note: FMA Small dataset results (used for algorithm development) show lower accuracy (56.7% ±2 BPM) due to dataset diversity. Real-world DJ tracks achieve production-grade accuracy as shown above.
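The accuracy and MAE figures above follow the usual definitions; as a sketch of how such metrics are computed from (estimate, ground truth) pairs (not part of the crate):

```rust
/// Fraction of BPM estimates within `tol` of ground truth, plus mean
/// absolute error: the two headline metrics in the validation table.
fn bpm_metrics(pairs: &[(f32, f32)], tol: f32) -> (f32, f32) {
    let n = pairs.len() as f32;
    let hits = pairs
        .iter()
        .filter(|(est, gt)| (est - gt).abs() <= tol)
        .count() as f32;
    let mae = pairs.iter().map(|(est, gt)| (est - gt).abs()).sum::<f32>() / n;
    (hits / n, mae)
}

fn main() {
    // One exact hit, one half-tempo (octave) error, one near miss:
    let pairs = [(128.0, 128.1), (87.0, 174.0), (120.0, 121.5)];
    let (acc, mae) = bpm_metrics(&pairs, 2.0);
    assert!((acc - 2.0 / 3.0).abs() < 1e-6); // MAE is dominated by the octave error
    println!("accuracy = {:.1}%, MAE = {:.2} BPM", 100.0 * acc, mae);
}
```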

For detailed validation reports, see:

  • docs/progress-reports/PHASE_1F_VALIDATION.md (FMA Small development results + real-world DJ results)
  • docs/literature/stratum_2025_key_detection_real_world.md (key detection improvements)
  • validation/README.md (validation workflow)

Performance

Single-Track Processing

  • ~200–210 ms for a 3-minute track (synthetic benchmark, Criterion.rs)
  • ~150–350 ms typical range for real-world tracks (varies by track length and complexity)

Batch Throughput

  • Single worker: ~2.8 tracks/sec
  • Parallel (one worker per CPU core minus one): ~21.3 tracks/sec (7.7× speedup)

Example: at these rates, processing 1,000 tracks takes ~47 seconds with parallel workers vs ~6 minutes single-threaded.

For detailed benchmark reports, see docs/progress-reports/PHASE_1F_BENCHMARKS.md.

Known Limitations

BPM Detection

  • Metrical-level errors: ~12% of tracks show octave/harmonic confusions (1/2×, 2×, 2/3× ratios)
    • Common on double-time feels, triplet-based tracks, or tracks with weak kick-snare patterns
    • Multi-resolution escalation helps but doesn't eliminate all cases
  • Very low BPM (<60 BPM): May be confused with half-tempo if the track has strong sub-harmonic content
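When validating against tracks with known BPM, metrical-level confusions are easy to separate from genuine misses: check whether the estimate matches the ground truth at one of the common ratios. A sketch (not part of the crate's API):

```rust
/// Returns the metrical ratio (0.5, 2.0, 2/3, ...) at which `est`
/// matches `gt` within `tol` BPM, or None if no common confusion
/// ratio applies. Useful for classifying octave/harmonic errors.
fn metrical_ratio(est: f32, gt: f32, tol: f32) -> Option<f32> {
    const RATIOS: [f32; 6] = [0.5, 2.0, 2.0 / 3.0, 1.5, 1.0 / 3.0, 3.0];
    RATIOS.iter().copied().find(|r| (est - gt * r).abs() <= tol)
}

fn main() {
    // A 174 BPM drum & bass track detected at half tempo:
    assert_eq!(metrical_ratio(87.0, 174.0, 2.0), Some(0.5));
    // A genuine miss matches no common ratio:
    assert_eq!(metrical_ratio(100.0, 128.0, 2.0), None);
    println!("ok");
}
```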

Key Detection

  • Current accuracy: 72.1% exact match vs ground truth (matches Mixed In Key performance)
  • Known issue: Some tracks with weak tonality or heavy percussion may have lower confidence
  • Workaround: use the key confidence score as a reliability gate; values below 0.3 suggest the result should be treated as uncertain

General

  • CPU-only: No GPU acceleration (Phase 2 will add optional ML refinement)
  • Fixed sample rate: Optimized for 44.1 kHz (other rates work but may have slight accuracy differences)
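If your source audio is not 44.1 kHz, resampling before analysis sidesteps the accuracy caveat. A naive linear-interpolation sketch follows; a production pipeline would use a windowed-sinc resampler (e.g. the rubato crate) to avoid aliasing:

```rust
/// Naive linear-interpolation resampler: converts mono samples from
/// `from_hz` to `to_hz`. Good enough as an illustration of getting
/// arbitrary-rate audio to 44.1 kHz; prefer a windowed-sinc resampler
/// in production.
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == to_hz {
        return input.to_vec();
    }
    let ratio = from_hz as f64 / to_hz as f64;
    let out_len = (input.len() as f64 / ratio).floor() as usize;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * ratio;
            let idx = pos as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            let b = input[(idx + 1).min(input.len() - 1)];
            a + frac * (b - a) // linear interpolation between neighbors
        })
        .collect()
}

fn main() {
    let input: Vec<f32> = (0..48).map(|i| i as f32).collect(); // 1 ms at 48 kHz
    let out = resample_linear(&input, 48_000, 44_100);
    assert_eq!(out.len(), 44); // ~1 ms at 44.1 kHz
    println!("resampled {} -> {} samples", input.len(), out.len());
}
```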

Algorithm References

  • BPM Detection: Grosche, P., Müller, M., & Kurth, F. (2012). Cyclic Tempogram—A Mid-Level Tempo Representation for Music Signals. ICASSP.
  • Key Detection: Krumhansl, C. L., & Kessler, E. J. (1982). Tracing the Dynamic Changes in Perceived Tonal Organization in a Spatial Representation of Musical Keys. Psychological Review.
  • Beat Tracking: Ellis, D. P. W. (2007). Beat Tracking by Dynamic Programming. Journal of New Music Research.
  • Preprocessing: ITU-R BS.1770-4 (Loudness normalization)

Full literature review: docs/literature/

Documentation

  • Pipeline (authoritative): PIPELINE.md — exact runtime flow and decision points
  • Development workflow: DEVELOPMENT.md — build, test, benchmark, validate
  • Roadmap: ROADMAP.md — high-level milestones
  • Progress reports: docs/progress-reports/ — detailed phase histories and tuning logs
  • Validation: validation/README.md — how to reproduce validation results

Contributing

Contributions are welcome! Please see the documentation in docs/ for algorithm details and DEVELOPMENT.md for build/test workflows.

Note: This project follows a phased development approach. Phase 1 (DSP-first) is focused on core tempo/key/beat-grid accuracy. Phase 2 will add optional ML refinement.

License

Dual-licensed under MIT OR Apache-2.0 at your option.