stem-splitter-core 1.2.0

Core library for AI-powered audio stem separation
# 🎵 Stem Splitter Core

**High-performance, pure-Rust audio stem separation library powered by ONNX Runtime**

[![Crates.io](https://img.shields.io/crates/v/stem-splitter-core.svg)](https://crates.io/crates/stem-splitter-core)
[![License](https://img.shields.io/crates/l/stem-splitter-core.svg)](LICENSE-MIT)

---

## 🎧 Overview

`stem-splitter-core` is a Rust library for splitting audio tracks into isolated stems (vocals, drums, bass, and other instruments) using state-of-the-art AI models. Built entirely in Rust with ONNX Runtime, it provides:

- **No Python dependency** - Pure Rust implementation
- **High-quality separation** - Uses the Hybrid Transformer Demucs (htdemucs) model
- **Automatic model management** - Downloads and caches models with registry support
- **Fast inference** - Optimized ONNX Runtime with GPU acceleration and multi-threading
- **Progress tracking** - Built-in callbacks for download and processing progress
- **Production-ready** - Memory-safe, performant, and battle-tested

Perfect for music production tools, DJ software, karaoke apps, or any application requiring audio source separation.

---

## ✨ Features

- 🎵 **4-Stem Separation** — Isolate vocals, drums, bass, and other instruments
- 🧠 **State-of-the-art AI** — Hybrid Transformer Demucs model (htdemucs)
- 🚀 **GPU Acceleration** — CUDA, CoreML, DirectML, oneDNN, and XNNPACK support (auto-detected)
- 📦 **Model Registry** — Built-in model registry with support for multiple models
- 🎚️ **Multiple Formats** — Supports WAV, MP3, FLAC, OGG, and more via Symphonia
- 📊 **Progress Tracking** — Real-time callbacks for download and split progress
- 🔒 **Type-safe** — Strong compile-time guarantees with Rust's type system
- 💾 **Smart Caching** — Models cached in user directories with SHA-256 verification

---

## 🔧 CLI & Distribution

While `stem-splitter-core` is primarily a Rust library, this repository also provides a
first-party CLI (`stem-splitter`) and prebuilt binaries for common platforms.

### CLI

The CLI is built on top of `stem-splitter-core` and exposes the same high-performance
audio stem separation features via the command line.

The CLI source lives in `src/bin/stem-splitter.rs`.

### Prebuilt Binaries

Prebuilt binaries are published with each GitHub release:

https://github.com/gentij/stem-splitter-core/releases

These binaries are suitable for:
- Arch Linux (via AUR)
- Debian / Ubuntu (manual install)
- Any glibc-based Linux distribution

### Platform Packages

- macOS: Homebrew
- Arch Linux: AUR (`stem-splitter-bin`)
- Linux (generic): tar.gz binary from GitHub Releases

See the `packaging/` directory for reference packaging files.

---

## 📦 Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
stem-splitter-core = "1.2.0"
```

### System Requirements

- **Rust 1.70+**
- **~200MB disk space** for model storage (first run only)
- **4GB+ RAM** recommended for processing

No external dependencies or Python installation required!

---

## 🚀 Quick Start

### Basic Usage

```rust
use stem_splitter_core::{split_file, SplitOptions};

fn main() -> anyhow::Result<()> {
    // Configure the split operation
    let options = SplitOptions {
        output_dir: "./output".to_string(),
        model_name: "htdemucs_ort_v1".to_string(),
        manifest_url_override: None,
    };

    // Split the audio file
    let result = split_file("song.mp3", options)?;

    // Access the separated stems
    println!("Vocals: {}", result.vocals_path);
    println!("Drums: {}", result.drums_path);
    println!("Bass: {}", result.bass_path);
    println!("Other: {}", result.other_path);

    Ok(())
}
```

Or even simpler with defaults:

```rust
use stem_splitter_core::{split_file, SplitOptions};

fn main() -> anyhow::Result<()> {
    // Use default options (htdemucs_ort_v1 model, current directory)
    let result = split_file("song.mp3", SplitOptions::default())?;
    
    println!("Vocals: {}", result.vocals_path);
    Ok(())
}
```

### With Progress Tracking

```rust
use stem_splitter_core::{split_file, SplitOptions, SplitProgress};

fn main() -> anyhow::Result<()> {
    // Set download progress callback
    stem_splitter_core::set_download_progress_callback(|downloaded, total| {
        // total is 0 when the download size is unknown
        if total > 0 {
            let percent = (downloaded as f64 / total as f64 * 100.0) as u64;
            eprint!("\rDownloading model… {}% ({}/{} bytes)", percent, downloaded, total);
            if downloaded >= total {
                eprintln!();
            }
        }
    });

    // Set split progress callback
    stem_splitter_core::set_split_progress_callback(|progress| {
        match progress {
            SplitProgress::Stage(stage) => {
                eprintln!("> Stage: {}", stage);
            }
            SplitProgress::Writing { stem, percent, .. } => {
                eprintln!("Writing {}: {:.0}%", stem, percent);
            }
            SplitProgress::Finished => {
                eprintln!("Split finished!");
            }
            _ => {}
        }
    });

    let options = SplitOptions {
        output_dir: "./output".to_string(),
        ..Default::default()  // Uses htdemucs_ort_v1 by default
    };

    split_file("song.mp3", options)?;
    Ok(())
}
```

### Pre-loading Models

For applications that need to minimize latency, pre-load the model:

```rust
use stem_splitter_core::prepare_model;

fn main() -> anyhow::Result<()> {
    // Download and load model at startup
    prepare_model("htdemucs_ort_v1", None)?;
    
    // Now splitting will be instant (no download delay)
    // ... use split_file() as normal
    
    Ok(())
}
```

---

## 📖 API Reference

### `split_file(input_path: &str, opts: SplitOptions) -> Result<SplitResult>`

Main function to split an audio file into stems.

**Parameters:**
- `input_path`: Path to the audio file (supports WAV, MP3, FLAC, OGG, etc.)
- `opts`: Configuration options (see `SplitOptions`)

**Returns:**
- `SplitResult` containing paths to the separated stem files

### `SplitOptions`

Configuration struct for the separation process.

```rust
pub struct SplitOptions {
    /// Directory where output stems will be saved
    pub output_dir: String,
    
    /// Name of the model to use (e.g., "htdemucs_ort_v1")
    pub model_name: String,
    
    /// Optional: Override the model manifest URL
    /// (useful for custom models or specific versions)
    pub manifest_url_override: Option<String>,
}
```

**Default values:**
- `output_dir`: `"."`
- `model_name`: `"htdemucs_ort_v1"`
- `manifest_url_override`: `None`

### `SplitResult`

Result struct containing paths to the separated stems.

```rust
pub struct SplitResult {
    pub vocals_path: String,
    pub drums_path: String,
    pub bass_path: String,
    pub other_path: String,
}
```

### `prepare_model(model_name: &str, manifest_url_override: Option<&str>) -> Result<()>`

Pre-loads and caches a model for faster subsequent splits.

**Parameters:**
- `model_name`: Name of the model to prepare
- `manifest_url_override`: Optional URL to override the manifest location

### `ensure_model(model_name: &str, manifest_url_override: Option<&str>) -> Result<ModelHandle>`

Downloads and verifies a model, returning a handle with metadata.

**Parameters:**
- `model_name`: Name of the model to ensure
- `manifest_url_override`: Optional URL to override the manifest location

**Returns:**
- `ModelHandle` containing the manifest and local path to the model

### `set_download_progress_callback(callback: F)`

Set a callback to track model download progress.

```rust
pub fn set_download_progress_callback<F>(callback: F)
where
    F: Fn(u64, u64) + Send + 'static,
```

**Callback parameters:**
- `downloaded`: Bytes downloaded so far
- `total`: Total bytes to download (0 if unknown)
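
The `total == 0` case matters in practice: without a known size, a percentage is meaningless. A small hypothetical helper (not part of the crate's API) sketches how a caller might handle both cases:

```rust
// Hypothetical helper for consumers of the download callback.
// total == 0 signals "size unknown", so no percentage can be computed.
fn percent(downloaded: u64, total: u64) -> Option<u64> {
    if total == 0 {
        None // show an indeterminate spinner instead of a percentage
    } else {
        Some(downloaded * 100 / total)
    }
}

fn main() {
    assert_eq!(percent(104_857_600, 209_715_200), Some(50));
    assert_eq!(percent(0, 0), None);
    println!("halfway: {:?}", percent(104_857_600, 209_715_200));
}
```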

### `set_split_progress_callback(callback: F)`

Set a callback to track split processing progress.

```rust
pub fn set_split_progress_callback<F>(callback: F)
where
    F: Fn(SplitProgress) + Send + 'static,
```

**SplitProgress variants:**
- `Stage(&'static str)`: Current processing stage (e.g., "resolve_model", "read_audio", "infer")
- `Chunks { done, total, percent }`: Progress through audio chunks
- `Writing { stem, done, total, percent }`: Progress writing a specific stem
- `Finished`: Processing complete
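
To show the match arms self-contained, the sketch below re-declares a local enum mirroring the documented variants; the real `SplitProgress` type (and its exact field types) lives in `stem_splitter_core`, so treat the declarations here as illustrative assumptions:

```rust
// Local mirror of the documented variants, for illustration only.
enum SplitProgress {
    Stage(&'static str),
    Chunks { done: u64, total: u64, percent: f32 },
    Writing { stem: String, done: u64, total: u64, percent: f32 },
    Finished,
}

// Render one progress event as a log line, covering every variant.
fn describe(p: &SplitProgress) -> String {
    match p {
        SplitProgress::Stage(stage) => format!("stage: {stage}"),
        SplitProgress::Chunks { done, total, .. } => format!("chunk {done}/{total}"),
        SplitProgress::Writing { stem, percent, .. } => format!("writing {stem}: {percent:.0}%"),
        SplitProgress::Finished => "finished".to_string(),
    }
}

fn main() {
    println!("{}", describe(&SplitProgress::Stage("infer")));
    println!("{}", describe(&SplitProgress::Finished));
}
```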

---

## 🎯 Supported Audio Formats

The library supports a wide range of audio formats through the [Symphonia](https://github.com/pdeljanov/Symphonia) decoder:

- **WAV** - Uncompressed audio (best quality)
- **MP3** - MPEG Layer 3
- **FLAC** - Free Lossless Audio Codec
- **OGG Vorbis** - Open-source lossy format
- **AAC** - Advanced Audio Coding
- And more...

**Output Format:** All stems are saved as 16-bit PCM WAV files at 44.1kHz stereo.
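
Writing 16-bit PCM means quantizing floating-point samples. The snippet below shows the standard clamp-and-scale conversion; it is a sketch of the general technique, not necessarily the crate's exact implementation:

```rust
// Standard float-to-16-bit-PCM quantization with clamping.
// Samples outside [-1.0, 1.0] would otherwise wrap and click audibly.
fn f32_to_i16(sample: f32) -> i16 {
    let clamped = sample.clamp(-1.0, 1.0);
    (clamped * i16::MAX as f32).round() as i16
}

fn main() {
    assert_eq!(f32_to_i16(0.0), 0);
    assert_eq!(f32_to_i16(1.0), i16::MAX);
    assert_eq!(f32_to_i16(-2.0), -i16::MAX); // out-of-range input is clamped
    println!("full scale = {}", f32_to_i16(1.0));
}
```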

---

## 🧠 Model Information

### HTDemucs-ORT (htdemucs_ort_v1)

This is the default and currently supported model:

- **Architecture:** Hybrid Transformer Demucs
- **Format:** ONNX Runtime optimized
- **Size:** ~209MB
- **Quality:** State-of-the-art separation quality
- **Sources:** 4 stems (drums, bass, other, vocals)
- **Sample Rate:** 44.1kHz
- **Window Size:** 343,980 samples (~7.8 seconds)
- **Hop Size:** 171,990 samples (50% overlap)
- **Origin:** Converted from [Meta's Demucs v4](https://github.com/facebookresearch/demucs)

The model is automatically downloaded from [HuggingFace](https://huggingface.co/gentij/htdemucs-ort/resolve/main/manifest.json) on first use and cached locally in your system's cache directory with SHA-256 verification.
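
The window and hop numbers above determine how many inference passes a track needs. A back-of-envelope sketch (the exact chunking scheme is an assumption for illustration):

```rust
// htdemucs_ort_v1 parameters from the table above.
const WINDOW: usize = 343_980; // samples per inference window
const HOP: usize = 171_990; // 50% overlap
const SAMPLE_RATE: usize = 44_100;

// Rough count of overlapping windows needed to cover a track.
fn num_windows(total_samples: usize) -> usize {
    if total_samples <= WINDOW {
        1
    } else {
        // One full window, then one more per hop of remaining audio (ceiling).
        1 + (total_samples - WINDOW + HOP - 1) / HOP
    }
}

fn main() {
    // A 3-minute track at 44.1 kHz has 180 * 44_100 = 7_938_000 samples per channel.
    println!("window ≈ {:.1} s", WINDOW as f64 / SAMPLE_RATE as f64); // 7.8 s
    println!("windows needed: {}", num_windows(180 * SAMPLE_RATE)); // 46
}
```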

### Model Registry

The library includes a built-in model registry (`models/registry.json`) that maps model names to their manifest URLs. This allows users to simply specify `"htdemucs_ort_v1"` without needing to remember or provide the full HuggingFace URL.

### Custom Models

You can use custom models by providing a manifest URL override:

```rust
let options = SplitOptions {
    output_dir: "./output".to_string(),
    model_name: "my_custom_model".to_string(),
    manifest_url_override: Some(
        "https://example.com/path/to/manifest.json".to_string()
    ),
};
```
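
The manifest schema is not documented here; the sketch below is a guess based on the fields visible on `ModelHandle` (`sample_rate`, `window`, `stems`) and the download/verification behavior described above. Check the published `htdemucs_ort_v1` manifest for the authoritative field names before hosting your own:

```json
{
  "name": "my_custom_model",
  "sample_rate": 44100,
  "window": 343980,
  "stems": ["drums", "bass", "other", "vocals"],
  "model_url": "https://example.com/path/to/model.onnx",
  "sha256": "<hex digest of model.onnx>"
}
```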

---

## 🔧 Advanced Usage

### Error Handling

```rust
use stem_splitter_core::{split_file, SplitOptions};

match split_file("song.mp3", SplitOptions::default()) {
    Ok(result) => {
        println!("Success! Vocals: {}", result.vocals_path);
    }
    Err(e) => {
        eprintln!("Error during separation: {}", e);
    }
}
```

### Working with Model Handles

For advanced use cases, you can manually manage models:

```rust
use stem_splitter_core::{ensure_model, ModelHandle};

fn main() -> anyhow::Result<()> {
    // Get a handle to the model
    let handle: ModelHandle = ensure_model("htdemucs_ort_v1", None)?;
    
    // Access model metadata
    println!("Model path: {}", handle.local_path.display());
    println!("Sample rate: {}", handle.manifest.sample_rate);
    println!("Window size: {}", handle.manifest.window);
    println!("Stems: {:?}", handle.manifest.stems);
    
    Ok(())
}
```

---

## 🧪 Development

### Running Examples

The library includes two examples demonstrating key features:

#### `split_one` - Complete stem separation with progress tracking

```bash
# Split an audio file into stems
cargo run --release --example split_one -- input.mp3 ./output

# Usage: split_one <audio_file> [output_dir]
# Default output directory is ./out
```

This example demonstrates:
- Download progress callbacks
- Split progress callbacks (stages, chunks, writing)
- Custom model manifest URLs
- Complete stem separation workflow

#### `ensure_model` - Model download and caching

```bash
# Download and cache a model
cargo run --release --example ensure_model
```

This example demonstrates:
- Model download with progress tracking
- Model metadata inspection
- Model registry usage

### Running Tests

```bash
# All tests
cargo test

# Specific test
cargo test model_manager

# With output
cargo test -- --nocapture
```

### Building

```bash
# Debug build
cargo build

# Release build (optimized)
cargo build --release
```

---

## GPU Acceleration & Performance Tuning

GPU acceleration is enabled by default. The library automatically selects the
best execution provider available on the current machine, validates provider
output during early inference, and falls back if a provider is unavailable or
unhealthy.

In internal testing, the current runtime improvements delivered up to roughly
40% faster end-to-end split times, depending on hardware, OS, and provider.

### Default Provider Order

- macOS Apple Silicon: `CoreML -> XNNPACK -> CPU`
- Linux x86_64: `CUDA -> oneDNN -> XNNPACK -> CPU`
- Linux arm64: `CUDA -> XNNPACK -> CPU`
- Windows: `CUDA -> DirectML -> oneDNN -> XNNPACK -> CPU`

Notes:
- `XNNPACK` is a fast CPU-side fallback, not a GPU backend
- `Auto` is the recommended default for most users
- Unhealthy providers are cached per machine/model for 7 days so future runs can skip known-bad paths and start faster

### Common Controls

- `STEMMER_FORCE_CPU=1` β€” force CPU-only mode
- `STEMMER_EP_FORCE=cpu|cuda|coreml|directml|onednn|xnnpack` β€” force a specific provider; fails if unavailable or unhealthy
- `STEMMER_EP_DISABLE=coreml,directml,...` β€” disable one or more providers from auto mode
- `DEBUG_STEMS=1` β€” print provider selection, fallback, and health diagnostics
- `STEMMER_EP_CACHE_BYPASS=1` β€” ignore remembered unhealthy providers for one run
- `STEMMER_EP_CACHE_RESET=1` β€” clear remembered unhealthy providers before selecting
- `STEMMER_PERF=1` β€” print per-window performance timing breakdowns

### Advanced Tuning

ONNX Runtime threading:
- `STEMMER_ORT_INTRA_THREADS=<n>`
- `STEMMER_ORT_INTER_THREADS=<n>`
- `STEMMER_ORT_PARALLEL=0|1`

CoreML tuning on macOS:
- `STEMMER_COREML_UNITS=all|gpu|ane|cpu`
- `STEMMER_COREML_MODEL_FORMAT=mlprogram|neuralnetwork`
- `STEMMER_COREML_SPECIALIZATION=default|fastprediction`
- `STEMMER_COREML_STATIC_INPUTS=0|1`

These advanced options are mainly useful for benchmarking or exposing expert
controls in a GUI. For most users, `Auto` is still the best choice.

### Common Examples

```bash
# Recommended: let the library auto-select the best provider
cargo run --release --bin stem-splitter -- split --input song.mp3 --output ./out

# Force CUDA on Linux/Windows NVIDIA systems
STEMMER_EP_FORCE=cuda cargo run --release --bin stem-splitter -- split --input song.mp3 --output ./out

# Force CoreML on Apple Silicon
STEMMER_EP_FORCE=coreml cargo run --release --bin stem-splitter -- split --input song.mp3 --output ./out

# Force XNNPACK for comparison testing
STEMMER_EP_FORCE=xnnpack cargo run --release --bin stem-splitter -- split --input song.mp3 --output ./out

# Force CPU-only mode for maximum stability
STEMMER_FORCE_CPU=1 cargo run --release --bin stem-splitter -- split --input song.mp3 --output ./out

# Skip CoreML and let auto mode fall through to the next provider
STEMMER_EP_DISABLE=coreml cargo run --release --bin stem-splitter -- split --input song.mp3 --output ./out

# Show provider diagnostics and timing breakdowns
DEBUG_STEMS=1 STEMMER_PERF=1 cargo run --release --bin stem-splitter -- split --input song.mp3 --output ./out
```

### Troubleshooting

- Silent stems or very low output with GPU: disable the failing provider and retry in auto mode, for example `STEMMER_EP_DISABLE=coreml`
- GPU forced for debugging but still bad output: remove `STEMMER_EP_FORCE` and let auto mode fall back
- Need to retest a previously skipped provider: use `STEMMER_EP_CACHE_BYPASS=1`
- Need to clear all remembered unhealthy providers: use `STEMMER_EP_CACHE_RESET=1`
- Need to benchmark a provider on one machine: combine `STEMMER_EP_FORCE=...` with `STEMMER_PERF=1`

---

## 🤔 FAQ

**Q: Why is the first run slow?**  
A: The model (~200MB) is downloaded on first use. Subsequent runs load it from the local cache, so they start immediately.

**Q: Where are models stored?**  
A: Models are cached in your system's standard cache directory with SHA-256 verification for integrity.

**Q: Can I use GPU acceleration?**  
A: Yes. GPU acceleration is enabled by default. See `GPU Acceleration & Performance Tuning` for provider order, examples, and advanced controls.

**Q: GPU acceleration does not work on my machine. Can I skip it?**  
A: Yes. Use `STEMMER_FORCE_CPU=1` to force CPU-only mode, or `STEMMER_EP_DISABLE=...` to skip only the provider that is failing.

**Q: What's the quality compared to Python Demucs?**  
A: Identical quality - we use the same model architecture, just optimized for ONNX.

**Q: Can I use my own custom model?**  
A: Yes! Use the `manifest_url_override` option to point to your own model manifest.

**Q: Does it work offline?**  
A: Yes, after the initial model download, everything works offline.

**Q: What sample rates are supported?**  
A: Input audio is automatically resampled to 44.1kHz for processing.
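
To make "resampled to 44.1kHz" concrete, here is a minimal linear-interpolation resampler; it only illustrates the idea, and the crate's actual resampler may use a higher-quality algorithm:

```rust
// Minimal linear-interpolation resampler (illustrative only).
// Maps each output index back to a fractional input position and
// interpolates between the two neighboring samples.
fn resample_linear(input: &[f32], sr_in: u32, sr_out: u32) -> Vec<f32> {
    let out_len = (input.len() as u64 * sr_out as u64 / sr_in as u64) as usize;
    (0..out_len)
        .map(|i| {
            let t = i as f64 * sr_in as f64 / sr_out as f64;
            let idx = t as usize;
            let frac = (t - idx as f64) as f32;
            let a = input[idx.min(input.len() - 1)];
            let b = input[(idx + 1).min(input.len() - 1)];
            a + (b - a) * frac
        })
        .collect()
}

fn main() {
    // Doubling the rate of a ramp interpolates the midpoints.
    let out = resample_linear(&[0.0, 2.0], 1, 2);
    println!("{:?}", out); // [0.0, 1.0, 2.0, 2.0]
}
```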

---

## 🗺️ Roadmap

- [x] GPU acceleration (CUDA, CoreML, DirectML, oneDNN, XNNPACK)
- [ ] Additional model support (6-stem models with guitar/piano)
- [ ] Real-time processing mode
- [ ] Streaming API support

---

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

### Development Setup

1. Clone the repository
2. Install Rust (1.70+): https://rustup.rs
3. Run `cargo build`
4. Run tests: `cargo test`

---

## 📄 License

Licensed under either of:

- MIT License ([LICENSE-MIT](LICENSE-MIT))
- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE))

at your option.

---

## 🙏 Acknowledgments

- **Meta Research** - Original [Demucs](https://github.com/facebookresearch/demucs) model
- **[demucs.onnx](https://github.com/sevagh/demucs.onnx)** - ONNX conversion reference
- **ONNX Runtime** - High-performance inference engine
- **Symphonia** - Pure Rust audio decoding

---

## 📞 Support

- 📧 Issues: [GitHub Issues](https://github.com/gentij/stem-splitter-core/issues)
- 💬 Discussions: [GitHub Discussions](https://github.com/gentij/stem-splitter-core/discussions)
- 📚 Documentation: [docs.rs](https://docs.rs/stem-splitter-core)

---

**Made with ❤️ and 🦀 Rust**