# 🎵 Stem Splitter Core

High-performance, pure-Rust audio stem separation library powered by ONNX Runtime.
## 🧠 Overview
stem-splitter-core is a Rust library for splitting audio tracks into isolated stems (vocals, drums, bass, and other instruments) using state-of-the-art AI models. Built entirely in Rust with ONNX Runtime, it provides:
- No Python dependency - Pure Rust implementation
- High-quality separation - Uses the Hybrid Transformer Demucs (htdemucs) model
- Automatic model management - Downloads and caches models with registry support
- Fast inference - Optimized ONNX Runtime with GPU acceleration and multi-threading
- Progress tracking - Built-in callbacks for download and processing progress
- Production-ready - Memory-safe, performant, and battle-tested
Perfect for music production tools, DJ software, karaoke apps, or any application requiring audio source separation.
## ✨ Features

- 🎵 4-Stem Separation – Isolate vocals, drums, bass, and other instruments
- 🧠 State-of-the-art AI – Hybrid Transformer Demucs model (htdemucs)
- 🚀 GPU Acceleration – CUDA, CoreML, DirectML, oneDNN, and XNNPACK support (auto-detected)
- 📦 Model Registry – Built-in model registry with support for multiple models
- 🎛️ Multiple Formats – Supports WAV, MP3, FLAC, OGG, and more via Symphonia
- 📊 Progress Tracking – Real-time callbacks for download and split progress
- 🔒 Type-safe – Strong compile-time guarantees with Rust's type system
- 💾 Smart Caching – Models cached in user directories with SHA-256 verification
## 🔧 CLI & Distribution

While stem-splitter-core is primarily a Rust library, this repository also provides a first-party CLI (stem-splitter) and prebuilt binaries for common platforms.

### CLI

The CLI is built on top of stem-splitter-core and exposes the same high-performance audio stem separation features via the command line.

The CLI source lives in `src/bin/stem-splitter.rs`.
### Prebuilt Binaries
Prebuilt binaries are published with each GitHub release:
https://github.com/gentij/stem-splitter-core/releases
These binaries are suitable for:
- Arch Linux (via AUR)
- Debian / Ubuntu (manual install)
- Any glibc-based Linux distribution
### Platform Packages

- macOS: Homebrew
- Arch Linux: AUR (`stem-splitter-bin`)
- Linux (generic): tar.gz binary from GitHub Releases

See the `packaging/` directory for reference packaging files.
## 📦 Installation

Add to your Cargo.toml:

```toml
[dependencies]
stem-splitter-core = "1.0.0"
```
### System Requirements
- Rust 1.70+
- ~200MB disk space for model storage (first run only)
- 4GB+ RAM recommended for processing
No external dependencies or Python installation required!
## 🚀 Quick Start

### Basic Usage
The core entry point is `split_file`, which takes an input path and a `SplitOptions` value.
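A minimal sketch of that call, based on the signatures and defaults documented in the API Reference below; the exact field types (`String` via `.into()`) and `SplitResult`'s `Debug` impl are assumptions, and `song.mp3` is an illustrative path:

```rust
use stem_splitter_core::{split_file, SplitOptions};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Configure where stems are written and which model to use.
    let opts = SplitOptions {
        output_dir: "./out".into(),
        model_name: "htdemucs_ort_v1".into(),
        manifest_url_override: None,
    };

    // Splits the input into vocals, drums, bass, and other stems.
    let result = split_file("song.mp3", opts)?;
    println!("{result:?}"); // paths to the separated stem files
    Ok(())
}
```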
Or, even simpler, rely on the documented `SplitOptions` defaults.
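If `SplitOptions` implements `Default` with the documented defaults (an assumption not stated explicitly in this README), that reduces to:

```rust
use stem_splitter_core::{split_file, SplitOptions};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Defaults: output_dir ".", model "htdemucs_ort_v1", no manifest override.
    split_file("song.mp3", SplitOptions::default())?;
    Ok(())
}
```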
### With Progress Tracking

Register the download and split callbacks before calling `split_file` to receive real-time progress updates.
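A sketch of registering both callbacks. The closure signatures are assumptions inferred from the callback parameters in the API Reference, as are `SplitProgress: Debug` and `SplitOptions: Default`:

```rust
use stem_splitter_core::{
    set_download_progress_callback, set_split_progress_callback, split_file, SplitOptions,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Model download progress: bytes downloaded vs. total (0 if unknown).
    set_download_progress_callback(|downloaded: u64, total: u64| {
        eprintln!("downloaded {downloaded}/{total} bytes");
    });

    // Split progress: stages, chunk counts, and per-stem write progress.
    set_split_progress_callback(|progress| {
        eprintln!("{progress:?}");
    });

    split_file("song.mp3", SplitOptions::default())?;
    Ok(())
}
```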
### Pre-loading Models

For applications that need to minimize latency, pre-load and cache the model with `prepare_model` before the first split.
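A sketch grounded in the `prepare_model` signature from the API Reference:

```rust
use stem_splitter_core::prepare_model;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Download and cache the default model ahead of time so the
    // first split_file call does not pay the ~200MB download cost.
    prepare_model("htdemucs_ort_v1", None)?;
    Ok(())
}
```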
## 📚 API Reference

### `split_file(input_path: &str, opts: SplitOptions) -> Result<SplitResult>`
Main function to split an audio file into stems.
Parameters:
- `input_path`: Path to the audio file (supports WAV, MP3, FLAC, OGG, etc.)
- `opts`: Configuration options (see `SplitOptions`)

Returns:
- `SplitResult` containing paths to the separated stem files
### `SplitOptions`

Configuration struct for the separation process.

Default values:
- `output_dir`: `"."`
- `model_name`: `"htdemucs_ort_v1"`
- `manifest_url_override`: `None`
### `SplitResult`
Result struct containing paths to the separated stems.
### `prepare_model(model_name: &str, manifest_url_override: Option<&str>) -> Result<()>`

Pre-loads and caches a model for faster subsequent splits.

Parameters:
- `model_name`: Name of the model to prepare
- `manifest_url_override`: Optional URL to override the manifest location
### `ensure_model(model_name: &str, manifest_url_override: Option<&str>) -> Result<ModelHandle>`

Downloads and verifies a model, returning a handle with metadata.

Parameters:
- `model_name`: Name of the model to ensure
- `manifest_url_override`: Optional URL to override the manifest location

Returns:
- `ModelHandle` containing the manifest and local path to the model
### `set_download_progress_callback(callback: F)`

Set a callback to track model download progress.

Callback parameters:
- `downloaded`: Bytes downloaded so far
- `total`: Total bytes to download (0 if unknown)
### `set_split_progress_callback(callback: F)`

Set a callback to track split processing progress.

`SplitProgress` variants:
- `Stage(&'static str)`: Current processing stage (e.g., `"resolve_model"`, `"read_audio"`, `"infer"`)
- `Chunks { done, total, percent }`: Progress through audio chunks
- `Writing { stem, done, total, percent }`: Progress writing a specific stem
- `Finished`: Processing complete
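A sketch of a callback that matches each variant; beyond the variant names listed above, the field types and formatting are assumptions:

```rust
use stem_splitter_core::{set_split_progress_callback, SplitProgress};

fn install_progress_logging() {
    set_split_progress_callback(|progress| match progress {
        SplitProgress::Stage(stage) => eprintln!("stage: {stage}"),
        SplitProgress::Chunks { done, total, percent } => {
            eprintln!("chunks: {done}/{total} ({percent}%)");
        }
        SplitProgress::Writing { stem, done, total, percent } => {
            eprintln!("writing {stem}: {done}/{total} ({percent}%)");
        }
        SplitProgress::Finished => eprintln!("finished"),
    });
}
```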
## 🎯 Supported Audio Formats
The library supports a wide range of audio formats through the Symphonia decoder:
- WAV - Uncompressed audio (best quality)
- MP3 - MPEG Layer 3
- FLAC - Free Lossless Audio Codec
- OGG Vorbis - Open-source lossy format
- AAC - Advanced Audio Coding
- And more...
Output Format: All stems are saved as 16-bit PCM WAV files at 44.1kHz stereo.
## 🧠 Model Information

### HTDemucs-ORT (`htdemucs_ort_v1`)
This is the default and currently supported model:
- Architecture: Hybrid Transformer Demucs
- Format: ONNX Runtime optimized
- Size: ~209MB
- Quality: State-of-the-art separation quality
- Sources: 4 stems (drums, bass, other, vocals)
- Sample Rate: 44.1kHz
- Window Size: 343,980 samples (~7.8 seconds)
- Hop Size: 171,990 samples (50% overlap)
- Origin: Converted from Meta's Demucs v4
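For intuition, the window and hop sizes above determine how many inference passes a track needs. A self-contained sketch of that arithmetic (the library's real chunking, e.g. any edge padding, may differ):

```rust
/// Estimate the number of overlapping inference windows for a track,
/// given its length in samples plus the window and hop sizes.
fn num_windows(len: usize, window: usize, hop: usize) -> usize {
    if len <= window {
        return 1;
    }
    // First window, then one more per hop (ceiling division).
    1 + (len - window + hop - 1) / hop
}

fn main() {
    // A 3-minute track at 44.1kHz is 44_100 * 180 = 7_938_000 frames.
    let windows = num_windows(44_100 * 180, 343_980, 171_990);
    println!("{windows} windows of ~7.8s each at 50% overlap"); // prints 46
}
```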
The model is automatically downloaded from HuggingFace on first use and cached locally in your system's cache directory with SHA-256 verification.
### Model Registry
The library includes a built-in model registry (models/registry.json) that maps model names to their manifest URLs. This allows users to simply specify "htdemucs_ort_v1" without needing to remember or provide the full HuggingFace URL.
### Custom Models

You can use custom models by providing a manifest URL override in `SplitOptions`.
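A sketch; the model name and manifest URL here are placeholders, and `SplitOptions: Default` is an assumption:

```rust
use stem_splitter_core::{split_file, SplitOptions};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Point the loader at your own manifest instead of the built-in registry.
    let options = SplitOptions {
        model_name: "my_custom_model".into(),
        manifest_url_override: Some("https://example.com/models/manifest.json".into()),
        ..SplitOptions::default()
    };
    split_file("song.mp3", options)?;
    Ok(())
}
```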
## 🔧 Advanced Usage

### Error Handling

All fallible APIs return a `Result`, so failures (unreadable input, failed downloads, inference errors) can be handled explicitly rather than panicking.
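A sketch of explicit handling. The concrete error type is not documented in this README, so the example stays generic and assumes it implements `Display` (and that `SplitResult` implements `Debug`):

```rust
use stem_splitter_core::{split_file, SplitOptions};

fn main() {
    match split_file("song.mp3", SplitOptions::default()) {
        Ok(result) => {
            // result holds the paths to the separated stem files.
            println!("split succeeded: {result:?}");
        }
        Err(err) => {
            // e.g. unreadable input, failed download, or inference error.
            eprintln!("split failed: {err}");
        }
    }
}
```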
### Working with Model Handles

For advanced use cases, you can manage models manually with `ensure_model`, which downloads, verifies, and returns a handle to the cached model.
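A sketch built on the `ensure_model` signature from the API Reference (`ModelHandle: Debug` is an assumption):

```rust
use stem_splitter_core::ensure_model;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Download (if needed), verify the SHA-256, and get a handle
    // carrying the parsed manifest and the local model path.
    let handle = ensure_model("htdemucs_ort_v1", None)?;
    println!("{handle:?}");
    Ok(())
}
```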
## 🧪 Development

### Running Examples

The library includes two examples demonstrating key features:

#### `split_one` – Complete stem separation with progress tracking
```shell
# Split an audio file into stems
# Usage: split_one <audio_file> [output_dir]
# Default output directory is ./out
cargo run --release --example split_one -- <audio_file> [output_dir]
```
This example demonstrates:
- Download progress callbacks
- Split progress callbacks (stages, chunks, writing)
- Custom model manifest URLs
- Complete stem separation workflow
#### `ensure_model` – Model download and caching

```shell
# Download and cache a model
cargo run --release --example ensure_model
```
This example demonstrates:
- Model download with progress tracking
- Model metadata inspection
- Model registry usage
### Running Tests

```shell
# All tests
cargo test

# Specific test
cargo test <test_name>

# With output
cargo test -- --nocapture
```
### Building

```shell
# Debug build
cargo build

# Release build (optimized)
cargo build --release
```
### GPU Acceleration & Performance Tuning
GPU acceleration is enabled by default. The library automatically selects the best execution provider available on the current machine, validates provider output during early inference, and falls back if a provider is unavailable or unhealthy.
In internal testing, the current runtime improvements delivered up to roughly 40% faster end-to-end split times, depending on hardware, OS, and provider.
#### Default Provider Order

- macOS Apple Silicon: CoreML -> XNNPACK -> CPU
- Linux x86_64: CUDA -> oneDNN -> XNNPACK -> CPU
- Linux arm64: CUDA -> XNNPACK -> CPU
- Windows: CUDA -> DirectML -> oneDNN -> XNNPACK -> CPU
Notes:
- `XNNPACK` is a fast CPU-side fallback, not a GPU backend
- `Auto` is the recommended default for most users
- Unhealthy providers are cached per machine/model for 7 days so future runs can skip known-bad paths and start faster
#### Common Controls

- `STEMMER_FORCE_CPU=1` – force CPU-only mode
- `STEMMER_EP_FORCE=cpu|cuda|coreml|directml|onednn|xnnpack` – force a specific provider; fails if unavailable or unhealthy
- `STEMMER_EP_DISABLE=coreml,directml,...` – disable one or more providers from auto mode
- `DEBUG_STEMS=1` – print provider selection, fallback, and health diagnostics
- `STEMMER_EP_CACHE_BYPASS=1` – ignore remembered unhealthy providers for one run
- `STEMMER_EP_CACHE_RESET=1` – clear remembered unhealthy providers before selecting
- `STEMMER_PERF=1` – print per-window performance timing breakdowns
#### Advanced Tuning

ONNX Runtime threading:
- `STEMMER_ORT_INTRA_THREADS=<n>`
- `STEMMER_ORT_INTER_THREADS=<n>`
- `STEMMER_ORT_PARALLEL=0|1`

CoreML tuning on macOS:
- `STEMMER_COREML_UNITS=all|gpu|ane|cpu`
- `STEMMER_COREML_MODEL_FORMAT=mlprogram|neuralnetwork`
- `STEMMER_COREML_SPECIALIZATION=default|fastprediction`
- `STEMMER_COREML_STATIC_INPUTS=0|1`
These advanced options are mainly useful for benchmarking or exposing expert
controls in a GUI. For most users, Auto is still the best choice.
#### Common Examples

```shell
# Recommended: let the library auto-select the best provider

# Force CUDA on Linux/Windows NVIDIA systems
STEMMER_EP_FORCE=cuda

# Force CoreML on Apple Silicon
STEMMER_EP_FORCE=coreml

# Force XNNPACK for comparison testing
STEMMER_EP_FORCE=xnnpack

# Force CPU-only mode for maximum stability
STEMMER_FORCE_CPU=1

# Skip CoreML and let auto mode fall through to the next provider
STEMMER_EP_DISABLE=coreml

# Show provider diagnostics and timing breakdowns
DEBUG_STEMS=1 STEMMER_PERF=1
```
#### Troubleshooting

- Silent stems or very low output with GPU: disable the failing provider and retry in auto mode, for example `STEMMER_EP_DISABLE=coreml`
- GPU forced for debugging but still bad output: remove `STEMMER_EP_FORCE` and let auto mode fall back
- Need to retest a previously skipped provider: use `STEMMER_EP_CACHE_BYPASS=1`
- Need to clear all remembered unhealthy providers: use `STEMMER_EP_CACHE_RESET=1`
- Need to benchmark a provider on one machine: combine `STEMMER_EP_FORCE=...` with `STEMMER_PERF=1`
## 🤔 FAQ
Q: Why is the first run slow?
A: The model (~200MB) is downloaded on first use. Subsequent runs load it from the local cache and start immediately.
Q: Where are models stored?
A: Models are cached in your system's standard cache directory with SHA-256 verification for integrity.
Q: Can I use GPU acceleration?
A: Yes. GPU acceleration is enabled by default. See GPU Acceleration & Performance Tuning for provider order, examples, and advanced controls.
Optional CoreML tuning (advanced, macOS):
- `STEMMER_COREML_UNITS=all|gpu|ane|cpu`
- `STEMMER_COREML_MODEL_FORMAT=mlprogram|neuralnetwork`
- `STEMMER_COREML_SPECIALIZATION=fastprediction|default`
- `STEMMER_COREML_STATIC_INPUTS=0|1`
Q: GPU acceleration does not work on my machine. Can I skip it?
A: Yes. Use STEMMER_FORCE_CPU=1 to force CPU-only mode, or STEMMER_EP_DISABLE=... to skip only the provider that is failing.
Q: What's the quality compared to Python Demucs?
A: Identical quality - we use the same model architecture, just optimized for ONNX.
Q: Can I use my own custom model?
A: Yes! Use the manifest_url_override option to point to your own model manifest.
Q: Does it work offline?
A: Yes, after the initial model download, everything works offline.
Q: What sample rates are supported?
A: Input audio is automatically resampled to 44.1kHz for processing.
## 🗺️ Roadmap
- GPU acceleration (CUDA, CoreML, DirectML, oneDNN, XNNPACK)
- Additional model support (6-stem models with guitar/piano)
- Real-time processing mode
- Streaming API support
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
### Development Setup

- Clone the repository
- Install Rust (1.70+): https://rustup.rs
- Run `cargo build`
- Run tests: `cargo test`
## 📄 License
Licensed under either of:
- MIT License (LICENSE-MIT)
- Apache License, Version 2.0 (LICENSE-APACHE)
at your option.
## 🙏 Acknowledgments
- Meta Research - Original Demucs model
- demucs.onnx - ONNX conversion reference
- ONNX Runtime - High-performance inference engine
- Symphonia - Pure Rust audio decoding
## 📞 Support

- 📧 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
- 📖 Documentation: docs.rs
Made with ❤️ and 🦀 Rust