Stem Splitter Core
High-performance, pure-Rust audio stem separation library powered by ONNX Runtime
Overview
stem-splitter-core is a Rust library for splitting audio tracks into isolated stems (vocals, drums, bass, and other instruments) using state-of-the-art AI models. Built entirely in Rust with ONNX Runtime, it provides:
- No Python dependency - Pure Rust implementation
- High-quality separation - Uses the Hybrid Transformer Demucs (htdemucs) model
- Automatic model management - Downloads and caches models with registry support
- Fast inference - Optimized ONNX Runtime with multi-threading support
- Progress tracking - Built-in callbacks for download and processing progress
- Production-ready - Memory-safe, performant, and battle-tested
Perfect for music production tools, DJ software, karaoke apps, or any application requiring audio source separation.
Features
- 4-Stem Separation - Isolate vocals, drums, bass, and other instruments
- State-of-the-art AI - Hybrid Transformer Demucs model (htdemucs)
- Model Registry - Built-in model registry with support for multiple models
- Multiple Formats - Supports WAV, MP3, FLAC, OGG, and more via Symphonia
- Progress Tracking - Real-time callbacks for download and split progress
- Type-safe - Strong compile-time guarantees with Rust's type system
- Smart Caching - Models cached in user directories with SHA-256 verification
Installation
Add to your Cargo.toml:
[dependencies]
stem-splitter-core = "0.1"
System Requirements
- Rust 1.70+
- ~200MB disk space for model storage (the model is downloaded on first run)
- 4GB+ RAM recommended for processing
No external dependencies or Python installation required!
Quick Start
Basic Usage
use stem_splitter_core::{split_file, SplitOptions};
Or even simpler with defaults:
use stem_splitter_core::{split_file, SplitOptions};
With Progress Tracking
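use stem_splitter_core::{set_download_progress_callback, set_split_progress_callback, split_file, SplitOptions, SplitProgress};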
Pre-loading Models
For applications that need to minimize latency, pre-load the model:
use stem_splitter_core::prepare_model;
API Reference
split_file(input_path: &str, opts: SplitOptions) -> Result<SplitResult>
Main function to split an audio file into stems.
Parameters:
- input_path: Path to the audio file (supports WAV, MP3, FLAC, OGG, etc.)
- opts: Configuration options (see SplitOptions)
Returns:
- SplitResult containing paths to the separated stem files
SplitOptions
Configuration struct for the separation process.
Default values:
- output_dir: "."
- model_name: "htdemucs_ort_v1"
- manifest_url_override: None
SplitResult
Result struct containing paths to the separated stems.
prepare_model(model_name: &str, manifest_url_override: Option<&str>) -> Result<()>
Pre-loads and caches a model for faster subsequent splits.
Parameters:
- model_name: Name of the model to prepare
- manifest_url_override: Optional URL to override the manifest location
ensure_model(model_name: &str, manifest_url_override: Option<&str>) -> Result<ModelHandle>
Downloads and verifies a model, returning a handle with metadata.
Parameters:
- model_name: Name of the model to ensure
- manifest_url_override: Optional URL to override the manifest location
Returns:
- ModelHandle containing the manifest and local path to the model
set_download_progress_callback(callback: F)
Set a callback to track model download progress.
Callback parameters:
- downloaded: Bytes downloaded so far
- total: Total bytes to download (0 if unknown)
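The two callback arguments can be turned into a readable progress line. The following is a minimal sketch, not part of the library: `format_download_progress` is a hypothetical helper, and the `u64` argument types are an assumption, since the callback's exact signature is not shown here. It treats `total == 0` as "size unknown", as described above.

```rust
/// Hypothetical helper (not part of stem-splitter-core): formats the two
/// download-callback arguments. The `u64` types are an assumption.
fn format_download_progress(downloaded: u64, total: u64) -> String {
    if total == 0 {
        // A total of 0 means the size is unknown, so no percentage is shown.
        format!("downloaded {} bytes (total unknown)", downloaded)
    } else {
        let percent = downloaded as f64 / total as f64 * 100.0;
        format!("downloaded {} / {} bytes ({:.1}%)", downloaded, total, percent)
    }
}
```

A closure passed to set_download_progress_callback could call a helper like this and print the result.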
set_split_progress_callback(callback: F)
Set a callback to track split processing progress.
SplitProgress variants:
- Stage(&'static str): Current processing stage (e.g., "resolve_model", "read_audio", "infer")
- Chunks { done, total, percent }: Progress through audio chunks
- Writing { stem, done, total, percent }: Progress writing a specific stem
- Finished: Processing complete
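To illustrate how a callback might handle each variant, here is a self-contained sketch. The `SplitProgress` enum below is a local stand-in that mirrors the variant names listed above; the field types (`usize`, `f32`, `String`) are assumptions, and `describe` is a hypothetical helper, so check the crate's documentation for the real definitions.

```rust
/// Local stand-in mirroring the documented variants.
/// Field types are assumptions; the real enum may differ.
#[derive(Debug, Clone, PartialEq)]
enum SplitProgress {
    Stage(&'static str),
    Chunks { done: usize, total: usize, percent: f32 },
    Writing { stem: String, done: usize, total: usize, percent: f32 },
    Finished,
}

/// Hypothetical helper: renders one progress event as a log line.
fn describe(progress: &SplitProgress) -> String {
    match progress {
        SplitProgress::Stage(name) => format!("stage: {name}"),
        SplitProgress::Chunks { done, total, percent } => {
            format!("chunks: {done}/{total} ({percent:.0}%)")
        }
        SplitProgress::Writing { stem, done, total, percent } => {
            format!("writing {stem}: {done}/{total} ({percent:.0}%)")
        }
        SplitProgress::Finished => "finished".to_string(),
    }
}
```

Matching exhaustively on the enum means the compiler flags any variant the callback forgets to handle.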
Supported Audio Formats
The library supports a wide range of audio formats through the Symphonia decoder:
- WAV - Uncompressed audio (best quality)
- MP3 - MPEG Layer 3
- FLAC - Free Lossless Audio Codec
- OGG Vorbis - Open-source lossy format
- AAC - Advanced Audio Coding
- And more...
Output Format: All stems are saved as 16-bit PCM WAV files at 44.1kHz stereo.
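For readers unfamiliar with 16-bit PCM, the sketch below shows one common way to convert floating-point samples in the range [-1.0, 1.0] to interleaved stereo `i16` values. It is an illustration only: the function names are hypothetical, and the scaling and rounding choices are assumptions, not a description of how the library writes its WAV files.

```rust
/// Converts one float sample to 16-bit PCM, clamping out-of-range input.
/// Scaling by i16::MAX is one common convention (an assumption here).
fn f32_to_i16(sample: f32) -> i16 {
    let clamped = sample.clamp(-1.0, 1.0);
    (clamped * i16::MAX as f32).round() as i16
}

/// Interleaves two channels as L, R, L, R, ... in 16-bit PCM.
/// If the channels differ in length, the extra samples are dropped.
fn interleave_stereo_i16(left: &[f32], right: &[f32]) -> Vec<i16> {
    left.iter()
        .zip(right.iter())
        .flat_map(|(&l, &r)| [f32_to_i16(l), f32_to_i16(r)])
        .collect()
}
```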
Model Information
HTDemucs-ORT (htdemucs_ort_v1)
This is the default and currently supported model:
- Architecture: Hybrid Transformer Demucs
- Format: ONNX Runtime optimized
- Size: ~209MB
- Quality: State-of-the-art separation quality
- Sources: 4 stems (drums, bass, other, vocals)
- Sample Rate: 44.1kHz
- Window Size: 343,980 samples (~7.8 seconds)
- Hop Size: 171,990 samples (50% overlap)
- Origin: Converted from Meta's Demucs v4
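The window and hop figures above can be checked with a little arithmetic: 343,980 samples at 44.1kHz is 7.8 seconds, and a hop of 171,990 samples is exactly half a window. The sketch below also estimates how many windows cover a track. The constants come from the list above; `chunk_starts` is a hypothetical helper, and the assumption that the last window is padded rather than dropped is mine, not a statement about the library's internals.

```rust
const SAMPLE_RATE: u32 = 44_100;
const WINDOW: usize = 343_980;
const HOP: usize = 171_990;

/// Window length in seconds (343,980 / 44,100 = 7.8).
fn window_seconds() -> f64 {
    WINDOW as f64 / SAMPLE_RATE as f64
}

/// Start offsets of the windows needed to cover `total_samples`,
/// assuming the final window is padded rather than dropped.
fn chunk_starts(total_samples: usize) -> Vec<usize> {
    let mut starts = Vec::new();
    if total_samples == 0 {
        return starts;
    }
    let mut start = 0;
    loop {
        starts.push(start);
        // Stop once the current window reaches the end of the audio.
        if start + WINDOW >= total_samples {
            break;
        }
        start += HOP;
    }
    starts
}
```

Under these assumptions, a three-minute track (7,938,000 samples) needs 46 overlapping windows.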
The model is automatically downloaded from HuggingFace on first use and cached locally in your system's cache directory with SHA-256 verification.
Model Registry
The library includes a built-in model registry (models/registry.json) that maps model names to their manifest URLs. This allows users to simply specify "htdemucs_ort_v1" without needing to remember or provide the full HuggingFace URL.
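Conceptually, the registry is a lookup from model name to manifest URL. The sketch below models it with a `HashMap`. The function name is hypothetical, the rule that an explicit override takes precedence over the registry entry is an assumption, and any URLs passed in are placeholders rather than the real manifest locations.

```rust
use std::collections::HashMap;

/// Hypothetical lookup: an explicit override wins (an assumption),
/// otherwise the registry entry for `model_name` is used.
fn resolve_manifest_url<'a>(
    registry: &'a HashMap<String, String>,
    model_name: &str,
    manifest_url_override: Option<&'a str>,
) -> Option<&'a str> {
    manifest_url_override.or_else(|| registry.get(model_name).map(String::as_str))
}
```

Returning `None` for an unknown name with no override lets the caller report a clear "unknown model" error instead of attempting a download.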
Custom Models
You can use custom models by providing a manifest URL override:
let options = SplitOptions ;
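A minimal sketch of overriding only the manifest URL while keeping the other defaults. The `SplitOptions` struct here is a local stand-in built from the default values documented above, with assumed field types (`String` and `Option<String>`); `custom_model_options` is a hypothetical helper. Consult the crate's documentation for the real definition.

```rust
/// Local stand-in for the documented options struct.
/// Field types are assumptions; the defaults follow the documented values.
#[derive(Debug, Clone, PartialEq)]
struct SplitOptions {
    output_dir: String,
    model_name: String,
    manifest_url_override: Option<String>,
}

impl Default for SplitOptions {
    fn default() -> Self {
        Self {
            output_dir: ".".to_string(),
            model_name: "htdemucs_ort_v1".to_string(),
            manifest_url_override: None,
        }
    }
}

/// Hypothetical helper: sets only the manifest override and
/// takes every other field from `Default`.
fn custom_model_options(manifest_url: &str) -> SplitOptions {
    SplitOptions {
        manifest_url_override: Some(manifest_url.to_string()),
        ..Default::default()
    }
}
```

Struct update syntax (`..Default::default()`) keeps the call site short and means new fields with defaults do not break existing callers.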
Advanced Usage
Error Handling
use stem_splitter_core::{split_file, SplitOptions};
match split_file
Working with Model Handles
For advanced use cases, you can manually manage models:
use stem_splitter_core::{ensure_model, ModelHandle};
Development
Running Examples
The library includes two examples demonstrating key features:
split_one - Complete stem separation with progress tracking
# Split an audio file into stems
# Usage: split_one <audio_file> [output_dir]
# Default output directory is ./out
cargo run --example split_one -- <audio_file> [output_dir]
This example demonstrates:
- Download progress callbacks
- Split progress callbacks (stages, chunks, writing)
- Custom model manifest URLs
- Complete stem separation workflow
ensure_model - Model download and caching
# Download and cache a model
cargo run --example ensure_model
This example demonstrates:
- Model download with progress tracking
- Model metadata inspection
- Model registry usage
Running Tests
# All tests
cargo test
# Specific test
cargo test <test_name>
# With output
cargo test -- --nocapture
Building
# Debug build
cargo build
# Release build (optimized)
cargo build --release
FAQ
Q: Why is the first run slow?
A: The model (~200MB) is downloaded on first use. Subsequent runs skip the download and load the cached model.
Q: Where are models stored?
A: Models are cached in your system's standard cache directory with SHA-256 verification for integrity.
Q: Can I use GPU acceleration?
A: Currently CPU-only. GPU support via ONNX Runtime execution providers is planned.
Q: What's the quality compared to Python Demucs?
A: It should be comparable: the library runs the same htdemucs model architecture, converted to ONNX.
Q: Can I use my own custom model?
A: Yes! Use the manifest_url_override option to point to your own model manifest.
Q: Does it work offline?
A: Yes, after the initial model download, everything works offline.
Q: What sample rates are supported?
A: Input audio is automatically resampled to 44.1kHz for processing.
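To make the resampling answer concrete, here is a sketch of the simplest possible approach, linear interpolation on a single channel. This is an illustration only: `resample_linear` is a hypothetical function, and the library's actual resampling method is not documented here and is likely higher quality.

```rust
/// Resamples one channel by linear interpolation between neighbouring
/// samples. Illustrative only; not the library's resampler.
fn resample_linear(input: &[f32], from_rate: u32, to_rate: u32) -> Vec<f32> {
    if input.is_empty() || from_rate == 0 || to_rate == 0 {
        return Vec::new();
    }
    let out_len = (input.len() as u64 * to_rate as u64 / from_rate as u64) as usize;
    // Distance, in input samples, between consecutive output samples.
    let step = from_rate as f64 / to_rate as f64;
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            // Clamp indices so the final samples reuse the last input value.
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}
```

For example, one second of 48kHz audio (48,000 samples) becomes 44,100 samples at the model's sample rate.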
Roadmap
- GPU acceleration (CUDA, Metal, DirectML)
- Additional model support (6-stem models with guitar/piano)
- Real-time processing mode
- Streaming API support
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Development Setup
- Clone the repository
- Install Rust (1.70+): https://rustup.rs
- Run cargo build
- Run tests: cargo test
License
Licensed under either of:
- MIT License (LICENSE-MIT)
- Apache License, Version 2.0 (LICENSE-APACHE)
at your option.
Acknowledgments
- Meta Research - Original Demucs model
- demucs.onnx - ONNX conversion reference
- ONNX Runtime - High-performance inference engine
- Symphonia - Pure Rust audio decoding
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: docs.rs
Made with ❤️ and 🦀 Rust