# spatial-maker
Convert 2D images and videos to stereoscopic 3D spatial content for Apple Vision Pro using AI depth estimation.
## Features
- Fast depth estimation — CoreML on Apple Silicon (128ms/frame on M4 Pro) with ONNX fallback
- High-quality stereo generation — Depth-Image Based Rendering (DIBR) with hole filling
- Photo & video support — Process single images or full videos with progress callbacks
- Multi-format input — JPEG, PNG, AVIF, JPEG XL, HEIC via native decoders or ffmpeg
- MV-HEVC output — Side-by-side, top-and-bottom, or separate stereo pairs with optional MV-HEVC packaging
🚧 massively under construction 🧱
## Quick Start

### Install the CLI

Assuming the crate is published on crates.io:

```sh
cargo install spatial-maker
```

For the latest (untested) code, install from the Git repository with `cargo install --git <repo-url> spatial-maker`.
### Convert a Video

```sh
INPUT_VIDEO=/Movies/my-video.mp4
MODEL=b  # s (small), b (base), or l (large)
```
### Convert a Photo

```sh
INPUT_PHOTO=/Pictures/photo.jpg
MODEL=b
MAX_DISPARITY=30  # Higher = more 3D depth
```
### Use as a Library

Add to your Cargo.toml:

```toml
[dependencies]
spatial-maker = "0.1"
```

Process a photo (the item names below are illustrative — check docs.rs/spatial-maker for the exact API):

```rust
use spatial_maker::SpatialConfig;
use std::path::Path;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = SpatialConfig::default();
    // `process_photo` is a placeholder name for the photo entry point.
    spatial_maker::process_photo(Path::new("photo.jpg"), Path::new("photo_spatial.heic"), &config).await?;
    Ok(())
}
```

Process a video (again illustrative; the real API reports progress via a callback):

```rust
use spatial_maker::SpatialConfig;
use std::path::Path;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = SpatialConfig::default();
    // `process_video` is a placeholder name for the video entry point.
    spatial_maker::process_video(Path::new("in.mp4"), Path::new("out_spatial.mp4"), &config).await?;
    Ok(())
}
```
## Model Sizes
Models are auto-downloaded from HuggingFace on first use:
| Size | Download | Quality | Speed (M4 Pro) | License |
|---|---|---|---|---|
| Small (default) | 48 MB | Good | ~85ms/frame | Apache-2.0 ✅ |
| Base | 186 MB | Better | ~128ms/frame | CC-BY-NC-4.0 ⚠️ |
| Large | 638 MB | Best | ~190ms/frame | CC-BY-NC-4.0 ⚠️ |
⚠️ Base and Large models are non-commercial only (CC-BY-NC-4.0). Small is Apache-2.0 (commercial OK).
Select a model (CLI):

```sh
MODEL=b  # s (small, 48MB), b (base, 186MB), l (large, 638MB)
INPUT=/Movies/video.mp4
OUTPUT=/Desktop/spatial.mp4
```
Select a model (library):

```rust
let model_size = "b".to_string(); // "s", "b", or "l"
let max_disparity = 30;
// Field names are illustrative; see docs.rs/spatial-maker for the exact struct.
let config = SpatialConfig { model_size, max_disparity, ..Default::default() };
```
## Feature Flags

```toml
[dependencies]
spatial-maker = { version = "0.1", features = ["onnx"] }
```

| Feature | Default | Description |
|---|---|---|
| `coreml` | ✅ | CoreML via Swift FFI (macOS only, ~3x faster than ONNX CPU) |
| `onnx` | ❌ | ONNX Runtime fallback (cross-platform) |
| `avif` | ❌ | Native AVIF decoder (requires system libdav1d) |
| `jxl` | ❌ | Native JPEG XL decoder (pure Rust via jxl-oxide) |
| `heic` | ❌ | Native HEIC decoder (requires system libheif) |
Formats not enabled via feature flags fall back to ffmpeg conversion (slower but works for everything).
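That dispatch can be sketched roughly as follows. This is a hypothetical illustration, not the crate's code — `decode_route` and its logic are invented here; only the feature and format names come from the table above:

```rust
use std::path::Path;

/// Hypothetical dispatch: decode natively when the matching Cargo feature is
/// compiled in, otherwise convert via ffmpeg first (e.g. `ffmpeg -i in.avif tmp.png`).
fn decode_route(path: &Path) -> &'static str {
    match path.extension().and_then(|e| e.to_str()) {
        // JPEG and PNG always have native decoders.
        Some("jpg" | "jpeg" | "png") => "native",
        // The remaining formats are native only when their feature is enabled.
        Some("avif") if cfg!(feature = "avif") => "native",
        Some("jxl") if cfg!(feature = "jxl") => "native",
        Some("heic") if cfg!(feature = "heic") => "native",
        _ => "ffmpeg",
    }
}
```

With default features, an AVIF or HEIC input would take the ffmpeg path; enabling the matching feature flips it to the native decoder at compile time.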
## How It Works

- Depth Estimation: Depth Anything V2 via CoreML (default) or ONNX
- Preprocessing: Resize to 518×518, normalize with ImageNet stats, NCHW tensor format
- Stereo Generation: DIBR shifts pixels based on depth; expanding-ring hole filling for disocclusions
- Video Pipeline: `ffmpeg` frame extraction → depth + stereo (parallelized) → `ffmpeg` encoding, with model caching
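The DIBR step can be sketched as below. This is a simplified single-channel re-implementation for illustration only — the crate's real renderer handles color, resolves collisions by depth ordering, and fills holes more carefully:

```rust
/// Illustrative DIBR sketch (not the crate's actual code): shift each pixel
/// horizontally in proportion to its normalized depth, then fill disocclusion
/// holes by searching outward along the row for the nearest rendered pixel.
fn render_shifted_view(src: &[u8], depth: &[f32], width: usize, height: usize, max_disparity: f32) -> Vec<u8> {
    let mut dst = vec![0u8; src.len()];
    let mut filled = vec![false; src.len()];
    for y in 0..height {
        for x in 0..width {
            // Nearer pixels (larger normalized depth) shift further.
            let d = (depth[y * width + x] * max_disparity).round() as isize;
            let nx = x as isize - d;
            if (0..width as isize).contains(&nx) {
                let i = y * width + nx as usize;
                // Last write wins on collision; a real renderer keeps the nearer pixel.
                dst[i] = src[y * width + x];
                filled[i] = true;
            }
        }
    }
    // Expanding-ring hole filling, restricted to the row for simplicity.
    for y in 0..height {
        for x in 0..width {
            if filled[y * width + x] { continue; }
            for r in 1..width {
                let hit = [x.checked_sub(r), x.checked_add(r).filter(|&n| n < width)]
                    .into_iter()
                    .flatten()
                    .find(|&n| filled[y * width + n]);
                if let Some(n) = hit {
                    dst[y * width + x] = dst[y * width + n];
                    break;
                }
            }
        }
    }
    dst
}
```

A larger `max_disparity` produces larger shifts between the two views, which is why raising `MAX_DISPARITY` increases the perceived 3D depth.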
## API Overview

The crate exposes a high-level async API (the most common entry point) backed by two depth backends: CoreML on macOS and ONNX Runtime elsewhere. See docs.rs/spatial-maker for full API documentation.
## Requirements
For CoreML (default, macOS only):
- macOS 13.0+ (Ventura or later)
- Apple Silicon recommended (falls back to CPU on Intel)
- Xcode Command Line Tools (for Swift compilation during build)
For ONNX (optional, cross-platform):
- Any OS (Linux, Windows, macOS)
- Binaries auto-downloaded via the `ort` crate
For Video:
- `ffmpeg` in PATH — https://ffmpeg.org
- `spatial` CLI for MV-HEVC output — https://blog.mikeswanson.com/spatial
## License
MIT
Models: Small (Apache-2.0), Base/Large (CC-BY-NC-4.0). See HuggingFace repo for details.