spatial-maker (Rust)
Convert 2D images and videos to stereoscopic 3D spatial content for Apple Vision Pro using AI depth estimation.
This is a Rust library that uses Depth Anything V2 via ONNX Runtime to estimate depth from 2D images and create side-by-side (SBS) stereo pairs for 3D viewing.
Features
- ✅ Fast depth estimation using ONNX Runtime with CoreML acceleration on Apple Silicon
- ✅ High quality stereo generation using Depth-Image Based Rendering (DIBR)
- ✅ Photo support — convert single images to SBS stereo
- 🚧 Video support — coming soon (per-frame processing with optional temporal smoothing)
- 🚧 Video Depth Anything integration — temporally consistent depth for flicker-free video
Quick Start
Download the Model
Download the Depth Anything V2 Small ONNX model:
Usage (Photo Example)
Or use as a library:
use process_photo;
use open;
How It Works
- Depth Estimation: Uses Depth Anything V2 Small (24.8M parameters) via ONNX Runtime to estimate per-pixel depth
- Preprocessing: Resizes the image to 518×518 and normalizes it with the ImageNet mean and standard deviation
- Stereo Generation: Uses DIBR to create left/right eye views by shifting pixels based on depth
- Output: Horizontally stacks left and right views into a side-by-side image
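The preprocessing, stereo generation, and output steps above can be sketched in plain Rust. This is a minimal illustration, not the crate's actual implementation: the function names (`normalize_chw`, `render_view`, `side_by_side`), the `max_disparity` parameter, the depth convention (values in [0, 1], larger means nearer), and the left-neighbor hole filling are all assumptions made for the example. The mean/std constants are the commonly used ImageNet statistics.

```rust
/// Commonly used ImageNet channel statistics (RGB order).
const MEAN: [f32; 3] = [0.485, 0.456, 0.406];
const STD: [f32; 3] = [0.229, 0.224, 0.225];

/// Convert interleaved 8-bit RGB (HWC) into normalized planar floats (CHW).
/// Resizing to the model input size is assumed to have happened already.
fn normalize_chw(rgb: &[u8], width: usize, height: usize) -> Vec<f32> {
    let plane = width * height;
    let mut out = vec![0.0f32; 3 * plane];
    for i in 0..plane {
        for c in 0..3 {
            let v = rgb[i * 3 + c] as f32 / 255.0;
            out[c * plane + i] = (v - MEAN[c]) / STD[c];
        }
    }
    out
}

/// Hypothetical DIBR forward warp for one eye.
/// `depth` holds one value per pixel in [0, 1], larger = nearer (assumption).
/// `sign` is +1.0 for the left eye and -1.0 for the right eye (assumed convention).
fn render_view(
    rgb: &[u8],
    depth: &[f32],
    width: usize,
    height: usize,
    max_disparity: f32,
    sign: f32,
) -> Vec<u8> {
    let mut out = vec![0u8; rgb.len()];
    let mut filled = vec![false; width * height];
    let mut nearest = vec![f32::NEG_INFINITY; width * height];
    for y in 0..height {
        for x in 0..width {
            let src = y * width + x;
            let d = depth[src];
            // Nearer pixels are shifted further horizontally.
            let shift = (sign * d * max_disparity).round() as isize;
            let tx = x as isize + shift;
            if tx < 0 || tx >= width as isize {
                continue;
            }
            let dst = y * width + tx as usize;
            // When two source pixels land on the same target, the nearer one wins.
            if d > nearest[dst] {
                nearest[dst] = d;
                filled[dst] = true;
                out[dst * 3..dst * 3 + 3].copy_from_slice(&rgb[src * 3..src * 3 + 3]);
            }
        }
        // Naive hole filling: copy the filled neighbor on the left.
        // Holes at the start of a row stay black in this sketch.
        for x in 1..width {
            let dst = y * width + x;
            if !filled[dst] && filled[dst - 1] {
                out.copy_within((dst - 1) * 3..dst * 3, dst * 3);
                filled[dst] = true;
            }
        }
    }
    out
}

/// Stack two equally sized RGB views horizontally into one side-by-side image.
fn side_by_side(left: &[u8], right: &[u8], width: usize, height: usize) -> Vec<u8> {
    let row = width * 3;
    let mut out = Vec::with_capacity(2 * row * height);
    for y in 0..height {
        out.extend_from_slice(&left[y * row..(y + 1) * row]);
        out.extend_from_slice(&right[y * row..(y + 1) * row]);
    }
    out
}
```

Calling `render_view` once with `sign = 1.0` and once with `sign = -1.0`, then passing both results to `side_by_side`, yields an image twice the input width. A production renderer would likely use sub-pixel shifts and a better inpainting strategy for disoccluded regions.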
Model Info
- Model: Depth Anything V2 Small
- ONNX Export: fabio-sim/Depth-Anything-ONNX
- License: Apache-2.0
- Size: 95 MB
- Speed: ~13 ms on RTX 4080, faster on Apple Silicon with CoreML
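As a rough sanity check, the listed size is consistent with the 24.8M parameter count if the weights are stored as 32-bit floats. That precision is an assumption (the export's precision is not stated here), and `weights_size_mib` is a helper made up for this illustration:

```rust
/// Approximate size of the raw weights in MiB for a given parameter count
/// and bytes per parameter (4.0 for fp32, 2.0 for fp16).
fn weights_size_mib(params: f64, bytes_per_param: f64) -> f64 {
    params * bytes_per_param / (1024.0 * 1024.0)
}
```

With `weights_size_mib(24.8e6, 4.0)` this gives roughly 94.6 MiB, close to the 95 MB figure; graph metadata in the ONNX file accounts for a small additional overhead.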
Roadmap
- Photo support (SBS output)
- ONNX model integration
- Model checkpoint discovery
- CoreML execution provider testing
- Video frame processing
- Video Depth Anything integration (temporal consistency)
- Integration with Frame app (replace Python subprocess)
Development
Built with:
See research/rust-depth.md for detailed design decisions and model comparisons.
License
MIT