shimmy 2.0.1

Lightweight Ollama-compatible inference server with native SafeTensors support. No Python dependencies, cross-platform WebGPU acceleration via Airframe.
//! Response caching for identical inference requests.

pub mod response_cache;

pub use response_cache::ResponseCache;