shimmy 2.0.1

Lightweight Ollama-compatible inference server with native SafeTensors support. No Python dependencies, cross-platform WebGPU acceleration via Airframe.

//! Response caching for identical inference requests.

pub mod response_cache;

pub use response_cache::ResponseCache;
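
The source does not show the `ResponseCache` API, so the following is only a minimal sketch of what caching responses for identical requests could look like. Every name in it (`RequestKey`, `SketchCache`, `get`, `insert`) is hypothetical and is not taken from shimmy. It assumes two requests count as identical when model, prompt, and token limit all match, and it uses a deliberately crude eviction policy.

```rust
use std::collections::HashMap;

/// Hypothetical cache key: the request fields that must all match for two
/// requests to be treated as identical. Not shimmy's actual request type.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct RequestKey {
    pub model: String,
    pub prompt: String,
    pub max_tokens: u32,
}

/// Hypothetical bounded cache mapping a request key to its generated text.
pub struct SketchCache {
    entries: HashMap<RequestKey, String>,
    capacity: usize,
}

impl SketchCache {
    /// Create a cache that holds at most `capacity` responses.
    pub fn new(capacity: usize) -> Self {
        Self {
            entries: HashMap::new(),
            capacity,
        }
    }

    /// Return the cached response for an identical earlier request, if any.
    pub fn get(&self, key: &RequestKey) -> Option<&str> {
        self.entries.get(key).map(String::as_str)
    }

    /// Store a response. When the cache is full and the key is new, all
    /// entries are dropped first (a simplification; a real cache would
    /// more likely use LRU or time-based eviction).
    pub fn insert(&mut self, key: RequestKey, response: String) {
        if self.capacity == 0 {
            return; // a zero-capacity cache stores nothing
        }
        if self.entries.len() >= self.capacity && !self.entries.contains_key(&key) {
            self.entries.clear();
        }
        self.entries.insert(key, response);
    }

    /// Number of responses currently cached.
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

A caller would check `get` before running inference and call `insert` with the generated text afterwards. Keying on the full owned request, rather than on a hash of it alone, avoids returning the wrong response when two different requests hash to the same value.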