The 5MB Alternative to Ollama
Shimmy will be free forever. No asterisks. No "free for now." No pivot to paid.
Fast, reliable local AI inference. Shimmy provides OpenAI-compatible endpoints for GGUF models with comprehensive testing and automated quality assurance.
What is Shimmy?
Shimmy is a 5.1MB single-binary local inference server that provides OpenAI API-compatible endpoints for GGUF models. It's designed to be the invisible infrastructure that just works.
| Metric | Shimmy | Ollama |
|---|---|---|
| Binary Size | 5.1MB π | 680MB |
| Startup Time | <100ms π | 5-10s |
| Memory Overhead | <50MB π | 200MB+ |
| OpenAI Compatibility | 100% π | Partial |
| Port Management | Auto π | Manual |
| Configuration | Zero π | Manual |
π― Perfect for Developers
- Privacy: Your code stays on your machine
- Cost: No per-token pricing, unlimited queries
- Speed: Local inference = sub-second responses
- Integration: Works with VSCode, Cursor, Continue.dev out of the box
BONUS: First-class LoRA adapter support - from training to production API in 30 seconds.
Quick Start (30 seconds)
Installation
πͺ Windows
# RECOMMENDED: Use pre-built binary (no build dependencies required)
# OR: Install from source (requires LLVM/Clang)
# First install build dependencies:
# Then install shimmy:
β οΈ Windows Notes:
- Pre-built binary recommended to avoid build dependency issues
- If Windows Defender flags the binary, add an exclusion or use
cargo install- For
cargo install: Install LLVM first to resolvelibclang.dllerrors
π macOS / π§ Linux
# Install from crates.io
Get Models
Shimmy auto-discovers models from:
- Hugging Face cache:
~/.cache/huggingface/hub/ - Ollama models:
~/.ollama/models/ - Local directory:
./models/ - Environment:
SHIMMY_BASE_GGUF=path/to/model.gguf
# Download models that work out of the box
Start Server
# Auto-allocates port to avoid conflicts
# Or use manual port
Point your AI tools to the displayed port - VSCode Copilot, Cursor, Continue.dev all work instantly!
π¦ Download & Install
Package Managers
- Rust:
cargo install shimmy - VS Code: Shimmy Extension
- npm:
npm install -g shimmy-js(coming soon) - Python:
pip install shimmy(coming soon)
Direct Downloads
- GitHub Releases: Latest binaries
- Docker:
docker pull ghcr.io/michael-a-kuykendall/shimmy:latest
π³ Docker Setup
Quick Start:
# Clone the repo for docker-compose
# Start with docker-compose (builds locally)
# Or pull from GitHub Container Registry (when available)
Production Setup:
version: '3.8'
services:
shimmy:
image: ghcr.io/michael-a-kuykendall/shimmy:latest
ports:
- "11434:11434"
volumes:
- ./models:/app/models
environment:
- SHIMMY_BASE_GGUF=/app/models
restart: unless-stopped
π macOS Support
Full compatibility confirmed! Shimmy works flawlessly on macOS with Metal GPU acceleration.
# Install dependencies
# Install shimmy
β Verified working:
- Intel and Apple Silicon Macs
- Metal GPU acceleration (automatic)
- Xcode 17+ compatibility
- All LoRA adapter features
Integration Examples
VSCode Copilot
Continue.dev
Cursor IDE
Works out of the box - just point to http://localhost:11435/v1
Why Shimmy Will Always Be Free
I built Shimmy because I was tired of 680MB binaries to run a 4GB model.
This is my commitment: Shimmy stays MIT licensed, forever. If you want to support development, sponsor it. If you don't, just build something cool with it.
Shimmy saves you time and money. If it's useful, consider sponsoring for $5/month β less than your Netflix subscription, infinitely more useful.
Performance Comparison
| Tool | Binary Size | Startup Time | Memory Usage | OpenAI API |
|---|---|---|---|---|
| Shimmy | 5.1MB | <100ms | 50MB | 100% |
| Ollama | 680MB | 5-10s | 200MB+ | Partial |
| llama.cpp | 89MB | 1-2s | 100MB | None |
API Reference
Endpoints
GET /health- Health checkPOST /v1/chat/completions- OpenAI-compatible chatGET /v1/models- List available modelsPOST /api/generate- Shimmy native APIGET /ws/generate- WebSocket streaming
CLI Commands
Technical Architecture
- Rust + Tokio: Memory-safe, async performance
- llama.cpp backend: Industry-standard GGUF inference
- OpenAI API compatibility: Drop-in replacement
- Dynamic port management: Zero conflicts, auto-allocation
- Zero-config auto-discovery: Just worksβ’
Community & Support
- π Bug Reports: GitHub Issues
- π¬ Discussions: GitHub Discussions
- π Documentation: docs/
- π Sponsorship: GitHub Sponsors
Sponsors
See our amazing sponsors who make Shimmy possible! π
Sponsorship Tiers:
- $5/month: Coffee tier - My eternal gratitude + sponsor badge
- $25/month: Bug prioritizer - Priority support + name in SPONSORS.md
- $100/month: Corporate backer - Logo on README + monthly office hours
- $500/month: Infrastructure partner - Direct support + roadmap input
Companies: Need invoicing? Email michaelallenkuykendall@gmail.com
Quality & Reliability
Shimmy maintains high code quality through comprehensive testing:
- Comprehensive test suite with property-based testing
- Automated CI/CD pipeline with quality gates
- Runtime invariant checking for critical operations
- Cross-platform compatibility testing
See our testing approach for technical details.
License & Philosophy
MIT License - forever and always.
Philosophy: Infrastructure should be invisible. Shimmy is infrastructure.
Testing Philosophy: Reliability through comprehensive validation and property-based testing.
Forever maintainer: Michael A. Kuykendall
Promise: This will never become a paid product
Mission: Making local AI development frictionless
"The best code is code you don't have to think about."
"The best tests are properties you can't break."