The Privacy-First Alternative to Ollama
π Local AI Without the Lock-in π
Shimmy will be free forever. No asterisks. No "free for now." No pivot to paid.
π Support Shimmy's Growth
π If Shimmy helps you, consider sponsoring β 100% of support goes to keeping it free forever.
- $5/month: Coffee tier β - Eternal gratitude + sponsor badge
- $25/month: Bug prioritizer π - Priority support + name in SPONSORS.md
- $100/month: Corporate backer π’ - Logo placement + monthly office hours
- $500/month: Infrastructure partner π - Direct support + roadmap input
π― Become a Sponsor | See our amazing sponsors π
Drop-in OpenAI API Replacement for Local LLMs
Shimmy is a 5.1MB single-binary that provides 100% OpenAI-compatible endpoints for GGUF models. Point your existing AI tools to Shimmy and they just work β locally, privately, and free.
π€ What are you building with Shimmy?
New developer tools and specifications included! Whether you're forking Shimmy for your application or integrating it as a service, we now provide:
- π§ Integration Templates: Copy-paste guidance for embedding Shimmy in your projects
- π Development Specifications: GitHub Spec-Kit methodology for planning Shimmy-based features
- π‘οΈ Architectural Guarantees: Constitutional principles ensuring Shimmy stays reliable and lightweight
- π Complete Documentation: Everything you need to build on Shimmy's foundation
Building something cool with Shimmy? These tools help you do it systematically and reliably.
π GitHub Spec-Kit Integration
Shimmy now includes GitHub's brand-new Spec-Kit methodology β specification-driven development that just launched in September 2025! Get professional-grade development workflows:
- ποΈ Systematic Development:
/specifyβ/planβ/tasksβ implement - π€ AI-Native Workflow: Works with Claude Code, GitHub Copilot, and other AI assistants
- π Professional Templates: Complete specification and planning frameworks
- π‘οΈ Constitutional Protection: Built-in governance and architectural validation
π Complete Developer Guide β β’ π οΈ Learn GitHub Spec-Kit β
Try it in 30 seconds
# 1) Install + run
&
# 2) See models and pick one
# 3) Smoke test the OpenAI API
|
π Works with Your Existing Tools
No code changes needed - just change the API endpoint:
- VSCode Extensions: Point to
http://localhost:11435 - Cursor Editor: Built-in OpenAI compatibility
- Continue.dev: Drop-in model provider
- Any OpenAI client: Python, Node.js, curl, etc.
Use with OpenAI SDKs
- Node.js (openai v4)
import OpenAI from "openai";
const openai = new OpenAI({
baseURL: "http://127.0.0.1:11435/v1",
apiKey: "sk-local", // placeholder, Shimmy ignores it
});
const resp = await openai.chat.completions.create({
model: "REPLACE_WITH_MODEL",
messages: [{ role: "user", content: "Say hi in 5 words." }],
max_tokens: 32,
});
console.log(resp.choices[0].message?.content);
- Python (openai>=1.0.0)
=
=
β‘ Zero Configuration Required
- Auto-discovers models from Hugging Face cache, Ollama, local dirs
- Auto-allocates ports to avoid conflicts
- Auto-detects LoRA adapters for specialized models
- Just works - no config files, no setup wizards
π― Perfect for Local Development
- Privacy: Your code never leaves your machine
- Cost: No API keys, no per-token billing
- Speed: Local inference, sub-second responses
- Reliability: No rate limits, no downtime
Quick Start (30 seconds)
Installation
πͺ Windows
# RECOMMENDED: Use pre-built binary (no build dependencies required)
# OR: Install from source (requires LLVM/Clang)
# First install build dependencies:
# Then install shimmy:
β οΈ Windows Notes:
- Pre-built binary recommended to avoid build dependency issues
- If Windows Defender flags the binary, add an exclusion or use
cargo install- For
cargo install: Install LLVM first to resolvelibclang.dllerrors
π macOS / π§ Linux
# Install from crates.io
Get Models
Shimmy auto-discovers models from:
- Hugging Face cache:
~/.cache/huggingface/hub/ - Ollama models:
~/.ollama/models/ - Local directory:
./models/ - Environment:
SHIMMY_BASE_GGUF=path/to/model.gguf
# Download models that work out of the box
Start Server
# Auto-allocates port to avoid conflicts
# Or use manual port
Point your AI tools to the displayed port β VSCode Copilot, Cursor, Continue.dev all work instantly.
π¦ Download & Install
Package Managers
- Rust:
cargo install shimmy - VS Code: Shimmy Extension
- npm:
npm install -g shimmy-js(coming soon) - Python:
pip install shimmy(coming soon)
Direct Downloads
- GitHub Releases: Latest binaries
- Docker:
docker pull shimmy/shimmy:latest(coming soon)
π macOS Support
Full compatibility confirmed! Shimmy works flawlessly on macOS with Metal GPU acceleration.
# Install dependencies
# Install shimmy
β Verified working:
- Intel and Apple Silicon Macs
- Metal GPU acceleration (automatic)
- Xcode 17+ compatibility
- All LoRA adapter features
Integration Examples
VSCode Copilot
Continue.dev
Cursor IDE
Works out of the box - just point to http://localhost:11435/v1
Why Shimmy Will Always Be Free
I built Shimmy to retain privacy-first control on my AI development and keep things local and lean.
This is my commitment: Shimmy stays MIT licensed, forever. If you want to support development, sponsor it. If you don't, just build something cool with it.
π‘ Shimmy saves you time and money. If it's useful, consider sponsoring for $5/month β less than your Netflix subscription, infinitely more useful for developers.
API Reference
Endpoints
GET /health- Health checkPOST /v1/chat/completions- OpenAI-compatible chatGET /v1/models- List available modelsPOST /api/generate- Shimmy native APIGET /ws/generate- WebSocket streaming
CLI Commands
Technical Architecture
- Rust + Tokio: Memory-safe, async performance
- llama.cpp backend: Industry-standard GGUF inference
- OpenAI API compatibility: Drop-in replacement
- Dynamic port management: Zero conflicts, auto-allocation
- Zero-config auto-discovery: Just worksβ’
Community & Support
- π Bug Reports: GitHub Issues
- π¬ Discussions: GitHub Discussions
- π Documentation: docs/ β’ Engineering Methodology β’ OpenAI Compatibility Matrix β’ Benchmarks (Reproducible)
- π Sponsorship: GitHub Sponsors
Star History
π Momentum Snapshot
π¦ 5 MB single binary
π stars and climbing fast
β± <1s startup
π¦ 100% Rust, no Python
π° As Featured On
π₯ Hacker News β’ Front Page Again β’ IPE Newsletter
Companies: Need invoicing? Email michaelallenkuykendall@gmail.com
β‘ Performance Comparison
| Tool | Binary Size | Startup Time | Memory Usage | OpenAI API |
|---|---|---|---|---|
| Shimmy | 5.1MB | <100ms | 50MB | 100% |
| Ollama | 680MB | 5-10s | 200MB+ | Partial |
| llama.cpp | 89MB | 1-2s | 100MB | None |
Quality & Reliability
Shimmy maintains high code quality through comprehensive testing:
- Comprehensive test suite with property-based testing
- Automated CI/CD pipeline with quality gates
- Runtime invariant checking for critical operations
- Cross-platform compatibility testing
See our testing approach for technical details.
License & Philosophy
MIT License - forever and always.
Philosophy: Infrastructure should be invisible. Shimmy is infrastructure.
Testing Philosophy: Reliability through comprehensive validation and property-based testing.
Forever maintainer: Michael A. Kuykendall
Promise: This will never become a paid product
Mission: Making local AI development frictionless