# Shimmy - The 5MB Alternative to Ollama
**Shimmy will be free forever.** No asterisks. No "free for now." No pivot to paid.
**Fast, reliable local AI inference.** Shimmy provides OpenAI-compatible endpoints for GGUF models with comprehensive testing and automated quality assurance.
## What is Shimmy?
Shimmy is a **5.1MB single-binary** local inference server that provides OpenAI API-compatible endpoints for GGUF models. It's designed to be the **invisible infrastructure** that just works.
| Metric | Shimmy | Ollama |
|--------|--------|--------|
| **Binary Size** | 5.1MB 🏆 | 680MB |
| **Startup Time** | <100ms 🏆 | 5-10s |
| **Memory Overhead** | <50MB 🏆 | 200MB+ |
| **OpenAI Compatibility** | 100% 🏆 | Partial |
| **Port Management** | Auto 🏆 | Manual |
| **Configuration** | Zero 🏆 | Manual |
## Why Choose Shimmy?
- **Zero Configuration**: Auto-discovers models and assigns ports
- **Native SafeTensors**: No Python dependencies, 2x faster loading
- **OpenAI Compatible**: Drop-in replacement for OpenAI API calls
- **Cross-Platform**: Windows, macOS, Linux (including ARM64)
- **Integration**: Works with VSCode, Cursor, Continue.dev out of the box
**BONUS:** First-class LoRA adapter support - from training to production API in 30 seconds.
## Quick Start (30 seconds)
```bash
# Install via cargo
cargo install shimmy
# Auto-discover models and start server
shimmy serve
# 🚀 Server running at http://localhost:11435
# ✅ Found 3 models: llama-3.2-1b, phi-3-mini, mistral-7b
# 📡 OpenAI API compatible endpoints ready
```
Point your AI tools at the displayed port: VSCode Copilot, Cursor, and Continue.dev all work with it.
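If a script launches the server and then needs to talk to it, it can help to wait until the port accepts connections. The following is a minimal sketch using only the Python standard library; `wait_for_port` is a hypothetical helper, not part of Shimmy, and it only confirms that something is listening on the port, not that the listener is Shimmy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; not part of Shimmy itself.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Nothing is listening yet (or the connect timed out); retry shortly.
            time.sleep(0.1)
    return False
```

With the Quick Start output above, `wait_for_port("localhost", 11435)` would be the call to make before sending the first request.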
## 📦 Installation
### Package Managers
- **Rust**: `cargo install shimmy`
- **VS Code**: [Shimmy Extension](https://marketplace.visualstudio.com/items?itemName=targetedwebresults.shimmy-vscode)
### Direct Downloads
- **GitHub Releases**: [Latest binaries](https://github.com/Michael-A-Kuykendall/shimmy/releases/latest)
- **Docker**: `docker pull ghcr.io/michael-a-kuykendall/shimmy:latest`
### 🐳 Docker Setup
**Quick Start:**
```bash
# Clone the repo
git clone https://github.com/Michael-A-Kuykendall/shimmy.git
cd shimmy
# Start with docker-compose (builds locally)
docker-compose up
# Or pull from GitHub Container Registry
docker run -p 11434:11434 -v "$(pwd)/models:/app/models" ghcr.io/michael-a-kuykendall/shimmy:latest
```
### 🍎 macOS Support
**Full compatibility confirmed!** Shimmy works flawlessly on macOS with Metal GPU acceleration.
```bash
# Install dependencies
brew install cmake rust
# Install shimmy
cargo install shimmy
```
**✅ Verified working:**
- Intel and Apple Silicon Macs
- Metal GPU acceleration (automatic)
- Xcode 17+ compatibility
- All LoRA adapter features
## Integration Examples
### VSCode Copilot
```json
{
  "github.copilot.advanced": {
    "serverUrl": "http://localhost:11435"
  }
}
```
### Continue.dev
```json
{
  "models": [{
    "title": "Local Shimmy",
    "provider": "openai",
    "model": "llama-3.2-1b",
    "apiBase": "http://localhost:11435/v1"
  }]
}
```
### Cursor
```json
{
  "models": [{
    "model": "shimmy-local",
    "apiBase": "http://localhost:11435/v1",
    "provider": "openai"
  }]
}
```
### Direct API Usage
```bash
# List available models
curl http://localhost:11435/v1/models
# Chat completion
curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-1b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
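The same chat completion call can be made from Python without extra dependencies. This is a sketch that assumes the server is on port 11435 as in the Quick Start output and that the response follows the OpenAI chat completion shape (`choices[0].message.content`); the function names are hypothetical and chosen for illustration.

```python
import json
import urllib.request

# Assumption: the port printed by `shimmy serve` in the Quick Start example.
BASE_URL = "http://localhost:11435/v1"


def build_chat_request(model: str, prompt: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request equivalent to the curl chat completion call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: bytes) -> str:
    """Return the assistant text, assuming an OpenAI-style response body."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]


def chat(model: str, prompt: str, base_url: str = BASE_URL) -> str:
    """Send the request to a running server and return the reply text."""
    request = build_chat_request(model, prompt, base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        return extract_reply(response.read())
```

With a server running, `chat("llama-3.2-1b", "Hello!")` sends the same request as the curl example. Splitting request construction from the network call keeps the payload easy to inspect or unit-test without a live server.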
## 🚀 Features
- **🔍 Auto-Discovery**: Finds GGUF models automatically in standard locations
- **🎯 Smart Port Management**: Assigns unique ports per model (11435, 11436, ...)
- **⚡ Fast Loading**: Native SafeTensors support, no Python overhead
- **🔧 Zero Config**: Works out of the box with sensible defaults
- **🎨 LoRA Support**: Load LoRA adapters with `--lora` flag
- **📊 Monitoring**: Built-in metrics and health endpoints
- **🐳 Docker Ready**: Full containerization support
- **🔌 Plugin System**: Extensible architecture for custom features
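To make the auto-discovery and port-assignment ideas concrete, here is an illustrative sketch in Python. It is not Shimmy's implementation: the search directories are supplied by the caller because the "standard locations" are not listed here, and both function names are hypothetical. The only detail taken from the list above is the port sequence starting at 11435.

```python
from pathlib import Path

# First port in the "11435, 11436, ..." sequence described above.
BASE_PORT = 11435


def discover_models(search_dirs):
    """Return a sorted list of .gguf files found under the given directories.

    Illustrative only; the directories Shimmy actually scans are not assumed.
    """
    found = []
    for directory in search_dirs:
        root = Path(directory).expanduser()
        if root.is_dir():
            # Recurse so models in nested folders are picked up too.
            found.extend(p for p in root.rglob("*.gguf") if p.is_file())
    return sorted(found)


def assign_ports(model_paths, base_port=BASE_PORT):
    """Map each model name (file stem) to a unique, sequential port.

    Assumes file stems are unique; duplicates would overwrite each other.
    """
    return {path.stem: base_port + i for i, path in enumerate(model_paths)}
```

For a directory containing `a.gguf` and `b.gguf`, `assign_ports(discover_models([that_dir]))` yields `{"a": 11435, "b": 11436}`.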
## Command Reference
```bash
# Serve all models (auto-discovery)
shimmy serve
# Serve specific model
shimmy serve --model /path/to/model.gguf
# With LoRA adapter
shimmy serve --model base.gguf --lora adapter.gguf
# Custom port and host
shimmy serve --bind 0.0.0.0:8080
# Discover models without serving
shimmy discover
# Show version and build info
shimmy --version
```
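When Shimmy is started from a script rather than a terminal, the flags above can be assembled programmatically. The sketch below uses only the `--model`, `--lora`, and `--bind` flags shown in this section; `build_serve_command` is a hypothetical helper, and it assumes the `shimmy` binary is on `PATH`.

```python
import shlex


def build_serve_command(model=None, lora=None, bind=None):
    """Build the argument list for `shimmy serve` from optional settings.

    Hypothetical helper; it only uses flags from the command reference above.
    """
    args = ["shimmy", "serve"]
    if model:
        args += ["--model", str(model)]
    if lora:
        args += ["--lora", str(lora)]
    if bind:
        args += ["--bind", bind]
    return args


def format_command(args):
    """Render the argument list as a shell-safe string for logging."""
    return shlex.join(args)
```

The returned list can be passed directly to `subprocess.Popen`, which avoids shell quoting problems with model paths that contain spaces.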
## Development
```bash
# Build from source
git clone https://github.com/Michael-A-Kuykendall/shimmy.git
cd shimmy
# Full build with all features
cargo build --release --features full
# Minimal build (SafeTensors only)
cargo build --release --features huggingface
```
## License
MIT License - see [LICENSE](https://github.com/Michael-A-Kuykendall/shimmy/blob/main/LICENSE) for details.
## Support
- **⭐ Star us on GitHub**: [github.com/Michael-A-Kuykendall/shimmy](https://github.com/Michael-A-Kuykendall/shimmy)
- **💬 Discussions**: [GitHub Discussions](https://github.com/Michael-A-Kuykendall/shimmy/discussions)
- **🐛 Issues**: [GitHub Issues](https://github.com/Michael-A-Kuykendall/shimmy/issues)
- **❤️ Sponsor**: [GitHub Sponsors](https://github.com/sponsors/Michael-A-Kuykendall)
---
**Made with ❤️ for the AI development community**