shimmy 1.3.4

Lightweight 5MB Ollama alternative with native SafeTensors support. No Python dependencies, 2x faster loading.

Shimmy - The 5MB Alternative to Ollama


Shimmy will be free forever. No asterisks. No "free for now." No pivot to paid.

Fast, reliable local AI inference. Shimmy provides OpenAI-compatible endpoints for GGUF models and is backed by comprehensive testing and automated quality assurance.

What is Shimmy?

Shimmy is a 5.1MB single-binary local inference server that provides OpenAI API-compatible endpoints for GGUF models. It's designed to be the invisible infrastructure that just works.

Metric                  Shimmy        Ollama
Binary Size             5.1MB 🏆      680MB
Startup Time            <100ms 🏆     5-10s
Memory Overhead         <50MB 🏆      200MB+
OpenAI Compatibility    100% 🏆       Partial
Port Management         Auto 🏆       Manual
Configuration           Zero 🏆       Manual

Why Choose Shimmy?

  • Zero Configuration: Auto-discovers models and assigns ports
  • Native SafeTensors: No Python dependencies, 2x faster loading
  • OpenAI Compatible: Drop-in replacement for OpenAI API calls
  • Cross-Platform: Windows, macOS, Linux (including ARM64)
  • Integration: Works with VSCode, Cursor, Continue.dev out of the box

BONUS: First-class LoRA adapter support - from training to production API in 30 seconds.

Quick Start (30 seconds)

# Install via cargo
cargo install shimmy

# Auto-discover models and start server
shimmy serve

# 🚀 Server running at http://localhost:11435
# ✅ Found 3 models: llama-3.2-1b, phi-3-mini, mistral-7b
# 📡 OpenAI API compatible endpoints ready

Point your AI tools at the displayed port: VSCode Copilot, Cursor, and Continue.dev all work instantly!
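Before pointing a tool at the server, it can help to confirm that something is actually listening. The helper below is a minimal sketch using only the Python standard library; `wait_for_server` is a hypothetical helper name, and it assumes the default `localhost:11435` address from the output above. It checks TCP reachability only, not whether every model has finished loading.

```python
import socket
import time


def wait_for_server(host: str = "127.0.0.1", port: int = 11435,
                    timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` elapses.

    Hypothetical helper, not part of Shimmy. A successful connect only shows
    that a process is listening on the port.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: give up once the deadline has passed.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Assumes the default address printed by `shimmy serve`.
    print("ready" if wait_for_server() else "not reachable")
```

If you started the server with a custom `--bind` address, pass that host and port instead.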

📦 Installation

Package Managers

Direct Downloads

  • GitHub Releases: Latest binaries
  • Docker: docker pull ghcr.io/michael-a-kuykendall/shimmy:latest

🐳 Docker Setup

Quick Start:

# Clone repo and run (builds locally)
git clone https://github.com/Michael-A-Kuykendall/shimmy.git
cd shimmy

# Start with docker-compose (builds locally)
docker-compose up

# Or run the prebuilt image from GitHub Container Registry
docker run -p 11434:11434 -v "$(pwd)/models:/app/models" ghcr.io/michael-a-kuykendall/shimmy:latest

🍎 macOS Support

Full compatibility confirmed! Shimmy works flawlessly on macOS with Metal GPU acceleration.

# Install dependencies
brew install cmake rust

# Install shimmy
cargo install shimmy

✅ Verified working:

  • Intel and Apple Silicon Macs
  • Metal GPU acceleration (automatic)
  • Xcode 17+ compatibility
  • All LoRA adapter features

Integration Examples

VSCode Copilot

{
  "github.copilot.advanced": {
    "serverUrl": "http://localhost:11435"
  }
}

Continue.dev

{
  "models": [{
    "title": "Local Shimmy",
    "provider": "openai",
    "model": "llama-3.2-1b",
    "apiBase": "http://localhost:11435/v1"
  }]
}

Cursor

{
  "models": [{
    "model": "shimmy-local",
    "apiBase": "http://localhost:11435/v1",
    "provider": "openai"
  }]
}

Direct API Usage

# List available models
curl http://localhost:11435/v1/models

# Chat completion
curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-1b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
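Because the endpoints follow the OpenAI API shape, any HTTP client can call them. This is a minimal standard-library sketch of the two curl calls, assuming the base URL `http://localhost:11435/v1` and the standard OpenAI response layout (`choices[0].message.content` for chat, `data[].id` for the model list). The helper names are illustrative, not part of Shimmy.

```python
import json
import urllib.request

# Assumed base URL, taken from the Quick Start output. Adjust it if you used
# --bind or a different port mapping (the Docker example maps 11434).
BASE_URL = "http://localhost:11435/v1"


def build_chat_request(model: str, prompt: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST for the OpenAI-style /chat/completions endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion."""
    return response["choices"][0]["message"]["content"]


def model_ids(response: dict) -> list:
    """Return the model ids from an OpenAI-style /models listing."""
    return [m["id"] for m in response.get("data", [])]


def chat(model: str, prompt: str, base_url: str = BASE_URL) -> str:
    """Send one user message and return the reply (needs a running server)."""
    req = build_chat_request(model, prompt, base_url)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return extract_reply(json.load(resp))


if __name__ == "__main__":
    with urllib.request.urlopen(f"{BASE_URL}/models", timeout=10) as resp:
        print(model_ids(json.load(resp)))
    print(chat("llama-3.2-1b", "Hello!"))
```

Use a model id returned by `/v1/models`; `llama-3.2-1b` only works if that model was discovered.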

🚀 Features

  • 🔍 Auto-Discovery: Finds GGUF models automatically in standard locations
  • 🎯 Smart Port Management: Assigns unique ports per model (11435, 11436, ...)
  • ⚡ Fast Loading: Native SafeTensors support, no Python overhead
  • 🔧 Zero Config: Works out of the box with sensible defaults
  • 🎨 LoRA Support: Load LoRA adapters with --lora flag
  • 📊 Monitoring: Built-in metrics and health endpoints
  • 🐳 Docker Ready: Full containerization support
  • 🔌 Plugin System: Extensible architecture for custom features
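Conceptually, the auto-discovery and per-model port bullets combine into a scan-then-number step. The sketch below illustrates that idea only: Shimmy's actual search locations and port logic are not documented here, so the directories (`./models`, `~/models`), the helper names, and the stem-based model naming are all assumptions.

```python
from pathlib import Path

BASE_PORT = 11435  # first port listed under Smart Port Management


def discover_gguf(dirs):
    """Return .gguf files found recursively under the given directories.

    Hypothetical helper; missing directories are silently skipped.
    """
    found = []
    for d in dirs:
        root = Path(d).expanduser()
        if root.is_dir():
            found.extend(p for p in root.rglob("*.gguf") if p.is_file())
    return sorted(found)


def assign_ports(models, base_port=BASE_PORT):
    """Map each model name (file stem) to a sequential port: 11435, 11436, ..."""
    return {path.stem: base_port + i for i, path in enumerate(models)}


if __name__ == "__main__":
    # Assumed search locations, not Shimmy's real list.
    models = discover_gguf(["./models", "~/models"])
    for name, port in assign_ports(models).items():
        print(f"{name}: http://localhost:{port}")
```

A real implementation would also need to handle duplicate file stems and ports that are already in use, both of which this sketch ignores.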

Command Reference

# Serve all models (auto-discovery)
shimmy serve

# Serve a specific model
shimmy serve --model /path/to/model.gguf

# With a LoRA adapter
shimmy serve --model base.gguf --lora adapter.gguf

# Custom host and port
shimmy serve --bind 0.0.0.0:8080

# Discover models without serving
shimmy discover

# Show version and build info
shimmy --version

Development

# Build from source
git clone https://github.com/Michael-A-Kuykendall/shimmy.git
cd shimmy

# Full build with all features
cargo build --release --features full

# Minimal build (SafeTensors only)
cargo build --release --features huggingface

License

MIT License - see LICENSE for details.



Made with ❤️ for the AI development community