shimmy 1.3.4

Lightweight 5MB Ollama alternative with native SafeTensors support. No Python dependencies, 2x faster loading.
# Shimmy - The 5MB Alternative to Ollama


[![Crates.io](https://img.shields.io/crates/v/shimmy.svg)](https://crates.io/crates/shimmy)
[![Downloads](https://img.shields.io/crates/d/shimmy.svg)](https://crates.io/crates/shimmy)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Rust](https://img.shields.io/badge/rust-stable-brightgreen.svg)](https://rustup.rs/)
[![CI](https://github.com/Michael-A-Kuykendall/shimmy/workflows/CI/badge.svg)](https://github.com/Michael-A-Kuykendall/shimmy/actions)
[![Sponsor](https://img.shields.io/badge/❤️-Sponsor-ea4aaa?logo=github)](https://github.com/sponsors/Michael-A-Kuykendall)

**Shimmy will be free forever.** No asterisks. No "free for now." No pivot to paid.

**Fast, reliable local AI inference.** Shimmy provides OpenAI-compatible endpoints for GGUF models with comprehensive testing and automated quality assurance.

## What is Shimmy?


Shimmy is a **5.1MB single-binary** local inference server that provides OpenAI API-compatible endpoints for GGUF models. It's designed to be the **invisible infrastructure** that just works.

| Metric | Shimmy | Ollama | 
|--------|--------|--------|
| **Binary Size** | 5.1MB 🏆 | 680MB |
| **Startup Time** | <100ms 🏆 | 5-10s |
| **Memory Overhead** | <50MB 🏆 | 200MB+ |
| **OpenAI Compatibility** | 100% 🏆 | Partial |
| **Port Management** | Auto 🏆 | Manual |
| **Configuration** | Zero 🏆 | Manual |

## Why Choose Shimmy?


- **Zero Configuration**: Auto-discovers models and assigns ports
- **Native SafeTensors**: No Python dependencies, 2x faster loading
- **OpenAI Compatible**: Drop-in replacement for OpenAI API calls
- **Cross-Platform**: Windows, macOS, Linux (including ARM64)
- **Integration**: Works with VSCode, Cursor, Continue.dev out of the box  

**BONUS:** First-class LoRA adapter support - from training to production API in 30 seconds.

## Quick Start (30 seconds)


```bash
# Install via cargo
cargo install shimmy

# Auto-discover models and start server
shimmy serve

# Example output:
# 🚀 Server running at http://localhost:11435
# ✅ Found 3 models: llama-3.2-1b, phi-3-mini, mistral-7b
# 📡 OpenAI API compatible endpoints ready
```

Point your AI tools at the displayed port: VSCode Copilot, Cursor, and Continue.dev all work instantly!
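
If a script needs to wait for the server before pointing tools at it, one option is to poll the `/v1/models` endpoint until it answers. This is a minimal sketch, assuming the server listens on `http://localhost:11435` as in the example output; `wait_for_shimmy` is a hypothetical helper name and not part of Shimmy itself.

```bash
# wait_for_shimmy (hypothetical helper): poll /v1/models until it responds.
#   $1: base URL (assumed default: http://localhost:11435)
#   $2: maximum number of attempts, one per second (default: 30)
# Returns 0 once the endpoint answers successfully, 1 if it never does.
wait_for_shimmy() {
  _wfs_url=${1:-http://localhost:11435}
  _wfs_tries=${2:-30}
  _wfs_i=0
  while [ "$_wfs_i" -lt "$_wfs_tries" ]; do
    # -f makes curl fail on HTTP errors; output is discarded.
    if curl -fsS -o /dev/null "$_wfs_url/v1/models" 2>/dev/null; then
      return 0
    fi
    _wfs_i=$((_wfs_i + 1))
    sleep 1
  done
  return 1
}
```

For example, `shimmy serve & wait_for_shimmy && echo "ready"` would start the server in the background and print `ready` once the model list is reachable.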

## 📦 Installation


### Package Managers

- **Rust**: `cargo install shimmy` 
- **VS Code**: [Shimmy Extension](https://marketplace.visualstudio.com/items?itemName=targetedwebresults.shimmy-vscode)

### Direct Downloads

- **GitHub Releases**: [Latest binaries](https://github.com/Michael-A-Kuykendall/shimmy/releases/latest)
- **Docker**: `docker pull ghcr.io/michael-a-kuykendall/shimmy:latest`

### 🐳 Docker Setup


**Quick Start:**
```bash
# Clone repo and run (builds locally)
git clone https://github.com/Michael-A-Kuykendall/shimmy.git
cd shimmy

# Start with docker-compose (builds locally)
docker-compose up

# Or pull from GitHub Container Registry
docker run -p 11434:11434 -v ./models:/app/models ghcr.io/michael-a-kuykendall/shimmy:latest
```

### 🍎 macOS Support


**Full compatibility confirmed!** Shimmy works flawlessly on macOS with Metal GPU acceleration.

```bash
# Install dependencies
brew install cmake rust

# Install shimmy
cargo install shimmy
```

**✅ Verified working:**
- Intel and Apple Silicon Macs
- Metal GPU acceleration (automatic)
- Xcode 17+ compatibility
- All LoRA adapter features

## Integration Examples


### VSCode Copilot

```json
{
  "github.copilot.advanced": {
    "serverUrl": "http://localhost:11435"
  }
}
```

### Continue.dev

```json
{
  "models": [{
    "title": "Local Shimmy",
    "provider": "openai",
    "model": "llama-3.2-1b",
    "apiBase": "http://localhost:11435/v1"
  }]
}
```

### Cursor

```json
{
  "models": [{
    "model": "shimmy-local",
    "apiBase": "http://localhost:11435/v1",
    "provider": "openai"
  }]
}
```

### Direct API Usage

```bash
# List available models
curl http://localhost:11435/v1/models

# Chat completion
curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-1b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
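
When the prompt comes from a variable rather than a literal, the JSON body has to be escaped before it is sent. The sketch below wraps the chat completion call above in small shell functions. The names `json_escape`, `chat_payload`, and `chat` and the `SHIMMY_URL` variable are hypothetical, and the escaping only covers backslashes and double quotes (not newlines or other control characters).

```bash
# Base URL taken from the examples above; override it if the server reports another port.
SHIMMY_URL="${SHIMMY_URL:-http://localhost:11435}"

# json_escape (hypothetical helper): escape backslashes and double quotes
# so the text can be embedded in a JSON string. Newlines are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# chat_payload MODEL PROMPT: print a single-message chat completion body.
chat_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# chat MODEL PROMPT: POST the payload to the chat completions endpoint.
chat() {
  curl -sS "$SHIMMY_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$(chat_payload "$1" "$2")"
}
```

With a server running, `chat llama-3.2-1b 'Say "hi" in one word'` sends the same kind of request as the `curl` example, with the quotes in the prompt escaped.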

## 🚀 Features


- **🔍 Auto-Discovery**: Finds GGUF models automatically in standard locations
- **🎯 Smart Port Management**: Assigns unique ports per model (11435, 11436, ...)
- **⚡ Fast Loading**: Native SafeTensors support, no Python overhead
- **🔧 Zero Config**: Works out of the box with sensible defaults
- **🎨 LoRA Support**: Load LoRA adapters with `--lora` flag
- **📊 Monitoring**: Built-in metrics and health endpoints
- **🐳 Docker Ready**: Full containerization support
- **🔌 Plugin System**: Extensible architecture for custom features

## Command Reference


```bash
# Serve all models (auto-discovery)
shimmy serve

# Serve specific model
shimmy serve --model /path/to/model.gguf

# With LoRA adapter
shimmy serve --model base.gguf --lora adapter.gguf

# Custom port and host
shimmy serve --bind 0.0.0.0:8080

# Discover models without serving
shimmy discover

# Show version and build info
shimmy --version
```

## Development


```bash
# Build from source
git clone https://github.com/Michael-A-Kuykendall/shimmy.git
cd shimmy

# Full build with all features
cargo build --release --features full

# Minimal build (SafeTensors only)
cargo build --release --features huggingface
```

## License


MIT License - see [LICENSE](https://github.com/Michael-A-Kuykendall/shimmy/blob/main/LICENSE) for details.

## Support


- **⭐ Star us on GitHub**: [github.com/Michael-A-Kuykendall/shimmy](https://github.com/Michael-A-Kuykendall/shimmy)
- **💬 Discussions**: [GitHub Discussions](https://github.com/Michael-A-Kuykendall/shimmy/discussions)
- **🐛 Issues**: [GitHub Issues](https://github.com/Michael-A-Kuykendall/shimmy/issues)
- **❤️ Sponsor**: [GitHub Sponsors](https://github.com/sponsors/Michael-A-Kuykendall)

---

**Made with ❤️ for the AI development community**