# oxide-rs
A high-performance, memory-safe, and lightweight LLM inference engine written in pure Rust. Optimized for CPU-based inference and inspired by the efficiency of llama.cpp.
## Why oxide-rs
- Run GGUF models locally with a simple CLI.
- Stream tokens in real time.
- Use embedded GGUF chat templates automatically.
- Download and track models from Hugging Face.
- Embed the same inference stack in Rust applications.
## Install
Or build from source:
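A minimal sketch of a source build, assuming a standard Cargo project layout (the exact binary name and any extra build flags are not confirmed by this README):

```shell
# From a local checkout of the repository, build an optimized binary
cargo build --release

# The resulting binary lands under target/release/
ls target/release/
```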
## Quick start
Download a GGUF model:
List downloaded models:
Run interactive chat:
Run one-shot generation:
Adjust sampling:
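Concretely, the workflow above might look like the following. The `oxide` binary name, the subcommand names, the flags, and the example model repository are all illustrative assumptions, not confirmed CLI syntax:

```shell
# Download a GGUF model from Hugging Face (example repository)
oxide pull bartowski/Llama-3.2-1B-Instruct-GGUF

# List downloaded models
oxide list

# Interactive chat
oxide chat --model llama-3.2-1b-instruct

# One-shot generation
oxide run --model llama-3.2-1b-instruct --prompt "Why is the sky blue?"

# Adjust sampling
oxide run --model llama-3.2-1b-instruct \
  --prompt "Why is the sky blue?" \
  --temperature 0.7 --top-p 0.9
```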
## OpenAI-Compatible Server
Run oxide-rs as an OpenAI API-compatible HTTP server:
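A hypothetical invocation (the `serve` subcommand, host, and port flags are assumptions for illustration):

```shell
# Start the OpenAI-compatible HTTP server on a local port
oxide serve --host 127.0.0.1 --port 8080
```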
The server provides OpenAI-compatible endpoints for chat completions (non-streaming and streaming) and for listing available models:
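The endpoint paths below follow the OpenAI API convention; the host, port, and example model path are assumptions:

```shell
# Chat completions (non-streaming)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "models/llama-3.2-1b-instruct.gguf",
       "messages": [{"role": "user", "content": "Hello"}]}'

# Chat completions (streaming, via Server-Sent Events)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "models/llama-3.2-1b-instruct.gguf",
       "stream": true,
       "messages": [{"role": "user", "content": "Hello"}]}'

# List available models
curl http://localhost:8080/v1/models
```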
Features:
- Specify model path in request body (lazy loading)
- Models are cached after first use
- Streaming support with Server-Sent Events
- OpenAI-compatible response format
- CORS enabled for browser clients
## TUI (Terminal UI)
The TUI provides an interactive sidebar-driven interface with:
- Chat screen with live streaming and thinking spinner
- Models screen with selection and active/highlighted markers
- Settings screen for generation parameters and system prompt editing
Run the TUI:
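The subcommand name below is an assumption, not confirmed by this README:

```shell
# Launch the interactive terminal UI
oxide tui
```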
Shortcuts:
- `F1` or `?`: toggle the help/shortcuts overlay
- `Tab` / `Shift+Tab`: cycle focus between sidebar, main panel, and input
- Sidebar: `j`/`k` to select, `Enter` to open a screen
- Chat: `Enter` to send the prompt; `j`/`k` scrolls history when the main panel is focused
- Models: `j`/`k` to move, `Enter` to load, `x` to remove, `d` shows a download hint
- Settings: `j`/`k` to choose a field, `h`/`l` to adjust, `Enter` to apply, `r` to reset; type to edit the system prompt
## Common commands
- Show files before downloading
- Remove a registered model
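Filled out with hypothetical subcommands and flags (none of these names are confirmed by this README):

```shell
# show files before downloading (hypothetical --dry-run flag)
oxide pull bartowski/Llama-3.2-1B-Instruct-GGUF --dry-run

# remove a registered model
oxide remove llama-3.2-1b-instruct
```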
## Requirements
- Rust 1.70+
- A GGUF model file
- For chat mode, use a model with an embedded chat template
Oxide is currently focused on CPU-based local inference.
## Supported formats
- GGUF
- LLaMA-compatible architectures
- LFM2
## Library

The same inference stack can be embedded directly in Rust applications:
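A minimal sketch of what embedding might look like. The `oxide` crate name, the `Model`/`GenerationParams` types, and the `load`/`generate` methods are hypothetical names chosen for illustration, not the crate's confirmed API:

```rust
// Hypothetical API sketch; see docs/library-usage.md for the real interface.
use oxide::{GenerationParams, Model};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load a GGUF model from disk (path is an example)
    let model = Model::load("models/llama-3.2-1b-instruct.gguf")?;

    // Run one-shot generation with default sampling parameters
    let output = model.generate("Why is the sky blue?", &GenerationParams::default())?;
    println!("{output}");
    Ok(())
}
```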
## Docs
- `docs/getting-started.md` - installation, CLI workflows, model management
- `docs/api-reference.md` - CLI flags, interactive commands, library API
- `docs/library-usage.md` - embedding Oxide in Rust code
- `docs/examples.md` - usage patterns and code snippets
- `docs/architecture.md` - internals, supported models, performance notes
## License
MIT. See LICENSE.