# rusty-genius
The Nervous System for AI. A high-performance, modular, local-first AI orchestration library written in Rust.
## Overview
Rusty-Genius is built for on-device orchestration, prioritizing absolute privacy, zero latency, and offline reliability. It decouples protocol, orchestration, engine, and tooling to provide a flexible foundation for modern AI applications.

## Architecture
The project follows a biological metaphor, where each component serves a specific function in the "nervous system":
### Public Crates

- **Genius** (`rusty-genius`): The Public Facade. Re-exports internal crates and provides the primary user API.

### Internal Crates

- **Brainstem** (`rusty-genius-stem`): The Orchestrator. Manages the central event loop, engine lifecycle (TTL), and state transitions.
- **Cortex** (`rusty-genius-cortex`): The Muscle. Provides direct bindings to `llama.cpp` for inference, handling KV caching and token streaming.
- **Core** (`rusty-genius-core`): The Shared Vocabulary. Contains protocol enums, manifests, and error definitions with zero internal dependencies.
- **Teaser** (`rusty-genius-teaser`): The QA Harness. Provides integration testing via file-system fixtures.
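As a sketch of the kind of shared vocabulary Core provides, the snippet below defines dependency-free protocol enums and an error type. The names (`EngineState`, `CoreError`) are illustrative, modeled on the states and messages used elsewhere in this README, not the crate's actual definitions:

```rust
// Illustrative sketch of a zero-dependency shared vocabulary, in the
// spirit of rusty-genius-core: plain enums, no external crates.
#[derive(Debug, Clone, PartialEq)]
pub enum EngineState {
    Unloaded,
    Loading,
    Loaded,
    Inferring,
}

// Hypothetical error type; variants are assumptions for illustration.
#[derive(Debug)]
pub enum CoreError {
    ModelNotFound(String),
    EngineFailure(String),
}

impl std::fmt::Display for CoreError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            CoreError::ModelNotFound(name) => write!(f, "model not found: {name}"),
            CoreError::EngineFailure(msg) => write!(f, "engine failure: {msg}"),
        }
    }
}

impl std::error::Error for CoreError {}

fn main() {
    let state = EngineState::Unloaded;
    println!("initial state: {state:?}");
    let err = CoreError::ModelNotFound("qwen-2.5".into());
    println!("{err}");
}
```

Keeping this vocabulary free of internal dependencies is what lets every other crate in the workspace depend on it without cycles.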
### Global Architecture
```mermaid
graph TD
    User([User App]) --> Genius["rusty-genius Facade"]
    Genius --> Brainstem["orchestrator :: rusty-genius-stem"]
    Brainstem --> Cortex["engine :: rusty-genius-cortex"]
    Brainstem --> Facecrab["assets :: facecrab"]
    Cortex --> Llama["llama.cpp / Pinky Stub"]
    Facecrab --> HF["HuggingFace / Local Registry"]
```
### Integration Crates
- **Facecrab** (`facecrab`): The Supplier. An autonomous asset authority that handles model resolution (HuggingFace), registry management, and downloads.

This crate is also usable on its own. It provides a simple interface for downloading and managing models directly via `async fn` calls or a higher-level asynchronous event interface:
```rust
// Illustrative sketch: the import path, constructor, and method signature
// below are assumptions based on the names used elsewhere in this README
// (AssetAuthority, ensure_model); exact APIs may differ.
use facecrab::AssetAuthority;

#[async_std::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let authority = AssetAuthority::new()?;
    // Resolve a model by name: returns a local path, downloading if needed.
    let path = authority.ensure_model("qwen-2.5").await?;
    println!("model available at {}", path.display());
    Ok(())
}
```
## Features
- **Local-First**: No data leaves your machine. No API keys or subscriptions required for core inference.
- **Modular Design**: Swap or stub components (like the "Pinky" engine stub) for testing and development.
- **High Performance**: Native hardware acceleration via Metal (macOS), CUDA (Linux/Windows), and Vulkan.
- **Async Architecture**: Built on `async-std` and `surf` for efficient, non-blocking I/O.
## Installation
Add `rusty-genius` to your `Cargo.toml`:

```toml
[dependencies]
rusty-genius = { version = "0.1.0", features = ["metal"] }
```
### Hardware Acceleration
Enable the appropriate feature for your hardware:
- **Metal**: `features = ["metal"]` (macOS Apple Silicon/Intel)
- **CUDA**: `features = ["cuda"]` (NVIDIA GPUs)
- **Vulkan**: `features = ["vulkan"]` (generic/Intel GPUs)
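These flags resolve at compile time, so downstream code can branch on them with `cfg!`. A minimal sketch (the `backend` helper is illustrative, not part of the rusty-genius API):

```rust
// Sketch: report which acceleration backend was compiled in.
// With no acceleration feature enabled, this falls back to "cpu".
fn backend() -> &'static str {
    if cfg!(feature = "metal") {
        "metal"
    } else if cfg!(feature = "cuda") {
        "cuda"
    } else if cfg!(feature = "vulkan") {
        "vulkan"
    } else {
        "cpu"
    }
}

fn main() {
    println!("selected backend: {}", backend());
}
```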
## Usage Methods
### 1. Unified Orchestration (Recommended)
The most robust way to use Rusty-Genius is via the Orchestrator. It manages the background event loop, model lifecycle (loading/unloading), and hardware stubs.
#### Lifecycle & Inference Flow
```mermaid
sequenceDiagram
    participant U as User App
    participant B as Brainstem
    participant F as Facecrab
    participant C as Cortex
    U->>B: BrainstemInput::LoadModel("qwen-2.5")
    B->>F: asset_authority.ensure_model("qwen-2.5")
    F-->>B: Local Path (Cached/Downloaded)
    B->>C: engine.load_model(path)
    C-->>B: Ok
    U->>B: BrainstemInput::Infer(prompt)
    B->>C: engine.infer(prompt)
    loop Token Streaming
        C-->>B: Token/Thought Event
        B-->>U: BrainstemOutput::Event(Content/Thought)
    end
    B-->>U: BrainstemOutput::Event(Complete)
```
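This request/response flow can be sketched with plain standard-library channels. The `std::sync::mpsc` loop below is a stand-in for the real async event loop, and the enum shapes only echo the diagram's message names; they are not the library's actual types:

```rust
use std::sync::mpsc;
use std::thread;

// Messages echoing the sequence diagram (shapes are illustrative).
enum BrainstemInput {
    LoadModel(String),
    Infer(String),
}

#[derive(Debug, PartialEq)]
enum BrainstemOutput {
    Loaded,
    Token(String),
    Complete,
}

// Drive one load-then-infer session through a background "Brainstem" loop
// and collect the streamed output events.
fn run_session(model: &str, prompt: &str) -> Vec<BrainstemOutput> {
    let (in_tx, in_rx) = mpsc::channel::<BrainstemInput>();
    let (out_tx, out_rx) = mpsc::channel::<BrainstemOutput>();

    let orchestrator = thread::spawn(move || {
        for msg in in_rx {
            match msg {
                BrainstemInput::LoadModel(_name) => {
                    out_tx.send(BrainstemOutput::Loaded).unwrap();
                }
                BrainstemInput::Infer(p) => {
                    // Stand-in for token streaming: emit each word as a token.
                    for word in p.split_whitespace() {
                        out_tx.send(BrainstemOutput::Token(word.to_string())).unwrap();
                    }
                    out_tx.send(BrainstemOutput::Complete).unwrap();
                }
            }
        }
    });

    in_tx.send(BrainstemInput::LoadModel(model.to_string())).unwrap();
    in_tx.send(BrainstemInput::Infer(prompt.to_string())).unwrap();
    drop(in_tx); // close the input channel so the loop exits

    let events: Vec<BrainstemOutput> = out_rx.iter().collect();
    orchestrator.join().unwrap();
    events
}

fn main() {
    for event in run_session("qwen-2.5", "hello world") {
        println!("{event:?}");
    }
}
```

The key property mirrored here is that the caller never touches the engine directly: it only exchanges input and output messages with the orchestrator.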
#### Engine Lifecycle & TTL
The Orchestrator implements a `CortexStrategy` to manage the inference engine's memory footprint. By default, it hibernates (unloads) the model after 5 minutes of inactivity.
```mermaid
stateDiagram-v2
    [*] --> Unloaded: Start
    Unloaded --> Loading: LoadModel
    Loading --> Loaded: Success
    Loading --> Unloaded: Error
    Loaded --> Inferring: Infer
    Inferring --> Loaded: Complete
    Loaded --> Unloaded: 5m Inactivity (Unload)
    Unloaded --> Loaded: LoadModel (Reload)
    Unloaded --> [*]: Stop
```
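The hibernation rule can be sketched as a generic inactivity timer. This `TtlPolicy` is not the actual `CortexStrategy` API, just the underlying technique, with the 5-minute default expressed as a parameter:

```rust
use std::time::{Duration, Instant};

// Sketch of a TTL-based hibernation policy: unload the engine once
// `ttl` has elapsed since the last recorded activity.
struct TtlPolicy {
    ttl: Duration,
    last_activity: Instant,
}

impl TtlPolicy {
    fn new(ttl: Duration) -> Self {
        Self { ttl, last_activity: Instant::now() }
    }

    // Call on every inference request to reset the idle clock.
    fn touch(&mut self) {
        self.last_activity = Instant::now();
    }

    // True once the engine has been idle longer than the TTL.
    fn should_unload(&self) -> bool {
        self.last_activity.elapsed() >= self.ttl
    }
}

fn main() {
    // The docs' default would be Duration::from_secs(300); a tiny TTL
    // keeps this demo fast.
    let mut policy = TtlPolicy::new(Duration::from_millis(10));
    policy.touch();
    assert!(!policy.should_unload());
    std::thread::sleep(Duration::from_millis(20));
    assert!(policy.should_unload());
    println!("engine hibernated after idle TTL");
}
```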
#### Full Implementation Example
A sketch of the orchestrated flow, using the names from the diagrams above (`Orchestrator`, `BrainstemInput`, `BrainstemOutput`); exact module paths and method signatures are assumptions:

```rust
// Illustrative sketch; paths and signatures are assumptions based on
// the message names shown in the diagrams above.
use rusty_genius::Orchestrator;
use rusty_genius::{BrainstemInput, BrainstemOutput};

#[async_std::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Start the background event loop.
    let orchestrator = Orchestrator::start().await?;

    // Resolve (downloading if necessary) and load the model.
    orchestrator.send(BrainstemInput::LoadModel("qwen-2.5".into())).await?;

    // Stream tokens until the terminal event arrives.
    orchestrator.send(BrainstemInput::Infer("Why is the sky blue?".into())).await?;
    while let Some(event) = orchestrator.next_event().await {
        match event {
            BrainstemOutput::Event(content) => print!("{content}"),
            // Hypothetical terminal case; see the sequence diagram.
            _ => break,
        }
    }
    Ok(())
}
```
### 2. Standalone Asset Management
If you only need a high-performance downloader for GGUF/LLM assets with a local registry, you can use `facecrab` directly, as shown in the Integration Crates section.
## OS Prerequisites
### macOS

- Install Command Line Tools: `xcode-select --install`
- Install CMake: `brew install cmake`
### Linux

- Build essentials: `apt install build-essential cmake libclang-dev`
- CPU/GPU-specific headers (CUDA Toolkit, etc.)
### Windows

- Visual Studio 2022 (C++ workload)
- CMake
- `LIBCLANG_PATH` set in the system environment
## Technical Note
> [!IMPORTANT]
> `Cargo.lock` is tracked in this repository to ensure development stability and reproducible builds across the workspace. If you are a library consumer, note that `Cargo.lock` is ignored when publishing to crates.io.
## License
Released under the MIT License. Usage with dependencies may be subject to the licenses of those dependencies. Contents of `./site/assets/images` are generated with Nano Banana Pro. Copyright (c) 2026 Timothy Meade.