
rusty-genius


The Nervous System for AI. A high-performance, modular, local-first AI orchestration library written in Rust.

Overview

Rusty-Genius is built for on-device orchestration, prioritizing absolute privacy, zero network latency, and offline reliability. It decouples protocol, orchestration, engine, and tooling to provide a flexible foundation for modern AI applications.


Architecture

The project follows a biological metaphor, where each component serves a specific function in the "nervous system":

Public Crates

  • Genius (rusty-genius): The Public Facade. Re-exports internal crates and provides the primary user API.

Internal Crates

  • Brainstem (rusty-genius-stem): The Orchestrator. Manages the central event loop, engine lifecycle (TTL), and state transitions.
  • Cortex (rusty-genius-cortex): The Muscle. Provides direct bindings to llama.cpp for inference, handling KV caching and token streaming.
  • Core (rusty-genius-core): The Shared Vocabulary. Contains protocol enums, manifests, and error definitions with zero internal dependencies (see the sketch after this list).
  • Teaser (rusty-genius-teaser): The QA Harness. Provides integration testing via file-system fixtures.
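
Because every other crate speaks through Core's protocol types, it helps to see their rough shape up front. The sketch below is inferred from the variants used in the examples later in this README (LoadModel, Infer, Content, Complete, and so on); treat it as illustrative, not as the crate's exact definitions:

// Illustrative sketch only: shapes inferred from the variants used in the
// examples below, not copied from rusty-genius-core's source.

#[derive(Default)]
pub struct InferenceConfig; // stands in for the real generation settings

pub enum BrainstemInput {
    LoadModel(String),                                 // friendly name or HF ID
    Infer { prompt: String, config: InferenceConfig }, // submit a prompt
}

pub enum InferenceEvent {
    Content(String), // a streamed content chunk
    Thought(String), // a streamed reasoning ("thought") chunk
    Complete,        // generation finished
}

pub enum AssetEvent {
    Started(String),  // resolution/download began
    Complete(String), // local path to the ready asset
    Error(String),
    // ... progress variants elided
}

pub enum BrainstemOutput {
    Asset(AssetEvent),     // asset-pipeline events
    Event(InferenceEvent), // inference-stream events
    Error(String),         // orchestrator-level failures
}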

Global Architecture

graph TD
    User([User App]) --> Genius["rusty-genius Facade"]
    Genius --> Brainstem["orchestrator :: rusty-genius-stem"]
    Brainstem --> Cortex["engine :: rusty-genius-cortex"]
    Brainstem --> Facecrab["assets :: facecrab"]
    Cortex --> Llama["llama.cpp / Pinky Stub"]
    Facecrab --> HF["HuggingFace / Local Registry"]

Integration Crates

  • Facecrab (facecrab): The Supplier. An autonomous asset authority that handles model resolution (HuggingFace), registry management, and downloads.

Facecrab is also usable as a standalone crate. It provides a simple interface for downloading and managing models directly, either via async fn calls or through a higher-level asynchronous event interface:

use facecrab::AssetAuthority;

#[async_std::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let authority = AssetAuthority::new()?;
    
    // Resolves registry names or direct HF IDs, downloads to local cache automatically
    // Returns the absolute file path on success
    let model_path = authority.ensure_model("tiny-model").await?;
    println!("Model ready at: {:?}", model_path);
    Ok(())
}

Features

  • Local-First: No data leaves your machine. No API keys or subscriptions required for core inference.
  • Modular Design: Swap or stub components (like the "Pinky" engine stub) for testing and development.
  • High Performance: Native hardware acceleration via Metal (macOS), CUDA (Linux/Windows), and Vulkan.
  • Async Architecture: Built on async-std and surf for efficient, non-blocking I/O.

Installation

Add rusty-genius to your Cargo.toml:

[dependencies]
rusty-genius = { version = "0.1.1", features = ["metal"] }

Hardware Acceleration

Enable the appropriate feature for your hardware:

  • Metal: features = ["metal"] (macOS Apple Silicon/Intel)
  • CUDA: features = ["cuda"] (NVIDIA GPUs)
  • Vulkan: features = ["vulkan"] (Generic/Intel GPUs)
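
For example, to target NVIDIA hardware instead of the Metal build shown above:

[dependencies]
rusty-genius = { version = "0.1.1", features = ["cuda"] }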

Try It Out

You can run the included examples to test the system immediately. Ensure the OS prerequisites (listed below) are installed.

1. Test Asset Downloader

Verify that facecrab can resolve and download models from HuggingFace:

cargo run -p facecrab --example downloader

2. Test Local Inference

Run a full chat loop using the real-engine feature (this builds llama.cpp from source, so the OS prerequisites below apply).

CPU (Generic):

cargo run -p rusty-genius --example basic_chat --features real-engine

GPU (macOS / Metal):

cargo run -p rusty-genius --example basic_chat --features metal

GPU (NVIDIA / CUDA):

cargo run -p rusty-genius --example basic_chat --features cuda

Configuration

Rusty-Genius can be configured via environment variables and local manifest files.

Environment Variables

Variable                  Description                                                   Default
GENIUS_HOME               Primary directory for configuration and the model registry.  ~/.config/rusty-genius
GENIUS_CACHE              Directory where downloaded model assets (.gguf) are stored.  $GENIUS_HOME/cache
RUSTY_GENIUS_CONFIG_DIR   Alternative override for the configuration directory.        -
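
Both directories can be overridden per invocation. For example (the path here is illustrative), to keep the registry and model cache on a separate volume while running the chat example from Try It Out:

GENIUS_HOME=/mnt/models/rusty-genius cargo run -p rusty-genius --example basic_chat --features metal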

User Injected Manifest API

The "Injected Manifest" API allows you to extend the system's model awareness without modifying the library. You can "inject" custom models by creating or updating a registry.toml file in your GENIUS_HOME.

Location: ~/.config/rusty-genius/registry.toml

[[models]]
name = "my-custom-model"
repo = "TheBloke/Llama-2-7B-Chat-GGUF"
filename = "llama-2-7b-chat.Q4_K_M.gguf"
quantization = "Q4_K_M"

Once defined in your local registry, the model can be loaded by its friendly name (input here is the sender channel feeding the Orchestrator, as set up in the full example below):

// The orchestrator will now treat "my-custom-model" as a first-class citizen
input.send(BrainstemInput::LoadModel("my-custom-model".into())).await?;

Usage Methods

1. Unified Orchestration (Recommended)

The most robust way to use Rusty-Genius is via the Orchestrator. It manages the background event loop, model lifecycle (loading/unloading), and hardware stubs.

Lifecycle & Inference Flow

sequenceDiagram
    participant U as User App
    participant B as Brainstem
    participant F as Facecrab
    participant C as Cortex

    U->>B: BrainstemInput::LoadModel("qwen-2.5")
    B->>F: asset_authority.ensure_model("qwen-2.5")
    F-->>B: Local Path (Cached/Downloaded)
    B->>C: engine.load_model(path)
    C-->>B: Ok
    
    U->>B: BrainstemInput::Infer(prompt)
    B->>C: engine.infer(prompt)
    loop Token Streaming
        C-->>B: Token/Thought Event
        B-->>U: BrainstemOutput::Event(Content/Thought)
    end
    B-->>U: BrainstemOutput::Event(Complete)

Engine Lifecycle & TTL

The Orchestrator implements a CortexStrategy to manage the inference engine's memory footprint. By default, it will hibernate (unload) the model after 5 minutes of inactivity.

stateDiagram-v2
    [*] --> Unloaded: Start
    Unloaded --> Loading: LoadModel
    Loading --> Loaded: Success
    Loading --> Unloaded: Error
    Loaded --> Inferring: Infer
    Inferring --> Loaded: Complete
    Loaded --> Unloaded: 5m Inactivity (Unload)
    Unloaded --> Loaded: LoadModel (Reload)
    Unloaded --> [*]: Stop
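
If five minutes is the wrong trade-off for your workload, the strategy is the natural knob to turn. This README does not document the configuration API, so the sketch below is purely hypothetical: the with_strategy constructor and the ttl field are assumed names for illustration only.

use std::time::Duration;
use rusty_genius::Orchestrator;

// HYPOTHETICAL sketch: `CortexStrategy { ttl, .. }` and `with_strategy` are
// assumed names; consult the crate docs for the real configuration API.
async fn orchestrator_with_longer_ttl() -> Result<Orchestrator, Box<dyn std::error::Error>> {
    let strategy = rusty_genius::CortexStrategy {
        ttl: Duration::from_secs(10 * 60), // hibernate after 10 minutes instead of 5
        ..Default::default()
    };
    Ok(Orchestrator::with_strategy(strategy).await?)
}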

Full Implementation Example

use rusty_genius::Orchestrator;
use rusty_genius::core::protocol::{AssetEvent, BrainstemInput, BrainstemOutput, InferenceEvent};
use futures::{StreamExt, sink::SinkExt, channel::mpsc};

#[async_std::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Initialize the orchestrator (with default 5m TTL)
    let mut genius = Orchestrator::new().await?;
    let (mut input, rx) = mpsc::channel(100);
    let (tx, mut output) = mpsc::channel(100);

    // Spawn the Brainstem event loop
    async_std::task::spawn(async move { 
        if let Err(e) = genius.run(rx, tx).await {
            eprintln!("Orchestrator error: {}", e);
        }
    });

    // 2. Load a model (downloads from HuggingFace if not cached)
    input.send(BrainstemInput::LoadModel(
        "tiny-model".into()
    )).await?;

    // 3. Submit a prompt
    input.send(BrainstemInput::Infer {
        prompt: "Once upon a time, in the world of systems programming...".into(),
        config: Default::default(),
    }).await?;

    // 4. Stream results
    println!("--- Response ---");
    while let Some(msg) = output.next().await {
        match msg {
            BrainstemOutput::Asset(a) => match a {
                AssetEvent::Started(s) => println!("[Asset] Starting: {}", s),
                AssetEvent::Complete(path) => println!("[Asset] Ready at: {}", path),
                AssetEvent::Error(e) => eprintln!("[Asset] Error: {}", e),
                _ => {} // Handle progress if desired
            },
            BrainstemOutput::Event(e) => match e {
                InferenceEvent::Content(c) => print!("{}", c),
                InferenceEvent::Complete => {
                    println!("\n--- Complete ---");
                    break;
                }
                _ => {}
            },
            BrainstemOutput::Error(err) => {
                eprintln!("\nBrainstem Error: {}", err);
                break;
            }
        }
    }

    Ok(())
}

2. Standalone Asset Management

If you only need a high-performance downloader for GGUF/LLM assets with a local registry, you can use facecrab directly as shown in the Integration Crates section.

OS Prerequisites

macOS

  • Install Command Line Tools: xcode-select --install
  • Install CMake: brew install cmake

Linux

  • Build Essentials: apt install build-essential cmake libclang-dev
  • CPU/GPU specific headers (CUDA Toolkit, etc.)

Windows

  • Visual Studio 2022 (C++ Workload)
  • CMake
  • LIBCLANG_PATH set in the system environment (see the example below)
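
For example, with LLVM installed in its default location (the path below is an assumption; point it at your own install):

setx LIBCLANG_PATH "C:\Program Files\LLVM\bin"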

Technical Note

[!IMPORTANT] Cargo.lock is tracked in this repository to ensure development stability and reproducible builds across the workspace. If you are a library consumer, please note that Cargo.lock is ignored when publishing to crates.io.

License

Released under the MIT License. Dependencies remain subject to their own licenses. Contents of ./site/assets/images are generated with Nano Banana Pro. Copyright (c) 2026 Timothy Meade.