infernum-arbiter 0.2.0-rc.2

Unified GPU arbiter - coordinates Infernum (LLM) and Dantalion (Diffusion) workloads

Arbiter - Unified GPU Coordination

"The judge allocates resources justly"

Arbiter coordinates GPU resources between Infernum (LLM inference) and Dantalion (diffusion/image generation), enabling simultaneous multimodal workloads on a single GPU.

Core Principles

  1. Quality-Aware Scheduling: Both systems can run at reduced quality when sharing a GPU, and quality improves as resources become available.

  2. Priority-Based Arbitration: User-facing workloads get priority, and background improvement yields when needed.

  3. Unified Fragment Cache: HoloTensor fragments are cached across both systems, avoiding redundant loading.
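The first two principles can be illustrated with a small, self-contained sketch. It is not Arbiter's actual algorithm: the `Workload` struct, the `split_quality` function, and the idea of a single shared "quality budget" in percentage points are all assumptions made for illustration. Only the `UserFacing` and `Background` priority names and the quality floors (40% for Infernum, 30% for Dantalion, from the architecture diagram) come from this page.

```rust
/// Priority levels named on this page; the ordering (Background < UserFacing)
/// is an assumption of this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Priority {
    Background,
    UserFacing,
}

/// Hypothetical description of a workload competing for the GPU.
#[derive(Debug, Clone, Copy)]
struct Workload {
    priority: Priority,
    /// Lowest acceptable quality, in percent.
    min_quality: u32,
}

/// Raises `quality` toward 100% using whatever is left in `spare`.
fn top_up(quality: u32, spare: &mut u32) -> u32 {
    let give = (*spare).min(100u32.saturating_sub(quality));
    *spare -= give;
    quality + give
}

/// Splits a shared quality budget (hypothetical unit: percentage points)
/// between two workloads. Each workload first receives its floor; the
/// remainder goes to the higher-priority workload, then to the other one.
/// Returns `None` when the budget cannot cover both floors.
fn split_quality(a: Workload, b: Workload, budget: u32) -> Option<(u32, u32)> {
    let floors = a.min_quality + b.min_quality;
    if budget < floors {
        return None;
    }
    let mut spare = budget - floors;
    let (qa, qb) = if a.priority >= b.priority {
        let qa = top_up(a.min_quality, &mut spare);
        let qb = top_up(b.min_quality, &mut spare);
        (qa, qb)
    } else {
        let qb = top_up(b.min_quality, &mut spare);
        let qa = top_up(a.min_quality, &mut spare);
        (qa, qb)
    };
    Some((qa, qb))
}
```

Under this toy model, calling `split_quality` again with a larger budget yields higher quality for both workloads, which mirrors "quality improving as resources become available". How Arbiter really measures resources and maps them to quality is not specified here.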

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         ARBITER                                 │
│  Monitors GPU memory, coordinates quality targets, routes work  │
└──────────────────────┬──────────────────────────────────────────┘
                       │
         ┌─────────────┴─────────────┐
         │                           │
         ▼                           ▼
┌─────────────────────┐     ┌─────────────────────┐
│     INFERNUM        │     │     DANTALION       │
│  (LLM Inference)    │     │  (Diffusion)        │
│                     │     │                     │
│  Quality: 40-100%   │     │  Quality: 30-100%   │
│  via HoloTensor     │     │  via ProgressiveLoad│
└─────────────────────┘     └─────────────────────┘
         │                           │
         └─────────────┬─────────────┘
                       │
                       ▼
         ┌─────────────────────────────┐
         │    UNIFIED FRAGMENT CACHE   │
         │  VRAM ← RAM ← NVMe ← CDN    │
         └─────────────────────────────┘
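The cache hierarchy at the bottom of the diagram (VRAM ← RAM ← NVMe ← CDN) suggests a lookup that tries the fastest tier first and pulls fragments toward VRAM on a hit. The sketch below shows that pattern under stated assumptions: `Tier`, `FragmentCache`, `insert`, and `fetch` are hypothetical names, every tier (including the CDN) is modeled as an in-memory map, and eviction is omitted. It is not the crate's API.

```rust
use std::collections::HashMap;

/// Storage tiers from the diagram, ordered fastest to slowest.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Vram,
    Ram,
    Nvme,
    Cdn,
}

const TIERS: [Tier; 4] = [Tier::Vram, Tier::Ram, Tier::Nvme, Tier::Cdn];

/// Hypothetical tiered cache: one map per tier, indexed in `TIERS` order.
/// Keys are fragment ids, values are raw fragment bytes.
struct FragmentCache {
    tiers: [HashMap<String, Vec<u8>>; 4],
}

impl FragmentCache {
    fn new() -> Self {
        Self { tiers: Default::default() }
    }

    /// Places a fragment directly into the given tier.
    fn insert(&mut self, tier: Tier, id: &str, data: Vec<u8>) {
        self.tiers[tier as usize].insert(id.to_string(), data);
    }

    /// Searches tiers from fastest to slowest. On a hit, copies the fragment
    /// into every faster tier and returns the tier where it was found.
    fn fetch(&mut self, id: &str) -> Option<Tier> {
        let hit = (0..TIERS.len()).find(|&i| self.tiers[i].contains_key(id))?;
        let data = self.tiers[hit][id].clone();
        for faster in 0..hit {
            self.tiers[faster].insert(id.to_string(), data.clone());
        }
        Some(TIERS[hit])
    }
}
```

Because the cache is shared, a fragment promoted to VRAM by one system would be found there by the other on its next `fetch`, which is the "avoiding redundant loading" idea from the core principles. A real implementation would also need size limits and an eviction policy per tier.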

Example

use arbiter::{Arbiter, ArbiterConfig, WorkloadType, Priority};

let arbiter = Arbiter::new(ArbiterConfig::auto_detect())?;

// Request LLM inference at high priority
let llm_allocation = arbiter.request_allocation(
    WorkloadType::LlmInference,
    Priority::UserFacing,
).await?;

// Request image generation in the background; while the GPU is shared,
// the LLM gets 70% quality and Dantalion drops to 40%
let diffusion_allocation = arbiter.request_allocation(
    WorkloadType::ImageGeneration,
    Priority::Background,
).await?;