Crate infernum_arbiter


§Arbiter - Unified GPU Coordination

“The judge allocates resources justly”

Arbiter coordinates GPU resources between Infernum (LLM inference) and Dantalion (diffusion/image generation), enabling simultaneous multimodal workloads on a single GPU.

§Core Principles

Quality-Aware Scheduling: Both systems can run at reduced quality while sharing a GPU, and quality improves as resources become available.
Priority-Based Arbitration: User-facing workloads get priority; background improvement yields to them when needed.
Unified Fragment Cache: HoloTensor fragments are cached and shared across both systems, avoiding redundant loading.
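
One way to picture how the first two principles combine is the sketch below. It is not the crate's actual algorithm: it assumes a hypothetical shared "quality budget" expressed in percentage points, gives every workload its minimum quality first, and then hands the remainder to higher-priority workloads. `SketchPriority`, `SketchWorkload`, and `split_quality` are illustrative names that do not exist in `infernum_arbiter`.

```rust
/// Hypothetical priority levels; the crate's real `Priority` enum may differ.
/// Variants are ordered from lowest to highest priority.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum SketchPriority {
    Background,
    UserFacing,
}

/// A workload's acceptable quality range (in percent) and its priority.
#[derive(Debug, Clone, Copy)]
struct SketchWorkload {
    min_quality: u32,
    max_quality: u32,
    priority: SketchPriority,
}

/// Splits an assumed shared budget (percentage points) between workloads.
/// Returns `None` when the budget cannot cover every workload's minimum.
fn split_quality(budget: u32, workloads: &[SketchWorkload]) -> Option<Vec<u32>> {
    let floor_total: u32 = workloads.iter().map(|w| w.min_quality).sum();
    if floor_total > budget {
        return None;
    }

    // Start every workload at its floor.
    let mut granted: Vec<u32> = workloads.iter().map(|w| w.min_quality).collect();
    let mut remaining = budget - floor_total;

    // Visit workloads from highest to lowest priority (stable for ties).
    let mut order: Vec<usize> = (0..workloads.len()).collect();
    order.sort_by(|&a, &b| workloads[b].priority.cmp(&workloads[a].priority));

    // Give each workload as much of the leftover budget as it can use.
    for i in order {
        let headroom = workloads[i].max_quality.saturating_sub(granted[i]);
        let extra = headroom.min(remaining);
        granted[i] += extra;
        remaining -= extra;
    }
    Some(granted)
}
```

With a user-facing workload in the 40-100% range and a background workload in the 30-100% range, a budget of 110 points yields 80% and 30%; as the budget grows to 150, the background workload improves to 50% once the user-facing one has reached 100%.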

§Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         ARBITER                                 │
│  Monitors GPU memory, coordinates quality targets, routes work  │
└──────────────────────┬──────────────────────────────────────────┘
                       │
         ┌─────────────┴─────────────┐
         │                           │
         ▼                           ▼
┌─────────────────────┐     ┌─────────────────────┐
│     INFERNUM        │     │     DANTALION       │
│  (LLM Inference)    │     │  (Diffusion)        │
│                     │     │                     │
│  Quality: 40-100%   │     │  Quality: 30-100%   │
│  via HoloTensor     │     │  via ProgressiveLoad│
└─────────────────────┘     └─────────────────────┘
         │                           │
         └─────────────┬─────────────┘
                       │
                       ▼
         ┌─────────────────────────────┐
         │    UNIFIED FRAGMENT CACHE   │
         │  VRAM ← RAM ← NVMe ← CDN    │
         └─────────────────────────────┘
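
The tier chain in the diagram (VRAM ← RAM ← NVMe ← CDN) suggests a lookup that starts at the fastest tier and pulls fragments upward on a miss. The following is a minimal sketch of that idea, not the crate's `FragmentCache`: `SketchTier` and `SketchCache` are hypothetical, each tier is modelled as an in-memory map, and capacity limits and eviction are deliberately left out.

```rust
use std::collections::HashMap;

/// Hypothetical tiers, ordered from fastest to slowest.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchTier {
    Vram,
    Ram,
    Nvme,
    Cdn,
}

const TIERS: [SketchTier; 4] = [
    SketchTier::Vram,
    SketchTier::Ram,
    SketchTier::Nvme,
    SketchTier::Cdn,
];

/// One map per tier; the array index matches the position in `TIERS`.
struct SketchCache {
    tiers: [HashMap<String, Vec<u8>>; 4],
}

impl SketchCache {
    fn new() -> Self {
        Self { tiers: Default::default() }
    }

    /// Stores a fragment directly in the given tier.
    fn insert(&mut self, tier: SketchTier, id: &str, bytes: Vec<u8>) {
        self.tiers[tier as usize].insert(id.to_string(), bytes);
    }

    /// Finds a fragment in the fastest tier that holds it, copies it into
    /// every faster tier, and reports which tier served the request.
    fn fetch(&mut self, id: &str) -> Option<(SketchTier, Vec<u8>)> {
        let hit = (0..self.tiers.len()).find(|&i| self.tiers[i].contains_key(id))?;
        let bytes = self.tiers[hit].get(id)?.clone();
        for faster in 0..hit {
            self.tiers[faster].insert(id.to_string(), bytes.clone());
        }
        Some((TIERS[hit], bytes))
    }
}
```

In this sketch a fragment first found on NVMe is served from VRAM on the next request, which is the behaviour that lets both systems avoid reloading the same weights.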

§Example

use infernum_arbiter::{Arbiter, ArbiterConfig, Priority, WorkloadType};

let arbiter = Arbiter::new(ArbiterConfig::auto_detect())?;

// Request LLM inference at high priority
let llm_allocation = arbiter.request_allocation(
    WorkloadType::LlmInference,
    Priority::UserFacing,
).await?;

// Request image generation in the background.
// In this example the LLM gets 70% quality and Dantalion drops to 40%.
let diffusion_allocation = arbiter.request_allocation(
    WorkloadType::ImageGeneration,
    Priority::Background,
).await?;

Re-exports§

pub use allocation::Allocation;
pub use allocation::AllocationRequest;
pub use allocation::AllocationResult;
pub use cache::CacheConfig;
pub use cache::CacheStats;
pub use cache::CacheTier;
pub use cache::FragmentCache;
pub use coordinator::Coordinator;
pub use coordinator::CoordinatorConfig;
pub use gpu::DetectionMethod;
pub use gpu::GpuDetectionResult;
pub use gpu::GpuDetector;
pub use gpu::GpuInfo;
pub use gpu::GpuVendor;
pub use memory::GpuMemoryTracker;
pub use memory::MemoryPressure;
pub use memory::MemoryStats;
pub use priority::Priority;
pub use priority::WorkloadType;
pub use quality::QualityAllocation;
pub use quality::QualityBudget;
pub use quality::QualityPolicy;

Modules§

allocation: Allocation types and requests.
cache: Fragment cache for HoloTensor weights.
coordinator: Coordinator for quality targets between workloads.
gpu: GPU detection and information gathering.
memory: GPU memory tracking and pressure monitoring.
priority: Priority and workload type definitions.
quality: Quality budget and allocation for workloads.

Structs§

Arbiter: The main GPU arbiter coordinating Infernum and Dantalion.
ArbiterConfig: Configuration for the Arbiter.
ArbiterState: Current state of the Arbiter.
ArbiterStats: Statistics for the Arbiter.

Enums§

ArbiterError: Errors from Arbiter operations.

Type Aliases§

Result: Result type for Arbiter operations.
