§Arbiter - Unified GPU Coordination
“The judge allocates resources justly”
Arbiter coordinates GPU resources between Infernum (LLM inference) and Dantalion (diffusion/image generation), enabling simultaneous multimodal workloads on a single GPU.
§Core Principles
- Quality-Aware Scheduling: Both systems can run at reduced quality when sharing the GPU, with quality improving as resources become available.
- Priority-Based Arbitration: User-facing workloads get priority; background improvement yields when needed.
- Unified Fragment Cache: HoloTensor fragments are cached across both systems, avoiding redundant loading.
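The first two principles can be illustrated with a small, self-contained sketch. It is not the crate's implementation: `Workload`, `split_quality`, the local `Priority` enum, and the idea of a single shared "quality budget" measured in percentage points are all assumptions made for illustration. Each workload first receives its quality floor, and whatever budget remains is handed out in priority order, capped at 100% per workload.

```rust
// Hypothetical sketch: these types are NOT the arbiter crate's API.

/// Priority classes; the derived ordering makes `UserFacing` rank above `Background`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Priority {
    Background,
    UserFacing,
}

/// A workload with a quality floor (in percent) below which it is not run.
#[derive(Debug, Clone)]
pub struct Workload {
    pub name: &'static str,
    pub priority: Priority,
    pub min_quality: u32,
}

/// Split an assumed shared budget (percentage points) across workloads.
/// Returns `None` when the floors alone exceed the budget.
pub fn split_quality(budget: u32, workloads: &[Workload]) -> Option<Vec<(&'static str, u32)>> {
    let floor_total: u32 = workloads.iter().map(|w| w.min_quality).sum();
    if floor_total > budget {
        return None;
    }
    let mut remaining = budget - floor_total;

    // Visit workloads from highest to lowest priority (stable for ties).
    let mut order: Vec<usize> = (0..workloads.len()).collect();
    order.sort_by(|&a, &b| workloads[b].priority.cmp(&workloads[a].priority));

    // Start everyone at their floor, then top up in priority order.
    let mut quality: Vec<u32> = workloads.iter().map(|w| w.min_quality).collect();
    for i in order {
        let extra = remaining.min(100u32.saturating_sub(quality[i]));
        quality[i] += extra;
        remaining -= extra;
    }

    // Report results in the caller's original order.
    Some(workloads.iter().zip(quality).map(|(w, q)| (w.name, q)).collect())
}
```

Under this assumed policy, a budget of 110 points with a user-facing LLM workload (floor 40) and a background diffusion workload (floor 30) yields 80% and 30% respectively; raising the budget to 200 lets both reach 100%, which mirrors quality improving as resources become available.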
§Architecture
┌─────────────────────────────────────────────────────────────────┐
│                             ARBITER                             │
│  Monitors GPU memory, coordinates quality targets, routes work  │
└──────────────────────┬──────────────────────────────────────────┘
                       │
         ┌─────────────┴─────────────┐
         │                           │
         ▼                           ▼
┌─────────────────────┐ ┌─────────────────────┐
│      INFERNUM       │ │      DANTALION      │
│   (LLM Inference)   │ │     (Diffusion)     │
│                     │ │                     │
│  Quality: 40-100%   │ │  Quality: 30-100%   │
│  via HoloTensor     │ │  via ProgressiveLoad│
└─────────────────────┘ └─────────────────────┘
         │                           │
         └─────────────┬─────────────┘
                       │
                       ▼
        ┌─────────────────────────────┐
        │   UNIFIED FRAGMENT CACHE    │
        │   VRAM ← RAM ← NVMe ← CDN   │
        └─────────────────────────────┘
§Example
// Runs inside an async function whose error type accepts Arbiter errors.
use arbiter::{Arbiter, ArbiterConfig, WorkloadType, Priority};

let arbiter = Arbiter::new(ArbiterConfig::auto_detect())?;

// Request LLM inference at high priority
let llm_allocation = arbiter.request_allocation(
    WorkloadType::LlmInference,
    Priority::UserFacing,
).await?;

// Request image generation at background priority:
// LLM gets 70% quality, Dantalion drops to 40%
let diffusion_allocation = arbiter.request_allocation(
    WorkloadType::ImageGeneration,
    Priority::Background,
).await?;
Re-exports§
pub use allocation::Allocation;
pub use allocation::AllocationRequest;
pub use allocation::AllocationResult;
pub use cache::CacheConfig;
pub use cache::CacheStats;
pub use cache::CacheTier;
pub use cache::FragmentCache;
pub use coordinator::Coordinator;
pub use coordinator::CoordinatorConfig;
pub use gpu::DetectionMethod;
pub use gpu::GpuDetectionResult;
pub use gpu::GpuDetector;
pub use gpu::GpuInfo;
pub use gpu::GpuVendor;
pub use memory::GpuMemoryTracker;
pub use memory::MemoryPressure;
pub use memory::MemoryStats;
pub use priority::Priority;
pub use priority::WorkloadType;
pub use quality::QualityAllocation;
pub use quality::QualityBudget;
pub use quality::QualityPolicy;
Modules§
- allocation: Allocation types and requests.
- cache: Fragment cache for HoloTensor weights.
- coordinator: Coordinator for quality targets between workloads.
- gpu: GPU detection and information gathering.
- memory: GPU memory tracking and pressure monitoring.
- priority: Priority and workload type definitions.
- quality: Quality budget and allocation for workloads.
Structs§
- Arbiter: The main GPU arbiter coordinating Infernum and Dantalion.
- ArbiterConfig: Configuration for the Arbiter.
- ArbiterState: Current state of the Arbiter.
- ArbiterStats: Statistics for the Arbiter.
Enums§
- ArbiterError: Errors from Arbiter operations.
Type Aliases§
- Result: Result type for Arbiter operations.