pub struct TimeSlicePolicy { /* private fields */ }
Drain-first scheduling policy with a proactive background scheduler.
This policy minimizes GPU time wasted on model switches by following two principles:
- Never preempt a serving model. When a request arrives for a non-active model, the policy defers to the background scheduler rather than switching reactively. The only exception is the staleness bound, which forces a switch if any request has waited longer than max_wait.
- Switch when idle. The background scheduler periodically checks all models’ queue depths. When the active model has completely drained its queue (no pending requests and none in flight), the scheduler switches to the model with the most waiting requests.
This is equivalent to “serve everything from the active model’s queue, then switch to whoever has the most demand.” The scheduler’s global visibility into all queue depths prevents the pathological back-and-forth switching that reactive policies cause under interleaved or dominant workloads.
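The first principle, together with its staleness exception, can be sketched as a small pure function. This is an illustration under stated assumptions, not the crate's implementation: `Decision`, `on_request`, and the `longest_wait` parameter are hypothetical names, and the real policy receives its inputs through `PolicyContext` and returns a `PolicyDecision`, neither of which is shown on this page.

```rust
use std::time::Duration;

/// Hypothetical outcome for a request that has just become pending.
#[derive(Debug, PartialEq)]
enum Decision {
    /// The requested model is already active, so the request is served.
    Serve,
    /// Leave the switch to the background scheduler.
    Defer,
    /// The staleness bound was exceeded, so a switch is forced.
    SwitchNow,
}

/// Drain-first rule for a pending request (illustrative only).
/// `longest_wait` is assumed to be the longest time any request for
/// `target` has been waiting.
fn on_request(
    target: &str,
    active: Option<&str>,
    longest_wait: Duration,
    max_wait: Duration,
) -> Decision {
    if active == Some(target) {
        Decision::Serve
    } else if longest_wait > max_wait {
        // Only exception to "never preempt": the staleness bound.
        Decision::SwitchNow
    } else {
        Decision::Defer
    }
}
```

With a 30-second bound, a request for model "b" that arrives while "a" is serving is deferred until either the scheduler switches or the request has waited more than 30 seconds.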
In simulation across 12 workload profiles at switch costs from 2s to 20s, this policy achieves 61-94% GPU serving time vs CostAware’s 40-81% and FIFO’s 33-79%, while also delivering 2-6x lower maximum wait times.
Implementations§
Trait Implementations§
impl SwitchPolicy for TimeSlicePolicy
fn on_pending_request<'life0, 'life1, 'async_trait>(
    &'life0 self,
    ctx: &'life1 PolicyContext,
) -> Pin<Box<dyn Future<Output = PolicyDecision> + Send + 'async_trait>>
where
    Self: 'async_trait,
    'life0: 'async_trait,
    'life1: 'async_trait,
fn prepare_switch<'life0, 'life1, 'async_trait>(
    &'life0 self,
    ctx: &'life1 mut SwitchContext,
) -> Pin<Box<dyn Future<Output = ()> + Send + 'async_trait>>
where
    Self: 'async_trait,
    'life0: 'async_trait,
    'life1: 'async_trait,
fn eviction_policy(&self) -> EvictionPolicy
fn request_timeout(&self) -> Duration
fn min_active_duration(&self) -> Duration
fn scheduler_interval(&self) -> Option<Duration>
If this returns Some(interval), the switcher will spawn a background scheduler that calls [schedule_tick] every interval.
fn schedule_tick(&self, ctx: &ScheduleContext) -> Option<String>
Return None to stay on the current model.
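The switch-when-idle rule that a tick applies can be sketched as follows. This is a minimal sketch, not the actual method body: `Snapshot` and `tick` are hypothetical stand-ins, since the fields of `ScheduleContext` are not documented on this page, and breaking ties by model name is an assumption added only to keep the result deterministic.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the scheduler's view of the system.
struct Snapshot {
    /// Currently loaded model, if any.
    active: Option<String>,
    /// Pending requests per model.
    queued: HashMap<String, usize>,
    /// Requests currently in flight on the active model.
    in_flight: usize,
}

/// Returns the model to switch to, or None to stay on the current model.
fn tick(s: &Snapshot) -> Option<String> {
    // Drain first: never switch while the active model still has work.
    if let Some(active) = &s.active {
        let pending = s.queued.get(active).copied().unwrap_or(0);
        if pending > 0 || s.in_flight > 0 {
            return None;
        }
    }
    // Idle: choose the other model with the most waiting requests.
    // Ties go to the lexicographically smallest name (assumption).
    s.queued
        .iter()
        .filter(|(name, depth)| **depth > 0 && Some(*name) != s.active.as_ref())
        .max_by(|a, b| a.1.cmp(b.1).then_with(|| b.0.cmp(a.0)))
        .map(|(name, _)| name.clone())
}
```

If no other model has waiting requests, the sketch returns None, so an idle active model stays loaded rather than being swapped out for nothing.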