pub struct MemoryBudget {
    pub total_system_bytes: usize,
    pub available_bytes: usize,
    pub model_weight_bytes: usize,
    pub kv_cache_budget: usize,
    pub runtime_overhead: usize,
}
Memory budget estimation for a model deployment.
Fields
total_system_bytes: usize
    Total system memory the user wants to allocate (bytes).
available_bytes: usize
    Bytes available after subtracting model weights and runtime overhead.
model_weight_bytes: usize
    Model weight footprint (bytes).
kv_cache_budget: usize
    Budget specifically earmarked for the KV cache (bytes).
runtime_overhead: usize
    Estimated runtime overhead for buffers, activations, etc. (bytes).
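The field descriptions imply that `available_bytes` is what remains of `total_system_bytes` after `model_weight_bytes` and `runtime_overhead` are subtracted. As a minimal sketch, the hypothetical helpers `expected_available` and `is_consistent` below encode that relationship against a local copy of the struct, redeclared so the snippet compiles on its own. Two points are assumptions rather than documented behaviour: the subtraction saturates at zero when the model does not fit, and `kv_cache_budget` never exceeds `available_bytes`.

```rust
/// Local mirror of `MemoryBudget` (same public fields) so this sketch is self-contained.
#[derive(Clone, Debug)]
pub struct MemoryBudget {
    pub total_system_bytes: usize,
    pub available_bytes: usize,
    pub model_weight_bytes: usize,
    pub kv_cache_budget: usize,
    pub runtime_overhead: usize,
}

/// Hypothetical helper: bytes left after model weights and runtime overhead.
/// Assumption: saturates at 0 instead of underflowing when the model does not fit.
pub fn expected_available(total: usize, weights: usize, overhead: usize) -> usize {
    total.saturating_sub(weights).saturating_sub(overhead)
}

/// Hypothetical consistency check for a budget produced elsewhere.
pub fn is_consistent(b: &MemoryBudget) -> bool {
    let available_matches = b.available_bytes
        == expected_available(b.total_system_bytes, b.model_weight_bytes, b.runtime_overhead);
    // Assumption: the KV cache is carved out of the available bytes, never beyond them.
    let kv_fits = b.kv_cache_budget <= b.available_bytes;
    available_matches && kv_fits
}
```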
Implementations
impl MemoryBudget
pub fn estimate(
    total_available_mb: usize,
    model_params: usize,
    bits_per_weight: f32,
) -> Self
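Estimates a memory budget for a model with model_params parameters quantised at bits_per_weight bits per weight, given total_available_mb of RAM.
- total_available_mb: how much RAM (MB) the user is willing to use.
- model_params: total number of parameters in the model.
- bits_per_weight: quantisation bits per weight (e.g. 1.125 for Q1_0).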
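`bits_per_weight` enters the budget through the weight footprint: `model_params` weights at `bits_per_weight` bits each occupy `model_params * bits_per_weight / 8` bytes. The sketch below shows that arithmetic plus the MB-to-bytes conversion using hypothetical helpers (`mb_to_bytes`, `weight_bytes`). Whether `estimate` treats 1 MB as 2^20 or 10^6 bytes, and how it rounds fractional bytes, are assumptions here (binary MB, rounding up).

```rust
/// Hypothetical: convert megabytes to bytes, assuming binary MB (1 MB = 2^20 bytes).
pub fn mb_to_bytes(mb: usize) -> usize {
    mb * 1024 * 1024
}

/// Hypothetical: weight footprint in bytes for `params` weights at `bits_per_weight` bits each.
/// Computed in f64 to avoid f32 precision loss on large parameter counts,
/// then rounded up to whole bytes.
pub fn weight_bytes(params: usize, bits_per_weight: f32) -> usize {
    let total_bits = params as f64 * bits_per_weight as f64;
    (total_bits / 8.0).ceil() as usize
}
```

For example, a 1-billion-parameter model at 1.125 bits per weight (Q1_0) needs 140,625,000 bytes, roughly 134 MiB, for its weights.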
pub fn max_context_length(
    &self,
    num_layers: usize,
    num_heads: usize,
    head_dim: usize,
) -> usize
Returns the maximum context length (in tokens) whose KV cache fits in kv_cache_budget.
The KV cache size per token is 2 (K and V) * num_layers * num_heads * head_dim * bytes_per_element, where bytes_per_element is taken as 2 (FP16) for the estimate.
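The per-token formula makes the capacity arithmetic easy to check by hand. The sketch below uses hypothetical helpers (`kv_bytes_per_token`, `max_context_tokens`) that follow the documented formula with FP16 elements; that the real method floors the division and returns 0 when the per-token size is 0 is an assumption.

```rust
/// Bytes per KV element, assuming FP16 as the documentation states.
const FP16_BYTES: usize = 2;

/// Per-token KV cache size: 2 (K and V) * layers * heads * head_dim * bytes per element.
pub fn kv_bytes_per_token(num_layers: usize, num_heads: usize, head_dim: usize) -> usize {
    2 * num_layers * num_heads * head_dim * FP16_BYTES
}

/// Hypothetical re-derivation of `max_context_length`: whole tokens that fit in the budget.
/// Assumption: floor division, and 0 when any dimension is 0 (per-token size of 0).
pub fn max_context_tokens(
    kv_cache_budget: usize,
    num_layers: usize,
    num_heads: usize,
    head_dim: usize,
) -> usize {
    let per_token = kv_bytes_per_token(num_layers, num_heads, head_dim);
    // `checked_div` returns None for a zero divisor instead of panicking.
    kv_cache_budget.checked_div(per_token).unwrap_or(0)
}
```

For a configuration with 32 layers, 32 heads and `head_dim` 128, each token costs 2 * 32 * 32 * 128 * 2 = 524,288 bytes (512 KiB), so a 4 GiB KV cache budget holds 8,192 tokens.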
Trait Implementations
impl Clone for MemoryBudget
    fn clone(&self) -> MemoryBudget
        Returns a duplicate of the value.
    fn clone_from(&mut self, source: &Self)    (stable since Rust 1.0.0; const: unstable)
        Performs copy-assignment from source.
Auto Trait Implementations
impl Freeze for MemoryBudget
impl RefUnwindSafe for MemoryBudget
impl Send for MemoryBudget
impl Sync for MemoryBudget
impl Unpin for MemoryBudget
impl UnsafeUnpin for MemoryBudget
impl UnwindSafe for MemoryBudget
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
    fn borrow_mut(&mut self) -> &mut T
        Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where
    T: Clone,

impl<T> Instrument for T
    fn instrument(self, span: Span) -> Instrumented<Self>
    fn in_current_span(self) -> Instrumented<Self>

impl<T> IntoEither for T
    fn into_either(self, into_left: bool) -> Either<Self, Self>
        Converts self into a Left variant of Either<Self, Self> if into_left is true; otherwise converts self into a Right variant of Either<Self, Self>.
    fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
        Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true; otherwise converts self into a Right variant of Either<Self, Self>.