inference_lab::config::model

Struct ModelConfig

pub struct ModelConfig {
    pub name: String,
    pub num_parameters: u64,
    pub num_active_parameters: Option<u64>,
    pub num_layers: u32,
    pub hidden_dim: u32,
    pub num_heads: u32,
    pub num_kv_heads: Option<u32>,
    pub max_seq_len: u32,
    pub sliding_window: Option<u32>,
    pub num_sliding_layers: Option<u32>,
    pub kv_cache_bytes_per_token: u64,
}

Fields§

§name: String

Model name

§num_parameters: u64

Total parameters in the model (all parameters, including inactive experts in MoE)

§num_active_parameters: Option<u64>

Active parameters used during inference (for MoE models with sparse activation). If not specified, defaults to num_parameters (dense models).

§num_layers: u32

Number of transformer layers

§hidden_dim: u32

Hidden dimension

§num_heads: u32

Number of attention heads

§num_kv_heads: Option<u32>

Number of KV heads (for GQA/MQA). If not specified, defaults to num_heads (MHA)

§max_seq_len: u32

Maximum sequence length supported

§sliding_window: Option<u32>

Sliding window size for sliding window attention layers (None = no sliding window). Only applies to layers marked as using sliding window attention.

§num_sliding_layers: Option<u32>

Number of layers using sliding window attention (the rest use full attention). If not specified, defaults to 0 (all layers use full attention).

§kv_cache_bytes_per_token: u64

KV cache size per token, in bytes. The formulas below include the num_layers factor, so the value covers all layers.
For GQA: 2 * num_kv_heads * head_dim * bytes_per_param * num_layers
For MHA: 2 * num_heads * head_dim * bytes_per_param * num_layers
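As a worked instance of the formulas above, the following sketch evaluates them for a single token. The function is a hypothetical helper, not part of `inference_lab`, and it assumes `head_dim = hidden_dim / num_heads`, which this page does not state.

```rust
/// Hypothetical helper (not part of the crate): evaluates the documented
/// formula for one token across all layers.
fn kv_cache_bytes_per_token(
    num_layers: u32,
    hidden_dim: u32,
    num_heads: u32,
    num_kv_heads: Option<u32>,
    bytes_per_param: u32,
) -> u64 {
    // Assumption: each head has dimension hidden_dim / num_heads.
    let head_dim = u64::from(hidden_dim / num_heads);
    // None falls back to num_heads, i.e. plain multi-head attention (MHA).
    let kv_heads = u64::from(num_kv_heads.unwrap_or(num_heads));
    // The factor 2 accounts for storing both keys and values.
    2 * kv_heads * head_dim * u64::from(bytes_per_param) * u64::from(num_layers)
}
```

With 32 layers, a hidden dimension of 4096, 32 heads, 8 KV heads and 2 bytes per parameter, this evaluates to 131072 bytes per token. With `num_kv_heads` set to `None` (MHA) it evaluates to 524288.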

Implementations§

impl ModelConfig

pub fn active_parameters(&self) -> u64

Get the number of active parameters (defaults to total parameters for dense models)
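The documented fallback can be sketched with `Option::unwrap_or`. `ParamCounts` is a hypothetical stand-in that carries only the two relevant fields, and the method body is an assumption based on the description above, not the crate's actual source.

```rust
/// Hypothetical stand-in for the two parameter-count fields of `ModelConfig`.
struct ParamCounts {
    num_parameters: u64,
    num_active_parameters: Option<u64>,
}

impl ParamCounts {
    /// Assumed behavior: use the explicit active count when present,
    /// otherwise treat the model as dense and return the total.
    fn active_parameters(&self) -> u64 {
        self.num_active_parameters.unwrap_or(self.num_parameters)
    }
}
```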

pub fn compute_kv_cache_size(&mut self, bytes_per_param: u32)

Calculate and set the KV cache size per token. For models with sliding window attention, this calculates an average based on typical usage.

pub fn with_kv_cache_size(self, bytes_per_param: u32) -> Self

Initialize with KV cache size pre-computed
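The two signatures suggest an in-place setter plus a consuming builder-style wrapper around it. The sketch below shows that pattern on `KvConfig`, a hypothetical reduced struct. It assumes the wrapper simply delegates to the setter, and it ignores the sliding-window averaging mentioned above.

```rust
/// Hypothetical reduced struct with only the fields this sketch needs.
#[derive(Debug, Clone)]
struct KvConfig {
    num_layers: u32,
    head_dim: u32,
    num_kv_heads: u32,
    kv_cache_bytes_per_token: u64,
}

impl KvConfig {
    /// In-place variant: overwrites the stored per-token size.
    fn compute_kv_cache_size(&mut self, bytes_per_param: u32) {
        self.kv_cache_bytes_per_token = 2
            * u64::from(self.num_kv_heads)
            * u64::from(self.head_dim)
            * u64::from(bytes_per_param)
            * u64::from(self.num_layers);
    }

    /// Consuming variant: takes ownership, fills in the size, and returns
    /// the value so the call can be chained onto construction.
    fn with_kv_cache_size(mut self, bytes_per_param: u32) -> Self {
        self.compute_kv_cache_size(bytes_per_param);
        self
    }
}
```

Under these assumptions, a value built with `kv_cache_bytes_per_token: 0` and then passed through `with_kv_cache_size(2)` comes back with the field populated, without the caller needing a mutable binding.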

pub fn kv_cache_size_for_sequence(&self, seq_len: u32) -> u64

Calculate total KV cache size for a sequence, accounting for sliding window
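One plausible reading of "accounting for sliding window" is that full-attention layers cache every token, while sliding-window layers cache at most `sliding_window` tokens. The sketch below implements that reading as a free function. The function name, its explicit parameters, and the per-layer byte count are assumptions for illustration; the real method takes only `seq_len` and reads the rest from `self`.

```rust
/// Hypothetical helper: total KV cache bytes for one sequence, assuming
/// sliding-window layers cache at most `sliding_window` tokens.
fn kv_cache_bytes_for_sequence(
    bytes_per_token_per_layer: u64,
    num_layers: u32,
    num_sliding_layers: Option<u32>,
    sliding_window: Option<u32>,
    seq_len: u32,
) -> u64 {
    // Documented default: no sliding layers when the field is None.
    let sliding_layers = num_sliding_layers.unwrap_or(0).min(num_layers);
    let full_layers = num_layers - sliding_layers;
    // Without a window, a sliding layer behaves like full attention.
    let sliding_tokens = sliding_window.map_or(seq_len, |w| w.min(seq_len));
    // Count cached (layer, token) slots, then scale by bytes per slot.
    let token_slots = u64::from(full_layers) * u64::from(seq_len)
        + u64::from(sliding_layers) * u64::from(sliding_tokens);
    bytes_per_token_per_layer * token_slots
}
```

For sequences shorter than the window, the result matches the full-attention case, since every layer still holds all `seq_len` tokens.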

Trait Implementations§

impl Clone for ModelConfig

fn clone(&self) -> ModelConfig

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for ModelConfig

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<'de> Deserialize<'de> for ModelConfig

fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

Auto Trait Implementations§

impl Freeze for ModelConfig

impl RefUnwindSafe for ModelConfig

impl Send for ModelConfig

impl Sync for ModelConfig

impl Unpin for ModelConfig

impl UnsafeUnpin for ModelConfig

impl UnwindSafe for ModelConfig

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,