
Struct TransformerConfig 

Source
pub struct TransformerConfig {
    pub hidden_size: usize,
    pub num_attention_heads: usize,
    pub num_kv_heads: usize,
    pub intermediate_size: usize,
    pub num_hidden_layers: usize,
    pub vocab_size: usize,
    pub max_position_embeddings: usize,
    pub rms_norm_eps: f32,
    pub rope_theta: f32,
    pub use_bias: bool,
    pub head_dim_override: Option<usize>,
    pub architecture: ModelArchitecture,
    pub hf_architecture: Option<String>,
    pub hf_model_type: Option<String>,
    pub tie_word_embeddings: bool,
}

Configuration for transformer models

Fields§

§hidden_size: usize

Hidden dimension (embedding size)

§num_attention_heads: usize

Number of attention heads

§num_kv_heads: usize

Number of key-value heads (for grouped-query attention)
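
Example (illustrative sketch, not this crate's code): under grouped-query attention, several query heads share one key-value head, so the ratio of the two head counts is the group size. The helper name `queries_per_kv_head` is hypothetical and only shows the arithmetic.

```rust
/// Number of query heads that share one key-value head.
/// Returns None when the counts do not divide evenly or num_kv_heads is zero.
/// Hypothetical helper; it is not part of TransformerConfig.
fn queries_per_kv_head(num_attention_heads: usize, num_kv_heads: usize) -> Option<usize> {
    if num_kv_heads == 0 || num_attention_heads % num_kv_heads != 0 {
        return None;
    }
    Some(num_attention_heads / num_kv_heads)
}

fn main() {
    // Head counts quoted on this page for Qwen2.5-Coder-7B-Instruct: 28 heads, 4 KV heads.
    println!("{:?}", queries_per_kv_head(28, 4));
    // Equal counts reduce to ordinary multi-head attention (one query head per KV head).
    println!("{:?}", queries_per_kv_head(12, 12));
}
```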

§intermediate_size: usize

Feed-forward network intermediate dimension

§num_hidden_layers: usize

Number of transformer layers

§vocab_size: usize

Vocabulary size

§max_position_embeddings: usize

Maximum sequence length

§rms_norm_eps: f32

RMS normalization epsilon

§rope_theta: f32

RoPE theta base

§use_bias: bool

Whether to use bias in linear layers

§head_dim_override: Option<usize>

Explicit per-head dimension (overrides hidden_size / num_attention_heads). Required for Qwen3, where head_dim=128 but hidden_size / num_attention_heads = 80.

§architecture: ModelArchitecture

Architecture family: encoder (BERT/RoBERTa) or decoder (LLaMA/Qwen). Determines position encoding, normalization, activation, and pooling strategy.

§hf_architecture: Option<String>

HuggingFace architecture class name (e.g., “Qwen2ForCausalLM”, “LlamaForCausalLM”). Used for checkpoint config.json compatibility.

§hf_model_type: Option<String>

HuggingFace model type (e.g., “qwen2”, “llama”). Used for checkpoint config.json compatibility.

§tie_word_embeddings: bool

Whether to tie input/output embeddings (embed_tokens and lm_head). Qwen2: true, LLaMA: false.
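
Example (illustrative sketch): tying lets embed_tokens and lm_head share one vocab_size x hidden_size matrix, so the saving is one such matrix. The function names below are hypothetical, and the sketch assumes lm_head carries no bias term.

```rust
/// Parameters in one embedding-shaped matrix (vocab_size x hidden_size).
fn embedding_params(vocab_size: usize, hidden_size: usize) -> usize {
    vocab_size * hidden_size
}

/// Combined parameter count of the input embedding and the output projection.
/// Tied weights store one shared matrix; untied weights store two.
/// Assumption: lm_head has no bias.
fn embedding_and_head_params(vocab_size: usize, hidden_size: usize, tied: bool) -> usize {
    let one_matrix = embedding_params(vocab_size, hidden_size);
    if tied { one_matrix } else { 2 * one_matrix }
}

fn main() {
    // vocab_size = 151936 and hidden = 1536 are the Qwen2 and
    // Qwen2.5-Coder-1.5B figures quoted elsewhere on this page.
    let tied = embedding_and_head_params(151_936, 1_536, true);
    let untied = embedding_and_head_params(151_936, 1_536, false);
    println!("tied: {} params, untied: {} params", tied, untied);
    println!("saved by tying: {} params", untied - tied);
}
```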

Implementations§

Source§

impl TransformerConfig

Source

pub fn llama2_7b() -> Self

LLaMA 2 7B configuration

Source

pub fn llama2_13b() -> Self

LLaMA 2 13B configuration

Source

pub fn mistral_7b() -> Self

Mistral 7B configuration

Source

pub fn qwen2_0_5b() -> Self

Qwen2 0.5B configuration (good for testing).

Empirically verified against ~/.cache/huggingface/hub/models--Qwen--Qwen2.5-Coder-0.5B-Instruct/.../config.json 2026-05-04. Pinned by contracts/apr-pretrain-arch-polymorphic-v1.yaml FALSIFY-001.

Note: tie_word_embeddings: true is the Qwen2.5 0.5B/1.5B convention; the 7B variant turns it off (see qwen2_7b()). Small Qwen models share one weight matrix between the input embedding and lm_head to save parameters, while the larger variants pay the parameter cost of untied weights. Drift prevention: this must stay true for the SHIP-TWO-001 §49 MODEL-2 fine-tune from a Qwen2.5-Coder-0.5B checkpoint.

Source

pub fn qwen2_1_5b() -> Self

Qwen2.5-Coder-1.5B-Instruct: 28 layers, 12 heads, 2 KV heads, hidden=1536

Source

pub fn qwen2_7b() -> Self

Qwen2.5-Coder 7B configuration (GH-371)

Qwen2.5-Coder-7B-Instruct: 28 layers, 28 heads, 4 KV heads, hidden=3584.

Contract: contracts/model-families/qwen2.yaml

Source

pub fn qwen3_4b() -> Self

Qwen3 4B configuration

Qwen3-4B: 36 layers, 32 heads, 8 KV heads, hidden=2560, head_dim=128. Same vocab_size as Qwen2 (151936). No attention bias (Qwen3 family).

Source

pub fn qwen3_5_9b() -> Self

Qwen3.5 9B configuration

Key differences from Qwen2: no attention bias, head_dim=256 (explicit), vocab_size=248320, hybrid attention (standard + linear layers). Contract: contracts/model-families/qwen3_5.yaml

Source

pub fn from_apr_metadata(
    hidden_size: Option<usize>,
    num_heads: Option<usize>,
    num_kv_heads: Option<usize>,
    intermediate_size: Option<usize>,
    num_layers: Option<usize>,
    vocab_size: Option<usize>,
    max_position_embeddings: Option<usize>,
    rms_norm_eps: Option<f32>,
    rope_theta: Option<f32>,
    architecture: Option<&str>,
) -> Option<Self>

Construct from APR v2 metadata fields.

CONTRACT: The .apr file is the single source of truth for model architecture. These fields were validated at import time by the tensor-layout-v1 contract. This function propagates that contract to the training pipeline — no hardcoded lookups, no silent fallbacks.

Returns None if any required field is missing, forcing the caller to handle the error explicitly rather than silently degrading to tiny().
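
Example (illustrative sketch): the all-or-nothing behaviour described above can be expressed with the `?` operator on Option. `MiniConfig` and `mini_from_metadata` are hypothetical stand-ins with three fields; which fields the real function treats as required is not stated here.

```rust
/// Hypothetical three-field stand-in for TransformerConfig.
#[derive(Debug, PartialEq)]
struct MiniConfig {
    hidden_size: usize,
    num_heads: usize,
    num_layers: usize,
}

/// Builds a config only when every field is present.
/// Each `?` returns None from the function as soon as one field is missing,
/// so there is no silent fallback to a default config.
fn mini_from_metadata(
    hidden_size: Option<usize>,
    num_heads: Option<usize>,
    num_layers: Option<usize>,
) -> Option<MiniConfig> {
    Some(MiniConfig {
        hidden_size: hidden_size?,
        num_heads: num_heads?,
        num_layers: num_layers?,
    })
}

fn main() {
    // The caller has to handle the missing-field case explicitly.
    match mini_from_metadata(Some(1536), Some(12), None) {
        Some(cfg) => println!("loaded {:?}", cfg),
        None => eprintln!("metadata is incomplete; refusing to guess a config"),
    }
}
```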

GH-376: Fixes instruct pipeline ignoring .apr architecture metadata.

Source

pub fn from_size_str(size: &str) -> Result<Self, String>

Resolve config from a model size string. Errors on unknown sizes.

GH-377: Replaces _ => TransformerConfig::tiny() catch-all pattern. This is the single canonical mapping from size strings to configs. Every callsite that previously had its own match table should use this.
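
Example (illustrative sketch): a single match table whose last arm returns Err instead of falling back to a default. The size strings ("0.5b", "1.5b", "tiny") and the `Preset` enum are assumptions made for this sketch; the strings the real function accepts are not listed on this page.

```rust
/// Hypothetical stand-in for the configs that a size string can select.
#[derive(Debug, PartialEq)]
enum Preset {
    Qwen2Half,
    Qwen2OneAndHalf,
    Tiny,
}

/// Maps a size string to a preset and rejects anything it does not recognise.
/// The accepted strings are assumptions for illustration only.
fn preset_from_size_str(size: &str) -> Result<Preset, String> {
    match size {
        "0.5b" => Ok(Preset::Qwen2Half),
        "1.5b" => Ok(Preset::Qwen2OneAndHalf),
        "tiny" => Ok(Preset::Tiny),
        // No catch-all default: an unknown size is an error, not a tiny model.
        other => Err(format!("unknown model size: {:?}", other)),
    }
}

fn main() {
    for size in ["0.5b", "13x"] {
        match preset_from_size_str(size) {
            Ok(preset) => println!("{} -> {:?}", size, preset),
            Err(msg) => eprintln!("{}", msg),
        }
    }
}
```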

Source

pub fn codebert() -> Self

CodeBERT (microsoft/codebert-base) encoder configuration.

RoBERTa architecture: 12 layers, 768 hidden, 12 heads, GELU, LayerNorm, learned positions. SSC v11 Section 4: 125M params, ~20ms CPU inference, WASM-deployable.

Source

pub fn tiny() -> Self

Tiny configuration for testing

Source

pub fn is_encoder(&self) -> bool

Whether this config describes an encoder (BERT/RoBERTa) architecture.

Source

pub fn hf_architecture_name(&self) -> &str

HuggingFace architecture class name for checkpoint config.json. Uses explicit override if set, otherwise infers from config.

Source

pub fn hf_model_type_str(&self) -> &str

HuggingFace model_type string for checkpoint config.json.

Source

pub fn ties_embeddings(&self) -> bool

Whether embeddings are tied (embed_tokens == lm_head). Uses explicit flag if set, otherwise infers from architecture.

Source

pub fn head_dim(&self) -> usize

Per-head dimension.

Uses the explicit override when set (Qwen3-4B: head_dim=128 with hidden=2560, 32 heads). Falls back to hidden_size / num_attention_heads for standard architectures.

Source

pub fn q_dim(&self) -> usize

Total Q/O projection dimension = num_attention_heads * head_dim.

Equals hidden_size for standard architectures but differs when head_dim is explicitly overridden (e.g. Qwen3-4B: 32 * 128 = 4096 != 2560).
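
Example (illustrative sketch): the override-or-fallback rule for head_dim and the derived q_dim, using the figures quoted on this page. `HeadShape` is a hypothetical three-field stand-in, not the crate's type.

```rust
/// Hypothetical stand-in holding only the fields that head_dim and q_dim need.
struct HeadShape {
    hidden_size: usize,
    num_attention_heads: usize,
    head_dim_override: Option<usize>,
}

impl HeadShape {
    /// Explicit override when present, otherwise hidden_size / num_attention_heads.
    fn head_dim(&self) -> usize {
        self.head_dim_override
            .unwrap_or_else(|| self.hidden_size / self.num_attention_heads)
    }

    /// Total Q/O projection width.
    fn q_dim(&self) -> usize {
        self.num_attention_heads * self.head_dim()
    }
}

fn main() {
    // Qwen3-4B figures from this page: hidden=2560, 32 heads, head_dim=128.
    let qwen3 = HeadShape { hidden_size: 2560, num_attention_heads: 32, head_dim_override: Some(128) };
    // Qwen2.5-Coder-7B figures from this page: hidden=3584, 28 heads, no override.
    let qwen2 = HeadShape { hidden_size: 3584, num_attention_heads: 28, head_dim_override: None };

    println!("qwen3: head_dim={}, q_dim={}", qwen3.head_dim(), qwen3.q_dim());
    println!("qwen2: head_dim={}, q_dim={}", qwen2.head_dim(), qwen2.q_dim());
}
```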

Source

pub fn per_layer_weight_elements(&self) -> usize

Per-layer weight VRAM in f32 elements (constant, independent of seq_len).

Maps to cuda_block.rs lines 212-220: GpuBuffer::from_host() uploads.

Source

pub fn total_training_vram_bytes(&self, max_seq_len: usize) -> usize

Total VRAM in bytes for all layers at a given max_seq_len.

Postcondition: result is exact for the current cuda_block.rs buffer layout.

Source

pub fn total_training_vram_bytes_shared(&self, max_seq_len: usize) -> usize

Total VRAM in bytes with SHARED scratch workspace (1 per model, not per layer).

This is the correct budget formula when gradient buffers are shared across layers (canonical in PyTorch/JAX). Only weights are truly per-layer.

Postcondition: result < total_training_vram_bytes(max_seq_len) when num_hidden_layers > 1.
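
Example (illustrative sketch): why a shared scratch workspace lowers the budget. The real formula depends on the cuda_block.rs buffer layout, which is not reproduced here; this toy model assumes f32 buffers and a scratch size that grows linearly with seq_len, and every name in it is hypothetical.

```rust
const BYTES_PER_F32: usize = 4;

/// Toy budget model. All fields are assumptions, not the crate's layout.
struct Budget {
    num_layers: usize,
    weight_elems_per_layer: usize,
    scratch_elems_per_token: usize,
}

impl Budget {
    /// Scratch workspace size in elements (assumed linear in seq_len).
    fn scratch_elems(&self, seq_len: usize) -> usize {
        self.scratch_elems_per_token * seq_len
    }

    /// Every layer owns its weights and its own scratch workspace.
    fn per_layer_scratch_bytes(&self, seq_len: usize) -> usize {
        self.num_layers * (self.weight_elems_per_layer + self.scratch_elems(seq_len)) * BYTES_PER_F32
    }

    /// Weights stay per layer; one scratch workspace serves the whole model.
    fn shared_scratch_bytes(&self, seq_len: usize) -> usize {
        (self.num_layers * self.weight_elems_per_layer + self.scratch_elems(seq_len)) * BYTES_PER_F32
    }
}

fn main() {
    let b = Budget { num_layers: 4, weight_elems_per_layer: 1000, scratch_elems_per_token: 10 };
    println!("per-layer scratch: {} bytes", b.per_layer_scratch_bytes(8));
    println!("shared scratch:    {} bytes", b.shared_scratch_bytes(8));
}
```

In this model the two totals differ by (num_layers - 1) scratch workspaces, which is why the inequality is strict only when there is more than one layer.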

Source

pub fn max_seq_len_for_vram_shared(&self, vram_bytes: usize) -> Option<usize>

Solve for the maximum seq_len that fits in the given VRAM budget (bytes), using shared scratch workspace.

This is the solver to use with the shared-scratch architecture. Returns None if even seq_len=1 exceeds the budget.

Source

pub fn max_seq_len_for_vram(&self, vram_bytes: usize) -> Option<usize>

Solve for the maximum seq_len that fits in the given VRAM budget (bytes).

Binary search over [1, max_position_embeddings]. Returns None if even seq_len=1 exceeds the budget.

Precondition: vram_bytes > 0

Postcondition: total_training_vram_bytes(result) <= vram_bytes
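
Example (illustrative sketch): a binary search over [1, max_position_embeddings] for the largest seq_len whose cost fits the budget. The function `max_seq_len_within` and the closure-based cost are hypothetical; the sketch assumes the cost never decreases as seq_len grows, which is what makes binary search valid.

```rust
/// Largest seq_len in [1, max_position_embeddings] with cost(seq_len) <= vram_bytes.
/// Returns None if even seq_len = 1 does not fit.
/// Assumption: `cost` is non-decreasing in seq_len.
fn max_seq_len_within<F>(max_position_embeddings: usize, vram_bytes: usize, cost: F) -> Option<usize>
where
    F: Fn(usize) -> usize,
{
    if max_position_embeddings == 0 || cost(1) > vram_bytes {
        return None;
    }
    // Invariant: cost(lo) <= vram_bytes, and the answer lies in [lo, hi].
    let (mut lo, mut hi) = (1usize, max_position_embeddings);
    while lo < hi {
        // Upper midpoint, so that `lo = mid` always makes progress.
        let mid = lo + (hi - lo + 1) / 2;
        if cost(mid) <= vram_bytes {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    Some(lo)
}

fn main() {
    // Toy cost model: a fixed part plus a per-token part (made-up numbers).
    let cost = |seq_len: usize| 1000 + 10 * seq_len;
    println!("{:?}", max_seq_len_within(4096, 1500, cost));
    println!("{:?}", max_seq_len_within(4096, 1005, cost));
}
```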

Trait Implementations§

Source§

impl Clone for TransformerConfig

Source§

fn clone(&self) -> TransformerConfig

Returns a duplicate of the value.
1.0.0 (const: unstable) · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.
Source§

impl Debug for TransformerConfig

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
Source§

impl<'de> Deserialize<'de> for TransformerConfig

Source§

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.
Source§

impl Serialize for TransformerConfig

Source§

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> Conv for T

Source§

fn conv<T>(self) -> T
where Self: Into<T>,

Converts self into T using Into<T>. Read more
Source§

impl<T> Downcast<T> for T

Source§

fn downcast(&self) -> &T

Source§

impl<T> FmtForward for T

Source§

fn fmt_binary(self) -> FmtBinary<Self>
where Self: Binary,

Causes self to use its Binary implementation when Debug-formatted.
Source§

fn fmt_display(self) -> FmtDisplay<Self>
where Self: Display,

Causes self to use its Display implementation when Debug-formatted.
Source§

fn fmt_lower_exp(self) -> FmtLowerExp<Self>
where Self: LowerExp,

Causes self to use its LowerExp implementation when Debug-formatted.
Source§

fn fmt_lower_hex(self) -> FmtLowerHex<Self>
where Self: LowerHex,

Causes self to use its LowerHex implementation when Debug-formatted.
Source§

fn fmt_octal(self) -> FmtOctal<Self>
where Self: Octal,

Causes self to use its Octal implementation when Debug-formatted.
Source§

fn fmt_pointer(self) -> FmtPointer<Self>
where Self: Pointer,

Causes self to use its Pointer implementation when Debug-formatted.
Source§

fn fmt_upper_exp(self) -> FmtUpperExp<Self>
where Self: UpperExp,

Causes self to use its UpperExp implementation when Debug-formatted.
Source§

fn fmt_upper_hex(self) -> FmtUpperHex<Self>
where Self: UpperHex,

Causes self to use its UpperHex implementation when Debug-formatted.
Source§

fn fmt_list(self) -> FmtList<Self>
where for<'a> &'a Self: IntoIterator,

Formats each item in a sequence. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> FromRef<T> for T
where T: Clone,

Source§

fn from_ref(input: &T) -> T

Converts to this type from a reference to the input type.
Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper. Read more
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T> Pipe for T
where T: ?Sized,

Source§

fn pipe<R>(self, func: impl FnOnce(Self) -> R) -> R
where Self: Sized,

Pipes by value. This is generally the method you want to use. Read more
Source§

fn pipe_ref<'a, R>(&'a self, func: impl FnOnce(&'a Self) -> R) -> R
where R: 'a,

Borrows self and passes that borrow into the pipe function. Read more
Source§

fn pipe_ref_mut<'a, R>(&'a mut self, func: impl FnOnce(&'a mut Self) -> R) -> R
where R: 'a,

Mutably borrows self and passes that borrow into the pipe function. Read more
Source§

fn pipe_borrow<'a, B, R>(&'a self, func: impl FnOnce(&'a B) -> R) -> R
where Self: Borrow<B>, B: 'a + ?Sized, R: 'a,

Borrows self, then passes self.borrow() into the pipe function. Read more
Source§

fn pipe_borrow_mut<'a, B, R>( &'a mut self, func: impl FnOnce(&'a mut B) -> R, ) -> R
where Self: BorrowMut<B>, B: 'a + ?Sized, R: 'a,

Mutably borrows self, then passes self.borrow_mut() into the pipe function. Read more
Source§

fn pipe_as_ref<'a, U, R>(&'a self, func: impl FnOnce(&'a U) -> R) -> R
where Self: AsRef<U>, U: 'a + ?Sized, R: 'a,

Borrows self, then passes self.as_ref() into the pipe function.
Source§

fn pipe_as_mut<'a, U, R>(&'a mut self, func: impl FnOnce(&'a mut U) -> R) -> R
where Self: AsMut<U>, U: 'a + ?Sized, R: 'a,

Mutably borrows self, then passes self.as_mut() into the pipe function.
Source§

fn pipe_deref<'a, T, R>(&'a self, func: impl FnOnce(&'a T) -> R) -> R
where Self: Deref<Target = T>, T: 'a + ?Sized, R: 'a,

Borrows self, then passes self.deref() into the pipe function.
Source§

fn pipe_deref_mut<'a, T, R>( &'a mut self, func: impl FnOnce(&'a mut T) -> R, ) -> R
where Self: DerefMut<Target = T> + Deref, T: 'a + ?Sized, R: 'a,

Mutably borrows self, then passes self.deref_mut() into the pipe function.
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer. Read more
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
Source§

impl<T> PolicyExt for T
where T: ?Sized,

Source§

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow. Read more
Source§

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow. Read more
Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<T> Tap for T

Source§

fn tap(self, func: impl FnOnce(&Self)) -> Self

Immutable access to a value. Read more
Source§

fn tap_mut(self, func: impl FnOnce(&mut Self)) -> Self

Mutable access to a value. Read more
Source§

fn tap_borrow<B>(self, func: impl FnOnce(&B)) -> Self
where Self: Borrow<B>, B: ?Sized,

Immutable access to the Borrow<B> of a value. Read more
Source§

fn tap_borrow_mut<B>(self, func: impl FnOnce(&mut B)) -> Self
where Self: BorrowMut<B>, B: ?Sized,

Mutable access to the BorrowMut<B> of a value. Read more
Source§

fn tap_ref<R>(self, func: impl FnOnce(&R)) -> Self
where Self: AsRef<R>, R: ?Sized,

Immutable access to the AsRef<R> view of a value. Read more
Source§

fn tap_ref_mut<R>(self, func: impl FnOnce(&mut R)) -> Self
where Self: AsMut<R>, R: ?Sized,

Mutable access to the AsMut<R> view of a value. Read more
Source§

fn tap_deref<T>(self, func: impl FnOnce(&T)) -> Self
where Self: Deref<Target = T>, T: ?Sized,

Immutable access to the Deref::Target of a value. Read more
Source§

fn tap_deref_mut<T>(self, func: impl FnOnce(&mut T)) -> Self
where Self: DerefMut<Target = T> + Deref, T: ?Sized,

Mutable access to the Deref::Target of a value. Read more
Source§

fn tap_dbg(self, func: impl FnOnce(&Self)) -> Self

Calls .tap() only in debug builds, and is erased in release builds.
Source§

fn tap_mut_dbg(self, func: impl FnOnce(&mut Self)) -> Self

Calls .tap_mut() only in debug builds, and is erased in release builds.
Source§

fn tap_borrow_dbg<B>(self, func: impl FnOnce(&B)) -> Self
where Self: Borrow<B>, B: ?Sized,

Calls .tap_borrow() only in debug builds, and is erased in release builds.
Source§

fn tap_borrow_mut_dbg<B>(self, func: impl FnOnce(&mut B)) -> Self
where Self: BorrowMut<B>, B: ?Sized,

Calls .tap_borrow_mut() only in debug builds, and is erased in release builds.
Source§

fn tap_ref_dbg<R>(self, func: impl FnOnce(&R)) -> Self
where Self: AsRef<R>, R: ?Sized,

Calls .tap_ref() only in debug builds, and is erased in release builds.
Source§

fn tap_ref_mut_dbg<R>(self, func: impl FnOnce(&mut R)) -> Self
where Self: AsMut<R>, R: ?Sized,

Calls .tap_ref_mut() only in debug builds, and is erased in release builds.
Source§

fn tap_deref_dbg<T>(self, func: impl FnOnce(&T)) -> Self
where Self: Deref<Target = T>, T: ?Sized,

Calls .tap_deref() only in debug builds, and is erased in release builds.
Source§

fn tap_deref_mut_dbg<T>(self, func: impl FnOnce(&mut T)) -> Self
where Self: DerefMut<Target = T> + Deref, T: ?Sized,

Calls .tap_deref_mut() only in debug builds, and is erased in release builds.
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T> TryConv for T

Source§

fn try_conv<T>(self) -> Result<T, Self::Error>
where Self: TryInto<T>,

Attempts to convert self into T using TryInto<T>. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<S, T> Upcast<T> for S
where T: UpcastFrom<S> + ?Sized, S: ?Sized,

Source§

fn upcast(&self) -> &T
where Self: ErasableGeneric, T: ErasableGeneric<Repr = Self::Repr>,

Perform a zero-cost type-safe upcast to a wider ref type within the Wasm bindgen generics type system. Read more
Source§

fn upcast_into(self) -> T
where Self: Sized + ErasableGeneric, T: ErasableGeneric<Repr = Self::Repr>,

Perform a zero-cost type-safe upcast to a wider type within the Wasm bindgen generics type system. Read more
Source§

impl<T> Upcast<T> for T

Source§

fn upcast(&self) -> Option<&T>

Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper. Read more
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper. Read more
Source§

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,

Source§

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,

Source§

impl<T> WasmNotSend for T
where T: Send,

Source§

impl<T> WasmNotSendSync for T

Source§

impl<T> WasmNotSync for T
where T: Sync,