pub enum Dtype {
    F32,
    F16,
    U32,
    I32,
    I8,
}
Variants§
F32
32-bit IEEE float. Default activation / weight dtype on the CPU path; fallback for backends without F16 hardware support.
F16
16-bit IEEE half. Hot-path dtype on CUDA + Metal (decode q, K/V, GEMM outputs).
U32
32-bit unsigned integer. Block tables, context lens, sorted
token ids, args buffers — anything previously tunneled through
an FP buffer via alloc_u32 / write_u32.
I32
32-bit signed integer. Expert ids, position offsets,
cu_seqlens_q, tpe (tokens-per-expert). Same byte width as
U32; separate variant so kernel signatures
(device const int* vs device const uint*) can stay
type-honest at runtime.
I8
8-bit signed integer. INT8 quantized KV cache cells. Used by
KvCacheQuant<B, KvInt8>’s paged stores.
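The byte widths and the signed/unsigned split described for the variants above can be sketched in code. This is a minimal illustration and not part of the documented API: the enum is redeclared locally so the block compiles on its own, and `size_in_bytes`, `kernel_elem_type`, and `buffer_bytes` are hypothetical helper names. The `float`, `half`, and `char` spellings are assumed for illustration; only `int` and `uint` appear in the variant descriptions.

```rust
/// Local mirror of the documented enum, redeclared so this sketch is self-contained.
#[derive(Clone, Copy, PartialEq, Eq)]
pub enum Dtype {
    F32,
    F16,
    U32,
    I32,
    I8,
}

impl Dtype {
    /// Hypothetical helper: number of bytes one element occupies.
    pub const fn size_in_bytes(self) -> usize {
        match self {
            Dtype::F32 | Dtype::U32 | Dtype::I32 => 4,
            Dtype::F16 => 2,
            Dtype::I8 => 1,
        }
    }

    /// Hypothetical helper: element type a kernel signature might name.
    /// U32 and I32 share a byte width but map to different pointer types,
    /// which is why they are separate variants.
    pub const fn kernel_elem_type(self) -> &'static str {
        match self {
            Dtype::F32 => "float", // assumed spelling
            Dtype::F16 => "half",  // assumed spelling
            Dtype::U32 => "uint",
            Dtype::I32 => "int",
            Dtype::I8 => "char",   // assumed spelling
        }
    }
}

/// Hypothetical helper: byte length of a buffer holding `len` elements,
/// or `None` if the multiplication would overflow `usize`.
pub fn buffer_bytes(dtype: Dtype, len: usize) -> Option<usize> {
    len.checked_mul(dtype.size_in_bytes())
}
```

Under these assumptions, `Dtype::I32` and `Dtype::U32` report the same size but different kernel element types, so a runtime check can reject a buffer bound to the wrong signature even though the byte layout would match.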
Implementations§
Trait Implementations§
impl Copy for Dtype
impl Eq for Dtype
impl StructuralPartialEq for Dtype
Auto Trait Implementations§
impl Freeze for Dtype
impl RefUnwindSafe for Dtype
impl Send for Dtype
impl Sync for Dtype
impl Unpin for Dtype
impl UnsafeUnpin for Dtype
impl UnwindSafe for Dtype
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.