pub struct KvInt8;
INT8 KV cache: stores keys and values as 8-bit integers, taking roughly half the memory of FP16, with per-token / per-channel scale factors used to dequantize. A CUDA path is planned via vLLM’s quant_kv kernels.
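To make the scale-factor idea concrete, here is a minimal sketch of symmetric per-token INT8 quantization. It is an illustration only: `quantize_token` and `dequantize_token` are hypothetical helpers that are not part of `KvInt8`, and the absmax scaling scheme is an assumption rather than the crate's confirmed method. The per-channel variant would compute one scale per column instead of one per row.

```rust
/// Hypothetical helper (not part of `KvInt8`): quantize one token's vector
/// to INT8 using a single symmetric scale derived from its largest magnitude.
fn quantize_token(row: &[f32]) -> (Vec<i8>, f32) {
    // The largest absolute value in the row maps to +/-127.
    let max_abs = row.iter().fold(0.0_f32, |m, &x| m.max(x.abs()));
    // An all-zero row gets a scale of 1.0 so we never divide by zero.
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    let quantized = row
        .iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (quantized, scale)
}

/// Hypothetical helper: recover approximate f32 values from INT8 and a scale.
fn dequantize_token(quantized: &[i8], scale: f32) -> Vec<f32> {
    quantized.iter().map(|&v| f32::from(v) * scale).collect()
}
```

Each element then costs one byte instead of the two bytes FP16 needs, plus one stored scale per token. Under this scheme the round-trip error per element is bounded by about half the scale.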
Trait Implementations
Auto Trait Implementations
impl Freeze for KvInt8
impl RefUnwindSafe for KvInt8
impl Send for KvInt8
impl Sync for KvInt8
impl Unpin for KvInt8
impl UnsafeUnpin for KvInt8
impl UnwindSafe for KvInt8
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.