Device-executed low-bit helpers for packed/static BitNet-style paths.
Structs
Functions
- cached_wgpu_packed_dot_decoder_tail_support
- cached_wgpu_packed_dot_lowrank_support
- diagnose_wgpu_packed_dot_decoder_tail
- diagnose_wgpu_packed_dot_lowrank_projection
- diagnose_wgpu_quantize_pack_activation_i8x4
- pack_decoder_input_codes_i8x4
- pack_decoder_weight_codes_i8x4
- pack_lowrank_input_codes_i8x4
- pack_lowrank_weight_codes_i8x4
- pack_rho_int8_block_device_reference
- packed_decoder_tail_device_reference
- packed_decoder_tail_grad_input_device_reference
- packed_decoder_tail_grad_weight_device_reference
- packed_lowrank_grad_input_device_reference
- packed_lowrank_grad_weight_device_reference
- packed_lowrank_projection_device_reference
- supports_packed_low_bit_device_backend
- supports_packed_rho_int8_block_device_backend
- try_cube_fused_packed_decoder_tail_wgpu
- try_cube_fused_packed_lowrank_projection_wgpu
- try_fused_packed_decoder_tail
- try_fused_packed_decoder_tail_grad_input
- try_fused_packed_decoder_tail_grad_weight
- try_fused_packed_decoder_tail_training_autodiff
- try_fused_packed_lowrank_grad_input
- try_fused_packed_lowrank_grad_weight
- try_fused_packed_lowrank_projection
- try_fused_packed_lowrank_training_autodiff
- try_fused_packed_lowrank_training_autodiff_cuda_device_projection_scale
- try_raw_cuda_packed_decoder_tail
- try_raw_cuda_packed_decoder_tail_device_scale
- try_raw_cuda_packed_decoder_tail_grad_input
- try_raw_cuda_packed_decoder_tail_grad_weight
- try_raw_cuda_packed_decoder_tail_prepacked_input
- try_raw_cuda_packed_decoder_tail_prepacked_input_device_scale
- try_raw_cuda_packed_lowrank_grad_input
- try_raw_cuda_packed_lowrank_grad_weight
- try_raw_cuda_packed_lowrank_projection
- try_raw_cuda_packed_lowrank_projection_device_scale
- try_raw_cuda_packed_lowrank_projection_prepacked_input
- try_raw_cuda_packed_lowrank_projection_prepacked_input_device_scale
- try_raw_cuda_quantize_pack_activation_i8x4
- try_wgpu_packed_dot_decoder_tail
- try_wgpu_packed_dot_decoder_tail_device_scale
- try_wgpu_packed_dot_decoder_tail_prepacked_input_device_scale
- try_wgpu_packed_dot_lowrank_projection
- try_wgpu_packed_dot_lowrank_projection_device_scale
- try_wgpu_packed_dot_lowrank_projection_from_f32_device_scale
- try_wgpu_packed_dot_lowrank_projection_prepacked_input_device_scale
- try_wgpu_quantize_activation_codes_i32
- try_wgpu_quantize_pack_activation_i8x4
- unpack_rho_int8_block_device_reference
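
The i8x4 suffix in several function names above suggests four signed 8-bit codes packed into one 32-bit word, consumed by an integer dot product. The following is a minimal CPU-side sketch of that general technique, not this module's implementation: the function names (`pack_i8x4`, `unpack_i8x4`, `packed_dot`), the little-endian lane order, and the `i32` accumulator are all assumptions made for illustration.

```rust
/// Pack four signed 8-bit codes into one 32-bit word, lane 0 in the low byte.
/// Hypothetical layout: the lane order actually used by this module is not
/// stated in the listing above.
fn pack_i8x4(codes: [i8; 4]) -> u32 {
    u32::from_le_bytes(codes.map(|c| c as u8))
}

/// Inverse of `pack_i8x4`: recover the four signed codes from a packed word.
fn unpack_i8x4(word: u32) -> [i8; 4] {
    word.to_le_bytes().map(|b| b as i8)
}

/// Integer dot product over two slices of packed words, accumulated in i32.
/// Extra words in the longer slice are ignored, because `zip` stops at the
/// shorter one.
fn packed_dot(a: &[u32], b: &[u32]) -> i32 {
    a.iter()
        .zip(b.iter())
        .map(|(&x, &y)| {
            let xs = unpack_i8x4(x);
            let ys = unpack_i8x4(y);
            xs.iter()
                .zip(ys.iter())
                .map(|(&p, &q)| i32::from(p) * i32::from(q))
                .sum::<i32>()
        })
        .sum()
}
```

With ternary BitNet-style weights such as `[1, -1, 0, 1]` and activations `[2, 3, 4, -5]`, `packed_dot(&[pack_i8x4([1, -1, 0, 1])], &[pack_i8x4([2, 3, 4, -5])])` evaluates to `-6`. A device kernel would typically replace the per-lane loop with a hardware 4-way dot instruction where one is available.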