Module low_bit

Device-executed low-bit helpers for packed/static BitNet-style paths.
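
The `i8x4` suffix on several functions in this module suggests that four signed 8-bit codes are packed into one 32-bit word. The sketch below illustrates that general idea on the host. It is not this module's implementation: the helper names `pack_i8x4`, `unpack_i8x4`, and `dot_i8x4` are hypothetical, and the little-endian lane order is an assumption that the actual device kernels may not share.

```rust
/// Hypothetical helper (not part of this module): pack four signed 8-bit
/// codes into one u32. Lane 0 lands in the lowest byte; that little-endian
/// lane order is an assumption.
fn pack_i8x4(codes: [i8; 4]) -> u32 {
    // Reinterpret each i8 as its two's-complement byte, then join the bytes.
    u32::from_le_bytes(codes.map(|c| c as u8))
}

/// Hypothetical inverse of `pack_i8x4`: split a u32 back into four i8 lanes.
fn unpack_i8x4(word: u32) -> [i8; 4] {
    word.to_le_bytes().map(|b| b as i8)
}

/// Hypothetical packed dot product: multiply matching lanes of two packed
/// words and accumulate in i32 so the sum cannot overflow an 8-bit lane.
fn dot_i8x4(a: u32, b: u32) -> i32 {
    let (a, b) = (unpack_i8x4(a), unpack_i8x4(b));
    a.iter()
        .zip(b.iter())
        .map(|(&x, &y)| i32::from(x) * i32::from(y))
        .sum()
}
```

With BitNet-style ternary weights, each weight lane would hold -1, 0, or 1, so `dot_i8x4(pack_i8x4([1, -1, 0, 1]), pack_i8x4([3, 2, 5, -4]))` reduces to signed additions of the activation codes.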

Structs

PackedRhoInt8BlockDeviceTensors

Functions

cached_wgpu_packed_dot_decoder_tail_support
cached_wgpu_packed_dot_lowrank_support
diagnose_wgpu_packed_dot_decoder_tail
diagnose_wgpu_packed_dot_lowrank_projection
diagnose_wgpu_quantize_pack_activation_i8x4
pack_decoder_input_codes_i8x4
pack_decoder_weight_codes_i8x4
pack_lowrank_input_codes_i8x4
pack_lowrank_weight_codes_i8x4
pack_rho_int8_block_device_reference
packed_decoder_tail_device_reference
packed_decoder_tail_grad_input_device_reference
packed_decoder_tail_grad_weight_device_reference
packed_lowrank_grad_input_device_reference
packed_lowrank_grad_weight_device_reference
packed_lowrank_projection_device_reference
supports_packed_low_bit_device_backend
supports_packed_rho_int8_block_device_backend
try_cube_fused_packed_decoder_tail_wgpu
try_cube_fused_packed_lowrank_projection_wgpu
try_fused_packed_decoder_tail
try_fused_packed_decoder_tail_grad_input
try_fused_packed_decoder_tail_grad_weight
try_fused_packed_decoder_tail_training_autodiff
try_fused_packed_lowrank_grad_input
try_fused_packed_lowrank_grad_weight
try_fused_packed_lowrank_projection
try_fused_packed_lowrank_training_autodiff
try_fused_packed_lowrank_training_autodiff_cuda_device_projection_scale
try_raw_cuda_packed_decoder_tail
try_raw_cuda_packed_decoder_tail_device_scale
try_raw_cuda_packed_decoder_tail_grad_input
try_raw_cuda_packed_decoder_tail_grad_weight
try_raw_cuda_packed_decoder_tail_prepacked_input
try_raw_cuda_packed_decoder_tail_prepacked_input_device_scale
try_raw_cuda_packed_lowrank_grad_input
try_raw_cuda_packed_lowrank_grad_weight
try_raw_cuda_packed_lowrank_projection
try_raw_cuda_packed_lowrank_projection_device_scale
try_raw_cuda_packed_lowrank_projection_prepacked_input
try_raw_cuda_packed_lowrank_projection_prepacked_input_device_scale
try_raw_cuda_quantize_pack_activation_i8x4
try_wgpu_packed_dot_decoder_tail
try_wgpu_packed_dot_decoder_tail_device_scale
try_wgpu_packed_dot_decoder_tail_prepacked_input_device_scale
try_wgpu_packed_dot_lowrank_projection
try_wgpu_packed_dot_lowrank_projection_device_scale
try_wgpu_packed_dot_lowrank_projection_from_f32_device_scale
try_wgpu_packed_dot_lowrank_projection_prepacked_input_device_scale
try_wgpu_quantize_activation_codes_i32
try_wgpu_quantize_pack_activation_i8x4
unpack_rho_int8_block_device_reference