Module fp8

FP8 (float8_e4m3fn) dequantization support.

Models like Qwen3.5-27B-FP8 store most weight tensors in F8_E4M3 format with per-block scale factors (weight_scale_inv). This module provides a custom VarBuilder backend that transparently dequantizes FP8 weights at load time, allowing cake to run FP8-quantized models on any backend (CUDA, Metal, CPU).

Dequantization formula (block size 128×128): bf16_weight[i*128..(i+1)*128, j*128..(j+1)*128] = cast(fp8_weight[…same range…]) * scale_inv[i, j]
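The block-wise dequantization above can be sketched in plain Rust. This is an illustrative sketch only: it uses `f32` slices in a row-major layout in place of the real F8_E4M3 and bf16 tensor types, and the function name `dequantize_blockwise` is hypothetical; the actual module performs this inside a VarBuilder backend on framework tensors.

```rust
// Block size used by FP8 block-wise quantization in this scheme.
const BLOCK: usize = 128;

/// Multiply each 128x128 block of `weight` (rows x cols, row-major)
/// by its per-block scale from `scale_inv`, whose shape is
/// ceil(rows/128) x ceil(cols/128), also row-major.
/// Hypothetical helper; real code operates on tensors, not slices.
fn dequantize_blockwise(weight: &[f32], rows: usize, cols: usize, scale_inv: &[f32]) -> Vec<f32> {
    let scale_cols = (cols + BLOCK - 1) / BLOCK;
    let mut out = vec![0.0f32; rows * cols];
    for r in 0..rows {
        for c in 0..cols {
            // scale_inv[i, j] applies to the whole 128x128 block (i, j).
            let s = scale_inv[(r / BLOCK) * scale_cols + (c / BLOCK)];
            out[r * cols + c] = weight[r * cols + c] * s;
        }
    }
    out
}
```

Edge blocks (when a dimension is not a multiple of 128) are handled here by the ceiling division on `scale_cols` and the `r / BLOCK`, `c / BLOCK` indexing, which maps every trailing partial block to its own scale entry.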

Functions

is_fp8_quantized
Check whether a model uses FP8 block-wise quantization by looking at its config.
load_fp8_var_builder
Create a VarBuilder that transparently dequantizes FP8 weights.