Enum QuantScheme 

pub enum QuantScheme {
    Int8Block { block_size: u32 },
    Int8BlockAsym { block_size: u32 },
    Int4Block { block_size: u32 },
    Fp8E4m3,
    Fp8E5m2,
    GgufQ4K,
    GgufQ5K,
    GgufQ6K,
    GgufQ8K,
    GgufQ2K,
    GgufQ3K,
    GgufQ4_0,
    GgufQ8_0,
    Nvfp4Block,
}

How a tensor is quantized. Mirrors the schemes RLX needs for LLM inference on Apple Silicon: blockwise int8 (GPTQ-style), blockwise int4 (Q4_K), and per-tensor fp8 (e4m3 / e5m2).

Each variant carries the parameters the dequantizer needs to read at runtime — scale, zero-point, block size. Where these live in the actual weight tensor is up to the loader (#56).

Variants

Int8Block

Symmetric int8 with one scale per block_size elements.
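As a rough illustration of what "one scale per block_size elements" implies for a dequantizer, here is a minimal sketch. The function name and the assumption that scales arrive as a flat f32 slice are hypothetical; the crate's actual loader and tensor layout may differ.

```rust
/// Dequantize symmetric blockwise int8: x[i] = q[i] * scales[i / block_size].
/// `scales` holds one f32 per block; this side-tensor layout is an assumption.
fn dequant_int8_sym(q: &[i8], scales: &[f32], block_size: usize) -> Vec<f32> {
    assert!(block_size > 0, "block_size must be non-zero");
    // One scale per (possibly partial) block.
    assert_eq!(scales.len(), (q.len() + block_size - 1) / block_size);
    q.iter()
        .enumerate()
        .map(|(i, &v)| f32::from(v) * scales[i / block_size])
        .collect()
}
```

With block_size = 2, the quants [1, -2, 3, 4] and scales [0.5, 2.0] decode to [0.5, -1.0, 6.0, 8.0].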

Fields

block_size: u32

Int8BlockAsym

Asymmetric int8 with scale + zero-point per block_size elements.
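The asymmetric case can be sketched in the same way. This assumes the common convention x = scale * (q - zero_point) with an int8 zero-point per block; the page does not state which convention the crate uses, so treat both the formula and the hypothetical function below as illustrative only.

```rust
/// Dequantize asymmetric blockwise int8 under the assumed convention
/// x[i] = scales[b] * (q[i] - zero_points[b]), where b = i / block_size.
fn dequant_int8_asym(
    q: &[i8],
    scales: &[f32],
    zero_points: &[i8],
    block_size: usize,
) -> Vec<f32> {
    assert!(block_size > 0, "block_size must be non-zero");
    let blocks = (q.len() + block_size - 1) / block_size;
    assert_eq!(scales.len(), blocks);
    assert_eq!(zero_points.len(), blocks);
    q.iter()
        .enumerate()
        .map(|(i, &v)| {
            let b = i / block_size;
            // Widen to i32 first so that q - zp cannot overflow i8.
            (i32::from(v) - i32::from(zero_points[b])) as f32 * scales[b]
        })
        .collect()
}
```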

Fields

block_size: u32

Int4Block

Int4 packed two-per-byte, scale per block_size elements (Q4_K-ish; matches GGUF block layout).
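To make "packed two-per-byte" concrete, the sketch below unpacks nibbles and applies a per-block scale. The nibble order (low nibble first) and the unsigned storage with an implicit offset of 8 are assumptions borrowed from common GGUF-style layouts, not something this page confirms; the function names are hypothetical.

```rust
/// Unpack two 4-bit values per byte.
/// Assumptions: low nibble comes first, and values are stored as
/// unsigned 0..=15 with an implicit offset of 8 (so they map to -8..=7).
fn unpack_int4(packed: &[u8]) -> Vec<i8> {
    let mut out = Vec::with_capacity(packed.len() * 2);
    for &b in packed {
        out.push((b & 0x0F) as i8 - 8);
        out.push((b >> 4) as i8 - 8);
    }
    out
}

/// Apply one scale per `block_size` unpacked elements.
fn dequant_int4(packed: &[u8], scales: &[f32], block_size: usize) -> Vec<f32> {
    assert!(block_size > 0, "block_size must be non-zero");
    unpack_int4(packed)
        .iter()
        .enumerate()
        .map(|(i, &v)| f32::from(v) * scales[i / block_size])
        .collect()
}
```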

Fields

block_size: u32

Fp8E4m3

FP8 e4m3 (no scale; same domain as half).

Fp8E5m2

FP8 e5m2 (no scale; wider range than e4m3).
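The range difference between the two FP8 variants follows from their bit layouts. The sketch below decodes both, assuming the OCP-style encodings: e4m3 with exponent bias 7, no infinities, and a single NaN pattern per sign; e5m2 with exponent bias 15 and IEEE-like infinities and NaNs. Whether the crate uses exactly these encodings is an assumption, and the function names are hypothetical.

```rust
/// Decode one FP8 e4m3 byte (assumed OCP "FN" flavour: bias 7, no inf).
fn fp8_e4m3_to_f32(b: u8) -> f32 {
    let sign = if b & 0x80 != 0 { -1.0f32 } else { 1.0 };
    let exp = i32::from((b >> 3) & 0x0F);
    let man = f32::from(b & 0x07);
    if exp == 0x0F && (b & 0x07) == 0x07 {
        return f32::NAN; // the only NaN pattern; there is no infinity
    }
    let mag = if exp == 0 {
        man / 8.0 * 2.0f32.powi(-6) // subnormal
    } else {
        (1.0 + man / 8.0) * 2.0f32.powi(exp - 7)
    };
    sign * mag
}

/// Decode one FP8 e5m2 byte (assumed IEEE-like: bias 15, with inf and NaN).
fn fp8_e5m2_to_f32(b: u8) -> f32 {
    let sign = if b & 0x80 != 0 { -1.0f32 } else { 1.0 };
    let exp = i32::from((b >> 2) & 0x1F);
    let man = f32::from(b & 0x03);
    let mag = match exp {
        0 => man / 4.0 * 2.0f32.powi(-14), // subnormal
        0x1F => {
            if man == 0.0 {
                f32::INFINITY
            } else {
                f32::NAN
            }
        }
        _ => (1.0 + man / 4.0) * 2.0f32.powi(exp - 15),
    };
    sign * mag
}
```

Under these assumptions the largest finite e4m3 value is 448 (byte 0x7E), while e5m2 reaches 57344 (byte 0x7B) at the cost of one mantissa bit.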

GgufQ4K

GGUF / llama.cpp Q4_K super-block (256 elements / 144 bytes). Packs an f16 super-scale, an f16 super-min, 8 sub-block 6-bit scales and 8 sub-block 6-bit mins (12 bytes together), and 128 bytes of packed nibbles (256 4-bit quants). Block layout is fixed by the format, so there is no block_size knob.
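The 144-byte figure can be checked by adding up the fields. The constant names below are illustrative and not taken from the crate.

```rust
/// Elements per K-quant super-block.
const QK_K: usize = 256;
/// f16 super-scale.
const Q4K_D_BYTES: usize = 2;
/// f16 super-min.
const Q4K_DMIN_BYTES: usize = 2;
/// 8 scales + 8 mins at 6 bits each = 96 bits = 12 bytes.
const Q4K_SCALES_BYTES: usize = (8 + 8) * 6 / 8;
/// 256 quants at 4 bits each = 128 bytes.
const Q4K_QS_BYTES: usize = QK_K * 4 / 8;
/// Total bytes per super-block.
const Q4K_BLOCK_BYTES: usize =
    Q4K_D_BYTES + Q4K_DMIN_BYTES + Q4K_SCALES_BYTES + Q4K_QS_BYTES;
```

The total is 2 + 2 + 12 + 128 = 144 bytes for 256 elements, which is 4.5 bits per element.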

GgufQ5K

GGUF Q5_K (256 / 176 bytes). Adds a 32-byte high-bit plane on top of Q4_K.

GgufQ6K

GGUF Q6_K (256 / 210 bytes). Per-sub-block signed scales, no min term.

GgufQ8K

GGUF Q8_K (256 / 292 bytes). Per-super-block f32 scale (4 bytes) plus 256 i8 quants and a 32-byte sum-of-blocks table (16 × i16) that’s only used by Q8_K × Q8_K matmul accumulation paths.

GgufQ2K

GGUF Q2_K (256 / 84 bytes). 2-bit quants with per-sub-block scale/min.

GgufQ3K

GGUF Q3_K (256 / 110 bytes). 3-bit quants with hmask high bit plane.

GgufQ4_0

GGUF Q4_0 (32 / 18 bytes). Legacy llama.cpp block: f16 scale + nibbles.

GgufQ8_0

GGUF Q8_0 (32 / 34 bytes). Legacy block: f16 scale + 32×i8 quants.
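The two legacy layouts (Q4_0 and Q8_0) are simple enough to decode by hand. The sketch below follows the llama.cpp convention: a little-endian f16 scale first; for Q4_0 the low nibbles hold elements 0..16 and the high nibbles elements 16..32, each offset by 8. The function names are hypothetical, and the small f16 decoder stands in for whatever half-precision type a real implementation would use.

```rust
/// Decode IEEE 754 binary16 bits to f32 (stand-in for a proper f16 type).
fn f16_to_f32(h: u16) -> f32 {
    let sign = if h & 0x8000 != 0 { -1.0f32 } else { 1.0 };
    let exp = i32::from((h >> 10) & 0x1F);
    let man = f32::from(h & 0x03FF);
    let mag = match exp {
        0 => man / 1024.0 * 2.0f32.powi(-14), // subnormal
        0x1F => {
            if man == 0.0 {
                f32::INFINITY
            } else {
                f32::NAN
            }
        }
        _ => (1.0 + man / 1024.0) * 2.0f32.powi(exp - 15),
    };
    sign * mag
}

/// Q4_0: 2-byte f16 scale followed by 16 bytes holding 32 nibbles.
fn dequant_q4_0_block(block: &[u8; 18]) -> [f32; 32] {
    let d = f16_to_f32(u16::from_le_bytes([block[0], block[1]]));
    let mut out = [0.0f32; 32];
    for j in 0..16 {
        let byte = block[2 + j];
        // Low nibble is element j, high nibble is element j + 16.
        out[j] = (i32::from(byte & 0x0F) - 8) as f32 * d;
        out[j + 16] = (i32::from(byte >> 4) - 8) as f32 * d;
    }
    out
}

/// Q8_0: 2-byte f16 scale followed by 32 signed 8-bit quants.
fn dequant_q8_0_block(block: &[u8; 34]) -> [f32; 32] {
    let d = f16_to_f32(u16::from_le_bytes([block[0], block[1]]));
    let mut out = [0.0f32; 32];
    for j in 0..32 {
        out[j] = f32::from(block[2 + j] as i8) * d;
    }
    out
}
```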

Nvfp4Block

NVIDIA FP4 (E2M1) block — fixed 16-element groups, FP8 E4M3 block scales, optional f32 global scale on input 3 (legacy zp slot). Used by FLUX.2 / MLX nvfp4 checkpoints.
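For orientation, an E2M1 code has one sign bit, two exponent bits, and one mantissa bit, which yields the magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, and 6. The sketch below decodes one 16-element group. It assumes low-nibble-first packing and takes the block scale as an already decoded f32 to stay short; both choices, and the function names, are illustrative rather than taken from the crate.

```rust
/// Magnitudes of the eight non-negative E2M1 codes.
const E2M1_MAGNITUDES: [f32; 8] = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0];

/// Decode a 4-bit E2M1 code; bit 3 is the sign.
fn e2m1_to_f32(code: u8) -> f32 {
    let mag = E2M1_MAGNITUDES[usize::from(code & 0x07)];
    if code & 0x08 != 0 {
        -mag
    } else {
        mag
    }
}

/// Dequantize one 16-element group stored in 8 bytes.
/// Assumptions: low nibble first; `block_scale` is the group's FP8 E4M3
/// scale already converted to f32; `global_scale` is 1.0 when absent.
fn dequant_nvfp4_group(packed: &[u8; 8], block_scale: f32, global_scale: f32) -> [f32; 16] {
    let mut out = [0.0f32; 16];
    for (i, &b) in packed.iter().enumerate() {
        out[2 * i] = e2m1_to_f32(b & 0x0F) * block_scale * global_scale;
        out[2 * i + 1] = e2m1_to_f32(b >> 4) * block_scale * global_scale;
    }
    out
}
```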

Implementations

impl QuantScheme

pub const fn bits_per_element_x10(self) -> u32

Bits per element after packing, multiplied by 10 so that the fractional bit budgets of the K-quants stay representable as an integer. Divide by 10 when comparing.

pub const fn bits_per_element(self) -> u32

Bits per element after packing (rounded down). Use bits_per_element_x10 for the K-quant fractional values.
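The fractional budgets fall out of the block sizes listed under the variants: bits per element is block bytes × 8 / block elements. The helper below is a hypothetical illustration of that arithmetic with truncating integer division; the values the crate actually returns may be rounded differently.

```rust
/// Bits per element times 10, from a block's element count and byte size.
/// Integer division truncates, e.g. 6.5625 bits becomes 65.
const fn bits_x10(block_elems: u32, block_bytes: u32) -> u32 {
    block_bytes * 8 * 10 / block_elems
}

/// Whole bits per element, rounded down.
const fn bits_floor(block_elems: u32, block_bytes: u32) -> u32 {
    block_bytes * 8 / block_elems
}
```

For example, Q4_K (256 / 144) gives 45, meaning 4.5 bits, while Q6_K (256 / 210) gives 65 and Q8_0 (32 / 34) gives 85. Rounded down to whole bits these become 4, 6, and 8.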

pub const fn has_scale(self) -> bool

True if this scheme requires a per-block scale tensor on the side.

pub const fn scale_is_fp8(self) -> bool

True for NVFP4 block scales stored as FP8 E4M3 bytes (not f32).

pub const fn nvfp4_group_size(self) -> u32

Fixed NVFP4 group size along K (0 for other schemes).

pub const fn has_zero_point(self) -> bool

True if this scheme requires a per-block zero-point.

pub const fn gguf_block_size(self) -> u32

GGUF K-quant block size (256 elements) — meaningless for the non-GGUF schemes (returns 0).

pub const fn gguf_block_bytes(self) -> u32

Bytes per GGUF super-block. 0 for non-GGUF schemes.
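One typical use of a block size and a bytes-per-block pair is sizing a packed weight buffer. The helper below is a hypothetical sketch that takes both values as plain integers, so it does not depend on the crate's types; it rejects a zero block size (the value documented for non-GGUF schemes) and element counts that are not a whole number of blocks.

```rust
/// Packed size in bytes of a tensor with `n_elements` values, or None if
/// the scheme has no GGUF block (block_size == 0) or the element count is
/// not a multiple of the block size.
fn gguf_tensor_bytes(n_elements: u64, block_size: u64, block_bytes: u64) -> Option<u64> {
    if block_size == 0 || n_elements % block_size != 0 {
        return None;
    }
    Some(n_elements / block_size * block_bytes)
}
```

A 4096 × 4096 weight in Q4_K (256 / 144) would occupy 65536 blocks × 144 bytes = 9437184 bytes.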

pub const fn is_gguf(self) -> bool

True for any GGUF-format block scheme. GGUF schemes carry their scales / mins / sub-block metadata inside the packed weight bytes — they don’t need separate scale / zp tensors fed alongside as the legacy Int8Block paths do.
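A caller might combine these predicates to decide how many side tensors to bind next to the packed weights. The sketch below uses a cut-down stand-in enum named Scheme with only three variants, so that it compiles on its own; the predicate bodies are inferred from the descriptions on this page and are not the crate's actual implementation.

```rust
/// Stand-in for QuantScheme with just enough variants for the example.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Scheme {
    Int8Block { block_size: u32 },
    Int8BlockAsym { block_size: u32 },
    GgufQ4K,
}

impl Scheme {
    /// GGUF blocks embed their scales and mins in the packed bytes.
    const fn is_gguf(self) -> bool {
        matches!(self, Scheme::GgufQ4K)
    }

    /// Assumed: both int8 block schemes need a scale tensor on the side.
    const fn has_scale(self) -> bool {
        matches!(self, Scheme::Int8Block { .. } | Scheme::Int8BlockAsym { .. })
    }

    /// Assumed: only the asymmetric scheme needs a zero-point tensor.
    const fn has_zero_point(self) -> bool {
        matches!(self, Scheme::Int8BlockAsym { .. })
    }
}

/// Number of side tensors to feed alongside the packed weights.
fn side_input_count(s: Scheme) -> usize {
    if s.is_gguf() {
        0
    } else {
        usize::from(s.has_scale()) + usize::from(s.has_zero_point())
    }
}
```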

Trait Implementations

impl Clone for QuantScheme

fn clone(&self) -> QuantScheme

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for QuantScheme

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Display for QuantScheme

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl PartialEq for QuantScheme

fn eq(&self, other: &QuantScheme) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Copy for QuantScheme

impl StructuralPartialEq for QuantScheme

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T> ToString for T
where T: Display + ?Sized,

fn to_string(&self) -> String

Converts the given value to a String.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.