

Function set_attn_rot_disabled 

pub fn set_attn_rot_disabled(disabled: bool)

Control the TurboQuant attention-rotation feature globally.

When enabled (the default), llama.cpp applies a Hadamard rotation to the Q/K/V tensors before storing them in the KV cache. This significantly improves KV-cache quantization quality at near-zero overhead, as described in llama.cpp PR #21038.

This function sets or clears the LLAMA_ATTN_ROT_DISABLE environment variable, which llama.cpp reads once, when a context (and its KV cache) is first created. Call it before creating any LlamaContext in the current process; calling it afterwards has no effect on existing contexts.

§Thread safety

Mutating environment variables while other threads may be reading them is undefined behaviour. Call this function before spawning any threads that use llama contexts, or ensure no contexts are being created concurrently.
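The environment-variable mechanism described above can be sketched as follows. This is a hypothetical implementation for illustration only; the crate's actual internals may differ, but the observable behaviour (set the variable to disable, remove it to restore the default) matches the description:

```rust
use std::env;

/// Sketch of what set_attn_rot_disabled might do internally.
/// When `disabled` is true, sets LLAMA_ATTN_ROT_DISABLE=1 so llama.cpp
/// skips the Hadamard rotation; otherwise removes the variable so the
/// default (rotation enabled) applies.
///
/// Note: mutating environment variables while other threads read them is
/// undefined behaviour, hence the requirement to call this before any
/// threads or contexts are created.
fn set_attn_rot_disabled(disabled: bool) {
    if disabled {
        env::set_var("LLAMA_ATTN_ROT_DISABLE", "1");
    } else {
        env::remove_var("LLAMA_ATTN_ROT_DISABLE");
    }
}
```

Because llama.cpp reads the variable only once, at context creation, toggling it after a context exists does not change that context's behaviour.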

§Example

// Disable the rotation for benchmarking purposes:
llama_cpp_4::quantize::set_attn_rot_disabled(true);

// Re-enable (default behaviour):
llama_cpp_4::quantize::set_attn_rot_disabled(false);