realizar 0.8.4

Pure Rust ML inference engine built from scratch - model serving for GGUF and safetensors
1
2
3
4
5
6
7
8
9
10
11
//! Attention computation for OwnedQuantizedModel
//!
//! Contains apply_rope (rotary position embeddings), causal_attention,
//! and KV-cache based attention variants with GQA support.

// RealizarError used by transitively included attention_gqa.rs
#[allow(unused_imports)]
use crate::error::{RealizarError, Result};
use crate::gguf::{GGUFConfig, OwnedQuantizedModel};

include!("flash_attention_dispatch.rs");