rullama 0.3.0

Browser-resident Gemma 4 inference: pure Rust → WebAssembly + WebGPU. Loads Ollama's on-disk GGUF blobs and runs the forward pass on the local GPU via hand-written WGSL.
//! Pure-Rust f32 forward pass for Gemma 4. The parity oracle for our WGSL kernels.
//!
//! Performance is irrelevant here; correctness against the Ollama Go reference
//! implementation (`model/models/gemma4/model_text.go` in the Ollama repository)
//! is the only thing that matters.
//!
//! Built only when the `cpu-reference` Cargo feature is enabled, to keep the WASM
//! bundle size small.

pub mod forward;
pub mod forward_chained;
pub mod forward_gpu;
pub mod ops;
pub mod weights;

pub use forward::{KvState, LayerKv, forward_token};
pub use forward_gpu::forward_token_gpu;
pub use weights::Weights;
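
The module docs describe this CPU path as the parity oracle for the WGSL kernels. As a rough illustration of what such a check could look like, the sketch below compares a reference output against a candidate output element by element. The helper names `within_tolerance` and `max_abs_diff`, and the combined absolute/relative tolerance rule, are assumptions made for this example; they are not taken from the crate.

```rust
/// Hypothetical helper (not part of rullama): true when every candidate value
/// is within `atol + rtol * |reference|` of the matching reference value.
/// A NaN on either side fails the check, because comparisons with NaN are false.
fn within_tolerance(reference: &[f32], candidate: &[f32], atol: f32, rtol: f32) -> bool {
    reference.len() == candidate.len()
        && reference
            .iter()
            .zip(candidate)
            .all(|(r, c)| (r - c).abs() <= atol + rtol * r.abs())
}

/// Hypothetical helper: largest absolute element-wise difference, or `None`
/// when the slices differ in length. NaN differences are skipped by `f32::max`,
/// so use `within_tolerance` when NaNs must be caught.
fn max_abs_diff(reference: &[f32], candidate: &[f32]) -> Option<f32> {
    if reference.len() != candidate.len() {
        return None;
    }
    Some(
        reference
            .iter()
            .zip(candidate)
            .map(|(r, c)| (r - c).abs())
            .fold(0.0_f32, f32::max),
    )
}
```

In a parity test, `reference` would hold values produced by the CPU forward pass and `candidate` the values read back from the GPU. Suitable tolerances depend on the kernel and on how weights are quantized, so the ones chosen for any given test are a judgment call.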