oxicuda-lm 0.1.0

Large language model inference primitives for OxiCUDA: BPE tokenizer, transformer layers with KV cache, GPT-2 and LLaMA architectures — pure Rust, zero CUDA SDK dependency.
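The description mentions transformer layers with a KV cache. The sketch below illustrates what such a cache does during autoregressive decoding. It is a minimal single-head version that assumes `f32` row-major storage. `KvCache`, `push`, and `attend` are hypothetical names chosen for illustration, not the oxicuda-lm API.

```rust
/// Hypothetical per-layer, single-head KV cache (illustrative only, not the oxicuda-lm API).
/// Keys and values of past positions are stored row-major, one row of `head_dim` per position.
struct KvCache {
    head_dim: usize,
    keys: Vec<f32>,
    values: Vec<f32>,
}

impl KvCache {
    fn new(head_dim: usize) -> Self {
        Self { head_dim, keys: Vec::new(), values: Vec::new() }
    }

    /// Number of cached positions.
    fn len(&self) -> usize {
        self.keys.len() / self.head_dim
    }

    /// Append the key and value vectors of the newest token.
    fn push(&mut self, k: &[f32], v: &[f32]) {
        assert_eq!(k.len(), self.head_dim);
        assert_eq!(v.len(), self.head_dim);
        self.keys.extend_from_slice(k);
        self.values.extend_from_slice(v);
    }

    /// Scaled dot-product attention of one query over every cached position.
    fn attend(&self, q: &[f32]) -> Vec<f32> {
        let d = self.head_dim;
        let scale = 1.0 / (d as f32).sqrt();
        // Score q . k_t / sqrt(d) for each cached position t.
        let scores: Vec<f32> = self
            .keys
            .chunks_exact(d)
            .map(|k| q.iter().zip(k).map(|(a, b)| a * b).sum::<f32>() * scale)
            .collect();
        // Numerically stable softmax over the scores.
        let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
        let sum: f32 = exps.iter().sum();
        // Softmax-weighted sum of the cached value rows.
        let mut out = vec![0.0; d];
        for (w, v) in exps.iter().zip(self.values.chunks_exact(d)) {
            for (o, x) in out.iter_mut().zip(v) {
                *o += w / sum * x;
            }
        }
        out
    }
}
```

Each decoding step pushes only the new token's key and value, so the keys and values of earlier positions are reused rather than recomputed.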
//! Complete LLM model implementations.
//!
//! | Module | Model family |
//! |--------|--------------|
//! | [`gpt`] | GPT-2 (token + positional embedding, LayerNorm, MLP FFN, weight-tied LM head) |
//! | [`llama`] | LLaMA-2/3, Mistral (RMSNorm, GQA, RoPE, SwiGLU, independent LM head) |
//! | [`weights`] | Per-block weight loading for both families (`load_gpt2_block`, `load_llama_block`) |

pub mod gpt;
pub mod llama;
pub mod weights;

pub use gpt::Gpt2Model;
pub use llama::LlamaModel;
pub use weights::{load_gpt2_block, load_llama_block};
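The GPT-2 row of the module table lists LayerNorm and a weight-tied LM head. The sketch below shows both on a single hidden vector. It assumes `f32` data and a row-major `[vocab_size, hidden]` token-embedding table. `layer_norm` and `tied_lm_head` are hypothetical helpers for illustration, not functions exported by the `gpt` module.

```rust
/// LayerNorm over one hidden vector: y = (x - mean) / sqrt(var + eps) * gamma + beta.
/// Hypothetical helper, not the oxicuda-lm API.
fn layer_norm(x: &[f32], gamma: &[f32], beta: &[f32], eps: f32) -> Vec<f32> {
    let n = x.len() as f32;
    let mean = x.iter().sum::<f32>() / n;
    // Biased (population) variance, as in the usual LayerNorm definition.
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
    let inv_std = 1.0 / (var + eps).sqrt();
    x.iter()
        .zip(gamma.iter().zip(beta))
        .map(|(v, (g, b))| (v - mean) * inv_std * g + b)
        .collect()
}

/// Weight-tied LM head: logits[t] = dot(h, wte[t]).
/// The token-embedding table `wte` (assumed row-major [vocab_size, hidden]) doubles as the
/// output projection, so no separate LM-head matrix is stored.
fn tied_lm_head(h: &[f32], wte: &[f32]) -> Vec<f32> {
    wte.chunks_exact(h.len())
        .map(|row| row.iter().zip(h).map(|(w, x)| w * x).sum::<f32>())
        .collect()
}
```

Tying the head to the embedding table saves a `vocab_size x hidden` matrix. The LLaMA row instead lists an independent LM head, which is stored as its own weight.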
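The LLaMA row of the module table lists RMSNorm, GQA, and a SwiGLU feed-forward network. The sketch below shows each on a single token, assuming `f32` data and row-major matrices (`w_gate` and `w_up` as `[ffn, hidden]`, `w_down` as `[hidden, ffn]`). All function names are hypothetical and are not taken from the `llama` module.

```rust
/// RMSNorm: y = x / sqrt(mean(x^2) + eps) * weight, with no mean subtraction and no bias.
/// Hypothetical helper, not the oxicuda-lm API.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let ms = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (ms + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}

/// SiLU (swish) activation: x * sigmoid(x).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Row-major matrix-vector product; each row of `w` has length `v.len()`.
fn matvec(w: &[f32], v: &[f32]) -> Vec<f32> {
    w.chunks_exact(v.len())
        .map(|row| row.iter().zip(v).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

/// SwiGLU FFN for one token: w_down * (silu(w_gate * x) elementwise-times (w_up * x)).
fn swiglu_ffn(x: &[f32], w_gate: &[f32], w_up: &[f32], w_down: &[f32]) -> Vec<f32> {
    let gate = matvec(w_gate, x);
    let up = matvec(w_up, x);
    let act: Vec<f32> = gate.iter().zip(&up).map(|(g, u)| silu(*g) * u).collect();
    matvec(w_down, &act)
}

/// GQA: consecutive groups of n_heads / n_kv_heads query heads share one KV head.
/// Assumes n_heads is a multiple of n_kv_heads.
fn kv_head_for(q_head: usize, n_heads: usize, n_kv_heads: usize) -> usize {
    q_head / (n_heads / n_kv_heads)
}
```

With 32 query heads and 8 KV heads, for example, query heads 4 through 7 all read KV head 1. This is why GQA shrinks the KV cache by a factor of `n_heads / n_kv_heads` compared with standard multi-head attention.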
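RoPE, also listed in the LLaMA row, encodes position by rotating pairs of query and key components through position-dependent angles. The minimal sketch below assumes the interleaved pairing `(x[2i], x[2i+1])` and a configurable frequency base (10000.0 in LLaMA-2). Some implementations instead pair the first and second halves of the head vector, and this sketch does not indicate which convention oxicuda-lm uses. `apply_rope` is a hypothetical name.

```rust
/// Rotary position embedding applied in place to one head vector at position `pos`.
/// Assumes interleaved pairs (x[2i], x[2i+1]) rotated by theta_i = pos * base^(-2i/d).
/// Hypothetical helper, not the oxicuda-lm API.
fn apply_rope(x: &mut [f32], pos: usize, base: f32) {
    let d = x.len();
    assert!(d % 2 == 0, "head_dim must be even");
    for i in 0..d / 2 {
        let theta = pos as f32 * base.powf(-2.0 * i as f32 / d as f32);
        let (sin, cos) = theta.sin_cos();
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        // 2-D rotation of the pair by theta.
        x[2 * i] = a * cos - b * sin;
        x[2 * i + 1] = a * sin + b * cos;
    }
}
```

Because each pair is only rotated, vector norms are preserved. The dot product of a rotated query and a rotated key depends on the difference of their positions, which gives attention its relative-position signal.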