# OxiLLaMa

Pure Rust LLM inference engine: the sovereign alternative to llama.cpp.

This is the unified meta crate that re-exports the full OxiLLaMa API surface. Each subcrate is available as a top-level module:

| Module | Description |
|---|---|
| `gguf` | GGUF v3 parser and tensor loader |
| `quant` | Quantization kernels (25 formats, SIMD-accelerated) |
| `arch` | Model architectures (8 supported) |
| `runtime` | Inference engine, KV cache, and sampling |
| `server` | OpenAI-compatible HTTP API (feature: `server`) |
| `bench` | Benchmark suite (feature: `bench`) |
| `gpu` | wgpu GPU backend (feature: `gpu`) |
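
To give the `gguf` row some concrete shape, here is a minimal, self-contained sketch of the first step any GGUF loader performs: validating and reading the fixed-size file header. This is not the `gguf` module's API. `GgufHeader` and `parse_gguf_header` are hypothetical names, and the sketch assumes the published GGUF layout (the 4-byte ASCII magic `GGUF`, a little-endian `u32` version, then little-endian `u64` tensor and metadata key/value counts). It accepts only version 3 and does not parse the metadata or tensor-info sections that follow the header.

```rust
/// Fixed-size fields at the start of a GGUF v3 file (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct GgufHeader {
    pub version: u32,
    pub tensor_count: u64,
    pub metadata_kv_count: u64,
}

/// 4 bytes magic + 4 bytes version + 8 bytes tensor count + 8 bytes metadata count.
const HEADER_LEN: usize = 24;

/// Reads a little-endian u32 from the first 4 bytes of `b`.
fn read_u32_le(b: &[u8]) -> u32 {
    let mut arr = [0u8; 4];
    arr.copy_from_slice(&b[..4]);
    u32::from_le_bytes(arr)
}

/// Reads a little-endian u64 from the first 8 bytes of `b`.
fn read_u64_le(b: &[u8]) -> u64 {
    let mut arr = [0u8; 8];
    arr.copy_from_slice(&b[..8]);
    u64::from_le_bytes(arr)
}

/// Hypothetical helper: checks the magic and version, then reads both counts.
/// Metadata key/value pairs and tensor infos after the header are not parsed.
pub fn parse_gguf_header(bytes: &[u8]) -> Result<GgufHeader, String> {
    if bytes.len() < HEADER_LEN {
        return Err(format!("need {} header bytes, got {}", HEADER_LEN, bytes.len()));
    }
    if &bytes[..4] != b"GGUF" {
        return Err("bad magic: not a GGUF file".to_string());
    }
    let version = read_u32_le(&bytes[4..8]);
    if version != 3 {
        return Err(format!("unsupported GGUF version {}", version));
    }
    Ok(GgufHeader {
        version,
        tensor_count: read_u64_le(&bytes[8..16]),
        metadata_kv_count: read_u64_le(&bytes[16..24]),
    })
}

fn main() {
    // Synthetic header: magic, version 3, 2 tensors, 5 metadata pairs.
    let mut buf = Vec::new();
    buf.extend_from_slice(b"GGUF");
    buf.extend_from_slice(&3u32.to_le_bytes());
    buf.extend_from_slice(&2u64.to_le_bytes());
    buf.extend_from_slice(&5u64.to_le_bytes());
    match parse_gguf_header(&buf) {
        Ok(header) => println!("{:?}", header),
        Err(e) => eprintln!("error: {}", e),
    }
}
```

Reading through small fixed-size arrays keeps the sketch independent of the Rust edition and of any crate; a real loader would typically memory-map the file and continue into the metadata section.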
## Quick Start
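```rust,no_run
use oxillama::runtime::{EngineConfig, InferenceEngine};

// Point the engine at a local GGUF model file; every other setting uses its default.
let config = EngineConfig {
    model_path: "model.gguf".to_string(),
    ..Default::default()
};

let mut engine = InferenceEngine::new(config);
engine.load_model().expect("failed to load model");

// Stream the tokens generated for the prompt "Hello" to stdout through the callback.
engine
    .generate("Hello", 128, |tok| print!("{tok}"))
    .expect("generation failed");
```

## Re-exports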
```rust
pub use oxillama_gguf as gguf;
pub use oxillama_quant as quant;
pub use oxillama_arch as arch;
pub use oxillama_runtime as runtime;
pub use oxillama_server as server;
pub use oxillama_bench as bench;
pub use oxillama_gpu as gpu;
```
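
Each `pub use ... as ...` line gives a subcrate a short module name, so a path through `oxillama::gguf` names the same item as the corresponding path through `oxillama_gguf`. The sketch below demonstrates that aliasing in a single file. The inline modules are hypothetical stand-ins for the real subcrates (which are separate crates), and `supported_version` and the one-field `EngineConfig` are invented for illustration only; the real `EngineConfig` is not shown in this document beyond its `model_path` field.

```rust
// Hypothetical stand-ins for two subcrates. In OxiLLaMa these are separate
// crates (`oxillama_gguf`, `oxillama_runtime`), not inline modules.
pub mod oxillama_gguf {
    /// Hypothetical function, used only to show that both paths resolve to it.
    pub fn supported_version() -> u32 {
        3
    }
}

pub mod oxillama_runtime {
    /// Hypothetical one-field config; the real type has more settings.
    pub struct EngineConfig {
        pub model_path: String,
    }
}

// The same aliasing pattern the meta crate uses for every subcrate.
pub use oxillama_gguf as gguf;
pub use oxillama_runtime as runtime;

fn main() {
    // The alias and the original path resolve to the same function.
    assert_eq!(gguf::supported_version(), oxillama_gguf::supported_version());

    // A value built through the alias has the original crate's type.
    let config = runtime::EngineConfig {
        model_path: "model.gguf".to_string(),
    };
    let same_type: oxillama_runtime::EngineConfig = config;
    println!("GGUF v{} model at {}", gguf::supported_version(), same_type.model_path);
}
```

Because the aliases are ordinary re-exports rather than copies, types obtained through `oxillama::runtime` and through a direct dependency on `oxillama_runtime` are interchangeable.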