Crate oxillama

§OxiLLaMa

Pure Rust LLM inference engine — the sovereign alternative to llama.cpp.

This is the unified meta crate that re-exports the full OxiLLaMa API surface. Each subcrate is available as a top-level module:

Module   Description
gguf     GGUF v3 parser and tensor loader
quant    Quantization kernels (25 formats, SIMD)
arch     Model architectures (8 models)
runtime  Inference engine, KV cache, sampling
server   OpenAI-compatible HTTP API (feature: server)
bench    Benchmark suite (feature: bench)
gpu      wgpu GPU backend (feature: gpu)

§Quick Start

use oxillama::runtime::{EngineConfig, InferenceEngine};

// Point the engine at a GGUF model file; all other settings keep their defaults.
let config = EngineConfig {
    model_path: "model.gguf".to_string(),
    ..Default::default()
};
let mut engine = InferenceEngine::new(config);
engine.load_model().expect("failed to load model");
// Stream the completion for the prompt "Hello", printing each token as the callback receives it.
engine.generate("Hello", 128, |tok| print!("{tok}")).expect("generation failed");
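The Quick Start prints each token from inside the callback. When the full text is needed as a value instead, the same callback shape can append to a buffer. The sketch below illustrates only that pattern: `generate_stub` and `collect_tokens` are hypothetical stand-ins written for this example, not part of the oxillama API, and the stub just splits the prompt on whitespace so that it runs without a model file.

```rust
/// Hypothetical stand-in for a streaming generator; NOT the oxillama API.
/// It emits the whitespace-separated pieces of `prompt`, at most
/// `max_tokens` of them, so the callback pattern can run without a model.
fn generate_stub<F>(prompt: &str, max_tokens: usize, mut on_token: F) -> Result<usize, String>
where
    F: FnMut(&str),
{
    if prompt.is_empty() {
        return Err("empty prompt".to_string());
    }
    let mut emitted = 0;
    for piece in prompt.split_whitespace().take(max_tokens) {
        on_token(piece);
        emitted += 1;
    }
    Ok(emitted)
}

/// Collects the streamed pieces into one String instead of printing them.
fn collect_tokens(prompt: &str, max_tokens: usize) -> Result<String, String> {
    let mut out = String::new();
    generate_stub(prompt, max_tokens, |tok| {
        // Re-insert the separator that the stub's whitespace split removed.
        if !out.is_empty() {
            out.push(' ');
        }
        out.push_str(tok);
    })?;
    Ok(out)
}

fn main() {
    let text = collect_tokens("the quick brown fox", 3).expect("generation failed");
    println!("{text}");
}
```

Taking the callback as `FnMut` is what lets the closure mutate `out`. Whether the real `generate` accepts a mutating closure depends on its actual signature, which this page does not show.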

Re-exports§

pub use oxillama_gguf as gguf;
pub use oxillama_quant as quant;
pub use oxillama_arch as arch;
pub use oxillama_runtime as runtime;
pub use oxillama_server as server;
pub use oxillama_bench as bench;
pub use oxillama_gpu as gpu;