Crate oxillama

§OxiLLaMa

Pure Rust LLM inference engine — the sovereign alternative to llama.cpp.

This is the unified meta crate that re-exports the full OxiLLaMa API surface. Each subcrate is available as a top-level module:

| Module | Description |
|--------|-------------|
| `gguf` | GGUF v3 parser and tensor loader |
| `quant` | Quantization kernels (25 formats, SIMD) |
| `arch` | Model architectures (8 models) |
| `runtime` | Inference engine, KV cache, sampling |
| `server` | OpenAI-compatible HTTP API (feature: `server`) |
| `bench` | Benchmark suite (feature: `bench`) |
| `gpu` | wgpu GPU backend (feature: `gpu`) |

§Quick Start

use oxillama::runtime::{EngineConfig, InferenceEngine};

// Point the engine at a GGUF file; all other fields keep their defaults.
let config = EngineConfig {
    model_path: "model.gguf".to_string(),
    ..Default::default()
};
let mut engine = InferenceEngine::new(config);
engine.load_model().expect("failed to load model");
// Each generated token is passed to the callback, which prints it here.
engine.generate("Hello", 128, |tok| print!("{tok}")).expect("generation failed");
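The quick start prints each token as it arrives. The same callback shape can accumulate the output into a `String` instead. The sketch below does not use the real `InferenceEngine`: the `generate` signature is not documented on this page, so `MockEngine`, the `FnMut(&str)` callback type, and the reading of the second argument as a token limit are all assumptions made for illustration.

```rust
/// Hypothetical stand-in for an engine; NOT the oxillama API.
struct MockEngine {
    canned: Vec<&'static str>,
}

impl MockEngine {
    /// Assumed shape: a prompt, a token limit, and a per-token callback.
    fn generate<F>(
        &mut self,
        _prompt: &str,
        max_tokens: usize,
        mut on_token: F,
    ) -> Result<usize, String>
    where
        F: FnMut(&str),
    {
        let mut emitted = 0;
        // Emit at most `max_tokens` of the canned tokens, one callback call each.
        for &tok in self.canned.iter().take(max_tokens) {
            on_token(tok);
            emitted += 1;
        }
        Ok(emitted)
    }
}

/// Collect streamed tokens into a single String instead of printing them.
fn collect_output(
    engine: &mut MockEngine,
    prompt: &str,
    max_tokens: usize,
) -> Result<String, String> {
    let mut out = String::new();
    engine.generate(prompt, max_tokens, |tok| out.push_str(tok))?;
    Ok(out)
}

fn main() {
    let mut engine = MockEngine {
        canned: vec![" world", "!", " How", " are", " you?"],
    };
    let text = collect_output(&mut engine, "Hello", 2).expect("generation failed");
    println!("{text}");
}
```

Propagating the error with `?` rather than calling `expect` lets the caller decide how to report a failed generation. If the real callback receives a different token type, only the closure body would need to change.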

Re-exports§

pub use oxillama_gguf as gguf;
pub use oxillama_quant as quant;
pub use oxillama_arch as arch;
pub use oxillama_runtime as runtime;
pub use oxillama_server as server;
pub use oxillama_bench as bench;
pub use oxillama_gpu as gpu;
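The module table marks `server`, `bench`, and `gpu` as feature-gated, so those re-exports are presumably only present when the matching Cargo feature is enabled (for example `features = ["server"]` on the dependency). A common way to build such a facade is a `#[cfg(feature = "...")]` attribute on each optional re-export. The sketch below shows that pattern with inline stand-in modules; whether OxiLLaMa gates its re-exports exactly this way is an assumption, and the `meta` module and `NAME` constants are hypothetical.

```rust
// Stand-ins for subcrates; in a real meta crate these would be external crates.
pub mod oxillama_gguf {
    pub const NAME: &str = "gguf";
}

// Only compiled when the (hypothetical) `server` feature is enabled.
#[cfg(feature = "server")]
pub mod oxillama_server {
    pub const NAME: &str = "server";
}

/// Stand-in for the meta crate's root module.
pub mod meta {
    // Always available.
    pub use super::oxillama_gguf as gguf;

    // Present only in builds with the `server` feature.
    #[cfg(feature = "server")]
    pub use super::oxillama_server as server;
}

/// Lists the modules reachable through the facade in this build.
#[allow(unused_mut)]
fn available_modules() -> Vec<&'static str> {
    let mut names = vec![meta::gguf::NAME];
    #[cfg(feature = "server")]
    names.push(meta::server::NAME);
    names
}

fn main() {
    println!("{:?}", available_modules());
}
```

Code that depends on an optional module can guard its own use sites with the same `#[cfg(feature = "...")]` attribute, so a build without the feature still compiles.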
