oxillama 0.1.2

Pure Rust LLM inference engine — the sovereign alternative to llama.cpp (meta crate)

oxillama

Unified meta crate that re-exports the full OxiLLaMa API surface.

Part of the OxiLLaMa workspace — a Pure Rust LLM inference engine.

Usage

Add to your Cargo.toml:

[dependencies]
oxillama = "0.1.2"

Then use any subcrate through the unified namespace:

use oxillama::gguf::GgufModel;
use oxillama::quant::QuantKernel;
use oxillama::arch::ForwardPass;
use oxillama::runtime::{InferenceEngine, EngineConfig};
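
A unified namespace like the one above is usually built from `pub use` re-exports in the meta crate. The following is a minimal, self-contained sketch of that general pattern, not OxiLLaMa's actual source: `demo_gguf`, `demo_runtime`, `facade`, and `demo` are hypothetical stand-ins, and inline modules take the place of the separate workspace crates.

```rust
// Stand-ins for separate workspace crates. Every name in these modules is
// hypothetical; the real subcrates expose their own types.
pub mod demo_gguf {
    /// Placeholder for a parsed model file.
    pub struct Model {
        pub tensor_count: usize,
    }

    /// Placeholder loader that just records a tensor count.
    pub fn load(tensor_count: usize) -> Model {
        Model { tensor_count }
    }
}

pub mod demo_runtime {
    use super::demo_gguf::Model;

    /// Placeholder for an engine that consumes a loaded model.
    pub fn describe(model: &Model) -> String {
        format!("engine over {} tensors", model.tensor_count)
    }
}

// The facade: one module that re-exports each "subcrate" under a short name,
// so callers write `facade::gguf::...` instead of naming each crate directly.
// An optional subcrate would typically be re-exported behind a
// `#[cfg(feature = "...")]` attribute on its `pub use` line.
pub mod facade {
    pub use super::demo_gguf as gguf;
    pub use super::demo_runtime as runtime;
}

/// Callers only ever name the facade.
pub fn demo() -> String {
    let model = facade::gguf::load(3);
    facade::runtime::describe(&model)
}
```

The benefit of this layout is that downstream code depends on a single crate and a single version number, while each subcrate can still be built and tested on its own.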

Modules

| Module | Crate | Description |
|---|---|---|
| gguf | oxillama-gguf | GGUF v3 parser and tensor loader |
| quant | oxillama-quant | Quantization kernels (25 formats) |
| arch | oxillama-arch | Model architectures (20 architectures) |
| runtime | oxillama-runtime | Inference engine, KV cache, sampling |
| server | oxillama-server | OpenAI-compatible HTTP API (feature: server) |
| bench | oxillama-bench | Benchmark suite (feature: bench) |
| gpu | oxillama-gpu | wgpu GPU backend (feature: gpu) |

Documentation

  • RECIPES.md — 8 task-oriented code recipes (load & generate, serve, LoRA, speculative decoding, snapshot/resume, WASM, partial-download resume, sharded model loading)

Tests

Meta-crate test suite: feature_matrix, error_types, recipes_doctest (19 tests passing).

Feature Flags

| Feature | Default | Description |
|---|---|---|
| server | yes | Enable OpenAI-compatible server |
| bench | yes | Enable benchmark suite |
| gpu | no | Enable wgpu GPU backend |
| simd-avx2 | no | AVX2 SIMD kernels |
| simd-avx512 | no | AVX-512 SIMD kernels |
| simd-neon | no | ARM NEON SIMD kernels |
| llama | no | LLaMA architecture |
| qwen3 | no | Qwen3 architecture |
| mistral | no | Mistral architecture |
| gemma | no | Gemma architecture |
| phi | no | Phi architecture |
| command-r | no | Command-R architecture |
| starcoder | no | StarCoder architecture |
| deepseek | no | DeepSeek architecture |
| dbrx | yes | DBRX architecture |
| grok | yes | Grok-1 architecture |
| mamba2 | yes | Mamba-2 SSM architecture |
| llava | no | LLaVA multimodal (requires llama) |
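
The three SIMD flags are off by default, so it can help to check what the build machine's CPU supports before enabling one. The sketch below uses only the standard library's runtime detection macros. The function name `suggested_simd_features` is hypothetical, and the mapping from a detected CPU capability to a flag name (for example, treating `avx512f` as sufficient for `simd-avx512`) is an assumption, not something the crate documents.

```rust
/// Returns the SIMD feature-flag names from the table above that the current
/// CPU appears to support. The flag names come from the table; which CPU
/// capability each flag actually requires is an assumption of this sketch.
pub fn suggested_simd_features() -> Vec<&'static str> {
    let mut flags = Vec::new();

    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            flags.push("simd-avx2");
        }
        // AVX-512 is a family of extensions; "avx512f" (foundation) is the
        // baseline that every AVX-512 capable CPU reports.
        if std::arch::is_x86_feature_detected!("avx512f") {
            flags.push("simd-avx512");
        }
    }

    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            flags.push("simd-neon");
        }
    }

    // On other architectures nothing is detected and the list stays empty.
    flags
}
```

Any name the function returns can then be listed in the `features` array of the `oxillama` dependency in Cargo.toml. Detection runs on the machine executing the check, so the result only applies when the binary is built for and run on comparable hardware.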

License

Apache-2.0 — COOLJAPAN OU (Team Kitasan)