oxillama
Unified meta crate that re-exports the full OxiLLaMa API surface.
Part of the OxiLLaMa workspace — a Pure Rust LLM inference engine.
Usage
Add to your Cargo.toml:
[]
= "0.1.1"
Then use any subcrate through the unified namespace:
use GgufModel;
use QuantKernel;
use ForwardPass;
use ;
Modules
| Module | Crate | Description |
|---|---|---|
gguf |
oxillama-gguf | GGUF v3 parser and tensor loader |
quant |
oxillama-quant | Quantization kernels (25 formats) |
arch |
oxillama-arch | Model architectures (20 architectures) |
runtime |
oxillama-runtime | Inference engine, KV cache, sampling |
server |
oxillama-server | OpenAI-compatible HTTP API (feature: server) |
bench |
oxillama-bench | Benchmark suite (feature: bench) |
gpu |
oxillama-gpu | wgpu GPU backend (feature: gpu) |
Documentation
- RECIPES.md — 8 task-oriented code recipes (load & generate, serve, LoRA, speculative decoding, snapshot/resume, WASM, partial-download resume, sharded model loading)
Tests
Meta-crate test suite: feature_matrix, error_types, recipes_doctest — 19 passing.
Feature Flags
| Feature | Default | Description |
|---|---|---|
server |
yes | Enable OpenAI-compatible server |
bench |
yes | Enable benchmark suite |
gpu |
no | Enable wgpu GPU backend |
simd-avx2 |
no | AVX2 SIMD kernels |
simd-avx512 |
no | AVX-512 SIMD kernels |
simd-neon |
no | ARM NEON SIMD kernels |
llama |
no | LLaMA architecture |
qwen3 |
no | Qwen3 architecture |
mistral |
no | Mistral architecture |
gemma |
no | Gemma architecture |
phi |
no | Phi architecture |
command-r |
no | Command-R architecture |
starcoder |
no | StarCoder architecture |
deepseek |
no | DeepSeek architecture |
dbrx |
yes | DBRX architecture |
grok |
yes | Grok-1 architecture |
mamba2 |
yes | Mamba-2 SSM architecture |
llava |
no | LLaVA multimodal (requires llama) |
License
Apache-2.0 — COOLJAPAN OU (Team Kitasan)