§OxiBonsai
Pure Rust 1-bit LLM inference engine for PrismML Bonsai models.
OxiBonsai is a high-performance inference engine designed for 1-bit quantized large language models in GGUF format. It provides a complete pipeline from model loading through token generation, with optional RAG, tokenization, evaluation, and HTTP serving capabilities.
§Quick Start
use oxibonsai::core::GgufStreamParser;

// Parse a GGUF model file via the streaming parser.
let _parser = GgufStreamParser::new();

§Crate Organization
| Crate | Description |
|---|---|
| oxibonsai-core | GGUF loader, tensor types, quantization, configuration |
| oxibonsai-kernels | Optimized compute kernels (SIMD, matmul, softmax) |
| oxibonsai-model | Transformer model definitions, KV cache, attention |
| oxibonsai-runtime | Inference engine, sampling, speculative decoding |
| oxibonsai-tokenizer | HuggingFace tokenizer integration |
| oxibonsai-rag | Retrieval-augmented generation pipeline |
| oxibonsai-eval | Model evaluation and benchmarking |
| oxibonsai-serve | OpenAI-compatible HTTP server |
§Feature Flags
| Feature | Description |
|---|---|
| server | HTTP server support via oxibonsai-serve |
| rag | Retrieval-augmented generation |
| native-tokenizer | HuggingFace tokenizer support |
| eval | Model evaluation framework |
| full | Enable all optional features |
| simd-avx2 | AVX2 SIMD kernels (x86_64) |
| simd-avx512 | AVX-512 SIMD kernels (x86_64) |
| simd-neon | NEON SIMD kernels (AArch64) |
| wasm | WebAssembly target support |
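As an illustration, the optional features above could be enabled from a downstream manifest along these lines (a sketch: the version requirement is a placeholder, and only the feature names are taken from the table):

```toml
[dependencies]
# Version is a placeholder; pick the release you actually depend on.
# Feature names (server, rag, simd-avx2, full) come from the table above.
oxibonsai = { version = "*", features = ["server", "rag", "simd-avx2"] }

# Alternatively, enable every optional capability at once:
# oxibonsai = { version = "*", features = ["full"] }
```

Note that the SIMD features are architecture-specific, so a crate targeting both x86_64 and AArch64 would typically gate them behind its own target-conditional dependency sections rather than enabling them unconditionally.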
§License
Apache-2.0 — COOLJAPAN OU
§Re-exports
pub use oxibonsai_core as core;
pub use oxibonsai_kernels as kernels;
pub use oxibonsai_model as model;
pub use oxibonsai_runtime as runtime;