
Crate oxibonsai


§OxiBonsai

Pure Rust 1-bit LLM inference engine for PrismML Bonsai models.

OxiBonsai is a high-performance inference engine designed for 1-bit quantized large language models in GGUF format. It provides a complete pipeline from model loading through token generation, with optional RAG, tokenization, evaluation, and HTTP serving capabilities.

§Quick Start

use oxibonsai::core::GgufStreamParser;

// Parse a GGUF model file via the streaming parser
let _parser = GgufStreamParser::new();

§Crate Organization

Crate                 Description
oxibonsai-core        GGUF loader, tensor types, quantization, configuration
oxibonsai-kernels     Optimized compute kernels (SIMD, matmul, softmax)
oxibonsai-model       Transformer model definitions, KV cache, attention
oxibonsai-runtime     Inference engine, sampling, speculative decoding
oxibonsai-tokenizer   HuggingFace tokenizer integration
oxibonsai-rag         Retrieval-augmented generation pipeline
oxibonsai-eval        Model evaluation and benchmarking
oxibonsai-serve       OpenAI-compatible HTTP server

§Feature Flags

Feature            Description
server             HTTP server support via oxibonsai-serve
rag                Retrieval-augmented generation
native-tokenizer   HuggingFace tokenizer support
eval               Model evaluation framework
full               Enable all optional features
simd-avx2          AVX2 SIMD kernels (x86_64)
simd-avx512        AVX-512 SIMD kernels (x86_64)
simd-neon          NEON SIMD kernels (AArch64)
wasm               WebAssembly target support
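Features in the table above can be combined in a downstream crate's manifest. The following is a minimal sketch of such a Cargo.toml fragment; the feature names are taken from the table, while the version requirement is a placeholder and should be replaced with the release you depend on.

```toml
# Hypothetical Cargo.toml fragment for a crate depending on oxibonsai.
# The "*" version is a placeholder, not a recommendation.
[dependencies]
oxibonsai = { version = "*", features = ["server", "rag"] }

# Alternatively, enable every optional feature at once:
# oxibonsai = { version = "*", features = ["full"] }
```

Note that the SIMD features (simd-avx2, simd-avx512, simd-neon) are architecture-specific, so they are typically selected per target rather than enabled unconditionally.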

§License

Apache-2.0 — COOLJAPAN OU

Re-exports§

pub use oxibonsai_core as core;
pub use oxibonsai_kernels as kernels;
pub use oxibonsai_model as model;
pub use oxibonsai_runtime as runtime;