§OxiBonsai
Pure Rust 1-bit LLM inference engine for PrismML Bonsai models.
OxiBonsai is a high-performance inference engine designed for 1-bit quantized large language models in GGUF format. It provides a complete pipeline from model loading through token generation, with optional RAG, tokenization, evaluation, and HTTP serving capabilities.
§Quick Start
use oxibonsai::core::GgufStreamParser;

// Parse a GGUF model file via the streaming parser.
let _parser = GgufStreamParser::new();

§Crate Organization
| Crate | Description |
|---|---|
| oxibonsai-core | GGUF loader, tensor types, quantization, configuration |
| oxibonsai-kernels | Optimized compute kernels (SIMD, matmul, softmax) |
| oxibonsai-model | Transformer model definitions, KV cache, attention |
| oxibonsai-runtime | Inference engine, sampling, speculative decoding |
| oxibonsai-tokenizer | HuggingFace tokenizer integration |
| oxibonsai-rag | Retrieval-augmented generation pipeline |
| oxibonsai-eval | Model evaluation and benchmarking |
| oxibonsai-serve | OpenAI-compatible HTTP server |
§Feature Flags
| Feature | Description |
|---|---|
| server | HTTP server support via oxibonsai-serve |
| rag | Retrieval-augmented generation |
| native-tokenizer | HuggingFace tokenizer support |
| eval | Model evaluation framework |
| full | Enable all optional features |
| simd-avx2 | AVX2 SIMD kernels (x86_64) |
| simd-avx512 | AVX-512 SIMD kernels (x86_64) |
| simd-neon | NEON SIMD kernels (AArch64) |
| wasm | WebAssembly target support |
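As an illustration, the optional features above could be enabled from a downstream manifest along these lines (a sketch: the version requirement is a placeholder, and only the feature names are taken from the table):

```toml
[dependencies]
# Version is a placeholder; pick the release you actually depend on.
# Feature names (server, rag, simd-avx2, full) come from the table above.
oxibonsai = { version = "*", features = ["server", "rag", "simd-avx2"] }

# Alternatively, enable every optional capability at once:
# oxibonsai = { version = "*", features = ["full"] }
```

Note that the SIMD features are architecture-specific, so a crate targeting both x86_64 and AArch64 would typically gate them behind its own target-conditional dependency sections rather than enabling them unconditionally.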
§License
Apache-2.0 — COOLJAPAN OU
§Re-exports
pub use oxibonsai_core as core;
pub use oxibonsai_kernels as kernels;
pub use oxibonsai_model as model;
pub use oxibonsai_runtime as runtime;