§oxibonsai-core
GGUF Q1_0_g128 format parser, tensor types, and model configuration for OxiBonsai — the Pure Rust 1-bit LLM inference engine.
This crate provides the foundational data types and parsing logic used by the rest of the OxiBonsai stack:
- GGUF v3 binary format parsing: header, metadata key-value store, and tensor info directory (see gguf).
- Q1_0_g128 block type: the 18-byte packed representation used for 1-bit weights (see tensor::BlockQ1_0G128).
- Memory-mapped tensor loading: zero-copy access to weight data from disk via memmap2.
- Model configuration: config::Qwen3Config, extracted from GGUF metadata or constructed for known Bonsai variants (8B, 4B, 1.7B).
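To make the first parsing stage concrete, here is a minimal sketch of reading the fixed-size GGUF header using only the standard library. It is not this crate's API: `HeaderSketch`, `read_header`, and the `le_u32`/`le_u64` helpers are hypothetical names introduced for illustration, and the sketch assumes a little-endian file. The field layout (4-byte `GGUF` magic, `u32` version, `u64` tensor count, `u64` metadata key-value count) follows the public GGUF specification.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for a parsed header; not the crate's `GgufHeader`.
#[derive(Debug, PartialEq)]
struct HeaderSketch {
    version: u32,
    tensor_count: u64,
    metadata_kv_count: u64,
}

fn le_u32(b: &[u8]) -> u32 {
    let mut a = [0u8; 4];
    a.copy_from_slice(b);
    u32::from_le_bytes(a)
}

fn le_u64(b: &[u8]) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(b);
    u64::from_le_bytes(a)
}

/// Read the 24-byte fixed header: magic, version, tensor count, KV count.
/// Assumes little-endian encoding.
fn read_header<R: Read>(mut reader: R) -> io::Result<HeaderSketch> {
    let mut buf = [0u8; 24];
    reader.read_exact(&mut buf)?;
    if &buf[0..4] != b"GGUF" {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad GGUF magic"));
    }
    Ok(HeaderSketch {
        version: le_u32(&buf[4..8]),
        tensor_count: le_u64(&buf[8..16]),
        metadata_kv_count: le_u64(&buf[16..24]),
    })
}

fn main() -> io::Result<()> {
    // Build a synthetic header in memory instead of opening a real model file.
    let mut bytes = Vec::new();
    bytes.extend_from_slice(b"GGUF");
    bytes.extend_from_slice(&3u32.to_le_bytes()); // version 3
    bytes.extend_from_slice(&2u64.to_le_bytes()); // two tensors
    bytes.extend_from_slice(&5u64.to_le_bytes()); // five metadata entries

    let header = read_header(&bytes[..])?;
    println!("{:?}", header);
    Ok(())
}
```

The metadata key-value store and tensor info directory follow the header in the file; parsing them requires the variable-length string and typed-value encodings, which this sketch leaves out.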
§GGUF Q1_0_g128 Format
Each block is 18 bytes: a 2-byte FP16 scale followed by 16 bytes holding 128 sign bits. Each weight decodes as bit ? +scale : -scale, for an effective 18 × 8 / 128 = 1.125 bits per weight.
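The decoding rule can be sketched in plain Rust without any dependencies. This is an illustration, not the crate's `BlockQ1_0G128` implementation: `dequantize_block` and `f16_to_f32` are hypothetical helpers, and the sketch assumes that the scale occupies the first two bytes in little-endian order and that sign bits are packed least-significant-bit first within each byte. Neither assumption is stated in the text above.

```rust
const BLOCK_BYTES: usize = 18;
const WEIGHTS_PER_BLOCK: usize = 128;

/// Convert IEEE 754 binary16 bits to f32 (zero, subnormal, normal, inf/NaN).
fn f16_to_f32(bits: u16) -> f32 {
    let sign = ((bits >> 15) & 1) as u32;
    let exp = ((bits >> 10) & 0x1f) as u32;
    let frac = (bits & 0x3ff) as u32;
    let out = match (exp, frac) {
        (0, 0) => sign << 31,
        (0, _) => {
            // Subnormal half: value = frac * 2^-24.
            let v = frac as f32 * (1.0 / 16_777_216.0);
            return if sign == 1 { -v } else { v };
        }
        (0x1f, _) => (sign << 31) | 0x7f80_0000 | (frac << 13),
        // Normal: rebias the exponent from 15 to 127 (add 112).
        _ => (sign << 31) | ((exp + 112) << 23) | (frac << 13),
    };
    f32::from_bits(out)
}

/// Expand one 18-byte block into 128 f32 weights.
/// ASSUMPTIONS: scale first (little-endian), sign bits LSB-first per byte.
fn dequantize_block(block: &[u8; BLOCK_BYTES]) -> [f32; WEIGHTS_PER_BLOCK] {
    let scale = f16_to_f32(u16::from_le_bytes([block[0], block[1]]));
    let mut out = [0.0f32; WEIGHTS_PER_BLOCK];
    for (i, w) in out.iter_mut().enumerate() {
        let bit = (block[2 + i / 8] >> (i % 8)) & 1;
        *w = if bit == 1 { scale } else { -scale };
    }
    out
}

fn main() {
    let mut block = [0u8; BLOCK_BYTES];
    block[..2].copy_from_slice(&0x3800u16.to_le_bytes()); // FP16 encoding of 0.5
    block[2] = 0b0000_0101; // weights 0 and 2 positive, all others negative

    let weights = dequantize_block(&block);
    println!("{:?}", &weights[..4]); // [0.5, -0.5, 0.5, -0.5]

    let bits_per_weight = (BLOCK_BYTES * 8) as f32 / WEIGHTS_PER_BLOCK as f32;
    println!("bits per weight: {}", bits_per_weight); // 1.125
}
```

In a real inference kernel the sign bits would normally be consumed directly in a dot product rather than expanded to `f32`; the expansion here is only meant to show the layout.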
§Crate Organisation
| Module | Purpose |
|---|---|
| config | Qwen3Config with named constructors for each variant |
| gguf | Low-level GGUF v3 reader (header, metadata, tensors) |
| quant_ternary | BlockTQ2_0_g128, BlockTQ2_0, and TernaryCode (ternary block types) |
| tensor | BlockQ1_0G128 and OneBitTensor types |
| error | BonsaiError / BonsaiResult |
Re-exports§
pub use config::Qwen3Config;
pub use error::BonsaiError;
pub use error::BonsaiResult;
pub use gguf::compat::build_compat_report;
pub use gguf::compat::check_gguf_header;
pub use gguf::compat::CompatError;
pub use gguf::compat::ExtendedQuantType;
pub use gguf::compat::GgufCompatReport;
pub use gguf::compat::GgufVersion;
pub use gguf::header::GgufHeader;
pub use gguf::metadata::MetadataStore;
pub use gguf::metadata::MetadataValue;
pub use gguf::model_card::keys as model_card_keys;
pub use gguf::model_card::extract_known_fields;
pub use gguf::model_card::extract_model_card;
pub use gguf::model_card::ModelCard;
pub use gguf::streaming::GgufStreamParser;
pub use gguf::streaming::GgufValue;
pub use gguf::streaming::StreamState;
pub use gguf::streaming::StreamedGguf;
pub use gguf::streaming::StreamedTensorInfo;
pub use gguf::tensor_info::TensorInfo;
pub use gguf::tensor_info::TensorStore;
pub use gguf::types::GgufTensorType;
pub use gguf::types::GgufValueType;
pub use gguf::writer::MetadataWriteValue;
pub use gguf::writer::GgufWriter;
pub use gguf::writer::TensorEntry;
pub use gguf::writer::TensorType;
pub use gguf::writer::WriteError;
pub use quant_fp8::fp8_e4m3_decode;
pub use quant_fp8::fp8_e4m3_encode;
pub use quant_fp8::fp8_e5m2_decode;
pub use quant_fp8::fp8_e5m2_encode;
pub use quant_fp8::BlockFP8E4M3;
pub use quant_fp8::BlockFP8E5M2;
pub use quant_fp8::BLOCK_FP8_BYTES;
pub use quant_fp8::FP8_E4M3_MAX;
pub use quant_fp8::FP8_E5M2_MAX;
pub use quant_fp8::QK_FP8;
pub use quant_k::BlockQ2K;
pub use quant_k::BlockQ3K;
pub use quant_k::BlockQ4K;
pub use quant_k::BlockQ8K;
pub use quant_k::BLOCK_Q2_K_BYTES;
pub use quant_k::BLOCK_Q3K_BYTES;
pub use quant_k::BLOCK_Q4_K_BYTES;
pub use quant_k::BLOCK_Q8K_BYTES;
pub use quant_k_ext::BlockQ5K;
pub use quant_k_ext::BlockQ6K;
pub use quant_k_ext::BLOCK_Q5K_BYTES;
pub use quant_k_ext::BLOCK_Q6K_BYTES;
pub use quant_std::BlockQ4_0;
pub use quant_std::BlockQ8_0;
pub use quant_std::BLOCK_Q4_0_BYTES;
pub use quant_std::BLOCK_Q8_0_BYTES;
pub use quant_std::QK_Q4_0;
pub use quant_std::QK_Q8_0;
pub use quant_ternary::BlockTQ2_0;
pub use quant_ternary::BlockTQ2_0_g128;
pub use quant_ternary::TernaryCode;
pub use quant_ternary::BLOCK_TQ2_0_BYTES;
pub use quant_ternary::BLOCK_TQ2_0_G128_BYTES;
pub use quant_ternary::QK_TQ2_0;
pub use quant_ternary::QK_TQ2_0_G128;
pub use tensor::BlockQ1_0G128;
pub use tensor::OneBitTensor;
Modules§
- config
- Qwen3 model configuration extracted from GGUF metadata.
- error
- Error types for OxiBonsai core operations.
- gguf
- GGUF v3 binary format parser.
- quant_fp8
- FP8 quantization block types: E4M3FN and E5M2.
- quant_k
- K-quant block types for Q2_K, Q3_K, Q4_K, and Q8_K quantization formats.
- quant_k_ext
- K-quant block types for Q5_K and Q6_K quantization formats.
- quant_std
- Standard GGUF quantization block types: Q4_0 (4-bit) and Q8_0 (8-bit).
- quant_ternary
- Ternary quantization block types for TQ2_0_g128 and TQ2_0 formats.
- tensor
- Q1_0_g128 tensor types and 1-bit data access.