Crate oxibonsai_core

§oxibonsai-core

GGUF Q1_0_g128 format parser, tensor types, and model configuration for OxiBonsai — the Pure Rust 1-bit LLM inference engine.

This crate provides the foundational data types and parsing logic used by the rest of the OxiBonsai stack:

  • GGUF v3 binary format parsing — header, metadata key-value store, and tensor info directory (see gguf).
  • Q1_0_g128 block type — the 18-byte packed representation used for 1-bit weights (see tensor::BlockQ1_0G128).
  • Memory-mapped tensor loading — zero-copy access to weight data from disk via memmap2.
  • Model configuration — config::Qwen3Config extracted from GGUF metadata or constructed for known Bonsai variants (8B, 4B, 1.7B).
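
The zero-copy idea behind the memory-mapped loading can be sketched as follows. This is an illustrative, std-only sketch: a plain byte buffer stands in for the mapped file, and the `BlockView`/`iter_blocks` names are hypothetical, not the crate's actual `tensor::BlockQ1_0G128` API. It assumes 18-byte blocks laid out contiguously with the scale stored little-endian:

```rust
const BLOCK_BYTES: usize = 18; // 2-byte FP16 scale + 16 sign bytes per 128 weights

/// Borrowed, zero-copy view of one Q1_0_g128 block inside a mapped buffer.
/// (Illustrative only; the crate's own block layout may differ.)
struct BlockView<'a> {
    scale_bits: u16,  // raw FP16 bits of the block scale
    signs: &'a [u8],  // 16 bytes = 128 sign bits, borrowed from the buffer
}

/// Iterate over blocks without copying any weight data.
fn iter_blocks(data: &[u8]) -> impl Iterator<Item = BlockView<'_>> {
    data.chunks_exact(BLOCK_BYTES).map(|b| BlockView {
        scale_bits: u16::from_le_bytes([b[0], b[1]]),
        signs: &b[2..BLOCK_BYTES],
    })
}

fn main() {
    // Stand-in for a memory-mapped weight region: 4 blocks = 512 weights.
    let data = vec![0u8; 4 * BLOCK_BYTES];
    let blocks: Vec<_> = iter_blocks(&data).collect();
    assert_eq!(blocks.len(), 4);
    assert_eq!(blocks[0].signs.len(), 16);
}
```

With an actual `memmap2::Mmap` the same iterator would run directly over the mapped byte slice, so no tensor data is ever copied into the heap.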

§GGUF Q1_0_g128 Format

Each block is 18 bytes: a 2-byte FP16 scale followed by 16 bytes holding 128 sign bits. Each weight decodes as bit ? +scale : -scale, giving an effective 1.125 bits per weight (144 bits / 128 weights).
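
A minimal, self-contained decode of one such block can be sketched like this. It is not the crate's implementation: the `decode_block` name is hypothetical, LSB-first bit order within each sign byte is an assumption, and the FP16 conversion handles only zero and normal values:

```rust
const BLOCK_BYTES: usize = 18;        // 2-byte FP16 scale + 16 sign bytes
const WEIGHTS_PER_BLOCK: usize = 128; // 144 bits / 128 weights = 1.125 bits/weight

/// Decode IEEE 754 half-precision bits to f32.
/// Handles zero and normal values only; subnormals, infinities, and
/// NaNs are omitted for brevity (block scales are typically normal).
fn f16_to_f32(bits: u16) -> f32 {
    let sign = ((bits >> 15) as u32) << 31;
    let exp = ((bits >> 10) & 0x1f) as u32;
    let mant = (bits & 0x3ff) as u32;
    if exp == 0 && mant == 0 {
        return f32::from_bits(sign); // ±0.0
    }
    // Re-bias exponent from 15 (FP16) to 127 (FP32), widen mantissa 10 -> 23 bits.
    f32::from_bits(sign | ((exp + 127 - 15) << 23) | (mant << 13))
}

/// Expand one 18-byte block into 128 f32 weights: bit ? +scale : -scale.
fn decode_block(block: &[u8; BLOCK_BYTES]) -> [f32; WEIGHTS_PER_BLOCK] {
    let scale = f16_to_f32(u16::from_le_bytes([block[0], block[1]]));
    let mut out = [0.0f32; WEIGHTS_PER_BLOCK];
    for (i, w) in out.iter_mut().enumerate() {
        let bit = (block[2 + i / 8] >> (i % 8)) & 1; // assumed LSB-first order
        *w = if bit == 1 { scale } else { -scale };
    }
    out
}

fn main() {
    let mut block = [0u8; BLOCK_BYTES];
    block[..2].copy_from_slice(&0x3C00u16.to_le_bytes()); // FP16 1.0, little-endian
    for b in &mut block[2..] {
        *b = 0xFF; // all sign bits set -> every weight is +scale
    }
    let weights = decode_block(&block);
    assert!(weights.iter().all(|&w| w == 1.0));
    println!("{}", (BLOCK_BYTES * 8) as f32 / WEIGHTS_PER_BLOCK as f32); // 1.125
}
```

The crate's `tensor::BlockQ1_0G128` presumably performs an equivalent expansion; the sketch only makes the storage arithmetic concrete.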

§Crate Organisation

Module          Purpose
config          Qwen3Config with named constructors for each variant
gguf            Low-level GGUF v3 reader (header, metadata, tensors)
quant_fp8       FP8 block types (E4M3, E5M2)
quant_k         K-quant block types (Q2_K, Q3_K, Q4_K, Q8_K)
quant_k_ext     K-quant block types (Q5_K, Q6_K)
quant_std       Standard block types (Q4_0, Q8_0)
quant_ternary   BlockTQ2_0_g128, BlockTQ2_0, TernaryCode — ternary block types
tensor          BlockQ1_0G128 and OneBitTensor types
error           BonsaiError / BonsaiResult

Re-exports§

pub use config::Qwen3Config;
pub use error::BonsaiError;
pub use error::BonsaiResult;
pub use gguf::compat::build_compat_report;
pub use gguf::compat::check_gguf_header;
pub use gguf::compat::CompatError;
pub use gguf::compat::ExtendedQuantType;
pub use gguf::compat::GgufCompatReport;
pub use gguf::compat::GgufVersion;
pub use gguf::header::GgufHeader;
pub use gguf::metadata::MetadataStore;
pub use gguf::metadata::MetadataValue;
pub use gguf::model_card::keys as model_card_keys;
pub use gguf::model_card::extract_known_fields;
pub use gguf::model_card::extract_model_card;
pub use gguf::model_card::ModelCard;
pub use gguf::streaming::GgufStreamParser;
pub use gguf::streaming::GgufValue;
pub use gguf::streaming::StreamState;
pub use gguf::streaming::StreamedGguf;
pub use gguf::streaming::StreamedTensorInfo;
pub use gguf::tensor_info::TensorInfo;
pub use gguf::tensor_info::TensorStore;
pub use gguf::types::GgufTensorType;
pub use gguf::types::GgufValueType;
pub use gguf::writer::MetadataWriteValue;
pub use gguf::writer::GgufWriter;
pub use gguf::writer::TensorEntry;
pub use gguf::writer::TensorType;
pub use gguf::writer::WriteError;
pub use quant_fp8::fp8_e4m3_decode;
pub use quant_fp8::fp8_e4m3_encode;
pub use quant_fp8::fp8_e5m2_decode;
pub use quant_fp8::fp8_e5m2_encode;
pub use quant_fp8::BlockFP8E4M3;
pub use quant_fp8::BlockFP8E5M2;
pub use quant_fp8::BLOCK_FP8_BYTES;
pub use quant_fp8::FP8_E4M3_MAX;
pub use quant_fp8::FP8_E5M2_MAX;
pub use quant_fp8::QK_FP8;
pub use quant_k::BlockQ2K;
pub use quant_k::BlockQ3K;
pub use quant_k::BlockQ4K;
pub use quant_k::BlockQ8K;
pub use quant_k::BLOCK_Q2_K_BYTES;
pub use quant_k::BLOCK_Q3K_BYTES;
pub use quant_k::BLOCK_Q4_K_BYTES;
pub use quant_k::BLOCK_Q8K_BYTES;
pub use quant_k_ext::BlockQ5K;
pub use quant_k_ext::BlockQ6K;
pub use quant_k_ext::BLOCK_Q5K_BYTES;
pub use quant_k_ext::BLOCK_Q6K_BYTES;
pub use quant_std::BlockQ4_0;
pub use quant_std::BlockQ8_0;
pub use quant_std::BLOCK_Q4_0_BYTES;
pub use quant_std::BLOCK_Q8_0_BYTES;
pub use quant_std::QK_Q4_0;
pub use quant_std::QK_Q8_0;
pub use quant_ternary::BlockTQ2_0;
pub use quant_ternary::BlockTQ2_0_g128;
pub use quant_ternary::TernaryCode;
pub use quant_ternary::BLOCK_TQ2_0_BYTES;
pub use quant_ternary::BLOCK_TQ2_0_G128_BYTES;
pub use quant_ternary::QK_TQ2_0;
pub use quant_ternary::QK_TQ2_0_G128;
pub use tensor::BlockQ1_0G128;
pub use tensor::OneBitTensor;

Modules§

config
Qwen3 model configuration extracted from GGUF metadata.
error
Error types for OxiBonsai core operations.
gguf
GGUF v3 binary format parser.
quant_fp8
FP8 quantization block types: E4M3FN and E5M2.
quant_k
K-quant block types for Q2_K, Q3_K, Q4_K, and Q8_K quantization formats.
quant_k_ext
K-quant block types for Q5_K and Q6_K quantization formats.
quant_std
Standard GGUF quantization block types: Q4_0 (4-bit) and Q8_0 (8-bit).
quant_ternary
Ternary quantization block types for TQ2_0_g128 and TQ2_0 formats.
tensor
Q1_0_g128 tensor types and 1-bit data access.