GGUF file format implementation.
GGUF (GGML Universal Format) is a file format for storing models for inference with GGML-based executors like llama.cpp and Ollama.
This crate provides:
- Types representing the GGUF format
- A reader for loading GGUF files
- A writer for creating GGUF files
- Dequantization routines for quantized tensors
§Example
use pmetal_gguf::{GgufContent, dequant};
// Read GGUF file
let content = GgufContent::from_file("model.gguf")?;
// Get architecture
if let Some(arch) = content.architecture() {
    println!("Model architecture: {}", arch);
}
// Read and dequantize a tensor
let mut file = std::fs::File::open("model.gguf")?;
let info = content.get_tensor_info("token_embd.weight").unwrap();
let data = content.read_tensor_data(&mut file, "token_embd.weight")?;
let shape: Vec<i32> = info.dimensions.iter().map(|&d| d as i32).collect();
let floats = dequant::dequantize(&data, info.dtype, &shape)?;
Re-exports§
pub use reader::GgufContent;
pub use reader::GgufReadError;
pub use reader::GgufVersion;
pub use reader::MAX_ARRAY_LENGTH;
pub use reader::MAX_METADATA_COUNT;
pub use reader::MAX_STRING_LENGTH;
pub use reader::MAX_TENSOR_COUNT;
Modules§
- config
- Generate HuggingFace-compatible config.json from GGUF metadata.
- dequant
- Dequantization routines for GGUF quantized tensors.
- dynamic
- Dynamic quantization scheduling.
- imatrix
- Importance Matrix (IMatrix) implementation.
- iq_quants
- IQ (Importance-weighted Quantization) block structures and operations.
- k_quants
- K-Quant (Q2K-Q8K) block structures and operations.
- keys
- Standard metadata keys for GGUF files.
- quantize
- K-Quant quantization utilities.
- reader
- GGUF file reader for loading quantized models.
- tensors
- Standard tensor names for transformer models.
- vec_dot
- SIMD-optimized vector dot product operations for K-quant blocks.
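To illustrate what a dequantization routine does, here is a minimal sketch for the GGML `Q8_0` block layout (a 2-byte little-endian f16 scale followed by 32 signed 8-bit quants). It uses only the standard library and is not this crate's implementation: the names `f16_to_f32` and `dequantize_q8_0` are hypothetical, and the real `dequant` module may be organised quite differently.

```rust
/// Number of quantized values per Q8_0 block (GGML layout).
const QK8_0: usize = 32;
/// Bytes per block: a 2-byte f16 scale plus 32 one-byte quants.
const BLOCK_BYTES: usize = 2 + QK8_0;

/// Converts IEEE 754 half-precision bits to f32 (hypothetical helper).
fn f16_to_f32(bits: u16) -> f32 {
    let sign = ((bits >> 15) & 1) as u32;
    let exp = ((bits >> 10) & 0x1f) as u32;
    let frac = (bits & 0x3ff) as u32;
    let out = match (exp, frac) {
        // Signed zero.
        (0, 0) => sign << 31,
        // Subnormal: value = frac * 2^-24.
        (0, _) => {
            let v = frac as f32 * (1.0 / 16_777_216.0);
            return if sign == 1 { -v } else { v };
        }
        // Infinity or NaN.
        (0x1f, _) => (sign << 31) | 0x7f80_0000 | (frac << 13),
        // Normal number: rebias the exponent from 15 to 127.
        _ => (sign << 31) | ((exp + 112) << 23) | (frac << 13),
    };
    f32::from_bits(out)
}

/// Dequantizes raw Q8_0 bytes; returns None if the length is not a
/// whole number of blocks. Each output value is `scale * quant`.
fn dequantize_q8_0(data: &[u8]) -> Option<Vec<f32>> {
    if data.len() % BLOCK_BYTES != 0 {
        return None;
    }
    let mut out = Vec::with_capacity(data.len() / BLOCK_BYTES * QK8_0);
    for block in data.chunks_exact(BLOCK_BYTES) {
        let d = f16_to_f32(u16::from_le_bytes([block[0], block[1]]));
        // Reinterpret each quant byte as a signed 8-bit integer.
        out.extend(block[2..].iter().map(|&q| d * (q as i8) as f32));
    }
    Some(out)
}
```

The K-quant and IQ formats pack scales and quants more densely, but they follow the same idea: decode the per-block scale information, then expand each packed quant to a float.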
Structs§
- GgufBuilder
- Builder for creating GGUF files.
- TensorInfo
- Information about a tensor in the GGUF file.
Enums§
- FileType
- File type enumeration for quantized models.
- GgmlType
- GGML tensor data types.
- MetadataValue
- A metadata value in GGUF format.
- MetadataValueType
- GGUF metadata value types.
- TensorSizeError
- Error type for tensor size calculations.
Constants§
- GGUF_DEFAULT_ALIGNMENT
- Default alignment for tensor data.
- GGUF_MAGIC
- GGUF magic number: “GGUF” in bytes.
- GGUF_VERSION
- Current GGUF version (v3 with big-endian support).
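As a rough sketch of how these constants are typically used, the snippet below checks the magic bytes, reads the version field, and rounds an offset up to the tensor-data alignment. It defines its own local `MAGIC` and `DEFAULT_ALIGNMENT` values rather than using the crate's constants, whose exact types are not shown here. It also assumes a little-endian file and the GGUF specification's default alignment of 32 bytes. `parse_version` and `align_offset` are hypothetical helper names.

```rust
/// The four ASCII bytes "GGUF" that open every GGUF file.
const MAGIC: [u8; 4] = *b"GGUF";
/// Assumed default alignment, used when the file does not override it.
const DEFAULT_ALIGNMENT: u64 = 32;

/// Returns the format version if `bytes` starts with a GGUF header.
/// Assumes little-endian encoding; big-endian v3 files need `from_be_bytes`.
fn parse_version(bytes: &[u8]) -> Option<u32> {
    if bytes.len() < 8 || !bytes.starts_with(&MAGIC) {
        return None;
    }
    Some(u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]))
}

/// Rounds `offset` up to the next multiple of `alignment` (must be non-zero).
fn align_offset(offset: u64, alignment: u64) -> u64 {
    (offset + alignment - 1) / alignment * alignment
}
```

For example, a header consisting of `b"GGUF"` followed by `3u32.to_le_bytes()` parses as version 3, and `align_offset(33, DEFAULT_ALIGNMENT)` yields 64, the next 32-byte boundary.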