Crate pmetal_gguf

GGUF file format implementation.

GGUF (GGML Universal Format) is a file format for storing models for inference with GGML-based executors such as llama.cpp and Ollama.

This crate provides:

Types representing the GGUF format
A reader for loading GGUF files
A writer for creating GGUF files
Dequantization routines for quantized tensors

§Example

use pmetal_gguf::{GgufContent, dequant};

// Read GGUF file
let content = GgufContent::from_file("model.gguf")?;

// Get architecture
if let Some(arch) = content.architecture() {
    println!("Model architecture: {}", arch);
}

// Read and dequantize a tensor
let mut file = std::fs::File::open("model.gguf")?;
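// Look up the tensor's metadata (dtype and dimensions); fail with a clear message if it is absent
let info = content.get_tensor_info("token_embd.weight").expect("tensor `token_embd.weight` not found");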
let data = content.read_tensor_data(&mut file, "token_embd.weight")?;
let shape: Vec<i32> = info.dimensions.iter().map(|&d| d as i32).collect();
let floats = dequant::dequantize(&data, info.dtype, &shape)?;

Re-exports§

pub use reader::GgufContent;
pub use reader::GgufReadError;
pub use reader::GgufVersion;
pub use reader::MAX_ARRAY_LENGTH;
pub use reader::MAX_METADATA_COUNT;
pub use reader::MAX_STRING_LENGTH;
pub use reader::MAX_TENSOR_COUNT;
pub use reader::MAX_TENSOR_DIMS;

Modules§

config: Generate HuggingFace-compatible config.json from GGUF metadata.
dequant: Dequantization routines for GGUF quantized tensors.
dynamic: Dynamic quantization scheduling.
imatrix: Importance Matrix (IMatrix) implementation.
iq_quants: IQ (Importance-weighted Quantization) block structures and operations.
k_quants: K-Quant (Q2K-Q8K) block structures and operations.
keys: Standard metadata keys for GGUF files.
quantize: K-Quant quantization utilities.
reader: GGUF file reader for loading quantized models.
tensors: Standard tensor names for transformer models.
vec_dot: SIMD-optimized vector dot product operations for K-quant blocks.
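
To make concrete what a dequantization routine of the kind listed under dequant does, here is a minimal standalone sketch for one simple quantization type, Q8_0. It does not use this crate and is not its implementation. It assumes the ggml Q8_0 block layout (a little-endian f16 scale followed by 32 signed 8-bit quants, 34 bytes per block); the names dequantize_q8_0, f16_to_f32, QK8_0 and BLOCK_BYTES are hypothetical and chosen for illustration only.

```rust
/// Number of weights per Q8_0 block (assumed from the ggml layout).
const QK8_0: usize = 32;
/// Bytes per block: a 2-byte f16 scale plus 32 one-byte quants.
const BLOCK_BYTES: usize = 2 + QK8_0;

/// Converts IEEE 754 half-precision bits to f32 without external crates.
fn f16_to_f32(bits: u16) -> f32 {
    let negative = (bits >> 15) & 1 == 1;
    let exp = ((bits >> 10) & 0x1f) as i32;
    let frac = (bits & 0x3ff) as f32;
    let magnitude = match exp {
        // Zero and subnormals: frac * 2^-24
        0 => frac * 2f32.powi(-24),
        // All-ones exponent: infinity or NaN
        31 => {
            if frac == 0.0 {
                f32::INFINITY
            } else {
                f32::NAN
            }
        }
        // Normal numbers: (1 + frac / 1024) * 2^(exp - 15)
        _ => (1.0 + frac / 1024.0) * 2f32.powi(exp - 15),
    };
    if negative {
        -magnitude
    } else {
        magnitude
    }
}

/// Dequantizes raw Q8_0 bytes into f32 values.
/// Returns None if the input is not a whole number of blocks.
fn dequantize_q8_0(data: &[u8]) -> Option<Vec<f32>> {
    if data.len() % BLOCK_BYTES != 0 {
        return None;
    }
    let mut out = Vec::with_capacity(data.len() / BLOCK_BYTES * QK8_0);
    for block in data.chunks_exact(BLOCK_BYTES) {
        // The first two bytes hold the per-block scale as a little-endian f16.
        let scale = f16_to_f32(u16::from_le_bytes([block[0], block[1]]));
        // Each remaining byte is a signed quant; the weight is scale * quant.
        out.extend(block[2..].iter().map(|&q| scale * f32::from(q as i8)));
    }
    Some(out)
}
```

The K-quant and IQ types pack scales and quants more densely than this, so treat the sketch as an illustration of the general scale-times-quant idea rather than a template for every GgmlType.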

Structs§

GgufBuilder: Builder for creating GGUF files.
TensorInfo: Information about a tensor in the GGUF file.

Enums§

FileType: File type enumeration for quantized models.
GgmlType: GGML tensor data types.
MetadataValue: A metadata value in GGUF format.
MetadataValueType: GGUF metadata value types.
TensorSizeError: Error type for tensor size calculations.

Constants§

GGUF_DEFAULT_ALIGNMENT: Default alignment for tensor data.
GGUF_MAGIC: GGUF magic number: “GGUF” in bytes.
GGUF_VERSION: Current GGUF version.
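
As a rough illustration of how constants like these are typically used, the following standalone sketch checks the magic bytes at the start of a stream, reads the version field, and rounds an offset up to an alignment boundary. It relies only on the standard library, not on this crate. The values are assumptions taken from the public GGUF specification (ASCII "GGUF" magic, little-endian u32 version, 32-byte default alignment) rather than read from pmetal_gguf, and the names MAGIC, DEFAULT_ALIGNMENT, read_version and align_offset are hypothetical.

```rust
use std::io::{self, Read};

/// First four bytes of a GGUF file (assumed: ASCII "GGUF").
const MAGIC: [u8; 4] = *b"GGUF";
/// Assumed default tensor-data alignment in bytes.
const DEFAULT_ALIGNMENT: u64 = 32;

/// Validates the magic bytes and returns the version that follows them.
fn read_version<R: Read>(mut reader: R) -> io::Result<u32> {
    let mut magic = [0u8; 4];
    reader.read_exact(&mut magic)?;
    if magic != MAGIC {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "not a GGUF file: bad magic",
        ));
    }
    // The version is stored as a little-endian u32 right after the magic.
    let mut version = [0u8; 4];
    reader.read_exact(&mut version)?;
    Ok(u32::from_le_bytes(version))
}

/// Rounds `offset` up to the next multiple of `alignment` (must be non-zero).
fn align_offset(offset: u64, alignment: u64) -> u64 {
    (offset + alignment - 1) / alignment * alignment
}
```

Passing a byte slice such as &bytes[..] or an open std::fs::File to read_version both work, since each implements Read. align_offset(33, DEFAULT_ALIGNMENT) yields 64, the kind of padding a writer would apply before each tensor's data.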
