Skip to main content

Module memory_breakdown

Module memory_breakdown 

Source
Expand description

Per-buffer-type memory usage reported by llama.cpp.

MemoryBreakdownEntry values are produced by crate::context::LlamaContext::memory_breakdown and classify bytes into model weights, KV / recurrent cache, and temporary compute buffers for each ggml backend buffer type (e.g. CUDA0, Metal, Host).

§Examples

use llama_cpp_4::prelude::*;

fn main() {
    let backend = LlamaBackend::init().unwrap();
    let model = LlamaModel::load_from_file(&backend, "model.gguf", &LlamaModelParams::default()).unwrap();
    let ctx = model.new_context(&backend, LlamaContextParams::default()).unwrap();
    for entry in ctx.memory_breakdown() {
        println!("{}: {} bytes total", entry.buft_name, entry.total());
    }
}

Structs§

MemoryBreakdownEntry
Memory attributed to a single backend buffer type (e.g. CUDA0, Host).