Per-buffer-type memory usage reported by llama.cpp.
MemoryBreakdownEntry values are produced by
crate::context::LlamaContext::memory_breakdown and classify bytes into
model weights, KV / recurrent cache, and temporary compute buffers for each
ggml backend buffer type (e.g. CUDA0, Metal, Host).
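To make the three-way classification concrete, here is a minimal, self-contained sketch that does not depend on the crate. `EntrySketch`, its byte fields (`model_bytes`, `cache_bytes`, `compute_bytes`), the helper functions, and the sample numbers are all hypothetical and chosen only for illustration. The one assumption about the real type is that `total()` sums the categories for a buffer type, as its name suggests.

```rust
// Hypothetical stand-in for one per-buffer-type entry. Only `buft_name` and
// `total()` are taken from the crate's own example; the byte fields below
// are assumed names, not the crate's actual API.
#[derive(Debug, Clone)]
struct EntrySketch {
    buft_name: String,
    model_bytes: u64,   // model weights
    cache_bytes: u64,   // KV / recurrent cache
    compute_bytes: u64, // temporary compute buffers
}

impl EntrySketch {
    /// Sum of the three categories for this buffer type.
    fn total(&self) -> u64 {
        self.model_bytes + self.cache_bytes + self.compute_bytes
    }
}

const MIB: u64 = 1024 * 1024;

/// Convert a byte count to mebibytes for display.
fn to_mib(bytes: u64) -> f64 {
    bytes as f64 / MIB as f64
}

/// Total bytes across every buffer type.
fn grand_total(entries: &[EntrySketch]) -> u64 {
    entries.iter().map(EntrySketch::total).sum()
}

/// Illustrative, made-up numbers: most weights on the GPU, a little on the host.
fn sample_entries() -> Vec<EntrySketch> {
    vec![
        EntrySketch {
            buft_name: "CUDA0".to_string(),
            model_bytes: 4096 * MIB,
            cache_bytes: 512 * MIB,
            compute_bytes: 256 * MIB,
        },
        EntrySketch {
            buft_name: "Host".to_string(),
            model_bytes: 128 * MIB,
            cache_bytes: 0,
            compute_bytes: 16 * MIB,
        },
    ]
}

fn main() {
    let entries = sample_entries();
    for e in &entries {
        println!(
            "{:<8} model {:>8.1} MiB  cache {:>8.1} MiB  compute {:>8.1} MiB  total {:>8.1} MiB",
            e.buft_name,
            to_mib(e.model_bytes),
            to_mib(e.cache_bytes),
            to_mib(e.compute_bytes),
            to_mib(e.total()),
        );
    }
    println!("all buffer types: {:.1} MiB", to_mib(grand_total(&entries)));
}
```

Keeping one entry per buffer type, rather than one global total, makes it possible to see at a glance how much of the footprint sits on an accelerator versus in host memory.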
§Examples
use llama_cpp_4::prelude::*;

fn main() {
    let backend = LlamaBackend::init().unwrap();
    let model = LlamaModel::load_from_file(
        &backend,
        "model.gguf",
        &LlamaModelParams::default(),
    )
    .unwrap();
    let ctx = model.new_context(&backend, LlamaContextParams::default()).unwrap();
    for entry in ctx.memory_breakdown() {
        println!("{}: {} bytes total", entry.buft_name, entry.total());
    }
}

Structs§
- MemoryBreakdownEntry - Memory attributed to a single backend buffer type (e.g. CUDA0, Host).