Memory estimation and parameter fitting from llama.cpp common/fit.
- get_device_memory_data: project per-device memory for a parameter set without keeping a context alive.
- fit_params: adjust LlamaModelParams/LlamaContextParams to fit available device memory (upstream common_fit_params).
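To make the idea of fitting parameters to a memory budget concrete, here is a minimal, self-contained sketch. It is not the crate's or upstream's implementation. The `Config` type, the linear cost model in `projected_bytes`, and the reduction order in `fit` (halve the context first, then offload fewer layers) are all assumptions chosen for illustration.

```rust
/// Hypothetical configuration; not a type from llama_cpp_4.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Config {
    n_ctx: u32,
    n_gpu_layers: u32,
}

/// Assumed linear cost model: each offloaded layer costs a fixed number of
/// weight bytes plus KV-cache bytes proportional to the context length.
fn projected_bytes(cfg: Config, bytes_per_layer: u64, kv_bytes_per_token: u64) -> u64 {
    let layers = u64::from(cfg.n_gpu_layers);
    layers * bytes_per_layer + layers * u64::from(cfg.n_ctx) * kv_bytes_per_token
}

/// Reduce the configuration until the projection fits into `free` bytes.
/// The context is halved while it stays at or above `n_ctx_min`; after that,
/// one GPU layer at a time is moved back to the host.
fn fit(
    mut cfg: Config,
    free: u64,
    n_ctx_min: u32,
    bytes_per_layer: u64,
    kv_bytes_per_token: u64,
) -> Config {
    while projected_bytes(cfg, bytes_per_layer, kv_bytes_per_token) > free {
        if cfg.n_ctx / 2 >= n_ctx_min {
            cfg.n_ctx /= 2;
        } else if cfg.n_gpu_layers > 0 {
            cfg.n_gpu_layers -= 1;
        } else {
            break; // nothing left to reduce
        }
    }
    cfg
}
```

Under this toy model, calling `fit(Config { n_ctx: 4096, n_gpu_layers: 32 }, 60_000, 512, 1_000, 1)` shrinks the context to 512 and keeps all 32 layers offloaded. A tighter budget of 40_000 bytes also lowers `n_gpu_layers`. The real fit_params works from the memory projection that llama.cpp reports and may apply a different strategy.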
§Example — memory estimate
use llama_cpp_4::prelude::*;
use std::path::Path;
fn main() {
    // Initialize the backend before calling into llama.cpp.
    let _backend = LlamaBackend::init().unwrap();
    // Project memory use for the given model and context parameters.
    let report = get_device_memory_data(
        Path::new("model.gguf"),
        &LlamaModelParams::default().with_n_gpu_layers(99),
        &LlamaContextParams::default(),
        llama_cpp_sys_4::GGML_LOG_LEVEL_ERROR,
    )
    .unwrap();
    println!("training ctx: {}", report.hyperparams.n_ctx_train);
    // One entry per device: free and total memory plus the projected use.
    for (i, entry) in report.entries.iter().enumerate() {
        println!(
            "device {i}: {} bytes free / {} total (projected {})",
            entry.free,
            entry.total,
            entry.used(),
        );
    }
}
§Example — auto-fit parameters
use llama_cpp_4::fit::{fit_params, FitParams};
use llama_cpp_4::prelude::*;
use std::num::NonZeroU32;
use std::path::Path;
fn main() {
    let backend = LlamaBackend::init().unwrap();
    // Fit the parameters to the available memory with a minimum context of 512.
    let result = fit_params(
        &backend,
        Path::new("model.gguf"),
        FitParams::default().with_n_ctx_min(512),
    )
    .unwrap();
    // n_ctx() returns an Option; print 0 when it is None.
    println!("n_ctx: {}", result.context_params.n_ctx().map_or(0, NonZeroU32::get));
    println!("n_gpu_layers: {}", result.model_params.n_gpu_layers());
}
Structs§
- DeviceMemoryEntry: Per-device memory projection from get_device_memory_data.
- DeviceMemoryHyperParams: Hyper-parameters discovered while estimating device memory.
- DeviceMemoryReport: Result of get_device_memory_data.
- FitParams: Input to fit_params.
- FitParamsResult: Fitted model/context parameters plus auxiliary buffers.
Enums§
- DeviceMemoryError: Errors from get_device_memory_data.
- FitParamsError: Errors from fit_params.
Functions§
- fit_params: Adjust model and context parameters to fit available device memory.
- get_device_memory_data: Estimate per-device memory for a model path and parameter set.