Capture intermediate tensor outputs during decode via the cb_eval callback.
During llama_decode, llama.cpp evaluates a computation graph where each
tensor node has a name (e.g. "l_out-13" for layer 13’s output,
"attn_norm-5" for layer 5’s attention norm, "result_norm" for the
final norm output).
The cb_eval callback is invoked for every tensor node, in two phases:
- Ask phase (ask = true): return true to request this tensor's data.
- Data phase (ask = false): the tensor data has been computed and is available to copy out via ggml_backend_tensor_get().
TensorCapture provides a safe, reusable wrapper around this mechanism.
§Example
use std::num::NonZeroU32;

use llama_cpp_4::context::params::LlamaContextParams;
use llama_cpp_4::context::tensor_capture::TensorCapture;

// Capture layers 13, 20, 27
let mut capture = TensorCapture::for_layers(&[13, 20, 27]);

let ctx_params = LlamaContextParams::default()
    .with_n_ctx(Some(NonZeroU32::new(2048).unwrap()))
    .with_embeddings(true)
    .with_tensor_capture(&mut capture);

// `model` and `backend` are assumed to have been set up beforehand
let mut ctx = model.new_context(&backend, ctx_params)?;

// ... add tokens to batch ...
ctx.decode(&mut batch)?;

// Read the captured hidden states
for &layer in &[13, 20, 27] {
    if let Some(info) = capture.get(layer) {
        println!("Layer {}: shape [{}, {}]", layer, info.n_embd, info.n_tokens);
        // info.data contains n_tokens * n_embd f32 values
        // Layout: data[token_idx * n_embd + dim_idx]
    }
}

§Structs
- CapturedTensor: Information about a single captured tensor.
- TensorCapture: Captures intermediate tensor outputs during llama_decode.