
Module tensor_capture


Capture intermediate tensor outputs during decode via the cb_eval callback.

During llama_decode, llama.cpp evaluates a computation graph where each tensor node has a name (e.g. "l_out-13" for layer 13’s output, "attn_norm-5" for layer 5’s attention norm, "result_norm" for the final norm output).
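Given that naming scheme, the layer index can be recovered from a node name by splitting on the trailing `-<n>` suffix. A minimal sketch (the `layer_index` helper is illustrative, not part of this crate's API):

```rust
// Hypothetical helper: extract the layer index from a graph-node name
// such as "l_out-13" or "attn_norm-5". Names without a numeric suffix
// (e.g. "result_norm") yield None.
fn layer_index(name: &str, prefix: &str) -> Option<usize> {
    name.strip_prefix(prefix)?
        .strip_prefix('-')?
        .parse()
        .ok()
}

fn main() {
    assert_eq!(layer_index("l_out-13", "l_out"), Some(13));
    assert_eq!(layer_index("attn_norm-5", "attn_norm"), Some(5));
    assert_eq!(layer_index("result_norm", "l_out"), None);
    println!("ok");
}
```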

The cb_eval callback is invoked for every tensor node:

  • Ask phase (ask = true): return true to request this tensor’s data.
  • Data phase (ask = false): the tensor’s data has been computed and can be copied out via ggml_backend_tensor_get().

TensorCapture provides a safe, reusable wrapper around this mechanism.

Example

use std::num::NonZeroU32;

use llama_cpp_4::context::params::LlamaContextParams;
use llama_cpp_4::context::tensor_capture::TensorCapture;

// Capture layers 13, 20, 27
let mut capture = TensorCapture::for_layers(&[13, 20, 27]);

let ctx_params = LlamaContextParams::default()
    .with_n_ctx(Some(NonZeroU32::new(2048).unwrap()))
    .with_embeddings(true)
    .with_tensor_capture(&mut capture);

let mut ctx = model.new_context(&backend, ctx_params)?;
// ... add tokens to batch ...
ctx.decode(&mut batch)?;

// Read captured hidden states
for &layer in &[13, 20, 27] {
    if let Some(info) = capture.get(layer) {
        println!("Layer {}: shape [{}, {}]", layer, info.n_embd, info.n_tokens);
        // info.data contains [n_tokens * n_embd] f32 values
        // Layout: data[token_idx * n_embd + dim_idx]
    }
}

Structs

CapturedTensor
    Information about a single captured tensor.
TensorCapture
    Captures intermediate tensor outputs during llama_decode.