Module embedding

Embedding pooling modes and the pool_hidden_states kernel.

An embedding model produces a per-token hidden state matrix of shape [seq_len, hidden_size] (stored row-major as a flat Vec<f32>). This module provides four standard strategies to collapse that matrix into a single hidden_size-dimensional vector suitable for similarity search, retrieval, and reranking.
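To make the row-major layout concrete, here is a minimal sketch of how one token's hidden state can be sliced out of the flat buffer. The helper name `token_row` is hypothetical and is not part of this module; it only illustrates that token `t` occupies elements `t * hidden_size .. (t + 1) * hidden_size`.

```rust
/// Hypothetical helper (not part of this module): borrow the hidden state of
/// token `t` from a flat, row-major [seq_len, hidden_size] buffer.
/// Returns `None` if the requested row falls outside the buffer.
fn token_row(hidden: &[f32], hidden_size: usize, t: usize) -> Option<&[f32]> {
    // Row `t` starts at offset t * hidden_size and spans hidden_size elements.
    let start = t.checked_mul(hidden_size)?;
    let end = start.checked_add(hidden_size)?;
    hidden.get(start..end)
}
```

With `hidden_size = 3` and a six-element buffer, `token_row(&hidden, 3, 1)` would borrow elements 3 through 5, and asking for row 2 would yield `None`.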

§Modes

Mode  Description
Last  Return the hidden state of the last token (default). Appropriate for causal / decoder-only models such as LLaMA.
Mean  Elementwise arithmetic mean across all tokens. Standard choice for BERT-style models.
Max   Elementwise maximum across all tokens. Captures the “most activated” feature in the sequence.
Cls   Return the hidden state of the first token (CLS). Used by BERT and its variants.
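
The four modes listed above can be sketched in a few lines of plain Rust. This is an illustration under stated assumptions, not the module's actual kernel: the enum `Mode` and the function `pool_sketch` are hypothetical stand-ins for `PoolingMode` and `pool_hidden_states`, and returning `Option` for shape errors is an assumption, since the real error type is not shown on this page.

```rust
/// Hypothetical stand-in for `PoolingMode`; variant names follow the modes above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    Last,
    Mean,
    Max,
    Cls,
}

/// Illustrative pooling over a flat, row-major [seq_len, hidden_size] buffer.
/// Returns `None` for an empty sequence or a buffer whose length does not
/// match the stated shape (an assumption; the real function may differ).
fn pool_sketch(
    hidden: &[f32],
    seq_len: usize,
    hidden_size: usize,
    mode: Mode,
) -> Option<Vec<f32>> {
    if seq_len == 0 || hidden_size == 0 {
        return None;
    }
    if seq_len.checked_mul(hidden_size) != Some(hidden.len()) {
        return None;
    }
    let pooled = match mode {
        // Last token: the final hidden_size elements of the buffer.
        Mode::Last => hidden[(seq_len - 1) * hidden_size..].to_vec(),
        // First token (CLS): the leading hidden_size elements.
        Mode::Cls => hidden[..hidden_size].to_vec(),
        // Mean: sum each column over all rows, then divide by seq_len.
        Mode::Mean => {
            let mut acc = vec![0.0f32; hidden_size];
            for row in hidden.chunks_exact(hidden_size) {
                for (a, x) in acc.iter_mut().zip(row) {
                    *a += *x;
                }
            }
            let n = seq_len as f32;
            for a in acc.iter_mut() {
                *a /= n;
            }
            acc
        }
        // Max: keep the largest value seen in each column.
        Mode::Max => {
            let mut acc = vec![f32::NEG_INFINITY; hidden_size];
            for row in hidden.chunks_exact(hidden_size) {
                for (a, x) in acc.iter_mut().zip(row) {
                    *a = (*a).max(*x);
                }
            }
            acc
        }
    };
    Some(pooled)
}
```

In this sketch, Last and Cls copy a single row, while Mean and Max make one pass over every row, so all four modes run in time linear in `seq_len * hidden_size` at most.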

§Usage

use oxillama_runtime::embedding::{pool_hidden_states, PoolingMode};

// hidden is a flat [seq_len × hidden_size] matrix.
let pooled = pool_hidden_states(&hidden, seq_len, hidden_size, PoolingMode::Mean)?;

Enums§

PoolingMode
Strategy for collapsing a sequence of hidden states into a single vector.

Functions§

pool_hidden_states
Pool a sequence of per-token hidden states into a single vector.