Embedding pooling modes and the pool_hidden_states kernel.
An embedding model produces a per-token hidden state matrix of shape
[seq_len, hidden_size] (stored row-major as a flat Vec<f32>). This
module provides four standard strategies to collapse that matrix into a
single hidden_size-dimensional vector suitable for similarity search,
retrieval, and reranking.
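Because the matrix is row-major, token `t`'s hidden state is the contiguous slice `hidden[t * hidden_size..(t + 1) * hidden_size]`, and element `(t, j)` sits at index `t * hidden_size + j`. The sketch below illustrates that indexing. `token_row` is a hypothetical helper for illustration, not part of this module.

```rust
/// Hypothetical helper (not part of this crate): returns the hidden state of
/// token `t` from a flat, row-major [seq_len × hidden_size] matrix.
fn token_row(hidden: &[f32], hidden_size: usize, t: usize) -> &[f32] {
    &hidden[t * hidden_size..(t + 1) * hidden_size]
}

fn main() {
    // 3 tokens × 2 features, stored one token (row) after another.
    let hidden: Vec<f32> = vec![
        1.0, 2.0, // token 0
        3.0, 4.0, // token 1
        5.0, 6.0, // token 2
    ];
    let hidden_size = 2;

    // Element (t = 1, j = 0) lives at index 1 * 2 + 0 = 2.
    assert_eq!(hidden[1 * hidden_size + 0], 3.0);
    assert_eq!(token_row(&hidden, hidden_size, 1), &[3.0, 4.0]);

    // `chunks_exact(hidden_size)` visits the same rows in token order.
    for (t, row) in hidden.chunks_exact(hidden_size).enumerate() {
        assert_eq!(row, token_row(&hidden, hidden_size, t));
    }
}
```

Slicing whole rows keeps each token's features contiguous in memory, which is why pooling kernels over this layout typically iterate token by token.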
§Modes
| Mode | Description |
|---|---|
| `Last` | Return the hidden state of the last token (default). Appropriate for causal (decoder-only) models such as LLaMA. |
| `Mean` | Elementwise arithmetic mean across all tokens. Standard choice for BERT-style models. |
| `Max` | Elementwise maximum across all tokens: each output dimension keeps the strongest activation of that feature anywhere in the sequence. |
| `Cls` | Return the hidden state of the first token (`[CLS]`). Used by BERT and its variants. |
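The semantics in the table can be pinned down with a small reference sketch. It makes no claims about the crate's actual kernel: `Mode` and `pool` are hypothetical names, and returning `None` on an empty sequence or a shape mismatch is an assumption (the real `pool_hidden_states` is fallible, since the usage example applies `?` to it, but its error type is not shown here).

```rust
/// Hypothetical stand-in for `PoolingMode`; variant names follow the table above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Mode {
    Last,
    Mean,
    Max,
    Cls,
}

/// Reference sketch (not the crate's kernel): collapse a flat, row-major
/// [seq_len × hidden_size] matrix into one hidden_size-dimensional vector.
/// Returns `None` for empty dimensions or when `hidden.len()` does not match.
fn pool(hidden: &[f32], seq_len: usize, hidden_size: usize, mode: Mode) -> Option<Vec<f32>> {
    if seq_len == 0 || hidden_size == 0 || hidden.len() != seq_len * hidden_size {
        return None;
    }
    // Each chunk is one token's hidden state.
    let mut rows = hidden.chunks_exact(hidden_size);
    let pooled = match mode {
        // First row: the CLS token.
        Mode::Cls => rows.next()?.to_vec(),
        // Final row: the last token.
        Mode::Last => rows.last()?.to_vec(),
        Mode::Mean => {
            // Sum every token's features, then divide by the token count.
            let mut acc = vec![0.0f32; hidden_size];
            for row in rows {
                for (a, &x) in acc.iter_mut().zip(row) {
                    *a += x;
                }
            }
            let n = seq_len as f32;
            acc.iter_mut().for_each(|a| *a /= n);
            acc
        }
        Mode::Max => {
            // Per-dimension running maximum; `f32::max` returns the non-NaN operand.
            let mut acc = vec![f32::NEG_INFINITY; hidden_size];
            for row in rows {
                for (a, &x) in acc.iter_mut().zip(row) {
                    *a = (*a).max(x);
                }
            }
            acc
        }
    };
    Some(pooled)
}

fn main() {
    // 3 tokens × 2 features: rows [1, 6], [3, 2], [5, 4].
    let hidden: Vec<f32> = vec![1.0, 6.0, 3.0, 2.0, 5.0, 4.0];
    assert_eq!(pool(&hidden, 3, 2, Mode::Cls), Some(vec![1.0, 6.0]));
    assert_eq!(pool(&hidden, 3, 2, Mode::Last), Some(vec![5.0, 4.0]));
    assert_eq!(pool(&hidden, 3, 2, Mode::Mean), Some(vec![3.0, 4.0]));
    assert_eq!(pool(&hidden, 3, 2, Mode::Max), Some(vec![5.0, 6.0]));
}
```

Note that `Max` mixes tokens: its result `[5, 6]` takes the first feature from token 2 and the second from token 0, whereas `Cls` and `Last` always return one token's row unchanged.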
§Usage
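```rust
use oxillama_runtime::embedding::{pool_hidden_states, PoolingMode};

// `hidden` is a flat, row-major [seq_len × hidden_size] matrix (here 3 × 4).
let (seq_len, hidden_size) = (3, 4);
let hidden: Vec<f32> = vec![0.0; seq_len * hidden_size];
// `?` propagates a pooling error, so this must run inside a fallible function.
let pooled = pool_hidden_states(&hidden, seq_len, hidden_size, PoolingMode::Mean)?;
```
Enums§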
- PoolingMode - Strategy for collapsing a sequence of hidden states into a single vector.
Functions§
- pool_hidden_states - Pool a sequence of per-token hidden states into a single vector.