Module embedding


Embedding pooling modes and the pool_hidden_states kernel.

An embedding model produces a per-token hidden-state matrix of shape `[seq_len, hidden_size]`, stored row-major as a flat `Vec<f32>`. This module provides four standard strategies for collapsing that matrix into a single `hidden_size`-dimensional vector suitable for similarity search, retrieval, and reranking.
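To make the row-major layout concrete, here is a minimal sketch showing where token `t`, feature `j` sits in the flat buffer. The helper `token_row` is hypothetical and not part of this module; it only illustrates the indexing convention described above.

```rust
/// Hypothetical helper (not part of this module): borrow row `t` of a
/// row-major `[seq_len, hidden_size]` matrix stored as a flat slice.
fn token_row(hidden: &[f32], t: usize, hidden_size: usize) -> &[f32] {
    // Token t occupies indices t*hidden_size .. (t+1)*hidden_size.
    let start = t * hidden_size;
    &hidden[start..start + hidden_size]
}

fn main() {
    let (seq_len, hidden_size) = (3, 2);
    let hidden: Vec<f32> = vec![
        1.0, 2.0, // token 0
        3.0, 4.0, // token 1
        5.0, 6.0, // token 2
    ];
    assert_eq!(hidden.len(), seq_len * hidden_size);

    // Element (t, j) lives at flat index t * hidden_size + j.
    let (t, j) = (1, 1);
    assert_eq!(hidden[t * hidden_size + j], 4.0);
    assert_eq!(token_row(&hidden, 2, hidden_size), &[5.0_f32, 6.0]);
}
```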

§Modes

| Mode | Description |
|------|-------------|
| `Last` | Return the hidden state of the last token (default). Appropriate for causal (decoder-only) models such as LLaMA. |
| `Mean` | Elementwise arithmetic mean across all tokens. Standard choice for BERT-style models. |
| `Max` | Elementwise maximum across all tokens. Captures the strongest activation of each feature across the sequence. |
| `Cls` | Return the hidden state of the first token (CLS). Used by BERT and its variants. |
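The four modes in the table can be sketched over a flat row-major buffer as follows. This is an illustrative reimplementation under stated assumptions, not the crate's `pool_hidden_states`: the local `Mode` enum stands in for `PoolingMode`, and the `Result<Vec<f32>, String>` error type and the validation rules are assumptions chosen for the example.

```rust
/// Local stand-in for the module's pooling modes (hypothetical; not the
/// crate's `PoolingMode` type). `Last` is the default, as in the table.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
enum Mode {
    #[default]
    Last,
    Mean,
    Max,
    Cls,
}

/// Collapse a flat row-major `[seq_len, hidden_size]` matrix into one
/// `hidden_size` vector. Errors on empty or mis-sized input (assumed policy).
fn pool(hidden: &[f32], seq_len: usize, hidden_size: usize, mode: Mode) -> Result<Vec<f32>, String> {
    if seq_len == 0 || hidden_size == 0 {
        return Err("empty input".to_string());
    }
    if hidden.len() != seq_len * hidden_size {
        return Err(format!("expected {} values, got {}", seq_len * hidden_size, hidden.len()));
    }
    // Iterator over token rows, each exactly hidden_size long.
    let mut rows = hidden.chunks_exact(hidden_size);
    let out = match mode {
        // First token (the CLS position).
        Mode::Cls => rows.next().unwrap().to_vec(),
        // Last token.
        Mode::Last => rows.last().unwrap().to_vec(),
        // Sum each feature over tokens, then divide by seq_len.
        Mode::Mean => {
            let mut acc = vec![0.0_f32; hidden_size];
            for row in rows {
                for (a, &x) in acc.iter_mut().zip(row) {
                    *a += x;
                }
            }
            acc.iter_mut().for_each(|a| *a /= seq_len as f32);
            acc
        }
        // Keep the largest value of each feature (f32::max ignores NaN).
        Mode::Max => {
            let mut acc = vec![f32::NEG_INFINITY; hidden_size];
            for row in rows {
                for (a, &x) in acc.iter_mut().zip(row) {
                    *a = a.max(x);
                }
            }
            acc
        }
    };
    Ok(out)
}

fn main() {
    // Two tokens, hidden_size = 3.
    let hidden: [f32; 6] = [1.0, -2.0, 3.0, 5.0, 4.0, -1.0];
    assert_eq!(pool(&hidden, 2, 3, Mode::Cls).unwrap(), vec![1.0_f32, -2.0, 3.0]);
    assert_eq!(pool(&hidden, 2, 3, Mode::Last).unwrap(), vec![5.0_f32, 4.0, -1.0]);
    assert_eq!(pool(&hidden, 2, 3, Mode::Mean).unwrap(), vec![3.0_f32, 1.0, 1.0]);
    assert_eq!(pool(&hidden, 2, 3, Mode::Max).unwrap(), vec![5.0_f32, 4.0, 3.0]);
    assert_eq!(Mode::default(), Mode::Last);
}
```

`Cls` and `Last` are simple row copies, while `Mean` and `Max` each make one pass over the whole matrix.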

§Usage

```rust
use oxillama_runtime::embedding::{pool_hidden_states, PoolingMode};

// `hidden` is a flat row-major [seq_len × hidden_size] matrix;
// `?` requires an enclosing function that returns a compatible `Result`.
let pooled = pool_hidden_states(&hidden, seq_len, hidden_size, PoolingMode::Mean)?;
```
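Pooled vectors are typically compared with a similarity measure for search or reranking. Cosine similarity is one common choice; this module does not prescribe one, and the `cosine_similarity` function below is a hypothetical, self-contained sketch that does not call the crate. The fixed arrays stand in for vectors that pooling would return.

```rust
/// Hypothetical helper: cosine similarity between two pooled embeddings.
/// Returns None if lengths differ or either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        return None;
    }
    Some(dot / (na * nb))
}

fn main() {
    // Illustrative stand-ins for pooled vectors.
    let query = [1.0_f32, 0.0, 1.0];
    let doc = [1.0_f32, 0.0, 1.0];
    let other = [0.0_f32, 1.0, 0.0];

    // Identical directions score ~1; orthogonal vectors score 0.
    let s_same = cosine_similarity(&query, &doc).unwrap();
    assert!((s_same - 1.0).abs() < 1e-6);
    assert_eq!(cosine_similarity(&query, &other), Some(0.0));
}
```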

Enums§

PoolingMode: Strategy for collapsing a sequence of hidden states into a single vector.

Functions§

pool_hidden_states: Pool a sequence of per-token hidden states into a single vector.
