//! Embedding utilities — pooling strategies, normalization, similarity,
//! and `sentence-transformers` pooling-config parsing.
//!
//! Ported from
//! [`mlx-embeddings`](https://github.com/Blaizzy/mlx-embeddings)
//! (`models/pooling.py`, `models/base.py`, `utils.py`) and
//! [`MLXEmbedders`](https://github.com/ml-explore/mlx-swift-examples)
//! (`Pooling.swift`, `MLXArray+Helper.swift`, `EmbeddingModel.swift`,
//! `EmbedderModelContainer.swift`). The pooling / normalization / similarity
//! helpers operate on the hidden states produced by an embedding model; the
//! [`EmbeddingModel`](crate::embeddings::EmbeddingModel) trait + the
//! [`encode`](crate::embeddings::encode()) entry add the orchestration
//! (tokenize → pad + mask → forward → pool → normalize). The local
//! load-factory ([`crate::embeddings::factory`] — `load` + a `model_type`
//! [`EmbeddingModelTypeRegistry`](crate::embeddings::EmbeddingModelTypeRegistry))
//! turns a local model directory into a constructed model + tokenizer +
//! pooling-config bundle. The cross-architecture ColVision seam
//! ([`crate::embeddings::colvision`] —
//! [`BaseColVisionProcessor`](crate::embeddings::BaseColVisionProcessor)
//! trait + the static [`score_single_vector`](crate::embeddings::score_single_vector) /
//! [`score_multi_vector`](crate::embeddings::score_multi_vector) helpers)
//! ships here. Concrete model architectures (BERT / ColIdefics3 /
//! ColQwen2_5 / …) are added per use case and are out of scope for this
//! module (no-model-arch rule); the registry is the extension point that
//! those per-use-case architectures register into.
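/*!
The "pad + mask" step of the orchestration above can be sketched in plain
Rust. This is a minimal illustration only, not this crate's implementation:
`pad_and_mask` and `pad_id` are hypothetical names, right-padding is an
assumption, and nested `Vec`s stand in for arrays.

```rust
// Pad variable-length token-id sequences to the longest sequence in the
// batch and build the matching attention mask (1 = real token, 0 = padding).
// Assumption: padding is appended on the right.
fn pad_and_mask(batch: &[Vec<u32>], pad_id: u32) -> (Vec<Vec<u32>>, Vec<Vec<u8>>) {
    let max_len = batch.iter().map(|seq| seq.len()).max().unwrap_or(0);
    let mut ids = Vec::with_capacity(batch.len());
    let mut mask = Vec::with_capacity(batch.len());
    for seq in batch {
        let pad = max_len - seq.len();
        // Token ids followed by `pad` copies of the padding id.
        let mut row = seq.clone();
        row.extend(std::iter::repeat(pad_id).take(pad));
        // Ones for the real tokens, zeros for the padded tail.
        let mut m = vec![1u8; seq.len()];
        m.extend(std::iter::repeat(0u8).take(pad));
        ids.push(row);
        mask.push(m);
    }
    (ids, mask)
}
```

Both outputs share the `(batch, seq_len)` shape, which is what lets the mask
be applied position by position during pooling.
*/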
//!
//! ## Conventions
//! - `token_embeddings`: `(batch, seq_len, hidden)` float array.
//! - `attention_mask`: `(batch, seq_len)` array, `1` for real tokens, `0`
//! for padding.
//! - Pooling returns `(batch, hidden)` (except
//! [`PoolingStrategy::None`](crate::embeddings::PoolingStrategy::None),
//! which passes the `(batch, seq, hidden)` hidden states through).
//! - No implicit eval: functions compose lazily; call [`crate::Array`]
//! accessors to materialize.
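/*!
To make the shape conventions concrete, here is a hedged sketch of masked
mean pooling over nested `Vec`s. It does not use this crate's `Array` type
or its `mean_pooling` function; `masked_mean_pool` is a hypothetical name,
and the `1e-9` divisor floor is an illustrative choice.

```rust
// token_embeddings: (batch, seq_len, hidden); attention_mask: (batch, seq_len).
// Returns (batch, hidden): the mean of the unmasked token vectors per row.
fn masked_mean_pool(
    token_embeddings: &[Vec<Vec<f32>>],
    attention_mask: &[Vec<u8>],
) -> Vec<Vec<f32>> {
    token_embeddings
        .iter()
        .zip(attention_mask)
        .map(|(tokens, mask)| {
            let hidden = tokens.first().map_or(0, |t| t.len());
            let mut sum = vec![0.0f32; hidden];
            let mut count = 0.0f32;
            for (tok, &m) in tokens.iter().zip(mask) {
                // Padding positions (mask == 0) contribute nothing.
                if m != 0 {
                    for (s, &x) in sum.iter_mut().zip(tok) {
                        *s += x;
                    }
                    count += 1.0;
                }
            }
            // Floor the divisor so an all-padding row yields zeros, not NaN.
            let denom = count.max(1e-9);
            sum.into_iter().map(|s| s / denom).collect()
        })
        .collect()
}
```

With a mask of `[1, 1, 0]`, only the first two token vectors are averaged;
the padded third position is ignored regardless of its values.
*/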
//!
//! ## Surface
//! - Pooling: [`mean_pooling`](crate::embeddings::mean_pooling),
//! [`cls_pooling`](crate::embeddings::cls_pooling),
//! [`max_pooling`](crate::embeddings::max_pooling),
//! [`last_token_pooling`](crate::embeddings::last_token_pooling),
//! [`first_token_pooling`](crate::embeddings::first_token_pooling),
//! plus the unified
//! [`PoolingStrategy`](crate::embeddings::PoolingStrategy) enum +
//! [`pool`](crate::embeddings::pool) dispatcher (mirrors python
//! `pool_by_config` + swift `Pooling.callAsFunction`), plus
//! [`pool_post`](crate::embeddings::pool_post) — the shared
//! normalize/dimension/layer-norm tail applied to an already-pooled
//! vector (a model's trained `pooled_output` on the `cls`/`none`
//! paths, swift `inputs.pooledOutput ?? …`).
//! - Normalization: parameterized
//! [`normalize`](crate::embeddings::normalize()) (real
//! `mlx_linalg_norm` `ord=p`),
//! [`l2_normalize`](crate::embeddings::l2_normalize) /
//! [`l2_normalize_eps`](crate::embeddings::l2_normalize_eps)
//! convenience, eps constants
//! [`DEFAULT_NORMALIZE_EPS`](crate::embeddings::DEFAULT_NORMALIZE_EPS)
//! (python `1e-9`) and
//! [`SWIFT_L2_EPS`](crate::embeddings::SWIFT_L2_EPS) (`1e-12`).
//! - Fused post-pool norms (mlx-c-surfaced), applied by the
//! [`pool`](crate::embeddings::pool) dispatcher to the *already-pooled*
//! sentence vector (after the pooling reduction, before matryoshka
//! truncation / L2-normalize — matching swift `Pooling`'s
//! `applyLayerNorm` on the pooled output, *not* the model's internal
//! token-level normalization, which is per-architecture and out of
//! scope): [`layer_norm`](crate::embeddings::layer_norm)
//! (`mlx_fast_layer_norm`),
//! [`rms_norm`](crate::embeddings::rms_norm) (`mlx_fast_rms_norm`).
//! - ST-config parsing:
//! [`pooling_from_st_config_str`](crate::embeddings::pooling_from_st_config_str) /
//! [`pooling_from_st_config_bytes`](crate::embeddings::pooling_from_st_config_bytes) /
//! [`pooling_from_st_config_path`](crate::embeddings::pooling_from_st_config_path)
//! → [`StPoolingConfig`](crate::embeddings::StPoolingConfig).
//! - Similarity:
//! [`cosine_similarity`](crate::embeddings::cosine_similarity),
//! [`cosine_similarity_matrix`](crate::embeddings::cosine_similarity_matrix).
//! - ColVision base processor seam (mlx-embeddings
//! `colvision_processor.py`):
//! [`BaseColVisionProcessor`](crate::embeddings::BaseColVisionProcessor)
//! trait declaring the abstract `process_images` / `process_queries` /
//! `score` shape every concrete (per-model) ColVision processor
//! implements, plus the two static scoring helpers
//! [`score_single_vector`](crate::embeddings::score_single_vector)
//! (dot-product) and
//! [`score_multi_vector`](crate::embeddings::score_multi_vector)
//! (MaxSim / late-interaction).
//! - Orchestration: the
//! [`EmbeddingModel`](crate::embeddings::EmbeddingModel) trait +
//! [`EmbeddingModelOutput`](crate::embeddings::EmbeddingModelOutput)
//! (the forward-pass seam; python `BaseModelOutput`, swift
//! `EmbeddingModelOutput`) and the
//! [`encode`](crate::embeddings::encode()) entry +
//! [`EncodeConfig`](crate::embeddings::EncodeConfig) (tokenize → pad +
//! attention mask → `forward` → optional
//! [`pool`](crate::embeddings::pool) / post-processing; typically
//! returns pooled rank-2 embeddings `(batch, dim)`, but
//! [`PoolingStrategy::None`](crate::embeddings::PoolingStrategy::None)
//! can preserve rank-3 hidden states `(batch, seq_len, dim)`, and the
//! `pooled_output` fast-path can also return rank-2 output; mirrors
//! python `utils.generate` + swift
//! `EmbedderModelContainer.perform`).
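/*!
The normalization and scoring ideas listed above can be illustrated on plain
slices. This is a sketch under stated assumptions, not this crate's API: the
function names are hypothetical, the norm is floored at `eps` (the crate may
apply `eps` differently), and MaxSim is shown for a single query/document
pair.

```rust
// L2-normalize a vector, flooring the norm at `eps` to avoid dividing by zero.
fn l2_normalize_vec(v: &[f32], eps: f32) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt().max(eps);
    v.iter().map(|x| x / norm).collect()
}

// Dot product of two equal-length vectors (single-vector scoring).
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

// Cosine similarity: the dot product of the two L2-normalized vectors.
fn cosine(a: &[f32], b: &[f32], eps: f32) -> f32 {
    dot(&l2_normalize_vec(a, eps), &l2_normalize_vec(b, eps))
}

// MaxSim (late interaction): for each query token vector take the best dot
// product against any document token vector, then sum over query tokens.
// Assumes `doc` is non-empty; an empty `doc` yields negative infinity.
fn max_sim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    query
        .iter()
        .map(|q| doc.iter().map(|d| dot(q, d)).fold(f32::NEG_INFINITY, f32::max))
        .sum()
}
```

Single-vector scoring compares one pooled vector per input, while MaxSim
keeps the per-token vectors and lets each query token pick its best match.
*/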
use crate::;
/// Build a `(1,)` scalar constant carrying `value`, in the **same dtype**
/// as `like` (`like.dtype()`).
///
/// This is the crate's uniform stand-in for MLX *weak-scalar* / python
/// `astype(x.dtype)` semantics: every constant or `-inf`/`eps`/`0` floor
/// that meets the embedding tensor must adopt the embedding's dtype so an
/// f16/bf16 input is **not** silently promoted to f32 (dtype fidelity).
/// `mlx-embeddings` does exactly this
/// (`mask.astype(token_embeddings.dtype)`, and python scalars `-inf` /
/// `eps` / `1e-9` are MLX weak scalars that adopt the array dtype).
///
/// Implemented as an f32 scalar → [`astype`] to `like.dtype()`. For an f32
/// `like` this is a dtype-preserving no-op cast, so f32 results are
/// **bit-identical** to a direct `Array::full::<f32>` (regression-safe).
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;