//! Deterministic synthetic embedder.
//!
//! Turns sparse `(axis, weight)` features into a dense 128-dim vector and
//! adds seeded LCG noise for realism. This is a **reproducibility
//! utility**, not spherical geometry: it lives in `sphereql-core` so that
//! both the example corpora (`sphereql-corpus`) and the corpus self-tune
//! entry point (`sphereql-embed`'s `run_self_tune`) can synthesize the
//! same embeddings without pulling in the corpus crate's native-only
//! arrow/parquet dependencies. Given a fixed feature set and seed, the
//! output is bit-for-bit identical across runs.
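//!
//! A minimal sketch of the technique follows. It assumes a 64-bit LCG with
//! Knuth's MMIX constants, additive scattering of feature weights, and
//! noise drawn as `(u - 0.5) * amplitude` for `u` uniform in `[0, 1)`. The
//! names `SKETCH_DIM`, `lcg_next`, `unit_f64`, and `synth_embed` are
//! hypothetical and are not this crate's API:
//!
//! ```rust
//! // Hypothetical stand-in for `DIM`.
//! const SKETCH_DIM: usize = 128;
//!
//! // One step of a 64-bit LCG (MMIX constants; an assumption, not
//! // necessarily the constants this crate uses).
//! fn lcg_next(state: &mut u64) -> u64 {
//!     *state = state
//!         .wrapping_mul(6364136223846793005)
//!         .wrapping_add(1442695040888963407);
//!     *state
//! }
//!
//! // Map the top 53 bits of the LCG output to a float in [0, 1); the low
//! // bits of a power-of-two-modulus LCG are weak, so they are discarded.
//! fn unit_f64(state: &mut u64) -> f64 {
//!     (lcg_next(state) >> 11) as f64 / (1u64 << 53) as f64
//! }
//!
//! // Scatter sparse (axis, weight) pairs into a dense vector, then add
//! // seeded uniform noise in [-amplitude/2, +amplitude/2).
//! fn synth_embed(features: &[(usize, f64)], seed: u64, amplitude: f64) -> Vec<f64> {
//!     let mut v = vec![0.0; SKETCH_DIM];
//!     for &(axis, weight) in features {
//!         // Assumption: out-of-range axes are skipped, repeated axes add up.
//!         if axis < SKETCH_DIM {
//!             v[axis] += weight;
//!         }
//!     }
//!     let mut state = seed;
//!     for x in v.iter_mut() {
//!         *x += (unit_f64(&mut state) - 0.5) * amplitude;
//!     }
//!     v
//! }
//! ```
//!
//! Every step is integer arithmetic or IEEE-754 float operations in a fixed
//! order. As a result, two calls with the same features, seed, and amplitude
//! return identical vectors, while changing only the seed changes the noise.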
/// Embedding dimensionality.
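pub const DIM: usize = 128;

/// Noise amplitude used by [`embed`], the default regime for the
/// built-in 775-concept corpus. This is the full width of the uniform
/// noise interval, so each dimension is perturbed by a value in
/// `[-0.02, +0.02]`.
pub const DEFAULT_NOISE_AMPLITUDE: f64 = 0.04;
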
/// Deterministic pseudo-random embedding from sparse features.
///
/// Fills a [`DIM`]-dimensional vector with the given feature weights, then
/// adds uniform noise in `[-0.02, +0.02]` (width [`DEFAULT_NOISE_AMPLITUDE`])
/// seeded by `seed`, for realism.
/// Like [`embed`], but with a configurable noise amplitude.
///
/// Each dimension gets uniform noise in `[-amplitude/2, +amplitude/2]` from a
/// seeded LCG. Use this to synthesize stress-test corpora that bracket the
/// default regime (e.g. `amplitude=0.2` for a signal-to-noise ratio roughly
/// 10× harsher than the built-in corpus).
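///
/// A minimal sketch of such an amplitude sweep follows. It assumes a
/// 64-bit LCG with Knuth's MMIX constants and noise computed as
/// `(u - 0.5) * amplitude` for `u` uniform in `[0, 1)`. The helpers
/// `noise_vector` and `l2_norm` are hypothetical and not part of this crate:
///
/// ```rust
/// // Pure noise component of one 128-dim embedding (assumed construction).
/// fn noise_vector(seed: u64, amplitude: f64) -> Vec<f64> {
///     let mut state = seed;
///     (0..128)
///         .map(|_| {
///             // One LCG step (MMIX constants, an assumption).
///             state = state
///                 .wrapping_mul(6364136223846793005)
///                 .wrapping_add(1442695040888963407);
///             // Top 53 bits -> u in [0, 1), then recentre and scale.
///             let u = (state >> 11) as f64 / (1u64 << 53) as f64;
///             (u - 0.5) * amplitude
///         })
///         .collect()
/// }
///
/// // Euclidean norm, used to compare noise energy across amplitudes.
/// fn l2_norm(v: &[f64]) -> f64 {
///     v.iter().map(|x| x * x).sum::<f64>().sqrt()
/// }
/// ```
///
/// In this sketch, reusing one seed across amplitudes keeps the noise
/// direction fixed and each component scales linearly with `amplitude`.
/// A stress corpus built at `amplitude = 0.2` therefore differs from the
/// default regime only in noise scale, not in which dimensions are perturbed.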