// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The Infino Authors
//! Convenience builders for test fixtures.
//!
//! Three test contexts share these helpers:
//!
//! - **Unit tests** (`#[cfg(test)] mod tests` inside `src/`)
//! reach this module via `crate::test_helpers::...` —
//! `cfg(test)` always enables it.
//! - **Integration tests** (`tests/...`) reach it via
//! `infino::test_helpers::...` — the `test-helpers` Cargo
//! feature is auto-enabled by the `dev-dependencies` self-
//! reference in `Cargo.toml`.
//! - **Benches** (`benches/...`) reach it the same way.
//!
//! Scope: small atomic idioms that repeat across dozens of
//! test / bench fixtures (Decimal128 id construction, default
//! tokenizer, default vector config). Higher-level "build a
//! test corpus" / "build a full test superfile" stays in the
//! test files themselves — those vary too much per scenario
//! to share usefully.
//!
//! [`brute_force_bm25`] is the textbook BM25 reference impl
//! used as the FTS correctness oracle.
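The sketch below shows what a textbook brute-force BM25 oracle computes, for readers who don't know the formula. It is not the crate's `brute_force_bm25`. Several details are assumptions: the name `brute_force_bm25_sketch`, the pre-tokenized `Vec<String>` inputs, the defaults `k1 = 1.2` and `b = 0.75`, and the Lucene-style IDF `ln(1 + (N - n + 0.5) / (n + 0.5))`. It uses the std library only. It scores every document against every query term without an index, which makes it slow but easy to verify by hand.

```rust
use std::collections::HashMap;

// Common BM25 defaults; assumed here, not read from the crate.
const K1: f64 = 1.2;
const B: f64 = 0.75;

/// Hypothetical brute-force BM25 scorer: one score per document, in the
/// same order as `docs`. Documents and the query are pre-tokenized.
pub fn brute_force_bm25_sketch(docs: &[Vec<String>], query: &[String]) -> Vec<f64> {
    if docs.is_empty() {
        return Vec::new();
    }
    let n_docs = docs.len() as f64;
    // Average document length, in tokens.
    let avgdl = docs.iter().map(|d| d.len()).sum::<usize>() as f64 / n_docs;

    // Document frequency: how many documents contain each query term.
    let mut df: HashMap<&str, usize> = HashMap::new();
    for term in query {
        let n = docs.iter().filter(|d| d.contains(term)).count();
        df.insert(term.as_str(), n);
    }

    docs.iter()
        .map(|doc| {
            let dl = doc.len() as f64;
            let mut score = 0.0;
            for term in query {
                // Term frequency of `term` in this document.
                let tf = doc.iter().filter(|t| *t == term).count() as f64;
                if tf == 0.0 {
                    continue; // absent terms contribute nothing
                }
                let n = df[term.as_str()] as f64;
                // Lucene-style IDF: never negative, unlike the classic
                // Robertson-Sparck Jones form.
                let idf = (1.0 + (n_docs - n + 0.5) / (n + 0.5)).ln();
                // Saturating TF with document-length normalization.
                let norm = tf + K1 * (1.0 - B + B * dl / avgdl);
                score += idf * tf * (K1 + 1.0) / norm;
            }
            score
        })
        .collect()
}
```

Each query term costs a full pass over the corpus, so this kind of oracle only fits the small corpora used in tests.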
use Arc;
use ;
use ;
use ThreadPoolBuilder;
use crate::;
/// Build a `Decimal128Array(38, 0)` from `u64` ids.
///
/// Centralizes the verbose three-step construction that
/// every test fixture would otherwise repeat:
///
/// ```ignore
/// Decimal128Array::from(ids.into_iter().map(|v| v as i128).collect::<Vec<_>>())
///     .with_precision_and_scale(38, 0)
///     .expect("decimal128")
/// ```
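With scale 0, the integer stored in a `Decimal128(38, 0)` cell is the id itself, so the only real work is widening `u64` to `i128`. The sketch below shows that widening using the std library only, with no Arrow dependency. The helper names are hypothetical. It also checks that even `u64::MAX`, which has 20 decimal digits, stays within the 38-digit precision bound.

```rust
/// Largest magnitude a precision-38 decimal can hold: 10^38 - 1.
pub const MAX_PRECISION_38: i128 = 10i128.pow(38) - 1;

/// Hypothetical helper: widen `u64` ids to the `i128` values that a
/// `Decimal128(38, 0)` column stores. With scale 0 no rescaling is needed,
/// and `i128::from` is lossless for every `u64`.
pub fn ids_to_decimal128_values(ids: impl IntoIterator<Item = u64>) -> Vec<i128> {
    ids.into_iter().map(i128::from).collect()
}

/// True if `v` fits within 38 decimal digits.
pub fn fits_precision_38(v: i128) -> bool {
    v.unsigned_abs() <= MAX_PRECISION_38 as u128
}
```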
/// `Field` for the primary-key id column — `Decimal128(38, 0)`,
/// non-nullable. Caller supplies the column name (typically
/// `"_id"` at the supertable layer or `"doc_id"` in lower-level
/// superfile fixtures).
/// The default tokenizer used in tests and benches:
/// `AsciiLowerTokenizer` wrapped in `Arc<dyn Tokenizer>`.
///
/// Callers passing this into `BuilderOptions::new` wrap it in
/// `Some(...)` at the call site:
///
/// ```ignore
/// BuilderOptions::new(schema, "doc_id", fts_cols, vec_cols,
///                     Some(default_tokenizer()));
/// ```
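Here is a minimal sketch of the `Arc<dyn Tokenizer>` pattern. It uses several assumptions in place of the crate's real types:

- The `Tokenizer` trait and its `tokenize` method stand in for the real trait.
- `AsciiLowerSketch` stands in for `AsciiLowerTokenizer`. Its splitting rule is assumed: break on any character that is not ASCII alphanumeric, then lowercase.
- `OptionsSketch` is a hypothetical stand-in for the `Option<Arc<dyn Tokenizer>>` slot that `BuilderOptions::new` appears to take.

```rust
use std::sync::Arc;

/// Stand-in for the crate's `Tokenizer` trait (hypothetical signature).
pub trait Tokenizer: Send + Sync {
    fn tokenize(&self, text: &str) -> Vec<String>;
}

/// Assumed behavior: split on non-ASCII-alphanumeric characters,
/// drop empty pieces, lowercase ASCII letters.
pub struct AsciiLowerSketch;

impl Tokenizer for AsciiLowerSketch {
    fn tokenize(&self, text: &str) -> Vec<String> {
        text.split(|c: char| !c.is_ascii_alphanumeric())
            .filter(|tok| !tok.is_empty())
            .map(|tok| tok.to_ascii_lowercase())
            .collect()
    }
}

/// Mirrors the shape of `default_tokenizer()`: a shared trait object.
pub fn default_tokenizer_sketch() -> Arc<dyn Tokenizer> {
    Arc::new(AsciiLowerSketch)
}

/// Hypothetical options struct holding an optional tokenizer.
pub struct OptionsSketch {
    pub tokenizer: Option<Arc<dyn Tokenizer>>,
}
```

Returning `Arc<dyn Tokenizer>` instead of a concrete type lets callers clone one handle into several builders. Cloning an `Arc` only bumps a reference count.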
/// Default `VectorConfig` for test fixtures: `dim=16`,
/// `n_cent=4`, `metric=Cosine`. Caller supplies the column
/// name and `rot_seed` — the only fields tests vary.
///
/// For realistic-scale vectors (e.g. `dim=384` in benches),
/// callers construct `VectorConfig` directly with their own
/// values.
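The "fixed defaults, two varying fields" shape can be sketched with hypothetical stand-in types. `VectorConfigSketch` mirrors only the fields named in the comment above, and the `u64` type of `rot_seed` is an assumption. The real `VectorConfig` may have more fields or private ones.

```rust
/// Stand-in metric enum; only the variant named in the docs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MetricSketch {
    Cosine,
}

/// Hypothetical mirror of the fields the doc comment names.
#[derive(Debug, Clone, PartialEq)]
pub struct VectorConfigSketch {
    pub column: String,
    pub dim: usize,
    pub n_cent: usize,
    pub metric: MetricSketch,
    pub rot_seed: u64,
}

/// Test defaults: only the column name and rotation seed vary.
pub fn default_vector_config_sketch(column: &str, rot_seed: u64) -> VectorConfigSketch {
    VectorConfigSketch {
        column: column.to_string(),
        dim: 16,
        n_cent: 4,
        metric: MetricSketch::Cosine,
        rot_seed,
    }
}
```

A bench that needs `dim = 384` could write `VectorConfigSketch { dim: 384, ..default_vector_config_sketch("emb", 7) }`. That only works if the fields are public, which is why the real benches may construct `VectorConfig` directly instead.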
/// Single-column user schema with `title: LargeUtf8`.
///
/// Mirrors the supertable's auto-`_id` model: the supertable
/// layer prepends `_id: Decimal128(38, 0)` automatically at
/// append time, so the user-facing schema declares only the
/// payload columns. Dozens of supertable tests would otherwise
/// rebuild this exact schema; centralizing it keeps the
/// "supertable auto-injects `_id`" contract in one place.
/// Build a single-column `RecordBatch` of titles matching
/// [`schema_id_title`]. Caller supplies the title strings;
/// the rest is fixed.
/// `SupertableOptions` with the test-fixture defaults:
/// [`schema_id_title`] schema, a single FTS column `title`,
/// no vector columns, and a 1-thread rayon writer pool.
///
/// Callers chain `.with_storage(...)`, `.with_disk_cache(...)`,
/// or any other `.with_*(...)` method the specific test needs.
/// Returning the options without storage attached lets each
/// test decide explicitly whether to attach it.
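The consuming-builder shape described above can be sketched with hypothetical stand-ins. `SupertableOptionsSketch`, its fields, and the use of `PathBuf` for the storage and cache handles are all assumptions. The real helper builds a 1-thread rayon pool. This sketch records only a thread count so that it stays std-only.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for `SupertableOptions`.
#[derive(Debug, Clone, Default)]
pub struct SupertableOptionsSketch {
    pub fts_columns: Vec<String>,
    pub vector_columns: Vec<String>,
    pub writer_threads: usize,
    pub storage: Option<PathBuf>,
    pub disk_cache: Option<PathBuf>,
}

impl SupertableOptionsSketch {
    /// Attach storage; takes and returns `self` so calls chain.
    pub fn with_storage(mut self, path: impl Into<PathBuf>) -> Self {
        self.storage = Some(path.into());
        self
    }

    /// Attach a disk cache, same chaining style.
    pub fn with_disk_cache(mut self, path: impl Into<PathBuf>) -> Self {
        self.disk_cache = Some(path.into());
        self
    }
}

/// Test defaults: one FTS column `title`, no vector columns,
/// one writer thread, and deliberately no storage.
pub fn test_supertable_options_sketch() -> SupertableOptionsSketch {
    SupertableOptionsSketch {
        fts_columns: vec!["title".to_string()],
        vector_columns: Vec::new(),
        writer_threads: 1,
        ..Default::default()
    }
}
```

Because the defaults leave `storage` as `None`, a test that never calls `.with_storage(...)` is visibly storage-less. It is not silently pointed at a shared default location.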