// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright the Vortex contributors
//! "Fat" token table layout.
//!
//! Each token is materialized into a 16-byte-strided row, so a decode load
//! addresses `data + code * 16` straight from the code — replacing the
//! `code → entry → dict[offset]` dependent-load chain of the
//! [`super::DecodeEntry`] layout with a single independent load. Costs
//! `dict_tokens * 16` bytes of table; whether that pays is a cache-residency
//! question the [`super::plan`] index decides per host.
//!
//! Loop structure: a 16-byte over-copy fast region ([`super::scalar::copy16`])
//! plus an exact, length-aware tail.
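//!
//! A minimal sketch of that loop shape in safe Rust, for illustration only.
//! `SketchTable` and `sketch_decode` are hypothetical stand-ins (not this
//! module's API), rows are zero-padded for simplicity, and `out_len` is assumed
//! to be the exact decoded length.
//!
//! ```rust
//! // Hypothetical stand-in for the fat table; not the crate's real type.
//! struct SketchTable {
//!     rows: Vec<[u8; 16]>, // row `code` sits at byte offset `code * 16`
//!     lens: Vec<u8>,       // true token length per code
//! }
//!
//! // Assumes `out_len` is exactly the sum of the decoded token lengths.
//! fn sketch_decode(table: &SketchTable, codes: &[u16], out_len: usize) -> Vec<u8> {
//!     let mut out = vec![0u8; out_len];
//!     let (mut pos, mut i) = (0usize, 0usize);
//!     // Fast region: a full 16-byte store still fits, so over-copy the whole
//!     // row and advance by the true length only.
//!     while i < codes.len() && pos + 16 <= out.len() {
//!         let code = codes[i] as usize;
//!         out[pos..pos + 16].copy_from_slice(&table.rows[code]);
//!         pos += table.lens[code] as usize;
//!         i += 1;
//!     }
//!     // Exact tail: copy only the true length so nothing lands past `out`.
//!     while i < codes.len() {
//!         let code = codes[i] as usize;
//!         let len = table.lens[code] as usize;
//!         out[pos..pos + len].copy_from_slice(&table.rows[code][..len]);
//!         pos += len;
//!         i += 1;
//!     }
//!     out
//! }
//! ```
//!
//! In this sketch every over-copied byte is overwritten by a later token,
//! because the fast region only runs while a full 16-byte store fits in `out`.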
use std::mem::MaybeUninit;
use super::scalar;
use crate::Parts;
use crate::MAX_TOKEN_SIZE;
/// 16-byte-strided token table: row `code` holds the token bytes at the start of
/// a 16-byte slot (bytes past the true length are over-copy filler, not token
/// data) and `lens[code]` holds the true length.
pub
/// Materialize the [`FatTable`] for a column. Built once per decode call.
///
/// Each token is over-copied into its 16-byte row with a single branchless
/// [`super::scalar::copy16`] — the same fixed [`MAX_TOKEN_SIZE`]-byte read the
/// decode loop uses, which is why the dictionary must carry trailing padding.
/// The row's bytes past the token's true length hold neighbouring dictionary
/// bytes; that is harmless because decode advances the output cursor by the true
/// length (`lens[code]`) and the over-written tail is reclaimed by the next
/// token (or by the exact decode tail).
///
/// ## Safety
///
/// `parts.dict_bytes` must extend [`MAX_TOKEN_SIZE`] bytes past the highest
/// token offset (so the fixed-width read from every offset is in bounds — at
/// most `MAX_TOKEN_SIZE - 1` bytes past the logical end), and
/// `parts.dict_offsets` must be non-decreasing with tokens ≤ [`MAX_TOKEN_SIZE`]
/// (i.e. [`Parts::validate_dictionary`] holds).
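///
/// ## Example
///
/// A minimal safe-Rust sketch of the build step, not the actual implementation.
/// It assumes `dict_offsets` holds one entry per token plus a final end offset
/// (the real layout may differ). `sketch_build` and `SKETCH_MAX_TOKEN` are
/// hypothetical names, and slice indexing stands in for the unchecked read, so
/// a dictionary without trailing padding panics here instead of reading out of
/// bounds.
///
/// ```rust
/// // Hypothetical stand-in for `MAX_TOKEN_SIZE`, assumed equal to the row stride.
/// const SKETCH_MAX_TOKEN: usize = 16;
///
/// // Returns (rows, lens). Assumes `dict_offsets` has `tokens + 1` entries.
/// fn sketch_build(
///     dict_bytes: &[u8],
///     dict_offsets: &[u32],
/// ) -> (Vec<[u8; SKETCH_MAX_TOKEN]>, Vec<u8>) {
///     assert!(!dict_offsets.is_empty());
///     let tokens = dict_offsets.len() - 1;
///     let mut rows = Vec::with_capacity(tokens);
///     let mut lens = Vec::with_capacity(tokens);
///     for t in 0..tokens {
///         let start = dict_offsets[t] as usize;
///         let end = dict_offsets[t + 1] as usize;
///         // Offsets must be non-decreasing and tokens at most 16 bytes.
///         assert!(start <= end && end - start <= SKETCH_MAX_TOKEN);
///         // Fixed-width read: always 16 bytes from `start`, whatever the
///         // token's true length. This is why trailing padding is required.
///         let mut row = [0u8; SKETCH_MAX_TOKEN];
///         row.copy_from_slice(&dict_bytes[start..start + SKETCH_MAX_TOKEN]);
///         rows.push(row);
///         lens.push((end - start) as u8);
///     }
///     (rows, lens)
/// }
/// ```
///
/// With a dictionary of `"ab"` and `"cde"` followed by 16 padding bytes, row 0
/// starts with `ab` and then carries the neighbouring `cde`; only `lens[0] == 2`
/// marks where the token really ends.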
pub unsafe
/// Decode `codes` into `out`.
///
/// When `CHECK` is `true`, each code is bounds-checked against the dictionary
/// (`dict_tokens`) with a cold, predicted-never-taken branch so a malformed
/// `Parts` panics instead of reading out of bounds. The loop stays count-bound
/// and the copy stays a single store, so the guard is free in practice —
/// measured within noise of, and at small dictionaries faster than, the
/// unchecked loop (code-layout effects dominate the guard's cost). When `false`,
/// the guard compiles out — byte-identical to a bare unchecked loop.
///
/// ## Safety
///
/// `fat` must be built from the same column as `codes`, and `out` must have
/// room for at least the fully decoded length. With `CHECK == false`, every
/// code must also be a valid token index (`< dict_tokens`); with
/// `CHECK == true` that is enforced by the guard.
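///
/// ## Example
///
/// A minimal sketch of the `CHECK` guard pattern, reduced to summing decoded
/// lengths so it stays short. `sketch_decoded_len` and `bad_code` are
/// hypothetical names, not part of this module, and the performance claims
/// above are not something this sketch demonstrates.
///
/// ```rust
/// // Cold, never-inlined panic path keeps the hot loop small.
/// #[cold]
/// #[inline(never)]
/// fn bad_code(code: usize, dict_tokens: usize) -> ! {
///     panic!("code {code} out of range for {dict_tokens} tokens");
/// }
///
/// // Safety contract: with `CHECK == false`, every code must be `< lens.len()`.
/// unsafe fn sketch_decoded_len<const CHECK: bool>(lens: &[u8], codes: &[u16]) -> usize {
///     let mut total = 0usize;
///     for &c in codes {
///         let code = usize::from(c);
///         // `CHECK` is a compile-time constant, so with `false` this branch
///         // is statically dead and can be removed by the compiler.
///         if CHECK && code >= lens.len() {
///             bad_code(code, lens.len());
///         }
///         // SAFETY: in range, either checked above or promised by the caller.
///         total += usize::from(unsafe { *lens.get_unchecked(code) });
///     }
///     total
/// }
/// ```
///
/// Calling `sketch_decoded_len::<true>` on untrusted codes panics on the first
/// out-of-range code; `::<false>` is only sound when the codes were validated
/// elsewhere.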
pub unsafe