1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright the Vortex contributors
//! Unified compression scheme trait and exclusion rules.
use fmt;
use Debug;
use Hash;
use Hasher;
use ArrayRef;
use Canonical;
use VortexResult;
use crateCascadingCompressor;
use crateCompressorContext;
use crateCompressionEstimate;
use crateArrayAndStats;
use crateGenerateStatsOptions;
/// Unique identifier for a compression scheme.
///
/// The only way to obtain a [`SchemeId`] is through [`SchemeExt::id()`], which is
/// auto-implemented for all [`Scheme`] types. There is no public constructor.
///
/// The only exception to this is for `ROOT_SCHEME_ID` in `compressor.rs`.
/// Selects which children of a cascading scheme a rule applies to.
/// Push rule: declared by a cascading scheme to exclude another scheme from the subtree
/// rooted at the specified children.
///
/// Use this when the declaring scheme (the ancestor) knows about the excluded scheme. For example,
/// `ZigZag` excludes `Dict` from all its children.
/// Pull rule: declared by a scheme to exclude itself when the specified ancestor is in the
/// cascade chain.
///
/// Use this when the excluded scheme (the descendant) knows about the ancestor. For example,
/// `Sequence` excludes itself when `IntDict` is an ancestor on its codes child.
// TODO(connor): Remove all default implemented methods.
/// A single compression encoding that the [`CascadingCompressor`] can select from.
///
/// The compressor evaluates every registered scheme whose [`matches`] returns `true` for a given
/// array, picks the one with the highest [`expected_compression_ratio`], and calls [`compress`] on
/// the winner.
///
/// One of the key features of the compressor in this crate is that schemes may "cascade". A
/// scheme's [`compress`] can call back into the compressor via
/// [`CascadingCompressor::compress_child`] to compress child or transformed arrays, building up
/// multiple encoding layers (e.g. frame-of-reference and then bit-packing).
///
/// # Scheme IDs
///
/// Every scheme has a globally unique name returned by [`scheme_name`]. The [`SchemeExt::id`]
/// method (auto-implemented, cannot be overridden) wraps that name in an opaque [`SchemeId`] used
/// for equality, hashing, and exclusion rules (see below).
///
/// # Cascading and children
///
/// Schemes that produce child arrays for further compression must declare [`num_children`] > 0.
/// Each child should be identified by a stable index. Cascading schemes should use
/// [`CascadingCompressor::compress_child`] to compress each child array, which handles cascade
/// level / budget tracking and context management automatically.
///
/// No scheme may appear twice in a cascade (descendant) chain (enforced by the compressor). This
/// keeps the search space a tree.
///
/// # Exclusion rules
///
/// Schemes declare exclusion rules to prevent incompatible scheme combinations in the cascade
/// chain:
///
/// - [`descendant_exclusions`] (push): "exclude scheme X from my child Y's subtree." Used when the
/// declaring scheme knows about the excluded scheme.
/// - [`ancestor_exclusions`] (pull): "exclude me if ancestor X's child Y is above me." Used when
/// the declaring scheme knows about the ancestor.
///
/// We do this because different schemes will live in different crates, and we cannot know the
/// dependency direction ahead of time.
///
/// # Implementing a scheme
///
/// [`expected_compression_ratio`] should return
/// `CompressionEstimate::Deferred(DeferredEstimate::Sample)` when a cheap heuristic is not
/// available, asking the compressor to estimate via sampling. Implementors should return an
/// immediate [`CompressionEstimate::Verdict`] when possible.
///
/// Schemes that need statistics that may be expensive to compute should override [`stats_options`]
/// to declare what they require. The compressor merges all eligible schemes' options before
/// generating stats, so each stat is always computed at most once for a given array.
///
/// [`scheme_name`]: Scheme::scheme_name
/// [`matches`]: Scheme::matches
/// [`compress`]: Scheme::compress
/// [`expected_compression_ratio`]: Scheme::expected_compression_ratio
/// [`stats_options`]: Scheme::stats_options
/// [`num_children`]: Scheme::num_children
/// [`descendant_exclusions`]: Scheme::descendant_exclusions
/// [`ancestor_exclusions`]: Scheme::ancestor_exclusions
/// Extension trait providing [`id`](SchemeExt::id) for all [`Scheme`] implementors.
///
/// This trait is automatically implemented for every type that implements [`Scheme`]. Because the
/// blanket implementation covers all types, external crates cannot override `id()`.