//! Hierarchical sparse bitset.
//!
//! Memory consumption does not depend on max index inserted.
//!
//! ```text
//! Level0 128bit SIMD
//! [u8;128]
//!
//! ┌ ┐
//! Level1 Vec │128bit SIMD│
//! │ [u16;128] │
//! └ ┘
//! ┌ ┐
//! Data Vec │128bit SIMD│
//! └ ┘
//! ────────────────────────────────────────────────────
//! 1 0 1 ... 1 ◀══ bit-mask
//! Level0 □ Ø □ □ ◀══ index-pointers
//! ┗━│━━━│━━━━━━━━━│┛
//! ╭──╯ ╰──────╮ ╰───────────────────╮
//! 1 0 0 1 ▽ ▽ 1 ▽
//! Level1 □ Ø Ø □ ... ... □ ...
//! ┗━│━━━━━│━━━━━━━┛ ┗━━━━━━━│━━━━━━┛ ┗━━━━━━━━━━━━━━┛
//! ╰──╮ ╰─────────────────╮ ╰───────────────╮
//! ▽ ▽ ▽
//! Data 1 0 0 0 1 ... 0 0 1 1 0 ... 0 1 0 1 0 ... ...
//! ┗━━━━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━━━━┛
//! ```
//!
//! The very structure of [BitSet] acts as an acceleration structure for the
//! intersection operation. All operations are fast - see the benchmarks
//! (insert/contains are in the "traditional bitset" ballpark; intersection/union are orders of magnitude faster).
//!
//! It is a multi-level structure. The last level contains the actual bit-data. Each previous level
//! holds a bitmask, where each bit corresponds to `!is_empty` of a bitblock in the next level.
//!
//! In addition to the "non-empty-marker" bitmasks, there are pointers (indices) to the non-empty blocks in the next level.
//! This way, only blocks with actual data are allocated.
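//!
//! As an illustrative sketch (not the crate's internals), the coordinates of an index
//! inside this hierarchy for a 128-bit configuration can be computed like this; the
//! `decompose` helper is hypothetical:
//!
//! ```
//! /// Splits an index into (level0 bit, level1 bit, data-block bit)
//! /// for 128-bit blocks: index = l0*128*128 + l1*128 + bit.
//! fn decompose(index: usize) -> (usize, usize, usize) {
//!     const BITS: usize = 128;
//!     let bit = index % BITS;
//!     let l1 = (index / BITS) % BITS;
//!     let l0 = index / (BITS * BITS);
//!     (l0, l1, bit)
//! }
//! ```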
//!
//! For inter-bitset operations, for example, intersection:
//! * root-level bitmasks are AND-ed.
//! * the resulting bitmask is traversed for raised bits.
//! * the indices of the raised bits are used to get pointers into the next level of each bitset.
//! * this repeats down to the data level, then for each next raised bit at each level.
//!
//! Bitmasks allow the intersection operation to cut off empty tree/hierarchy branches early,
//! and allow iteration to traverse only actual data.
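//!
//! The steps above can be sketched with a simplified one-level-plus-data layout
//! (plain `u64` masks, not the crate's real blocks); `intersect` here is hypothetical:
//!
//! ```
//! /// Intersects two toy bitsets: a root mask marks non-empty 64-bit data blocks.
//! fn intersect(a_root: u64, a: &[u64; 64], b_root: u64, b: &[u64; 64]) -> Vec<usize> {
//!     let mut out = Vec::new();
//!     let mut root = a_root & b_root;        // branches empty in either set vanish here
//!     while root != 0 {
//!         let block = root.trailing_zeros() as usize;
//!         root &= root - 1;                  // clear the lowest raised bit
//!         let mut data = a[block] & b[block];
//!         while data != 0 {
//!             let bit = data.trailing_zeros() as usize;
//!             data &= data - 1;
//!             out.push(block * 64 + bit);
//!         }
//!     }
//!     out
//! }
//! ```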
//!
//! In addition, during an inter-bitset operation, the level1 blocks of the
//! bitsets are cached for faster access. Empty blocks are skipped and not added
//! to the cache container, which algorithmically speeds up bitblock computations
//! at the data level.
//! This has an observable effect when merging N non-intersecting
//! bitsets: without this optimization the data-level bitmask would be OR-ed N times;
//! with it - only once.
//!
//! # Config
//!
//! The max index a [BitSet] can hold depends on the capacity of the bitblocks used.
//! The bigger the bitblocks - the higher the [BitSet] index range.
//! The smaller - the smaller the memory footprint.
//!
//! Max index for 64-bit blocks = 262_144; for 256-bit blocks = 16_777_216.
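//!
//! With three levels of `B`-bit blocks, the addressable range is `B * B * B`, which is
//! where those numbers come from (the `max_indices` helper is hypothetical):
//!
//! ```
//! /// Three levels of `block_bits`-wide blocks address block_bits^3 indices.
//! fn max_indices(block_bits: usize) -> usize {
//!     block_bits * block_bits * block_bits
//! }
//! assert_eq!(max_indices(64), 262_144);
//! assert_eq!(max_indices(256), 16_777_216);
//! ```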
//!
//! Use [BitSet] with predefined [config]:
//! ```
//! type BitSet = hi_sparse_bitset::BitSet<hi_sparse_bitset::config::_128bit>;
//! ```
//!
//! # Inter-bitset operations
//!
//! Inter-bitset operations can be applied between ANY [BitSetInterface]s.
//! The output of an inter-bitset operation is a lazy bitset (which is a [BitSetInterface] too).
//! This means that you can combine different operations however you want
//! without ever materializing them into an actual [BitSet].
//!
//! Use [reduce()] to apply an inter-bitset operation between the elements of a bitsets iterator.
//!
//! Use [apply()] to apply an inter-bitset operation between two bitsets. The operators [&], [|], [`^`], [-] work as well.
//!
//! You can define your own inter-bitset operation by implementing [BitSetOp].
//!
//! [&]: std::ops::BitAnd
//! [|]: std::ops::BitOr
//! [`^`]: std::ops::BitXor
//! [-]: std::ops::Sub
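//!
//! The "lazy bitset" idea can be sketched outside the crate like this: the result
//! of an operation is a thin wrapper that answers queries by consulting both
//! operands, with nothing materialized. `LazyAnd` is hypothetical, not a crate type:
//!
//! ```
//! use std::collections::HashSet;
//!
//! /// A lazy intersection: holds references to its operands and
//! /// answers `contains` by querying both, allocating nothing.
//! struct LazyAnd<'a> {
//!     a: &'a HashSet<usize>,
//!     b: &'a HashSet<usize>,
//! }
//!
//! impl<'a> LazyAnd<'a> {
//!     fn contains(&self, index: usize) -> bool {
//!         self.a.contains(&index) && self.b.contains(&index)
//!     }
//! }
//! ```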
//!
//! # Laziness and materialization
//!
//! Use `BitSet::from(impl BitSetInterface)` instead of collecting an iterator, for
//! faster materialization into a BitSet.
//!
//! # Cursor
//!
//! [BitSetInterface] iterators can return a [cursor()], pointing to the current iterator position.
//! You can use a [Cursor] to move ANY [BitSetInterface] iterator to its position with [move_to].
//!
//! You can also build a cursor from an index.
//!
//! [cursor()]: crate::iter::IndexIter::cursor
//! [Cursor]: crate::iter::IndexCursor
//! [move_to]: crate::iter::IndexIter::move_to
//!
//! # Iterator::for_each
//!
//! [BitSetInterface] iterators have a [for_each] specialization and a stable [try_for_each] version - [traverse].
//! For tight loops, traversing is observably faster than iterating.
//!
//! [for_each]: std::iter::Iterator::for_each
//! [try_for_each]: std::iter::Iterator::try_for_each
//! [traverse]: crate::iter::IndexIter::traverse
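//!
//! Internal ("traverse"-style) iteration over a bitblock can be sketched as follows;
//! `traverse_bits` is a hypothetical stand-in, not the crate's implementation:
//!
//! ```
//! use std::ops::ControlFlow;
//!
//! /// Visits each raised bit of a block via a callback, which can stop
//! /// early with `ControlFlow::Break` - a stable try_for_each-like contract.
//! fn traverse_bits(mut block: u64, mut f: impl FnMut(usize) -> ControlFlow<()>) -> ControlFlow<()> {
//!     while block != 0 {
//!         let bit = block.trailing_zeros() as usize;
//!         block &= block - 1;     // clear the lowest raised bit
//!         if let ControlFlow::Break(()) = f(bit) {
//!             return ControlFlow::Break(());
//!         }
//!     }
//!     ControlFlow::Continue(())
//! }
//! ```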
//!
//! # TrustedHierarchy
//!
//! TrustedHierarchy means that each raised bit in a hierarchy bitblock
//! is guaranteed to correspond to a non-empty block in the next level.
//! That may not be true for [difference] and [symmetric difference] operation results.
//!
//! You can check whether a bitset has a TrustedHierarchy with [BitSetBase::TRUSTED_HIERARCHY].
//!
//! Bitsets with a TrustedHierarchy are faster to compare with [Eq] and
//! have an O(1) [is_empty()].
//!
//! [difference]: ops::Sub
//! [symmetric difference]: ops::Xor
//! [is_empty()]: BitSetInterface::is_empty
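//!
//! The O(1) emptiness check follows directly from the hierarchy invariant: with a
//! trusted hierarchy, the root bitmask is zero iff the whole set is empty
//! (a sketch over a hypothetical root mask, not the crate's API):
//!
//! ```
//! /// With a trusted hierarchy, emptiness is decided by the root bitmask alone.
//! fn is_empty(root_bitmask: u128) -> bool {
//!     root_bitmask == 0
//! }
//! ```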
//!
//! # DataBlocks
//!
//! You can iterate [DataBlock]s instead of individual indices. DataBlocks can be moved, cloned
//! and iterated for indices.
//!
//! # Serialization
//!
//! BitSet has fast and compact [serialization](BitSet::serialize).
//! [Deserialization](BitSet::deserialize) is the fastest way to fill a BitSet,
//! since the serialized data stores **contiguous** hierarchy bitblocks.
//!
//! ## Serde
//!
//! Enable the `serde` feature for [serde] serialization support.
//!
//! [serde]: https://crates.io/crates/serde
//!
//! # CPU extensions
//!
//! The library uses `popcnt`/`count_ones` and `tzcnt`/`trailing_zeros` heavily.
//! Make sure you compile with hardware support for these
//! (on x86: `target-feature=+popcnt,+bmi1`).
//!
//! ## SIMD
//!
//! The 128- and 256-bit configurations use SIMD, powered by [wide]. Make sure you compile with SIMD support
//! enabled (on x86: `sse2` for _128bit, `avx` for _256bit) to achieve the best performance.
//! _sse2 is enabled by default in Rust for most desktop environments._
//!
//! If you don't need the "wide" configurations, you may disable the default `simd` feature.
//!
//! [wide]: https://crates.io/crates/wide
pub use ;
pub use Apply;
pub use Reduce;
pub use BitBlock;
pub use BitSet;
pub use ;
use Primitive;
use PrimitiveArray;
use Config;
use BitSetOp;
use ReduceCache;
pub use assume;
/// Creates a lazy bitset, as [BitSetOp] application between two bitsets.
/// Creates a lazy bitset, as bitsets iterator reduction.
///
/// The "reduce" term is used in Rust's [Iterator::reduce] sense.
///
/// If `bitsets` is empty - returns `None`; otherwise - returns the resulting
/// lazy bitset.
///
/// The `bitsets` iterator must be cheap to clone (a slice iterator is a good example).
/// It will be cloned AT LEAST once for each returned [DataBlock] during iteration.
///
/// # Panics
///
/// Panics if [Config::DefaultCache] capacity is smaller than the sets' len.
/// [reduce], using a specific [cache] for iteration.
///
/// The cache applies to the current operation only, so you can combine different cache
/// types.
///
/// N.B. Alternatively, you can change [Config::DefaultCache] and use [reduce].
///
/// # Panics
///
/// Panics if the `Cache` capacity is smaller than the sets' len.
///
/// [reduce]: reduce()