bit_parallelism/lib.rs
1//! # Word Level Parallelism
2//!
3//! Bit level algorithms for developing useful when developing specialized integer data structures such as
4//! the [x-fast trie](http://web.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/15/Small15.pdf)
5//!
6//! ## Overview
7//!
8//! Arithmetic and logical operations take, for all intents and purposes, constant time.
9//! Such operations operate on whole words. A word is the size of a single memory segment.
10//! This crate, assumes a word size width of `64`. For a more in-depth discussion of computer memory,
11//! refer to [this note](<https://akkadia.org/drepper/cpumemory.pdf>). For instance, it takes constant
12//! time to add two `64` bit numbers.
13//!
14//! The central idea behind the algorithms in this library is this:
15//!
16//! * If you have a bunch of small integers — each smaller that sixty four bits, e.g. a bunch of bytes,
17//! we can pack many of them into a single sixty four bit integer.
18//! * We can then operate on that packed integer as if it were a single number. For example,
19//! we can fit 8 byte sized numbers in a single word.
20//! * By operating on the packed integer, we are in effect operating on 8 different integers in parallel.
21//!
22//! This is what is called `world level parallelism`.
23//!
24//! ## Algorithms
25//!
26//! The Algorithms implemented include:
27//!
28//! ### Finding the `top(k)` bits of an integer
29//!
30//! The first procedure is quite simple. The goal is, given a number `x` and a length `k`, to extract
31//! the first `k` bits of `x` in `O(1)`. A procedure that does this will be handy when implementing the x-fast trie.
32//!
33//! ### The `SardineCan` Structure
34//!
35//! Suppose we wish to maintain a set of small sized integers in a B-Tree. And suppose too that we wish
36//! to take advantage of the fact that we can fit many of these integers in a single, larger integer.
37//! How would we go about designing a single node in such a B-Tree?
38//!
39//! Recall that a B-Tree of order `b` is a multi-way search tree in which each node is a bucket
40//! that must contain between `b - 1` and `2b - 1` keys. Furthermore, each node has one more child t
41//! han the number of keys it contains. That is, each node must have between `b` and `2b` child nodes.
42//! Operations on B-Trees rely on one key operation: `node.rank(x)`. This operation searches through
43//! the keys of a single node (which are sorted) and either returns the location of `x` in the node,
44//! or the index of the child we need to descend into in order to complete the operation at hand.
45//!
46//! In run of the mill B-Trees, `node.rank(x)` is implemented using binary search and thus takes `O(lg b)`.
47//! However, if our keys are small integers, we can perform `node.rank(x)` in `O(1)`.
48//!
49//! The `SardineCan` implements a B-Tree Node specialized for storing small integers.
50//!
51//! ### `O(1)` Most Significant Bit: FourRussiansMSB
52//!
53//! When we talk of the most significant bit of a number, we're often referring to the 0-indexed location of
54//! the highest bit set. Note that this is a more general problem than simply finding the number that would be
55//! formed if only the `msb` were set. For instance, `MSB(010010001)` is `7` and not `128`.
56//!
57//! The simplest method for finding this index in by doing a linear scan over the bits of the number in
58//! question while keeping a count of the number of bits seen thus far. This scheme runs in `O(lg n)`
59//! where `n` is the highest number our function may operate on.
60//!
61//! ```rust
62//! /// A procedure for finding the index of the most significant
63//! /// bit in time linear to the number of bits used
64//! /// to represent the value.
65//! fn get_msb_idx_of(query: u64) -> u8 {
66//! for i in (0..64).rev() {
67//! if query & (1 << i) != 0 {
68//! return i;
69//! }
70//! }
71//! panic!("MSB(0) is undefined")
72//! }
73//! ```
74//!
75//! We can improve upon the linear scanning procedure using bit level binary search.
76//! This brings down the running time to `O(lg lg n)`. Often, however, when we know that we'll
77//! be doing many `msb` queries, we use a lookup table to compute this value. Using that method,
78//! we're able to locate the index of the highest bit set in constant `O(1)` time,
79//! albeit with an added preprocessing step to build the lookup table.
80//!
81//! **We can, using bit level parallelism, locate the index of the most significant bit in constant time
82//! without using a lookup table.
83//!
84//! ## References
85//!
86//! 1. [CS 166 Lecture 15](http://web.stanford.edu/class/archive/cs/cs166/cs166.1196/lectures/15/Slides15.pdf)
87//! 2. [CS 166 Lecture 16](http://web.stanford.edu/class/archive/cs/cs166/cs166.1196/lectures/16/Slides16.pdf)
88//! 3. [CS 166 Lecture 17](http://web.stanford.edu/class/archive/cs/cs166/cs166.1196/lectures/17/Slides17.pdf)
89//! 4. [6.851](http://courses.csail.mit.edu/6.851/fall17/scribe/lec12.pdf)
90//! 5. [The Original Fusion Tree Paper](<https://reader.elsevier.com/reader/sd/pii/0022000093900404?token=1610EF62181DAC974715067B85459A4709A9BC64E39827CE0369C6C8E18540DFD1DBAD38BEE35BFF95C4C05E45A1D1D5>)
91//! 6. [This StackOverflow Question. Scroll down until you find the answer by user `templatetypedef`](<https://stackoverflow.com/questions/3878320/understanding-fusion-trees>)
92//!
93
94// Test that pointer width is compatible. This asserts that usize is 64 bits,
95// which a lot of algorithms in this crate currently assume.
96#[cfg(not(any(target_pointer_width = "64",)))]
97compile_error! {
98 "This crate requires the platform pointer to have a width of 64"
99}
100
101pub mod four_russians_msb;
102pub mod sardine_can;
103
104pub use four_russians_msb::get_msb_idx_of;
105pub use four_russians_msb::lcp_len_of;
106pub use four_russians_msb::FourRussiansMSB;
107pub use sardine_can::SardineCan;
108
109const USIZE_BITS: usize = 64;
110
111/// Given a number `x` and a length k, extract the first k bits of x in O(1).
112pub fn top_k_bits_of(x: usize, k: usize) -> usize {
113 assert!(k != 0);
114 let mut mask: usize = 1;
115
116 // Shift the 1 to the index that is `k`
117 // positions from the last index location.
118 // That is `k` away from 64
119 mask <<= USIZE_BITS - k;
120
121 // Turn that one into a zero. And all
122 // the other 63 zeros into ones.
123 mask = !mask;
124
125 // I think this is the most interesting/entertaining part.
126 // Adding a one triggers a cascade of carries that flip all
127 // the bits (all ones) before the location of the zero from above into
128 // zeros. The cascade stops when they reach the zero from
129 // above. Since it is a zero, adding a 1 does not trigger a carry
130 //
131 // In the end, we have a mask where the top k bits are ones
132 mask += 1;
133
134 // This is straightforward
135 x & mask
136}
137
138#[cfg(test)]
139mod test_bit_parallelism {
140 use pretty_assertions::assert_eq;
141 use rand::Rng;
142
143 use super::four_russians_msb;
144 use super::sardine_can;
145
146 #[test]
147 fn sardine_add() {
148 let mut rng = rand::thread_rng();
149 let mut can = sardine_can::SardineCan::default();
150 for _ in 0..8 {
151 let small_int = rng.gen_range(0..=1 << 7);
152 can.add(small_int);
153 println!("{:b}, can is {}", small_int, can)
154 }
155 //_11101110_10101110_11111000_11001101_10101111_10001101_11110111_11100001
156 //_01010110_00111110_00111110_01000011_00011011_00101111_00100011_01111010
157 //1100111
158 }
159
160 #[test]
161 fn sardine_tile() {
162 let tiled = sardine_can::SardineCan::parallel_tile_64(0b1100111);
163 println!("{:b}", tiled)
164 // 1100111_01100111_01100111_01100111
165 // 01100111_01100111_01100111_01100111_01100111_01100111_01100111_01100111
166 // 11100111_11100111_11100111_11100111_11100111_11100111_11100111_11100111
167 }
168
169 #[test]
170 fn test_stacker() {
171 // Test alternative method of computing rank
172 let a = 0b10000000_10000000_10000000_10000000_10000000_10000000_100000001u64;
173 let b = 0b10000000_00000000_10000000_10000000_00000000_10000000_00000000_10000000u64;
174 let mut c = a as u128 * b as u128;
175 println!("{:b}", c);
176 c >>= 63;
177 println!("{:b}", c);
178 println!("{}", c & 0b111);
179 }
180
181 #[test]
182 fn sardine_rank() {
183 let mut rng = rand::thread_rng();
184 let mut can = sardine_can::SardineCan::default();
185 for _ in 0..8 {
186 let small_int = rng.gen_range(0..=1 << 7);
187 can.add(small_int);
188 }
189 println!("{}", can.parallel_rank(0b1100111));
190 // _10000000_00000000_10000000_10000000_00000000_10000000_00000000_10000000
191 // 10000000_10000000_10000000_10000000_10000000_10000000_100000001
192 }
193
194 #[test]
195 fn pack() {
196 let tt = 0b00010000_10000000_10000000_10000000_10000000_00000000_00000000_00000000u64;
197 let m = 0b10000001_00000010_00000100_00001000_00010000_00100000_010000001u64;
198 let mut c = tt as u128 * m as u128;
199 println!("{:b}", c);
200 c >>= 49;
201 println!("{:b}", c);
202 if tt >> 56 == 0 {
203 c &= 0b0111_1111;
204 } else {
205 c |= 0b1000_0000;
206 c &= 0b1111_1111;
207 }
208 println!("{:b}", c);
209 // 100000_01100000_11100001_11100011_11000111_10001111_00011110_001111000
210 // 100000_10000001_01000010_11000101_11001011_10010111_00101110_01011100_01111000
211 }
212
213 #[test]
214 fn get_msb() {
215 let msb = four_russians_msb::get_msb_idx_of(873);
216 assert_eq!(9, msb);
217 let base: usize = 2;
218 let msb = four_russians_msb::get_msb_idx_of(base.pow(32) as u64);
219 assert_eq!(32, msb);
220 let msb = four_russians_msb::get_msb_idx_of(base.pow(55) as u64);
221 assert_eq!(55, msb);
222 let msb = four_russians_msb::get_msb_idx_of((base.pow(56) + 13) as u64);
223 assert_eq!(56, msb);
224 let msb = four_russians_msb::get_msb_idx_of((base.pow(61) + 31) as u64);
225 assert_eq!(61, msb);
226 let msb = four_russians_msb::get_msb_idx_of((2u128.pow(64) - 1) as u64);
227 assert_eq!(63, msb);
228 let msb = four_russians_msb::get_msb_idx_of(base.pow(48) as u64);
229 assert_eq!(48, msb);
230 let msb = four_russians_msb::get_msb_idx_of(base.pow(63) as u64);
231 assert_eq!(63, msb);
232 let msb = four_russians_msb::get_msb_idx_of(255);
233 assert_eq!(7, msb);
234 let msb = four_russians_msb::get_msb_idx_of(1);
235 assert_eq!(0, msb);
236 let msb = four_russians_msb::get_msb_idx_of(16);
237 assert_eq!(4, msb);
238 let msb = four_russians_msb::get_msb_idx_of(256);
239 assert_eq!(8, msb);
240 let msb = four_russians_msb::get_msb_idx_of(25);
241 assert_eq!(4, msb);
242 let msb = four_russians_msb::get_msb_idx_of(91);
243 assert_eq!(6, msb);
244 let msb = four_russians_msb::get_msb_idx_of(base.pow(16) as u64);
245 assert_eq!(16, msb);
246 let msb = four_russians_msb::get_msb_idx_of(1 << 18);
247 assert_eq!(18, msb);
248 }
249}