bit_parallelism/
lib.rs

1//! # Word Level Parallelism
2//!
3//! Bit level algorithms for developing useful when developing specialized integer data structures such as 
4//! the [x-fast trie](http://web.stanford.edu/class/archive/cs/cs166/cs166.1166/lectures/15/Small15.pdf)
5//!
6//! ## Overview
7//!
8//! Arithmetic and logical operations take, for all intents and purposes, constant time. 
9//! Such operations operate on whole words. A word is the size of a single memory segment. 
10//! This crate, assumes a word size width of `64`. For a more in-depth discussion of computer memory, 
11//! refer to [this note](<https://akkadia.org/drepper/cpumemory.pdf>). For instance, it takes constant 
12//! time to add two `64` bit numbers.
13//!
14//! The central idea behind the algorithms in this library  is this:
15//!
16//! * If you have a bunch of small integers — each smaller that sixty four bits, e.g. a bunch of bytes, 
17//! we can pack many of them into a single sixty four bit integer.
18//! * We can then operate on that packed integer as if it were a single number. For example, 
19//! we can fit 8 byte sized numbers in a single word.
20//! * By operating on the packed integer, we are in effect operating on 8 different integers in parallel.
21//!
22//! This is what is called `world level parallelism`.
23//!
24//! ## Algorithms
25//!
26//! The Algorithms implemented include:
27//!
28//! ### Finding the `top(k)` bits of an integer
29//!
30//! The first procedure is quite simple. The goal is, given a number `x` and a length `k`, to extract 
31//! the first `k` bits of `x` in `O(1)`. A procedure that does this will be handy when implementing the x-fast trie.
32//!
33//! ### The `SardineCan` Structure
34//!
35//! Suppose we wish to maintain a set of small sized integers in a B-Tree. And suppose too that we wish 
36//! to take advantage of the fact that we can fit many of these integers in a single, larger integer. 
37//! How would we go about designing a single node in such a B-Tree?
38//!
39//! Recall that a B-Tree of order `b` is a multi-way search tree in which each node is a bucket 
40//! that must contain between `b - 1` and `2b - 1` keys. Furthermore, each node has one more child t
41//! han the number of keys it contains. That is, each node must have between `b` and `2b` child nodes. 
42//! Operations on B-Trees rely on one key operation: `node.rank(x)`. This operation searches through 
43//! the keys of a single node (which are sorted) and either returns the location of `x` in the node, 
44//! or the index of the child we need to descend into in order to complete the operation at hand. 
45//! 
46//! In run of the mill B-Trees, `node.rank(x)` is implemented using binary search and thus takes `O(lg b)`. 
47//! However, if our keys are small integers, we can perform `node.rank(x)` in `O(1)`.
48//!
49//! The `SardineCan` implements a B-Tree Node specialized for storing small integers.
50//!
51//! ### `O(1)` Most Significant Bit: FourRussiansMSB
52//!
53//! When we talk of the most significant bit of a number, we're often referring to the 0-indexed location of 
54//! the highest bit set. Note that this is a more general problem than simply finding the number that would be 
55//! formed if only the `msb` were set. For instance, `MSB(010010001)` is `7` and not `128`.
56//!
57//! The simplest method for finding this index in by doing a linear scan over the bits of the number in 
58//! question while keeping a count of the number of bits seen thus far. This scheme runs in `O(lg n)` 
59//! where `n` is the highest number our function may operate on.
60//!
61//! ```rust
62//! /// A procedure for finding the index of the most significant
63//! /// bit in time linear to the number of bits used
64//! /// to represent the value.
65//! fn get_msb_idx_of(query: u64) -> u8 {
66//!     for i in (0..64).rev() {
67//!         if query & (1 << i) != 0 {
68//!             return i;
69//!         }
70//!     }
71//!     panic!("MSB(0) is undefined")
72//! }
73//! ```
74//!
75//! We can improve upon the linear scanning procedure using bit level binary search. 
76//! This brings down the running time to `O(lg lg n)`. Often, however, when we know that we'll 
77//! be doing many `msb` queries, we use a lookup table to compute this value. Using that method, 
78//! we're able to locate the index of the highest bit set in constant  `O(1)` time, 
79//! albeit with an added preprocessing step to build the lookup table.
80//!
81//! **We can, using bit level parallelism, locate the index of the most significant bit in constant time 
82//! without using a lookup table.
83//!
84//! ## References
85//!
86//! 1. [CS 166 Lecture 15](http://web.stanford.edu/class/archive/cs/cs166/cs166.1196/lectures/15/Slides15.pdf)
87//! 2. [CS 166 Lecture 16](http://web.stanford.edu/class/archive/cs/cs166/cs166.1196/lectures/16/Slides16.pdf)
88//! 3. [CS 166 Lecture 17](http://web.stanford.edu/class/archive/cs/cs166/cs166.1196/lectures/17/Slides17.pdf)
89//! 4. [6.851](http://courses.csail.mit.edu/6.851/fall17/scribe/lec12.pdf)
90//! 5. [The Original Fusion Tree Paper](<https://reader.elsevier.com/reader/sd/pii/0022000093900404?token=1610EF62181DAC974715067B85459A4709A9BC64E39827CE0369C6C8E18540DFD1DBAD38BEE35BFF95C4C05E45A1D1D5>)
91//! 6. [This StackOverflow Question. Scroll down until you find the answer by user `templatetypedef`](<https://stackoverflow.com/questions/3878320/understanding-fusion-trees>)
92//!
93
94// Test that pointer width is compatible. This asserts that usize is 64 bits,
95// which a lot of algorithms in this crate currently assume.
96#[cfg(not(any(target_pointer_width = "64",)))]
97compile_error! {
98    "This crate requires the platform pointer to have a width of 64"
99}
100
101pub mod four_russians_msb;
102pub mod sardine_can;
103
104pub use four_russians_msb::get_msb_idx_of;
105pub use four_russians_msb::lcp_len_of;
106pub use four_russians_msb::FourRussiansMSB;
107pub use sardine_can::SardineCan;
108
109const USIZE_BITS: usize = 64;
110
111///  Given a number `x` and a length k, extract the first k bits of x in O(1).
112pub fn top_k_bits_of(x: usize, k: usize) -> usize {
113    assert!(k != 0);
114    let mut mask: usize = 1;
115
116    // Shift the 1 to the index that is `k`
117    // positions from the last index location.
118    // That is `k` away from 64
119    mask <<= USIZE_BITS - k;
120
121    // Turn that one into a zero. And all
122    // the other 63 zeros into ones.
123    mask = !mask;
124
125    // I think this is the most interesting/entertaining part.
126    // Adding a one triggers a cascade of carries that flip all
127    // the bits (all ones) before the location of the zero from above into
128    // zeros. The cascade stops when they reach the zero from
129    // above. Since it is a zero, adding a 1 does not trigger a carry
130    //
131    // In the end, we have a mask where the top k bits are ones
132    mask += 1;
133
134    // This is straightforward
135    x & mask
136}
137
138#[cfg(test)]
139mod test_bit_parallelism {
140    use pretty_assertions::assert_eq;
141    use rand::Rng;
142
143    use super::four_russians_msb;
144    use super::sardine_can;
145
146    #[test]
147    fn sardine_add() {
148        let mut rng = rand::thread_rng();
149        let mut can = sardine_can::SardineCan::default();
150        for _ in 0..8 {
151            let small_int = rng.gen_range(0..=1 << 7);
152            can.add(small_int);
153            println!("{:b}, can is {}", small_int, can)
154        }
155        //_11101110_10101110_11111000_11001101_10101111_10001101_11110111_11100001
156        //_01010110_00111110_00111110_01000011_00011011_00101111_00100011_01111010
157        //1100111
158    }
159
160    #[test]
161    fn sardine_tile() {
162        let tiled = sardine_can::SardineCan::parallel_tile_64(0b1100111);
163        println!("{:b}", tiled)
164        // 1100111_01100111_01100111_01100111
165        // 01100111_01100111_01100111_01100111_01100111_01100111_01100111_01100111
166        // 11100111_11100111_11100111_11100111_11100111_11100111_11100111_11100111
167    }
168
169    #[test]
170    fn test_stacker() {
171        // Test alternative method of computing rank
172        let a = 0b10000000_10000000_10000000_10000000_10000000_10000000_100000001u64;
173        let b = 0b10000000_00000000_10000000_10000000_00000000_10000000_00000000_10000000u64;
174        let mut c = a as u128 * b as u128;
175        println!("{:b}", c);
176        c >>= 63;
177        println!("{:b}", c);
178        println!("{}", c & 0b111);
179    }
180
181    #[test]
182    fn sardine_rank() {
183        let mut rng = rand::thread_rng();
184        let mut can = sardine_can::SardineCan::default();
185        for _ in 0..8 {
186            let small_int = rng.gen_range(0..=1 << 7);
187            can.add(small_int);
188        }
189        println!("{}", can.parallel_rank(0b1100111));
190        // _10000000_00000000_10000000_10000000_00000000_10000000_00000000_10000000
191        // 10000000_10000000_10000000_10000000_10000000_10000000_100000001
192    }
193
194    #[test]
195    fn pack() {
196        let tt = 0b00010000_10000000_10000000_10000000_10000000_00000000_00000000_00000000u64;
197        let m = 0b10000001_00000010_00000100_00001000_00010000_00100000_010000001u64;
198        let mut c = tt as u128 * m as u128;
199        println!("{:b}", c);
200        c >>= 49;
201        println!("{:b}", c);
202        if tt >> 56 == 0 {
203            c &= 0b0111_1111;
204        } else {
205            c |= 0b1000_0000;
206            c &= 0b1111_1111;
207        }
208        println!("{:b}", c);
209        // 100000_01100000_11100001_11100011_11000111_10001111_00011110_001111000
210        // 100000_10000001_01000010_11000101_11001011_10010111_00101110_01011100_01111000
211    }
212
213    #[test]
214    fn get_msb() {
215        let msb = four_russians_msb::get_msb_idx_of(873);
216        assert_eq!(9, msb);
217        let base: usize = 2;
218        let msb = four_russians_msb::get_msb_idx_of(base.pow(32) as u64);
219        assert_eq!(32, msb);
220        let msb = four_russians_msb::get_msb_idx_of(base.pow(55) as u64);
221        assert_eq!(55, msb);
222        let msb = four_russians_msb::get_msb_idx_of((base.pow(56) + 13) as u64);
223        assert_eq!(56, msb);
224        let msb = four_russians_msb::get_msb_idx_of((base.pow(61) + 31) as u64);
225        assert_eq!(61, msb);
226        let msb = four_russians_msb::get_msb_idx_of((2u128.pow(64) - 1) as u64);
227        assert_eq!(63, msb);
228        let msb = four_russians_msb::get_msb_idx_of(base.pow(48) as u64);
229        assert_eq!(48, msb);
230        let msb = four_russians_msb::get_msb_idx_of(base.pow(63) as u64);
231        assert_eq!(63, msb);
232        let msb = four_russians_msb::get_msb_idx_of(255);
233        assert_eq!(7, msb);
234        let msb = four_russians_msb::get_msb_idx_of(1);
235        assert_eq!(0, msb);
236        let msb = four_russians_msb::get_msb_idx_of(16);
237        assert_eq!(4, msb);
238        let msb = four_russians_msb::get_msb_idx_of(256);
239        assert_eq!(8, msb);
240        let msb = four_russians_msb::get_msb_idx_of(25);
241        assert_eq!(4, msb);
242        let msb = four_russians_msb::get_msb_idx_of(91);
243        assert_eq!(6, msb);
244        let msb = four_russians_msb::get_msb_idx_of(base.pow(16) as u64);
245        assert_eq!(16, msb);
246        let msb = four_russians_msb::get_msb_idx_of(1 << 18);
247        assert_eq!(18, msb);
248    }
249}