ordsearch

NOTE: This crate is generally slower than using Vec::binary_search over a pre-sorted vector, contrary to the claims in the referenced paper, and is mainly presented for curiosity's sake at this point.

This crate provides a data structure for approximate lookups in ordered collections.

More concretely, given a set A of n values, and a query value x, this library provides an efficient mechanism for finding the smallest value in A that is greater than or equal to x. In particular, this library caters to the important case where there are many such queries to the same array, A.
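
For example, a lookup with this crate looks roughly like the following (a minimal sketch; it assumes the crate exposes an OrderedCollection type built from a Vec and queried with find_gte, so check the API documentation for the exact names and signatures):

use ordsearch::OrderedCollection;

fn main() {
    // Build the collection once; construction sorts and rearranges the values.
    let a = OrderedCollection::from(vec![1u32, 2, 4, 8, 16, 32, 64]);

    // Then issue many cheap queries against it.
    assert_eq!(a.find_gte(3), Some(&4));   // smallest value >= 3
    assert_eq!(a.find_gte(16), Some(&16)); // 16 is present, so it is its own lower bound
    assert_eq!(a.find_gte(65), None);      // nothing is >= 65
}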

This library is constructed from the best solution identified in Array Layouts for Comparison-Based Searching by Paul-Virak Khuong and Pat Morin. For more information, see the paper, their website, and the C++ implementation repository.

Current implementation

At the time of writing, this implementation uses a branch-free search over an Eytzinger-arranged array with masked prefetching, based on the C++ implementation written by the authors of the aforementioned paper. This is the algorithm recommended in the paper, and the one the authors suggested in https://github.com/patmorin/arraylayout/issues/3#issuecomment-338472755.
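
The gist of that layout and search is sketched below (illustration only, not the crate's actual code, and with the masked prefetching left out): the sorted values are rearranged into Eytzinger (breadth-first heap) order, and each comparison is folded into the index arithmetic instead of a conditional branch.

// Illustration only: a branch-free lower-bound search over a 1-indexed
// Eytzinger-arranged array, without the masked prefetching the crate adds.

/// Rearrange a sorted slice into Eytzinger (BFS) order.
/// `out` has length sorted.len() + 1; index 0 is unused.
fn eytzinger(sorted: &[u32], out: &mut [u32], i: usize, k: &mut usize) {
    if i < out.len() {
        eytzinger(sorted, out, 2 * i, k); // fill the left subtree first
        out[i] = sorted[*k];
        *k += 1;
        eytzinger(sorted, out, 2 * i + 1, k); // then the right subtree
    }
}

/// Index in `a` of the smallest element >= x, or None if there is none.
fn find_gte(a: &[u32], x: u32) -> Option<usize> {
    let n = a.len() - 1;
    let mut i = 1;
    while i <= n {
        // Go to the left child if a[i] is a candidate answer (a[i] >= x),
        // to the right child otherwise; the comparison result becomes a
        // 0-or-1 index update rather than a branch.
        i = 2 * i + usize::from(a[i] < x);
    }
    // `i` now encodes the descent path; strip the trailing "went right"
    // steps (trailing 1 bits) plus one final "went left" step to recover
    // the last node whose value was >= x.
    i >>= i.trailing_ones() + 1;
    if i == 0 { None } else { Some(i) }
}

fn main() {
    let sorted = [1u32, 2, 4, 8, 16, 32, 64];
    let mut a = vec![0u32; sorted.len() + 1];
    eytzinger(&sorted, &mut a, 1, &mut 0);
    assert_eq!(find_gte(&a, 3).map(|i| a[i]), Some(4));
    assert_eq!(find_gte(&a, 65), None);
}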

Note that prefetching is only enabled with the (non-default) nightly feature due to https://github.com/aweinstock314/prefetch/issues/1. Suggestions for workarounds welcome.

Performance

The included benchmarks can be run with

$ cargo +nightly bench --features nightly

This will benchmark both construction and search with different numbers of values and differently sized value types -- look for the line that aligns closest with your data. The general trend is that ordsearch is faster when n is smaller and T is larger, as long as you compile with target-cpu=native and lto=thin. The performance gain seems to be largest on Intel processors, and has become smaller since the (relatively) recent improvement to SliceExt::binary_search performance.

Below are summarized results from an AMD ThreadRipper 2600X CPU, run with:

$ rustc +nightly --version
rustc 1.28.0-nightly (e3bf634e0 2018-06-28)
$ env CARGO_INCREMENTAL=0 RUSTFLAGS='-C target-cpu=native -C lto=thin' cargo +nightly bench --features nightly

Compared to binary search over a sorted vector:

 name           sorted_vec ns/iter  this ns/iter  diff ns/iter   diff %  speedup
-u32::l1        49                  54                       5   10.20%   x 0.91
+u32::l1_dup    40                  35                      -5  -12.50%   x 1.14
-u32::l2        63                  72                       9   14.29%   x 0.88
+u32::l2_dup    64                  62                      -2   -3.12%   x 1.03
-u32::l3        120                 273                    153  127.50%   x 0.44
-u32::l3_dup    117                 219                    102   87.18%   x 0.53
+u8::l1         42                  37                      -5  -11.90%   x 1.14
+u8::l1_dup     29                  28                      -1   -3.45%   x 1.04
+u8::l2         43                  49                       6   13.95%   x 0.88
-u8::l2_dup     33                  35                       2    6.06%   x 0.94
-u8::l3         45                  66                      21   46.67%   x 0.68
-u8::l3_dup     35                  51                      16   45.71%   x 0.69
-usize::l1      49                  54                       5   10.20%   x 0.91
+usize::l1_dup  38                  37                      -1   -2.63%   x 1.03
-usize::l2      65                  76                      11   16.92%   x 0.86
+usize::l2_dup  65                  64                      -1   -1.54%   x 1.02
-usize::l3      141                 303                    162  114.89%   x 0.47
-usize::l3_dup  140                 274                    134   95.71%   x 0.51

Compared to a BTreeSet:

 name           btreeset ns/iter  this ns/iter  diff ns/iter   diff %  speedup
+u32::l1        68                54                     -14  -20.59%   x 1.26
+u32::l1_dup    45                35                     -10  -22.22%   x 1.29
+u32::l2        88                72                     -16  -18.18%   x 1.22
-u32::l2_dup    61                62                       1    1.64%   x 0.98
+u32::l3        346               273                    -73  -21.10%   x 1.27
-u32::l3_dup    136               219                     83   61.03%   x 0.62
+u8::l1         45                37                      -8  -17.78%   x 1.22
+u8::l1_dup     31                28                      -3   -9.68%   x 1.11
-u8::l2         44                49                       5   11.36%   x 0.90
-u8::l2_dup     31                35                       4   12.90%   x 0.89
-u8::l3         43                66                      23   53.49%   x 0.65
-u8::l3_dup     30                51                      21   70.00%   x 0.59
+usize::l1      67                54                     -13  -19.40%   x 1.24
+usize::l1_dup  44                37                      -7  -15.91%   x 1.19
+usize::l2      89                76                     -13  -14.61%   x 1.17
-usize::l2_dup  60                64                       4    6.67%   x 0.94
+usize::l3      393               303                    -90  -22.90%   x 1.30
-usize::l3_dup  163               274                    111   68.10%   x 0.59

Future work