Expand description
NOTE: This crate is generally slower than using
Vec::binary_searchover a pre-sorted vector, contrary to the claims in the referenced paper, and is mainly presented for curiosity’s sake at this point.
This crate provides a data structure for approximate lookups in ordered collections.
More concretely, given a set A of n values, and a query value x, this library provides an
efficient mechanism for finding the smallest value in A that is greater than or equal to x.
In particular, this library caters to the important case where there are many such queries to
the same array, A.
This library is constructed from the best solution identified in Array Layouts for Comparison-Based Searching by Paul-Virak Khuong and Pat Morin. For more information, see the paper, their website, and the C++ implementation repository.
§Current implementation
At the time of writing, this implementation uses a branch-free search over an Eytzinger-arranged array with masked prefetching based on the C++ implementation written by the authors of the aforementioned paper. This is the recommended algorithm from the paper, and what the authors suggested in https://github.com/patmorin/arraylayout/issues/3#issuecomment-338472755.
Note that prefetching is only enabled with the (non-default) nightly feature due to
https://github.com/aweinstock314/prefetch/issues/1. Suggestions for workarounds welcome.
§Performance
The included benchmarks can be run with
$ cargo +nightly bench --features nightlyThis will benchmark both construction and search with different number of values, and
differently sized values – look for the line that aligns closest with your data. The general
trend is that ordsearch is faster when n is smaller and T is larger as long as you
compile with
target-cpu=native and
lto=thin. The
performance gain seems to be best on Intel processors, and is smaller since the (relatively)
recent improvement to SliceExt::binary_search
performance.
Below are summarized results from an AMD ThreadRipper 2600X CPU run with:
$ rustc +nightly --version
rustc 1.28.0-nightly (e3bf634e0 2018-06-28)
$ env CARGO_INCREMENTAL=0 RUSTFLAGS='-C target-cpu=native -C lto=thin' cargo +nightly bench --features nightlyCompared to binary search over a sorted vector:
name sorted_vec ns/iter this ns/iter diff ns/iter diff % speedup
-u32::l1 49 54 5 10.20% x 0.91
+u32::l1_dup 40 35 -5 -12.50% x 1.14
-u32::l2 63 72 9 14.29% x 0.88
+u32::l2_dup 64 62 -2 -3.12% x 1.03
-u32::l3 120 273 153 127.50% x 0.44
-u32::l3_dup 117 219 102 87.18% x 0.53
+u8::l1 42 37 -5 -11.90% x 1.14
+u8::l1_dup 29 28 -1 -3.45% x 1.04
+u8::l2 43 49 6 13.95% x 0.88
-u8::l2_dup 33 35 2 6.06% x 0.94
-u8::l3 45 66 21 46.67% x 0.68
-u8::l3_dup 35 51 16 45.71% x 0.69
-usize::l1 49 54 5 10.20% x 0.91
+usize::l1_dup 38 37 -1 -2.63% x 1.03
-usize::l2 65 76 11 16.92% x 0.86
+usize::l2_dup 65 64 -1 -1.54% x 1.02
-usize::l3 141 303 162 114.89% x 0.47
-usize::l3_dup 140 274 134 95.71% x 0.51Compared to a BTreeSet:
name btreeset ns/iter this ns/iter diff ns/iter diff % speedup
+u32::l1 68 54 -14 -20.59% x 1.26
+u32::l1_dup 45 35 -10 -22.22% x 1.29
+u32::l2 88 72 -16 -18.18% x 1.22
-u32::l2_dup 61 62 1 1.64% x 0.98
+u32::l3 346 273 -73 -21.10% x 1.27
-u32::l3_dup 136 219 83 61.03% x 0.62
+u8::l1 45 37 -8 -17.78% x 1.22
+u8::l1_dup 31 28 -3 -9.68% x 1.11
-u8::l2 44 49 5 11.36% x 0.90
-u8::l2_dup 31 35 4 12.90% x 0.89
-u8::l3 43 66 23 53.49% x 0.65
-u8::l3_dup 30 51 21 70.00% x 0.59
+usize::l1 67 54 -13 -19.40% x 1.24
+usize::l1_dup 44 37 -7 -15.91% x 1.19
+usize::l2 89 76 -13 -14.61% x 1.17
-usize::l2_dup 60 64 4 6.67% x 0.94
+usize::l3 393 303 -90 -22.90% x 1.30
-usize::l3_dup 163 274 111 68.10% x 0.59§Future work
- Implement aligned operation: https://github.com/patmorin/arraylayout/blob/3f20174a2a0ab52c6f37f2ea87d087307f19b5ee/src/eytzinger_array.h#L204
-
Implement deep prefetching for large
T: https://github.com/patmorin/arraylayout/blob/3f20174a2a0ab52c6f37f2ea87d087307f19b5ee/src/eytzinger_array.h#L128
Structs§
- Iter
- Immutable iterator over elements in a
OrderedCollection - Ordered
Collection - A collection of ordered items that can efficiently satisfy queries for nearby elements.