ordsearch
This crate provides an efficient data structure for approximate lookups in ordered collections.
More concretely, given a set A
of n
values, and a query value x
, this library provides an
efficient mechanism for finding the smallest value in A
that is greater than or equal to x
.
In particular, this library caters to the important case where there are many such queries to
the same array, A
.
This library is constructed from the best solution identified in Array Layouts for Comparison-Based Searching by Paul-Virak Khuong and Pat Morin. For more information, see the paper, their website, and the C++ implementation repository.
Current implementation
At the time of writing, this implementation uses a branch-free search over an Eytzinger-arranged array with masked prefetching based on the C++ implementation written by the authors of the aforementioned paper. This is the recommended algorithm from the paper, and what the authors suggested in https://github.com/patmorin/arraylayout/issues/3#issuecomment-338472755.
Note that prefetching is only enabled with the (non-default) nightly
feature due to
https://github.com/aweinstock314/prefetch/issues/1. Suggestions for workarounds welcome.
Performance
The included benchmarks can be run with
$ cargo +nightly bench --features nightly
This will benchmark both construction and search with different number of values, and
differently sized values -- look for the line that aligns closest with your data. The general
trend is that ordsearch
is faster when n
is smaller and T
is larger. You may also want to
compare with the pending Rust PR "Improve SliceExt::binary_search
performance".
Summarized results from my laptop (an X1 Carbon
with i7-5600U @ 2.60GHz) are given below.
Compared to binary search over a sorted vector:
name sorted_vec ns/iter this ns/iter diff ns/iter diff % speedup
+u8::l1 46 37 -9 -19.57% x 1.24
-u8::l1_dup 31 37 6 19.35% x 0.84
-u8::l2 44 58 14 31.82% x 0.76
-u8::l2_dup 31 56 25 80.65% x 0.55
-u8::l3 29 170 141 486.21% x 0.17
-u8::l3_dup 30 127 97 323.33% x 0.24
+u32::l1 66 37 -29 -43.94% x 1.78
+u32::l1_dup 41 37 -4 -9.76% x 1.11
+u32::l2 85 64 -21 -24.71% x 1.33
-u32::l2_dup 62 64 2 3.23% x 0.97
-u32::l3 180 380 200 111.11% x 0.47
-u32::l3_dup 156 381 225 144.23% x 0.41
+usize::l1 66 37 -29 -43.94% x 1.78
+usize::l1_dup 41 37 -4 -9.76% x 1.11
+usize::l2 87 67 -20 -22.99% x 1.30
-usize::l2_dup 62 77 15 24.19% x 0.81
-usize::l3 247 522 275 111.34% x 0.47
-usize::l3_dup 203 614 411 202.46% x 0.33
Compared to a BTreeSet
:
name btreeset ns/iter this ns/iter diff ns/iter diff % speedup
+u8::l1 48 37 -11 -22.92% x 1.30
-u8::l1_dup 35 37 2 5.71% x 0.95
-u8::l2 46 58 12 26.09% x 0.79
-u8::l2_dup 35 56 21 60.00% x 0.62
-u8::l3 36 170 134 372.22% x 0.21
-u8::l3_dup 34 127 93 273.53% x 0.27
+u32::l1 66 37 -29 -43.94% x 1.78
+u32::l1_dup 42 37 -5 -11.90% x 1.14
+u32::l2 91 64 -27 -29.67% x 1.42
-u32::l2_dup 60 64 4 6.67% x 0.94
-u32::l3 351 380 29 8.26% x 0.92
-u32::l3_dup 195 381 186 95.38% x 0.51
+usize::l1 66 37 -29 -43.94% x 1.78
+usize::l1_dup 42 37 -5 -11.90% x 1.14
+usize::l2 96 67 -29 -30.21% x 1.43
-usize::l2_dup 61 77 16 26.23% x 0.79
-usize::l3 441 522 81 18.37% x 0.84
-usize::l3_dup 241 614 373 154.77% x 0.39
Future work
- Implement aligned operation: https://github.com/patmorin/arraylayout/blob/3f20174a2a0ab52c6f37f2ea87d087307f19b5ee/src/eytzinger_array.h#L204
- Implement deep prefetching for large
T
: https://github.com/patmorin/arraylayout/blob/3f20174a2a0ab52c6f37f2ea87d087307f19b5ee/src/eytzinger_array.h#L128