ordsearch 0.1.0

A data structure for efficient lower-bound lookups
Documentation

ordsearch

Crates.io Documentation Build Status

This crate provides an efficient data structure for approximate lookups in ordered collections.

More conretely, given a set A of n values, and a query value x, this library provides an efficient mechanism for finding the smallest value in A that is greater than or equal to x. In particular, this library caters to the important case where there are many such queries to the same array, A.

This library is constructed from the best solution identified in Array Layouts for Comparison-Based Searching by Paul-Virak Khuong and Pat Morin. For more information, see the paper, their website, and the C++ implementation repository.

Current implementation

At the time of writing, this implementation uses a branch-free search over an Eytzinger-arranged array with masked prefetching based on the C++ implementation written by the authors of the aforementioned paper. This is the recommended algorithm from the paper, and what the authors suggested in https://github.com/patmorin/arraylayout/issues/3#issuecomment-338472755.

Performance

The included benchmarks can be run with

$ cargo +nightly bench --features bench

These will benchmark both construction and search with different number of values, and differently sized values. search_common is likely the metric you want to pay the most attention to: it is the cost of looking up among 1024 usize values. Summarized results from my laptop (an X1 Carbon with i7-5600U @ 2.60GHz) are given below.

Compared to binary search over a sorted vector:

name                   sorted_vec:: ns/iter  this:: ns/iter  diff ns/iter   diff %  speedup
search_common          67                    35                       -32  -47.76%   x 1.91
search_few_u32         42                    21                       -21  -50.00%   x 2.00
search_few_u8          42                    21                       -21  -50.00%   x 2.00
search_many_u32        115                   98                       -17  -14.78%   x 1.17
search_many_u8         52                    70                        18   34.62%   x 0.74
construction_few_u32   3,365                 3,924                    559   16.61%   x 0.86
construction_few_u8    3,483                 4,125                    642   18.43%   x 0.84
construction_many_u32  2,338,019             2,619,590            281,571   12.04%   x 0.89
construction_many_u8   1,174,587             1,481,139            306,552   26.10%   x 0.79

Compared to a BTreeSet:

name                   btreeset:: ns/iter  this:: ns/iter  diff ns/iter   diff %  speedup
search_common          94                  35                       -59  -62.77%   x 2.69
search_few_u32         70                  21                       -49  -70.00%   x 3.33
search_few_u8          64                  21                       -43  -67.19%   x 3.05
search_many_u32        182                 98                       -84  -46.15%   x 1.86
search_many_u8         69                  70                         1    1.45%   x 0.99
construction_few_u32   9,598               3,924                 -5,674  -59.12%   x 2.45
construction_few_u8    8,410               4,125                 -4,285  -50.95%   x 2.04
construction_many_u32  10,416,444          2,619,590         -7,796,854  -74.85%   x 3.98
construction_many_u8   3,643,919           1,481,139         -2,162,780  -59.35%   x 2.46

Future work