mih-rs
Rust implementation of multi-index hashing (MIH) for neighbor searches on binary codes in the Hamming space, described in the paper
Norouzi, Punjani, and Fleet, Fast exact search in Hamming space with multi-index hashing, IEEE TPAMI, 36(6):1107– 1119, 2014.
Features
-
Two types of neighbor searches: mih-rs provides the two search operations:
- Range search finds neighbor codes whose Hamming distances to a given code are within a radius.
- Top-K search finds the top-K codes that are closest to a given code.
-
Fast and memory-efficient implementation: The data structure is built on sparse hash tables, following the original implementation.
-
Parameter free: mih-rs automatically sets an optimal parameter of MIH depending on a given database (although you can also set this manually).
Example
use Index;
Binary code types
Index
can be built from an array of type CodeInt
that is a primitive integer trait supporting a popcount operation. Currently, this library defines CodeInt
for u8
, u16
, u32
, u64
, and u128
. That is, Index
supports neighbor searches on these binary code types.
Benchmark
timeperf_topk.rs
offers the benchmark of top-K search for MIH and LinearSearch algorithms on binary code types u32
, u64
, and u128
.
The following table shows the result of average search times in milliseconds per query, in the settings:
- Database: N random codes from a uniform distribution.
- Query set: 100 random codes from a uniform distribution.
- Machine: MacBook Pro (2019) of Quad-Core Intel Core i5 @2.4 GHz with 16 GB of RAM.
- Library version: v0.2.0
Result for u32
Algorithm | N=10,000 | N=100,000 | N=1,000,000 | N=10,000,000 |
---|---|---|---|---|
MIH (K=1) | 0.01 | 0.02 | 0.07 | 0.38 |
MIH (K=10) | 0.04 | 0.08 | 0.30 | 1.06 |
MIH (K=100) | 0.13 | 0.22 | 1.22 | 4.35 |
LinearSearch | 0.36 | 4.40 | 50.96 | 626.87 |
Result for u64
Algorithm | N=10,000 | N=100,000 | N=1,000,000 | N=10,000,000 |
---|---|---|---|---|
MIH (K=1) | 0.10 | 0.36 | 1.46 | 6.7 |
MIH (K=10) | 0.20 | 0.76 | 3.72 | 14.8 |
MIH (K=100) | 0.41 | 1.53 | 7.02 | 33.2 |
LinearSearch | 0.36 | 4.36 | 52.28 | 629.1 |
Result for u128
Algorithm | N=10,000 | N=100,000 | N=1,000,000 | N=10,000,000 |
---|---|---|---|---|
MIH (K=1) | 0.57 | 3.24 | 23.7 | 162 |
MIH (K=10) | 0.83 | 4.95 | 42.6 | 323 |
MIH (K=100) | 1.10 | 7.71 | 69.5 | 416 |
LinearSearch | 0.48 | 5.47 | 62.3 | 698 |
Licensing
This library is free software provided under MIT.