mih_rs/
lib.rs

1//! # mih-rs
2//!
3//! Rust implementation of multi-index hashing (MIH) for neighbor searches on binary codes in the Hamming space, described in the paper
4//!
5//! > Norouzi, Punjani, and Fleet, [Fast exact search in Hamming space with multi-index hashing](https://arxiv.org/abs/1307.2982), *IEEE TPAMI*, 36(6):1107– 1119, 2014.
6//!
7//! ## Features
8//!
9//! - **Two types of neighbor searches:** `mih-rs` provides the two search operations:
10//!   - *Range search* finds neighbor codes whose Hamming distances to a given code are within a radius.
11//!   - *Top-K search* finds the top-K codes that are closest to a given code.
12//!
13//! - **Fast and memory-efficient implementation:** The data structure is built on sparse hash tables, following the [original implementation](https://github.com/norouzi/mih).
14//!
15//! - **Parameter free:** `mih-rs` automatically sets an optimal parameter of MIH depending on a given database (although you can also set this manually).
16//!
17//! - **Serialization:** `mih-rs` supports to serialize/deserialize the index.
18//!
19//! ## Example
20//!
21//! ```rust
22//! use mih_rs::Index;
23//!
24//! // Database of codes
25//! let codes: Vec<u64> = vec![
26//!     0b1111111111111111111111011111111111111111111111111011101111111111, // #zeros = 3
27//!     0b1111111111111111111111111111111101111111111011111111111111111111, // #zeros = 2
28//!     0b1111111011011101111111111111111101111111111111111111111111111111, // #zeros = 4
29//!     0b1111111111111101111111111111111111111000111111111110001111111110, // #zeros = 8
30//!     0b1101111111111111111111111111111111111111111111111111111111111111, // #zeros = 1
31//!     0b1111111111111111101111111011111111111111111101001110111111111111, // #zeros = 6
32//!     0b1111111111111111111111111111111111101111111111111111011111111111, // #zeros = 2
33//!     0b1110110101011011011111111111111101111111111111111000011111111111, // #zeros = 11
34//! ];
35//!
36//! // Query code
37//! let qcode: u64 = 0b1111111111111111111111111111111111111111111111111111111111111111; // #zeros = 0
38//!
39//! // Construct the index
40//! let index = Index::new(codes).unwrap();
41//!
42//! // Find the ids of neighbor codes whose Hamming distances are within 2
43//! let mut searcher = index.range_searcher();
44//! let answers = searcher.run(qcode, 2);
45//! assert_eq!(answers, vec![1, 4, 6]);
46//!
47//! // Find the ids of the top-4 nearest neighbor codes
48//! let mut searcher = index.topk_searcher();
49//! let answers = searcher.run(qcode, 4);
50//! assert_eq!(answers, vec![4, 1, 6, 0]);
51//!
52//! // Serialization/Deserialization
53//! let mut data = vec![];
54//! index.serialize_into(&mut data).unwrap();
55//! let other = Index::<u64>::deserialize_from(&data[..]).unwrap();
56//! assert_eq!(index, other);
57//! ```
58//!
59//! ## Binary code types
60//!
61//! `mih_rs::Index` can be built from a vector of type `mih_rs::CodeInt`
62//! that is a primitive integer trait supporting a popcount operation.
63//! Currently, this library defines `mih_rs::CodeInt` for `u8`, `u16`, `u32`, and `u64`.
64
65/// An implementation of multi-index hashing.
66pub mod index;
67
68/// Exhaustive search functions (for benchmark).
69pub mod ls;
70
71/// A generic trait of supported binary codes.
72pub mod codeint;
73
74pub use codeint::CodeInt;
75pub use index::Index;
76
77/// Gets the Hamming distance between two binary codes.
78pub fn hamdist<T: CodeInt>(x: T, y: T) -> usize {
79    (x ^ y).popcnt() as usize
80}