1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
//! # mih-rs
//!
//! Rust implementation of multi-index hashing (MIH) for neighbor searches on binary codes in the Hamming space, described in the paper
//!
//! > Norouzi, Punjani, and Fleet, [Fast exact search in Hamming space with multi-index hashing](https://arxiv.org/abs/1307.2982), *IEEE TPAMI*, 36(6):1107– 1119, 2014.
//!
//! ## Features
//!
//! - **Two types of neighbor searches:** `mih-rs` provides the two search operations:
//!   - *Range search* finds neighbor codes whose Hamming distances to a given code are within a radius.
//!   - *Top-K search* finds the top-K codes that are closest to a given code.
//!
//! - **Fast and memory-efficient implementation:** The data structure is built on sparse hash tables, following the [original implementation](https://github.com/norouzi/mih).
//!
//! - **Parameter free:** `mih-rs` automatically sets an optimal parameter of MIH depending on a given database (although you can also set this manually).
//!
//! - **Serialization:** `mih-rs` supports to serialize/deserialize the index.
//!
//! ## Example
//!
//! ```rust
//! use mih_rs::Index;
//!
//! // Database of codes
//! let codes: Vec<u64> = vec![
//!     0b1111111111111111111111011111111111111111111111111011101111111111, // #zeros = 3
//!     0b1111111111111111111111111111111101111111111011111111111111111111, // #zeros = 2
//!     0b1111111011011101111111111111111101111111111111111111111111111111, // #zeros = 4
//!     0b1111111111111101111111111111111111111000111111111110001111111110, // #zeros = 8
//!     0b1101111111111111111111111111111111111111111111111111111111111111, // #zeros = 1
//!     0b1111111111111111101111111011111111111111111101001110111111111111, // #zeros = 6
//!     0b1111111111111111111111111111111111101111111111111111011111111111, // #zeros = 2
//!     0b1110110101011011011111111111111101111111111111111000011111111111, // #zeros = 11
//! ];
//!
//! // Query code
//! let qcode: u64 = 0b1111111111111111111111111111111111111111111111111111111111111111; // #zeros = 0
//!
//! // Construct the index
//! let index = Index::new(codes).unwrap();
//!
//! // Find the ids of neighbor codes whose Hamming distances are within 2
//! let mut searcher = index.range_searcher();
//! let answers = searcher.run(qcode, 2);
//! assert_eq!(answers, vec![1, 4, 6]);
//!
//! // Find the ids of the top-4 nearest neighbor codes
//! let mut searcher = index.topk_searcher();
//! let answers = searcher.run(qcode, 4);
//! assert_eq!(answers, vec![4, 1, 6, 0]);
//!
//! // Serialization/Deserialization
//! let mut data = vec![];
//! index.serialize_into(&mut data).unwrap();
//! let other = Index::<u64>::deserialize_from(&data[..]).unwrap();
//! assert_eq!(index, other);
//! ```
//!
//! ## Binary code types
//!
//! `mih_rs::Index` can be built from a vector of type `mih_rs::CodeInt`
//! that is a primitive integer trait supporting a popcount operation.
//! Currently, this library defines `mih_rs::CodeInt` for `u8`, `u16`, `u32`, and `u64`.

/// An implementation of multi-index hashing.
pub mod index;

/// Exhaustive search functions (for benchmark).
pub mod ls;

/// A generic trait of supported binary codes.
pub mod codeint;

pub use codeint::CodeInt;
pub use index::Index;

/// Gets the Hamming distance between two binary codes.
pub fn hamdist<T: CodeInt>(x: T, y: T) -> usize {
    (x ^ y).popcnt() as usize
}