1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
//! bloom_filter_simple is a library that offers different implementations of a data
//! structure for filtering elements. The data structure is based on the ideas presented by Burton
//! Howard Bloom and is therefore known as bloom filter:
//! > Burton H. Bloom. 1970. Space/time trade-offs in hash coding with allowable errors. Commun.
//! ACM 13, 7 (July 1970), 422–426. DOI: [https://doi.org/10.1145/362686.362692](https://doi.org/10.1145/362686.362692)
//!
//! # Overview
//! Basic description taken from [Wikipedia](https://en.wikipedia.org/wiki/Bloom_filter):
//!
//! > "A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard
//! Bloom in 1970, that is used to test whether an element is a member of a set. False positive
//! matches are possible, but false negatives are not – in other words, a query returns either
//! "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed
//! (though this can be addressed with the counting Bloom filter variant); the more items added, the
//! larger the probability of false positives." ("Bloom filter". Definition, para. 1. In Wikipedia.
//! Retrieved December 02, 2020, from https://en.wikipedia.org/wiki/Bloom_filter)
//!
//! # Bloom Filter Implementations
//! The library offers two basic types of bloom filter implementations.
//!
//! ## Kirsch-Mitzenmacher Bloom Filter (KMBloomFilter)
//! This type of bloom filter uses two hashers to simulate an arbitrary number of additional hash functions.
//!
//! The implementation is based on the work of [Kirsch and Mitzenmacher](https://doi.org/10.1007/11841036_42) \[1\].
//! In their work, they demonstrated that it is possible to apply simulated hash functions in a bloom
//! filter effectively, i.e., without loss in the asymptotic false positive probability.
//! Given two hash functions *h_1(x)* and *h_2(x)*, an *i*-th additional hash function *g_i(x)* can be
//! simulated as *g_i(x) = h_1(x) + i* \* *h_2(x)*.
//!
//! > \[1\] Kirsch A., Mitzenmacher M. (2006) Less Hashing, Same Performance: Building a Better Bloom Filter.
//! In: Azar Y., Erlebach T. (eds) Algorithms – ESA 2006. ESA 2006. Lecture Notes in Computer Science, vol 4168.
//! Springer, Berlin, Heidelberg. https://doi.org/10.1007/11841036_42
//!
//! ## Seeded Bloom Filter (SeededBloomFilter)
//! A bloom filter that uses a single Hasher that can be seeded to simulate an arbitrary number of hash functions.
//! Internally, the implementation uses [ahash::AHasher](https://crates.io/crates/ahash).
//!
//! # Examples
//! In the following, you can find simple examples of how to initialize and use the different bloom filter types.
//!
//! ## Default Bloom Filter
//! The crate offers a default type for a KMBloomFilter that uses *ahash::AHasher* and Rust's
//! *std::collections::hash_map::DefaultHasher* to simulate more hash functions. We compared
//! different hash functions for use by KMBloomFilter, and this combination yielded the best results
//! with respect to the filter's false positive probability.
//!
//! We recommend using DefaultBloomFilter for quickly getting started.
//! ```
//! use bloom_filter_simple::{BloomFilter,DefaultBloomFilter};
//!
//! fn main() {
//! // We plan on storing at most 10,000 elements
//! let desired_capacity = 10_000;
//! // The chance of a false positive increases with each inserted element.
//! // This parameter specifies that the chance should be less than 0.01% (0.0001)
//! // when the desired capacity has been reached. In other words, the chance
//! // that the bloom filter returns true when checking whether a novel element
//! // has been inserted before is less than 0.01% (0.0001).
//! let desired_fp_probability = 0.0001;
//!
//! let mut filter = DefaultBloomFilter::new(desired_capacity, desired_fp_probability);
//!
//! // You can insert any type implementing the Hash trait. The bloom filter does
//! // not store the inserted elements but only their hashes. Hence, there is no
//! // transfer of ownership required.
//! filter.insert(&5i32);
//! filter.insert(&"Some text");
//! filter.insert(&10_000usize);
//!
//! // You can check whether a value has been inserted into the filter before.
//! assert_eq!(false, filter.contains(&3));
//! assert_eq!(true, filter.contains(&5));
//! assert_eq!(true, filter.contains(&"Some text"));
//! }
//! ```
//!
//! ## KMBloomFilter
//! Initialization and application of a KMBloomFilter.
//! ```
//! use bloom_filter_simple::{BloomFilter,KMBloomFilter};
//! use ahash::AHasher;
//! use std::collections::hash_map::DefaultHasher;
//!
//! fn main() {
//! // We plan on storing at most 10,000 elements
//! let desired_capacity = 10_000;
//! // We want to assure that the chance of a false positive is less than 0.01% (0.0001)
//! // for up to desired_capacity elements.
//! let desired_fp_probability = 0.0001;
//!
//! // We initialize a new KMBloomFilter by specifying the desired Hashers as type
//! // parameters. It is possible to use any type that implements Hasher + Default.
//! // Default is required to receive a new instance of a hasher after a value was
//! // hashed, because the Hasher trait does not provide an interface for resetting
//! // a hasher implementing it. This is required to receive the same hash value
//! // when inserting or checking the same element multiple times.
//! let mut filter: KMBloomFilter<AHasher, DefaultHasher> = KMBloomFilter::new(
//! desired_capacity,
//! desired_fp_probability
//! );
//!
//! // You can insert any type implementing the Hash trait. The bloom filter does not
//! // store the inserted elements but only their hashes. Hence, there is no transfer
//! // of ownership required.
//! filter.insert(&5i32);
//! filter.insert(&"Some text");
//! filter.insert(&10_000usize);
//!
//! // You can check whether a value has been inserted into the filter before.
//! assert_eq!(false, filter.contains(&3));
//! assert_eq!(true, filter.contains(&5));
//! assert_eq!(true, filter.contains(&"Some text"));
//! }
//! ```
//!
//! ## SeededBloomFilter
//! Initialization and application of a SeededBloomFilter.
//! ```
//! use bloom_filter_simple::{BloomFilter,SeededBloomFilter};
//!
//! fn main() {
//! // We plan on storing at most 10,000 elements
//! let desired_capacity = 10_000;
//! // We want to assure that the chance of a false positive is less than 0.0001
//! // for up to desired_capacity elements.
//! let desired_fp_probability = 0.0001;
//!
//! // A SeededBloomFilter uses a single seeded ahash::AHasher internally.
//! let mut filter = SeededBloomFilter::new(desired_capacity, desired_fp_probability);
//!
//! // You can insert any type implementing the Hash trait. The bloom filter does
//! // not store the inserted elements but only their hashes. Hence, there is no
//! // transfer of ownership required.
//! filter.insert(&5i32);
//! filter.insert(&"Some text");
//! filter.insert(&10_000usize);
//!
//! // You can check whether a value has been inserted into the filter before.
//! assert_eq!(false, filter.contains(&3));
//! assert_eq!(true, filter.contains(&5));
//! assert_eq!(true, filter.contains(&"Some text"));
//! }
//! ```
use ;
pub use KMBloomFilter;
pub use SeededBloomFilter;
/**
A default implementation of KMBloomFilter using ahash::AHasher and collections::hash_map::DefaultHasher.
DefaultBloomFilter is implemented as a type definition `type DefaultBloomFilter = KMBloomFilter<ahash::AHasher, DefaultHasher>;`
# Examples
```
use bloom_filter_simple::{DefaultBloomFilter,BloomFilter};
fn simple_bloom_filter_test() {
let desired_capacity = 1_000_000;
let false_positive_probability = 0.0001;
let mut bloom_filter = DefaultBloomFilter::new(desired_capacity, false_positive_probability);
bloom_filter.insert(&"Hello!");
bloom_filter.insert(&34);
assert!(bloom_filter.contains(&"Hello!"));
assert!(bloom_filter.contains(&34));
assert_eq!(false, bloom_filter.contains(&"Not in filter"));
}
```
*/
pub type DefaultBloomFilter = ;
/// This trait defines the basic functionality supported by the bloom filters in this library.
///
/// Calculate the optimal bit count to satisfy the desired constraints.
/// Formula taken from Sagi Kedmi:
/// > S. Kedmi, ["Bloom Filters for the Perplexed"](https://sagi.io/bloom-filters-for-the-perplexed/), July 2017 [Accessed: 02.12.2020]
/// Calculate the optimal number of hashers to satisfy the desired constraints.
/// Formula taken from Sagi Kedmi:
/// > S. Kedmi, ["Bloom Filters for the Perplexed"](https://sagi.io/bloom-filters-for-the-perplexed/), July 2017 [Accessed: 02.12.2020]
/// Approximate number of elements stored.
/// Formula taken from Wikipedia:
/// > Wikipedia, ["Bloom filter"](https://en.wikipedia.org/wiki/Bloom_filter#Approximating_the_number_of_items_in_a_Bloom_filter) [Accessed: 02.12.2020]
/// Return the current approximate false positive probability which depends on the current
/// number of elements in the filter.
/// Formula taken from Sagi Kedmi:
/// > S. Kedmi, ["Bloom Filters for the Perplexed"](https://sagi.io/bloom-filters-for-the-perplexed/), July 2017 [Accessed: 02.12.2020]