composable_indexes/lib.rs
1//! A library for in-memory collections with simple, performant and composable indexes.
2//!
3//! # Example
4//!
5//! ```rust
6//! use composable_indexes::{Collection, index, aggregation};
7//!
8//! struct Person { name: String, age: u32, occupation: String }
9//!
10//! // Define indexes on the Person struct
11//! #[derive(composable_indexes::Index)]
12//! #[index(Person)]
13//! struct PersonIndex {
14//! by_name: index::Premap<Person, String, index::HashTable<String>>,
15//! by_age: index::PremapOwned<Person, u32, index::BTree<u32>>,
16//! by_occupation: index::Grouped<Person, String, aggregation::Count>,
17//! }
18//!
19//! // Create the collection.
20//! let mut collection = Collection::new(
21//! PersonIndex {
22//! by_name: index::Premap::new(|p: &Person| &p.name, index::HashTable::new()),
23//! by_age: index::PremapOwned::new(|p: &Person| p.age, index::BTree::new()),
24//! by_occupation: index::Grouped::new(|p: &Person| &p.occupation, || aggregation::Count::new()),
25//! }
26//! );
27//!
28//! // insert items
29//! collection.insert(Person { name: "Alice".to_string(), age: 30, occupation: "Engineer".to_string() });
30//! collection.insert(Person { name: "Bob".to_string(), age: 25, occupation: "Designer".to_string() });
31//!
32//! // Query by name - returns keys, use get_one to get the first match
33//! let alice_key = collection.query(|ix| ix.by_name.get_one(&"Alice".to_string()));
34//!
35//! // Query oldest person
36//! let _oldest = collection.query(|ix| ix.by_age.max_one());
37//!
38//! // Query the count of designers
39//! let _designer_count = collection.query(|ix| ix.by_occupation.get(&"Designer".to_string()).count());
40//! ```
41//!
42//! # Motivation
43//!
44//! Rust standard library (and ecosystem) provides excellent in-memory data structures like `HashMap`,
45//! `BTreeMap`, `Vec`, and so on. However, if we need to index multiple fields of a data structure,
46//! we will need to build and maintain multiple data structures, and carefully write code to keep
47//! them in sync.
48//!
49//! `composable-indexes` aims to solve this problem by introducing a concept called [`Index`]es, which
50//! are composable data structures that are declared upfront and automatically kept in sync with the
51//! main collection. It's designed to be lightweight, easy to extend, and efficient.
52//!
53//! ## Collection
54//!
55//! At the core of `composable-indexes` is the [`Collection<T, Ix>`](Collection) type, which represents
56//! a collection of items of type `T`, indexed by a set of indexes `Ix`. A [`Collection`] owns the data,
57//! and the indexes hold pointers to the data in the collection. When we insert, update, or delete items
58//! in the collection, it automatically updates the indexes accordingly.
59//!
60//! When inserting an item into a [`Collection`](Collection), it returns a [`Key`](Key) which can be used to refer to
61//! the inserted item later (for updates or deletions) without resorting to a query.
62//!
63//! **Warning**: The collection can only guarantee the validity of the index as long as the items are not
64//! mutated in place. This should only be possible using interior mutability (e.g., `Cell`, `RefCell`,
65//! `Mutex`, atomics etc.). So - do not use interior mutability on the fields that are indexed. This can
66//! not only cause indexes to go out of sync, but more importantly might cause panics or undefined behaviour.
67//!
68//! But the more interesting part is querying. Methods like [Collection::query], [Collection::update] and
69//! [Collection::delete] methods accept a callback that receives a reference to the index structure, allowing us
70//! to perform efficient lookups using the indexes defined on the collection. There is also counterparts like
71//! [Collection::query_keys] that return the keys of the items matching the query, or [Collection::query_with_keys]
72//! that return both the keys and the items.
73//!
74//! Keep reading to learn more about indexes & querying!
75//!
76//! ## Indexes
77//!
78//! Each collection is attached to an an index. And index is a data structure that implements the [Index] trait -
79//! which the [Collection] uses to keep the index in sync with the data. The indexes can be accessed with the
80//! methods like [Collection::query] mentioned above, and expose methods to perform queries efficiently. The results
81//! of those queries are ordinary Rust data structures that implement [QueryResult] trait - which is then used by
82//! the collection to "translate" the result of the query to the actual data or perform updates/deletions.
83//!
84//! Most used indexes are [index::HashTable] and [index::BTree]:
85//!
86//! - [index::HashTable] wraps a [HashMap](hashbrown::HashMap) and provides efficient lookups by key. It's the most
87//! commonly used index for equality lookups. Answer queries like "get all the items with field X equal to Y".
88//! - [index::BTree] wraps a [BTreeMap](std::collections::BTreeMap) and provides efficient range queries (on top of
89//! equality lookups). Answer queries like "get all the items with field X is greater than A" or "get the item where the field
90//! X is the biggest").
91//!
92//! This brings us to the `composable` part of `composable-indexes`. This library contains "higher-order indexes" that
93//! can wrap other indexes to:
94//!
95//! - Apply the index to a specific field of the data ([index::Premap])
96//! - Apply the index to a subset of the data ([index::Filtered])
97//! - Group the data by a key and apply an index/aggregation to each group ([index::Grouped])
98//!
99//! On top of those - tuples also implement the [Index] trait, alllowing combining multiple indexes in parallel.
100//! Alternatively - the [Index] derive macro can be used to define tuple-like indexes with named fields.
101//!
102//! # Performance
103//!
104//! `composable-indexes` is designed with performance in mind. The interfaces are designed
105//! to compile away, and only expose the underlying data structures. In other words, think
106//! of a [`Collection`] as a way to translate operations to the underlying index structures
107//! at compile time without adding (significant) runtime overhead.
108//!
109//! Data structures in `composable-indexes` hold the data entirely in memory. Usually, the
110//! data is owned by the [`Collection`] itself, and indexes hold pointers to the data in the
111//! collection (called a [`Key`]). This means that for most queries you can expect one
112//! lookup to the index structure to obtain the pointer, and then one lookup to the
113//! collection to obtain the actual data.
114//!
115//! ## Index Performance
116//!
117//! The common indexes ([`BTree`](index::BTree), [`HashTable`](index::HashTable)) are simply thin wrappers around
118//! `std::collections::BTreeMap` and `std::collections::HashMap`, so you can expect the
119//! same performance characteristics as those data structures. They are keyed by the input
120//! (usually a field of the stored type) and values are sets of pointers to the actual
121//! data stored in the collection.
122//!
123//! Higher order indexes like [index::Filtered], [index::Premap] are all zero-cost abstractions and have
124//! negligible overhead.
125//!
126//! **Important**: Because of not doing bookkeeping themselves, the functions passed to
127//! higher-order indexes should be fast to compute, as they will not be cached and are
128//! computed on-the-fly. Ideally, they should be things like field accesses rather than
129//! expensive computations.
130//!
131//! The most commonly used indexes are [`HashTable`](index::HashTable) for equality lookups and [`BTree`](index::BTree) for
132//! range queries. Between those two, hashtables are the fastest. They also come with
133//! immutable counterparts (with the `imbl` feature) which tend to be slower, but allow
134//! cheap cloning and multi-versioning of the database.
135//!
136//! | Index Type | Operations | Insert | Remove | Query | Memory |
137//! |------------|------------|--------|--------|-------|--------|
138//! | [`index::Keys`] | get all keys | O(1) | O(1) | O(n) | O(n) |
139//! | [`index::HashTable`] | contains, get, count distinct | O(1) | O(1) | O(1) | O(n) |
140//! | [`index::BTree`] | contains, get, count distinct, range, min, max | O(log n) | O(log n) | O(log n) | O(n) |
141//! | [`index::SuffixTree`] | string search | O(k * log n) † | O(k * log n) † | O(log n) | O(n) ‡ |
142//! | Aggregations | count, sum, mean, stddev | O(1) | O(1) | O(1) | O(1) |
143//!
144//! † k = length of the string
145//!
146//! ‡ Suffix trees have have a high memory footprint (even though linear), expect 5-10 times the length of the input strings.
147//!
148//! ## Aggregation Performance
149//!
150//! All built-in aggregations are calculated iteratively, without holding the data in
151//! memory. You can expect O(1) memory and time complexity regardless of the size of the
152//! collection.
153//!
154//! As an example, [`aggregations::Count`](aggregation::Count) simply increments and decrements a counter as
155//! items are inserted and removed, [`aggregations::Mean`](aggregation::Mean) only keeps track of the sum and
156//! count and so on.
157//!
158//! ## Indexing Overhead
159//!
160//! A [`Collection`] is simply a `HashMap`, and indexes are additional data structures.
161//! Hence, inserting an element into a [`Collection`] simply compiles down to inserting
162//! the element into the underlying `HashMap`, and then inserting pointers to the same
163//! element into each of the indexes. Hence, the overhead of adding indexes is linear
164//! in the number of indexes.
165//!
166//! As an example, in the benchmark below we compare inserting elements into a
167//! `std::collections::HashMap` versus inserting the same elements into a
168//! [Collection] with zero, one, two, three, and four indexes.
169//! You can see that without an index, the performance is exactly the same as a
170//! `HashMap`, and adding an index linearly increases the insertion time.
171//!
172#![cfg_attr(doc, doc=include_str!("../doc_assets/bench_indexing_overhead.svg"))]
173//!
174//! # Security
175//!
176//! As both [`Collection`] and [`index::HashTable`] index are backed by hash maps, the choice of the
177//! hash function can have a significant impact on performance. `composable-indexes`
178//! defaults to the default hasher of `hashbrown`, which is `foldhash` that is fast,
179//! but prone to HashDoS attacks. If you need a different
180//! hasher, they can be provided when constructing collection and the indexes. See `foldhash`'s
181//! README for more details: <https://github.com/orlp/foldhash>
182//!
183
184#![cfg_attr(all(not(feature = "std"), not(feature = "testutils")), no_std)]
185
186extern crate alloc;
187
188pub mod core;
189
190#[doc(inline)]
191pub use core::{Collection, Index, Key, QueryResult, QueryResultDistinct, ShallowClone};
192
193pub mod aggregation;
194pub mod index;
195
196#[cfg(feature = "testutils")]
197pub mod testutils;
198
199#[cfg(feature = "derive")]
200pub use composable_indexes_derive::{Index, ShallowClone};
201
202// Some tests for the Collection functionality are defined
203// here so we can utilize the testutils crate.
204#[cfg(test)]
205mod test {
206 use crate::core::*;
207 use crate::testutils::test_index;
208
209 macro_rules! op_insert {
210 ($key:expr, $new:expr) => {
211 $crate::testutils::Op::Insert($crate::testutils::Insert_ {
212 key: Key::unsafe_from_u64($key),
213 new: $new,
214 })
215 };
216 }
217
218 macro_rules! op_update {
219 ($key:expr, $existing:expr, $new:expr) => {
220 $crate::testutils::Op::Update($crate::testutils::Update_ {
221 key: Key::unsafe_from_u64($key),
222 new: $new,
223 existing: $existing,
224 })
225 };
226 }
227
228 macro_rules! op_remove {
229 ($key:expr, $existing:expr) => {
230 $crate::testutils::Op::Remove($crate::testutils::Remove_ {
231 key: Key::unsafe_from_u64($key),
232 existing: $existing,
233 })
234 };
235 }
236
237 #[test]
238 fn simple() {
239 let mut db = Collection::<u32, _>::new(test_index());
240
241 let one = db.insert(1);
242 let two = db.insert(2);
243 let three = db.insert(3);
244 db.update_by_key(two, |_| 10);
245 let four = db.insert(4);
246 db.delete_by_key(three);
247
248 assert_eq!(db.get_by_key(one), Some(&1));
249 assert_eq!(db.get_by_key(two), Some(&10));
250 assert_eq!(db.get_by_key(three), None);
251 assert_eq!(db.get_by_key(four), Some(&4));
252 assert_eq!(db.len(), 3);
253
254 // Access test index operations directly
255 let ops = db.query(|ix| Plain(ix.ops.clone()));
256 assert_eq!(
257 ops,
258 vec![
259 op_insert!(0, 1),
260 op_insert!(1, 2),
261 op_insert!(2, 3),
262 op_update!(1, 2, 10),
263 op_insert!(3, 4),
264 op_remove!(2, 3),
265 ]
266 );
267 }
268
269 #[test]
270 fn update_mut_updates() {
271 let mut db = Collection::<u32, _>::new(test_index());
272
273 let one = db.insert(1);
274 db.update_by_key_mut(one, |v| {
275 if let Some(v) = v {
276 *v += 1;
277 }
278 });
279
280 assert_eq!(db.get_by_key(one), Some(&2));
281 assert_eq!(db.len(), 1);
282 let ops = db.query(|ix| Plain(ix.ops.clone()));
283 assert_eq!(
284 ops,
285 vec![op_insert!(0, 1), op_remove!(0, 1), op_insert!(0, 2),]
286 );
287 }
288
289 #[test]
290 fn update_mut_inserts() {
291 let mut db = Collection::<u32, _>::new(test_index());
292
293 let one = db.insert(1);
294 db.delete_by_key(one);
295 db.update_by_key_mut(one, |v| {
296 assert!(v.is_none());
297 *v = Some(2);
298 });
299
300 assert_eq!(db.get_by_key(one), Some(&2));
301 assert_eq!(db.len(), 1);
302 let ops = db.query(|ix| Plain(ix.ops.clone()));
303 assert_eq!(
304 ops,
305 vec![op_insert!(0, 1), op_remove!(0, 1), op_insert!(0, 2),]
306 );
307 }
308
309 #[test]
310 fn update_mut_removes() {
311 let mut db = Collection::<u32, _>::new(test_index());
312
313 let one = db.insert(1);
314 db.update_by_key_mut(one, |v| {
315 assert!(v.is_some());
316 *v = None;
317 });
318
319 assert_eq!(db.get_by_key(one), None);
320 assert_eq!(db.len(), 0);
321 let ops = db.query(|ix| Plain(ix.ops.clone()));
322 assert_eq!(ops, vec![op_insert!(0, 1), op_remove!(0, 1),]);
323 }
324
325 #[test]
326 fn update_updates() {
327 let mut db = Collection::<u32, _>::new(test_index());
328
329 let one = db.insert(1);
330 db.update_by_key(one, |_| 2);
331
332 assert_eq!(db.get_by_key(one), Some(&2));
333 assert_eq!(db.len(), 1);
334 let ops = db.query(|ix| Plain(ix.ops.clone()));
335 assert_eq!(ops, vec![op_insert!(0, 1), op_update!(0, 1, 2),]);
336 }
337
338 #[test]
339 fn update_inserts() {
340 let mut db = Collection::<u32, _>::new(test_index());
341
342 let one = db.insert(1);
343 db.delete_by_key(one);
344
345 db.update_by_key(one, |x| {
346 assert_eq!(x, None);
347 2
348 });
349
350 assert_eq!(db.get_by_key(one), Some(&2));
351 assert_eq!(db.len(), 1);
352 let ops = db.query(|ix| Plain(ix.ops.clone()));
353 assert_eq!(
354 ops,
355 vec![op_insert!(0, 1), op_remove!(0, 1), op_insert!(0, 2),]
356 );
357 }
358
359 #[test]
360 fn adjust_mut_updates() {
361 let mut db = Collection::<u32, _>::new(test_index());
362
363 let one = db.insert(1);
364 db.adjust_by_key_mut(one, |v| {
365 *v = 2;
366 });
367
368 assert_eq!(db.get_by_key(one), Some(&2));
369 assert_eq!(db.len(), 1);
370 let ops = db.query(|ix| Plain(ix.ops.clone()));
371 assert_eq!(
372 ops,
373 vec![op_insert!(0, 1), op_remove!(0, 1), op_insert!(0, 2),]
374 );
375 }
376
377 #[test]
378 fn adjust_mut_ignores_non_existent() {
379 let mut db = Collection::<u32, _>::new(test_index());
380
381 let one = db.insert(1);
382 db.delete_by_key(one);
383
384 db.adjust_by_key_mut(one, |_| {
385 panic!("Should not be called");
386 });
387
388 assert_eq!(db.get_by_key(one), None);
389 assert_eq!(db.len(), 0);
390 let ops = db.query(|ix| Plain(ix.ops.clone()));
391 assert_eq!(ops, vec![op_insert!(0, 1), op_remove!(0, 1),]);
392 }
393
394 #[test]
395 fn adjust_updates() {
396 let mut db = Collection::<u32, _>::new(test_index());
397
398 let one = db.insert(1);
399 db.adjust_by_key(one, |_| 2);
400
401 assert_eq!(db.get_by_key(one), Some(&2));
402 assert_eq!(db.len(), 1);
403 let ops = db.query(|ix| Plain(ix.ops.clone()));
404 assert_eq!(ops, vec![op_insert!(0, 1), op_update!(0, 1, 2),]);
405 }
406
407 #[test]
408 fn adjust_ignores_non_existent() {
409 let mut db = Collection::<u32, _>::new(test_index());
410
411 let one = db.insert(1);
412 db.delete_by_key(one);
413
414 db.adjust_by_key(one, |_| 2);
415
416 assert_eq!(db.get_by_key(one), None);
417 assert_eq!(db.len(), 0);
418 let ops = db.query(|ix| Plain(ix.ops.clone()));
419 assert_eq!(ops, vec![op_insert!(0, 1), op_remove!(0, 1),]);
420 }
421}