composable_indexes/
lib.rs

1//! A library for in-memory collections with simple, performant and composable indexes.
2//!
3//! # Example
4//!
5//! ```rust
6//! use composable_indexes::{Collection, index, aggregation};
7//!
8//! struct Person { name: String, age: u32, occupation: String }
9//!
10//! // Define indexes on the Person struct
11//! #[derive(composable_indexes::Index)]
12//! #[index(Person)]
13//! struct PersonIndex {
14//!   by_name: index::Premap<Person, String, index::HashTable<String>>,
15//!   by_age: index::PremapOwned<Person, u32, index::BTree<u32>>,
16//!   by_occupation: index::Grouped<Person, String, aggregation::Count>,
17//! }
18//!
19//! // Create the collection.
20//! let mut collection = Collection::new(
21//!   PersonIndex {
22//!     by_name: index::Premap::new(|p: &Person| &p.name, index::HashTable::new()),
23//!     by_age: index::PremapOwned::new(|p: &Person| p.age, index::BTree::new()),
24//!     by_occupation: index::Grouped::new(|p: &Person| &p.occupation, || aggregation::Count::new()),
25//!   }
26//! );
27//!
28//! // insert items
29//! collection.insert(Person { name: "Alice".to_string(), age: 30, occupation: "Engineer".to_string() });
30//! collection.insert(Person { name: "Bob".to_string(), age: 25, occupation: "Designer".to_string() });
31//!
32//! // Query by name - returns keys, use get_one to get the first match
33//! let alice_key = collection.query(|ix| ix.by_name.get_one(&"Alice".to_string()));
34//!
35//! // Query oldest person
36//! let _oldest = collection.query(|ix| ix.by_age.max_one());
37//!
38//! // Query the count of designers
39//! let _designer_count = collection.query(|ix| ix.by_occupation.get(&"Designer".to_string()).count());
40//! ```
41//!
42//! # Motivation
43//!
44//! Rust standard library (and ecosystem) provides excellent in-memory data structures like `HashMap`,
45//! `BTreeMap`, `Vec`, and so on. However, if we need to index multiple fields of a data structure,
46//! we will need to build and maintain multiple data structures, and carefully write code to keep
47//! them in sync.
48//!
49//! `composable-indexes` aims to solve this problem by introducing a concept called [`Index`]es, which
50//! are composable data structures that are declared upfront and automatically kept in sync with the
51//! main collection. It's designed to be lightweight, easy to extend, and efficient.
52//!
53//! ## Collection
54//!
55//! At the core of `composable-indexes` is the [`Collection<T, Ix>`](Collection) type, which represents
56//! a collection of items of type `T`, indexed by a set of indexes `Ix`. A [`Collection`] owns the data,
57//! and the indexes hold pointers to the data in the collection. When we insert, update, or delete items
58//! in the collection, it automatically updates the indexes accordingly.
59//!
60//! When inserting an item into a [`Collection`](Collection), it returns a [`Key`](Key) which can be used to refer to
61//! the inserted item later (for updates or deletions) without resorting to a query.
62//!
63//! **Warning**: The collection can only guarantee the validity of the index as long as the items are not
64//! mutated in place. This should only be possible using interior mutability (e.g., `Cell`, `RefCell`,
65//! `Mutex`, atomics etc.). So - do not use interior mutability on the fields that are indexed. This can
66//! not only cause indexes to go out of sync, but more importantly might cause panics or undefined behaviour.
67//!
68//! But the more interesting part is querying. Methods like [Collection::query], [Collection::update] and
69//! [Collection::delete] methods accept a callback that receives a reference to the index structure, allowing us
70//! to perform efficient lookups using the indexes defined on the collection. There is also counterparts like
71//! [Collection::query_keys] that return the keys of the items matching the query, or [Collection::query_with_keys]
72//! that return both the keys and the items.
73//!
74//! Keep reading to learn more about indexes & querying!
75//!
76//! ## Indexes
77//!
78//! Each collection is attached to an an index. And index is a data structure that implements the [Index] trait -
79//! which the [Collection] uses to keep the index in sync with the data. The indexes can be accessed with the
80//! methods like [Collection::query] mentioned above, and expose methods to perform queries efficiently. The results
81//! of those queries are ordinary Rust data structures that implement [QueryResult] trait - which is then used by
82//! the collection to "translate" the result of the query to the actual data or perform updates/deletions.
83//!
84//! Most used indexes are [index::HashTable] and [index::BTree]:
85//!
86//! - [index::HashTable] wraps a [HashMap](hashbrown::HashMap) and provides efficient lookups by key. It's the most
87//!   commonly used index for equality lookups. Answer queries like "get all the items with field X equal to Y".
88//! - [index::BTree] wraps a [BTreeMap](std::collections::BTreeMap) and provides efficient range queries (on top of
89//!   equality lookups). Answer queries like "get all the items with field X is greater than A" or "get the item where the field
90//!   X is the biggest").
91//!
92//! This brings us to the `composable` part of `composable-indexes`. This library contains "higher-order indexes" that
93//! can wrap other indexes to:
94//!
95//! - Apply the index to a specific field of the data ([index::Premap])
96//! - Apply the index to a subset of the data ([index::Filtered])
97//! - Group the data by a key and apply an index/aggregation to each group ([index::Grouped])
98//!
99//! On top of those - tuples also implement the [Index] trait, alllowing combining multiple indexes in parallel.
100//! Alternatively - the [Index] derive macro can be used to define tuple-like indexes with named fields.
101//!
102//! # Performance
103//!
104//! `composable-indexes` is designed with performance in mind. The interfaces are designed
105//! to compile away, and only expose the underlying data structures. In other words, think
106//! of a [`Collection`] as a way to translate operations to the underlying index structures
107//! at compile time without adding (significant) runtime overhead.
108//!
109//! Data structures in `composable-indexes` hold the data entirely in memory. Usually, the
110//! data is owned by the [`Collection`] itself, and indexes hold pointers to the data in the
111//! collection (called a [`Key`]). This means that for most queries you can expect one
112//! lookup to the index structure to obtain the pointer, and then one lookup to the
113//! collection to obtain the actual data.
114//!
115//! ## Index Performance
116//!
117//! The common indexes ([`BTree`](index::BTree), [`HashTable`](index::HashTable)) are simply thin wrappers around
118//! `std::collections::BTreeMap` and `std::collections::HashMap`, so you can expect the
119//! same performance characteristics as those data structures. They are keyed by the input
120//! (usually a field of the stored type) and values are sets of pointers to the actual
121//! data stored in the collection.
122//!
123//! Higher order indexes like [index::Filtered], [index::Premap] are all zero-cost abstractions and have
124//! negligible overhead.
125//!
126//! **Important**: Because of not doing bookkeeping themselves, the functions passed to
127//! higher-order indexes should be fast to compute, as they will not be cached and are
128//! computed on-the-fly. Ideally, they should be things like field accesses rather than
129//! expensive computations.
130//!
131//! The most commonly used indexes are [`HashTable`](index::HashTable) for equality lookups and [`BTree`](index::BTree) for
132//! range queries. Between those two, hashtables are the fastest. They also come with
133//! immutable counterparts (with the `imbl` feature) which tend to be slower, but allow
134//! cheap cloning and multi-versioning of the database.
135//!
136//! | Index Type | Operations | Insert | Remove | Query | Memory |
137//! |------------|------------|--------|--------|-------|--------|
138//! | [`index::Keys`] | get all keys | O(1) | O(1) | O(n) | O(n) |
139//! | [`index::HashTable`] | contains, get, count distinct | O(1) | O(1) | O(1) | O(n) |
140//! | [`index::BTree`] | contains, get, count distinct, range, min, max | O(log n) | O(log n) | O(log n) | O(n) |
141//! | [`index::SuffixTree`] | string search | O(k * log n) † | O(k * log n) † | O(log n) | O(n) ‡ |
142//! | Aggregations | count, sum, mean, stddev | O(1) | O(1) | O(1) | O(1) |
143//!
144//! † k = length of the string
145//!
146//! ‡ Suffix trees have have a high memory footprint (even though linear), expect 5-10 times the length of the input strings.
147//!
148//!
149//! ## Aggregation Performance
150//!
151//! All built-in aggregations are calculated iteratively, without holding the data in
152//! memory. You can expect O(1) memory and time complexity regardless of the size of the
153//! collection.
154//!
155//! As an example, [`aggregations::Count`](aggregation::Count) simply increments and decrements a counter as
156//! items are inserted and removed, [`aggregations::Mean`](aggregation::Mean) only keeps track of the sum and
157//! count and so on.
158//!
159//! ## Indexing Overhead
160//!
161//! A [`Collection`] is simply a `HashMap`, and indexes are additional data structures.
162//! Hence, inserting an element into a [`Collection`] simply compiles down to inserting
163//! the element into the underlying `HashMap`, and then inserting pointers to the same
164//! element into each of the indexes. Hence, the overhead of adding indexes is linear
165//! in the number of indexes.
166//!
167//! As an example, in the benchmark below we compare inserting elements into a
168//! `std::collections::HashMap` versus inserting the same elements into a
169//! [Collection] with zero, one, two, three, and four indexes.
170//! You can see that without an index, the performance is exactly the same as a
171//! `HashMap`, and adding an index linearly increases the insertion time.
172//!
173#![cfg_attr(doc, doc=include_str!("../doc_assets/bench_indexing_overhead.svg"))]
174//!
175//! # Security
176//!
177//! As both [`Collection`] and [`index::HashTable`] index are backed by hash maps, the choice of the
178//! hash function can have a significant impact on performance. `composable-indexes`
179//! defaults to the default hasher of `hashbrown`, which is `foldhash` that is fast,
180//! but prone to HashDoS attacks. If you need a different
181//! hasher, they can be provided when constructing collection and the indexes. See `foldhash`'s
182//! README for more details: <https://github.com/orlp/foldhash>
183//!
184
185#![cfg_attr(all(not(feature = "std"), not(feature = "testutils")), no_std)]
186
187extern crate alloc;
188
189pub mod core;
190
191#[doc(inline)]
192pub use core::{Collection, Index, Key, QueryResult, QueryResultDistinct, ShallowClone};
193
194pub mod aggregation;
195pub mod index;
196
197#[cfg(feature = "testutils")]
198pub mod testutils;
199
200#[cfg(feature = "derive")]
201pub use composable_indexes_derive::{Index, ShallowClone};
202
203// Some tests for the Collection functionality are defined
204// here so we can utilize the testutils crate.
205#[cfg(test)]
206mod test {
207    use crate::core::*;
208    use crate::testutils::test_index;
209
210    macro_rules! op_insert {
211        ($key:expr, $new:expr) => {
212            $crate::testutils::Op::Insert($crate::testutils::Insert_ {
213                key: Key::unsafe_from_u64($key),
214                new: $new,
215            })
216        };
217    }
218
219    macro_rules! op_update {
220        ($key:expr, $existing:expr, $new:expr) => {
221            $crate::testutils::Op::Update($crate::testutils::Update_ {
222                key: Key::unsafe_from_u64($key),
223                new: $new,
224                existing: $existing,
225            })
226        };
227    }
228
229    macro_rules! op_remove {
230        ($key:expr, $existing:expr) => {
231            $crate::testutils::Op::Remove($crate::testutils::Remove_ {
232                key: Key::unsafe_from_u64($key),
233                existing: $existing,
234            })
235        };
236    }
237
238    #[test]
239    fn simple() {
240        let mut db = Collection::<u32, _>::new(test_index());
241
242        let one = db.insert(1);
243        let two = db.insert(2);
244        let three = db.insert(3);
245        db.update_by_key(two, |_| 10);
246        let four = db.insert(4);
247        db.delete_by_key(three);
248
249        assert_eq!(db.get_by_key(one), Some(&1));
250        assert_eq!(db.get_by_key(two), Some(&10));
251        assert_eq!(db.get_by_key(three), None);
252        assert_eq!(db.get_by_key(four), Some(&4));
253        assert_eq!(db.len(), 3);
254
255        // Access test index operations directly
256        let ops = db.query(|ix| Plain(ix.ops.clone()));
257        assert_eq!(
258            ops,
259            vec![
260                op_insert!(0, 1),
261                op_insert!(1, 2),
262                op_insert!(2, 3),
263                op_update!(1, 2, 10),
264                op_insert!(3, 4),
265                op_remove!(2, 3),
266            ]
267        );
268    }
269
270    #[test]
271    fn update_mut_updates() {
272        let mut db = Collection::<u32, _>::new(test_index());
273
274        let one = db.insert(1);
275        db.update_by_key_mut(one, |v| {
276            if let Some(v) = v {
277                *v += 1;
278            }
279        });
280
281        assert_eq!(db.get_by_key(one), Some(&2));
282        assert_eq!(db.len(), 1);
283        let ops = db.query(|ix| Plain(ix.ops.clone()));
284        assert_eq!(
285            ops,
286            vec![op_insert!(0, 1), op_remove!(0, 1), op_insert!(0, 2),]
287        );
288    }
289
290    #[test]
291    fn update_mut_inserts() {
292        let mut db = Collection::<u32, _>::new(test_index());
293
294        let one = db.insert(1);
295        db.delete_by_key(one);
296        db.update_by_key_mut(one, |v| {
297            assert!(v.is_none());
298            *v = Some(2);
299        });
300
301        assert_eq!(db.get_by_key(one), Some(&2));
302        assert_eq!(db.len(), 1);
303        let ops = db.query(|ix| Plain(ix.ops.clone()));
304        assert_eq!(
305            ops,
306            vec![op_insert!(0, 1), op_remove!(0, 1), op_insert!(0, 2),]
307        );
308    }
309
310    #[test]
311    fn update_mut_removes() {
312        let mut db = Collection::<u32, _>::new(test_index());
313
314        let one = db.insert(1);
315        db.update_by_key_mut(one, |v| {
316            assert!(v.is_some());
317            *v = None;
318        });
319
320        assert_eq!(db.get_by_key(one), None);
321        assert_eq!(db.len(), 0);
322        let ops = db.query(|ix| Plain(ix.ops.clone()));
323        assert_eq!(ops, vec![op_insert!(0, 1), op_remove!(0, 1),]);
324    }
325
326    #[test]
327    fn update_updates() {
328        let mut db = Collection::<u32, _>::new(test_index());
329
330        let one = db.insert(1);
331        db.update_by_key(one, |_| 2);
332
333        assert_eq!(db.get_by_key(one), Some(&2));
334        assert_eq!(db.len(), 1);
335        let ops = db.query(|ix| Plain(ix.ops.clone()));
336        assert_eq!(ops, vec![op_insert!(0, 1), op_update!(0, 1, 2),]);
337    }
338
339    #[test]
340    fn update_inserts() {
341        let mut db = Collection::<u32, _>::new(test_index());
342
343        let one = db.insert(1);
344        db.delete_by_key(one);
345
346        db.update_by_key(one, |x| {
347            assert_eq!(x, None);
348            2
349        });
350
351        assert_eq!(db.get_by_key(one), Some(&2));
352        assert_eq!(db.len(), 1);
353        let ops = db.query(|ix| Plain(ix.ops.clone()));
354        assert_eq!(
355            ops,
356            vec![op_insert!(0, 1), op_remove!(0, 1), op_insert!(0, 2),]
357        );
358    }
359
360    #[test]
361    fn adjust_mut_updates() {
362        let mut db = Collection::<u32, _>::new(test_index());
363
364        let one = db.insert(1);
365        db.adjust_by_key_mut(one, |v| {
366            *v = 2;
367        });
368
369        assert_eq!(db.get_by_key(one), Some(&2));
370        assert_eq!(db.len(), 1);
371        let ops = db.query(|ix| Plain(ix.ops.clone()));
372        assert_eq!(
373            ops,
374            vec![op_insert!(0, 1), op_remove!(0, 1), op_insert!(0, 2),]
375        );
376    }
377
378    #[test]
379    fn adjust_mut_ignores_non_existent() {
380        let mut db = Collection::<u32, _>::new(test_index());
381
382        let one = db.insert(1);
383        db.delete_by_key(one);
384
385        db.adjust_by_key_mut(one, |_| {
386            panic!("Should not be called");
387        });
388
389        assert_eq!(db.get_by_key(one), None);
390        assert_eq!(db.len(), 0);
391        let ops = db.query(|ix| Plain(ix.ops.clone()));
392        assert_eq!(ops, vec![op_insert!(0, 1), op_remove!(0, 1),]);
393    }
394
395    #[test]
396    fn adjust_updates() {
397        let mut db = Collection::<u32, _>::new(test_index());
398
399        let one = db.insert(1);
400        db.adjust_by_key(one, |_| 2);
401
402        assert_eq!(db.get_by_key(one), Some(&2));
403        assert_eq!(db.len(), 1);
404        let ops = db.query(|ix| Plain(ix.ops.clone()));
405        assert_eq!(ops, vec![op_insert!(0, 1), op_update!(0, 1, 2),]);
406    }
407
408    #[test]
409    fn adjust_ignores_non_existent() {
410        let mut db = Collection::<u32, _>::new(test_index());
411
412        let one = db.insert(1);
413        db.delete_by_key(one);
414
415        db.adjust_by_key(one, |_| 2);
416
417        assert_eq!(db.get_by_key(one), None);
418        assert_eq!(db.len(), 0);
419        let ops = db.query(|ix| Plain(ix.ops.clone()));
420        assert_eq!(ops, vec![op_insert!(0, 1), op_remove!(0, 1),]);
421    }
422}