composable_indexes/
lib.rs

1//! A library for in-memory collections with simple, performant and composable indexes.
2//!
3//! # Example
4//!
5//! ```rust
6//! use composable_indexes::{Collection, index, aggregation};
7//!
8//! struct Person { name: String, age: u32, occupation: String }
9//!
10//! // Define indexes on the Person struct
11//! #[derive(composable_indexes::Index)]
12//! #[index(Person)]
13//! struct PersonIndex {
14//!   by_name: index::Premap<Person, String, index::HashTable<String>>,
15//!   by_age: index::PremapOwned<Person, u32, index::BTree<u32>>,
16//!   by_occupation: index::Grouped<Person, String, aggregation::Count>,
17//! }
18//!
19//! // Create the collection.
20//! let mut collection = Collection::new(
21//!   PersonIndex {
22//!     by_name: index::Premap::new(|p: &Person| &p.name, index::HashTable::new()),
23//!     by_age: index::PremapOwned::new(|p: &Person| p.age, index::BTree::new()),
24//!     by_occupation: index::Grouped::new(|p: &Person| &p.occupation, || aggregation::Count::new()),
25//!   }
26//! );
27//!
28//! // insert items
29//! collection.insert(Person { name: "Alice".to_string(), age: 30, occupation: "Engineer".to_string() });
30//! collection.insert(Person { name: "Bob".to_string(), age: 25, occupation: "Designer".to_string() });
31//!
32//! // Query by name - returns keys, use get_one to get the first match
33//! let alice_key = collection.query(|ix| ix.by_name.get_one(&"Alice".to_string()));
34//!
35//! // Query oldest person
36//! let _oldest = collection.query(|ix| ix.by_age.max_one());
37//!
38//! // Query the count of designers
39//! let _designer_count = collection.query(|ix| ix.by_occupation.get(&"Designer".to_string()).count());
40//! ```
41//!
42//! # Motivation
43//!
44//! Rust standard library (and ecosystem) provides excellent in-memory data structures like `HashMap`,
45//! `BTreeMap`, `Vec`, and so on. However, if we need to index multiple fields of a data structure,
46//! we will need to build and maintain multiple data structures, and carefully write code to keep
47//! them in sync.
48//!
49//! `composable-indexes` aims to solve this problem by introducing a concept called [`Index`]es, which
50//! are composable data structures that are declared upfront and automatically kept in sync with the
51//! main collection. It's designed to be lightweight, easy to extend, and efficient.
52//!
53//! ## Collection
54//!
55//! At the core of `composable-indexes` is the [`Collection<T, Ix>`](Collection) type, which represents
56//! a collection of items of type `T`, indexed by a set of indexes `Ix`. A [`Collection`] owns the data,
57//! and the indexes hold pointers to the data in the collection. When we insert, update, or delete items
58//! in the collection, it automatically updates the indexes accordingly.
59//!
60//! When inserting an item into a [`Collection`](Collection), it returns a [`Key`](Key) which can be used to refer to
61//! the inserted item later (for updates or deletions) without resorting to a query.
62//!
63//! **Warning**: The collection can only guarantee the validity of the index as long as the items are not
64//! mutated in place. This should only be possible using interior mutability (e.g., `Cell`, `RefCell`,
65//! `Mutex`, atomics etc.). So - do not use interior mutability on the fields that are indexed. This can
66//! not only cause indexes to go out of sync, but more importantly might cause panics or undefined behaviour.
67//!
68//! But the more interesting part is querying. Methods like [Collection::query], [Collection::update] and
69//! [Collection::delete] methods accept a callback that receives a reference to the index structure, allowing us
70//! to perform efficient lookups using the indexes defined on the collection. There is also counterparts like
71//! [Collection::query_keys] that return the keys of the items matching the query, or [Collection::query_with_keys]
72//! that return both the keys and the items.
73//!
74//! Keep reading to learn more about indexes & querying!
75//!
76//! ## Indexes
77//!
78//! Each collection is attached to an an index. And index is a data structure that implements the [Index] trait -
79//! which the [Collection] uses to keep the index in sync with the data. The indexes can be accessed with the
80//! methods like [Collection::query] mentioned above, and expose methods to perform queries efficiently. The results
81//! of those queries are ordinary Rust data structures that implement [QueryResult] trait - which is then used by
82//! the collection to "translate" the result of the query to the actual data or perform updates/deletions.
83//!
84//! Most used indexes are [index::HashTable] and [index::BTree]:
85//!
86//! - [index::HashTable] wraps a [HashMap](hashbrown::HashMap) and provides efficient lookups by key. It's the most
87//!   commonly used index for equality lookups. Answer queries like "get all the items with field X equal to Y".
88//! - [index::BTree] wraps a [BTreeMap](std::collections::BTreeMap) and provides efficient range queries (on top of
89//!   equality lookups). Answer queries like "get all the items with field X is greater than A" or "get the item where the field
90//!   X is the biggest").
91//!
92//! This brings us to the `composable` part of `composable-indexes`. This library contains "higher-order indexes" that
93//! can wrap other indexes to:
94//!
95//! - Apply the index to a specific field of the data ([index::Premap])
96//! - Apply the index to a subset of the data ([index::Filtered])
97//! - Group the data by a key and apply an index/aggregation to each group ([index::Grouped])
98//!
99//! On top of those - tuples also implement the [Index] trait, alllowing combining multiple indexes in parallel.
100//! Alternatively - the [Index] derive macro can be used to define tuple-like indexes with named fields.
101//!
102//! # Performance
103//!
104//! `composable-indexes` is designed with performance in mind. The interfaces are designed
105//! to compile away, and only expose the underlying data structures. In other words, think
106//! of a [`Collection`] as a way to translate operations to the underlying index structures
107//! at compile time without adding (significant) runtime overhead.
108//!
109//! Data structures in `composable-indexes` hold the data entirely in memory. Usually, the
110//! data is owned by the [`Collection`] itself, and indexes hold pointers to the data in the
111//! collection (called a [`Key`]). This means that for most queries you can expect one
112//! lookup to the index structure to obtain the pointer, and then one lookup to the
113//! collection to obtain the actual data.
114//!
115//! ## Index Performance
116//!
117//! The common indexes ([`BTree`](index::BTree), [`HashTable`](index::HashTable)) are simply thin wrappers around
118//! `std::collections::BTreeMap` and `std::collections::HashMap`, so you can expect the
119//! same performance characteristics as those data structures. They are keyed by the input
120//! (usually a field of the stored type) and values are sets of pointers to the actual
121//! data stored in the collection.
122//!
123//! Higher order indexes like [index::Filtered], [index::Premap] are all zero-cost abstractions and have
124//! negligible overhead.
125//!
126//! **Important**: Because of not doing bookkeeping themselves, the functions passed to
127//! higher-order indexes should be fast to compute, as they will not be cached and are
128//! computed on-the-fly. Ideally, they should be things like field accesses rather than
129//! expensive computations.
130//!
131//! The most commonly used indexes are [`HashTable`](index::HashTable) for equality lookups and [`BTree`](index::BTree) for
132//! range queries. Between those two, hashtables are the fastest. They also come with
133//! immutable counterparts (with the `imbl` feature) which tend to be slower, but allow
134//! cheap cloning and multi-versioning of the database.
135//!
136//! | Index Type | Operations | Insert | Remove | Query | Memory |
137//! |------------|------------|--------|--------|-------|--------|
138//! | [`index::Keys`] | get all keys | O(1) | O(1) | O(n) | O(n) |
139//! | [`index::HashTable`] | contains, get, count distinct | O(1) | O(1) | O(1) | O(n) |
140//! | [`index::BTree`] | contains, get, count distinct, range, min, max | O(log n) | O(log n) | O(log n) | O(n) |
141//! | [`index::SuffixTree`] | string search | O(k * log n) † | O(k * log n) † | O(log n) | O(n) ‡ |
142//! | Aggregations | count, sum, mean, stddev | O(1) | O(1) | O(1) | O(1) |
143//!
144//! † k = length of the string
145//!
146//! ‡ Suffix trees have have a high memory footprint (even though linear), expect 5-10 times the length of the input strings.
147//!
148//! ## Aggregation Performance
149//!
150//! All built-in aggregations are calculated iteratively, without holding the data in
151//! memory. You can expect O(1) memory and time complexity regardless of the size of the
152//! collection.
153//!
154//! As an example, [`aggregations::Count`](aggregation::Count) simply increments and decrements a counter as
155//! items are inserted and removed, [`aggregations::Mean`](aggregation::Mean) only keeps track of the sum and
156//! count and so on.
157//!
158//! ## Indexing Overhead
159//!
160//! A [`Collection`] is simply a `HashMap`, and indexes are additional data structures.
161//! Hence, inserting an element into a [`Collection`] simply compiles down to inserting
162//! the element into the underlying `HashMap`, and then inserting pointers to the same
163//! element into each of the indexes. Hence, the overhead of adding indexes is linear
164//! in the number of indexes.
165//!
166//! As an example, in the benchmark below we compare inserting elements into a
167//! `std::collections::HashMap` versus inserting the same elements into a
168//! [Collection] with zero, one, two, three, and four indexes.
169//! You can see that without an index, the performance is exactly the same as a
170//! `HashMap`, and adding an index linearly increases the insertion time.
171//!
172#![cfg_attr(doc, doc=include_str!("../doc_assets/bench_indexing_overhead.svg"))]
173//!
174//! # Security
175//!
176//! As both [`Collection`] and [`index::HashTable`] index are backed by hash maps, the choice of the
177//! hash function can have a significant impact on performance. `composable-indexes`
178//! defaults to the default hasher of `hashbrown`, which is `foldhash` that is fast,
179//! but prone to HashDoS attacks. If you need a different
180//! hasher, they can be provided when constructing collection and the indexes. See `foldhash`'s
181//! README for more details: <https://github.com/orlp/foldhash>
182//!
183
184#![cfg_attr(all(not(feature = "std"), not(feature = "testutils")), no_std)]
185
186extern crate alloc;
187
188pub mod core;
189
190#[doc(inline)]
191pub use core::{Collection, Index, Key, QueryResult, QueryResultDistinct, ShallowClone};
192
193pub mod aggregation;
194pub mod index;
195
196#[cfg(feature = "testutils")]
197pub mod testutils;
198
199#[cfg(feature = "derive")]
200pub use composable_indexes_derive::{Index, ShallowClone};
201
202// Some tests for the Collection functionality are defined
203// here so we can utilize the testutils crate.
204#[cfg(test)]
205mod test {
206    use crate::core::*;
207    use crate::testutils::test_index;
208
209    macro_rules! op_insert {
210        ($key:expr, $new:expr) => {
211            $crate::testutils::Op::Insert($crate::testutils::Insert_ {
212                key: Key::unsafe_from_u64($key),
213                new: $new,
214            })
215        };
216    }
217
218    macro_rules! op_update {
219        ($key:expr, $existing:expr, $new:expr) => {
220            $crate::testutils::Op::Update($crate::testutils::Update_ {
221                key: Key::unsafe_from_u64($key),
222                new: $new,
223                existing: $existing,
224            })
225        };
226    }
227
228    macro_rules! op_remove {
229        ($key:expr, $existing:expr) => {
230            $crate::testutils::Op::Remove($crate::testutils::Remove_ {
231                key: Key::unsafe_from_u64($key),
232                existing: $existing,
233            })
234        };
235    }
236
237    #[test]
238    fn simple() {
239        let mut db = Collection::<u32, _>::new(test_index());
240
241        let one = db.insert(1);
242        let two = db.insert(2);
243        let three = db.insert(3);
244        db.update_by_key(two, |_| 10);
245        let four = db.insert(4);
246        db.delete_by_key(three);
247
248        assert_eq!(db.get_by_key(one), Some(&1));
249        assert_eq!(db.get_by_key(two), Some(&10));
250        assert_eq!(db.get_by_key(three), None);
251        assert_eq!(db.get_by_key(four), Some(&4));
252        assert_eq!(db.len(), 3);
253
254        // Access test index operations directly
255        let ops = db.query(|ix| Plain(ix.ops.clone()));
256        assert_eq!(
257            ops,
258            vec![
259                op_insert!(0, 1),
260                op_insert!(1, 2),
261                op_insert!(2, 3),
262                op_update!(1, 2, 10),
263                op_insert!(3, 4),
264                op_remove!(2, 3),
265            ]
266        );
267    }
268
269    #[test]
270    fn update_mut_updates() {
271        let mut db = Collection::<u32, _>::new(test_index());
272
273        let one = db.insert(1);
274        db.update_by_key_mut(one, |v| {
275            if let Some(v) = v {
276                *v += 1;
277            }
278        });
279
280        assert_eq!(db.get_by_key(one), Some(&2));
281        assert_eq!(db.len(), 1);
282        let ops = db.query(|ix| Plain(ix.ops.clone()));
283        assert_eq!(
284            ops,
285            vec![op_insert!(0, 1), op_remove!(0, 1), op_insert!(0, 2),]
286        );
287    }
288
289    #[test]
290    fn update_mut_inserts() {
291        let mut db = Collection::<u32, _>::new(test_index());
292
293        let one = db.insert(1);
294        db.delete_by_key(one);
295        db.update_by_key_mut(one, |v| {
296            assert!(v.is_none());
297            *v = Some(2);
298        });
299
300        assert_eq!(db.get_by_key(one), Some(&2));
301        assert_eq!(db.len(), 1);
302        let ops = db.query(|ix| Plain(ix.ops.clone()));
303        assert_eq!(
304            ops,
305            vec![op_insert!(0, 1), op_remove!(0, 1), op_insert!(0, 2),]
306        );
307    }
308
309    #[test]
310    fn update_mut_removes() {
311        let mut db = Collection::<u32, _>::new(test_index());
312
313        let one = db.insert(1);
314        db.update_by_key_mut(one, |v| {
315            assert!(v.is_some());
316            *v = None;
317        });
318
319        assert_eq!(db.get_by_key(one), None);
320        assert_eq!(db.len(), 0);
321        let ops = db.query(|ix| Plain(ix.ops.clone()));
322        assert_eq!(ops, vec![op_insert!(0, 1), op_remove!(0, 1),]);
323    }
324
325    #[test]
326    fn update_updates() {
327        let mut db = Collection::<u32, _>::new(test_index());
328
329        let one = db.insert(1);
330        db.update_by_key(one, |_| 2);
331
332        assert_eq!(db.get_by_key(one), Some(&2));
333        assert_eq!(db.len(), 1);
334        let ops = db.query(|ix| Plain(ix.ops.clone()));
335        assert_eq!(ops, vec![op_insert!(0, 1), op_update!(0, 1, 2),]);
336    }
337
338    #[test]
339    fn update_inserts() {
340        let mut db = Collection::<u32, _>::new(test_index());
341
342        let one = db.insert(1);
343        db.delete_by_key(one);
344
345        db.update_by_key(one, |x| {
346            assert_eq!(x, None);
347            2
348        });
349
350        assert_eq!(db.get_by_key(one), Some(&2));
351        assert_eq!(db.len(), 1);
352        let ops = db.query(|ix| Plain(ix.ops.clone()));
353        assert_eq!(
354            ops,
355            vec![op_insert!(0, 1), op_remove!(0, 1), op_insert!(0, 2),]
356        );
357    }
358
359    #[test]
360    fn adjust_mut_updates() {
361        let mut db = Collection::<u32, _>::new(test_index());
362
363        let one = db.insert(1);
364        db.adjust_by_key_mut(one, |v| {
365            *v = 2;
366        });
367
368        assert_eq!(db.get_by_key(one), Some(&2));
369        assert_eq!(db.len(), 1);
370        let ops = db.query(|ix| Plain(ix.ops.clone()));
371        assert_eq!(
372            ops,
373            vec![op_insert!(0, 1), op_remove!(0, 1), op_insert!(0, 2),]
374        );
375    }
376
377    #[test]
378    fn adjust_mut_ignores_non_existent() {
379        let mut db = Collection::<u32, _>::new(test_index());
380
381        let one = db.insert(1);
382        db.delete_by_key(one);
383
384        db.adjust_by_key_mut(one, |_| {
385            panic!("Should not be called");
386        });
387
388        assert_eq!(db.get_by_key(one), None);
389        assert_eq!(db.len(), 0);
390        let ops = db.query(|ix| Plain(ix.ops.clone()));
391        assert_eq!(ops, vec![op_insert!(0, 1), op_remove!(0, 1),]);
392    }
393
394    #[test]
395    fn adjust_updates() {
396        let mut db = Collection::<u32, _>::new(test_index());
397
398        let one = db.insert(1);
399        db.adjust_by_key(one, |_| 2);
400
401        assert_eq!(db.get_by_key(one), Some(&2));
402        assert_eq!(db.len(), 1);
403        let ops = db.query(|ix| Plain(ix.ops.clone()));
404        assert_eq!(ops, vec![op_insert!(0, 1), op_update!(0, 1, 2),]);
405    }
406
407    #[test]
408    fn adjust_ignores_non_existent() {
409        let mut db = Collection::<u32, _>::new(test_index());
410
411        let one = db.insert(1);
412        db.delete_by_key(one);
413
414        db.adjust_by_key(one, |_| 2);
415
416        assert_eq!(db.get_by_key(one), None);
417        assert_eq!(db.len(), 0);
418        let ops = db.query(|ix| Plain(ix.ops.clone()));
419        assert_eq!(ops, vec![op_insert!(0, 1), op_remove!(0, 1),]);
420    }
421}