composable_indexes/lib.rs
1//! A library for in-memory collections with simple, performant and composable indexes.
2//!
3//! # Example
4//!
5//! ```rust
6//! use composable_indexes::{Collection, index, aggregation};
7//!
8//! struct Person { name: String, age: u32, occupation: String }
9//!
10//! // Define indexes on the Person struct
11//! #[derive(composable_indexes::Index)]
12//! #[index(Person)]
13//! struct PersonIndex {
14//! by_name: index::Premap<Person, String, index::HashTable<String>>,
15//! by_age: index::PremapOwned<Person, u32, index::BTree<u32>>,
16//! by_occupation: index::Grouped<Person, String, aggregation::Count>,
17//! }
18//!
19//! // Create the collection.
20//! let mut collection = Collection::new(
21//! PersonIndex {
22//! by_name: index::Premap::new(|p: &Person| &p.name, index::HashTable::new()),
23//! by_age: index::PremapOwned::new(|p: &Person| p.age, index::BTree::new()),
24//! by_occupation: index::Grouped::new(|p: &Person| &p.occupation, || aggregation::Count::new()),
25//! }
26//! );
27//!
28//! // insert items
29//! collection.insert(Person { name: "Alice".to_string(), age: 30, occupation: "Engineer".to_string() });
30//! collection.insert(Person { name: "Bob".to_string(), age: 25, occupation: "Designer".to_string() });
31//!
32//! // Query by name - returns keys, use get_one to get the first match
33//! let alice_key = collection.query(|ix| ix.by_name.get_one(&"Alice".to_string()));
34//!
35//! // Query oldest person
36//! let _oldest = collection.query(|ix| ix.by_age.max_one());
37//!
38//! // Query the count of designers
39//! let _designer_count = collection.query(|ix| ix.by_occupation.get(&"Designer".to_string()).count());
40//! ```
41//!
42//! # Motivation
43//!
44//! Rust standard library (and ecosystem) provides excellent in-memory data structures like `HashMap`,
45//! `BTreeMap`, `Vec`, and so on. However, if we need to index multiple fields of a data structure,
46//! we will need to build and maintain multiple data structures, and carefully write code to keep
47//! them in sync.
48//!
49//! `composable-indexes` aims to solve this problem by introducing a concept called [`Index`]es, which
50//! are composable data structures that are declared upfront and automatically kept in sync with the
51//! main collection. It's designed to be lightweight, easy to extend, and efficient.
52//!
53//! ## Collection
54//!
55//! At the core of `composable-indexes` is the [`Collection<T, Ix>`](Collection) type, which represents
56//! a collection of items of type `T`, indexed by a set of indexes `Ix`. A [`Collection`] owns the data,
57//! and the indexes hold pointers to the data in the collection. When we insert, update, or delete items
58//! in the collection, it automatically updates the indexes accordingly.
59//!
60//! When inserting an item into a [`Collection`](Collection), it returns a [`Key`](Key) which can be used to refer to
61//! the inserted item later (for updates or deletions) without resorting to a query.
62//!
63//! **Warning**: The collection can only guarantee the validity of the index as long as the items are not
64//! mutated in place. This should only be possible using interior mutability (e.g., `Cell`, `RefCell`,
65//! `Mutex`, atomics etc.). So - do not use interior mutability on the fields that are indexed. This can
66//! not only cause indexes to go out of sync, but more importantly might cause panics or undefined behaviour.
67//!
68//! But the more interesting part is querying. Methods like [Collection::query], [Collection::update] and
69//! [Collection::delete] methods accept a callback that receives a reference to the index structure, allowing us
70//! to perform efficient lookups using the indexes defined on the collection. There is also counterparts like
71//! [Collection::query_keys] that return the keys of the items matching the query, or [Collection::query_with_keys]
72//! that return both the keys and the items.
73//!
74//! Keep reading to learn more about indexes & querying!
75//!
76//! ## Indexes
77//!
78//! Each collection is attached to an an index. And index is a data structure that implements the [Index] trait -
79//! which the [Collection] uses to keep the index in sync with the data. The indexes can be accessed with the
80//! methods like [Collection::query] mentioned above, and expose methods to perform queries efficiently. The results
81//! of those queries are ordinary Rust data structures that implement [QueryResult] trait - which is then used by
82//! the collection to "translate" the result of the query to the actual data or perform updates/deletions.
83//!
84//! Most used indexes are [index::HashTable] and [index::BTree]:
85//!
86//! - [index::HashTable] wraps a [HashMap](hashbrown::HashMap) and provides efficient lookups by key. It's the most
87//! commonly used index for equality lookups. Answer queries like "get all the items with field X equal to Y".
88//! - [index::BTree] wraps a [BTreeMap](std::collections::BTreeMap) and provides efficient range queries (on top of
89//! equality lookups). Answer queries like "get all the items with field X is greater than A" or "get the item where the field
90//! X is the biggest").
91//!
92//! This brings us to the `composable` part of `composable-indexes`. This library contains "higher-order indexes" that
93//! can wrap other indexes to:
94//!
95//! - Apply the index to a specific field of the data ([index::Premap])
96//! - Apply the index to a subset of the data ([index::Filtered])
97//! - Group the data by a key and apply an index/aggregation to each group ([index::Grouped])
98//!
99//! On top of those - tuples also implement the [Index] trait, alllowing combining multiple indexes in parallel.
100//! Alternatively - the [Index] derive macro can be used to define tuple-like indexes with named fields.
101//!
102//! # Performance
103//!
104//! `composable-indexes` is designed with performance in mind. The interfaces are designed
105//! to compile away, and only expose the underlying data structures. In other words, think
106//! of a [`Collection`] as a way to translate operations to the underlying index structures
107//! at compile time without adding (significant) runtime overhead.
108//!
109//! Data structures in `composable-indexes` hold the data entirely in memory. Usually, the
110//! data is owned by the [`Collection`] itself, and indexes hold pointers to the data in the
111//! collection (called a [`Key`]). This means that for most queries you can expect one
112//! lookup to the index structure to obtain the pointer, and then one lookup to the
113//! collection to obtain the actual data.
114//!
115//! ## Index Performance
116//!
117//! The common indexes ([`BTree`](index::BTree), [`HashTable`](index::HashTable)) are simply thin wrappers around
118//! `std::collections::BTreeMap` and `std::collections::HashMap`, so you can expect the
119//! same performance characteristics as those data structures. They are keyed by the input
120//! (usually a field of the stored type) and values are sets of pointers to the actual
121//! data stored in the collection.
122//!
123//! Higher order indexes like [index::Filtered], [index::Premap] are all zero-cost abstractions and have
124//! negligible overhead.
125//!
126//! **Important**: Because of not doing bookkeeping themselves, the functions passed to
127//! higher-order indexes should be fast to compute, as they will not be cached and are
128//! computed on-the-fly. Ideally, they should be things like field accesses rather than
129//! expensive computations.
130//!
131//! The most commonly used indexes are [`HashTable`](index::HashTable) for equality lookups and [`BTree`](index::BTree) for
132//! range queries. Between those two, hashtables are the fastest. They also come with
133//! immutable counterparts (with the `imbl` feature) which tend to be slower, but allow
134//! cheap cloning and multi-versioning of the database.
135//!
136//! | Index Type | Operations | Insert | Remove | Query | Memory |
137//! |------------|------------|--------|--------|-------|--------|
138//! | [`index::Keys`] | get all keys | O(1) | O(1) | O(n) | O(n) |
139//! | [`index::HashTable`] | contains, get, count distinct | O(1) | O(1) | O(1) | O(n) |
140//! | [`index::BTree`] | contains, get, count distinct, range, min, max | O(log n) | O(log n) | O(log n) | O(n) |
141//! | [`index::SuffixTree`] | string search | O(k * log n) † | O(k * log n) † | O(log n) | O(n) ‡ |
142//! | Aggregations | count, sum, mean, stddev | O(1) | O(1) | O(1) | O(1) |
143//!
144//! † k = length of the string
145//!
146//! ‡ Suffix trees have have a high memory footprint (even though linear), expect 5-10 times the length of the input strings.
147//!
148//!
149//! ## Aggregation Performance
150//!
151//! All built-in aggregations are calculated iteratively, without holding the data in
152//! memory. You can expect O(1) memory and time complexity regardless of the size of the
153//! collection.
154//!
155//! As an example, [`aggregations::Count`](aggregation::Count) simply increments and decrements a counter as
156//! items are inserted and removed, [`aggregations::Mean`](aggregation::Mean) only keeps track of the sum and
157//! count and so on.
158//!
159//! ## Indexing Overhead
160//!
161//! A [`Collection`] is simply a `HashMap`, and indexes are additional data structures.
162//! Hence, inserting an element into a [`Collection`] simply compiles down to inserting
163//! the element into the underlying `HashMap`, and then inserting pointers to the same
164//! element into each of the indexes. Hence, the overhead of adding indexes is linear
165//! in the number of indexes.
166//!
167//! As an example, in the benchmark below we compare inserting elements into a
168//! `std::collections::HashMap` versus inserting the same elements into a
169//! [Collection] with zero, one, two, three, and four indexes.
170//! You can see that without an index, the performance is exactly the same as a
171//! `HashMap`, and adding an index linearly increases the insertion time.
172//!
173#![cfg_attr(doc, doc=include_str!("../doc_assets/bench_indexing_overhead.svg"))]
174//!
175//! # Security
176//!
177//! As both [`Collection`] and [`index::HashTable`] index are backed by hash maps, the choice of the
178//! hash function can have a significant impact on performance. `composable-indexes`
179//! defaults to the default hasher of `hashbrown`, which is `foldhash` that is fast,
180//! but prone to HashDoS attacks. If you need a different
181//! hasher, they can be provided when constructing collection and the indexes. See `foldhash`'s
182//! README for more details: <https://github.com/orlp/foldhash>
183//!
184
185#![cfg_attr(all(not(feature = "std"), not(feature = "testutils")), no_std)]
186
187extern crate alloc;
188
189pub mod core;
190
191#[doc(inline)]
192pub use core::{Collection, Index, Key, QueryResult, QueryResultDistinct, ShallowClone};
193
194pub mod aggregation;
195pub mod index;
196
197#[cfg(feature = "testutils")]
198pub mod testutils;
199
200#[cfg(feature = "derive")]
201pub use composable_indexes_derive::{Index, ShallowClone};
202
203// Some tests for the Collection functionality are defined
204// here so we can utilize the testutils crate.
205#[cfg(test)]
206mod test {
207 use crate::core::*;
208 use crate::testutils::test_index;
209
210 macro_rules! op_insert {
211 ($key:expr, $new:expr) => {
212 $crate::testutils::Op::Insert($crate::testutils::Insert_ {
213 key: Key::unsafe_from_u64($key),
214 new: $new,
215 })
216 };
217 }
218
219 macro_rules! op_update {
220 ($key:expr, $existing:expr, $new:expr) => {
221 $crate::testutils::Op::Update($crate::testutils::Update_ {
222 key: Key::unsafe_from_u64($key),
223 new: $new,
224 existing: $existing,
225 })
226 };
227 }
228
229 macro_rules! op_remove {
230 ($key:expr, $existing:expr) => {
231 $crate::testutils::Op::Remove($crate::testutils::Remove_ {
232 key: Key::unsafe_from_u64($key),
233 existing: $existing,
234 })
235 };
236 }
237
238 #[test]
239 fn simple() {
240 let mut db = Collection::<u32, _>::new(test_index());
241
242 let one = db.insert(1);
243 let two = db.insert(2);
244 let three = db.insert(3);
245 db.update_by_key(two, |_| 10);
246 let four = db.insert(4);
247 db.delete_by_key(three);
248
249 assert_eq!(db.get_by_key(one), Some(&1));
250 assert_eq!(db.get_by_key(two), Some(&10));
251 assert_eq!(db.get_by_key(three), None);
252 assert_eq!(db.get_by_key(four), Some(&4));
253 assert_eq!(db.len(), 3);
254
255 // Access test index operations directly
256 let ops = db.query(|ix| Plain(ix.ops.clone()));
257 assert_eq!(
258 ops,
259 vec![
260 op_insert!(0, 1),
261 op_insert!(1, 2),
262 op_insert!(2, 3),
263 op_update!(1, 2, 10),
264 op_insert!(3, 4),
265 op_remove!(2, 3),
266 ]
267 );
268 }
269
270 #[test]
271 fn update_mut_updates() {
272 let mut db = Collection::<u32, _>::new(test_index());
273
274 let one = db.insert(1);
275 db.update_by_key_mut(one, |v| {
276 if let Some(v) = v {
277 *v += 1;
278 }
279 });
280
281 assert_eq!(db.get_by_key(one), Some(&2));
282 assert_eq!(db.len(), 1);
283 let ops = db.query(|ix| Plain(ix.ops.clone()));
284 assert_eq!(
285 ops,
286 vec![op_insert!(0, 1), op_remove!(0, 1), op_insert!(0, 2),]
287 );
288 }
289
290 #[test]
291 fn update_mut_inserts() {
292 let mut db = Collection::<u32, _>::new(test_index());
293
294 let one = db.insert(1);
295 db.delete_by_key(one);
296 db.update_by_key_mut(one, |v| {
297 assert!(v.is_none());
298 *v = Some(2);
299 });
300
301 assert_eq!(db.get_by_key(one), Some(&2));
302 assert_eq!(db.len(), 1);
303 let ops = db.query(|ix| Plain(ix.ops.clone()));
304 assert_eq!(
305 ops,
306 vec![op_insert!(0, 1), op_remove!(0, 1), op_insert!(0, 2),]
307 );
308 }
309
310 #[test]
311 fn update_mut_removes() {
312 let mut db = Collection::<u32, _>::new(test_index());
313
314 let one = db.insert(1);
315 db.update_by_key_mut(one, |v| {
316 assert!(v.is_some());
317 *v = None;
318 });
319
320 assert_eq!(db.get_by_key(one), None);
321 assert_eq!(db.len(), 0);
322 let ops = db.query(|ix| Plain(ix.ops.clone()));
323 assert_eq!(ops, vec![op_insert!(0, 1), op_remove!(0, 1),]);
324 }
325
326 #[test]
327 fn update_updates() {
328 let mut db = Collection::<u32, _>::new(test_index());
329
330 let one = db.insert(1);
331 db.update_by_key(one, |_| 2);
332
333 assert_eq!(db.get_by_key(one), Some(&2));
334 assert_eq!(db.len(), 1);
335 let ops = db.query(|ix| Plain(ix.ops.clone()));
336 assert_eq!(ops, vec![op_insert!(0, 1), op_update!(0, 1, 2),]);
337 }
338
339 #[test]
340 fn update_inserts() {
341 let mut db = Collection::<u32, _>::new(test_index());
342
343 let one = db.insert(1);
344 db.delete_by_key(one);
345
346 db.update_by_key(one, |x| {
347 assert_eq!(x, None);
348 2
349 });
350
351 assert_eq!(db.get_by_key(one), Some(&2));
352 assert_eq!(db.len(), 1);
353 let ops = db.query(|ix| Plain(ix.ops.clone()));
354 assert_eq!(
355 ops,
356 vec![op_insert!(0, 1), op_remove!(0, 1), op_insert!(0, 2),]
357 );
358 }
359
360 #[test]
361 fn adjust_mut_updates() {
362 let mut db = Collection::<u32, _>::new(test_index());
363
364 let one = db.insert(1);
365 db.adjust_by_key_mut(one, |v| {
366 *v = 2;
367 });
368
369 assert_eq!(db.get_by_key(one), Some(&2));
370 assert_eq!(db.len(), 1);
371 let ops = db.query(|ix| Plain(ix.ops.clone()));
372 assert_eq!(
373 ops,
374 vec![op_insert!(0, 1), op_remove!(0, 1), op_insert!(0, 2),]
375 );
376 }
377
378 #[test]
379 fn adjust_mut_ignores_non_existent() {
380 let mut db = Collection::<u32, _>::new(test_index());
381
382 let one = db.insert(1);
383 db.delete_by_key(one);
384
385 db.adjust_by_key_mut(one, |_| {
386 panic!("Should not be called");
387 });
388
389 assert_eq!(db.get_by_key(one), None);
390 assert_eq!(db.len(), 0);
391 let ops = db.query(|ix| Plain(ix.ops.clone()));
392 assert_eq!(ops, vec![op_insert!(0, 1), op_remove!(0, 1),]);
393 }
394
395 #[test]
396 fn adjust_updates() {
397 let mut db = Collection::<u32, _>::new(test_index());
398
399 let one = db.insert(1);
400 db.adjust_by_key(one, |_| 2);
401
402 assert_eq!(db.get_by_key(one), Some(&2));
403 assert_eq!(db.len(), 1);
404 let ops = db.query(|ix| Plain(ix.ops.clone()));
405 assert_eq!(ops, vec![op_insert!(0, 1), op_update!(0, 1, 2),]);
406 }
407
408 #[test]
409 fn adjust_ignores_non_existent() {
410 let mut db = Collection::<u32, _>::new(test_index());
411
412 let one = db.insert(1);
413 db.delete_by_key(one);
414
415 db.adjust_by_key(one, |_| 2);
416
417 assert_eq!(db.get_by_key(one), None);
418 assert_eq!(db.len(), 0);
419 let ops = db.query(|ix| Plain(ix.ops.clone()));
420 assert_eq!(ops, vec![op_insert!(0, 1), op_remove!(0, 1),]);
421 }
422}