# indexset
[](https://crates.io/crates/indexset)
[](https://docs.rs/indexset)
A pure-Rust two-level dynamic order-statistic b-tree.
This crate implements a compact set data structure that preserves its elements' sorted order and
allows lookups of entries by value or sorted order position.
Under the feature `concurrent` you can find a version of the BTree that can be fearlessly shared between
threads.
Both the concurrent and single-threaded versions are meant to be drop-in replacements for the stdlib BTree. This
is mostly true for the latter but not for the former, yet.
# Background
This was heavily inspired by [`indexmap`](https://crates.io/crates/indexmap), and
python's [`sortedcontainers`](https://github.com/grantjenks/python-sortedcontainers).
It differs from both in that:
* `indexmap` is a hashmap that provides numerical lookups, but does not maintain order in case of removals, while
`indexset` is a b-tree that irrespective of which mutating operation is run, always maintains order.
* `sortecontainers` is similar in spirit, but utilizes a different routine for balancing the tree, and relies
on a heap for numerical lookups.
`indexset` provides the following features:
- As fast to iterate as a vec.
- Zero indirection.
- Lookups by position and range.
- Minimal amount of allocations.
- `select`(lookups by position) and `rank` operations in near constant time.
# Performance
`BTreeSet` and `BTreeMap` derive their performance much from how they are constructed, which is:
> A two-level B-Tree with a fenwick tree as a low-cost index for numerical lookups
Each node is a leaf, and each leaf is a vec with a fixed capacity of size `B`, with `1024` being the default.
The following hold:
- Iteration is very fast since it is done by inheriting vec's iter struct.
- Lookups only need two binary searches. One over `n/B` nodes and another over `B` elements: `O(log(n/B) + log(B)) = O(log(n))`.
- Insertions are constant time `O(B)` in the best case and `O(B^2)` in the worst. Removals are `O(log(n))`.
## Benchmarks
The following numbers were obtained on a M3 macbook pro:
### Single-threaded
Command: `cargo bench --bench stdlib --all-features`
* Inserting 100k random usize
* `stdlib::collections::BTreeSet.insert(i)`: 8.9ms
* `indexset::BTreeSet.insert(i)`: 13.1ms
* `indexset::concurrent::set::BTreeSet.insert(i)`: 14.0ms
* Checking that each 100k random usize integers exist
* `stdlib::collections::BTreeSet.contains(i)`: 7.02ms
* `indexset::BTreeSet.contains(i)`: 5.22ms
* `indexset::concurrent::set::BTreeSet.contains(i)`: 5.27ms
* Getting all 100k i-th elements
* `stdlib::collections::BTreeSet.iter.nth(i)`: **13.28s** yes, seconds!
* `indexset::BTreeSet.get_index(i)`: **3.93ms**
* Iterating over all 100k elements and then collecting it into a vec
* `Vec::from_iter(stdlib::collections::BTreeSet.iter())`: **227.28us**
* `Vec::from_iter(indexset::BTreeSet.iter())`: **123.21.us**
Getting the i-th element is **3400x** faster than stdlib's btree, `contains` is 25% faster, and iterating is twice
as fast, at the cost of insertions being 30% slower.
If your use case of `std::collections::BTreeSet` and `BTreeMap` i read-heavy, or if you really need to index by
sorted-order position, it might be worth checking out `indexset` instead.
### Concurrent
Command: `cargo bench --bench concurrent --all-features`
We benchmark concurrent operations with 40 threads, each conducting 100000 operations
at the same time that vary from a ratio of 1% writes/reads to 50% writes/reads.
In this benchmark threads have high locality and tend to focus on specific parts of the data.
* 1% writes/99% reads:
* `indexset::concurrent::set::BTreeSet`: 170.5ms
* `scc::TreeIndex`: 128.7ms
* `crossbeam_skiplist::SkipSet`: 161.3ms
* 10% writes/90% reads:
* `indexset::concurrent::set::BTreeSet`: 175.9ms
* `scc::TreeIndex`: 183.9ms
* `crossbeam_skiplist::SkipSet`: 217.4ms
* 30% writes/70% reads:
* `indexset::concurrent::set::BTreeSet`: 190.9ms
* `scc::TreeIndex`: 313.2ms
* `crossbeam_skiplist::SkipSet`: 261.8ms
* 50% writes/50% reads:
* `indexset::concurrent::set::BTreeSet`: 220.75ms
* `scc::TreeIndex`: 561.70ms
* `crossbeam_skiplist::SkipSet`: 334.40ms
## Limitations
* `BTreeMap` is less polished than `BTreeSet`. This crate has been optimised for a leaner `BTreeSet`.
* `Concurrent` `BTreeMap` and `BtreeSet` do not support `serde` serialization and deserialization nor are they order-statistic trees.
## Naming
This library is called `indexset`, because the base data structure is `BTreeSet`. `BTreeMap` is a `BTreeSet` with
a `Pair<K, V>` item type.
## Changelog
See [CHANGELOG.md](https://github.com/brurucy/indexset/blob/master/CHANGELOG.md).