Crate kempt

source ·
Expand description

§Kempt

A #[forbid_unsafe] ordered collection crate for Rust. This crate is no_std compatible using the alloc crate.

crate version Live Build Status HTML Coverage Report for main branch Documentation for v0.2.4 branch

§Map<K, V>

Map<K,V> provides an interface similar to a BTreeMap<K,V>, but utilizes a simpler storage model. The entries are stored in a Vec, ordered by the keys. Retrieving values uses a hybrid binary search and sequential scan algorithm that is aimed at taking advantage of sequential scans for better cache performance.

use kempt::Map;

let mut map = Map::new();
map.insert("a", 1);
map.insert("b", 2);
assert_eq!(map.get(&"a"), Some(&1));
let replaced = map.insert("a", 2).expect("value exists");
assert_eq!(map.get(&"a"), Some(&2));
assert_eq!(replaced.value, 1);

§Why use Map instead of BTreeMap or HashMap?

The Map type provides several operations that the standard library Map types do not:

  • Ability to merge maps (merge_with())
  • Entry API that supports owned or borrowed representations, and only uses ToOwned when inserting borrowed key into a vacant entry
  • Ability to access fields by index in addition to the key type

Overall, the Map type is very similar to the BTreeMap type, except that it utilizes a single storage buffer. Because of this simplified storage model, the Map type supports preallocation with_capacity(), while BTreeMap does not.

The Map type will not perform as well as BTreeMap when there is a significant number of items in the collection (> 1k, depending on Key and Value sizes). Some operations, such as insertion and removal, will also be slower on moderately large collections (> 100 entries).

The Map type can be beneficial over using a HashMap for several reasons:

  • Ordered iteration
  • Only requires Ord
  • Growing does not require “rebinning”
  • May be faster for collections with fewer than ~100 elements (depending on Key and Value sizes), due to:
    • No hashing of keys
    • When hash maps are more full, the likelihood of collisions increases. Collisions require some sort of scan to find the matching key, and each key comparison falls back to Eq.

The HashMap type is more beneficial for many other use cases:

  • Large Key types that are expensive to compare
  • Moderately large collections (> 100 entries, depending on Key and Value sizes)

§Set<T>

Set<T> provides an interface similar to a BTreeSet<T>, but utilizes a simpler storage model by leveraging this crate’s Map type. The members are stored in a Vec in sort order. Retrieving values uses a hybrid binary search and sequential scan algorithm that is aimed at taking advantage of sequential scans for better cache performance.

use kempt::Set;

let mut set = Set::new();
set.insert(42);
assert!(set.contains(&42));
// Inserting again will return false, because the member already exists.
assert!(!set.insert(42));
// The collection will be automatically sorted as it is mutated.
set.insert(1);
assert_eq!(set.member(0), Some(&1));
assert_eq!(set.member(1), Some(&42));

§Why?

This crate started as a thought experiment: when using interned strings, could an ordered Vec outperform a HashMap for “small” collections. And, if it could, how would it compare to BTreeMap? The use case being considered was stylecs, where most collections were expected to be well under 100 elements.

In addition to performing similarly or better than HashMap and BTreeMap for the intended use cases, a generic merge_with API was designed that optimally merges the contents of two collections together. This operation is expected to be a common operation for stylecs, and there is no optimal way to write such an algorithm with HashMap or BTreeMap.

This collection is not always better than HashMap and/or BTreeMap, and depending on your Key and Value types, your mileage may vary. Before using this in your own project, you should benchmark your intended use case to ensure it meets your performance needs.

§Benchmarks

The benchmark suite can be run via cargo:

cargo bench -p benchmarks

The suite uses randomization, which means that each run may produce slightly different results. However, each data structure is tested with the same randomized data, so each individual run of a benchmark is a true comparison. The randomization can be provided a stable seed to allow comparing the same data sets by providing the -s option:

# -s takes up to 64 hexadecimal digits to produce the random seed
cargo bench -p benchmarks -- -s0

Full results as run on Github Actions can be viewed here, using a randomized seed. Here are the results run from the developer’s machine (an AMD Ryzen 7 3800X) for data sets containing 25 elements using -s0:

BenchmarkKey TypeMapHashMap (fnv)BTreeMap
fill randomu8161.1ns408.9ns447.5ns
fill randomusize157.7ns289.2ns451.5ns
fill randomu128260.1ns401.0ns527.8ns
fill sequ8170.4ns413.0ns444.9ns
fill sequsize173.3ns284.2ns461.2ns
fill sequ128220.5ns404.9ns533.1ns
getu85.0ns4.1ns5.5ns
getusize5.1ns6.4ns5.7ns
getu1287.3ns11.7ns6.9ns
removeu819.4ns25.6ns187.1ns
removeusize28.2ns34.6ns199.8ns
removeu12832.2ns41.2ns203.8ns

No benchmark suite is ever perfect. Pull requests are welcome. Each potential user who cares about maximal performance should benchmark their own use case on their target hardware rather than rely on these benchmark results.

§Open-source Licenses

This project, like all projects from Khonsu Labs, is open-source. This repository is available under the MIT License or the Apache License 2.0.

To learn more about contributing, please see CONTRIBUTING.md.

Re-exports§

  • pub use map::Map;
  • pub use set::Set;

Modules§

Traits§

  • Provides a comparison between Self and Other.