numerical-multiset 1.0.0

An ordered multiset of machine numbers
Documentation
  • Coverage
  • 100%
    36 out of 36 items documented33 out of 35 items with examples
  • Size
  • Source code size: 89.29 kB This is the summed size of all the files inside the crates.io package for this release.
  • Documentation size: 3.75 MB This is the summed size of all files generated by rustdoc for all configured targets
  • Ø build duration
  • this release: 7s Average build duration of successful builds.
  • all releases: 10s Average build duration of successful builds in releases after 2024-10-23.
  • Links
  • HadrienG2/numerical-multiset
    0 0 0
  • crates.io
  • Dependencies
  • Versions
  • Owners
  • HadrienG2

numerical-multiset: An ordered multiset of machine numbers

MPLv2 licensed On crates.io On docs.rs Continuous Integration Requires rustc 1.85.0+

What is this?

Let's break down the title of this README into more digestible chunks:

  • Multiset: The NumericalMultiset container provided by this crate implements a generalization of the mathematical notion of set, called a multiset. Unlike a set, a multiset can simultaneously hold multiple elements that compare equal to each other.

  • Ordered: Most multiset implementations are based on associative containers. In a crates.io survey performed at the time of writing, the most popular choice was hash maps, which do not expose any meaningful element ordering:

    • Any hash map insertion can change the order of elements that is exposed by iterators.
    • There is no efficient way to find what is e.g. the smallest element without iterating over all elements.

    In contrast, this multiset implementation is based on an ordered associative container (a BTreeMap at the time of writing). This allows it to efficiently answer order-related queries, like in-order iteration over elements or extraction of the minimum/maximum element. The price to pay for these fast ordering queries is that the algorithmic complexity of order-unrelated multiset operations like insertions are deletions will scale less well as the number of elements grows.

  • Numbers: In Rust terms, a general-purpose multiset is typically based on a data structure of the form Map<Element, Collection<Element>> where Map is an associative container and Collection groups together multiset elements that share a common comparison key. However, the multiset provided by this crate is not general-purpose, but specialized for machine number types (u32, f32...) and newtypes thereof. These number types share a few properties which this crate leverages to improve ergonomics, memory usage and execution performance:

    • The equality operator of machine numbers is defined such that if two numbers compare equal, they are the same number, i.e. there is no hidden internal information that their PartialEq implementation does not look at. Storing a collection of identical values like a general-purpose multiset does is thus pointless in our case, instead we can simply count the number of occurences of each value, leading to a more efficient sparse histogram data structure of the form Map<Element, NonZeroUsize>.
    • Machine numbers all implement the Copy trait, which means that we do not need the complexity of the standard library's associative container APIs (with reference accessors and nontrivial use of the Borrow trait) and can instead provide simpler and slightly more efficient value-based APIs.

How is this useful?

This crate was initially written while implementing windowed signal processing algorithms where one is receiving an stream of numbers, and for each new input beyond the first few ones, one needs to answer a question about the last N numbers that were received, such that the naive algorithm to answer the question involves maintaining a sorted list of previous data points.

For example, a median filter can be efficiently implemented by maintaining two NumericalMultisets, one representing numbers below the current median and one representing numbers above the current median. This is effectively just a sparse variation of the classic dense histogramming algorithm that is commonly used when processing 8-bit grayscale images, but sadly does not scale to higher-resolution input data (>= 32-bit) due to excessive memory usage and poor cache locality.

License

This crate is distributed under the terms of the MPLv2 license. See the LICENSE file for details.

More relaxed licensing (Apache, MIT, BSD...) may also be negociated, in exchange of a financial contribution. Contact me for details at knights_of_ni AT gmx DOTCOM.