# numerical-multiset: An ordered multiset of machine numbers
[](./LICENSE)
[](https://crates.io/crates/numerical-multiset)
[](https://docs.rs/numerical-multiset/)
[](https://github.com/HadrienG2/numerical-multiset/actions?query=workflow%3A%22Continuous+Integration%22)

## What is this?
Let's break down the title of this README into more digestible chunks:
- **Multiset:** The `NumericalMultiset` container provided by this crate
implements a generalization of the mathematical notion of set, called a
multiset. Unlike a set, a multiset can simultaneously hold multiple elements
that compare equal to each other.
- **Ordered:** Most multiset implementations are based on associative
containers. In a crates.io survey performed at the time of writing, the most
popular choice was hash maps, which do not expose any meaningful element
ordering:
- Any hash map insertion can change the order of elements that is exposed by
iterators.
- There is no efficient way to find what is e.g. the smallest element
without iterating over all elements.
In contrast, this multiset implementation is based on an ordered associative
container (a
[`BTreeMap`](https://doc.rust-lang.org/std/collections/struct.BTreeMap.html)
at the time of writing). This allows it to efficiently answer order-related
queries, like in-order iteration over elements or extraction of the
minimum/maximum element. The price to pay for these fast ordering queries is
that the algorithmic complexity of order-unrelated multiset operations like
insertions are deletions will scale less well as the number of elements grows.
- **Numbers:** In Rust terms, a general-purpose multiset is typically based on a
data structure of the form `Map<Element, Collection<Element>>` where `Map` is
an associative container and `Collection` groups together multiset elements
that share a common comparison key. However, the multiset provided by this
crate is not general-purpose, but specialized for machine number types (`u32`,
`f32`...) and newtypes thereof. These number types share a few properties
which this crate leverages to improve ergonomics, memory usage and execution
performance:
* The equality operator of machine numbers is defined such that if two
numbers compare equal, they are the same number, i.e. there is no hidden
internal information that their `PartialEq` implementation does not look
at. Storing a collection of identical values like a general-purpose
multiset does is thus pointless in our case, instead we can simply count
the number of occurences of each value, leading to a more efficient sparse
histogram data structure of the form `Map<Element, NonZeroUsize>`.
* Machine numbers all implement the `Copy` trait, which means that we do not
need the complexity of the standard library's associative container APIs
(with reference accessors and nontrivial use of the `Borrow` trait) and
can instead provide simpler and slightly more efficient value-based APIs.
## How is this useful?
This crate was initially written while implementing windowed signal processing
algorithms where one is receiving an stream of numbers, and for each new input
beyond the first few ones, one needs to answer a question about the last N
numbers that were received, such that the naive algorithm to answer the question
involves maintaining a sorted list of previous data points.
For example, a median filter can be efficiently implemented by maintaining two
`NumericalMultiset`s, one representing numbers below the current median and one
representing numbers above the current median. This is effectively just a sparse
variation of the classic dense histogramming algorithm that is commonly used
when processing 8-bit grayscale images, but sadly does not scale to
higher-resolution input data (>= 32-bit) due to excessive memory usage and poor
cache locality.
## License
This crate is distributed under the terms of the MPLv2 license. See the LICENSE
file for details.
More relaxed licensing (Apache, MIT, BSD...) may also be negociated, in exchange
of a financial contribution. Contact me for details at knights_of_ni AT gmx
DOTCOM.