Expand description
An ordered multiset implementation for primitive number types based on sparse histograms.
This crate implements a kind of multiset, which is a generalization of the notion of mathematical set where multiple elements that are equal to each other can be present simulatneously.
Our multiset implementation differs from popular multiset implementations of crates.io at the time of its release in the following respects:
- It is ordered, which means that order-based queries such as getting a
sorted list of elements or finding the minimum and maximum elements are
relatively cheap. For example, the complexity of a min/max query grows
logarithmically with the number of distinct values in the multiset.
- The price to pay for this ordering is that classic set operations like inserting/removing elements or querying whether an element is present will scale less well to larger datasets than in the more common hash-based multiset implementations: element-wise operations also have logarithmic complexity, whereas a hash-based multiset can instead achieve a constant-time operation complexity that’s independent of the collection size.
- It is specialized for primitive number types and
repr(transparent)wrappers thereof, which means that it can leverage the property of these numbers to improve ergonomics and compute/memory efficiency:- Since all primitive number types are
Copy, we do not need to bother with references andBorrowtrait complexity like general-purpose map and set implementations do, and can instead provide a simpler value-based API. - Since all primitive number types have well-behaved
Eqimplementations where numbers that compare equal are identical, we do not need to track lists of equal entries like many multiset implementations do, and can instead use a more efficient sparse histogramming approach where we simply count the number of equal entries.
- Since all primitive number types are
One example application of this multiset implementation would be median filtering of streaming numerical data whose bit width is too large for the classic dense histogram approach to be applicable, like floats and integers of width >= 32 bits.
To use this crate with floating-point data, you will need to use one of the
available Ord float wrapper that assert absence of NaNs, such as the
NotNan
type from the ordered_float crate. We do
not handle this concern for you because checking for NaN has a cost and we
believe this cost is best paid once on your side and hopefully amortized
across many reuses of the resulting wrapper, rather than repeatedly paid
every time an element is inserted into a NumericalMultiset.
Structs§
- Into
Iter - An owning iterator over the entries of an
NumericalMultiset, sorted by value. - Iter
- An iterator over the entries of an
NumericalMultiset, sorted by value. - Numerical
Multiset - An ordered multiset implementation for primitive number types based on sparse histograms.