Struct tinyset::setu64::SetU64

pub struct SetU64(_);

A set of u64

Implementation

The implementation and size of a SetU64 are internal details that are not stable, but may guide your use. It is optimized for size, while maintaining the scaling of a hashmap.

The implementation is designed for the use case of storing indexes into a Vec. This use case tends to involve small integers (each must be less than the Vec's len), although a set may hold any number of them. They also have a greater than average likelihood of including sequential or close integers, particularly if values are pushed to the Vec while their indexes are added to sets.

Small sets

Very small sets, of up to seven small numbers, are stored on the stack in a single tagged pointer. This is 8 bytes on a 64-bit system, and 4 bytes on a 32-bit system. On a 32-bit system (which I won’t discuss further here, look in the code!) the elements must be smaller in order to be stored without allocation.

Sets stored in a single word have 3 bits dedicated to the number of elements in the set, with the remaining bits used to represent the value of the smallest element in the set, followed by the differences between subsequent elements in the ordered set. Twice as many bits (rounded up) are dedicated to the first element as to each of the differences.

On a 64-bit system, this works out to…

We can store a set with a single integer less than about 10¹⁸. Thus we should be able to store just about any zero-element or one-element set on the stack.
We can store two values on the stack, provided the lesser is less than 10¹², and the difference between them is less than a million (10⁶). This starkly highlights the optimization for closely spaced numbers. The split is tweakable in the code.
We can store three values on the stack, if the first is less than about 3×10⁷ (30 million), and the following two values have gaps of less than 4096.
…
To hold 7 values in 64 bits, we limit the first value to about 500 thousand, and the following differences must be less than 128. So your seven numbers will seriously need to be either all quite small or quite closely packed.
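The single-word layout described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the crate's code: `bit_split`, `pack`, and `unpack` are hypothetical names, and the rule used to divide the 61 payload bits is a guess that happens to reproduce the two-element and seven-element limits quoted above. The real implementation may divide the bits differently for other sizes.

```rust
/// Hypothetical split of the 61 payload bits for a set of `n` elements (1..=7).
/// Returns (bits for the first element, bits for each difference).
fn bit_split(n: u32) -> (u32, u32) {
    let payload = 64 - 3; // 3 bits are reserved for the element count
    let diff_bits = payload / (n + 1); // the first element counts as two shares
    let first_bits = payload - diff_bits * (n - 1);
    (first_bits, diff_bits)
}

/// Pack a strictly increasing slice into one word, or return None if it does not fit.
fn pack(sorted: &[u64]) -> Option<u64> {
    let n = sorted.len() as u32;
    if n == 0 {
        return Some(0); // the empty set: count field is zero
    }
    if n > 7 {
        return None; // the 3-bit count cannot describe more than 7 elements
    }
    let (first_bits, diff_bits) = bit_split(n);
    if sorted[0] >> first_bits != 0 {
        return None; // smallest element is too large
    }
    let mut word = n as u64;
    let mut shift = 3;
    word |= sorted[0] << shift;
    shift += first_bits;
    for pair in sorted.windows(2) {
        // Reject unsorted or duplicate input, and gaps that are too wide.
        let d = pair[1].checked_sub(pair[0]).filter(|&d| d > 0)?;
        if d >> diff_bits != 0 {
            return None;
        }
        word |= d << shift;
        shift += diff_bits;
    }
    Some(word)
}

/// Recover the elements from a word produced by `pack`.
fn unpack(word: u64) -> Vec<u64> {
    let n = (word & 0b111) as u32;
    if n == 0 {
        return Vec::new();
    }
    let (first_bits, diff_bits) = bit_split(n);
    let mask = |bits: u32| (1u64 << bits) - 1;
    let mut shift = 3;
    let mut value = (word >> shift) & mask(first_bits);
    shift += first_bits;
    let mut out = vec![value];
    for _ in 1..n {
        value += (word >> shift) & mask(diff_bits);
        shift += diff_bits;
        out.push(value);
    }
    out
}
```

Under this assumed split, `bit_split(2)` gives 41 and 20 bits (about 10¹² and 10⁶), and `bit_split(7)` gives 19 and 7 bits (about 500 thousand and 128).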

Larger sets

Larger sets are stored on the heap. We currently have three different heap formats, which will be chosen based on the distribution of your values. The format is only changed when reallocation is required, so it may be challenging to predict the format of a given set, particularly as the reallocation size is randomized in order to mitigate the risk of hash collision attacks. The three formats are:

Internal::Dense The set is stored as a bitmap up to a maximum value. This format is chosen when the number of elements exceeds 1/127 of the maximum value (see the implementation of SetU64::with_capacity_and_max), which means that less than a byte will be used per element.
Internal::Heap The set is stored as a kind of Robin Hood hash map (without hashing!) from most significant bits to bitmaps holding sets of the stored least significant bits.
Internal::Big An ordinary Robin Hood hash set (without hashing!), with a dynamic sentinel value indicating that a bucket is empty. This is used only when the maximum value in the set is very large.
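To make the bitmap idea behind the dense format concrete, here is a minimal sketch. It is not the crate's code: `DenseSet` and `prefers_dense` are hypothetical names, the storage is a plain `Vec<u64>`, and the density check simply restates the 1/127 rule quoted above as an assumption about how it might be written.

```rust
/// Hypothetical restatement of the rule above: prefer a bitmap when the
/// number of elements exceeds 1/127 of the maximum value.
fn prefers_dense(len: usize, max: u64) -> bool {
    (len as u64) > max / 127
}

/// A plain bitmap set: bit `e % 64` of word `e / 64` records whether `e` is present.
struct DenseSet {
    words: Vec<u64>,
    len: usize,
}

impl DenseSet {
    /// Allocate enough words to hold every value up to and including `max`.
    fn with_max(max: u64) -> Self {
        DenseSet { words: vec![0; (max / 64 + 1) as usize], len: 0 }
    }

    /// Insert `e`, growing the bitmap if needed. Returns true if `e` was not yet present.
    fn insert(&mut self, e: u64) -> bool {
        let word = (e / 64) as usize;
        if word >= self.words.len() {
            self.words.resize(word + 1, 0);
        }
        let mask = 1u64 << (e % 64);
        let fresh = self.words[word] & mask == 0;
        self.words[word] |= mask;
        if fresh {
            self.len += 1;
        }
        fresh
    }

    /// Report whether `e` is in the set.
    fn contains(&self, e: u64) -> bool {
        self.words
            .get((e / 64) as usize)
            .map_or(false, |w| w & (1u64 << (e % 64)) != 0)
    }
}
```

A bitmap like this costs one bit per possible value regardless of how many are present, which is why it only pays off for densely populated sets.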

`Internal::Heap` format

I will describe here some details of the Internal::Heap format, which is likely the most common one, and is definitely the most complicated. The data is stored in an array of u64 buckets. Based on the largest element present, we divide the bits of each of those elements into a key and a bitmap. The array is a Robin Hood hashmap between keys and bitmaps. The key represents the most significant value of the elements stored, and the bitmap stores the set of values which have that same most significant value.

As an example, consider a set with a maximum of 5000. This maximum requires 13 bits to represent, but we don’t need 13 bits to store the key: that would leave 51 bits for the bitmap, so each bucket would hold 51 possible values, enabling us to store values of up to 51*(1 << 13), or about 418 thousand. So instead we could use just 7 bits to hold the keys, leaving 57 bits per bucket, which would enable us to store values up to 57*(1 << 7), or about 7 thousand. Which bucket size we use will depend on the order in which elements were added, since we only reallocate when needed, either because we have an element that we cannot fit or because our hashmap is too full. When allocating, we tend to leave room for the maximum to increase.

Assuming we use 13 bits for the key, let’s talk through the process of inserting the value 137. Since we have 51 elements per bucket, the key is found by integer division by 51, which gives us a key of 137/51 = 2. We then look up the bucket with key 2 using a pretty standard Robin Hood algorithm (except that the keys are a portion of a word). Once we have that bucket, the bit corresponding to our value is bit number 137 % 51 = 35. We check whether that bit was already set, which determines the return value and lets us track the number of elements in the set, and then we set it.
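The key and bit arithmetic from this walk-through can be sketched as follows. This is a simplified illustration, not the crate's implementation: `split`, `capacity`, and `ChunkedSet` are hypothetical names, and a standard `HashMap` from key to bitmap stands in for the Robin Hood table, which in the real format packs the key and the bitmap into a single u64 bucket.

```rust
use std::collections::HashMap;

/// Split a value into (key, bit index) when `key_bits` of each 64-bit
/// bucket hold the key and the remaining bits form the bitmap.
fn split(value: u64, key_bits: u32) -> (u64, u64) {
    let per_bucket = (64 - key_bits) as u64;
    (value / per_bucket, value % per_bucket)
}

/// Number of distinct values addressable with this key width.
fn capacity(key_bits: u32) -> u64 {
    (64 - key_bits) as u64 * (1u64 << key_bits)
}

/// Simplified stand-in: a map from key to bitmap instead of packed buckets.
struct ChunkedSet {
    key_bits: u32,
    buckets: HashMap<u64, u64>,
    len: usize,
}

impl ChunkedSet {
    fn new(key_bits: u32) -> Self {
        ChunkedSet { key_bits, buckets: HashMap::new(), len: 0 }
    }

    /// Insert `value`; returns true if it was not already present.
    fn insert(&mut self, value: u64) -> bool {
        let (key, bit) = split(value, self.key_bits);
        // A real implementation would reallocate with a wider key here.
        assert!(key < (1u64 << self.key_bits), "value does not fit this key width");
        let bitmap = self.buckets.entry(key).or_insert(0);
        let mask = 1u64 << bit;
        let fresh = *bitmap & mask == 0; // check the bit before setting it
        *bitmap |= mask;
        if fresh {
            self.len += 1;
        }
        fresh
    }

    /// Report whether `value` is in the set.
    fn contains(&self, value: u64) -> bool {
        let (key, bit) = split(value, self.key_bits);
        self.buckets.get(&key).map_or(false, |b| b & (1u64 << bit) != 0)
    }
}
```

With a 13-bit key, `split(137, 13)` yields key 2 and bit 35, and inserting 137 and then 138 touches the same bucket, which is what makes runs of nearby values cheap to store.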

This format allows us to efficiently store sets in which there are contiguous chunks of elements, and when the elements are widely spaced it at least takes no more than 64 bits per 64-bit value, plus hash-set overhead. Its complexity also makes it significantly slower for insertion (or collect()) than a standard HashSet, but it also can take considerably less space.

Implementations

impl SetU64

pub fn is_empty(&self) -> bool

impl SetU64

pub fn with_capacity_of(other: &Self) -> Self

impl SetU64

pub fn len(&self) -> usize

pub fn capacity(&self) -> usize

pub fn debug_me(&self, msg: &str)

pub fn mem_used(&self) -> usize

pub fn with_capacity_and_max(cap: usize, mx: u64) -> SetU64

pub fn with_capacity_and_bits(cap: usize, bits: u64) -> SetU64

pub const fn new() -> Self

pub fn insert(&mut self, e: u64) -> bool

pub fn remove(&mut self, e: u64) -> bool

pub fn contains(&self, e: u64) -> bool

pub fn iter<'a>(&'a self) -> impl Iterator<Item = u64> + 'a + Debug

pub fn drain<'a>(&'a mut self) -> impl Iterator<Item = u64> + 'a

Trait Implementations

impl<'a, 'b> BitOr<&'b SetU64> for &'a SetU64

fn bitor(self, rhs: &SetU64) -> SetU64

type Output = SetU64

impl<'b> BitOr<&'b SetU64> for SetU64

fn bitor(self, rhs: &SetU64) -> SetU64

type Output = SetU64

impl Clone for SetU64

fn clone(&self) -> Self

fn clone_from(&mut self, source: &Self)

impl Debug for SetU64

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

impl Default for SetU64

fn default() -> Self

impl Drop for SetU64

fn drop(&mut self)

impl Extend<u64> for SetU64

fn extend<T: IntoIterator<Item = u64>>(&mut self, iter: T)

fn extend_one(&mut self, item: u64)

fn extend_reserve(&mut self, additional: usize)

impl FromIterator<u64> for SetU64

fn from_iter<T>(iter: T) -> Self where T: IntoIterator<Item = u64>,

impl IntoIterator for SetU64

type Item = u64

type IntoIter = IntoIter

fn into_iter(self) -> IntoIter

impl PartialEq<SetU64> for SetU64

fn eq(&self, other: &Self) -> bool

fn ne(&self, other: &SetU64) -> bool

impl<'a, 'b> Sub<&'b SetU64> for &'a SetU64

fn sub(self, rhs: &SetU64) -> SetU64

type Output = SetU64

impl<'b> Sub<&'b SetU64> for SetU64

fn sub(self, rhs: &SetU64) -> SetU64

type Output = SetU64

impl Eq for SetU64

impl Send for SetU64

impl Sync for SetU64

Auto Trait Implementations

impl RefUnwindSafe for SetU64

impl Unpin for SetU64

impl UnwindSafe for SetU64

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

impl<V, T> VZip<V> for T where V: MultiLane<T>,