pub struct SetU64(_);
Expand description

A set of u64

Implementation

The implementation and size of a SetU64 is an internal detail that is not stable, but may guide your use. It is optimized for size, while maintaining the scaling of a hashmap.

The implementation is designed for the use case of storing indexes into a Vec. This use case tends to involve small integers (they must be less than its len), which could include any number of such integers. They also have a greater than average likelihood of including sequential or close integers, particularly if values are pushed to the Vec while their indexes are added to sets.

Small sets

Very sets of up to seven small numbers are stored on the stack in a single tagged pointer. This is 8 bytes on a 64-bit system, and 4 bytes on a 32-bit system. On a 32-bit system (which I won’t discuss further here, look in the code!) the elemets must be smaller in order to be stored without allocation.

Sets stored in a single word have 3 bits dedicated to the number of elements in the set, with the remaining bits used to represent the value of the smallest element in the set, followed by the differences between subsequent elements in the ordered set. Twice as many bits (rounded up) are dedicated to the fist element as to the differences.

On a 64-bit system, this works out to…

  • We can store a set with a single integer less than about 1018. Thus we should be able to store just about any zero-element or one-element values on the stack.
  • We can store two values on the stack, provided the lesser is less than 1012, and the difference between them is less than a million (106). This starkly highlights the optimization for closely spaced numbers. It is also tweakable in the code, and could be changed.
  • We can store three values on the stack, if the first is less than about 3×107 (30 million), and the following two values have gaps of less than 4096.
  • To hold 7 values in 64 bits, we limit the first value to about 500 thousand, and the following differences must be less than 128. So your seven numbers will seriously need to be either all quite small or quite closely packed.

Larger sets

Larger sets are stored on the heap. We currently have three different heap formats, which will be chosen based on the distribution of your values. The format is only changed when reallocation is required, so it may be challenging to predict the format of a given set, particularly as the reallocation size is randomized in order to mitigate the risk of hash collision attacks. The three formats are:

  1. Internal::Dense The set is stored as a bitmap up to a maximum value. This format is chosen when the number of elements exceeds 1/127 of the maximum value (see the implementation of SetU64::with_capacity_and_max), which means that less than a byte will be used per element.
  2. Internal::Heap The set is stored as a kind of Robin Hood hash map (without hashing!) from most significant bits to bitmaps holding sets of the stored least significant bits.
  3. Internal::Big An ordinary Robin Hood hash set (without hashing!), with a dynamic sentinal value indicating that a bucket is empty. This is used only when the maximum value in the set is very large.
Internal::Heap format

I will describe here some details of the Internal::Heap format, which is likely the most common one, and is definitely the most complicated. The data is stored in an array of u64 buckets. Based on the largest element present, we divide the bits of each of those elements into a key and a bitmap. The array is a Robin Hood hashmap between keys and bitmaps. The key represents the most significant value of the elements stored, and the bitmap stores the set of values which have that same most significant value.

As an example, consider a set with a maximum of 5000. This maximum requires 13 bits to represent, but we don’t need 13 bits to store the key, because that would leave 51 bits for the bitmap, so each bucket would hold 51 possible values, enabling us to store values of up to 51*(1 << 13). So instead we could use just 7 bits to hold the keys, leaving 57 bits per bucket, which would thus enable us to store values up to about 54 thousand. Which bucket size we use will depend on the order in which elements were added, since we only reallocate when needed either because we have an element that we cannot fit, or because our hashmap is too full. When allocating, we tend to leave room for the maximum to increase.

Assuming we use 13 bits per bucket, then let’s talk through the process of inserting the value 137. Since we have 51 elements per bucket, the key is found by dividing by 51, which gives us a key of 137/51 = 2. We then look up the bucket with key 2 using a pretty standard Robin Hood algorithm (except that the keys are a portion of a word). Once we have that bucket, we will identify that the bit corresponding to our value is bit number 137 % 51, which we will set (and also check the value of, to track the number of elements in the set) and determine the return value.

This format allows us to efficiently store sets in which there are contiguous chunks of elements, and when the elements are widely spaced it at least takes no more than 64 bits per 64-bit value, plus hash-set overhead. Its complexity also makes it significantly slower for insertion (or collect()) than a standard HashSet, but it also can take considerably less space.

Implementations

Returns true if the set is empty

Create an empty set with capacity to hold the provided set.

let a: tinyset::SetU64 = (1..300).collect();
let mut b = tinyset::SetU64::with_capacity_of(&a);

assert_eq!(a.capacity(), b.capacity());
assert_eq!(b.len(), 0);
for i in a.iter() {
  b.insert(i);
}
assert_eq!(a.capacity(), b.capacity());
assert_eq!(b.len(), a.len());

The number of elements in the set

The capacity of the set

Print debugging information about this set.

Tally up how much memory is in use.

Create a set with the given capacity

Create a set with the given capacity and bits

An empty set

Insert and return true if it was not present.

Remove

Contais

Iterate over

Clears the set, returning all elements in an iterator.

Trait Implementations

Returns the union of self and rhs as a new SetU64.

Examples
let a: tinyset::SetU64 = (1..4).collect();
let b: tinyset::SetU64 = (3..6).collect();

assert_eq!(&a | &b, (1..6).collect());
The resulting type after applying the | operator.

Returns the union of self and rhs as a new SetU64, consuming self.

Examples
let a: tinyset::SetU64 = (1..4).collect();
let b: tinyset::SetU64 = (3..6).collect();

assert_eq!(a | &b, (1..6).collect());
The resulting type after applying the | operator.
Returns a copy of the value. Read more
Performs copy-assignment from source. Read more
Formats the value using the given formatter. Read more
Returns the “default value” for a type. Read more
Executes the destructor for this type. Read more
Extends a collection with the contents of an iterator. Read more
🔬This is a nightly-only experimental API. (extend_one)
Extends a collection with exactly one element.
🔬This is a nightly-only experimental API. (extend_one)
Reserves capacity in a collection for the given number of additional elements. Read more
Creates a value from an iterator. Read more
The type of the elements being iterated over.
Which kind of iterator are we turning this into?
Creates an iterator from a value. Read more
This method tests for self and other values to be equal, and is used by ==. Read more
This method tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason. Read more

Returns the difference of self and rhs as a new SetU64.

Examples
let a: tinyset::SetU64 = (1..4).collect();
let b: tinyset::SetU64 = (3..6).into_iter().collect();

assert_eq!(&a - &b, (1..3).collect());
The resulting type after applying the - operator.

Returns the difference of self and rhs as a new SetU64 consuming self. Should not allocate.

Examples
let a: tinyset::SetU64 = (1..4).collect();
let b: tinyset::SetU64 = (3..6).into_iter().collect();

assert_eq!(a - &b, (1..3).collect());
The resulting type after applying the - operator.

Auto Trait Implementations

Blanket Implementations

Gets the TypeId of self. Read more
Immutably borrows from an owned value. Read more
Mutably borrows from an owned value. Read more

Returns the argument unchanged.

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

The resulting type after obtaining ownership.
Creates owned data from borrowed data, usually by cloning. Read more
Uses borrowed data to replace owned data, usually by cloning. Read more
The type returned in the event of a conversion error.
Performs the conversion.
The type returned in the event of a conversion error.
Performs the conversion.