§Generate efficient bitsets out of your enum types.
The generated bitset type is much like the standard `HashSet` or `BTreeSet` types in that it contains a set of non-repeating values of the base enum. But being implemented as a bitset, the memory usage is typically much lower than with the standard types, and all set operations (other than iteration) run in constant time.
A bitset is a data structure that stores a sequence of bits, where each bit represents the presence or absence of one of the possible values of the base enum type. For example, for a four-variant enum backed by a byte, the value `0b0101` encodes the set containing the first and third variants.
The main intended use case of the `EnumBitset` macro is an enum that represents the possible states (in the general meaning of the word) of some entity. It is common that, in some parts of your program, you need to specify a set of those states. For example:
- Filters:
  - The base enum represents the possible states of an entity. You want to present a list of entities to the user, allowing them to filter by one or more of those states.
  - Event subscriptions: what event types does every listener care about?
- Grouping / Hierarchy:
  - A large state machine is divided into phases, where every phase is a "sub-state-machine". You might have an enum `State` representing all the states of the state machine, and another enum `Phase` representing the phases. You can apply `EnumBitset` to the `State` enum to automatically create a `StateSet` type, and then store the states that belong to every phase in a `HashMap<Phase, StateSet>` (see the sketch after this list).
  - In a graph, the set of nodes that can be reached from a given `Node` can be represented by a `NodeSet`.
- Non-exclusive values:
  - You have an enum `Mode` representing possible modes of operation of some part of some entity, but those modes are not mutually exclusive. You can store the currently active modes in the `ModeSet` type.
  - In a permission system, permissions might be represented by a `Permission` enum. The set of permissions that a given user has can then be efficiently represented by a `PermissionSet`.
  - What `Resources` are needed for a concrete task? You guessed it: a `ResourceSet`!
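For instance, the grouping use case might look like the following. This is a minimal sketch: the `State` and `Phase` enums are hypothetical, and `StateSet` is the type that `EnumBitset` would generate for `State`.

```rust
use std::collections::HashMap;
use enum_bitset::EnumBitset;

#[derive(EnumBitset, Clone, Copy, Debug)]
enum State { Init, Loading, Ready, Saving, Done }

#[derive(PartialEq, Eq, Hash, Debug)]
enum Phase { Startup, Running, Shutdown }

// Map every phase to the set of states that belong to it.
let phases: HashMap<Phase, StateSet> = HashMap::from([
    (Phase::Startup, State::Init | State::Loading),
    (Phase::Running, State::Ready | State::Saving),
    (Phase::Shutdown, StateSet::from([State::Done])),
]);
assert!(phases[&Phase::Startup].contains(State::Loading));
```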
§Basic usage
To use it, add the `enum-bitset` crate as a dependency:

```sh
cargo add enum-bitset
```

or manually edit your `Cargo.toml` file:

```toml
[dependencies]
enum-bitset = "0.1.2"
```

Then add the `EnumBitset` derive macro on your enum:
```rust
use enum_bitset::EnumBitset;

#[derive(EnumBitset, Copy, Clone)]
enum IpAddrKind {
    V4,
    V6,
}

let mut set = IpAddrKindSet::empty();
assert!(!set.contains(IpAddrKind::V4));
assert_eq!(set.len(), 0);

set.insert(IpAddrKind::V6);
assert!(!set.contains(IpAddrKind::V4));
assert!(set.contains(IpAddrKind::V6));

let set2 = IpAddrKind::V4 | IpAddrKind::V6;
assert!(set2.contains(IpAddrKind::V4));
assert!(set2.contains(IpAddrKind::V6));
assert!(!set2.is_empty());
assert!(set2.is_all());
assert_eq!(set2, IpAddrKindSet::all());

let set3 = set2 - IpAddrKind::V4;
assert!(set3.contains(IpAddrKind::V6));
assert!(!set3.contains(IpAddrKind::V4));
assert!(!set3.is_empty());
assert!(!set3.is_all());
assert_eq!(set3, IpAddrKind::V6.into());
```
You can check the generated type in the example module. For more examples, see the `tests` directory on GitHub.
§Configuration
The generated type can be configured using the `#[bitset]` attribute on the enum. As usual, the attribute must be placed after the `derive` attribute and before the enum declaration.
The argument of the `bitset` attribute is a comma-separated list of key-value pairs. The following keys are supported:
§name
By default, the name of the generated type is the name of the base enum with the `Set` suffix. You can customize the generated name using the `name` argument:
```rust
use enum_bitset::EnumBitset;

#[derive(EnumBitset, Debug, Clone, Copy)]
#[bitset(name = Excuses)]
enum WhatToSayToBoss {
    JustFiveMinutes,
    WaitingForInspiration,
    MyWifiIsSlow,
    ItsAlmostLunchtime,
    StartingMonday,
}

let say = Excuses::all();
assert_eq!(say.len(), 5);
```
§repr
The types generated by this crate store the bits in a primitive unsigned integer. By default, the macro chooses the smallest primitive unsigned integer type that can hold one bit per variant of the base enum. For example, if the base enum has 10 variants, then the bitset will store a `u16` by default.
```rust
use std::alloc::Layout;
use enum_bitset::EnumBitset;

#[derive(EnumBitset, Debug, Copy, Clone)]
enum DeveloperSpecies {
    StackOverflowSage,
    TabsVsSpacesCrusader,
    LegacyCodeWhisperer,
    CoffeeCompiler,
    GitRebaser,
    RegexNinja,
    DocBlockPoet,
    DeadlineOptimist,
    BinaryWhisperer,
    TuxZealot,
}

assert_eq!(Layout::new::<DeveloperSpeciesSet>(), Layout::new::<u16>());
assert_eq!(DeveloperSpeciesSet::from_repr(1u16).unwrap(),
           DeveloperSpeciesSet::from([DeveloperSpecies::StackOverflowSage]));
```
You can override the default choice by specifying the `repr = <type>` argument of the `bitset` attribute. Possible values are `u8`, `u16`, `u32`, `u64`, and `u128`. The provided underlying type must be at least as wide (in bits) as the number of variants of the base enum.
For example, given the following 3-variant enum, you can force the derived type to use a `u32` instead of a `u8`:
```rust
use std::alloc::Layout;
use enum_bitset::EnumBitset;

#[derive(EnumBitset, Debug, Copy, Clone)]
#[bitset(repr = u32)]
enum CodeComment {
    TodoSinceFirstCommit,
    BlackMagic,
    LateNightRambling,
}

assert_ne!(Layout::new::<CodeCommentSet>(), Layout::new::<u8>());
assert_eq!(Layout::new::<CodeCommentSet>(), Layout::new::<u32>());
assert_eq!(CodeCommentSet::from_repr(2u32).unwrap(),
           CodeCommentSet::from([CodeComment::BlackMagic]));
```
§no_debug
If `no_debug` is specified, the generated type will not implement the `Debug` trait.
```rust
use enum_bitset::EnumBitset;

#[derive(EnumBitset, Clone)]
#[bitset(no_debug)]
enum ErrorMessage {
    UserDidSomethingWeird,
    HaveYouTriedLoggingIn,
    ItWorksOnMyMachine,
    WhoTouchedTheProductionDb,
    CoffeeSpilledOnServer,
    InternBrokeTheDeployment,
}

let set = ErrorMessageSet::from([ErrorMessage::HaveYouTriedLoggingIn,
                                 ErrorMessage::ItWorksOnMyMachine]);
let debug = format!("{:?}", set); // error! `ErrorMessageSet` doesn't implement `Debug`
```
By default, the generated type does implement `Debug`, regardless of whether the base enum does.
If the base enum does not implement `Debug` and `no_debug` is not set, the generated type's `Debug` implementation only shows the number of items in the set:
```rust
use enum_bitset_derive::EnumBitset;

#[derive(EnumBitset, Clone)]
enum ErrorMessage {
    UserDidSomethingWeird,
    HaveYouTriedLoggingIn,
    ItWorksOnMyMachine,
    WhoTouchedTheProductionDb,
    CoffeeSpilledOnServer,
    InternBrokeTheDeployment,
}

let set = ErrorMessageSet::from([ErrorMessage::HaveYouTriedLoggingIn,
                                 ErrorMessage::ItWorksOnMyMachine]);
let debug = format!("{:?}", set);
assert_eq!(debug, "ErrorMessageSet(2){/* 2 items */}");
```
§serde
If the `serde` feature is enabled, the generated type will be compatible with the ubiquitous `serde` crate: it will implement the `serde::Serialize` and `serde::Deserialize` traits. The only requirement is that the base enum must implement `serde::Serialize` and `serde::Deserialize`. The set will (de)serialize as an array of the base enum's variants.
If the `serde` feature is disabled, the macro silently ignores the `serde` parameter, so you don't need to use any `cfg_attr`.
```rust
use enum_bitset::EnumBitset;
use serde::{Serialize, Deserialize};

#[derive(EnumBitset, Serialize, Deserialize, Clone)]
enum ProgrammerOops {
    CoffeeInKeyboard,
    ForgotSemicolon,
    CompilerIsWrong,
    WorksOnMyMachine,
    FridayDeploy,
    InfiniteLoop,
    DenialOfSleep,
}

let set = ProgrammerOopsSet::from_slice(&[ProgrammerOops::CoffeeInKeyboard,
                                          ProgrammerOops::InfiniteLoop]);
let serialized = serde_json::to_string(&set).unwrap();
assert_eq!(serialized, "[\"CoffeeInKeyboard\",\"InfiniteLoop\"]");

let deserialized: ProgrammerOopsSet = serde_json::from_str(&serialized).unwrap();
assert_eq!(deserialized, set);
```
If compatibility with `serde` is not desirable for a given enum, or if you only want to implement one of the `Serialize` or `Deserialize` traits, the macro can be configured with the `serde` argument of the `bitset` attribute.
Possible values are:
- `serde = true` (default): the generated type will implement `serde::Serialize` and `serde::Deserialize`.
- `serde = false`: the generated type will implement neither `serde::Serialize` nor `serde::Deserialize`.
- `serde = "ser"` or `serde = "serialize"`: the generated type will implement `serde::Serialize` but not `serde::Deserialize`.
- `serde = "de"` or `serde = "deserialize"`: the generated type will implement `serde::Deserialize` but not `serde::Serialize`.
- `serde = "both"`: equivalent to `serde = true`.
- `serde = "none"`: equivalent to `serde = false`.
```rust
use enum_bitset::EnumBitset;
use serde::Serialize;

#[derive(Serialize, EnumBitset, Debug, Clone, Copy)]
#[bitset(serde = "ser")]
enum DogMoment {
    TailChasing,
    BellyRubRequest,
    TiltedHead,
    BarkAtNothing,
    DoorGuardian,
    PuppyEyes,
}

let set: DogMomentSet = DogMoment::TiltedHead.into();
let serialized = serde_json::to_string(&set).unwrap();
assert_eq!(serialized, "[\"TiltedHead\"]");

// The following would not compile! `Deserialize` is not implemented:
// let deserialized: DogMomentSet = serde_json::from_str(&serialized).unwrap();
// assert_eq!(deserialized, set);
```
Note: if, for some reason, the `serde` crate is renamed, you can specify its path using `#[bitset(serde_crate = serde_crate_path)]`.
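A hedged sketch of that scenario, assuming `serde` was renamed to `my_serde` in `Cargo.toml` (via `my_serde = { package = "serde", ... }`); the `Verdict` enum is made up:

```rust
use enum_bitset::EnumBitset;
use my_serde::{Serialize, Deserialize}; // `serde` renamed in Cargo.toml

#[derive(EnumBitset, Serialize, Deserialize, Clone)]
#[serde(crate = "my_serde")]      // tell serde's own derive about the rename
#[bitset(serde_crate = my_serde)] // tell EnumBitset about it too
enum Verdict {
    Guilty,
    NotGuilty,
}
```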
§no_base_ops
By default, the generated set type implements the core `Add` and `BitOr` traits for the base enum. This allows you to create a set by just "adding" (or "or-ing") values of the base enum:
```rust
use enum_bitset::EnumBitset;

#[derive(EnumBitset, Clone, Copy, Debug)]
enum IceCreamFail {
    FellOnShirt,
    MeltedBeforeFirstLick,
    SeagullStoleIt,
}

let set1: IceCreamFailSet = IceCreamFail::FellOnShirt + IceCreamFail::MeltedBeforeFirstLick;
assert_eq!(set1.len(), 2);
assert!(!set1.is_all());

let set2 = set1 + IceCreamFail::SeagullStoleIt;
assert_eq!(set2.len(), 3);
assert!(set2.is_all());
```
If you want to disable this behavior, you can specify the `no_base_ops` argument of the `bitset` attribute. This can be useful, for instance, if you want to implement those traits on the base enum with different semantics, probably unrelated to the bitset.
Note that the operator overloads will still be available on the generated type, just not on the base enum. That is, you can still "add"/"or" elements to an existing set using the `+`/`|` operators (in addition to using the mutating methods):
```rust
use enum_bitset::EnumBitset;

#[derive(EnumBitset, Clone, Copy, Debug)]
#[bitset(no_base_ops)]
enum PotatoStatus {
    BurntToACrisp,
    StillRawInside,
    PerfectlyGolden,
}

// Error! `+` operator not implemented for `PotatoStatus`:
// let set1: PotatoStatusSet = PotatoStatus::BurntToACrisp + PotatoStatus::StillRawInside;

let set2 = PotatoStatusSet::from([PotatoStatus::BurntToACrisp]);

// But you can still use `+` and `|` on the set type:
let set3 = set2 + PotatoStatus::StillRawInside;
let set4 = set3 | PotatoStatus::PerfectlyGolden;
assert_eq!(set3.len(), 2);
assert!(!set3.is_all());
assert_eq!(set4.len(), 3);
assert!(set4.is_all());
```
§crate
If you rename the `enum-bitset` crate in your `Cargo.toml`, the code generated by the macro will not be able to call some utility code contained in the crate. To work around this, you can specify the `crate` argument:
```toml
[dependencies]
my-enum-bitset = { version = "0.1.2", package = "enum-bitset" }
```

```rust
use my_enum_bitset::EnumBitset;

#[derive(EnumBitset, Clone)]
#[bitset(crate = my_enum_bitset)]
enum ProgrammerMood {
    IAmAGenius,
    IAmAnIdiot,
}

let set = ProgrammerMoodSet::from_array([ProgrammerMood::IAmAnIdiot]);
assert!(!set.is_empty());
```
§Cargo feature
This crate has the following optional feature:
- `serde`: Enables support for the `serde` crate. Check the serde section for more details.
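For example, to enable the feature in your `Cargo.toml`:

```toml
[dependencies]
enum-bitset = { version = "0.1.2", features = ["serde"] }
```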
§Technical details
§Guarantees
The bitset types generated by this crate have the following guarantees:
- The layout and ABI of the bitset are guaranteed to be exactly the same as those of the underlying integer type used to store the data. That is, a bitset generated for an enum of 6 variants and backed by a `u8` will have the same layout and ABI as the primitive `u8` type.
- The bitset type is guaranteed to be `Send` and `Sync`, regardless of whether the base enum is.
- The underlying bit pattern for a concrete value will be stable across different compilations, as long as the order of the variants remains the same. That is, you can efficiently persist or transmit the bitset as the underlying integer type, provided that the order of the variants in the base enum is stable. More concretely, the first variant in the base enum is represented by the least-significant bit position, the second variant by the second least-significant bit, and so on.
- The bitset type will have the same size as the underlying integer type, provided that the order of the variants in the base enum does not change.
- If the underlying integer type has more bits than those required to represent the number of variants in the base enum, the extra bits will be zero for any bitset created via the safe methods. If you manipulate the underlying value directly via unsafe code, you must maintain this invariant: breaking it is UB.
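As an illustration of the bit-position guarantee, here is a small sketch with a hypothetical 3-variant `Letter` enum:

```rust
use enum_bitset::EnumBitset;

#[derive(EnumBitset, Clone, Copy, Debug)]
enum Letter { A, B, C }

// First variant ↔ least-significant bit, second variant ↔ next bit, ...
assert_eq!(LetterSet::from_repr(0b001u8).unwrap(), LetterSet::from([Letter::A]));
assert_eq!(LetterSet::from_repr(0b110u8).unwrap(),
           LetterSet::from([Letter::B, Letter::C]));
```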
Any change to any of those guarantees may only be introduced in a new major version of the crate. However, most of these guarantees hold by design, so a radical departure is unlikely: they might evolve, but not change in a fundamental way.
§Invariant
The generated bitset types maintain the following invariant:
Any bits of the underlying integer type that are not mapped to a variant of the base enum must always be zero.
This invariant is enforced by all the (safe) methods generated by the `EnumBitset` macro.
However, if you manipulate the underlying integer directly (via unsafe code, transmutation, or the unsafe `from_repr_unchecked`), then it is your responsibility to uphold the invariant.
Breaking this invariant is considered Undefined Behavior and might lead to unexpected results, including panics.
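For instance, reusing the 3-variant `CodeComment` enum from the repr section (backed by a `u32`, so bits 3 and above are unmapped), and assuming the checked `from_repr` constructor returns an `Option` that rejects invariant-breaking values:

```rust
// Bit 3 does not correspond to any variant, so the checked constructor
// should refuse the value.
assert!(CodeCommentSet::from_repr(0b1000u32).is_none());
// Bits 0 and 2 are mapped, so this value is accepted.
assert!(CodeCommentSet::from_repr(0b0101u32).is_some());
```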
§Design choices
Besides the signatures of the methods, I've found that there are basically three important choices when designing a bitset type:
- How to represent the bitset under the hood.
- How to implement the core set operations (insertion, removal, union, intersection, difference, symmetric difference, iteration, etc.).
- How to convert from the base enum values to the positions of the bits in the underlying value.
On the first matter, this crate represents the bitset as a single primitive unsigned integer, choosing by default the smallest possible type, although the user may choose a larger one. Because of this, the `EnumBitset` macro only works with enums of up to 128 variants. This should not be a concern in most use cases.
For the core set operations, this crate implements them using bitwise operations. This choice should be pretty non-controversial and standard: bitwise operations make all methods O(1), and they are usually the fastest implementation. Additionally, thanks to the invariant upheld by the generated types, masking the values is rarely needed, saving some cycles in most methods.
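To make this concrete, here is a rough sketch (not the actual generated code) of how the core operations reduce to single bitwise instructions on the backing integer:

```rust
// Illustrative stand-in for a generated set backed by a `u8`.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct DemoSet(u8);

impl DemoSet {
    fn union(self, other: Self) -> Self { Self(self.0 | other.0) }
    fn intersection(self, other: Self) -> Self { Self(self.0 & other.0) }
    fn difference(self, other: Self) -> Self { Self(self.0 & !other.0) }
    fn len(self) -> u32 { self.0.count_ones() }
}

assert_eq!(DemoSet(0b011).union(DemoSet(0b110)), DemoSet(0b111));
assert_eq!(DemoSet(0b011).intersection(DemoSet(0b110)).len(), 1);
```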
Iteration is implemented using the `leading_zeros()` method of the underlying integer type. This is a fast operation that maps to a single CPU instruction in most instruction sets, so each iterator step is usually O(1). Therefore, iterating a set is O(N), where N is the number of items in the set.
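A sketch of the bit-scanning idea (the actual generated iterator may differ in details such as scan direction):

```rust
// Pop the most-significant set bit and return its position.
fn next_bit(bits: &mut u8) -> Option<u32> {
    if *bits == 0 {
        return None;
    }
    let pos = 7 - bits.leading_zeros(); // position of the highest set bit
    *bits &= !(1u8 << pos);             // clear it for the next call
    Some(pos)
}

let mut bits = 0b0000_0101u8;
assert_eq!(next_bit(&mut bits), Some(2));
assert_eq!(next_bit(&mut bits), Some(0));
assert_eq!(next_bit(&mut bits), None);
```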
Finally, the conversion from the base enum values to the position of the respective bit in the underlying integer can be done in a number of ways (the first two are sketched after this list):
-
By forcing a representation of the base enum and then coercing the value to that representation: for example, `#[repr(u8)]` and then `let pos = enum_value as u8`. This strategy is probably the most straightforward, but it forces the user to specify a representation for the base enum. If the user needs to specify the discriminants of the variants, they are restricted to values in the `0..UnderlyingType::BITS` range. The bit pattern associated with the n-th variant can then be computed with a shift, `1 << n`. This method has some flexibility, since it allows the user to specify which bit represents a given variant. But in general it's rather inflexible: it ties the representation of the base enum to how the bitset is represented. Note that, if we limit ourselves to derive macros, which cannot modify the original enum, the discriminant of every variant must be chosen by the user (or inferred using the default Rust rules if not specified). A more powerful attribute macro could add such specifications automatically, at the expense of being slightly less friendly to IDEs.
-
By forcing a representation type of the base enum and then constraining the value of every variant to be a power of two. That is, the first variant is represented by `2^0 = 1` (or, equivalently, a shift, `1 << 0`), the second by `2^1 = 1 << 1 = 2`, and so on. With this method, every variant is represented by the bit pattern that corresponds to the set containing only that variant. This makes the insertion, removal, and contains operations very fast, since we have direct access to the bit pattern of the value. This method has almost the same flexibility considerations as the previous one. It also ties the representation of the base enum to the bitset, so the user cannot choose the discriminant values for other purposes. However, it is more performant, since the bit pattern is already available in the representation of the enum, so no shift needs to be performed in every operation to compute it.
-
Using a `match` statement on the values of the base enum to retrieve the bitmask corresponding to the variant. This method might be slightly less performant than the previous two. It is common knowledge that `match` statements on enums whose variants have contiguous values can be optimized by the compiler into a jump table, at least when compiling for production; and there seems to be a fix in LLVM to produce even faster code. The result of the `match` is directly the bit pattern of the variant, so the `match` is the only operation that needs to be performed. While the `match` can be less performant than the simple integer cast needed by the previous strategies, it still achieves good performance, especially with optimizations. Remarkably, this method frees the user from having to tie the representation of the base enum to the bitset. This comes in handy if the bitset is not the primary use of the base enum.
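Minimal sketches of the first two strategies, for comparison (these are not what this crate generates; the enums are made up):

```rust
// Strategy 1: force a repr, then compute the mask with a shift.
#[repr(u8)]
#[derive(Clone, Copy)]
enum CastBased { A, B, C } // discriminants 0, 1, 2

fn mask_by_shift(v: CastBased) -> u8 {
    1u8 << (v as u8)
}

// Strategy 2: every discriminant is already a power of two; the value is the mask.
#[repr(u8)]
#[derive(Clone, Copy)]
enum PowerOfTwo { A = 1 << 0, B = 1 << 1, C = 1 << 2 }

fn mask_direct(v: PowerOfTwo) -> u8 {
    v as u8
}

assert_eq!(mask_by_shift(CastBased::C), 0b100);
assert_eq!(mask_direct(PowerOfTwo::C), 0b100);
```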
This crate is designed to use the third method, a `match` statement. It is an opinionated choice, aimed at the most flexible use of the `EnumBitset` macro: not requiring any modification of the base enum.
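The generated mapping is conceptually similar to this hand-written sketch (the actual generated code may differ):

```rust
enum Fruit { Apple, Banana, Cherry }

// The match directly yields the bit pattern of the single-element set.
fn bit_mask(value: &Fruit) -> u8 {
    match value {
        Fruit::Apple  => 1 << 0,
        Fruit::Banana => 1 << 1,
        Fruit::Cherry => 1 << 2,
    }
}
```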
This choice trades some performance for flexibility. When compiling for release, given the amazing optimizations that rustc and LLVM perform, the difference (if any) should be negligible for most use cases. In fact, for small enums, even without optimizations enabled, the performance of `EnumBitset` should be enough for most use cases.
However, if you really need to squeeze every CPU cycle, I recommend looking into other crates that implement this tradeoff differently. We do not have benchmarks, but it seems that a bit of performance might be gained by using the second method described above.
In the future, should the need arise, customization options might be added to allow specifying the position of the bits in the underlying integer, or even to automatically set the representation of every variant. In that case, the guarantees might need to evolve to take those options into account. Most likely, the default will remain the current implementation.
§`no_std` and FFI usage
This crate is unconditionally `#![no_std]`, so it can be used in WASM and embedded environments.
The guarantees that the generated types have should make it possible to use the bitset types in FFI contexts. Namely:
- The bitset type has the same layout and ABI as the underlying integer type.
- The meaning of every bit in the underlying integer value is stable (provided that the order of the variants in the base enum does not change).
Together, these two characteristics should make it possible to share the bitset types via FFI.
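A hedged sketch of what FFI usage could look like, relying on the layout/ABI guarantee (the `Channel` enum and exported function are hypothetical; the crate itself does not provide FFI helpers):

```rust
use enum_bitset::EnumBitset;

#[derive(EnumBitset, Clone, Copy, Debug)]
enum Channel { Red, Green, Blue }

// `ChannelSet` has the same layout and ABI as its `u8` repr, so it can
// cross an `extern "C"` boundary like a plain byte.
#[no_mangle]
extern "C" fn channel_count(set: ChannelSet) -> u32 {
    set.len() as u32
}
```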
However, note that I am not a `no_std` or WASM developer myself, so the crate has not been tested in those contexts. If you find any problems, or can provide test cases to ensure that everything works, please open an issue.
§Dependencies
Without any feature active, the code generated by the macro does not depend on any other crate: it only uses types and functions from Rust's core library.
If the `serde` feature is active, then `enum-bitset` depends on the `serde` crate. Assuming that any downstream crate or binary will have it as a dependency anyway (otherwise, just don't enable the feature), this should not add more dependencies to your project.
The remaining dependencies of this crate are only used to compile the proc macro, so they don't affect the final artifact of downstream crates and binaries. These compile-time dependencies are pretty standard in proc-macro crates: `proc-macro2`, `quote`, `syn`, and `heck`. In the case of `syn`, care has been taken to make sure that only the minimum set of required features is enabled.
§Limitations
- The bitset type is currently limited to enums with up to 128 variants (since the largest primitive unsigned integer type is `u128`). This limitation may be lifted in the future.
- `EnumBitset` only supports enums that are cloneable. This is rarely a limiting factor, since all enums without fields are cloneable (and, in fact, they can be `Copy`, if desired). But it is important to keep in mind if, for some reason, you don't want to make your enum `Clone` or `Copy`.¹
- The macro can only be used on unit enums, that is, enums where no variant has fields. You can work around this limitation by using the `EnumDiscriminants` macro from the excellent `strum` crate:
```rust
use strum::EnumDiscriminants;
use enum_bitset::EnumBitset;

#[derive(EnumDiscriminants, Debug, Clone)]
#[strum_discriminants(name(BugReportType), derive(EnumBitset))]
enum BugReport {
    WontReproduce {
        attempts: u32,
        works_on: String,
        last_seen: Option<String>,
    },
    UserError {
        what_they_did: String,
        what_they_should_have_done: String,
        facepalm_count: u8,
    },
    FeatureNotBug(bool),
    Cosmic {
        mercury_retrograde: bool,
        moon_phase: String,
    },
    RebootFixedIt,
}

let mut set = BugReportTypeSet::empty();
assert!(!set.contains(BugReportType::WontReproduce));
assert_eq!(set.len(), 0);

set.insert(BugReportType::WontReproduce);
set.insert(BugReportType::Cosmic);
assert!(set.contains(BugReportType::WontReproduce));
assert!(set.contains(BugReportType::Cosmic));
assert_eq!(set.len(), 2);
```
§Alternatives
This crate exists because, at work, I had recurrent use cases where I needed to represent a set of values out of an enum. While there are high-quality crates in the ecosystem that solve similar problems, after much experimentation I found that my trade-offs and usability preferences were a bit different from theirs.
Of course, this doesn't mean that the existing crates are not good. Quite the opposite: being more mature and created by more experienced developers, they are probably much better than this one. I am publishing this crate hoping that it might prove useful to someone else with preferences similar to mine.
Here are some of the existing crates in the ecosystem.
§enumset
This is an incredibly good and feature-complete crate, and it is widely used. It even supports enums with more than 128 variants.
My main concern with `enumset` is that the set type is generic over the base enum: you use it as `EnumSet<MyEnum>`, where the `EnumSet<T>` type lives in the `enumset` crate. That means you cannot add inherent methods to the generated type. This can be worked around with an extension trait or a new-type wrapper, but that turned out to be cumbersome for my use cases.
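For illustration, the extension-trait workaround might look like this sketch (it uses `enumset`'s real `EnumSet` and `EnumSetType`; the trait and enum are made up):

```rust
use enumset::{EnumSet, EnumSetType};

#[derive(EnumSetType, Debug)]
enum MyState { Idle, Busy }

// An extension trait is needed because inherent impls are not allowed
// on the foreign `EnumSet<T>` type.
trait MyStateSetExt {
    fn is_quiet(&self) -> bool;
}

impl MyStateSetExt for EnumSet<MyState> {
    fn is_quiet(&self) -> bool {
        !self.contains(MyState::Busy)
    }
}

let set: EnumSet<MyState> = EnumSet::only(MyState::Idle);
assert!(set.is_quiet());
```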
If that limitation is not a problem for you, then I would recommend using `enumset`, a very mature and high-quality crate.
Note that `enumset` remains faithful to what a derive macro is originally meant to do: implement a trait on the type it is applied to.² In their case, they implement an `EnumSetType` trait, which is the requirement for using an enum as the generic parameter of their `EnumSet` type.
§bitfield
A high-quality and well-known crate. However, its approach is very different: it generates structs that represent bitfields, creating getters and setters to manipulate ranges of bits in the underlying value. Therefore, the generated type has no relation to an enum.
§bitflags
Another high-quality and well-known crate. However, its philosophy is different: instead of generating a set type derived from an existing enum, it creates a structure with constants that represent a combination of flags. Its documentation explicitly states that it is not intended to be used as a bitfield.
§bitvec
Yet another high-quality and well-known crate, but once again with a very different, lower-level approach. It contains types that work much like std's collections of booleans, but using a one-bit-per-`bool` approach.
1. In the future, if the need arises, this limitation might be softened. Cloning/copying is actually only required for iteration, so, if iteration is not needed, it should be possible to create the bitset type for an enum without cloning/copying. However, this should be a relatively uncommon requirement, so it hasn't been implemented yet. ↩
2. In contrast, our crate `enum-bitset` sort of abuses the derive macro to create a new type. This kind of (ab)use is not uncommon in the Rust ecosystem. Well, you could say it is a "derived type"… right? ↩
Modules§
- `example`: Example to demonstrate the `EnumBitset` derive macro.
Derive Macros§
- `EnumBitset`: Generates a bitset type out of an enum.