Crate enum_bitset


§Generate efficient bitsets out of your enum types.


The generated bitset type is much like the standard HashSet or BTreeSet types in that it contains a collection of (non-repeating) values of the base enum. But being implemented as a bitset, its memory usage is typically much lower than that of the standard types, and all set operations (other than iteration) run in constant time.

A bitset is a data structure that stores a fixed collection of bits, where each bit represents the presence or absence of one of the possible values of the base enum type.

The main intended use case of the EnumBitset macro is an enum that represents the possible states (in the general meaning of the word) of some entity. It is common that, in some parts of your program, you need to specify a set of those states. For example,

  • Filters:

    • The base enum represents the possible states of an entity. You want to present a list of entities to the user, allowing them to filter by one or more of those states.
    • Event subscriptions. What event types does every listener care about?
  • Grouping / Hierarchy:

    • A large state machine is divided into phases. Every phase is a “sub-state-machine”. You might have an enum State representing all the states of the state machine, and another enum Phase representing the phases. You can apply EnumBitset to the State enum to automatically create a StateSet type. Then you can store the states that belong to every phase in a HashMap<Phase, StateSet>.
    • In a graph, the set of nodes that can be reached from a given Node can be represented by a NodeSet.
  • Non-exclusive values:

    • You have an enum Mode representing possible modes of operation of some part of some entity. But those modes are not mutually exclusive. You can store the modes that are active in the ModeSet type.
    • In a permission system, permissions might be represented by a Permission enum. Then, the set of permissions that a given user has can efficiently be represented by a PermissionSet.
    • What Resources are needed for a concrete task? You guessed it: a ResourceSet!

§Basic usage

To use, add the enum-bitset crate as a dependency,

cargo add enum-bitset

or manually edit your Cargo.toml file

[dependencies]
enum-bitset = "0.1.2"

and add the derive EnumBitset macro on your enum

use enum_bitset::EnumBitset;

#[derive(EnumBitset, Copy, Clone)]
enum IpAddrKind {
    V4,
    V6,
}

let mut set = IpAddrKindSet::empty();
assert!(!set.contains(IpAddrKind::V4));
assert_eq!(set.len(), 0);

set.insert(IpAddrKind::V6);
assert!(!set.contains(IpAddrKind::V4));
assert!(set.contains(IpAddrKind::V6));

let set2 = IpAddrKind::V4 | IpAddrKind::V6;
assert!(set2.contains(IpAddrKind::V4));
assert!(set2.contains(IpAddrKind::V6));
assert!(!set2.is_empty());
assert!(set2.is_all());
assert_eq!(set2, IpAddrKindSet::all());

let set3 = set2 - IpAddrKind::V4;
assert!(set3.contains(IpAddrKind::V6));
assert!(!set3.contains(IpAddrKind::V4));
assert!(!set3.is_empty());
assert!(!set3.is_all());
assert_eq!(set3, IpAddrKind::V6.into());

You can check the generated type in the example module. For more examples, see the tests directory on GitHub.

§Configuration

The generated type can be configured using the #[bitset] attribute on the enum. As usual, the attribute must be placed after the derive attribute and before the enum declaration.

The argument of the bitset attribute is a comma-separated list of key-value pairs. The following keys are supported:

§name

By default, the name of the generated type is the name of the base enum with the Set suffix. You can customize the generated name using the name argument.

use enum_bitset::EnumBitset;

#[derive(EnumBitset, Debug, Clone, Copy)]
#[bitset(name = Excuses)]
enum WhatToSayToBoss {
    JustFiveMinutes,
    WaitingForInspiration,
    MyWifiIsSlow,
    ItsAlmostLunchtime,
    StartingMonday,
}

let say = Excuses::all();
assert_eq!(say.len(), 5);

§repr

The types generated by this crate store the bits as a primitive unsigned integer. By default, the macro chooses the smallest primitive unsigned integer type with at least as many bits as the base enum has variants. For example, if the base enum has 10 variants, the bitset will store a u16 integer by default.

use std::alloc::Layout;
use enum_bitset::EnumBitset;

#[derive(EnumBitset, Debug, Copy, Clone)]
enum DeveloperSpecies {
    StackOverflowSage,
    TabsVsSpacesCrusader,
    LegacyCodeWhisperer,
    CoffeeCompiler,
    GitRebaser,
    RegexNinja,
    DocBlockPoet,
    DeadlineOptimist,
    BinaryWhisperer,
    TuxZealot,
}

assert_eq!(Layout::new::<DeveloperSpeciesSet>(), Layout::new::<u16>());

assert_eq!(DeveloperSpeciesSet::from_repr(1u16).unwrap(),
           DeveloperSpeciesSet::from([DeveloperSpecies::StackOverflowSage]));

You can override the default choice by specifying the bitset attribute with the repr = *type* argument. Possible values are u8, u16, u32, u64, and u128. The provided underlying type must have at least as many bits as the base enum has variants.

For example, given the following 3-variant enum, you can force the derived type to use a u32 instead of a u8.

use std::alloc::Layout;
use enum_bitset::EnumBitset;

#[derive(EnumBitset, Debug, Copy, Clone)]
#[bitset(repr = u32)]
enum CodeComment {
    TodoSinceFirstCommit,
    BlackMagic,
    LateNightRambling,
}

assert_ne!(Layout::new::<CodeCommentSet>(), Layout::new::<u8>());
assert_eq!(Layout::new::<CodeCommentSet>(), Layout::new::<u32>());

assert_eq!(CodeCommentSet::from_repr(2u32).unwrap(),
           CodeCommentSet::from([CodeComment::BlackMagic]));

§no_debug

If no_debug is specified, the generated type will not implement the Debug trait.

use enum_bitset::EnumBitset;

#[derive(EnumBitset, Clone)]
#[bitset(no_debug)]
enum ErrorMessage {
    UserDidSomethingWeird,
    HaveYouTriedLoggingIn,
    ItWorksOnMyMachine,
    WhoTouchedTheProductionDb,
    CoffeeSpilledOnServer,     
    InternBrokeTheDeployment,   
}

let set = ErrorMessageSet::from([ErrorMessage::HaveYouTriedLoggingIn,
                                 ErrorMessage::ItWorksOnMyMachine]);

let debug = format!("{:?}", set); // error! `ErrorMessageSet` doesn't implement `Debug`

By default, the generated type does implement Debug, regardless of whether the base enum does.

If the base enum does not implement Debug, and no_debug is not set, then the generated type will implement Debug by only showing the number of items in the set.


use enum_bitset::EnumBitset;

#[derive(EnumBitset, Clone)]
enum ErrorMessage {
    UserDidSomethingWeird,
    HaveYouTriedLoggingIn,
    ItWorksOnMyMachine,
    WhoTouchedTheProductionDb,
    CoffeeSpilledOnServer,     
    InternBrokeTheDeployment,   
}

let set = ErrorMessageSet::from([ErrorMessage::HaveYouTriedLoggingIn, ErrorMessage::ItWorksOnMyMachine]);

let debug = format!("{:?}", set);
assert_eq!(debug, "ErrorMessageSet(2){/* 2 items */}");

§serde

If the serde feature is enabled, the generated type will be compatible with the ubiquitous serde crate. The generated type will implement the serde::Serialize and serde::Deserialize traits. The only requirement is that the base enum must implement serde::Serialize and serde::Deserialize. The set will (de)serialize as an array of the base enum variants.

If the serde feature is disabled, the macro will silently ignore the serde parameter, so you don’t need to use any cfg_attr.

use enum_bitset::EnumBitset;
use serde::{Serialize, Deserialize};

#[derive(EnumBitset, Serialize, Deserialize, Clone)]
enum ProgrammerOops {
    CoffeeInKeyboard,
    ForgotSemicolon,
    CompilerIsWrong,
    WorksOnMyMachine,
    FridayDeploy,
    InfiniteLoop,
    DenialOfSleep,
}

let set = ProgrammerOopsSet::from_slice(&[ProgrammerOops::CoffeeInKeyboard,
                                          ProgrammerOops::InfiniteLoop]);

let serialized = serde_json::to_string(&set).unwrap();
assert_eq!(serialized, "[\"CoffeeInKeyboard\",\"InfiniteLoop\"]");

let deserialized: ProgrammerOopsSet = serde_json::from_str(&serialized).unwrap();
assert_eq!(deserialized, set);

If serde compatibility is not desirable for a given enum, or if you only want to implement one of the Serialize or Deserialize traits, the macro can be configured with the serde argument of the bitset attribute.

Possible values are:

  • serde = true (default): the generated type will implement serde::Serialize and serde::Deserialize.
  • serde = false: the generated type will not implement serde::Serialize and serde::Deserialize.
  • serde = "ser" or serde = "serialize": the generated type will implement serde::Serialize but not serde::Deserialize.
  • serde = "de" or serde = "deserialize": the generated type will implement serde::Deserialize but not serde::Serialize.
  • serde = "both": equivalent to serde = true.
  • serde = "none": equivalent to serde = false.

use enum_bitset::EnumBitset;
use serde::Serialize;

#[derive(Serialize, EnumBitset, Debug, Clone, Copy)]
#[bitset(serde = "ser")]
enum DogMoment {
    TailChasing,
    BellyRubRequest,
    TiltedHead,
    BarkAtNothing,
    DoorGuardian,      
    PuppyEyes,
}

let set: DogMomentSet = DogMoment::TiltedHead.into();
let serialized = serde_json::to_string(&set).unwrap();
assert_eq!(serialized, "[\"TiltedHead\"]");

// The following would not compile! Deserialize not implemented:
// let deserialized: DogMomentSet = serde_json::from_str(&serialized).unwrap();
// assert_eq!(deserialized, set);

Note: If, for some reason, the serde crate is renamed, you can specify it using #[bitset(serde_crate = serde_crate_path)].

§no_base_ops

By default, the generated set type will implement the core Add and BitOr traits for the base enum. This allows you to create a set by just “adding” (or “or-ing”) values of the base enum.

use enum_bitset::EnumBitset;

#[derive(EnumBitset, Clone, Copy, Debug)]
enum IceCreamFail {
    FellOnShirt,
    MeltedBeforeFirstLick,
    SeagullStoleIt,
}

let set1: IceCreamFailSet = IceCreamFail::FellOnShirt + IceCreamFail::MeltedBeforeFirstLick;

assert_eq!(set1.len(), 2);
assert!(!set1.is_all());

let set2 = set1 + IceCreamFail::SeagullStoleIt;
assert_eq!(set2.len(), 3);
assert!(set2.is_all());

If you want to disable this behavior, you can specify the no_base_ops argument of the bitset attribute. This can be useful, for instance, if you want to implement those traits on the base enum with different semantics, probably unrelated to the bitset.

Note that the operator overloads will still be available on the generated type, but not on the base enum. That is, you can “add”/“or” elements to an existing set using the +/| operators (in addition to using mutating methods).

use enum_bitset::EnumBitset;

#[derive(EnumBitset, Clone, Copy, Debug)]
#[bitset(no_base_ops)]
enum PotatoStatus {
    BurntToACrisp,
    StillRawInside,
    PerfectlyGolden,
}

// Error! `+` operator not implemented for `PotatoStatus`
// let set1 : PotatoStatusSet =  PotatoStatus::BurntToACrisp + PotatoStatus::StillRawInside;

let set2 = PotatoStatusSet::from([PotatoStatus::BurntToACrisp]);

// But you can still use `+` and `|` on the set type:
let set3 = set2 + PotatoStatus::StillRawInside;
let set4 = set3 | PotatoStatus::PerfectlyGolden;

assert_eq!(set3.len(), 2);
assert!(!set3.is_all());

assert_eq!(set4.len(), 3);
assert!(set4.is_all());

§crate

If you rename the enum-bitset crate in your Cargo.toml, the code generated by the macro will not be able to find some utility code contained in the crate. To work around this, you can specify the crate argument.

[dependencies]
my-enum-bitset = { version = "0.1.2", package = "enum-bitset" }

use my_enum_bitset::EnumBitset;

#[derive(EnumBitset, Clone)]
#[bitset(crate = my_enum_bitset)]
enum ProgrammerMood {
    IAmAGenius,
    IAmAnIdiot,
}

let set = ProgrammerMoodSet::from_array([ProgrammerMood::IAmAnIdiot]);

assert!(!set.is_empty());

§Cargo feature

This crate has the following optional feature:

  • serde: Enables support for the serde crate. Check the serde section for more details.

§Technical details

§Guarantees

The bitset types generated by this crate have the following guarantees:

  • The layout and ABI of the bitset are guaranteed to be exactly the same as the underlying integer type used to store the data. That is, a bitset generated for an enum of 6 variants and backed by a u8 type will have the same layout and ABI as the primitive u8 type.

  • The bitset type is guaranteed to be Send and Sync, regardless of whether the base enum is.

  • The underlying bit pattern for a concrete value will be stable across different compilations if the order of the variants remains the same. That is, you can efficiently persist or transmit the bitset as the underlying integer type, provided that the order of the variants in the base enum is stable.

    More concretely, the first variant in the base enum is represented by the least-significant bit, the second variant by the second least-significant bit, and so on.

  • The bitset type will have the same size as the underlying integer type, provided that the order of the variants in the base enum does not change.

  • If the underlying integer type has more bits than required to represent the variants of the base enum, the extra bits will be zero for any bitset created via the safe methods. If you manipulate the underlying value directly via unsafe code, you must maintain this invariant. Breaking this invariant is UB.

Any change to any of these guarantees may only be introduced in a new major version of the crate. However, most of these guarantees hold by design, so a radical departure is unlikely: they might evolve, but not change in a fundamental way.

§Invariant

The generated bitset types maintain the following invariant:

Any bit of the underlying integer type that is not mapped to a variant of the base enum must always be zero.

This invariant is enforced by all the (safe) methods generated by the EnumBitset macro.

However, if you manipulate the underlying integer directly (via unsafe code, transmutation, or using the unsafe from_repr_unchecked), then it is your responsibility to ensure the invariant.

Breaking this invariant is considered Undefined Behavior and might cause unexpected results, including panics.

§Design choices

Besides the signatures of the methods, I’ve found that there are basically three important choices when designing a bitset type:

  1. How to represent the bitset under the hood.
  2. How to implement the core set operations (insertion, removal, union, intersection, difference, symmetric difference, iteration, etc.).
  3. How to convert from the base enum values to the position of the bits in the underlying values.

On the first matter, this crate represents the bitset as a single primitive unsigned integer, choosing the smallest possible type by default, although the user may opt for a larger one. Because of this, the EnumBitset macro only works with enums of up to 128 variants. This should not be a concern in most use cases.

For the core set operations, this crate implements them using bitwise operations. This choice should be fairly uncontroversial and standard: bitwise operations make all methods O(1), and they are usually the fastest implementation. Additionally, thanks to the invariant upheld by the generated types, masking the values is rarely needed, saving some cycles in most methods.

In the case of iteration, it is implemented using the leading_zeros() method of the underlying integer type. This is a fast operation, which maps to a single CPU instruction in most instruction sets; hence every iterator step is usually O(1). Therefore, iterating a set is O(N), where N is the number of items in the set.

Finally, the conversion from the base enum values to the position of the respective bit in the underlying integer can be done in a number of ways:

  • By forcing a representation of the base enum and then casting the value to that representation. For example, #[repr(u8)] and then let pos = enum_value as u8.

    This strategy is probably the most straightforward, but it forces the user to specify a representation for the base enum. If the user of the type needs to specify the discriminants of the variants, they are restricted to values in the 0..UnderlyingType::BITS range. The bit pattern associated with the n-th variant can be computed with a shift, 1 << n.

    This method has some flexibility since it allows the user to specify which bit represents a given variant. But in general, it’s rather inflexible: it forces the user to tie the representation of the base enum to how the bitset is represented.

    Note that, if we limit ourselves to derive macros, which cannot modify the original enum, choosing the discriminant of every variant must be done by the user (or inferred using the default Rust rules if not specified). A more powerful attribute macro could add such specifications automatically, at the expense of being slightly less friendly to IDEs.

  • By forcing a representation type of the base enum and then constraining the value of every variant to be a power of two. That is, the first variant is represented by 2^0 = 1 (or, equivalently, a shift, 1 << 0), the second by 2^1 = 1 << 1 = 2, and so on.

    By using this method, every variant is represented by the bit pattern that corresponds to a set that only contains that variant. This makes the implementation of insert/removal and contains operations very fast, since we have direct access to the bit pattern of the value.

    This method has almost the same flexibility considerations as the previous one. It also ties the representation of the base enum to the bitset, so the user cannot choose the discriminant values for other purposes. However, it is more performant, since the bit pattern is already available in the representation of the enum, so no shift needs to be performed in every operation to compute it.

  • Using a match statement on the values of the base enum to retrieve the bitmask corresponding to the variant.

    This method might be slightly less performant than the previous two. It is common knowledge that match statements on enums whose variants have contiguous values can be optimized by the compiler into a jump table, at least when compiling with optimizations, and there appears to be work in LLVM to produce even faster code for such matches. The result of the match is directly the bit pattern of the variant, so the match is the only operation that needs to be performed. Although the match can be less performant than the simple integer cast of the previous strategies, it still achieves good performance, especially with optimizations.

    Remarkably, this method frees the user from having to tie the representation of the base enum to the bitset. This comes in handy if the bitset is not the primary use of the base enum.

This crate is designed to use the third method, a match statement. It is an opinionated choice based on achieving the most flexible use of the EnumBitset macro: not requiring any modification on the base enum.

This choice trades a bit of performance for flexibility. When compiling for release, given the optimizations that rustc and LLVM perform, the difference (if any) should be negligible for most use cases. In fact, for small enums, even without optimizations enabled, the performance of EnumBitset should be enough for most use cases.

However, if you really need to squeeze every CPU cycle, I recommend looking into other crates that implement this tradeoff differently. We do not have benchmarks, but it seems that a bit of performance might be gained by using the second method described above.

In the future, should the need arise, customization options might be added to allow specifying the position of the bits in the underlying integer, or even to automatically set the representation of every variant. In that case, the Guarantees might need to evolve to take those configuration options into account. Most likely, the default will remain the current implementation.

§no_std and FFI usage

This crate is unconditionally #![no_std], so it can be used in WASM and embedded environments.

The guarantees that the generated types provide should make it possible to use the bitset types in FFI contexts. Namely:

  • The bitset type has the same layout and ABI as the underlying integer type,
  • The meaning of every bit in the underlying integer value is stable (provided that the order of the variants in the base enum does not change).

Together, those two characteristics should make it possible to share the bitset types via FFI.

However, note that I'm not a no_std or WASM developer myself, so the crate has not been tested in those contexts. If you find any problems, or can provide test cases to ensure that everything works, please open an issue.

§Dependencies

Without any feature active, the code generated by the macro does not depend on any other crate. It only uses types and functions of Rust’s core library.

If the serde feature is active, then enum-bitset depends on the serde crate. Assuming that any downstream crate or binary will have it as a dependency (otherwise just don’t enable the feature), this should not add more dependencies to your project.

The only dependencies of this crate are used to compile the proc macro, so they don't affect the final artifact of downstream crates and binaries. These compile-time dependencies are pretty standard in proc-macro crates: proc-macro2, quote, syn, and heck. In the case of syn, care has been taken to enable only the minimum set of required features.

§Limitations

  • The bitset type is currently limited to enums with up to 128 variants (since the largest primitive unsigned integer type is u128). This limitation may be lifted in the future.

  • EnumBitset only supports enums that are cloneable. This is rarely a limiting factor, since all fieldless enums can be Clone (and, in fact, Copy, if desired). But it is important to keep in mind if, for some reason, you don’t want your enum to be Clone or Copy.1

  • The macro can only be used on unit enums, that is, enums where none of the variants have fields.

    You can work around this limitation by using the EnumDiscriminants macro from the excellent strum crate:

use strum::EnumDiscriminants;
use enum_bitset::EnumBitset;

#[derive(EnumDiscriminants, Debug, Clone)]
#[strum_discriminants(name(BugReportType), derive(EnumBitset))]
enum BugReport {
   WontReproduce {
       attempts: u32,
       works_on: String,
       last_seen: Option<String>,
   },
   UserError {
       what_they_did: String,
       what_they_should_have_done: String,
       facepalm_count: u8,
   },
   FeatureNotBug(bool),
   Cosmic {
       mercury_retrograde: bool,
       moon_phase: String,
   },
   RebootFixedIt,
}

let mut set = BugReportTypeSet::empty();
assert!(!set.contains(BugReportType::WontReproduce));
assert_eq!(set.len(), 0);

set.insert(BugReportType::WontReproduce);
set.insert(BugReportType::Cosmic);
assert!(set.contains(BugReportType::WontReproduce));
assert!(set.contains(BugReportType::Cosmic));
assert_eq!(set.len(), 2);

§Alternatives

This crate exists because at work I had recurring use cases where I needed to represent a set of values out of an enum. While there are high-quality crates in the ecosystem that solve similar problems, after much experimentation I found that my trade-offs and usability preferences differed a bit from what those crates offer.

Of course, this doesn’t mean that the existing crates are not good. Quite the opposite: being more mature and created by more experienced developers, they are probably much better than this crate. I am publishing this crate hoping that it might prove useful to someone else with preferences similar to mine.

Here are some of the existing crates in the ecosystem.

§enumset

This is an incredibly good and feature-complete crate, widely used. It supports enums with more than 128 variants.

My main concern with enumset is that the set type generated is generic over the base enum. That is, you use it as EnumSet<MyEnum>, where the EnumSet<T> type lives in the enumset crate. That means that you cannot add inherent methods to the generated type. This can be worked around by using an extension trait or a new-type wrapper, but it turned out to be cumbersome for my use cases.

If that limitation is not a problem for you, then I would recommend using enumset as a very mature and high-quality crate.

Note that enumset remains faithful to what a derive macro is originally meant to do: implement a trait on the type it is applied to2. In their case, they implement an EnumSetType trait, which is the requirement for using an enum as the generic parameter of their EnumSet type.

§bitfield

High-quality and well-known crate. However, its approach is very different: it allows generating structs that represent bitfields, creating getters and setters to manipulate ranges of bits in the underlying value. Therefore, the generated type does not have a relation with an enum.

§bitflags

Another high-quality and well-known crate. However, its philosophy is different: instead of generating a set type derived from an existing enum, it creates a structure with constants that represent a combination of flags. Its documentation explicitly states that it is not intended to be used as a bitfield.

§bitvec

Yet another high-quality and well-known crate. But once again, with a very different, lower-level approach. It contains types that work much like std’s collections of booleans, but using a one-bit-per-bool representation.


  1. In the future, if the need arises, this limitation might be softened. Cloning/copying is actually only required for iteration. So, if iteration is not needed, it should be possible to create the bitset type for an enum without cloning/copying. However, this should be a relatively uncommon requirement, so it hasn’t been implemented yet. 

  2. In contrast, our crate enum-bitset sort of abuses the derive macro to create a new type. This kind of (ab)use is not uncommon in the Rust ecosystem. Well, you could say it is a “derived type”… right? 

Modules§

example
Example to demonstrate the EnumBitset derive macro.

Derive Macros§

EnumBitset
Generates a bitset type out of an enum.