Expand description
Fast and compact sets of bytes or ASCII characters,
useful for searching, parsing and determining membership of a given byte
in the given set.
They don’t use any allocation, nor even any std
features.
In fact, all of the provided functions are const
, so they can be freely
constructed at the compile time
§Sets
This crate exports two set types - ByteSet
for a set of general bytes
and AsciiSet
for a set restricted to the range of ASCII characters.
The is two times smaller in the memory, but comes with a slight
performance trade-off in having to check whether the given character
belongs to the ASCII range.
use bset::AsciiSet;
const OP: AsciiSet = AsciiSet::new().add_bytes(b"+-*/%&|^");
assert!(OP.contains(b'%'));
The sets are implemented as an array of pointer-sized bit masks.
§Stack of sets
Inspired by this article
by Maciej Hirsz, this crate provides a way to stack multiple sets into one
structure to gain even more performance.
To not slow it down, sets are “obtained” from the given stack at the type
level. For this, the module bits
contains types B0
, B1
, …, B7
representing indices of a set in the stack.
Because const fn
s currently don’t support generic functions, the sets
are indexed by the order they were added to the stack.
Type aliases can be used to identify the sets within the stack:
use bset::{bits::*, ByteSet, ByteStack};
const BYTE_STACK: ByteStack<B3> = ByteStack::new()
.add_set(ByteSet::DIGITS)
.add_set(ByteSet::ALPHABETIC)
.add_set(ByteSet::new().add_bytes(b"+-*/%&|^"));
type Digits = B0;
type Alphabetic = B1;
type Operations = B2;
assert!(BYTE_STACK.contains::<Operations>(b'%'));
Again, there are two versions, ByteStack
for all bytes and AsciiStack
restricted to the ASCII range. Benchmarks show that testing the set membership
is about 20% faster with stacked sets. They come with 8 times larger
memory size (128/256 bytes vs. 16/32), which does not increase with the stacks
added, so when 8 sets (the maximum number) are used in one stack,
the memory size is equivalent.
Modules§
- bits
- Types that denote the position of a byte set within a byte stack.
Structs§
- AnyByte
Set - A compact set of bytes.
Only particular instances -
AsciiSet
andByteSet
can be constructed. - AnyByte
Stack - A compact stack of up to 8 byte sets for fast lookup.
Only particular instances -
AsciiStack
andByteStack
can be constructed.
Constants§
- ASCII_
RANGE_ LEN - Range of ASCII characters.
- BITS_
PER_ CHUNK - Number of bytes in one chunk of the mask.
- CHUNKS
- Number of chunks in the ASCII set.
- CHUNK_
SIZE - Size of one chunk of the mask in the implementation of byte sets.
Type Aliases§
- Ascii
Set - A compact set of ASCII bytes. Spans only 16 bytes.
- Ascii
Stack - A compact stack of up to 8 ASCII sets for fast lookup.
- ByteSet
- A compact set of all bytes. Spans only 32 bytes.
- Byte
Stack - A compact stack of up to 8 full byte sets for fast lookup.