pub struct Fid { /* private fields */ }Expand description
FID (Fully Indexable Dictionary).
This class can handle bit sequence of virtually arbitrary length.
In fact, N (FID’s length) is designed to be limited to: N <= 2^64.
It should be enough for almost all usecases since a binary data of length of 2^64 consumes 2^21 = 2,097,152 TB (terabyte), which is hard to handle by state-of-the-art computer architecture.
§Implementation detail
Index<u64>’s implementation is trivial.
select() just uses binary search of rank() results.
rank()’s implementation is standard but non-trivial. So here explains implementation of rank().
§rank()’s implementation
Say you have the following bit vector.
00001000 01000001 00000100 11000000 00100000 00000101 10100000 00010000 001 ; (N=67)Answer rank(48) in O(1) time-complexity and o(N) space-complexity.
Naively, you can count the number of ‘1’ from left to right. You will find rank(48) == 10 but it took O(N) time-complexity.
To reduce time-complexity to O(1), you can use memonization technique.
Of course, you can memonize results of rank(i) for every i ([0, N-1]).
Bit vector; 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 [1] 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 ; (N=67)
Memo rank(i); 0 0 0 0 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 5 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 8 8 9 10 10 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 13From this memo, you can answer rank(48) == 10 in constant time, although space-complexity for this memo is O(N) > o(N).
To reduce space-complexity using memonization, we divide the bit vector into Chunk and Block.
Bit vector; 00001000 01000001 00000100 11000000 00100000 00000101 [1]0100000 00010000 001 ; (N=67)
Chunk; | 7 | 13 | ; (size = (log N)^2 = 36)
Block; |0 |1 |1 |2 |2 |3 |3 |4 |6 |6 |6 |7 |0 |0 |0 |2 |4 |4 |4 |5 |5 |5 |6| ; (size = (log N) / 2 = 3)- A Chunk has size of (log N)^2. Its value is rank(index of the last bit of the chunk).
- A Block has size of (log N) / 2. A chunk has many blocks. Block’s value is the number of ’1’s in [index of the first bit of the chunk the block belongs to, index of the last bit of the block] (note that the value is reset to 0 at the first bit of a chunk).
Now you want to answer rank(48). 48-th bit is in the 2nd chunk, and in the 5th block in the chunk.
So the rank(48) is at least:
7 (value of 1st chunk) + 2 (value of 4th block in the 2nd chunk)
Then, focus on 3 bits in 5th block in the 2nd chunk; [1]01.
As you can see, only 1 ‘1’ is included up to 48-th bit (101 has 2 ’1’s but 2nd ‘1’ is 50-th bit, irrelevant to rank(48)).
Therefore, the rank(48) is calculated as:
7 (value of 1st chunk) + 2 (value of 4th block in the 2nd chunk) + 1 (’1’s in 5th block up to 48-th bit)
OK. That’s all… Wait!
rank() must be in O(1) time-complexity.
- 7 (value of 1st chunk): O(1) if you store chunk value in array structure.
- 2 (value of 4th block in the 2nd chunk): Same as above.
- 1 (’1’s in 5th block up to 48-th bit): O(length of block) = O(log N) !
Counting ’1’s in a block must also be O(1), while using o(N) space.
We use Table for this purpose.
| Block content | Number of ’1’s in block |
|---|---|
000 | 0 |
001 | 1 |
010 | 1 |
011 | 2 |
100 | 1 |
101 | 2 |
110 | 2 |
111 | 3 |
This table is constructed in build(). So we can find the number of ’1’s in block in O(1) time.
Note that this table has O(log N) = o(N) length.
In summary:
rank() = (value of left chunk) + (value of left block) + (value of table keyed by inner block bits).
Implementations§
Source§impl Fid
impl Fid
Sourcepub fn rank(&self, i: u64) -> u64
pub fn rank(&self, i: u64) -> u64
Returns the number of 1 in [0, i] elements of the Fid.
§Panics
When i >= length of the Fid.
§Implementation detail
00001000 01000001 00000100 11000000 00100000 00000101 00100000 00010000 001 Raw data (N=67)
^
i = 51
| 7 | 13 | Chunk (size = (log N)^2 = 36)
^
chunk_left i_chunk = 1 chunk_right
|0 |1 |1 |2 |2 |3 |3 |4 |6 |6 |6 |7 |0 |0 |0 |2 |3 |3 |4 |4 |4 |5 |5| Block (size = log N / 2 = 3)
^
i_block = 17
block_left | block_right- Find
i_chunk.i_chunk=i/chunk_size. - Get
chunk_left= Chunks[i_chunk- 1] only ifi_chunk> 0. - Get rank from chunk_left if
chunk_leftexists. - Get
chunk_right= Chunks[i_chunk]. - Find
i_block.i_block= (i-i_chunk*chunk_size) / block size. - Get
block_left=chunk_right.blocks[i_block- 1]_ only if _i_block` > 0. - Get rank from block_left if
block_leftexists. - Get inner-block data _
block_bits.block_bitsmust be of block size length, fulfilled with 0 in right bits. - Calculate rank of
block_bitsin O(1) using a table memonizing block size bit’s popcount.
Trait Implementations§
Source§impl From<&str> for Fid
impl From<&str> for Fid
Source§fn from(s: &str) -> Self
fn from(s: &str) -> Self
Constructor from string representation of bit sequence.
- ‘0’ is interpreted as 0.
- ‘1’ is interpreted as 1.
- ‘_’ is just ignored.
§Examples
use fid_rs::Fid;
let fid = Fid::from("01_11");
assert_eq!(fid[0], false);
assert_eq!(fid[1], true);
assert_eq!(fid[2], true);
assert_eq!(fid[3], true);§Panics
When:
scontains any character other than ‘0’, ‘1’, and ‘_’.sdoes not contain any ‘0’ or ‘1’
Auto Trait Implementations§
impl Freeze for Fid
impl RefUnwindSafe for Fid
impl Send for Fid
impl Sync for Fid
impl Unpin for Fid
impl UnwindSafe for Fid
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> CloneToUninit for Twhere
T: Clone,
impl<T> CloneToUninit for Twhere
T: Clone,
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read more