sbits/lib.rs
1//! # Succinct Data Structures
2//!
3//! *Near-optimal space with efficient queries.*
4//!
5//! ## Intuition First
6//!
7//! Imagine a library where every book is stored in a tightly packed vacuum-sealed bag.
8//! It takes up the absolute minimum space (the information-theoretic limit).
9//! Usually, to find a specific page, you'd have to unseal the whole bag (decompress).
10//!
11//! Succinct data structures allow you to "reach through" the vacuum seal and read
12//! specific pages, count words, or find the k-th occurrence of a character—all
13//! without unsealing the bag.
14//!
15//! ## The Problem
16//!
17//! Traditional data structures face a trade-off:
18//! - **Pointers**: Fast queries ($O(1)$) but massive overhead ($O(n \log n)$ bits).
19//! - **Compression**: Minimal space ($\log |C_n|$ bits) but no random access or queries ($O(n)$ to decompress).
20//!
21//! ## Historical Context
22//!
23//! ```text
24//! 1971 Fano Fano's associative memory: early partitioning of bits
25//! 1974 Elias Elias's static file storage: monotone sequences
26//! 1989 Jacobson Defined the succinct paradigm in his PhD thesis (rank/select)
27//! 1996 Munro-Raman Constant-time rank and select in o(n) extra space
28//! 2003 Grossi Wavelet trees for rank/select over arbitrary alphabets
29//! 2007 Ferragina FM-index: compressed full-text search (the "BWT" era)
30//! 2022 Kurpicz Modern engineering of rank9/select9 structures
31//! ```
32//!
33//! Guy Jacobson's key insight was that for many objects (like trees or bit vectors),
34//! we can build auxiliary structures that are much smaller than the data itself ($o(n)$ bits)
35//! but provide enough information to answer queries in constant time.
36//!
37//! ## Mathematical Formulation
38//!
39//! A data structure for a class of objects $C_n$ is:
40//! - **Implicit**: Uses $\log |C_n| + O(1)$ bits.
41//! - **Succinct**: Uses $\log |C_n| + o(\log |C_n|)$ bits.
42//! - **Compact**: Uses $O(\log |C_n|)$ bits.
43//!
44//! The fundamental operations are:
45//! - `rank(i)`: The number of set bits (1s) in the range $[0, i]$.
46//! - `select(k)`: The position of the $k$-th set bit.
47//!
48//! ## Complexity Analysis
49//!
50//! - **Time**: $O(1)$ for rank/select in most modern implementations.
51//! - **Space**: OPT + $o(n)$ bits (typically 5-20% overhead for auxiliary structures).
52//!
53//! ## What Could Go Wrong
54//!
55//! 1. **Static vs Dynamic**: Most succinct structures are static. Inserting into a
56//! succinct bit vector usually requires rebuilding the auxiliary structures.
57//! 2. **Cache Locality**: Constant time $O(1)$ doesn't mean "fast" if it requires
58//! multiple random memory lookups for large bitsets.
59//!
60//! ## Implementation Notes
61//!
62//! This crate provides:
63//! - **`BitVector`**: Core storage with rank/select support.
64//! - **`WaveletTree`**: Rank/select over larger alphabets (planned).
65//! - **`EliasFano`**: Monotone sequences with random access (planned).
66//!
67//! ## References
68//!
69//! - Jacobson, G. (1989). "Succinct Static Data Structures."
70//! - Munro, J. I., & Raman, V. (1996). "Selection and counting on the fly."
71//! - Grossi, R., et al. (2003). "High-order entropy-compressed text indexes."
72
73#![warn(missing_docs)]
74#![warn(clippy::all)]
75
76pub mod bitvec;
77pub mod elias_fano;
78pub mod error;
79pub mod implicit;
80pub mod rank_select;
81pub mod wavelet;
82
83pub use bitvec::BitVector;
84pub use elias_fano::EliasFano;
85pub use error::Error;
86pub use wavelet::WaveletTree;