splinter-rs 0.11.0

A compressed bitmap format optimized for small, sparse sets of u32s with zero-copy querying.
Documentation
<h1 align="center">Splinter</h1>
<p align="center">
  <a href="https://docs.rs/splinter-rs"><img alt="docs.rs" src="https://img.shields.io/docsrs/splinter-rs"></a>
  &nbsp;
  <a href="https://github.com/orbitinghail/splinter-rs/actions"><img alt="Build Status" src="https://img.shields.io/github/actions/workflow/status/orbitinghail/splinter-rs/rust.yml"></a>
  &nbsp;
  <a href="https://crates.io/crates/splinter-rs"><img alt="crates.io" src="https://img.shields.io/crates/v/splinter-rs.svg"></a>
</p>

Splinter is a compressed bitmap format similar to [Roaring], optimized specifically for small, sparse sets of 32-bit unsigned integers (`u32`).

## Key Features:

- **Tree-based Encoding**: Splinter encodes `u32` values into a 256-way tree structure by decomposing integers into big-endian component bytes. Nodes throughout the tree (including the root) are optimized into four different storage classes: tree, vec, bitmap, run.

- **Zero-copy Access**: Designed for efficient querying without deserialization, the `SplinterRef` type allows direct, zero-copy reads from any type implementing `Deref<Target = [u8]>`.

[Roaring]: https://roaringbitmap.org/

## Comparison to Roaring

The following table tests Splinter and Roaring with/without LZ4 compression against many different data distributions. The size column represents number of bytes used when serialized. View the actual [test code] to understand the precise meaning of each distribution.

All tests optimize the bitmap before serialization. This ensures that Splinter and Roaring are able to maximize compression via the use of run-length encoding for cases where it is helpful.

Roaring tests use [`roaring-rs`].

[`roaring-rs`]: https://docs.rs/roaring/latest/roaring/

The `Baseline` "bitmap" is simply an array of `u32` integers.

The `ok` column compares each subsequent row to the `Splinter` row for that test. `-` and `+` signs refer to the relative size of the row compared to the Splinter per the `relative` column. For example a row containing `++++` is many times larger than the corresponding Splinter.

[test code]: ./src/splinter.rs#L604

```
test                           bitmap         size   expected   relative         ok
empty                          Splinter         13         13       1.00         ok
                               Roaring           8          8       0.62         --
                               Splinter LZ4     14         14       1.08         ok
                               Roaring LZ4       9          9       0.64          -
                               Baseline          0          0       0.00       ----
1 element                      Splinter         21         21       1.00         ok
                               Roaring          18         18       0.86          -
                               Splinter LZ4     23         23       1.10         ok
                               Roaring LZ4      20         20       0.87          -
                               Baseline          4          4       0.19       ----
1 dense block                  Splinter         25         25       1.00         ok
                               Roaring          15         15       0.60         --
                               Splinter LZ4     27         27       1.08         ok
                               Roaring LZ4      17         17       0.63          -
                               Baseline       1024       1024      40.96       ++++
1 half full block              Splinter         72         72       1.00         ok
                               Roaring         255        255       3.54        +++
                               Splinter LZ4     70         70       0.97         ok
                               Roaring LZ4     257        257       3.67        +++
                               Baseline        512        512       7.11       ++++
1 sparse block                 Splinter         57         57       1.00         ok
                               Roaring          48         48       0.84          -
                               Splinter LZ4     55         55       0.96         ok
                               Roaring LZ4      50         50       0.91          -
                               Baseline         64         64       1.12          +
8 half full blocks             Splinter        338        338       1.00         ok
                               Roaring        2003       2003       5.93       ++++
                               Splinter LZ4    339        339       1.00         ok
                               Roaring LZ4    2012       2012       5.94       ++++
                               Baseline       4096       4096      12.12       ++++
8 sparse blocks                Splinter         67         67       1.00         ok
                               Roaring          48         48       0.72          -
                               Splinter LZ4     68         68       1.01         ok
                               Roaring LZ4      50         50       0.74          -
                               Baseline         64         64       0.96         ok
64 half full blocks            Splinter       2634       2634       1.00         ok
                               Roaring       16452      16452       6.25       ++++
                               Splinter LZ4   2375       2375       0.90          -
                               Roaring LZ4   16503      16503       6.95       ++++
                               Baseline      32768      32768      12.44       ++++
64 sparse blocks               Splinter        450        450       1.00         ok
                               Roaring         392        392       0.87          -
                               Splinter LZ4    450        450       1.00         ok
                               Roaring LZ4     395        395       0.88          -
                               Baseline        512        512       1.14          +
256 half full blocks           Splinter      10074      10074       1.00         ok
                               Roaring       65580      65580       6.51       ++++
                               Splinter LZ4   9101       9101       0.90          -
                               Roaring LZ4   65835      65835       7.23       ++++
                               Baseline     131072     131072      13.01       ++++
256 sparse blocks              Splinter       1402       1402       1.00         ok
                               Roaring        1288       1288       0.92         ok
                               Splinter LZ4   1269       1269       0.91          -
                               Roaring LZ4    1294       1294       1.02         ok
                               Baseline       2048       2048       1.46          +
512 half full blocks           Splinter      20134      20134       1.00         ok
                               Roaring      130810     130810       6.50       ++++
                               Splinter LZ4  18137      18137       0.90          -
                               Roaring LZ4  131248     131248       7.24       ++++
                               Baseline     262144     262144      13.02       ++++
512 sparse blocks              Splinter       2790       2790       1.00         ok
                               Roaring        2568       2568       0.92         ok
                               Splinter LZ4   2470       2470       0.89          -
                               Roaring LZ4    2580       2580       1.04         ok
                               Baseline       4096       4096       1.47          +
fully dense                    Splinter        121        121       1.00         ok
                               Roaring          63         63       0.52         --
                               Splinter LZ4    123        123       1.02         ok
                               Roaring LZ4      65         65       0.53         --
                               Baseline      16384      16384     135.40       ++++
128/block; dense               Splinter       1250       1250       1.00         ok
                               Roaring        8208       8208       6.57       ++++
                               Splinter LZ4   1256       1256       1.00         ok
                               Roaring LZ4    8242       8242       6.56       ++++
                               Baseline      16384      16384      13.11       ++++
32/block; dense                Splinter       4802       4802       1.00         ok
                               Roaring        8208       8208       1.71         ++
                               Splinter LZ4   4564       4564       0.95         ok
                               Roaring LZ4    8242       8242       1.81         ++
                               Baseline      16384      16384       3.41        +++
16/block; dense                Splinter       5666       5666       1.00         ok
                               Roaring        8208       8208       1.45          +
                               Splinter LZ4   5666       5666       1.00         ok
                               Roaring LZ4    8242       8242       1.45          +
                               Baseline      16384      16384       2.89        +++
128/block; sparse mid          Splinter       1529       1529       1.00         ok
                               Roaring        8282       8282       5.42       ++++
                               Splinter LZ4   1494       1494       0.98         ok
                               Roaring LZ4    8311       8311       5.56       ++++
                               Baseline      16384      16384      10.72       ++++
128/block; sparse high         Splinter       1870       1870       1.00         ok
                               Roaring        8224       8224       4.40       ++++
                               Splinter LZ4   1679       1679       0.90          -
                               Roaring LZ4    8258       8258       4.92       ++++
                               Baseline      16384      16384       8.76       ++++
1/block; sparse mid            Splinter      10521      10521       1.00         ok
                               Roaring       10248      10248       0.97         ok
                               Splinter LZ4  10525      10525       1.00         ok
                               Roaring LZ4   10290      10290       0.98         ok
                               Baseline      16384      16384       1.56          +
1/block; sparse high           Splinter      15374      15374       1.00         ok
                               Roaring       40968      40968       2.66        +++
                               Splinter LZ4  15319      15319       1.00         ok
                               Roaring LZ4   41084      41084       2.68        +++
                               Baseline      16384      16384       1.07         ok
1/block; spread low            Splinter       8377       8377       1.00         ok
                               Roaring        8328       8328       0.99         ok
                               Splinter LZ4    687        687       0.08       ----
                               Roaring LZ4     689        689       1.00         ok
                               Baseline      16384      16384       1.96         ++
dense throughout               Splinter       4113       4113       1.00         ok
                               Roaring        2700       2700       0.66          -
                               Splinter LZ4   3643       3643       0.89          -
                               Roaring LZ4     608        608       0.17       ----
                               Baseline      16384      16384       3.98        +++
dense low                      Splinter        529        529       1.00         ok
                               Roaring         267        267       0.50         --
                               Splinter LZ4    529        529       1.00         ok
                               Roaring LZ4     269        269       0.51         --
                               Baseline      16384      16384      30.97       ++++
dense mid/low                  Splinter       4113       4113       1.00         ok
                               Roaring        2376       2376       0.58         --
                               Splinter LZ4   4077       4077       0.99         ok
                               Roaring LZ4     348        348       0.09       ----
                               Baseline      16384      16384       3.98        +++
random/32                      Splinter        145        145       1.00         ok
                               Roaring         328        328       2.26         ++
                               Splinter LZ4    147        147       1.01         ok
                               Roaring LZ4     331        331       2.25         ++
                               Baseline        128        128       0.88          -
random/256                     Splinter       1041       1041       1.00         ok
                               Roaring        2544       2544       2.44         ++
                               Splinter LZ4   1047       1047       1.01         ok
                               Roaring LZ4    2553       2553       2.44         ++
                               Baseline       1024       1024       0.98         ok
random/1024                    Splinter       4113       4113       1.00         ok
                               Roaring       10168      10168       2.47         ++
                               Splinter LZ4   4131       4131       1.00         ok
                               Roaring LZ4   10208      10208       2.47         ++
                               Baseline       4096       4096       1.00         ok
random/4096                    Splinter      15374      15374       1.00         ok
                               Roaring       40056      40056       2.61        +++
                               Splinter LZ4  15380      15380       1.00         ok
                               Roaring LZ4   40208      40208       2.61        +++
                               Baseline      16384      16384       1.07         ok
random/16384                   Splinter      52238      52238       1.00         ok
                               Roaring      148656     148656       2.85        +++
                               Splinter LZ4  52444      52444       1.00         ok
                               Roaring LZ4  149229     149229       2.85        +++
                               Baseline      65536      65536       1.25          +
random/65536                   Splinter     199694     199694       1.00         ok
                               Roaring      461288     461288       2.31         ++
                               Splinter LZ4 200479     200479       1.00         ok
                               Roaring LZ4  463095     463095       2.31         ++
                               Baseline     262144     262144       1.31          +
random/32/65536                Splinter         99         99       1.00         ok
                               Roaring          80         80       0.81          -
                               Splinter LZ4     96         96       0.97         ok
                               Roaring LZ4      81         81       0.84          -
                               Baseline        128        128       1.29          +
random/256/65536               Splinter        547        547       1.00         ok
                               Roaring         528        528       0.97         ok
                               Splinter LZ4    551        551       1.01         ok
                               Roaring LZ4     530        530       0.96         ok
                               Baseline       1024       1024       1.87         ++
random/1024/65536              Splinter       2083       2083       1.00         ok
                               Roaring        2064       2064       0.99         ok
                               Splinter LZ4   2093       2093       1.00         ok
                               Roaring LZ4    2072       2072       0.99         ok
                               Baseline       4096       4096       1.97         ++
random/4096/65536              Splinter       5666       5666       1.00         ok
                               Roaring        8208       8208       1.45          +
                               Splinter LZ4   5690       5690       1.00         ok
                               Roaring LZ4    8241       8241       1.45          +
                               Baseline      16384      16384       2.89        +++
random/65536/65536             Splinter         25         25       1.00         ok
                               Roaring          15         15       0.60         --
                               Splinter LZ4     23         23       0.92         ok
                               Roaring LZ4      17         17       0.74          -
                               Baseline     262144     262144   10485.76       ++++
random/8/1024                  Splinter         49         49       1.00         ok
                               Roaring          32         32       0.65          -
                               Splinter LZ4     51         51       1.04         ok
                               Roaring LZ4      33         33       0.65          -
                               Baseline         32         32       0.65          -
random/16/1024                 Splinter         67         67       1.00         ok
                               Roaring          48         48       0.72          -
                               Splinter LZ4     63         63       0.94         ok
                               Roaring LZ4      49         49       0.78          -
                               Baseline         64         64       0.96         ok
random/32/1024                 Splinter         94         94       1.00         ok
                               Roaring          80         80       0.85          -
                               Splinter LZ4     89         89       0.95         ok
                               Roaring LZ4      81         81       0.91         ok
                               Baseline        128        128       1.36          +
random/64/1024                 Splinter        126        126       1.00         ok
                               Roaring         144        144       1.14          +
                               Splinter LZ4    124        124       0.98         ok
                               Roaring LZ4     145        145       1.17          +
                               Baseline        256        256       2.03         ++
random/128/1024                Splinter        183        183       1.00         ok
                               Roaring         272        272       1.49          +
                               Splinter LZ4    180        180       0.98         ok
                               Roaring LZ4     273        273       1.52          +
                               Baseline        512        512       2.80        +++
average compression ratio (splinter_lz4 / splinter): 0.96
```

## Adaptations

Splinter is available in Python using the [splynters](https://pypi.org/project/splynters/) package on PyPI, and as an open [repository](https://github.com/nrposner/splynters) of the same name.

## License

Licensed under either of

- Apache License, Version 2.0 ([LICENSE-APACHE] or https://www.apache.org/licenses/LICENSE-2.0)
- MIT license ([LICENSE-MIT] or https://opensource.org/licenses/MIT)

at your option.

[LICENSE-APACHE]: ./LICENSE-APACHE
[LICENSE-MIT]: ./LICENSE-MIT

### Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in the work by you shall be dual licensed as above, without any
additional terms or conditions.