§Compressed IntVec Module
This module delivers an efficient compressed vector for storing sequences of unsigned 64-bit integers. By leveraging bit-level encoding, it minimizes storage space while supporting fast random access.
§Overview
The principal data structure, IntVec, encapsulates a compressed bitstream along with sampling offsets.
These offsets enable fast, random access to individual elements without decompressing the entire stream.
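To make the role of the sampling offsets concrete, here is a minimal, self-contained sketch. It is not the crate's implementation: it assumes an Elias gamma code and an MSB-first bit layout, and `SampledGammaVec`, `push_bit`, `read_bit`, and `decode_at` are hypothetical names introduced only for illustration.

```rust
/// Toy compressed vector (hypothetical, for illustration only): Elias gamma
/// codewords packed into 64-bit words, plus the bit offset of every k-th element.
struct SampledGammaVec {
    words: Vec<u64>,     // bitstream, most significant bit first within each word
    samples: Vec<usize>, // bit offsets of elements 0, k, 2k, ...
    k: usize,            // sampling period
    len: usize,          // number of stored elements
}

/// Appends one bit at bit position `pos`, growing the word vector as needed.
fn push_bit(words: &mut Vec<u64>, pos: &mut usize, bit: bool) {
    if *pos % 64 == 0 {
        words.push(0);
    }
    if bit {
        let last = words.len() - 1;
        words[last] |= 1u64 << (63 - (*pos % 64));
    }
    *pos += 1;
}

/// Reads the bit at position `pos` and advances the position.
fn read_bit(words: &[u64], pos: &mut usize) -> bool {
    let bit = (words[*pos / 64] >> (63 - (*pos % 64))) & 1 == 1;
    *pos += 1;
    bit
}

impl SampledGammaVec {
    fn new(values: &[u64], k: usize) -> Self {
        assert!(k > 0, "sampling period must be positive");
        let (mut words, mut samples, mut pos) = (Vec::new(), Vec::new(), 0usize);
        for (i, &v) in values.iter().enumerate() {
            if i % k == 0 {
                samples.push(pos); // remember where every k-th codeword starts
            }
            // Gamma encodes positive integers only, so store v + 1.
            let x = v.checked_add(1).expect("value too large for this toy code");
            let nbits = 64 - x.leading_zeros() as usize; // significant bits, >= 1
            for _ in 0..nbits - 1 {
                push_bit(&mut words, &mut pos, false); // unary length prefix
            }
            for b in (0..nbits).rev() {
                push_bit(&mut words, &mut pos, (x >> b) & 1 == 1);
            }
        }
        Self { words, samples, k, len: values.len() }
    }

    /// Decodes the codeword starting at `pos` and advances past it.
    fn decode_at(&self, pos: &mut usize) -> u64 {
        let mut zeros = 0u32;
        while !read_bit(&self.words, pos) {
            zeros += 1;
        }
        let mut x = 1u64; // the leading 1 bit was just consumed
        for _ in 0..zeros {
            x = (x << 1) | read_bit(&self.words, pos) as u64;
        }
        x - 1
    }

    /// Random access: jump to the nearest preceding sample, then skip forward.
    fn get(&self, index: usize) -> Option<u64> {
        if index >= self.len {
            return None;
        }
        let mut pos = self.samples[index / self.k];
        for _ in 0..index % self.k {
            self.decode_at(&mut pos); // decode and discard
        }
        Some(self.decode_at(&mut pos))
    }
}
```

In this sketch, `get` decodes at most k codewords per lookup, while the sample table stores one offset per k elements, so the sampling period trades lookup time against space.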
The module offers two variants based on endianness:
- BEIntVec: the big-endian variant.
- LEIntVec: the little-endian variant.
Both variants operate with codecs that implement the Codec trait, allowing for flexible and configurable encoding/decoding strategies. Some codecs even accept extra runtime parameters to fine-tune the compression.
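As an illustration of a codec abstraction with an optional runtime parameter, the sketch below defines a hypothetical `ToyCodec` trait with two implementations. The real Codec trait's signature is not shown on this page, so nothing here should be read as the crate's API; `ToyCodec`, `ToyGamma`, and `ToyExpGolomb` are assumptions, and bits are kept in a `Vec<bool>` purely for readability.

```rust
/// Hypothetical codec interface: `Param` carries any extra runtime parameter.
trait ToyCodec {
    type Param: Copy;
    fn encode(value: u64, param: Self::Param, out: &mut Vec<bool>);
    fn decode(bits: &[bool], pos: &mut usize, param: Self::Param) -> u64;
}

/// Elias gamma code: needs no parameter, so `Param` is the unit type.
struct ToyGamma;

/// Exp-Golomb code of order k: the parameter k is chosen at run time.
struct ToyExpGolomb;

impl ToyCodec for ToyGamma {
    type Param = ();

    fn encode(value: u64, _: (), out: &mut Vec<bool>) {
        let x = value + 1; // gamma covers positive integers; assumes value < u64::MAX
        let nbits = 64 - x.leading_zeros();
        // Unary prefix: (nbits - 1) zeros, then the nbits bits of x, MSB first.
        out.extend(std::iter::repeat(false).take(nbits as usize - 1));
        out.extend((0..nbits).rev().map(|b| (x >> b) & 1 == 1));
    }

    fn decode(bits: &[bool], pos: &mut usize, _: ()) -> u64 {
        let mut zeros = 0u32;
        while !bits[*pos] {
            zeros += 1;
            *pos += 1;
        }
        let mut x = 0u64;
        for _ in 0..=zeros {
            x = (x << 1) | bits[*pos] as u64;
            *pos += 1;
        }
        x - 1
    }
}

impl ToyCodec for ToyExpGolomb {
    type Param = u32; // the order k, assumed to be < 64

    fn encode(value: u64, k: u32, out: &mut Vec<bool>) {
        // Gamma-code the high part, then append the k low bits verbatim.
        ToyGamma::encode(value >> k, (), out);
        out.extend((0..k).rev().map(|b| (value >> b) & 1 == 1));
    }

    fn decode(bits: &[bool], pos: &mut usize, k: u32) -> u64 {
        let mut x = ToyGamma::decode(bits, pos, ());
        for _ in 0..k {
            x = (x << 1) | bits[*pos] as u64;
            *pos += 1;
        }
        x
    }
}
```

In this sketch the value 1991 takes 18 bits with `ToyExpGolomb` at k = 3 and 21 bits with `ToyGamma`, which shows how a runtime parameter can tune the encoding to the data.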
Note: The IntVec structure is generic and is not meant to be used directly. Use the two endianness-specific types instead.
§Key Features
- Compact Storage: Compresses sequences of integers into a streamlined bitstream.
- Fast Random Access: Employs periodic sampling (every k-th element) to quickly locate individual elements.
- Flexible Codec Integration: Compatible with any codec conforming to the Codec trait.
- Endianness Options: Provides both big-endian and little-endian formats to suit various interoperability needs.
§Components
- IntVec: The core structure holding compressed data, sampling offsets, codec parameters, and metadata. It is not intended for direct use; use the endianness-specific types instead.
- BEIntVec / LEIntVec: Type aliases for the big-endian and little-endian implementations of IntVec.
- Iterators: BEIntVecIter and LEIntVecIter facilitate on-the-fly decoding as you iterate through the vector.
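The idea of decoding on the fly during iteration can be sketched as follows. This is an illustration under stated assumptions, not the crate's iterator: it uses a byte-level LEB128-style varint as a stand-in for the bit-level codecs, and `VarintIter` and `encode_varints` are hypothetical names.

```rust
/// Hypothetical lazy iterator: holds only a borrowed buffer and a cursor,
/// and decodes one value per call to `next`.
struct VarintIter<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> VarintIter<'a> {
    fn new(bytes: &'a [u8]) -> Self {
        Self { bytes, pos: 0 }
    }
}

impl<'a> Iterator for VarintIter<'a> {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        if self.pos >= self.bytes.len() {
            return None; // end of stream
        }
        let mut value = 0u64;
        let mut shift = 0u32;
        loop {
            // A truncated stream ends the iteration instead of panicking.
            let byte = *self.bytes.get(self.pos)?;
            self.pos += 1;
            if shift >= 64 {
                return None; // malformed input: too many continuation bytes
            }
            value |= u64::from(byte & 0x7f) << shift;
            if byte & 0x80 == 0 {
                return Some(value); // high bit clear marks the last byte
            }
            shift += 7;
        }
    }
}

/// Encodes each value as 7-bit groups, least significant group first.
fn encode_varints(values: &[u64]) -> Vec<u8> {
    let mut out = Vec::new();
    for &v in values {
        let mut v = v;
        while v >= 0x80 {
            out.push((v as u8 & 0x7f) | 0x80);
            v >>= 7;
        }
        out.push(v as u8);
    }
    out
}
```

Because the iterator keeps only a cursor into the encoded buffer, a caller can stop early (for example with `take` or `find`) without paying for the values that were never decoded.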
§Usage Examples
§Creating a Big-Endian Compressed Vector
use compressed_intvec::intvec::BEIntVec;
use compressed_intvec::codecs::ExpGolombCodec;
// Define a vector of unsigned 64-bit integers.
let input = vec![1, 5, 3, 1991, 42];
// Create a Big-Endian compressed vector using ExpGolombCodec with an extra codec parameter (e.g., 3)
// and sample every 2 elements.
let intvec = BEIntVec::<ExpGolombCodec>::from_with_param(&input, 2, 3).unwrap();
// Retrieve a specific element by its index.
let value = intvec.get(3);
assert_eq!(value, 1991);
// Decompress the entire structure back into a standard vector.
let decoded = intvec.into_vec();
assert_eq!(decoded, input);
§Creating a Little-Endian Compressed Vector
use compressed_intvec::intvec::LEIntVec;
use compressed_intvec::codecs::GammaCodec;
// Define a vector of unsigned 64-bit integers.
let input = vec![10, 20, 30, 40, 50];
// Create a Little-Endian compressed vector using GammaCodec without additional codec parameters,
// sampling every 2 elements.
let intvec = LEIntVec::<GammaCodec>::from(&input, 2).unwrap();
// Verify that random access works correctly.
assert_eq!(intvec.get(2), 30);
§Design Considerations
- Bitstream Representation: The compressed data is stored as a vector of 64-bit words (Vec<u64>).
- Sampling Strategy: To ensure fast random access, sampling offsets are recorded for every k-th integer.
- Codec Abstraction: The module is codec-agnostic; any codec that implements the Codec trait may be employed.
- Endianness Management: Endianness is handled at the type level via phantom types, supporting both big-endian and little-endian variants without affecting performance.
Structs§
- BEIntVecIter: Iterator over the values stored in a BEIntVec. The iterator decodes values on the fly.
- IntVec: A compressed vector of integers.
- LEIntVecIter: Iterator over the values stored in a LEIntVec.