Module intvec

§Compressed IntVec Module

This module provides a compressed vector for storing sequences of unsigned 64-bit integers. It uses bit-level encoding to reduce storage space while still supporting fast random access.

§Overview

The principal data structure, IntVec, encapsulates a compressed bitstream along with sampling offsets. These offsets enable fast random access to individual elements without decompressing the entire stream. The module offers two variants based on endianness: BEIntVec (big-endian) and LEIntVec (little-endian).

Both variants operate with codecs that implement the Codec trait, allowing for flexible and configurable encoding/decoding strategies. Some codecs even accept extra runtime parameters to fine-tune the compression.
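To make the idea of a pluggable codec with an optional runtime parameter concrete, here is a minimal sketch. Every name in it (ToyCodec, ToyGamma, ToyRice, round_trip) is hypothetical and does not reflect the crate's actual Codec trait; bits are stored one per bool purely for readability.

```rust
/// Hypothetical stand-in for a codec abstraction; NOT the crate's `Codec` trait.
trait ToyCodec {
    /// Extra runtime parameter (`()` when the codec takes none).
    type Param: Copy;
    /// Appends the code word for `value` to `bits`.
    fn encode(bits: &mut Vec<bool>, value: u64, param: Self::Param);
    /// Decodes one value starting at `*pos` and advances `*pos` past it.
    fn decode(bits: &[bool], pos: &mut usize, param: Self::Param) -> u64;
}

/// Elias gamma code of `value + 1`, so that 0 is representable.
/// Assumes `value < u64::MAX`.
struct ToyGamma;

impl ToyCodec for ToyGamma {
    type Param = ();

    fn encode(bits: &mut Vec<bool>, value: u64, _: ()) {
        let n = value + 1;
        let len = 64 - n.leading_zeros() as usize; // significant bits of n
        bits.extend(std::iter::repeat(false).take(len - 1)); // unary length prefix
        for i in (0..len).rev() {
            bits.push((n >> i) & 1 == 1); // binary digits, most significant first
        }
    }

    fn decode(bits: &[bool], pos: &mut usize, _: ()) -> u64 {
        let mut zeros = 0;
        while !bits[*pos] {
            zeros += 1;
            *pos += 1;
        }
        let mut n = 0u64;
        for _ in 0..=zeros {
            n = (n << 1) | bits[*pos] as u64;
            *pos += 1;
        }
        n - 1
    }
}

/// Rice code: the quotient `value >> k` in unary, then the `k` low bits.
/// The runtime parameter `k` must be below 64.
struct ToyRice;

impl ToyCodec for ToyRice {
    type Param = u32;

    fn encode(bits: &mut Vec<bool>, value: u64, k: u32) {
        let q = value >> k;
        bits.extend(std::iter::repeat(true).take(q as usize));
        bits.push(false); // terminator of the unary part
        for i in (0..k).rev() {
            bits.push((value >> i) & 1 == 1);
        }
    }

    fn decode(bits: &[bool], pos: &mut usize, k: u32) -> u64 {
        let mut q = 0u64;
        while bits[*pos] {
            q += 1;
            *pos += 1;
        }
        *pos += 1; // skip the terminator
        let mut r = 0u64;
        for _ in 0..k {
            r = (r << 1) | bits[*pos] as u64;
            *pos += 1;
        }
        (q << k) | r
    }
}

/// Encodes and decodes `values` with any codec; returns the decoded values
/// and the number of bits used.
fn round_trip<C: ToyCodec>(values: &[u64], param: C::Param) -> (Vec<u64>, usize) {
    let mut bits = Vec::new();
    for &v in values {
        C::encode(&mut bits, v, param);
    }
    let mut pos = 0;
    let decoded: Vec<u64> = values
        .iter()
        .map(|_| C::decode(&bits, &mut pos, param))
        .collect();
    (decoded, bits.len())
}
```

Calling round_trip::<ToyGamma>(&values, ()) or round_trip::<ToyRice>(&values, 3) exercises the same generic code path with and without a parameter, which is the kind of flexibility the paragraph above describes.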

Note: The IntVec structure is generic and is not intended to be used directly. Use the two endianness-specific types instead.

§Key Features

  • Compact Storage: Compresses sequences of integers into a streamlined bitstream.
  • Fast Random Access: Employs periodic sampling (every k-th element) to quickly locate individual elements.
  • Flexible Codec Integration: Compatible with any codec conforming to the Codec trait.
  • Endianness Options: Provides both big-endian and little-endian formats to suit various interoperability needs.
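The sampling idea behind fast random access can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: SampledVarints is a hypothetical type, and it uses a byte-oriented LEB128 varint in place of a bit-level codec so that the offsets are plain byte positions.

```rust
/// Hypothetical illustration of sampled random access; not the crate's `IntVec`.
struct SampledVarints {
    data: Vec<u8>,       // LEB128-encoded values, back to back
    samples: Vec<usize>, // byte offset of elements 0, k, 2k, ...
    k: usize,            // sampling rate
    len: usize,          // number of stored values
}

impl SampledVarints {
    fn new(values: &[u64], k: usize) -> Self {
        assert!(k > 0, "sampling rate must be positive");
        let mut data = Vec::new();
        let mut samples = Vec::new();
        for (i, &value) in values.iter().enumerate() {
            if i % k == 0 {
                samples.push(data.len()); // remember where every k-th element starts
            }
            let mut v = value;
            loop {
                let byte = (v & 0x7f) as u8;
                v >>= 7;
                if v == 0 {
                    data.push(byte);
                    break;
                }
                data.push(byte | 0x80); // continuation bit: more bytes follow
            }
        }
        Self { data, samples, k, len: values.len() }
    }

    /// Decodes one value at `*pos` and advances `*pos` past it.
    fn read(&self, pos: &mut usize) -> u64 {
        let mut value = 0u64;
        let mut shift = 0u32;
        loop {
            let byte = self.data[*pos];
            *pos += 1;
            value |= u64::from(byte & 0x7f) << shift;
            if byte & 0x80 == 0 {
                return value;
            }
            shift += 7;
        }
    }

    /// Jumps to the nearest preceding sample, then decodes at most `k - 1`
    /// values before reaching the requested one.
    fn get(&self, index: usize) -> Option<u64> {
        if index >= self.len {
            return None;
        }
        let mut pos = self.samples[index / self.k];
        for _ in 0..index % self.k {
            self.read(&mut pos); // skip elements between the sample and the target
        }
        Some(self.read(&mut pos))
    }
}
```

The trade-off is visible in the structure: a smaller k stores more offsets but skips fewer elements per lookup, while a larger k saves space at the cost of more sequential decoding.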

§Components

  • IntVec: The core structure holding compressed data, sampling offsets, codec parameters, and metadata. It is not intended to be used directly; use the endianness-specific types instead.
  • BEIntVec / LEIntVec: Type aliases for the big-endian and little-endian implementations of IntVec.
  • Iterators: BEIntVecIter and LEIntVecIter facilitate on-the-fly decoding as you iterate through the vector.
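On-the-fly decoding during iteration can be pictured with a small sketch. VarintIter below is a hypothetical type, not BEIntVecIter or LEIntVecIter, and it assumes a well-formed LEB128 byte stream rather than the crate's bitstream; the point is only that each call to next decodes a single value and nothing is decompressed ahead of time.

```rust
/// Hypothetical lazy decoder over LEB128 bytes; not the crate's iterator types.
struct VarintIter<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> Iterator for VarintIter<'a> {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        if self.pos >= self.data.len() {
            return None; // end of the encoded stream
        }
        let mut value = 0u64;
        let mut shift = 0u32;
        // Decode exactly one value; assumes the input is not truncated.
        loop {
            let byte = self.data[self.pos];
            self.pos += 1;
            value |= u64::from(byte & 0x7f) << shift;
            if byte & 0x80 == 0 {
                return Some(value);
            }
            shift += 7;
        }
    }
}
```

Because the type implements Iterator, adapters such as take, filter, or sum work directly on the compressed data without first materializing a Vec<u64>.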

§Usage Examples

§Creating a Big-Endian Compressed Vector

use compressed_intvec::intvec::BEIntVec;
use compressed_intvec::codecs::ExpGolombCodec;

// Define a vector of unsigned 64-bit integers.
let input = vec![1, 5, 3, 1991, 42];

// Create a Big-Endian compressed vector using ExpGolombCodec with an extra codec parameter (e.g., 3)
// and sample every 2 elements.
let intvec = BEIntVec::<ExpGolombCodec>::from_with_param(&input, 2, 3).unwrap();

// Retrieve a specific element by its index.
let value = intvec.get(3);
assert_eq!(value, 1991);

// Decompress the entire structure back into a standard vector.
let decoded = intvec.into_vec();
assert_eq!(decoded, input);

§Creating a Little-Endian Compressed Vector

use compressed_intvec::intvec::LEIntVec;
use compressed_intvec::codecs::GammaCodec;

// Define a vector of unsigned 64-bit integers.
let input = vec![10, 20, 30, 40, 50];

// Create a Little-Endian compressed vector using GammaCodec without additional codec parameters,
// sampling every 2 elements.
let intvec = LEIntVec::<GammaCodec>::from(&input, 2).unwrap();

// Verify that random access works correctly.
assert_eq!(intvec.get(2), 30);

§Design Considerations

  • Bitstream Representation: The compressed data is stored as a vector of 64-bit words (Vec<u64>).
  • Sampling Strategy: To ensure fast random access, sampling offsets are recorded for every k-th integer.
  • Codec Abstraction: The module is codec-agnostic; any codec that implements the Codec trait may be employed.
  • Endianness Management: Endianness is encoded at the type level via phantom types, so the big-endian and little-endian variants share one implementation without affecting performance.
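The combination of a Vec<u64> bitstream and phantom-type endianness might look roughly like the sketch below. All names (BitOrder, MsbFirst, LsbFirst, BitBuf and the two aliases) are hypothetical, and treating endianness as the bit order within each 64-bit word is a simplifying assumption made for this example, not a statement about the crate's internals.

```rust
use std::marker::PhantomData;

/// Maps a logical bit position inside a word (0..64) to a shift amount.
trait BitOrder {
    fn shift(bit_in_word: usize) -> u32;
}

/// Marker type: the first bit written lands in the most significant position.
struct MsbFirst;
/// Marker type: the first bit written lands in the least significant position.
struct LsbFirst;

impl BitOrder for MsbFirst {
    fn shift(bit_in_word: usize) -> u32 {
        63 - bit_in_word as u32
    }
}

impl BitOrder for LsbFirst {
    fn shift(bit_in_word: usize) -> u32 {
        bit_in_word as u32
    }
}

/// Bit buffer backed by 64-bit words. The order `O` exists only at the type
/// level; `PhantomData<O>` is zero-sized and stores nothing at runtime.
struct BitBuf<O: BitOrder> {
    words: Vec<u64>,
    len: usize, // number of bits written so far
    _order: PhantomData<O>,
}

impl<O: BitOrder> BitBuf<O> {
    fn new() -> Self {
        Self { words: Vec::new(), len: 0, _order: PhantomData }
    }

    fn push_bit(&mut self, bit: bool) {
        if self.len % 64 == 0 {
            self.words.push(0); // start a fresh word
        }
        if bit {
            let last = self.words.len() - 1;
            self.words[last] |= 1u64 << O::shift(self.len % 64);
        }
        self.len += 1;
    }

    fn get_bit(&self, index: usize) -> bool {
        assert!(index < self.len, "bit index out of range");
        (self.words[index / 64] >> O::shift(index % 64)) & 1 == 1
    }
}

/// Aliases in the spirit of a big-endian and a little-endian variant.
type MsbBitBuf = BitBuf<MsbFirst>;
type LsbBitBuf = BitBuf<LsbFirst>;
```

Since the order is resolved at compile time through the type parameter, both aliases reuse the same generic code and carry no per-value tag, which is the usual motivation for the phantom-type approach.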

Structs§

BEIntVecIter
Iterator over the values stored in a BEIntVec. The iterator decodes values on the fly.
IntVec
A compressed vector of integers.
LEIntVecIter
Iterator over the values stored in a LEIntVec.

Type Aliases§

BEIntVec
Big-endian variant of IntVec.
LEIntVec
Little-endian variant of IntVec.