Module intvec


Compressed IntVec Module

This module provides a compressed vector that stores a sequence of unsigned 64-bit integers (u64) using variable-length, bit-level codes.

Overview

The core data structure, IntVec, stores the encoded values as a single bitstream together with sampling offsets. The offsets let it decode an individual element starting from the nearest sample instead of from the beginning of the stream. The module provides two variants that differ in the bit order (endianness) of the stream:

  • Big-Endian (BEIntVec)
  • Little-Endian (LEIntVec)

Both variants work with codecs that implement the Codec trait, allowing flexible and configurable encoding/decoding strategies. Codecs may optionally accept extra runtime parameters to tune the compression.
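To make "codec with an optional runtime parameter" concrete, here is a minimal, std-only sketch that renders two classic codes as bit strings. The functions gamma and exp_golomb are hypothetical and not the crate's API, and we assume the common convention of coding n + 1 so that zero is representable. The Exp-Golomb order k plays the role of a runtime parameter; whether ExpGolombCodec interprets its parameter exactly this way is an assumption.

```rust
/// Elias gamma code of `n`, rendered as a string of '0'/'1' for readability.
/// Assumption: we encode `n + 1` so that zero is representable.
fn gamma(n: u64) -> String {
    let x = n as u128 + 1; // widen so that u64::MAX + 1 does not overflow
    let bits = 128 - x.leading_zeros() as usize; // significant bits of x
    let mut out = "0".repeat(bits - 1); // unary prefix: length of x minus one
    out.push_str(&format!("{:b}", x)); // x in binary, starting with its leading 1
    out
}

/// Exponential-Golomb code of order `k` (assumed k < 64): the gamma code of
/// the high part `n >> k`, followed by the `k` low bits of `n` written verbatim.
fn exp_golomb(n: u64, k: u32) -> String {
    assert!(k < 64, "order must be smaller than the word size");
    let mut out = gamma(n >> k);
    if k > 0 {
        let low = n & ((1u64 << k) - 1); // mask out the k low bits
        out.push_str(&format!("{:0width$b}", low, width = k as usize));
    }
    out
}
```

For 1991, the gamma code takes 21 bits, while the order-3 Exp-Golomb code takes 18: the 3 low bits are copied verbatim and only 1991 >> 3 = 248 is gamma-coded. With k = 0 the two codes coincide.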

Key Features

  • Efficient Storage: Compresses integer sequences into a compact bitstream.
  • Random Access: Stores the bit offset of every k-th element, so decoding can start from the nearest sample instead of from the beginning of the stream.
  • Generic Codec Support: Works with any codec implementing the Codec trait.
  • Endian Flexibility: Supports both big-endian and little-endian representations.
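The random-access feature can be illustrated with a toy structure: encode every value with an Elias gamma code into one bitstream, record the bit offset of every k-th value, and answer get(i) by jumping to sample i / k and decoding i % k + 1 codes. SampledGamma and its methods are hypothetical; bits are kept in a Vec<bool> only for readability, whereas IntVec packs them into u64 words.

```rust
/// Hypothetical sampled, gamma-coded vector (a sketch of the sampling idea,
/// not the crate's layout). Bits live in a Vec<bool> purely for readability.
struct SampledGamma {
    bits: Vec<bool>,
    samples: Vec<usize>, // bit offset of every k-th element
    k: usize,
    len: usize,
}

impl SampledGamma {
    fn new(values: &[u64], k: usize) -> Self {
        assert!(k > 0, "sampling rate must be positive");
        let mut bits: Vec<bool> = Vec::new();
        let mut samples: Vec<usize> = Vec::new();
        for (i, &v) in values.iter().enumerate() {
            if i % k == 0 {
                samples.push(bits.len()); // remember where element i starts
            }
            // Elias gamma of v + 1: (width - 1) zeros, then v + 1 in binary.
            let x = v as u128 + 1;
            let width = 128 - x.leading_zeros() as usize;
            bits.extend(std::iter::repeat(false).take(width - 1));
            for b in (0..width).rev() {
                bits.push((x >> b) & 1 == 1);
            }
        }
        SampledGamma { bits, samples, k, len: values.len() }
    }

    /// Decodes one gamma code starting at `*pos` and advances `*pos` past it.
    fn decode_at(&self, pos: &mut usize) -> u64 {
        let mut zeros = 0;
        while !self.bits[*pos] {
            zeros += 1;
            *pos += 1;
        }
        let mut x: u128 = 0;
        for _ in 0..=zeros {
            x = (x << 1) | self.bits[*pos] as u128;
            *pos += 1;
        }
        (x - 1) as u64
    }

    /// Random access: jump to the nearest preceding sample, then decode
    /// at most k codes to reach element i.
    fn get(&self, i: usize) -> Option<u64> {
        if i >= self.len {
            return None;
        }
        let mut pos = self.samples[i / self.k];
        let mut value = self.decode_at(&mut pos);
        for _ in 0..(i % self.k) {
            value = self.decode_at(&mut pos);
        }
        Some(value)
    }
}
```

A larger k stores fewer offsets (less space) but decodes more codes per get call; this is the usual space/time trade-off of sampling.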

Components

  • IntVec: The main structure, holding the compressed data, sample offsets, codec parameters, and metadata. In most code you use it through the BEIntVec and LEIntVec aliases rather than naming it directly.
  • BEIntVec / LEIntVec: Type aliases for endianness-specific versions of IntVec.
  • Iterators: BEIntVecIter and LEIntVecIter decode values on the fly when iterated.
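Decoding "on the fly" means an iterator keeps a cursor into the bitstream and decodes exactly one code per next call, so a sequential pass decodes each value once and does not need the samples. The sketch below uses a unary code (n zeros followed by a one) only for brevity; UnaryIter and unary_encode are hypothetical names, not BEIntVecIter or LEIntVecIter.

```rust
/// Hypothetical lazy decoder over a unary-coded bitstream (n zeros, then a 1).
/// Unary is used only to keep the sketch short.
struct UnaryIter<'a> {
    bits: &'a [bool],
    pos: usize,       // cursor: bit offset of the next code
    remaining: usize, // number of values not yet decoded
}

impl Iterator for UnaryIter<'_> {
    type Item = u64;

    /// Decodes exactly one value per call, advancing the cursor.
    fn next(&mut self) -> Option<u64> {
        if self.remaining == 0 {
            return None;
        }
        let mut n = 0;
        while !self.bits[self.pos] {
            n += 1;
            self.pos += 1;
        }
        self.pos += 1; // skip the terminating 1
        self.remaining -= 1;
        Some(n)
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        (self.remaining, Some(self.remaining))
    }
}

/// Encodes each value as unary code into a single bitstream.
fn unary_encode(values: &[u64]) -> Vec<bool> {
    let mut bits = Vec::new();
    for &v in values {
        bits.extend(std::iter::repeat(false).take(v as usize));
        bits.push(true);
    }
    bits
}
```

In a sampled layout like the one described above, calling get(i) for every i would restart from a sample each time and may decode the same codes repeatedly, so iterating is the cheaper way to scan all values.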

Usage Examples

Creating a Big-Endian Compressed Vector

use compressed_intvec::intvec::BEIntVec;
use compressed_intvec::codecs::ExpGolombCodec;

// Define a vector of unsigned 64-bit integers.
let input = vec![1, 5, 3, 1991, 42];

// Create a Big-Endian compressed vector using ExpGolombCodec with a parameter (e.g., 3)
// and sample every 2 elements.
let intvec = BEIntVec::<ExpGolombCodec>::from_with_param(&input, 2, 3);

// Retrieve a specific element by its index.
let value = intvec.get(3);
assert_eq!(value, Some(1991));

// Decode the entire compressed vector back to its original form.
let decoded = intvec.into_vec();
assert_eq!(decoded, input);

Creating a Little-Endian Compressed Vector

use compressed_intvec::intvec::LEIntVec;
use compressed_intvec::codecs::GammaCodec;

// Define a vector of unsigned 64-bit integers.
let input = vec![10, 20, 30, 40, 50];

// Create a Little-Endian compressed vector using GammaCodec without extra codec parameters,
// sampling every 2 elements.
let intvec = LEIntVec::<GammaCodec>::from(&input, 2);

assert_eq!(intvec.get(2), Some(30));

Design Details

  • Bitstream Storage: The compressed data is stored as a vector of 64-bit words (Vec<u64>).
  • Sampling Strategy: To support fast random access, sample offsets (in bits) are stored for every k-th integer.
  • Codec Abstraction: The module is codec-agnostic; any codec conforming to the Codec trait can be used.
  • Endian Handling: The endianness of the encoding/decoding process is managed through phantom types, enabling both big-endian and little-endian variants.
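The Vec<u64> bitstream and the phantom-type endianness can be combined in a small sketch. The marker types BE and LE, the BitOrder trait, and BitVec are hypothetical; they only show how a zero-sized type parameter can select, at compile time, whether a stream fills each word from its most significant bit or from its least significant bit. This toy keeps the same logical bit sequence for both orders; real little-endian bit writers commonly also emit each value's bits starting from the least significant one, which is not modeled here.

```rust
use std::marker::PhantomData;

/// Hypothetical marker types for the two bit orders (not the crate's names).
struct BE;
struct LE;

/// Maps a bit position inside a 64-bit word to a shift amount.
trait BitOrder {
    fn shift(bit_in_word: usize) -> usize;
}
impl BitOrder for BE {
    fn shift(bit_in_word: usize) -> usize {
        63 - bit_in_word // fill each word starting from the MSB
    }
}
impl BitOrder for LE {
    fn shift(bit_in_word: usize) -> usize {
        bit_in_word // fill each word starting from the LSB
    }
}

/// A bitstream backed by Vec<u64>; `E` exists only at the type level.
struct BitVec<E: BitOrder> {
    words: Vec<u64>,
    len: usize, // number of bits written so far
    _order: PhantomData<E>,
}

impl<E: BitOrder> BitVec<E> {
    fn new() -> Self {
        BitVec { words: Vec::new(), len: 0, _order: PhantomData }
    }

    fn push(&mut self, bit: bool) {
        if self.len % 64 == 0 {
            self.words.push(0); // current word is full (or none yet): start a new one
        }
        if bit {
            let last = self.words.len() - 1;
            self.words[last] |= 1u64 << E::shift(self.len % 64);
        }
        self.len += 1;
    }

    fn get(&self, i: usize) -> bool {
        (self.words[i / 64] >> E::shift(i % 64)) & 1 == 1
    }

    /// Appends the low `width` bits of `value`, most significant first.
    fn push_bits(&mut self, value: u64, width: usize) {
        debug_assert!(width <= 64);
        for b in (0..width).rev() {
            self.push((value >> b) & 1 == 1);
        }
    }
}

type BEBits = BitVec<BE>;
type LEBits = BitVec<LE>;
```

Because PhantomData<E> is zero-sized, the choice of order costs no memory per vector; it only changes which shift push and get use.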

Module Structure and Extensibility

The API provides constructors (from_with_param and from), element access (get), full decoding (into_vec), and iteration (iter). To support another compression method, implement the Codec trait for a new codec type; codecs that need tuning can accept a runtime parameter.
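The real Codec trait's methods are not listed on this page, so the sketch below uses a hypothetical ToyCodec trait with an associated Params type (() for parameter-free codes) to show the extension pattern: a parameter-free unary code, a Rice code whose parameter k is the number of low bits stored verbatim, and a generic function that works with either.

```rust
/// Hypothetical stand-in for the crate's Codec trait; its methods and
/// signatures are assumptions made for this sketch.
trait ToyCodec {
    type Params: Copy;
    fn encode(n: u64, p: Self::Params, out: &mut Vec<bool>);
    fn decode(bits: &[bool], pos: &mut usize, p: Self::Params) -> u64;
}

/// Unary code: n zeros, then a one. Takes no parameter.
struct Unary;
impl ToyCodec for Unary {
    type Params = ();
    fn encode(n: u64, _: (), out: &mut Vec<bool>) {
        out.extend(std::iter::repeat(false).take(n as usize));
        out.push(true);
    }
    fn decode(bits: &[bool], pos: &mut usize, _: ()) -> u64 {
        let mut n = 0;
        while !bits[*pos] {
            n += 1;
            *pos += 1;
        }
        *pos += 1; // skip the terminating 1
        n
    }
}

/// Rice code with parameter k (assumed k < 64): unary(n >> k), then the k low bits of n.
struct Rice;
impl ToyCodec for Rice {
    type Params = u32;
    fn encode(n: u64, k: u32, out: &mut Vec<bool>) {
        Unary::encode(n >> k, (), out);
        for b in (0..k).rev() {
            out.push((n >> b) & 1 == 1);
        }
    }
    fn decode(bits: &[bool], pos: &mut usize, k: u32) -> u64 {
        let mut n = Unary::decode(bits, pos, ()) << k; // restore the high part
        for b in (0..k).rev() {
            n |= (bits[*pos] as u64) << b; // then the k low bits
            *pos += 1;
        }
        n
    }
}

/// Generic round trip: encodes and decodes with any codec implementing the trait.
fn roundtrip<C: ToyCodec>(values: &[u64], p: C::Params) -> Vec<u64> {
    let mut bits = Vec::new();
    for &v in values {
        C::encode(v, p, &mut bits);
    }
    let mut pos = 0;
    values.iter().map(|_| C::decode(&bits, &mut pos, p)).collect()
}
```

The generic roundtrip never names a concrete codec, which mirrors how the module is described as codec-agnostic.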

Error Handling

The current implementation treats encoding and decoding failures as exceptional and calls .unwrap() where failure is not expected, so such a failure causes a panic. In production code, consider propagating these errors instead of panicking.
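Propagating errors usually means returning Result and using the ? operator where the code would otherwise call .unwrap(). The sketch below is a hypothetical, self-contained illustration (DecodeError, read_bit, try_decode_unary, and try_decode_all are not part of the crate): a truncated bitstream produces an error value instead of an index-out-of-bounds panic.

```rust
use std::fmt;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum DecodeError {
    UnexpectedEnd { bit: usize },
}

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DecodeError::UnexpectedEnd { bit } => write!(f, "bitstream ended at bit {bit}"),
        }
    }
}

impl std::error::Error for DecodeError {}

/// Reads one bit, reporting truncation instead of panicking on an out-of-range index.
fn read_bit(bits: &[bool], pos: &mut usize) -> Result<bool, DecodeError> {
    let bit = bits
        .get(*pos)
        .copied()
        .ok_or(DecodeError::UnexpectedEnd { bit: *pos })?;
    *pos += 1;
    Ok(bit)
}

/// Unary decoder (n zeros, then a one) that propagates errors with `?`.
fn try_decode_unary(bits: &[bool], pos: &mut usize) -> Result<u64, DecodeError> {
    let mut n = 0;
    while !read_bit(bits, pos)? {
        n += 1;
    }
    Ok(n)
}

/// Decodes `count` values, stopping at the first error instead of panicking.
fn try_decode_all(bits: &[bool], count: usize) -> Result<Vec<u64>, DecodeError> {
    let mut pos = 0;
    (0..count).map(|_| try_decode_unary(bits, &mut pos)).collect()
}
```

Collecting an iterator of Result into Result<Vec<_>, _> stops at the first Err, so the caller decides how to handle a corrupt stream.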

Getting Started

  1. Choose or implement a codec that satisfies the Codec trait requirements.
  2. Use the provided constructors to compress a vector of integers.
  3. Access individual elements with get, which uses the samples for fast random access, or decode everything with into_vec or iter.

For more details, refer to the documentation of the Codec trait and the respective codec implementations.

Structs

BEIntVecIter
Iterator over the values stored in a BEIntVec. The iterator decodes values on the fly.
IntVec
A compressed vector of integers.
LEIntVecIter
Iterator over the values stored in an LEIntVec. The iterator decodes values on the fly.

Type Aliases

BEIntVec
Big-endian variant of IntVec.
LEIntVec
Little-endian variant of IntVec.