Module intvec

A compressed, randomly accessible vector of u64 integers.

This module provides the core implementation of IntVec, a data structure designed for space-efficient storage and fast random access of u64 integer sequences. It achieves compression by leveraging a variety of instantaneous codes from the dsi-bitstream crate, which encode integers into a variable-length bitstream.
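
To make "instantaneous code" concrete, here is a minimal, self-contained sketch of the Elias γ code, one of the codes listed under Core Functionality. For readability it stores bits in a `Vec<bool>` instead of packing them into machine words. It assumes that each value n is written as the γ code of n + 1 so that 0 can be represented, because γ itself starts at 1. `gamma_encode` and `gamma_decode` are hypothetical helpers, not functions of compressed_intvec or dsi-bitstream.

```rust
/// Hypothetical helper: appends the Elias γ code of `n + 1` to `bits`.
/// The +1 shift lets 0 be encoded, since γ is only defined for x >= 1.
fn gamma_encode(n: u64, bits: &mut Vec<bool>) {
    let x = n + 1; // sketch only: overflows for n == u64::MAX
    let len = 63 - x.leading_zeros(); // floor(log2(x))
    // Unary prefix: `len` zeros...
    for _ in 0..len {
        bits.push(false);
    }
    // ...then the `len + 1` binary digits of x, most significant first.
    for i in (0..=len).rev() {
        bits.push((x >> i) & 1 == 1);
    }
}

/// Hypothetical helper: decodes one γ code starting at `*pos` and advances `*pos` past it.
fn gamma_decode(bits: &[bool], pos: &mut usize) -> u64 {
    // Count the zeros of the unary prefix.
    let mut len = 0;
    while !bits[*pos] {
        len += 1;
        *pos += 1;
    }
    // Read the `len + 1` binary digits (the leading 1 included).
    let mut x: u64 = 0;
    for _ in 0..=len {
        x = (x << 1) | bits[*pos] as u64;
        *pos += 1;
    }
    x - 1 // undo the +1 shift
}

fn main() {
    let data = [40u64, 200, 0, 50, 13, 90, 1023];
    let mut bits = Vec::new();
    for &v in &data {
        gamma_encode(v, &mut bits);
    }
    // 0 takes 1 bit, 1023 takes 21 bits.
    println!("{} values in {} bits", data.len(), bits.len()); // prints "7 values in 79 bits"

    // Decoding needs no separators: each codeword tells the decoder where it ends.
    let mut pos = 0;
    let decoded: Vec<u64> = (0..data.len()).map(|_| gamma_decode(&bits, &mut pos)).collect();
    assert_eq!(decoded, data);
}
```

No codeword is a prefix of another, so the decoder knows where each value ends without any separators. The cost is that reaching the i-th value requires decoding every value before it, which is what the sampling mechanism under Random Access addresses.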

Core Functionality

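  • Compression: Employs codecs such as Gamma (γ), Delta (δ), and Zeta (ζ) for skewed data, where small values dominate, and a FixedLength encoding for roughly uniform data whose values all fit in a small number of bits.
  • Random Access: For variable-length codes, random access relies on sampling. The position of every k-th element is stored, so a lookup jumps to the nearest preceding sample and decodes at most k - 1 values before reaching the target. The sampling parameter k therefore trades access speed against memory overhead. For FixedLength encoding, access is a true O(1) operation, because the position of every element can be computed directly.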
  • Flexible Construction: Provides a builder API that can construct an IntVec from a slice (with automatic codec selection) or an iterator (for large datasets, requiring manual parameter specification).
  • High-Performance Lookups: Offers optimized methods for various access patterns, including a reusable IntVecReader for dynamic lookups, and efficient batch methods like get_many and par_get_many.
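
The sampling trade-off can be sketched as follows, assuming that the bit offset of every k-th element is recorded and that a lookup decodes forward from the nearest preceding sample. It reuses the γ code (with the same n + 1 shift) for the variable-length values. `SampledGamma` and its helpers are hypothetical names for illustration; the real IntVec layout may differ.

```rust
// Hypothetical helper: appends the Elias γ code of `n + 1` (so 0 is representable).
fn gamma_encode(n: u64, bits: &mut Vec<bool>) {
    let x = n + 1;
    let len = 63 - x.leading_zeros(); // floor(log2(x))
    for _ in 0..len {
        bits.push(false);
    }
    for i in (0..=len).rev() {
        bits.push((x >> i) & 1 == 1);
    }
}

// Hypothetical helper: decodes one γ code at `*pos` and advances past it.
fn gamma_decode(bits: &[bool], pos: &mut usize) -> u64 {
    let mut len = 0;
    while !bits[*pos] {
        len += 1;
        *pos += 1;
    }
    let mut x: u64 = 0;
    for _ in 0..=len {
        x = (x << 1) | bits[*pos] as u64;
        *pos += 1;
    }
    x - 1
}

/// Hypothetical sampled γ-coded vector.
struct SampledGamma {
    bits: Vec<bool>,
    samples: Vec<usize>, // bit offsets of elements 0, k, 2k, ...
    k: usize,
    len: usize,
}

impl SampledGamma {
    fn new(data: &[u64], k: usize) -> Self {
        assert!(k > 0, "k must be positive");
        let mut bits = Vec::new();
        let mut samples = Vec::new();
        for (i, &v) in data.iter().enumerate() {
            if i % k == 0 {
                samples.push(bits.len()); // remember where this element starts
            }
            gamma_encode(v, &mut bits);
        }
        SampledGamma { bits, samples, k, len: data.len() }
    }

    fn get(&self, i: usize) -> Option<u64> {
        if i >= self.len {
            return None;
        }
        // Jump to the nearest preceding sample...
        let mut pos = self.samples[i / self.k];
        // ...skip the i % k values in between (at most k - 1 decodes)...
        for _ in 0..i % self.k {
            gamma_decode(&self.bits, &mut pos);
        }
        // ...and decode the target.
        Some(gamma_decode(&self.bits, &mut pos))
    }
}

fn main() {
    let data = [40u64, 200, 0, 50, 13, 90, 1023];
    let v = SampledGamma::new(&data, 2);
    assert_eq!(v.get(1), Some(200));
    assert_eq!(v.get(6), Some(1023));
    assert_eq!(v.get(7), None);
    println!("{} samples for {} values", v.samples.len(), data.len()); // prints "4 samples for 7 values"
}
```

With k = 2 the seven values need four samples. Raising k shrinks the sample table but lengthens the forward scan inside `get`.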

The main struct, IntVec, is generic over Endianness, so you can choose between Little-Endian (LEIntVec) and Big-Endian (BEIntVec) bitstream representations to suit the target hardware architecture.
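
Bitstream endianness concerns the order in which bits are placed inside each machine word. The sketch below assumes one common convention: a big-endian writer fills a `u64` from its most significant bit downward, and a little-endian writer fills it from its least significant bit upward. `BitOrder` and `pack` are hypothetical, and the exact layout used by dsi-bitstream may differ in detail.

```rust
/// Hypothetical: which end of a 64-bit word the writer fills first.
#[derive(Clone, Copy)]
enum BitOrder {
    Le, // least significant bit first
    Be, // most significant bit first
}

/// Hypothetical helper: packs a bit sequence into u64 words in the given order.
fn pack(bits: &[bool], order: BitOrder) -> Vec<u64> {
    let mut words = vec![0u64; (bits.len() + 63) / 64];
    for (i, &b) in bits.iter().enumerate() {
        if b {
            let off = (i % 64) as u32;
            let shift = match order {
                BitOrder::Le => off,
                BitOrder::Be => 63 - off,
            };
            words[i / 64] |= 1u64 << shift;
        }
    }
    words
}

fn main() {
    // The Elias γ code of 6 (binary 110): "00110".
    let bits = [false, false, true, true, false];
    let le = pack(&bits, BitOrder::Le);
    let be = pack(&bits, BitOrder::Be);
    println!("LE word: {:#018x}", le[0]); // prints "LE word: 0x000000000000000c"
    println!("BE word: {:#018x}", be[0]); // prints "BE word: 0x3000000000000000"
}
```

The same five bits produce different words, so a stream written in one order must be read back in the same order.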

Examples

```rust
use compressed_intvec::prelude::*;

// A small vector of integers to be compressed.
let data: &[u64] = &[40, 200, 0, 50, 13, 90, 1023];

// Use the builder to create an IntVec.
// `CodecSpec::Auto` analyzes the data and selects the best codec.
let intvec = LEIntVec::builder(data)
    .k(2) // Sample every 2nd element: faster access at the cost of more samples.
    .codec(CodecSpec::Auto)
    .build()
    .unwrap();

// Verify the length and access some elements.
assert_eq!(intvec.len(), data.len());
assert_eq!(intvec.get(1), Some(200));
assert_eq!(intvec.get(6), Some(1023));
```
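
One plausible way to implement automatic codec selection is to compute the exact encoded size of the data under each candidate code and keep the smallest. The sketch below does this for γ, δ, and fixed-length encoding only (ζ is omitted for brevity), again coding each γ/δ value n as n + 1. It is an illustrative heuristic with hypothetical names (`choose`, `Choice`), not a description of what `CodecSpec::Auto` actually does.

```rust
/// Bit length of the Elias γ code of `x` (requires x >= 1).
fn gamma_len(x: u64) -> u64 {
    let n = 63 - x.leading_zeros() as u64; // floor(log2(x))
    2 * n + 1
}

/// Bit length of the Elias δ code of `x` (requires x >= 1):
/// floor(log2(x)) payload bits plus the γ code of the bit count.
fn delta_len(x: u64) -> u64 {
    let n = 63 - x.leading_zeros() as u64;
    n + gamma_len(n + 1)
}

/// Hypothetical codec choice.
#[derive(Debug, PartialEq)]
enum Choice {
    Gamma,
    Delta,
    Fixed(u32), // bits per value
}

/// Hypothetical heuristic: returns the cheapest codec and its total size in bits.
fn choose(data: &[u64]) -> (Choice, u64) {
    // Shift by one so that 0 can be coded by γ and δ.
    let gamma: u64 = data.iter().map(|&v| gamma_len(v + 1)).sum();
    let delta: u64 = data.iter().map(|&v| delta_len(v + 1)).sum();
    // Fixed-length: enough bits for the largest value (at least 1).
    let max = data.iter().copied().max().unwrap_or(0);
    let width = (64 - max.leading_zeros()).max(1);
    let fixed = width as u64 * data.len() as u64;

    let mut best = (Choice::Gamma, gamma);
    if delta < best.1 {
        best = (Choice::Delta, delta);
    }
    if fixed < best.1 {
        best = (Choice::Fixed(width), fixed);
    }
    best
}

fn main() {
    let data = [40u64, 200, 0, 50, 13, 90, 1023];
    println!("{:?}", choose(&data)); // prints "(Fixed(10), 70)"
}
```

For the example data the totals are 79 bits (γ), 71 bits (δ), and 70 bits (fixed-length at 10 bits per value), so this heuristic would pick fixed-length. A single large outlier among small values, such as `[0, 0, 0, 0, 0, 0, 0, 1_000_000]`, tips the balance toward δ instead.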

Alternatively, you can use a fixed-length encoding:

```rust
use compressed_intvec::prelude::*;

// A small vector of integers to be compressed.
let data: &[u64] = &[40, 200, 0, 50, 13, 90, 1023];

// Use the builder to create an IntVec with fixed-length encoding.
// Passing `None` for `num_bits` selects the smallest bit width that fits the
// largest value (here 10 bits, since 1023 = 2^10 - 1).
let intvec = LEIntVec::builder(data)
    .codec(CodecSpec::FixedLength { num_bits: None })
    .build()
    .unwrap();

// Verify the length and access some elements.
assert_eq!(intvec.len(), data.len());
assert_eq!(intvec.get(1), Some(200));
assert_eq!(intvec.get(6), Some(1023));
```
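
Fixed-length access is O(1) because element i starts at bit i × num_bits. The following sketch assumes values are packed back to back, least significant bit first, into `u64` words, and reads each one back with at most two word accesses. `FixedVec` is a hypothetical type, and IntVec's actual memory layout may differ.

```rust
/// Hypothetical fixed-width packed vector.
struct FixedVec {
    words: Vec<u64>,
    width: u32, // bits per value, 1..=64
    len: usize,
}

impl FixedVec {
    fn new(data: &[u64]) -> Self {
        // Smallest width that fits the largest value (at least 1 bit).
        let max = data.iter().copied().max().unwrap_or(0);
        let width = (64 - max.leading_zeros()).max(1);
        let total_bits = data.len() * width as usize;
        let mut words = vec![0u64; (total_bits + 63) / 64];
        for (i, &v) in data.iter().enumerate() {
            let bit = i * width as usize;
            let (w, off) = (bit / 64, (bit % 64) as u32);
            words[w] |= v << off; // low part of the value
            // If the value straddles a word boundary, spill its high bits into the next word.
            if off + width > 64 {
                words[w + 1] |= v >> (64 - off);
            }
        }
        FixedVec { words, width, len: data.len() }
    }

    fn get(&self, i: usize) -> Option<u64> {
        if i >= self.len {
            return None;
        }
        // The bit position is computed directly: no decoding of earlier values.
        let bit = i * self.width as usize;
        let (w, off) = (bit / 64, (bit % 64) as u32);
        let mask = if self.width == 64 { u64::MAX } else { (1u64 << self.width) - 1 };
        let mut v = self.words[w] >> off;
        if off + self.width > 64 {
            v |= self.words[w + 1] << (64 - off);
        }
        Some(v & mask)
    }
}

fn main() {
    let data = [40u64, 200, 0, 50, 13, 90, 1023];
    let v = FixedVec::new(&data);
    println!("{} bits per value, {} words", v.width, v.words.len()); // prints "10 bits per value, 2 words"
    assert_eq!(v.get(1), Some(200));
    assert_eq!(v.get(6), Some(1023));
}
```

With 10-bit values, element 6 starts at bit 60 and straddles the first two words, which is the case the spill branch handles.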

Structs

  • IntVec: A compressed, randomly accessible vector of u64 integers.
  • IntVecBuilder: A builder for creating an IntVec from a slice (&[u64]).
  • IntVecFromIterBuilder: A builder for creating an IntVec from an iterator.
  • IntVecIter: An iterator over the decompressed u64 values of an IntVec.
  • IntVecReader: A stateful reader for an IntVec that provides fast random access.

Enums

  • IntVecError: Defines the set of errors that can occur in IntVec operations.

Type Aliases

  • BEIntVec: A type alias for an IntVec with Big-Endian (BE) bitstream encoding.
  • LEIntVec: A type alias for an IntVec with Little-Endian (LE) bitstream encoding.