Skip to main content

Crate cnk

Crate cnk 

Source
Expand description

ID set compression primitives.

cnk provides compression algorithms for sorted, unique ID sets where order doesn’t matter. This is common in information retrieval:

  • IVF posting lists (which vectors belong to which cluster)
  • HNSW neighbor lists (which nodes are connected)
  • Inverted indexes (which documents contain which terms)

§Compression Methods

  • Delta+varint (RocCompressor): practical baseline, varint-encodes gaps between sorted IDs
  • Elias-Fano (feature sbits): succinct monotone-sequence codec with random access
  • Partitioned Elias-Fano (feature sbits): cluster-aware variant
  • ROC (Random Order Coding): near-optimal for sets using bits-back with ANS (planned)

§Historical Context

Set compression has a rich history in information retrieval. Classic methods like Elias-Fano (1971) exploit monotonicity of sorted sequences. Modern methods like ROC (Severo et al., 2022) exploit the additional structure that order doesn’t matter, achieving log(C(N,n)) bits instead of log(N^n) bits.

§Example

use cnk::{RocCompressor, IdSetCompressor};

let compressor = RocCompressor::new();
let ids = vec![1u32, 5, 10, 20, 50];
let universe_size = 1000;

// Compress
let compressed = compressor.compress_set(&ids, universe_size).unwrap();

// Decompress
let decompressed = compressor.decompress_set(&compressed, universe_size).unwrap();
assert_eq!(ids, decompressed);

§References

  • Elias, P. (1974). “Efficient storage and retrieval by content and address”
  • Fano, R. (1971). “On the number of bits required to implement an associative memory”
  • Severo et al. (2022). “Compressing multisets with large alphabets”
  • Severo et al. (2025). “Lossless Compression of Vector IDs for ANN Search”

Structs§

AutoConfig
Configuration for compress_set_auto.
ChooseConfig
Configuration for the heuristic chooser.
CodecChoice
A codec choice, including any method parameters that should be recorded alongside the bytes.
IdListStats
Summary statistics for a sorted, unique ID list within [0, universe_size).
RocCompressor
Delta+varint compressor for sorted, unique ID sets.

Enums§

CompressionError
Errors that can occur during compression operations.
IdCompressionMethod
Compression method selection.

Traits§

IdSetCompressor
Trait for compressing sets of IDs where order doesn’t matter.

Functions§

choose_method
Choose a compression method from list statistics.
compress_set_auto
Choose a method (feature-aware) and compress.
compress_set_enveloped
Encode as an envelope: choose method (auto) + store (method, params, universe_size, payload).
decompress_set_auto
Decompress bytes previously produced by compress_set_auto (or any compatible encoder), using the recorded choice.method.
decompress_set_enveloped
Decode an envelope.
validate_ids
Validate that ids is sorted in strictly ascending order (sorted + unique).