Crate tensor_compress

Tensor-native compression library.

Exploits the mathematical structure of high-dimensional embeddings using Tensor Train decomposition, achieving 10-20x compression for embeddings with 4096 or more dimensions.
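
To see where a ratio in that range can come from, the sketch below counts the parameters a Tensor Train stores for a given reshaping. The helper names, the 8x8x8x8 reshaping of a 4096-dimensional vector, and the rank of 4 are assumptions chosen for illustration; they are not taken from the crate, and `TTConfig` may choose shapes and ranks differently.

```rust
/// Number of parameters stored by a Tensor Train with the given mode sizes
/// and internal ranks. Boundary ranks are fixed to 1, so core k holds
/// rank[k-1] * mode[k] * rank[k] values.
/// Hypothetical helper for illustration; not part of the crate's API.
fn tt_param_count(modes: &[usize], inner_ranks: &[usize]) -> usize {
    assert_eq!(
        inner_ranks.len() + 1,
        modes.len(),
        "need exactly one rank per bond between neighbouring cores"
    );
    let mut total = 0;
    for (k, &n) in modes.iter().enumerate() {
        let left = if k == 0 { 1 } else { inner_ranks[k - 1] };
        let right = if k + 1 == modes.len() { 1 } else { inner_ranks[k] };
        total += left * n * right;
    }
    total
}

/// Ratio of the dense element count to the TT parameter count.
fn tt_compression_ratio(modes: &[usize], inner_ranks: &[usize]) -> f64 {
    let dense: usize = modes.iter().product();
    dense as f64 / tt_param_count(modes, inner_ranks) as f64
}
```

Under these assumptions, `tt_param_count(&[8, 8, 8, 8], &[4, 4, 4])` is 32 + 128 + 128 + 32 = 320 values in place of 4096, a ratio of 12.8. Larger ranks improve reconstruction accuracy at the cost of a lower ratio.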

§Compression Methods

  • Tensor Train (TT): Decomposes vectors into products of smaller tensors (recommended)
  • Sparse: Native format for vectors with >50% zeros (stores only the non-zero entries)
  • Delta + varint: Lossless compression for sorted ID sequences
  • Run-length encoding: Lossless compression for repeated values
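
The 50% threshold for the sparse format is consistent with a simple size comparison, sketched below. The layout is an assumption made for illustration (one `u32` position plus one `f32` value per non-zero entry, against one `f32` per element in the dense form); the crate's actual on-disk format and the logic behind `should_use_sparse` are not shown on this page and may differ.

```rust
/// Bytes needed for a dense f32 vector of the given length.
fn dense_bytes(len: usize) -> usize {
    len * std::mem::size_of::<f32>()
}

/// Bytes needed under the ASSUMED sparse layout: a u32 position and an
/// f32 value for every non-zero entry (headers are ignored).
fn sparse_bytes_assumed(nnz: usize) -> usize {
    nnz * (std::mem::size_of::<u32>() + std::mem::size_of::<f32>())
}

/// True when the assumed sparse layout is strictly smaller than dense storage.
/// Hypothetical helper; not the crate's `should_use_sparse`.
fn sparse_is_smaller(v: &[f32]) -> bool {
    let nnz = v.iter().filter(|x| **x != 0.0).count();
    sparse_bytes_assumed(nnz) < dense_bytes(v.len())
}
```

With 8 bytes per non-zero against 4 bytes per dense element, the sparse form only wins once more than half of the entries are zero.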

Re-exports§

pub use decompose::svd_truncated;
pub use decompose::DecomposeError;
pub use decompose::Matrix;
pub use decompose::SvdResult;
pub use decompose::TensorView;
pub use format::compress_dense_as_sparse;
pub use format::compress_sparse;
pub use format::should_use_sparse;
pub use format::should_use_sparse_threshold;
pub use format::sparse_storage_size;
pub use streaming_tt::convert_vectors_to_streaming_tt;
pub use streaming_tt::read_streaming_tt_all;
pub use streaming_tt::StreamingTTHeader;
pub use streaming_tt::StreamingTTReader;
pub use streaming_tt::StreamingTTWriter;
pub use streaming_tt::STREAMING_TT_MAGIC;
pub use streaming_tt::STREAMING_TT_VERSION;
pub use tensor_train::tt_cosine_similarity;
pub use tensor_train::tt_cosine_similarity_batch;
pub use tensor_train::tt_decompose;
pub use tensor_train::tt_decompose_batch;
pub use tensor_train::tt_dot_product;
pub use tensor_train::tt_dot_product_batch;
pub use tensor_train::tt_euclidean_distance;
pub use tensor_train::tt_euclidean_distance_batch;
pub use tensor_train::tt_norm;
pub use tensor_train::tt_reconstruct;
pub use tensor_train::tt_scale;
pub use tensor_train::TTConfig;
pub use tensor_train::TTCore;
pub use tensor_train::TTError;
pub use tensor_train::TTVector;

Modules§

decompose
Low-level matrix decomposition primitives for tensor operations.
format
Compressed snapshot format for tensor data.
incremental
Incremental (append-only) snapshot format.
streaming
Streaming compression for memory-bounded snapshot I/O.
streaming_tt
Streaming TT decomposition for memory-bounded I/O.
tensor_train
Tensor Train (TT) decomposition for high-dimensional embedding compression.

Structs§

CompressionConfig
Compression configuration for snapshots.
CompressionDefaults
Common embedding dimension constants.
RleEncoded
Run-length encoded data: pairs of (value, count).
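
As a rough illustration of the (value, count) representation, the sketch below run-length encodes a slice and decodes it again. The function names and the `Vec<(T, u32)>` type are assumptions for the example; the fields of `RleEncoded` and the signatures of `rle_encode` and `rle_decode` are not shown on this page.

```rust
/// Collapse consecutive equal values into (value, run length) pairs.
/// Hypothetical sketch; not the crate's `rle_encode`.
fn rle_encode_sketch<T: PartialEq + Clone>(data: &[T]) -> Vec<(T, u32)> {
    let mut runs: Vec<(T, u32)> = Vec::new();
    for item in data {
        if let Some((value, count)) = runs.last_mut() {
            // Extend the current run unless its counter would overflow.
            if *value == *item && *count < u32::MAX {
                *count += 1;
                continue;
            }
        }
        // Start a new run for a new value (or after a saturated counter).
        runs.push((item.clone(), 1));
    }
    runs
}

/// Expand (value, run length) pairs back into the original sequence.
fn rle_decode_sketch<T: Clone>(runs: &[(T, u32)]) -> Vec<T> {
    let mut out = Vec::new();
    for (value, count) in runs {
        out.extend(std::iter::repeat(value.clone()).take(*count as usize));
    }
    out
}
```

The encoding is lossless: decoding the pairs always reproduces the input. It only saves space when the data contains long runs, since each run costs one value plus one counter.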

Enums§

TensorMode
Tensor compression mode for vectors and embeddings.

Functions§

compress_ids
Combined delta + varint encoding for maximum compression of sorted IDs.
decompress_ids
Decompress delta + varint encoded IDs.
delta_decode
Decode delta-encoded IDs back to original sorted list.
delta_encode
Delta-encode a sorted list of IDs. Stores the first value followed by the differences between consecutive values.
rle_decode
Decode RLE back to original data.
rle_encode
RLE-encode a slice of values.
varint_decode
Decode variable-length encoded bytes back to u64 values.
varint_encode
Variable-length encode u64 values. Uses 7 payload bits per byte, with the high bit as a continuation flag.
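
The delta and varint steps described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: the `_sketch` function names are hypothetical, and the least-significant-group-first byte order (as in LEB128) is an assumption, since this page only states that each byte carries 7 bits plus a continuation flag.

```rust
/// Store the first ID, then the gap to each following ID.
/// Assumes the input is sorted in non-decreasing order.
fn delta_encode_sketch(sorted: &[u64]) -> Vec<u64> {
    let mut prev = 0u64;
    let mut out = Vec::with_capacity(sorted.len());
    for &id in sorted {
        out.push(id - prev); // the first gap is measured from 0
        prev = id;
    }
    out
}

/// Rebuild the IDs by taking a running sum of the gaps.
fn delta_decode_sketch(deltas: &[u64]) -> Vec<u64> {
    let mut acc = 0u64;
    deltas.iter().map(|&d| { acc += d; acc }).collect()
}

/// Emit 7 bits per byte, low bits first; the high bit marks "more bytes follow".
fn varint_encode_sketch(values: &[u64]) -> Vec<u8> {
    let mut out = Vec::new();
    for &value in values {
        let mut v = value;
        while v >= 0x80 {
            out.push(((v & 0x7F) as u8) | 0x80);
            v >>= 7;
        }
        out.push(v as u8);
    }
    out
}

/// Inverse of `varint_encode_sketch`. Assumes well-formed input; a trailing
/// incomplete value is dropped, and over-long encodings are not rejected.
fn varint_decode_sketch(bytes: &[u8]) -> Vec<u64> {
    let mut out = Vec::new();
    let mut cur = 0u64;
    let mut shift = 0u32;
    for &b in bytes {
        cur |= u64::from(b & 0x7F) << shift;
        if (b & 0x80) == 0 {
            out.push(cur);
            cur = 0;
            shift = 0;
        } else {
            shift += 7;
        }
    }
    out
}
```

For the sorted IDs 100, 105, 106, 300 the gaps are 100, 5, 1, 194. Only 194 needs two bytes, so the sequence fits in 5 bytes instead of the 32 bytes of four raw `u64` values. Small gaps are what make the combination effective, which is why the input must be sorted.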