Crate tensor_compress


Tensor-native compression library.

Exploits the mathematical structure of high-dimensional embeddings using Tensor Train decomposition, achieving 10-20x compression for 4096+ dimensions.
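As a rough illustration of where a 10-20x figure can come from, the arithmetic below counts the stored values in a TT representation of a 4096-dimensional vector. It assumes the vector is reshaped as 8 × 8 × 8 × 8 and uses a uniform internal rank of 4. The shape, the rank, and the helper `tt_param_count` are assumptions chosen for illustration. The shapes and ranks that `TTConfig` actually selects may differ, and real ranks depend on the data and the error tolerance.

```rust
/// Hypothetical helper (not part of the crate): number of stored values in a
/// TT representation with mode sizes `shape` and a uniform internal `rank`.
/// Boundary ranks are 1, so core k has shape (r_left, n_k, r_right).
fn tt_param_count(shape: &[usize], rank: usize) -> usize {
    let d = shape.len();
    shape
        .iter()
        .enumerate()
        .map(|(k, &n)| {
            let r_left = if k == 0 { 1 } else { rank };
            let r_right = if k + 1 == d { 1 } else { rank };
            r_left * n * r_right
        })
        .sum()
}

fn main() {
    // Assumed reshaping of a 4096-dim embedding: 8 * 8 * 8 * 8 = 4096.
    let shape = [8usize, 8, 8, 8];
    let dense: usize = shape.iter().product();
    let tt = tt_param_count(&shape, 4);
    println!(
        "dense = {}, tt = {}, ratio = {:.1}x",
        dense,
        tt,
        dense as f64 / tt as f64
    );
}
```

Under these assumptions the cores hold 32 + 128 + 128 + 32 = 320 values instead of 4096, which is roughly 12.8x smaller. Raising the rank to 8 gives 64 + 512 + 512 + 64 = 1152 values (about 3.6x), so the ratio is governed by how low the ranks can stay.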

Compression Methods

Tensor Train (TT): Decomposes vectors into products of smaller tensors (recommended)
Sparse: Native format for vectors with >50% zeros (stores only non-zeros)
Delta + varint: Lossless compression for sorted ID sequences
Run-length encoding: Lossless compression for repeated values
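The following shows one way a 50% break-even can arise. Suppose a dense vector stores one 4-byte f32 per element, and the sparse form stores a 4-byte u32 index plus a 4-byte f32 value per non-zero. Then sparse is smaller exactly when fewer than half the entries are non-zero. The layout and the helper names below are assumptions for illustration. They are not the crate's `sparse_storage_size`, `should_use_sparse`, or `compress_sparse`.

```rust
/// Assumed dense layout: one f32 per element.
fn dense_bytes(v: &[f32]) -> usize {
    v.len() * std::mem::size_of::<f32>()
}

/// Assumed sparse layout: a u32 index plus an f32 value per non-zero entry.
fn sparse_bytes(v: &[f32]) -> usize {
    let nnz = v.iter().filter(|&&x| x != 0.0).count();
    nnz * (std::mem::size_of::<u32>() + std::mem::size_of::<f32>())
}

/// Keep only (index, value) pairs for the non-zero entries.
fn to_sparse(v: &[f32]) -> Vec<(u32, f32)> {
    v.iter()
        .enumerate()
        .filter(|&(_, &x)| x != 0.0)
        .map(|(i, &x)| (i as u32, x))
        .collect()
}

/// Sparse wins when 8 * nnz < 4 * len, i.e. when more than half are zeros.
fn prefer_sparse(v: &[f32]) -> bool {
    sparse_bytes(v) < dense_bytes(v)
}

fn main() {
    let v: [f32; 8] = [0.0, 1.5, 0.0, 0.0, 0.0, -2.0, 0.0, 0.0]; // 75% zeros
    println!("dense = {} B, sparse = {} B", dense_bytes(&v), sparse_bytes(&v));
    println!("prefer sparse: {}", prefer_sparse(&v));
    println!("{:?}", to_sparse(&v));
}
```

With exactly 50% zeros both layouts take the same space, so under this layout the sparse form only pays off strictly above that threshold.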

Re-exports

pub use decompose::svd_truncated;
pub use decompose::DecomposeError;
pub use decompose::Matrix;
pub use decompose::SvdResult;
pub use decompose::TensorView;
pub use format::compress_dense_as_sparse;
pub use format::compress_sparse;
pub use format::should_use_sparse;
pub use format::should_use_sparse_threshold;
pub use format::sparse_storage_size;
pub use streaming_tt::convert_vectors_to_streaming_tt;
pub use streaming_tt::read_streaming_tt_all;
pub use streaming_tt::streaming_tt_similarity_search;
pub use streaming_tt::StreamingTTHeader;
pub use streaming_tt::StreamingTTReader;
pub use streaming_tt::StreamingTTWriter;
pub use streaming_tt::STREAMING_TT_MAGIC;
pub use streaming_tt::STREAMING_TT_VERSION;
pub use tensor_train::tt_cosine_similarity;
pub use tensor_train::tt_cosine_similarity_batch;
pub use tensor_train::tt_decompose;
pub use tensor_train::tt_decompose_batch;
pub use tensor_train::tt_dot_product;
pub use tensor_train::tt_dot_product_batch;
pub use tensor_train::tt_euclidean_distance;
pub use tensor_train::tt_euclidean_distance_batch;
pub use tensor_train::tt_norm;
pub use tensor_train::tt_reconstruct;
pub use tensor_train::tt_scale;
pub use tensor_train::TTConfig;
pub use tensor_train::TTCore;
pub use tensor_train::TTError;
pub use tensor_train::TTVector;
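The re-exported `tt_dot_product` and `tt_cosine_similarity` suggest that similarity can be computed without rebuilding the dense vector. The sketch below shows the standard way to do that: contract the two core chains left to right, carrying a small r_x × r_y matrix of partial sums. It assumes a hypothetical `Core` layout that is not the crate's `TTCore` or `TTVector`. `tt_full` is included only to check the result against a dense dot product.

```rust
/// Hypothetical TT core (not the crate's `TTCore`): shape (r_left, n, r_right),
/// stored row-major so element (a, i, b) lives at (a * n + i) * r_right + b.
struct Core {
    r_left: usize,
    n: usize,
    r_right: usize,
    data: Vec<f32>,
}

impl Core {
    fn at(&self, a: usize, i: usize, b: usize) -> f32 {
        self.data[(a * self.n + i) * self.r_right + b]
    }
}

/// Inner product of two TT vectors with the same mode sizes, computed core by
/// core. `m` is the (x.r_left by y.r_left) matrix of partial sums.
fn tt_dot(x: &[Core], y: &[Core]) -> f32 {
    assert_eq!(x.len(), y.len(), "core counts must match");
    let mut m = vec![1.0f32]; // boundary ranks are 1
    for (cx, cy) in x.iter().zip(y) {
        assert_eq!(cx.n, cy.n, "mode sizes must match");
        let mut next = vec![0.0f32; cx.r_right * cy.r_right];
        for a in 0..cx.r_left {
            for b in 0..cy.r_left {
                let w = m[a * cy.r_left + b];
                for i in 0..cx.n {
                    for a2 in 0..cx.r_right {
                        for b2 in 0..cy.r_right {
                            next[a2 * cy.r_right + b2] += w * cx.at(a, i, a2) * cy.at(b, i, b2);
                        }
                    }
                }
            }
        }
        m = next;
    }
    m[0]
}

/// Expand a TT vector to dense form (used here only to check `tt_dot`).
fn tt_full(x: &[Core]) -> Vec<f32> {
    let mut cur = vec![1.0f32]; // (prefix length) x (current right rank)
    let mut r = 1;
    for c in x {
        let len = cur.len() / r;
        let mut next = vec![0.0f32; len * c.n * c.r_right];
        for p in 0..len {
            for a in 0..c.r_left {
                let w = cur[p * r + a];
                for i in 0..c.n {
                    for b in 0..c.r_right {
                        next[(p * c.n + i) * c.r_right + b] += w * c.at(a, i, b);
                    }
                }
            }
        }
        cur = next;
        r = c.r_right;
    }
    cur
}

/// Two small TT vectors of shape 2 x 3 with internal rank 2.
fn example() -> (Vec<Core>, Vec<Core>) {
    let x = vec![
        Core { r_left: 1, n: 2, r_right: 2, data: vec![1.0, 2.0, 3.0, 4.0] },
        Core { r_left: 2, n: 3, r_right: 1, data: vec![1.0, 0.0, 2.0, 0.0, 1.0, 1.0] },
    ];
    let y = vec![
        Core { r_left: 1, n: 2, r_right: 2, data: vec![1.0, 0.0, 0.0, 1.0] },
        Core { r_left: 2, n: 3, r_right: 1, data: vec![1.0, 1.0, 1.0, 1.0, -1.0, 0.0] },
    ];
    (x, y)
}

fn main() {
    let (x, y) = example();
    let dense: f32 = tt_full(&x).iter().zip(tt_full(&y)).map(|(a, b)| a * b).sum();
    println!("tt_dot = {}, dense dot = {}", tt_dot(&x, &y), dense);
}
```

This naive loop costs O(d·n·r^4) for d cores of mode size n and rank r, and contracting in two stages brings it down to O(d·n·r^3). Either way it never materializes the n^d dense entries. Cosine similarity then follows as dot(x, y) / (sqrt(dot(x, x)) · sqrt(dot(y, y))).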

Modules

decompose: Low-level matrix decomposition primitives for tensor operations.
format: Compressed snapshot format for tensor data.
incremental: Incremental (append-only) snapshot format.
streaming: Streaming compression for memory-bounded snapshot I/O.
streaming_tt: Streaming TT decomposition for memory-bounded I/O.
tensor_train: Tensor Train (TT) decomposition for high-dimensional embedding compression.

Structs

CompressionConfig: Compression configuration for snapshots.
CompressionDefaults: Common embedding dimension constants.
RleEncoded: Run-length encoded data: pairs of (value, count).
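The (value, count) layout described for `RleEncoded` can be sketched with two generic helpers. The names mirror `rle_encode` and `rle_decode`, but the signatures and the `Vec<(T, usize)>` representation are assumptions, not the crate's API.

```rust
/// Hypothetical generic RLE (not the crate's signatures): each run is stored
/// as a (value, count) pair, mirroring the `RleEncoded` description.
fn rle_encode<T: PartialEq + Clone>(data: &[T]) -> Vec<(T, usize)> {
    let mut runs: Vec<(T, usize)> = Vec::new();
    for x in data {
        // Extend the current run if the value repeats.
        if let Some((v, n)) = runs.last_mut() {
            if *v == *x {
                *n += 1;
                continue;
            }
        }
        // Otherwise start a new run of length 1.
        runs.push((x.clone(), 1));
    }
    runs
}

/// Expand each (value, count) pair back into `count` copies of `value`.
fn rle_decode<T: Clone>(runs: &[(T, usize)]) -> Vec<T> {
    let mut out = Vec::new();
    for (v, n) in runs {
        out.extend(std::iter::repeat(v.clone()).take(*n));
    }
    out
}

fn main() {
    let data = [7u32, 7, 7, 0, 0, 3];
    let runs = rle_encode(&data);
    println!("{:?}", runs);
    assert_eq!(rle_decode(&runs), data.to_vec());
}
```

RLE only saves space when runs are long. Input with no repeated neighbours doubles in size, because every value gains a count.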

Enums

TensorMode: Tensor compression mode for vectors and embeddings.

Functions

compress_ids: Combined delta + varint encoding for maximum compression of sorted IDs.
decompress_ids: Decompress delta + varint encoded IDs.
delta_decode: Decode delta-encoded IDs back to original sorted list.
delta_encode: Delta-encode a sorted list of IDs. Stores the first value followed by the differences between consecutive values.
rle_decode: Decode RLE back to original data.
rle_encode: RLE-encode a slice of values.
varint_decode: Decode variable-length encoded bytes back to u64 values.
varint_encode: Variable-length encode u64 values. Uses 7 bits per byte, with the high bit as a continuation flag.
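`compress_ids` is described as delta encoding followed by varint encoding. The sketch below implements that pipeline, assuming ascending input and an LEB128-style varint: the low 7 bits go first, and the high bit is set on every byte except the last. These functions reimplement the idea and are not the crate's definitions. The crate's exact signatures and byte layout may differ.

```rust
/// Delta-encode an ascending list: first value, then gaps between neighbours.
/// Assumes `ids` is sorted ascending (a decrease would underflow).
fn delta_encode(ids: &[u64]) -> Vec<u64> {
    let mut prev = 0u64;
    ids.iter()
        .map(|&id| {
            let gap = id - prev;
            prev = id;
            gap
        })
        .collect()
}

/// Rebuild the original IDs with a running sum of the gaps.
fn delta_decode(deltas: &[u64]) -> Vec<u64> {
    let mut acc = 0u64;
    deltas.iter().map(|&d| { acc += d; acc }).collect()
}

/// LEB128-style varint: 7 payload bits per byte, high bit = "more bytes follow".
fn varint_encode(values: &[u64]) -> Vec<u8> {
    let mut out = Vec::new();
    for &value in values {
        let mut v = value;
        loop {
            let byte = (v & 0x7F) as u8;
            v >>= 7;
            if v == 0 {
                out.push(byte); // last byte: continuation bit clear
                break;
            }
            out.push(byte | 0x80); // more bytes follow
        }
    }
    out
}

/// Inverse of `varint_encode`; assumes well-formed input.
fn varint_decode(bytes: &[u8]) -> Vec<u64> {
    let mut out = Vec::new();
    let (mut cur, mut shift) = (0u64, 0u32);
    for &b in bytes {
        cur |= u64::from(b & 0x7F) << shift;
        if (b & 0x80) == 0 {
            out.push(cur);
            cur = 0;
            shift = 0;
        } else {
            shift += 7;
        }
    }
    out
}

/// Hypothetical stand-ins for the documented `compress_ids` / `decompress_ids`.
fn compress_ids(ids: &[u64]) -> Vec<u8> {
    varint_encode(&delta_encode(ids))
}

fn decompress_ids(bytes: &[u8]) -> Vec<u64> {
    delta_decode(&varint_decode(bytes))
}

fn main() {
    let ids = [1000u64, 1003, 1010, 1200];
    let packed = compress_ids(&ids);
    println!("{} IDs -> {} bytes", ids.len(), packed.len());
    assert_eq!(decompress_ids(&packed), ids.to_vec());
}
```

Small gaps between neighbouring IDs become one-byte varints. For example, [1000, 1003, 1010, 1200] turns into gaps [1000, 3, 7, 190] and packs into 6 bytes instead of 32 bytes as raw u64 values.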
