Crate tensor_store

TensorStore - Unified Storage Layer for Neumann

A thread-safe, sharded key-value store optimized for tensor data:

Dense and sparse vector storage with HNSW indexing
Relational tables with SIMD-accelerated filtering
Graph structures with CSR-optimized traversal
Automatic hot/cold tiering with mmap backing

§Architecture

TensorStore
  +-- SlabRouter (key classification and routing)
  |     +-- MetadataSlab (arbitrary key-value)
  |     +-- EmbeddingSlab (dense embeddings)
  |     +-- RelationalSlab (columnar tables)
  |     +-- GraphTensor (nodes and edges)
  +-- HNSWIndex (similarity search)
  +-- EntityIndex (string <-> ID mapping)
  +-- CacheRing (LRU/LFU eviction)
  +-- TieredStore (hot/cold storage)
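The diagram describes SlabRouter as classifying keys and routing them to one of four backends. How the crate actually classifies keys is not stated here, so the following is only a minimal sketch under the assumption that classification is driven by a key prefix. The `SlabKind` enum, the `classify` function, and the prefixes `emb`, `table`, `node`, and `edge` are all hypothetical and are not part of the `tensor_store` API.

```rust
/// Hypothetical backend kinds, mirroring the four slabs in the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SlabKind {
    Metadata,
    Embedding,
    Relational,
    Graph,
}

/// Classify a key by the text before its first ':'.
/// The prefixes below are assumptions made for illustration only.
pub fn classify(key: &str) -> SlabKind {
    match key.split_once(':').map(|(prefix, _rest)| prefix) {
        Some("emb") => SlabKind::Embedding,
        Some("table") => SlabKind::Relational,
        Some("node") | Some("edge") => SlabKind::Graph,
        // Anything else, including keys without a ':', falls back to
        // the arbitrary key-value slab.
        _ => SlabKind::Metadata,
    }
}
```

Under these assumed prefixes, `classify("emb:42")` yields `SlabKind::Embedding`, while an unrecognized namespace such as `classify("entity:1")` falls back to `SlabKind::Metadata`.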

§Quick Start

use tensor_store::{TensorStore, TensorData, TensorValue, ScalarValue};

let store = TensorStore::new();

// Store a tensor entity
let mut data = TensorData::new();
data.set("name", TensorValue::Scalar(ScalarValue::String("example".into())));
data.set("embedding", TensorValue::Vector(vec![0.1, 0.2, 0.3]));
store.put("entity:1", data).unwrap();

// Retrieve it
let retrieved = store.get("entity:1").unwrap();
assert!(retrieved.has("name"));
assert!(retrieved.has("embedding"));

§Thread Safety

All types use parking_lot locks (no lock poisoning) and sharded designs for high concurrent throughput. Typical performance:

PUT: ~3.2M ops/sec
GET: ~5M ops/sec
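To illustrate why a sharded design helps concurrent throughput, here is a minimal sketch of a map split into independently locked shards. It is not the crate's implementation: `ShardedMap` and its methods are hypothetical names, and `std::sync::RwLock` stands in for the parking_lot locks so that the sketch depends only on the standard library. Unlike parking_lot, the standard locks can be poisoned, which is why the sketch calls `unwrap()` on each lock acquisition.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

/// Hypothetical sharded map: each shard has its own lock, so operations
/// on keys in different shards do not contend with each other.
pub struct ShardedMap {
    shards: Vec<RwLock<HashMap<String, Vec<f32>>>>,
}

impl ShardedMap {
    pub fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        let shards: Vec<_> = (0..shard_count)
            .map(|_| RwLock::new(HashMap::new()))
            .collect();
        Self { shards }
    }

    /// Pick a shard by hashing the key.
    fn shard_for(&self, key: &str) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() % self.shards.len() as u64) as usize
    }

    pub fn put(&self, key: &str, value: Vec<f32>) {
        let idx = self.shard_for(key);
        // Only this one shard is write-locked; the others stay available.
        let mut shard = self.shards[idx].write().unwrap();
        shard.insert(key.to_string(), value);
    }

    pub fn get(&self, key: &str) -> Option<Vec<f32>> {
        let idx = self.shard_for(key);
        let shard = self.shards[idx].read().unwrap();
        shard.get(key).cloned()
    }

    /// Total number of entries, summed across all shards.
    pub fn len(&self) -> usize {
        self.shards.iter().map(|s| s.read().unwrap().len()).sum()
    }
}
```

Because `put` and `get` take `&self`, a `ShardedMap` can be shared across threads by reference (for example inside `std::thread::scope`) or behind an `Arc`, and writers only block each other when their keys hash to the same shard.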

§Module Overview

Module	Purpose
`slab_router`	Key routing and WAL durability
`hnsw`	Hierarchical Navigable Small World index
`sparse_vector`	Memory-efficient sparse embeddings
`delta_vector`	Archetype-based delta compression
`relational_slab`	Column-oriented table storage
`graph_tensor`	CSR graph with BFS/shortest path
`cache_ring`	Fixed-size eviction cache
`tiered`	Hot/cold storage with auto-migration
`mmap`	Memory-mapped cold storage
`consistent_hash`	Partition routing with virtual nodes
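The `consistent_hash` row mentions partition routing with virtual nodes. As a rough illustration of that technique, the sketch below places several virtual points per physical node on a hash ring and routes a key to the first point at or after the key's hash, wrapping around at the end. `HashRing`, `add_node`, `remove_node`, and `node_for` are hypothetical names chosen for this example. They are not the API of `ConsistentHashPartitioner`, whose actual hashing scheme is not described here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical consistent hash ring keyed by position on a u64 circle.
pub struct HashRing {
    ring: BTreeMap<u64, String>,
    virtual_nodes: usize,
}

impl HashRing {
    pub fn new(virtual_nodes: usize) -> Self {
        Self { ring: BTreeMap::new(), virtual_nodes }
    }

    /// Insert `virtual_nodes` points for one physical node. More points
    /// per node spread its share of the key space more evenly.
    pub fn add_node(&mut self, node: &str) {
        for replica in 0..self.virtual_nodes {
            self.ring.insert(hash_of(&(node, replica)), node.to_string());
        }
    }

    /// Remove every virtual point owned by `node`.
    pub fn remove_node(&mut self, node: &str) {
        self.ring.retain(|_, owner| owner.as_str() != node);
    }

    /// Route a key to the first point at or after its hash, wrapping
    /// around to the start of the ring if none is found.
    pub fn node_for(&self, key: &str) -> Option<&str> {
        let h = hash_of(key);
        self.ring
            .range(h..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, owner)| owner.as_str())
    }
}
```

The property this is meant to show is that removing a node only reassigns the keys that node owned; keys owned by the remaining nodes keep their placement.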

Re-exports§

pub use binary_quantization::BinaryThreshold;
pub use binary_quantization::BinaryVector;
pub use blob_log::BlobLog;
pub use blob_log::BlobLogSnapshot;
pub use blob_log::ChunkHash;
pub use cache_ring::CacheRing;
pub use cache_ring::CacheRingSnapshot;
pub use cache_ring::CacheStats;
pub use cache_ring::EvictionScorer;
pub use cache_ring::EvictionStrategy;
pub use consistent_hash::ConsistentHashConfig;
pub use consistent_hash::ConsistentHashPartitioner;
pub use consistent_hash::ConsistentHashStats;
pub use delta_vector::ArchetypeRegistry;
pub use delta_vector::CoverageStats;
pub use delta_vector::DeltaVector;
pub use delta_vector::DeltaVectorError;
pub use delta_vector::KMeans;
pub use delta_vector::KMeansConfig;
pub use delta_vector::KMeansInit;
pub use delta_vector::MAX_DIMENSION as DELTA_MAX_DIMENSION;
pub use distance::DistanceMetric;
pub use distance::GeometricConfig;
pub use durable_blob_log::BlobWalRecord;
pub use durable_blob_log::ChunkLocation;
pub use durable_blob_log::DurableBlobLog;
pub use durable_blob_log::DurableBlobLogConfig;
pub use durable_blob_log::DurableBlobLogError;
pub use durable_blob_log::DurableChunkHash;
pub use embedding_slab::CompressedEmbedding;
pub use embedding_slab::EmbeddingError;
pub use embedding_slab::EmbeddingSlab;
pub use embedding_slab::EmbeddingSlabSnapshot;
pub use embedding_slab::EmbeddingSlot;
pub use entity_index::EntityId;
pub use entity_index::EntityIndex;
pub use entity_index::EntityIndexConfig;
pub use entity_index::EntityIndexError;
pub use entity_index::EntityIndexSnapshot;
pub use entity_index::DEFAULT_MAX_ENTITIES;
pub use graph_tensor::EdgeId;
pub use graph_tensor::GraphTensor;
pub use graph_tensor::GraphTensorSnapshot;
pub use hnsw::EmbeddingStorage;
pub use hnsw::EmbeddingStorageError;
pub use hnsw::HNSWConfig;
pub use hnsw::HNSWDistanceMetric;
pub use hnsw::HNSWIndex;
pub use hnsw::HNSWMemoryStats;
pub use hnsw::ScalarQuantizedVector;
pub use instrumentation::HNSWAccessStats;
pub use instrumentation::HNSWStatsSnapshot;
pub use instrumentation::ShardAccessSnapshot;
pub use instrumentation::ShardAccessTracker;
pub use instrumentation::ShardStatsSnapshot;
pub use ivf::IVFConfig;
pub use ivf::IVFIndex;
pub use ivf::IVFIndexState;
pub use ivf::IVFStorage;
pub use metadata_slab::MetadataSlab;
pub use metadata_slab::MetadataSlabSnapshot;
pub use mmap::MmapError;
pub use mmap::MmapStore;
pub use mmap::MmapStoreBuilder;
pub use mmap::MmapStoreMut;
pub use mmap_regional::CompactionStats;
pub use mmap_regional::RegionalMmapConfig;
pub use mmap_regional::RegionalMmapError;
pub use mmap_regional::RegionalMmapStore;
pub use mmap_regional::SortedRunBuilder;
pub use partitioned::PartitionedError;
pub use partitioned::PartitionedGet;
pub use partitioned::PartitionedPut;
pub use partitioned::PartitionedResult;
pub use partitioned::PartitionedStore;
pub use partitioner::PartitionId;
pub use partitioner::PartitionResult;
pub use partitioner::Partitioner;
pub use partitioner::PhysicalNodeId;
pub use pq::ADCTable;
pub use pq::PQCodebook;
pub use pq::PQConfig;
pub use pq::PQVector;
pub use relational_slab::ColumnDef;
pub use relational_slab::ColumnType;
pub use relational_slab::ColumnValue;
pub use relational_slab::RangeOp;
pub use relational_slab::RelationalError;
pub use relational_slab::RelationalSlab;
pub use relational_slab::RelationalSlabSnapshot;
pub use relational_slab::Row;
pub use relational_slab::RowId;
pub use relational_slab::TableSchema;
pub use semantic_partitioner::EncodedEmbedding;
pub use semantic_partitioner::RoutingMethod;
pub use semantic_partitioner::SemanticPartitionResult;
pub use semantic_partitioner::SemanticPartitioner;
pub use semantic_partitioner::SemanticPartitionerConfig;
pub use semantic_partitioner::SemanticPartitionerStats;
pub use slab_router::SlabRouter;
pub use slab_router::SlabRouterConfig;
pub use slab_router::SlabRouterError;
pub use slab_router::SlabRouterSnapshot;
pub use snapshot::detect_version as snapshot_detect_version;
pub use snapshot::load as snapshot_load;
pub use snapshot::migrate_v2_to_v3 as snapshot_migrate;
pub use snapshot::save_v3 as snapshot_save;
pub use snapshot::HNSWNodeSnapshot;
pub use snapshot::HNSWSnapshot;
pub use snapshot::SnapshotFormatError;
pub use snapshot::SnapshotHeader;
pub use snapshot::SnapshotVersion;
pub use snapshot::V3Snapshot;
pub use snapshot::VoronoiPartitionerConfigSnapshot;
pub use snapshot::VoronoiSnapshot;
pub use sparse_vector::SparseVector;
pub use sparse_vector::SparseVectorError;
pub use sparse_vector::MAX_DIMENSION as SPARSE_MAX_DIMENSION;
pub use tiered::MigrationStrategy;
pub use tiered::TieredConfig;
pub use tiered::TieredError;
pub use tiered::TieredStats;
pub use tiered::TieredStore;
pub use voronoi::LocalityKey;
pub use voronoi::LocalityKeyGenerator;
pub use voronoi::VoronoiPartitioner;
pub use voronoi::VoronoiPartitionerConfig;
pub use voronoi::VoronoiRegion;
pub use wal::SyncMode;
pub use wal::TensorWal;
pub use wal::WalConfig;
pub use wal::WalEntry;
pub use wal::WalError;
pub use wal::WalRecovery;
pub use wal::WalResult;
pub use wal::WalStatus;

Modules§

binary_quantization: Binary Quantization for extreme vector compression.
blob_log: Append-only blob log with segment management.
cache_ring: Fixed-size cache ring with configurable eviction strategies.
consistent_hash: Consistent hash ring partitioner with virtual nodes.
delta_vector: Delta-encoded vectors for efficient storage of clustered embeddings.
distance: Distance metrics for geometric vector operations.
durable_blob_log: Durable blob log with WAL-based crash recovery.
embedding_slab: Dense embedding storage with chunked allocation.
entity_index: Vocabulary-based entity index for O(log n) lookup with stable IDs.
fields: Reserved field prefixes for unified entity storage.
graph_tensor: CSR-based graph storage with append log.
hnsw: HNSW (Hierarchical Navigable Small World) index for approximate nearest neighbor search.
instrumentation: Memory instrumentation for tracking shard and node access patterns.
ivf: IVF (Inverted File Index) for large-scale partitioned search.
metadata_slab: Sharded BTreeMap-based metadata storage slab.
mmap: Memory-mapped cold storage for tensor data.
mmap_regional: Region-aware memory-mapped storage for geometric locality.
partitioned: Partition-aware store wrapper for distributed operations.
partitioner: Data partitioning traits for distributed storage.
pq: Product Quantization for memory-efficient vector storage.
relational_slab: Columnar storage for relational data.
semantic_partitioner: Semantic partitioner for embedding-based data distribution.
slab_router: Slab router for directing operations to specialized storage backends.
snapshot: Snapshot format v2/v3 with backward compatibility.
sparse_vector: Sparse Vector - Storage where zero doesn’t exist.
tiered: Two-tier hot/cold storage with automatic data migration.
voronoi: Voronoi partitioner with explicit geometric region boundaries.
wal: Write-Ahead Log for crash recovery.
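The `graph_tensor` entry above refers to CSR-based graph storage. For readers unfamiliar with the layout, the following is a minimal sketch of compressed sparse row adjacency with a breadth-first traversal. `CsrGraph` and its methods are hypothetical names for this example, not the `GraphTensor` API, and the sketch leaves out the append log that the module description mentions.

```rust
use std::collections::VecDeque;

/// Hypothetical CSR graph: the neighbors of node `v` are stored in
/// `targets[offsets[v]..offsets[v + 1]]`.
pub struct CsrGraph {
    offsets: Vec<usize>,
    targets: Vec<usize>,
}

impl CsrGraph {
    /// Build from directed edges. Panics if an edge names a node id
    /// that is not smaller than `node_count`.
    pub fn from_edges(node_count: usize, edges: &[(usize, usize)]) -> Self {
        // Count the out-degree of each node, shifted by one slot.
        let mut offsets = vec![0usize; node_count + 1];
        for &(src, _) in edges {
            offsets[src + 1] += 1;
        }
        // A prefix sum turns the counts into start offsets.
        for v in 0..node_count {
            offsets[v + 1] += offsets[v];
        }
        // Scatter each edge target into its node's contiguous range.
        let mut cursor = offsets.clone();
        let mut targets = vec![0usize; edges.len()];
        for &(src, dst) in edges {
            assert!(dst < node_count, "edge target out of range");
            targets[cursor[src]] = dst;
            cursor[src] += 1;
        }
        Self { offsets, targets }
    }

    pub fn neighbors(&self, v: usize) -> &[usize] {
        &self.targets[self.offsets[v]..self.offsets[v + 1]]
    }

    /// Hop distance from `start` to every node; `None` means unreachable.
    pub fn bfs_distances(&self, start: usize) -> Vec<Option<usize>> {
        let node_count = self.offsets.len() - 1;
        let mut dist = vec![None; node_count];
        let mut queue = VecDeque::new();
        dist[start] = Some(0);
        queue.push_back(start);
        while let Some(v) = queue.pop_front() {
            let next = dist[v].map(|d| d + 1);
            for &n in self.neighbors(v) {
                if dist[n].is_none() {
                    dist[n] = next;
                    queue.push_back(n);
                }
            }
        }
        dist
    }
}
```

Each neighbor list is a contiguous slice, so a traversal scans memory sequentially instead of following per-edge pointers, which is the usual reason to choose CSR for read-heavy graph workloads.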

Structs§

BloomFilter: Thread-safe Bloom filter for fast negative lookups.
EntityStore: Unified entity store that provides a shared storage layer for all engines.
TensorData: An entity that can hold scalar properties, vector embeddings, and pointers to other tensors.
TensorStore: Thread-safe key-value store backed by SlabRouter.

Enums§

ScalarValue: Scalar value types for entity properties.
SnapshotError: Errors that can occur during snapshot operations.
TensorStoreError: Errors that can occur during tensor store operations.
TensorValue: Represents different types of values a tensor can hold.

Constants§

DEFAULT_SPARSITY_THRESHOLD: Default sparsity threshold for auto-sparsification (70%).
DEFAULT_VALUE_THRESHOLD: Default value threshold for pruning small values.
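To make the interplay of the two constants concrete, here is a minimal sketch of how a sparsity threshold and a value threshold could drive auto-sparsification. It rests on several assumptions: the 0.7 value mirrors the documented 70% default, but the value threshold of `1e-6`, the `>=` comparison, and the `Stored` and `auto_sparsify` names are hypothetical and chosen only for illustration. The crate's actual rule may differ.

```rust
/// Mirrors the documented 70% default; the name is local to this sketch.
const SPARSITY_THRESHOLD: f32 = 0.7;
/// Assumed for illustration; the crate's actual default is not stated here.
const VALUE_THRESHOLD: f32 = 1e-6;

/// Hypothetical storage choice made by the sketch.
#[derive(Debug, PartialEq)]
pub enum Stored {
    Dense(Vec<f32>),
    Sparse { dimension: usize, entries: Vec<(usize, f32)> },
}

/// Store as sparse when enough components are (near) zero.
pub fn auto_sparsify(values: Vec<f32>) -> Stored {
    if values.is_empty() {
        return Stored::Dense(values);
    }
    // Components with magnitude at or below the value threshold count as zero.
    let near_zero = values
        .iter()
        .filter(|v| v.abs() <= VALUE_THRESHOLD)
        .count();
    let sparsity = near_zero as f32 / values.len() as f32;

    if sparsity >= SPARSITY_THRESHOLD {
        // Keep only (index, value) pairs for the significant components.
        let entries = values
            .iter()
            .copied()
            .enumerate()
            .filter(|(_, v)| v.abs() > VALUE_THRESHOLD)
            .collect();
        Stored::Sparse { dimension: values.len(), entries }
    } else {
        Stored::Dense(values)
    }
}
```

With these assumed thresholds, a 10-component vector with 8 zeros has a sparsity of 0.8 and is stored as index/value pairs, while a fully populated vector stays dense.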

Type Aliases§

Result: Result type for tensor store operations.
SnapshotResult: Result type for snapshot operations.
