Crate vecdb

§vecdb

High-performance mutable persistent vectors built on rawdb.

§Features

  • Vec-like API: push, update, truncate, delete by index with sparse holes
  • Multiple storage formats:
    • Raw: BytesVec, ZeroCopyVec (uncompressed)
    • Compressed: PcoVec, LZ4Vec, ZstdVec
  • Computed vectors: EagerVec (stored computations), LazyVecFrom1/2/3 (on-the-fly computation)
  • Rollback support: Time-travel via stamped change deltas without full snapshots
  • Sparse deletions: Delete elements leaving holes, no reindexing required
  • Thread-safe: Concurrent reads with exclusive writes
  • Fast: see the vecdb_bench benchmarks for measurements
  • Lazy persistence: Changes buffered in memory, persisted only on explicit flush()

§Not Suited For

  • Key-value storage - Use fjall or redb
  • Variable-sized types - Types like String, Vec<T>, or dynamic structures
  • ACID transactions - No transactional guarantees (use explicit rollback instead)

§Install

cargo add vecdb

§Quick Start

use vecdb::{
    AnyStoredVec, BytesVec, Database, GenericStoredVec,
    ImportableVec, Result, Version
};
use std::path::Path;

fn main() -> Result<()> {
    // Open database
    let db = Database::open(Path::new("data"))?;

    // Create vector with index type usize and value type u64
    let mut vec: BytesVec<usize, u64> =
        BytesVec::import(&db, "my_vec", Version::TWO)?;

    // Push values (buffered in memory)
    for i in 0..1_000_000 {
        vec.push(i);
    }

    // Flush writes to rawdb region and syncs to disk
    vec.flush()?;  // Calls write() internally then flushes region
    db.flush()?;   // Syncs database metadata

    // Sequential iteration
    let mut sum = 0u64;
    for value in vec.iter()? {
        sum = sum.wrapping_add(value);
    }

    // Random access
    let reader = vec.create_reader();
    for i in [500, 1000, 10] {
        if let Ok(value) = vec.read_at(i, &reader) {
            println!("vec[{}] = {}", i, value);
        }
    }

    Ok(())
}

§Type Constraints

vecdb works with fixed-size types:

  • Numeric primitives: u8, i32, f64, etc.
  • Fixed arrays: [T; N]
  • Structs with #[repr(C)]
  • Types implementing zerocopy::FromBytes + zerocopy::AsBytes (for ZeroCopyVec)
  • Types implementing Bytes trait (for BytesVec, LZ4Vec, ZstdVec)
  • Numeric types implementing Pco trait (for PcoVec)

Use #[derive(Bytes)] or #[derive(Pco)] from vecdb_derive to enable custom wrapper types.
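
Fixed-size values matter because the byte offset of element i is simply i * size, so no per-element length has to be stored. The sketch below illustrates that idea with the standard library only. The FixedBytes trait and the read_value/push_value helpers are hypothetical stand-ins, not vecdb's actual Bytes trait, whose method names and signatures may differ.

```rust
/// Hypothetical stand-in for a fixed-size serialization trait (NOT vecdb's `Bytes`).
trait FixedBytes: Sized {
    /// Number of bytes every value occupies.
    const SIZE: usize;
    fn to_le(&self) -> Vec<u8>;
    fn from_le(bytes: &[u8]) -> Option<Self>;
}

/// Single-field wrapper type, similar in spirit to the `UserId` example.
#[derive(Debug, Clone, Copy, PartialEq)]
struct UserId(u64);

impl FixedBytes for UserId {
    const SIZE: usize = 8;

    fn to_le(&self) -> Vec<u8> {
        self.0.to_le_bytes().to_vec()
    }

    fn from_le(bytes: &[u8]) -> Option<Self> {
        if bytes.len() != Self::SIZE {
            return None;
        }
        let mut arr = [0u8; 8];
        arr.copy_from_slice(bytes);
        Some(UserId(u64::from_le_bytes(arr)))
    }
}

/// Append one value to a flat byte buffer.
fn push_value<T: FixedBytes>(buf: &mut Vec<u8>, value: &T) {
    buf.extend_from_slice(&value.to_le());
}

/// Read element `index` by computing its byte offset directly.
fn read_value<T: FixedBytes>(buf: &[u8], index: usize) -> Option<T> {
    let start = index.checked_mul(T::SIZE)?;
    let end = start.checked_add(T::SIZE)?;
    T::from_le(buf.get(start..end)?)
}
```

A variable-sized type such as String cannot offer a constant SIZE, which is why it does not fit this layout.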

§Vector Variants

§Raw (Uncompressed)

BytesVec<I, T> - Custom serialization via Bytes trait

use vecdb::{Bytes, BytesVec, ImportableVec, Version};

#[derive(Bytes)]
struct UserId(u64);

let mut vec: BytesVec<usize, UserId> =
    BytesVec::import(&db, "users", Version::TWO)?;

ZeroCopyVec<I, T> - Zero-copy mmap access (fastest random reads)

use vecdb::{ImportableVec, Version, ZeroCopyVec};

let mut vec: ZeroCopyVec<usize, u32> =
    ZeroCopyVec::import(&db, "raw", Version::TWO)?;

§Compressed

PcoVec<I, T> - Pcodec compression (best for numeric data, excellent compression ratios)

use vecdb::{ImportableVec, PcoVec, Version};

let mut vec: PcoVec<usize, f64> =
    PcoVec::import(&db, "prices", Version::TWO)?;

LZ4Vec<I, T> - LZ4 compression (fast, general-purpose)

use vecdb::{ImportableVec, LZ4Vec, Version};

let mut vec: LZ4Vec<usize, [u8; 16]> =
    LZ4Vec::import(&db, "hashes", Version::TWO)?;

ZstdVec<I, T> - Zstd compression (high compression ratio, general-purpose)

use vecdb::{ImportableVec, Version, ZstdVec};

let mut vec: ZstdVec<usize, u64> =
    ZstdVec::import(&db, "data", Version::TWO)?;

§Computed Vectors

EagerVec<V> - Wraps any stored vector to enable eager computation methods

Stores computed results on disk, incrementally updating when source data changes. Use for derived metrics, aggregations, transformations, moving averages, etc.

use vecdb::{BytesVec, EagerVec, ImportableVec, Version};

let mut derived: EagerVec<BytesVec<usize, f64>> =
    EagerVec::import(&db, "derived", Version::TWO)?;

// Compute methods store results on disk
// derived.compute_add(&source1, &source2)?;
// derived.compute_sma(&source, 20)?;

LazyVecFrom1/2/3<...> - Lazily computed vectors from 1-3 source vectors

Values are computed on the fly during iteration; nothing is stored on disk. Use for temporary views or simple transformations.

use vecdb::LazyVecFrom1;

let lazy = LazyVecFrom1::init(
    "computed",
    Version::TWO,
    source.boxed(),
    |i, source_iter| source_iter.get(i).map(|v| v * 2)
);

// Computed during iteration, not stored
for value in lazy.iter() {
    // ...
}
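
The difference between the two kinds of computed vector can be sketched without vecdb at all. In the std-only illustration below, eager_double plays the role of an eager computation (results are materialized once and kept), while LazyView plays the role of a lazy one (only the source and a function are held). Both names are hypothetical and are not part of the vecdb API.

```rust
/// Eager: compute every value once and keep the results.
fn eager_double(source: &[u64]) -> Vec<u64> {
    source.iter().map(|v| v * 2).collect()
}

/// Lazy: hold a reference to the source plus a function; compute on access.
struct LazyView<'a, F: Fn(u64) -> u64> {
    source: &'a [u64],
    f: F,
}

impl<'a, F: Fn(u64) -> u64> LazyView<'a, F> {
    /// Same length as the source; no extra storage is allocated.
    fn len(&self) -> usize {
        self.source.len()
    }

    /// Apply the function to the source value at `index`, if it exists.
    fn get(&self, index: usize) -> Option<u64> {
        self.source.get(index).map(|v| (self.f)(*v))
    }
}
```

The eager form pays storage to avoid recomputation; the lazy form recomputes on every access and therefore suits cheap transformations.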

§Core Operations

§Write and Persistence

// Push values (buffered in memory)
vec.push(42);
vec.push(100);

// write() moves pushed values to storage (visible for reads)
vec.write()?;

// flush() calls write() + region().flush() for durability
vec.flush()?;
db.flush()?;   // Also flush database metadata
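
The split between buffering and writing can be modeled in a few lines. This is a toy sketch under stated assumptions, not vecdb's implementation: PushBuffer and write_to are hypothetical names, and any std::io::Write sink stands in for the rawdb region.

```rust
use std::io::{self, Write};

/// Toy model of an in-memory push buffer (NOT vecdb's implementation).
struct PushBuffer {
    pushed: Vec<u64>, // values pushed since the last write
}

impl PushBuffer {
    fn new() -> Self {
        PushBuffer { pushed: Vec::new() }
    }

    /// Buffer a value in memory; nothing reaches the sink yet.
    fn push(&mut self, value: u64) {
        self.pushed.push(value);
    }

    /// Number of values waiting to be written.
    fn pending(&self) -> usize {
        self.pushed.len()
    }

    /// Write buffered values to `out` as little-endian bytes, then clear the buffer.
    /// Returns how many values were written.
    fn write_to<W: Write>(&mut self, out: &mut W) -> io::Result<usize> {
        for v in &self.pushed {
            out.write_all(&v.to_le_bytes())?;
        }
        let n = self.pushed.len();
        self.pushed.clear();
        Ok(n)
    }
}
```

With a real std::fs::File as the sink, calling sync_all() after write_to would correspond to the durability step; until then the operating system may still hold the bytes in its cache.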

§Updates and Deletions

// Update element at index (works on stored data)
vec.update(5, 999)?;

// Delete element (creates a hole at that index)
let reader = vec.create_reader();
vec.take(10, &reader)?;
drop(reader);

// Holes are tracked and can be checked
if vec.holes().contains(&10) {
    println!("Index 10 is a hole");
}

// Reading a hole returns None
let reader = vec.create_reader();
assert_eq!(vec.get_any_or_read(10, &reader)?, None);
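
Why holes avoid reindexing can be seen in a small std-only model. SparseVec below is a hypothetical illustration, not vecdb's data structure: deleted indices are recorded in a set, and every other element keeps its original index.

```rust
use std::collections::BTreeSet;

/// Toy model of a vector with sparse holes (NOT vecdb's implementation).
struct SparseVec {
    values: Vec<u64>,
    holes: BTreeSet<usize>, // indices whose values were taken
}

impl SparseVec {
    fn from_values(values: Vec<u64>) -> Self {
        SparseVec { values, holes: BTreeSet::new() }
    }

    /// Remove the value at `index`, leaving a hole. Other indices are unchanged.
    /// Returns None if the index is out of bounds or already a hole.
    fn take(&mut self, index: usize) -> Option<u64> {
        if index >= self.values.len() || self.holes.contains(&index) {
            return None;
        }
        self.holes.insert(index);
        Some(self.values[index])
    }

    /// Read a value; holes and out-of-bounds indices both yield None.
    fn get(&self, index: usize) -> Option<u64> {
        if self.holes.contains(&index) {
            None
        } else {
            self.values.get(index).copied()
        }
    }
}
```

Deleting is a set insertion rather than a shift of every later element, which is what keeps it cheap for large vectors.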

§Rollback with Stamps

Rollback uses stamped change deltas - lightweight compared to full snapshots.

use vecdb::Stamp;

// Create initial state
vec.push(100);
vec.push(200);
vec.stamped_write_with_changes(Stamp::new(1))?;

// Make more changes
vec.push(300);
vec.update(0, 999)?;
vec.stamped_write_with_changes(Stamp::new(2))?;

// Rollback to previous stamp (undoes changes from stamp 2)
vec.rollback()?;
assert_eq!(vec.stamp(), Stamp::new(1));

// Rollback before a stamp (undoes everything including stamp 1)
vec.rollback_before(Stamp::new(1))?;
assert_eq!(vec.stamp(), Stamp::new(0));

Configure number of stamps to keep:

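use vecdb::ImportOptions;

// Name the target type so the tuple conversion is unambiguous
let options = ImportOptions::from((&db, "vec", Version::TWO))
    .with_saved_stamped_changes(10);  // Keep last 10 stamps
let vec: BytesVec<usize, u64> = BytesVec::import_with(options)?;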
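
The idea behind stamped change deltas can be sketched as an undo log: each stamped write records only the previous length and the values that were overwritten, which is enough to restore the earlier state without a full copy. The model below is a std-only illustration under that assumption. StampedVec, Delta, and their fields are hypothetical, and vecdb's on-disk format is not shown here.

```rust
/// Everything needed to undo one stamped write (toy model, NOT vecdb's format).
struct Delta {
    prev_stamp: u64,
    prev_len: usize,
    overwritten: Vec<(usize, u64)>, // (index, old value), in update order
}

struct StampedVec {
    values: Vec<u64>,
    stamp: u64,
    written_len: usize,         // length at the last stamped write
    pending: Vec<(usize, u64)>, // old values overwritten since then
    history: Vec<Delta>,        // oldest first
    max_history: usize,         // how many deltas to keep
}

impl StampedVec {
    fn new(max_history: usize) -> Self {
        StampedVec {
            values: Vec::new(),
            stamp: 0,
            written_len: 0,
            pending: Vec::new(),
            history: Vec::new(),
            max_history,
        }
    }

    fn push(&mut self, value: u64) {
        self.values.push(value);
    }

    fn update(&mut self, index: usize, value: u64) -> bool {
        if index >= self.values.len() {
            return false;
        }
        // Values pushed after the last write vanish on truncation, so only
        // older slots need their previous value recorded.
        if index < self.written_len {
            self.pending.push((index, self.values[index]));
        }
        self.values[index] = value;
        true
    }

    /// Record a delta for the changes since the last write and adopt `stamp`.
    fn stamped_write(&mut self, stamp: u64) {
        let overwritten = std::mem::replace(&mut self.pending, Vec::new());
        self.history.push(Delta {
            prev_stamp: self.stamp,
            prev_len: self.written_len,
            overwritten,
        });
        if self.history.len() > self.max_history {
            self.history.remove(0); // drop the oldest delta
        }
        self.stamp = stamp;
        self.written_len = self.values.len();
    }

    /// Undo the most recent stamped write. Returns false if no delta is kept.
    fn rollback(&mut self) -> bool {
        let delta = match self.history.pop() {
            Some(d) => d,
            None => return false,
        };
        // Undo newest first so the oldest recorded value wins for repeated updates.
        for &(index, old) in self.pending.iter().rev().chain(delta.overwritten.iter().rev()) {
            self.values[index] = old;
        }
        self.pending.clear();
        self.values.truncate(delta.prev_len);
        self.written_len = delta.prev_len;
        self.stamp = delta.prev_stamp;
        true
    }
}
```

Each delta costs space proportional to the number of changed elements rather than to the vector's length, and capping max_history bounds how far back a rollback can go.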

§When To Use

Perfect for:

  • Storing large Vecs persistently on disk
  • Append-only or append-mostly workloads
  • High-speed sequential reads
  • High-speed random reads (improved with ZeroCopyVec)
  • Space-efficient storage for numeric time series (improved with PcoVec)
  • Sparse deletions without reindexing
  • Lightweight rollback without full snapshots
  • Derived computations stored on disk (with EagerVec)

Not ideal for:

  • Heavy random write workloads
  • Frequent insertions in the middle
  • Variable-length data (strings, nested vectors)
  • ACID transaction requirements
  • Key-value lookups (use a proper key-value store)

§Feature Flags

No features are enabled by default. Enable only what you need:

cargo add vecdb  # BytesVec only, no compression or optional features

Available features:

  • pco - Pcodec compression support (PcoVec)
  • zerocopy - Zero-copy mmap access (ZeroCopyVec)
  • lz4 - LZ4 compression support (LZ4Vec)
  • zstd - Zstd compression support (ZstdVec)
  • derive - Derive macros for Bytes and Pco traits
  • serde - Serde serialization support
  • serde_json - JSON output using serde_json
  • sonic-rs - Faster JSON using sonic-rs

With Pcodec compression:

cargo add vecdb --features pco,derive

With all storage formats:

cargo add vecdb --features pco,zerocopy,lz4,zstd,derive

§Examples

Comprehensive examples live in the examples/ directory. Run them with:

cargo run --example zerocopy --features zerocopy
cargo run --example pcodec --features pco

§Performance

See vecdb_bench for detailed benchmarks.

vecdb is significantly faster than general-purpose embedded databases for fixed-size data workloads.

Structs§

BytesStrategy
Serialization strategy using the Bytes trait with portable byte order.
BytesVec
Raw storage vector using explicit byte serialization in little-endian format.
CleanCompressedVecIterator
Clean compressed vec iterator for reading stored compressed data. Uses a dedicated file handle for sequential reads (better OS readahead than mmap).
CleanRawVecIterator
Clean raw vec iterator for reading stored data without holes/updates
Database
Memory-mapped database with dynamic space allocation and hole punching.
DirtyCompressedVecIterator
Dirty compressed vec iterator, handles pushed values on top of stored data
DirtyRawVecIterator
Dirty raw vec iterator, full-featured with holes/updates/pushed support
Divide
(a, b) -> a / b
EagerVec
Wrapper for computing and storing derived values from source vectors.
Exit
Graceful shutdown coordinator for ensuring data consistency during program exit.
Halve
v -> v / 2
Header
Ident
v -> v
ImportOptions
Options for importing or creating stored vectors.
LZ4Strategy
LZ4 compression strategy for fast compression/decompression.
LZ4Vec
Compressed storage using LZ4 for speed-optimized general-purpose compression.
LazyVecFrom1
Lazily computed vector deriving values on-the-fly from one source vector.
LazyVecFrom2
Lazily computed vector deriving values from two source vectors.
LazyVecFrom3
Lazily computed vector deriving values from three source vectors.
LazyVecFrom1Iterator
LazyVecFrom2Iterator
LazyVecFrom3Iterator
Minus
(a, b) -> a - b
Negate
v -> -v
PcoVec
Compressed storage using Pcodec for optimal numeric data compression.
PcodecStrategy
Pcodec compression strategy for numerical data.
Plus
(a, b) -> a + b
RawVecInner
Core implementation for raw storage vectors shared by BytesVec and ZeroCopyVec.
Reader
Zero-copy reader for accessing region data from memory-mapped storage.
SharedLen
Atomic length counter shared across clones.
Stamp
Marker for tracking when data was last modified.
StoredLen
Stored length with rollback support.
Times
(a, b) -> a * b
VecIteratorWriter
Iterator-backed writer that formats values as CSV.
Version
Version tracking for data schema and computed values.
WithPrev
Wrapper that tracks both current and previous values for rollback support.
ZeroCopyStrategy
Serialization strategy using zerocopy for native byte order access.
ZeroCopyVec
Raw storage vector using zerocopy for direct memory mapping in native byte order.
ZstdStrategy
Zstd compression strategy for high compression ratios.
ZstdVec
Compressed storage using Zstd for maximum general-purpose compression.

Enums§

CompressedVecIterator
Automatically selected iterator for compressed vectors based on their state.
Error
Error types for vecdb operations.
Format
Storage format selection for stored vectors.
RawDBError
Error types for rawdb operations.
RawVecIterator
Automatically selected iterator for raw vectors based on their state.

Constants§

PAGE_SIZE

Traits§

AnyCollectableVec
Type-erased trait for collectable vectors.
AnyExportableVec
Type-erased trait for vectors that are both writable and serializable. This trait is automatically implemented for any type that implements both AnyWritableVec and AnySerializableVec.
AnySerializableVec
Type-erased trait for serializable vectors.
AnyStoredVec
Trait for stored vectors that persist data to disk (as opposed to lazy computed vectors).
AnyVec
Common trait for all vectors providing metadata and utility methods.
AnyVecWithSchema
Trait for vectors whose value type implements JsonSchema. Provides access to the JSON Schema of the value type.
AnyVecWithWriter
AsInnerSlice
Convert a slice of PcoVecValue to a slice of the underlying Number type.
BinaryTransform
Trait for binary transforms applied lazily during iteration. Zero-sized types implementing this get monomorphized (zero runtime cost).
Bytes
Trait for types that can be serialized to/from bytes with explicit byte order.
BytesExt
BytesVecValue
Value trait for BytesVec. Extends RawVecValue with Bytes trait for custom serialization.
CheckedSub
CollectableVec
Trait for vectors that can be collected into standard Rust collections with range support.
Formattable
FromCoarserIndex
Maps coarser-grained indices to ranges of finer-grained indices.
FromInnerSlice
Convert a Vec of Number type to a Vec of PcoVecValue.
GenericStoredVec
ImportableVec
Trait for types that can be imported from a database.
IterableCloneableVec
Trait for iterable vectors that can be cloned as trait objects.
IterableStoredVec
Trait combining stored and iterable vector capabilities.
IterableVec
Trait for vectors that can be iterated.
LZ4VecValue
Value trait for LZ4Vec. Extends VecValue with Bytes trait for byte serialization.
Pco
PcoVecValue
PrintableIndex
Provides string representations of index types for display and region naming.
RawStrategy
Serialization strategy for raw storage vectors.
SaturatingAdd
StoredVec
Super trait combining all common stored vec traits.
TransparentPco
TypedVec
A vector with statically-known index and value types.
TypedVecIterator
Extended vector iterator with type-safe index operations.
UnaryTransform
Trait for unary transforms applied lazily during iteration. Zero-sized types implementing this get monomorphized (zero runtime cost).
ValueWriter
Stateful writer for streaming values one at a time to a string buffer.
VecIndex
Trait for types that can be used as vector indices.
VecIterator
Base trait for vector iterators with positioning capabilities.
VecValue
Marker trait for types that can be stored as values in a vector.
ZeroCopyVecValue
Value trait for ZeroCopyVec. Extends RawVecValue with zerocopy bounds for direct memory mapping.
ZstdVecValue
Value trait for ZstdVec. Extends VecValue with Bytes trait for byte serialization.
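
The note on UnaryTransform and BinaryTransform about zero-sized types can be illustrated with a std-only sketch. The BinaryOp trait, the ZipWith iterator, and zip_with below are hypothetical and only mirror the idea; the real traits may have a different shape. Because the operation is a type parameter, each instantiation is compiled separately and the marker types occupy no memory.

```rust
use std::marker::PhantomData;

/// Hypothetical binary operation trait (NOT vecdb's `BinaryTransform`).
trait BinaryOp {
    fn apply(a: u64, b: u64) -> u64;
}

/// Zero-sized marker types: behavior lives in the type, not in data.
struct Plus;
struct Times;

impl BinaryOp for Plus {
    fn apply(a: u64, b: u64) -> u64 {
        a + b
    }
}

impl BinaryOp for Times {
    fn apply(a: u64, b: u64) -> u64 {
        a * b
    }
}

/// Iterator that combines two slices lazily using the operation `T`.
struct ZipWith<'a, T> {
    a: std::slice::Iter<'a, u64>,
    b: std::slice::Iter<'a, u64>,
    _op: PhantomData<T>,
}

impl<'a, T: BinaryOp> Iterator for ZipWith<'a, T> {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        match (self.a.next(), self.b.next()) {
            (Some(x), Some(y)) => Some(T::apply(*x, *y)),
            _ => None, // stop at the shorter input
        }
    }
}

/// Build the lazy iterator; nothing is computed until it is consumed.
fn zip_with<'a, T: BinaryOp>(a: &'a [u64], b: &'a [u64]) -> ZipWith<'a, T> {
    ZipWith { a: a.iter(), b: b.iter(), _op: PhantomData }
}
```

Choosing the operation is then a matter of naming a type, for example zip_with::<Plus>(&a, &b).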

Functions§

i64_to_usize
Converts an i64 index to usize, supporting negative indexing. Negative indices count from the end.
likely
short_type_name
Extracts the short type name from a full type path and caches it.
unlikely
vec_region_name
Returns the region name for the given vector name.
vec_region_name_with
Returns the region name for the given vector name.
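
Negative indexing, as described for i64_to_usize, can be sketched as follows. The signature of the real function is not shown on this page, so resolve_index is a hypothetical stand-in, and clamping out-of-range negative indices to 0 is an assumption of this sketch rather than documented behavior.

```rust
/// Hypothetical helper illustrating negative indexing (NOT vecdb's `i64_to_usize`).
/// Non-negative indices pass through; negative ones count back from `len`.
fn resolve_index(index: i64, len: usize) -> usize {
    if index >= 0 {
        index as usize
    } else {
        // Assumption: clamp at 0 instead of underflowing when |index| > len.
        let back = index.unsigned_abs() as usize;
        len.saturating_sub(back)
    }
}
```

With a length of 5, an index of -1 resolves to 4, the last element.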

Type Aliases§

BoxedVecIterator
Type alias for boxed vector iterators.
BytesVecIterator
Type alias for BytesVec iterator
CleanBytesVecIterator
Type alias for clean BytesVec iterator
CleanLZ4VecIterator
Type alias for clean LZ4Vec iterator
CleanPcodecVecIterator
Type alias for clean PcoVec iterator
CleanZeroCopyVecIterator
Type alias for clean ZeroCopyVec iterator
CleanZstdVecIterator
Type alias for clean ZstdVec iterator
ComputeFrom1
ComputeFrom2
ComputeFrom3
DirtyBytesVecIterator
Type alias for dirty BytesVec iterator
DirtyLZ4VecIterator
Type alias for dirty LZ4Vec iterator
DirtyPcodecVecIterator
Type alias for dirty PcoVec iterator
DirtyZeroCopyVecIterator
Type alias for dirty ZeroCopyVec iterator
DirtyZstdVecIterator
Type alias for dirty ZstdVec iterator
IterableBoxedVec
Type alias for boxed cloneable iterable vectors.
LZ4VecIterator
Type alias for LZ4Vec iterator
PcodecVecIterator
Type alias for PcoVec iterator
Result
ZeroCopyVecIterator
Type alias for ZeroCopyVec iterator
ZstdVecIterator
Type alias for ZstdVec iterator

Derive Macros§

Bytes
Derives the Bytes trait for single-field tuple structs.
Pco
Derives the Pco trait for single-field tuple structs containing numeric types.