Crate vecdb

§[vecdb]

A KISS (Keep It Simple, Stupid) index-value storage engine optimized for columnar data with transparent compression support.

§Overview

VecDB is an embedded database engine designed for high-performance columnar storage. It provides vector-like data structures that can be persisted to disk with optional compression, making it ideal for analytical workloads and time-series data.

§Key Features

  • Columnar storage: Optimized for analytical queries and data compression
  • Embedded: No separate server process - runs directly in your application
  • Index-free: A value's position in the vector is its key, so no separate key is stored
  • Value-focused: Only actual values are stored, maximizing space efficiency
  • Dual storage modes: Choose between raw (fast access) or compressed (space efficient) storage
  • Transactional: ACID-compliant operations with proper isolation
  • Multi-reader/writer: Concurrent access support with fine-grained locking
  • Performance-optimized: Non-portable design choices for maximum speed on supported platforms
  • Unix-focused: Primarily designed for Unix-like systems
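
To make the "position is the key" idea concrete, here is a minimal sketch that uses only the standard library. It is not VecDB's on-disk format (the real files may carry a header and other metadata); `byte_offset` and `read_u32` are hypothetical names introduced for illustration. The point is that, for fixed-size values, locating element `i` needs only a multiplication, never a key lookup.

```rust
use std::mem::size_of;

/// Hypothetical helper: byte offset of element `index` in a flat buffer of
/// fixed-size values of type `T`. Returns `None` if the multiplication overflows.
fn byte_offset<T>(index: usize) -> Option<usize> {
    index.checked_mul(size_of::<T>())
}

/// Reads the little-endian `u32` stored at position `index` of `bytes`,
/// where `bytes` stands in for the contents of a raw column file.
/// Returns `None` when the element lies outside the buffer.
fn read_u32(bytes: &[u8], index: usize) -> Option<u32> {
    let start = byte_offset::<u32>(index)?;
    let end = start.checked_add(size_of::<u32>())?;
    let mut chunk = [0u8; 4];
    chunk.copy_from_slice(bytes.get(start..end)?);
    Some(u32::from_le_bytes(chunk))
}
```

Because no key is stored next to each value, a column of `n` values of type `u32` occupies `4 * n` bytes in this model.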

§Storage Variants

VecDB supports multiple vector implementations for different use cases:

§Raw Vectors (RawVec)

  • Direct, uncompressed storage for maximum read/write speed
  • Ideal for frequently accessed data and real-time applications

§Compressed Vectors (CompressedVec)

  • Advanced compression using pco (Pcodec) for numerical data
  • Smaller on-disk footprint in exchange for slower reads and writes than RawVec
  • Perfect for analytical workloads and archival data
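
As a rough intuition for why ordered numerical columns tend to compress well, the sketch below uses plain delta encoding. This is a different and much simpler technique than what pco implements, and it is not how CompressedVec stores data; `delta_encode` and `delta_decode` are hypothetical names. It only shows that a run of nearby values can be rewritten as small numbers, which a later entropy-coding stage could then pack tightly.

```rust
/// Replaces each value with its difference from the previous one
/// (the first value is kept as a difference from zero).
fn delta_encode(values: &[u32]) -> Vec<i64> {
    let mut previous = 0i64;
    values
        .iter()
        .map(|&v| {
            let delta = i64::from(v) - previous;
            previous = i64::from(v);
            delta
        })
        .collect()
}

/// Inverse of `delta_encode`: a running sum restores the original values.
fn delta_decode(deltas: &[i64]) -> Vec<u32> {
    let mut running = 0i64;
    deltas
        .iter()
        .map(|&d| {
            running += d;
            running as u32
        })
        .collect()
}
```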

§Computed Vectors

  • On-the-fly computation from other vectors
  • Lazy evaluation for derived data sets
  • Supports computations over one to three input vectors (ComputedVecFrom1, ComputedVecFrom2, ComputedVecFrom3)
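
The idea of a lazily computed column can be sketched with the standard library alone. The type below is a hypothetical stand-in, not the crate's ComputedVec or LazyVecFrom2 API: it stores no derived values and evaluates the function only when an index is requested.

```rust
/// Hypothetical model of a column derived on demand from two source columns.
struct LazyFrom2<'a, A, B, T> {
    left: &'a [A],
    right: &'a [B],
    compute: fn(&A, &B) -> T,
}

impl<'a, A, B, T> LazyFrom2<'a, A, B, T> {
    /// The derived column is as long as the shorter of its two inputs.
    fn len(&self) -> usize {
        self.left.len().min(self.right.len())
    }

    /// Computes the value at `index` on demand; nothing is cached.
    fn get(&self, index: usize) -> Option<T> {
        let a = self.left.get(index)?;
        let b = self.right.get(index)?;
        Some((self.compute)(a, b))
    }

    /// Materializes the whole derived column.
    fn collect(&self) -> Vec<T> {
        (0..self.len()).filter_map(|i| self.get(i)).collect()
    }
}
```

A one-input or three-input variant would follow the same shape with one or three source slices.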

§Eager/Lazy Variants

  • Different loading and caching strategies (see EagerVec and LazyVecFrom1, LazyVecFrom2, LazyVecFrom3)
  • Optimized for various memory and performance constraints

§Example Usage

§Raw Storage

use std::path::Path;
use vecdb::{AnyStoredVec, Database, GenericStoredVec, RawVec, Version};

let database = Database::open(Path::new("data"))?;
let mut vec: RawVec<usize, u32> = RawVec::forced_import(&database, "my_vec", Version::TWO)?;

// Push values
vec.push(42);
vec.push(84);

// Read values
let reader = vec.create_reader();
let value = vec.get_or_read(0, &reader)?; // Returns Result<Option<Cow<u32>>>

// Persist to disk
vec.flush()?;

§Compressed Storage

use std::path::Path;
use vecdb::{AnyStoredVec, CompressedVec, Database, GenericStoredVec, Version};

let database = Database::open(Path::new("data"))?;
let mut vec: CompressedVec<usize, u32> = CompressedVec::forced_import(&database, "compressed_vec", Version::TWO)?;

// Same API as raw vectors, but with compression
vec.push(1000);
vec.flush()?;

§Architecture

VecDB is built on top of SeqDB for low-level storage management and provides:

  • Type-safe interfaces: Generic vector types with compile-time type checking
  • Versioning system: Schema evolution and backward compatibility
  • Stamping mechanism: Track data freshness and updates
  • Hole management: Efficient handling of deleted elements
  • Iterator support: Standard Rust iterator patterns for data access
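
Hole management can be pictured as a set of vacated indices kept next to the values. The sketch below is a hypothetical in-memory model, not the crate's implementation; `HoleyColumn` and its methods are invented names, loosely mirroring the `take`, `update` and `holes` calls used in the examples further down. Deleting an element records its index instead of shifting everything after it, so the other indices stay stable.

```rust
use std::collections::BTreeSet;

/// Hypothetical model: values plus the set of indices that were vacated.
struct HoleyColumn<T> {
    values: Vec<T>,
    holes: BTreeSet<usize>,
}

impl<T: Clone> HoleyColumn<T> {
    fn new(values: Vec<T>) -> Self {
        Self { values, holes: BTreeSet::new() }
    }

    /// Returns the value at `index`, or `None` for a hole or an out-of-range index.
    fn get(&self, index: usize) -> Option<&T> {
        if self.holes.contains(&index) {
            return None;
        }
        self.values.get(index)
    }

    /// Logically removes the value at `index`, leaving a hole behind.
    fn take(&mut self, index: usize) -> Option<T> {
        let value = self.get(index)?.clone();
        self.holes.insert(index);
        Some(value)
    }

    /// Overwrites the slot at `index` and clears any hole recorded there.
    /// Returns `false` when `index` is out of range.
    fn update(&mut self, index: usize, value: T) -> bool {
        match self.values.get_mut(index) {
            Some(slot) => {
                *slot = value;
                self.holes.remove(&index);
                true
            }
            None => false,
        }
    }
}
```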

§Use Cases

  • Time-series databases
  • Analytical data processing
  • Scientific computing datasets
  • Financial market data
  • IoT sensor data storage
  • Any scenario requiring fast columnar access patterns

VecDB excels when you need the performance of in-memory data structures with the durability of persistent storage.

§Examples

§Raw

use std::{borrow::Cow, collections::BTreeSet, fs, path::Path};

use vecdb::{
    AnyStoredVec, AnyVec, CollectableVec, Database, GenericStoredVec, RawVec, Stamp, VecIterator,
    Version,
};

#[allow(clippy::upper_case_acronyms)]
type VEC = RawVec<usize, u32>;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let _ = fs::remove_dir_all("raw");

    let version = Version::TWO;

    let database = Database::open(Path::new("raw"))?;

    {
        let mut vec: VEC = RawVec::forced_import(&database, "vec", version)?;

        (0..21_u32).for_each(|v| {
            vec.push(v);
        });

        let mut iter = vec.into_iter();
        assert_eq!(iter.get(0), Some(Cow::Borrowed(&0)));
        assert_eq!(iter.get(1), Some(Cow::Borrowed(&1)));
        assert_eq!(iter.get(2), Some(Cow::Borrowed(&2)));
        assert_eq!(iter.get(20), Some(Cow::Borrowed(&20)));
        assert_eq!(iter.get(21), None);
        drop(iter);

        vec.flush()?;

        assert_eq!(vec.header().stamp(), Stamp::new(0));
    }

    {
        let mut vec: VEC = RawVec::forced_import(&database, "vec", version)?;

        vec.mut_header().update_stamp(Stamp::new(100));

        assert_eq!(vec.header().stamp(), Stamp::new(100));

        let mut iter = vec.into_iter();
        assert_eq!(iter.get(0), Some(Cow::Borrowed(&0)));
        assert_eq!(iter.get(1), Some(Cow::Borrowed(&1)));
        assert_eq!(iter.get(2), Some(Cow::Borrowed(&2)));
        assert_eq!(iter.get(3), Some(Cow::Borrowed(&3)));
        assert_eq!(iter.get(4), Some(Cow::Borrowed(&4)));
        assert_eq!(iter.get(5), Some(Cow::Borrowed(&5)));
        assert_eq!(iter.get(20), Some(Cow::Borrowed(&20)));
        assert_eq!(iter.get(20), Some(Cow::Borrowed(&20)));
        assert_eq!(iter.get(0), Some(Cow::Borrowed(&0)));
        drop(iter);

        vec.push(21);
        vec.push(22);

        assert_eq!(vec.stored_len(), 21);
        assert_eq!(vec.pushed_len(), 2);
        assert_eq!(vec.len(), 23);

        let mut iter = vec.into_iter();
        assert_eq!(iter.get(20), Some(Cow::Borrowed(&20)));
        assert_eq!(iter.get(21), Some(Cow::Borrowed(&21)));
        assert_eq!(iter.get(22), Some(Cow::Borrowed(&22)));
        assert_eq!(iter.get(23), None);
        drop(iter);

        vec.flush()?;
    }

    {
        let mut vec: VEC = RawVec::forced_import(&database, "vec", version)?;

        assert_eq!(vec.header().stamp(), Stamp::new(100));

        assert_eq!(vec.stored_len(), 23);
        assert_eq!(vec.pushed_len(), 0);
        assert_eq!(vec.len(), 23);

        let mut iter = vec.into_iter();
        assert_eq!(iter.get(0), Some(Cow::Borrowed(&0)));
        assert_eq!(iter.get(20), Some(Cow::Borrowed(&20)));
        assert_eq!(iter.get(21), Some(Cow::Borrowed(&21)));
        assert_eq!(iter.get(22), Some(Cow::Borrowed(&22)));
        drop(iter);

        vec.truncate_if_needed(14)?;

        assert_eq!(vec.stored_len(), 14);
        assert_eq!(vec.pushed_len(), 0);
        assert_eq!(vec.len(), 14);

        let mut iter = vec.into_iter();
        assert_eq!(iter.get(0), Some(Cow::Borrowed(&0)));
        assert_eq!(iter.get(5), Some(Cow::Borrowed(&5)));
        assert_eq!(iter.get(20), None);
        drop(iter);

        assert_eq!(
            vec.collect_signed_range(Some(-5), None)?,
            vec![9, 10, 11, 12, 13]
        );

        vec.push(vec.len() as u32);
        assert_eq!(
            VecIterator::last(vec.into_iter()),
            Some((14, Cow::Borrowed(&14)))
        );

        assert_eq!(
            vec.into_iter()
                .map(|(_, v)| v.into_owned())
                .collect::<Vec<_>>(),
            vec![0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
        );

        vec.flush()?;
    }

    {
        let mut vec: VEC = RawVec::forced_import(&database, "vec", version)?;

        assert_eq!(
            VecIterator::last(vec.into_iter()),
            Some((14, Cow::Borrowed(&14)))
        );

        assert_eq!(
            vec.into_iter()
                .map(|(_, v)| v.into_owned())
                .collect::<Vec<_>>(),
            vec![0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
        );

        vec.reset()?;

        assert_eq!(vec.pushed_len(), 0);
        assert_eq!(vec.stored_len(), 0);
        assert_eq!(vec.len(), 0);

        (0..21_u32).for_each(|v| {
            vec.push(v);
        });

        assert_eq!(vec.pushed_len(), 21);
        assert_eq!(vec.stored_len(), 0);
        assert_eq!(vec.len(), 21);

        let mut iter = vec.into_iter();
        assert_eq!(iter.get(0), Some(Cow::Borrowed(&0)));
        assert_eq!(iter.get(20), Some(Cow::Borrowed(&20)));
        assert!(iter.get(21).is_none());
        drop(iter);

        let reader = vec.create_static_reader();
        assert_eq!(vec.take(10, &reader)?, Some(10));
        assert_eq!(vec.holes(), &BTreeSet::from([10]));
        assert!(vec.get_or_read(10, &reader)?.is_none());
        drop(reader);

        vec.flush()?;

        assert_eq!(vec.holes(), &BTreeSet::from([10]));
    }

    {
        let mut vec: VEC = RawVec::forced_import(&database, "vec", version)?;

        assert_eq!(vec.holes(), &BTreeSet::from([10]));

        let reader = vec.create_static_reader();
        assert!(vec.get_or_read(10, &reader)?.is_none());
        drop(reader);

        vec.update(10, 10)?;
        vec.update(0, 10)?;

        let reader = vec.create_static_reader();
        assert_eq!(vec.holes(), &BTreeSet::new());
        assert_eq!(vec.get_or_read(0, &reader)?, Some(Cow::Borrowed(&10)));
        assert_eq!(vec.get_or_read(10, &reader)?, Some(Cow::Borrowed(&10)));
        drop(reader);

        vec.flush()?;
    }

    {
        let vec: VEC = RawVec::forced_import(&database, "vec", version)?;

        assert_eq!(
            vec.collect()?,
            vec![
                10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
            ]
        );
    }

    Ok(())
}
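
The example above repeatedly checks `stored_len()`, `pushed_len()` and `len()`. A minimal in-memory model of that bookkeeping might look like the following. This is a hypothetical sketch, not the crate's code: `BufferedColumn` is an invented type whose `stored` vector stands in for values already flushed to disk, and whose `truncate` method only approximates what the example shows for `truncate_if_needed`.

```rust
/// Hypothetical model of a column with a persisted part and an unflushed tail.
struct BufferedColumn<T> {
    stored: Vec<T>, // stands in for values already written to disk
    pushed: Vec<T>, // values pushed since the last flush
}

impl<T> BufferedColumn<T> {
    fn new() -> Self {
        Self { stored: Vec::new(), pushed: Vec::new() }
    }

    fn push(&mut self, value: T) {
        self.pushed.push(value);
    }

    fn stored_len(&self) -> usize {
        self.stored.len()
    }

    fn pushed_len(&self) -> usize {
        self.pushed.len()
    }

    /// Total length: persisted values plus the unflushed tail.
    fn len(&self) -> usize {
        self.stored.len() + self.pushed.len()
    }

    /// Looks in the stored part first, then in the pushed tail.
    fn get(&self, index: usize) -> Option<&T> {
        match index.checked_sub(self.stored.len()) {
            None => self.stored.get(index),
            Some(offset) => self.pushed.get(offset),
        }
    }

    /// Moves every pushed value into the stored part.
    fn flush(&mut self) {
        self.stored.append(&mut self.pushed);
    }

    /// Shrinks the column to `new_len` elements; does nothing if it is already shorter.
    fn truncate(&mut self, new_len: usize) {
        if new_len >= self.len() {
            return;
        }
        if new_len <= self.stored.len() {
            self.stored.truncate(new_len);
            self.pushed.clear();
        } else {
            self.pushed.truncate(new_len - self.stored.len());
        }
    }
}
```

In this model, reads see pushed values before they are flushed, which matches the example reading index 20 before the first `flush()`.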

§Compressed

use std::{borrow::Cow, collections::BTreeSet, fs, path::Path};

use vecdb::{
    AnyStoredVec, AnyVec, CollectableVec, CompressedVec, Database, GenericStoredVec, Stamp,
    VecIterator, Version,
};

#[allow(clippy::upper_case_acronyms)]
type VEC = CompressedVec<usize, u32>;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let _ = fs::remove_dir_all("compressed");

    let version = Version::TWO;

    let database = Database::open(Path::new("compressed"))?;

    {
        let mut vec: VEC = CompressedVec::forced_import(&database, "vec", version)?;

        (0..21_u32).for_each(|v| {
            vec.push(v);
        });

        let mut iter = vec.into_iter();
        assert_eq!(iter.get(0), Some(Cow::Borrowed(&0)));
        assert_eq!(iter.get(1), Some(Cow::Borrowed(&1)));
        assert_eq!(iter.get(2), Some(Cow::Borrowed(&2)));
        assert_eq!(iter.get(20), Some(Cow::Borrowed(&20)));
        assert_eq!(iter.get(21), None);
        drop(iter);

        vec.flush()?;

        assert_eq!(vec.header().stamp(), Stamp::new(0));
    }

    {
        let mut vec: VEC = CompressedVec::forced_import(&database, "vec", version)?;

        vec.mut_header().update_stamp(Stamp::new(100));

        assert_eq!(vec.header().stamp(), Stamp::new(100));

        let mut iter = vec.into_iter();
        assert_eq!(iter.get(0), Some(Cow::Borrowed(&0)));
        assert_eq!(iter.get(1), Some(Cow::Borrowed(&1)));
        assert_eq!(iter.get(2), Some(Cow::Borrowed(&2)));
        assert_eq!(iter.get(3), Some(Cow::Borrowed(&3)));
        assert_eq!(iter.get(4), Some(Cow::Borrowed(&4)));
        assert_eq!(iter.get(5), Some(Cow::Borrowed(&5)));
        assert_eq!(iter.get(20), Some(Cow::Borrowed(&20)));
        assert_eq!(iter.get(20), Some(Cow::Borrowed(&20)));
        assert_eq!(iter.get(0), Some(Cow::Borrowed(&0)));
        drop(iter);

        vec.push(21);
        vec.push(22);

        assert_eq!(vec.stored_len(), 21);
        assert_eq!(vec.pushed_len(), 2);
        assert_eq!(vec.len(), 23);

        let mut iter = vec.into_iter();
        assert_eq!(iter.get(20), Some(Cow::Borrowed(&20)));
        assert_eq!(iter.get(21), Some(Cow::Borrowed(&21)));
        assert_eq!(iter.get(22), Some(Cow::Borrowed(&22)));
        assert_eq!(iter.get(23), None);
        drop(iter);

        vec.flush()?;
    }

    {
        let mut vec: VEC = CompressedVec::forced_import(&database, "vec", version)?;

        assert_eq!(vec.header().stamp(), Stamp::new(100));

        assert_eq!(vec.stored_len(), 23);
        assert_eq!(vec.pushed_len(), 0);
        assert_eq!(vec.len(), 23);

        let mut iter = vec.into_iter();
        assert_eq!(iter.get(0), Some(Cow::Borrowed(&0)));
        assert_eq!(iter.get(20), Some(Cow::Borrowed(&20)));
        assert_eq!(iter.get(21), Some(Cow::Borrowed(&21)));
        assert_eq!(iter.get(22), Some(Cow::Borrowed(&22)));
        drop(iter);

        vec.truncate_if_needed(14)?;

        assert_eq!(vec.stored_len(), 14);
        assert_eq!(vec.pushed_len(), 0);
        assert_eq!(vec.len(), 14);

        let mut iter = vec.into_iter();
        assert_eq!(iter.get(0), Some(Cow::Borrowed(&0)));
        assert_eq!(iter.get(5), Some(Cow::Borrowed(&5)));
        assert_eq!(iter.get(20), None);
        drop(iter);

        assert_eq!(
            vec.collect_signed_range(Some(-5), None)?,
            vec![9, 10, 11, 12, 13]
        );

        vec.push(vec.len() as u32);
        assert_eq!(
            VecIterator::last(vec.into_iter()),
            Some((14, Cow::Borrowed(&14)))
        );

        vec.flush()?;

        assert_eq!(
            vec.into_iter()
                .map(|(_, v)| v.into_owned())
                .collect::<Vec<_>>(),
            vec![0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
        );
    }

    {
        let mut vec: VEC = CompressedVec::forced_import(&database, "vec", version)?;

        assert_eq!(
            vec.into_iter()
                .map(|(_, v)| v.into_owned())
                .collect::<Vec<_>>(),
            vec![0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
        );

        let mut iter = vec.into_iter();
        assert_eq!(iter.get(0), Some(Cow::Borrowed(&0)));
        assert_eq!(iter.get(5), Some(Cow::Borrowed(&5)));
        assert_eq!(iter.get(20), None);
        drop(iter);

        assert_eq!(
            vec.collect_signed_range(Some(-5), None)?,
            vec![10, 11, 12, 13, 14]
        );

        vec.reset()?;

        assert_eq!(vec.pushed_len(), 0);
        assert_eq!(vec.stored_len(), 0);
        assert_eq!(vec.len(), 0);

        (0..21_u32).for_each(|v| {
            vec.push(v);
        });

        assert_eq!(vec.pushed_len(), 21);
        assert_eq!(vec.stored_len(), 0);
        assert_eq!(vec.len(), 21);

        let mut iter = vec.into_iter();
        assert_eq!(iter.get(0), Some(Cow::Borrowed(&0)));
        assert_eq!(iter.get(20), Some(Cow::Borrowed(&20)));
        assert_eq!(iter.get(21), None);
        drop(iter);

        vec.flush()?;
    }

    {
        let mut vec: VEC = CompressedVec::forced_import(&database, "vec", version)?;

        assert_eq!(vec.pushed_len(), 0);
        assert_eq!(vec.stored_len(), 21);
        assert_eq!(vec.len(), 21);

        let reader = vec.create_static_reader();
        assert_eq!(vec.holes(), &BTreeSet::new());
        assert_eq!(vec.get_or_read(0, &reader)?, Some(Cow::Borrowed(&0)));
        assert_eq!(vec.get_or_read(10, &reader)?, Some(Cow::Borrowed(&10)));
        drop(reader);

        vec.flush()?;
    }

    {
        let vec: VEC = CompressedVec::forced_import(&database, "vec", version)?;

        assert_eq!(
            vec.collect()?,
            vec![
                0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
            ]
        );
    }

    Ok(())
}
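
Both examples call `collect_signed_range(Some(-5), None)` and get the last five elements back, which suggests that negative bounds count from the end of the vector. The sketch below shows one way such bounds could be resolved over a plain slice. The clamping rules and the helper name `resolve_signed` are assumptions made for illustration, not the crate's documented behavior.

```rust
/// Hypothetical helper: maps a possibly negative index onto `0..=len`.
/// Negative values count back from the end; out-of-range values are clamped.
fn resolve_signed(index: i64, len: usize) -> usize {
    if index >= 0 {
        (index as usize).min(len)
    } else {
        len.saturating_sub(index.unsigned_abs() as usize)
    }
}

/// Collects `values[from..to]`, where either bound may be negative or absent.
/// An absent `from` means the start, an absent `to` means the end.
fn collect_signed_range<T: Clone>(values: &[T], from: Option<i64>, to: Option<i64>) -> Vec<T> {
    let len = values.len();
    let start = from.map_or(0, |i| resolve_signed(i, len));
    let end = to.map_or(len, |i| resolve_signed(i, len));
    if start >= end {
        Vec::new()
    } else {
        values[start..end].to_vec()
    }
}
```

With 14 elements, a lower bound of -5 resolves to index 9, matching the `[9, 10, 11, 12, 13]` result in the examples; with 15 elements it resolves to index 10.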

Structs§

CompressedVec
Database
EagerVec
Exit
LazyVecFrom1
LazyVecFrom2
LazyVecFrom3
RawVec
Stamp
Version

Enums§

Computation
ComputedVec
Error
Format
LatentType
StoredVec

Traits§

AnyCloneableIterableVec
AnyCollectableVec
AnyIterableVec
AnyStoredIterableVec
AnyStoredVec
AnyVec
AsInnerSlice
BaseVecIterator
CheckedSub
CollectableVec
FromCoarserIndex
FromInnerSlice
GenericStoredVec
Printable
StoredCompressed
StoredIndex
StoredRaw
TransparentStoredCompressed
VecIterator

Functions§

i64_to_usize

Type Aliases§

AnyBoxedIterableVec
BoxedVecIterator
ComputedVecFrom1
ComputedVecFrom2
ComputedVecFrom3
Result

Derive Macros§

StoredCompressed