Memvid

The Rust engine behind Memvid. Store everything in a single .mv2 file.

What is this?

Memvid is a library for building AI memory systems. It packs documents, embeddings, full-text search indices, and a write-ahead log into a single portable file. No database setup. No sidecar files. Just one .mv2 file you can copy anywhere.

Why "Frames"?

Memvid borrows from video encoding. Just as video files store sequential frames that can be played, seeked, and edited, Memvid stores your data as an append-only sequence of frames.

Each frame contains your content (text, PDF, image, audio) plus metadata, timestamps, and checksums. Frames group into segments for efficient compression and parallel indexing.

This design gives you:

Append-only simplicity: New data never corrupts existing frames
Time-travel queries: Search your memory as it existed at any point
Timeline playback: Browse frames chronologically like scrubbing through video
Crash safety: Incomplete writes don't affect committed frames
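
To make the append-only and time-travel ideas concrete, here is a minimal in-memory sketch using only the standard library. `Frame`, `FrameLog`, `append`, and `search_as_of` are hypothetical names for illustration; they are not Memvid's types, and the substring match stands in for a real index.

```rust
/// Hypothetical in-memory stand-in for a frame; not Memvid's real type.
#[derive(Debug, Clone)]
struct Frame {
    id: u64,
    timestamp: u64, // Unix seconds at append time
    text: String,
}

/// Append-only log: frames are only ever pushed, never mutated or removed.
#[derive(Default)]
struct FrameLog {
    frames: Vec<Frame>,
}

impl FrameLog {
    /// Appends a frame and returns its id. Existing frames are untouched.
    fn append(&mut self, timestamp: u64, text: &str) -> u64 {
        let id = self.frames.len() as u64;
        self.frames.push(Frame { id, timestamp, text: text.to_string() });
        id
    }

    /// "Time-travel" search: only frames appended at or before `as_of`
    /// are visible to the query.
    fn search_as_of(&self, query: &str, as_of: u64) -> Vec<&Frame> {
        self.frames
            .iter()
            .filter(|f| f.timestamp <= as_of && f.text.contains(query))
            .collect()
    }
}
```

Because nothing is ever rewritten in place, a query "as of" an earlier timestamp only has to ignore later frames; no snapshot copies are needed.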

Quick Start

use memvid_core::{Memvid, PutOptions, SearchRequest};

// Create a memory file
let mut mem = Memvid::create("knowledge.mv2")?;

// Add documents with metadata
let opts = PutOptions::builder()
    .title("Meeting Notes")
    .uri("mv2://meetings/2024-01-15")
    .tag("project", "alpha")
    .build();
mem.put_bytes_with_options(b"Q4 planning discussion...", opts)?;
mem.commit()?;

// Search
let response = mem.search(SearchRequest {
    query: "planning".into(),
    top_k: 10,
    snippet_chars: 200,
    ..Default::default()
})?;

for hit in response.hits {
    println!("{}: {}", hit.title.unwrap_or_default(), hit.text);
}

Installation

[dependencies]
memvid-core = { version = "2.0", features = ["lex", "vec", "temporal_track", "parallel_segments"] }

Features

Feature	Default	What it does
`lex`	yes	Full-text search with BM25 ranking (Tantivy)
`vec`	no	Vector similarity search (HNSW + ONNX embeddings)
`temporal_track`	no	Parse natural language dates ("last Tuesday")
`parallel_segments`	no	Multi-threaded ingestion for large imports
`pdfium`	no	PDF text extraction
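
For readers unfamiliar with BM25, the ranking behind the `lex` feature, the sketch below computes one query term's contribution to one document's score. It uses the common Lucene-style IDF and the usual defaults `k1 = 1.2` and `b = 0.75`; these are assumptions about a typical BM25 setup, not values taken from Memvid's configuration, and `bm25_term_score` is a hypothetical name.

```rust
/// BM25 contribution of a single query term to a single document's score.
/// K1 and B are fixed here to commonly used defaults (an assumption).
fn bm25_term_score(
    tf: f64,          // occurrences of the term in this document
    doc_len: f64,     // length of this document in tokens
    avg_doc_len: f64, // average document length in the corpus
    n_docs: f64,      // total number of documents
    doc_freq: f64,    // number of documents containing the term
) -> f64 {
    const K1: f64 = 1.2;
    const B: f64 = 0.75;
    // Rare terms get a larger inverse document frequency.
    let idf = (1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5)).ln();
    // Longer-than-average documents are penalized through this term.
    let norm = K1 * (1.0 - B + B * doc_len / avg_doc_len);
    idf * tf * (K1 + 1.0) / (tf + norm)
}
```

A document's full score is the sum of this value over all query terms. Repeated occurrences raise the score with diminishing returns, and rarer terms weigh more.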

Core API

Create and Open

// New file
let mut mem = Memvid::create("data.mv2")?;

// Open existing (read-write)
let mut mem = Memvid::open("data.mv2")?;

// Open read-only (no lock contention)
let mem = Memvid::open_read_only("data.mv2")?;

Put Documents

// Simple
mem.put_bytes(b"Some text content")?;

// With metadata
let opts = PutOptions::builder()
    .title("API Reference")
    .uri("mv2://docs/api")
    .tag("version", "2.0")
    .search_text("custom text for indexing".into())
    .build();
// `content` is any byte slice, e.g. the bytes of a file you have read
mem.put_bytes_with_options(content, opts)?;

// Don't forget to commit
mem.commit()?;

Search

let response = mem.search(SearchRequest {
    query: "distributed systems".into(),
    top_k: 50,
    snippet_chars: 200,
    scope: Some("mv2://docs/".into()),  // optional: filter by URI prefix
    ..Default::default()
})?;

println!("Found {} results in {}ms", response.total_hits, response.elapsed_ms);
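
The `scope` field is described above as a URI prefix filter. The standalone sketch below shows what plain prefix matching implies, assuming the filter is a simple `starts_with` check (not confirmed by the source). `in_scope` is a hypothetical helper, not part of the Memvid API.

```rust
/// Keeps only URIs that start with `scope`; `None` means no filtering.
fn in_scope<'a>(uris: &[&'a str], scope: Option<&str>) -> Vec<&'a str> {
    uris.iter()
        .copied()
        .filter(|uri| scope.map_or(true, |prefix| uri.starts_with(prefix)))
        .collect()
}
```

Under this assumption the trailing slash matters: `"mv2://docs"` would also match `"mv2://docs-old/readme"`, while `"mv2://docs/"` would not.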

Timeline

Browse documents chronologically:

use std::num::NonZeroU64;
use memvid_core::TimelineQuery;  // assumed to be exported at the crate root, like Memvid

let entries = mem.timeline(TimelineQuery {
    limit: NonZeroU64::new(100),
    since: Some(1706745600),  // Unix timestamp
    until: None,
    reverse: false,
    temporal: None,
})?;
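
The query fields can be read as a filter, an ordering, and a cap. The sketch below models that reading over bare timestamps. It assumes `since` and `until` are inclusive bounds and that `reverse` is applied before `limit`; neither is confirmed by the source, and `select_timeline` is a hypothetical helper.

```rust
use std::num::NonZeroU64;

/// Selects timestamps under the assumed semantics: inclusive `since` and
/// `until`, chronological order (newest first if `reverse`), then `limit`.
fn select_timeline(
    timestamps: &[i64],
    since: Option<i64>,
    until: Option<i64>,
    reverse: bool,
    limit: Option<NonZeroU64>,
) -> Vec<i64> {
    let mut selected: Vec<i64> = timestamps
        .iter()
        .copied()
        .filter(|t| since.map_or(true, |s| *t >= s))
        .filter(|t| until.map_or(true, |u| *t <= u))
        .collect();
    selected.sort_unstable();
    if reverse {
        selected.reverse();
    }
    if let Some(n) = limit {
        selected.truncate(n.get() as usize);
    }
    selected
}
```

`NonZeroU64::new(100)` returns an `Option<NonZeroU64>`, which is why it can be passed to `limit` directly; `NonZeroU64::new(0)` is `None`, which this sketch treats as "no limit".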

Stats and Verification

let stats = mem.stats()?;
println!("Frames: {}, Lex index: {}", stats.frame_count, stats.has_lex_index);

// Verify integrity
let report = Memvid::verify("data.mv2", true)?;  // true = deep check
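
As a rough picture of what a deep integrity check involves, the sketch below recomputes a checksum for every stored payload and reports mismatches. FNV-1a is used only as a stand-in because it needs no dependencies; the source does not say which checksum Memvid uses, and `StoredFrame` and `corrupted_frames` are hypothetical names.

```rust
/// FNV-1a 64-bit hash, used here only as a stand-in checksum.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Hypothetical stored frame: payload plus the checksum recorded at write time.
struct StoredFrame {
    payload: Vec<u8>,
    checksum: u64,
}

/// Returns the indices of frames whose payload no longer matches its checksum.
fn corrupted_frames(frames: &[StoredFrame]) -> Vec<usize> {
    frames
        .iter()
        .enumerate()
        .filter(|(_, f)| fnv1a64(&f.payload) != f.checksum)
        .map(|(i, _)| i)
        .collect()
}
```

A shallow check could stop at structural metadata, while a deep check of this kind has to read every payload, so it scales with file size.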

File Format

Everything lives in the .mv2 file:

┌────────────────────────────┐
│ Header (4KB)               │  Magic, version, capacity
├────────────────────────────┤
│ Embedded WAL (1-64MB)      │  Crash recovery
├────────────────────────────┤
│ Data Segments              │  Compressed frames
├────────────────────────────┤
│ Lex Index                  │  Tantivy full-text
├────────────────────────────┤
│ Vec Index                  │  HNSW vectors
├────────────────────────────┤
│ Time Index                 │  Chronological ordering
├────────────────────────────┤
│ TOC (Footer)               │  Segment offsets
└────────────────────────────┘

No .wal, .lock, .shm, or any other files. Ever.

See MV2_SPEC.md for the complete file format specification.
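
Read literally, the diagram implies that data segments begin right after the header and the embedded WAL. The sketch below does that arithmetic, assuming the regions are contiguous with no padding and that "KB" and "MB" mean 1024-based units. Both are assumptions; MV2_SPEC.md is the authority, and `data_segments_offset` is a hypothetical helper.

```rust
const HEADER_BYTES: u64 = 4 * 1024; // 4KB header
const MIN_WAL_BYTES: u64 = 1024 * 1024; // 1MB
const MAX_WAL_BYTES: u64 = 64 * 1024 * 1024; // 64MB

/// Offset of the first data segment under the assumptions above,
/// or `None` if `wal_bytes` is outside the 1-64MB range.
fn data_segments_offset(wal_bytes: u64) -> Option<u64> {
    if (MIN_WAL_BYTES..=MAX_WAL_BYTES).contains(&wal_bytes) {
        Some(HEADER_BYTES + wal_bytes)
    } else {
        None
    }
}
```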

Benchmarks

Run on Apple M1 Pro with 50K documents:

Operation	Time
Search (single term)	0.8ms
Search (multi-term)	1.2ms
Cold start + first search	190ms
Concurrent readers (8x)	3.5ms total

Run benchmarks yourself:

cd crates/memvid-core/benchmarks
cargo bench

Examples

cargo run --example basic_usage

See examples/ for more.

Feature Compatibility

Files remember which features were enabled when created. Opening a file requires matching features:

# Check what a file needs
memvid stats file.mv2 --json | jq '.indexes'

If you created a file with the CLI (which enables everything), enable the matching index features in your Cargo.toml before opening it:

memvid-core = { version = "2.0", features = ["lex", "vec", "temporal_track"] }
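
The compatibility rule amounts to a set difference: every feature the file needs must be enabled in your build. A minimal sketch, with `missing_features` as a hypothetical helper and the feature names taken from the table above:

```rust
/// Returns the features a file requires that the current build lacks,
/// in the order they appear in `required`.
fn missing_features<'a>(required: &[&'a str], enabled: &[&str]) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|feature| !enabled.iter().any(|e| e == feature))
        .collect()
}
```

An empty result means the build can open the file; anything else is the list to add to `features` in Cargo.toml.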

Logging

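Memvid logs through the tracing crate. Install a subscriber in your application (with_env_filter requires the env-filter feature of tracing-subscriber):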

tracing_subscriber::fmt()
    .with_env_filter("memvid_core=warn")
    .init();

License

Apache 2.0

memvid-core 2.0.130