memvid-core 2.0.134

Core library for Memvid v2, a crash-safe, deterministic, single-file AI memory.

What is Memvid?

Memvid is a portable AI memory system that packages your data, embeddings, search structure, and metadata into a single file.

Instead of running complex RAG pipelines or server-based vector databases, Memvid enables fast retrieval directly from the file.

The result is a model-agnostic, infrastructure-free memory layer that gives AI agents persistent, long-term memory they can carry anywhere.


What are Smart Frames?

Memvid draws inspiration from video encoding, not to store video, but to organize AI memory as an append-only, ultra-efficient sequence of Smart Frames.

A Smart Frame is an immutable unit that stores content along with timestamps, checksums and basic metadata. Frames are grouped in a way that allows efficient compression, indexing, and parallel reads.

This frame-based design enables:

  • Append-only writes without modifying or corrupting existing data
  • Queries over past memory states
  • Timeline-style inspection of how knowledge evolves
  • Crash safety through committed, immutable frames
  • Efficient compression using techniques adapted from video encoding

The result is a single file that behaves like a rewindable memory timeline for AI systems.
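As a rough mental model (not the actual on-disk format), a Smart Frame can be pictured as an immutable record appended to a growing log. The names below are illustrative only, using a simple std hash in place of the real checksum:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::{SystemTime, UNIX_EPOCH};

// Illustrative only: a simplified stand-in for a Smart Frame.
#[derive(Debug)]
struct Frame {
    timestamp_ms: u128, // when the frame was committed
    checksum: u64,      // integrity check over the payload
    payload: Vec<u8>,   // the stored content
}

// An append-only log: frames are only ever added, never modified.
struct FrameLog {
    frames: Vec<Frame>,
}

impl FrameLog {
    fn new() -> Self {
        Self { frames: Vec::new() }
    }

    fn append(&mut self, payload: &[u8]) {
        let mut hasher = DefaultHasher::new();
        payload.hash(&mut hasher);
        let timestamp_ms = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .unwrap()
            .as_millis();
        self.frames.push(Frame {
            timestamp_ms,
            checksum: hasher.finish(),
            payload: payload.to_vec(),
        });
    }

    // "Rewind": view only the frames that existed after `n` appends.
    fn state_at(&self, n: usize) -> &[Frame] {
        &self.frames[..n.min(self.frames.len())]
    }
}

fn main() {
    let mut log = FrameLog::new();
    log.append(b"first note");
    log.append(b"second note");
    // Earlier states remain readable because nothing is overwritten.
    assert_eq!(log.state_at(1).len(), 1);
    println!("frames committed: {}", log.frames.len());
}
```

Because committed frames are never mutated, a crash mid-append can at worst lose the in-flight frame, never corrupt earlier ones.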


Core Concepts

  • Living Memory Engine: Continuously append, branch, and evolve memory across sessions.

  • Capsule Context (.mv2): Self-contained, shareable memory capsules with rules and expiry.

  • Time-Travel Debugging: Rewind, replay, or branch any memory state.

  • Smart Recall: Sub-5ms local memory access with predictive caching.

  • Codec Intelligence: Auto-selects and upgrades compression over time.


Use Cases

Memvid is a portable, serverless memory layer that gives AI agents persistent memory and fast recall. Because it's model-agnostic, multi-modal, and works fully offline, developers are using Memvid across a wide range of real-world applications.

  • Long-Running AI Agents
  • Enterprise Knowledge Bases
  • Offline-First AI Systems
  • Codebase Understanding
  • Customer Support Agents
  • Workflow Automation
  • Sales and Marketing Copilots
  • Personal Knowledge Assistants
  • Medical, Legal, and Financial Agents
  • Auditable and Debuggable AI Workflows
  • Custom Applications

SDKs & CLI

Use Memvid in your preferred language:

Package        Install                      Registry
CLI            npm install -g memvid-cli    npm
Node.js SDK    npm install @memvid/sdk      npm
Python SDK     pip install memvid-sdk       PyPI
Rust           cargo add memvid-core        crates.io

Installation (Rust)

Add to Your Project

[dependencies]
memvid-core = "2.0"

Feature Flags

Feature             Description
lex                 Full-text search with BM25 ranking (Tantivy)
pdf_extract         Pure-Rust PDF text extraction
vec                 Vector similarity search (HNSW + local text embeddings via ONNX)
clip                CLIP visual embeddings for image search
whisper             Audio transcription with Whisper
temporal_track      Natural-language date parsing ("last Tuesday")
parallel_segments   Multi-threaded ingestion
encryption          Password-based encrypted capsules (.mv2e)

Enable features as needed:

[dependencies]
memvid-core = { version = "2.0", features = ["lex", "vec", "temporal_track"] }
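For intuition about what the lex feature's BM25 ranking does, the classic formula can be sketched from scratch as below. This is an illustration of the scoring idea only, not Tantivy's actual implementation:

```rust
// Classic BM25 score for one (term, document) pair.
// Illustrative sketch; Tantivy's real implementation differs in detail.
fn bm25_term_score(
    tf: f64,      // term frequency in the document
    doc_len: f64, // document length in tokens
    avg_len: f64, // average document length in the corpus
    n_docs: f64,  // total number of documents
    df: f64,      // number of documents containing the term
) -> f64 {
    let k1 = 1.2; // term-frequency saturation
    let b = 0.75; // length-normalization strength
    let idf = ((n_docs - df + 0.5) / (df + 0.5) + 1.0).ln();
    idf * (tf * (k1 + 1.0)) / (tf + k1 * (1.0 - b + b * doc_len / avg_len))
}

fn main() {
    // A term appearing twice in a short document outranks
    // one appearance in a much longer document.
    let short_doc = bm25_term_score(2.0, 50.0, 100.0, 1000.0, 10.0);
    let long_doc = bm25_term_score(1.0, 300.0, 100.0, 1000.0, 10.0);
    assert!(short_doc > long_doc);
    println!("short: {short_doc:.3}, long: {long_doc:.3}");
}
```

The length normalization (the `b` term) is what keeps long documents from dominating results purely by containing more words.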

Quick Start

use memvid_core::{Memvid, PutOptions, SearchRequest};

fn main() -> memvid_core::Result<()> {
    // Create a new memory file
    let mut mem = Memvid::create("knowledge.mv2")?;

    // Add documents with metadata
    let opts = PutOptions::builder()
        .title("Meeting Notes")
        .uri("mv2://meetings/2024-01-15")
        .tag("project", "alpha")
        .build();
    mem.put_bytes_with_options(b"Q4 planning discussion...", opts)?;
    mem.commit()?;

    // Search
    let response = mem.search(SearchRequest {
        query: "planning".into(),
        top_k: 10,
        snippet_chars: 200,
        ..Default::default()
    })?;

    for hit in response.hits {
        println!("{}: {}", hit.title.unwrap_or_default(), hit.text);
    }

    Ok(())
}

Build

Clone the repository:

git clone https://github.com/memvid/memvid.git
cd memvid

Build in debug mode:

cargo build

Build in release mode (optimized):

cargo build --release

Build with specific features:

cargo build --release --features "lex,vec,temporal_track"

Run Tests

Run all tests:

cargo test

Run tests with output:

cargo test -- --nocapture

Run a specific test:

cargo test test_name

Run integration tests only:

cargo test --test lifecycle
cargo test --test search
cargo test --test mutation

Examples

The examples/ directory contains working examples:

Basic Usage

Demonstrates create, put, search, and timeline operations:

cargo run --example basic_usage

PDF Ingestion

Ingest and search PDF documents (uses the "Attention Is All You Need" paper):

cargo run --example pdf_ingestion

CLIP Visual Search

Image search using CLIP embeddings (requires clip feature):

cargo run --example clip_visual_search --features clip

Whisper Transcription

Audio transcription (requires whisper feature):

cargo run --example test_whisper --features whisper

Text Embedding Models

The vec feature includes local text embedding support using ONNX models. Before using local text embeddings, you need to download the model files manually.

Quick Start: BGE-small (Recommended)

Download the default BGE-small model (384 dimensions, fast and efficient):

mkdir -p ~/.cache/memvid/text-models

# Download ONNX model
curl -L 'https://huggingface.co/BAAI/bge-small-en-v1.5/resolve/main/onnx/model.onnx' \
  -o ~/.cache/memvid/text-models/bge-small-en-v1.5.onnx

# Download tokenizer
curl -L 'https://huggingface.co/BAAI/bge-small-en-v1.5/resolve/main/tokenizer.json' \
  -o ~/.cache/memvid/text-models/bge-small-en-v1.5_tokenizer.json

Available Models

Model                   Dimensions   Size     Best For
bge-small-en-v1.5       384          ~120MB   Default, fast
bge-base-en-v1.5        768          ~420MB   Better quality
nomic-embed-text-v1.5   768          ~530MB   Versatile tasks
gte-large               1024         ~1.3GB   Highest quality

Other Models

BGE-base (768 dimensions):

curl -L 'https://huggingface.co/BAAI/bge-base-en-v1.5/resolve/main/onnx/model.onnx' \
  -o ~/.cache/memvid/text-models/bge-base-en-v1.5.onnx
curl -L 'https://huggingface.co/BAAI/bge-base-en-v1.5/resolve/main/tokenizer.json' \
  -o ~/.cache/memvid/text-models/bge-base-en-v1.5_tokenizer.json

Nomic (768 dimensions):

curl -L 'https://huggingface.co/nomic-ai/nomic-embed-text-v1.5/resolve/main/onnx/model.onnx' \
  -o ~/.cache/memvid/text-models/nomic-embed-text-v1.5.onnx
curl -L 'https://huggingface.co/nomic-ai/nomic-embed-text-v1.5/resolve/main/tokenizer.json' \
  -o ~/.cache/memvid/text-models/nomic-embed-text-v1.5_tokenizer.json

GTE-large (1024 dimensions):

curl -L 'https://huggingface.co/thenlper/gte-large/resolve/main/onnx/model.onnx' \
  -o ~/.cache/memvid/text-models/gte-large.onnx
curl -L 'https://huggingface.co/thenlper/gte-large/resolve/main/tokenizer.json' \
  -o ~/.cache/memvid/text-models/gte-large_tokenizer.json

Usage in Code

use memvid_core::text_embed::{LocalTextEmbedder, TextEmbedConfig};
use memvid_core::types::embedding::EmbeddingProvider;

fn main() -> memvid_core::Result<()> {
    // Use the default model (BGE-small, 384 dimensions)
    let config = TextEmbedConfig::default();
    let embedder = LocalTextEmbedder::new(config)?;

    let embedding = embedder.embed_text("hello world")?;
    assert_eq!(embedding.len(), 384);

    // Use a different model (BGE-base, 768 dimensions)
    let config = TextEmbedConfig::bge_base();
    let embedder = LocalTextEmbedder::new(config)?;
    let embedding = embedder.embed_text("hello world")?;
    assert_eq!(embedding.len(), 768);

    Ok(())
}
See examples/text_embedding.rs for a complete example with similarity computation and search ranking.
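Once texts are embedded, similarity ranking typically boils down to cosine similarity between vectors. A minimal, self-contained sketch of that computation, independent of the memvid API:

```rust
// Cosine similarity between two embedding vectors:
// dot(a, b) / (|a| * |b|), in [-1, 1] for non-zero inputs.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal dimensions");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

fn main() {
    let query = [1.0, 0.0, 1.0];
    let doc_a = [2.0, 0.0, 2.0]; // same direction as the query
    let doc_b = [0.0, 1.0, 0.0]; // orthogonal to the query
    assert!((cosine_similarity(&query, &doc_a) - 1.0).abs() < 1e-6);
    assert!(cosine_similarity(&query, &doc_b).abs() < 1e-6);
}
```

Because similarity depends on direction rather than magnitude, embeddings are usually compared after (implicit) normalization, which is exactly what the division by the two norms does.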


File Format

Everything lives in a single .mv2 file:

┌────────────────────────────┐
│ Header (4KB)               │  Magic, version, capacity
├────────────────────────────┤
│ Embedded WAL (1-64MB)      │  Crash recovery
├────────────────────────────┤
│ Data Segments              │  Compressed frames
├────────────────────────────┤
│ Lex Index                  │  Tantivy full-text
├────────────────────────────┤
│ Vec Index                  │  HNSW vectors
├────────────────────────────┤
│ Time Index                 │  Chronological ordering
├────────────────────────────┤
│ TOC (Footer)               │  Segment offsets
└────────────────────────────┘

No .wal, .lock, .shm, or sidecar files. Ever.

See MV2_SPEC.md for the complete file format specification.
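The general trick behind a single-file container is to write sections first and append a table of contents at the end, so a reader seeks to the footer to locate each section. A toy sketch of that pattern (illustrative only; the real MV2 layout is defined in MV2_SPEC.md):

```rust
// Toy single-file container:
// [section bytes...][TOC entries][8-byte TOC offset][8-byte entry count]
// Illustrative only; not the real MV2 format.
fn pack(sections: &[&[u8]]) -> Vec<u8> {
    let mut file = Vec::new();
    let mut entries = Vec::new();
    for s in sections {
        entries.push((file.len() as u64, s.len() as u64));
        file.extend_from_slice(s);
    }
    let toc_start = file.len() as u64;
    for (off, len) in &entries {
        file.extend_from_slice(&off.to_le_bytes());
        file.extend_from_slice(&len.to_le_bytes());
    }
    // Footer: where the TOC begins and how many entries it holds.
    file.extend_from_slice(&toc_start.to_le_bytes());
    file.extend_from_slice(&(entries.len() as u64).to_le_bytes());
    file
}

fn read_section(file: &[u8], index: usize) -> &[u8] {
    let n = file.len();
    // Parse the fixed-size footer from the end of the file.
    let count = u64::from_le_bytes(file[n - 8..].try_into().unwrap()) as usize;
    assert!(index < count, "section index out of range");
    let toc_start = u64::from_le_bytes(file[n - 16..n - 8].try_into().unwrap()) as usize;
    // Each TOC entry is 16 bytes: (offset, length), little-endian.
    let entry = toc_start + index * 16;
    let off = u64::from_le_bytes(file[entry..entry + 8].try_into().unwrap()) as usize;
    let len = u64::from_le_bytes(file[entry + 8..entry + 16].try_into().unwrap()) as usize;
    &file[off..off + len]
}

fn main() {
    let file = pack(&[b"header", b"data segment", b"index"]);
    assert_eq!(read_section(&file, 1), b"data segment");
    println!("packed {} bytes", file.len());
}
```

Writing the TOC last is what makes appends cheap: new segments go at the end of the data region and only the footer needs rewriting.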


Support

Have questions or feedback? Email: contact@memvid.com

Drop a ⭐ to show support


License

Apache License 2.0 — see the LICENSE file for details.