memvid-core 2.0.134

Core library for Memvid v2, a crash-safe, deterministic, single-file AI memory.

What is Memvid?

Memvid is a portable AI memory system that packages your data, embeddings, search structure, and metadata into a single file.

Instead of running complex RAG pipelines or server-based vector databases, Memvid enables fast retrieval directly from the file.

The result is a model-agnostic, infrastructure-free memory layer that gives AI agents persistent, long-term memory they can carry anywhere.


What are Smart Frames?

Memvid draws inspiration from video encoding, not to store video, but to organize AI memory as an append-only, ultra-efficient sequence of Smart Frames.

A Smart Frame is an immutable unit that stores content along with timestamps, checksums and basic metadata. Frames are grouped in a way that allows efficient compression, indexing, and parallel reads.

This frame-based design enables:

  • Append-only writes without modifying or corrupting existing data
  • Queries over past memory states
  • Timeline-style inspection of how knowledge evolves
  • Crash safety through committed, immutable frames
  • Efficient compression using techniques adapted from video encoding

The result is a single file that behaves like a rewindable memory timeline for AI systems.
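As a rough mental model (not the actual on-disk format), a Smart Frame can be pictured as an immutable record appended to a growing log. The names below are illustrative only, using a simple std hash in place of the real checksum:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::{SystemTime, UNIX_EPOCH};

// Illustrative only: a simplified stand-in for a Smart Frame.
#[derive(Debug)]
struct Frame {
    timestamp_ms: u128, // when the frame was committed
    checksum: u64,      // integrity check over the payload
    payload: Vec<u8>,   // the stored content
}

// An append-only log: frames are only ever added, never modified.
struct FrameLog {
    frames: Vec<Frame>,
}

impl FrameLog {
    fn new() -> Self {
        Self { frames: Vec::new() }
    }

    fn append(&mut self, payload: &[u8]) {
        let mut hasher = DefaultHasher::new();
        payload.hash(&mut hasher);
        let timestamp_ms = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .unwrap()
            .as_millis();
        self.frames.push(Frame {
            timestamp_ms,
            checksum: hasher.finish(),
            payload: payload.to_vec(),
        });
    }

    // "Rewind": view only the frames that existed after `n` appends.
    fn state_at(&self, n: usize) -> &[Frame] {
        &self.frames[..n.min(self.frames.len())]
    }
}

fn main() {
    let mut log = FrameLog::new();
    log.append(b"first note");
    log.append(b"second note");
    // Earlier states remain readable because nothing is overwritten.
    assert_eq!(log.state_at(1).len(), 1);
    println!("frames committed: {}", log.frames.len());
}
```

Because committed frames are never mutated, a crash mid-append can at worst lose the in-flight frame, never corrupt earlier ones.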


Core Concepts

  • Living Memory Engine: Continuously append, branch, and evolve memory across sessions.

  • Capsule Context (.mv2): Self-contained, shareable memory capsules with rules and expiry.

  • Time-Travel Debugging: Rewind, replay, or branch any memory state.

  • Smart Recall: Sub-5ms local memory access with predictive caching.

  • Codec Intelligence: Auto-selects and upgrades compression over time.


Use Cases

Memvid is a portable, serverless memory layer that gives AI agents persistent memory and fast recall. Because it's model-agnostic, multi-modal, and works fully offline, developers are using Memvid across a wide range of real-world applications.

  • Long-Running AI Agents
  • Enterprise Knowledge Bases
  • Offline-First AI Systems
  • Codebase Understanding
  • Customer Support Agents
  • Workflow Automation
  • Sales and Marketing Copilots
  • Personal Knowledge Assistants
  • Medical, Legal, and Financial Agents
  • Auditable and Debuggable AI Workflows
  • Custom Applications

SDKs & CLI

Use Memvid in your preferred language:

Package        Install                      Registry
CLI            npm install -g memvid-cli    npm
Node.js SDK    npm install @memvid/sdk      npm
Python SDK     pip install memvid-sdk       PyPI
Rust           cargo add memvid-core        crates.io

Installation (Rust)

Add to Your Project

[dependencies]
memvid-core = "2.0"

Feature Flags

Feature             Description
lex                 Full-text search with BM25 ranking (Tantivy)
pdf_extract         Pure-Rust PDF text extraction
vec                 Vector similarity search (HNSW + local text embeddings via ONNX)
clip                CLIP visual embeddings for image search
whisper             Audio transcription with Whisper
temporal_track      Natural-language date parsing ("last Tuesday")
parallel_segments   Multi-threaded ingestion
encryption          Password-based encrypted capsules (.mv2e)

Enable features as needed:

[dependencies]
memvid-core = { version = "2.0", features = ["lex", "vec", "temporal_track"] }
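For intuition about what the lex feature's BM25 ranking does, the classic formula can be sketched from scratch as below. This is an illustration of the scoring idea only, not Tantivy's actual implementation:

```rust
// Classic BM25 score for one (term, document) pair.
// Illustrative sketch; Tantivy's real implementation differs in detail.
fn bm25_term_score(
    tf: f64,      // term frequency in the document
    doc_len: f64, // document length in tokens
    avg_len: f64, // average document length in the corpus
    n_docs: f64,  // total number of documents
    df: f64,      // number of documents containing the term
) -> f64 {
    let k1 = 1.2; // term-frequency saturation
    let b = 0.75; // length-normalization strength
    let idf = ((n_docs - df + 0.5) / (df + 0.5) + 1.0).ln();
    idf * (tf * (k1 + 1.0)) / (tf + k1 * (1.0 - b + b * doc_len / avg_len))
}

fn main() {
    // A term appearing twice in a short document outranks
    // one appearance in a much longer document.
    let short_doc = bm25_term_score(2.0, 50.0, 100.0, 1000.0, 10.0);
    let long_doc = bm25_term_score(1.0, 300.0, 100.0, 1000.0, 10.0);
    assert!(short_doc > long_doc);
    println!("short: {short_doc:.3}, long: {long_doc:.3}");
}
```

The length normalization (the `b` term) is what keeps long documents from dominating results purely by containing more words.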

Quick Start

use memvid_core::{Memvid, PutOptions, SearchRequest};

fn main() -> memvid_core::Result<()> {
    // Create a new memory file
    let mut mem = Memvid::create("knowledge.mv2")?;

    // Add documents with metadata
    let opts = PutOptions::builder()
        .title("Meeting Notes")
        .uri("mv2://meetings/2024-01-15")
        .tag("project", "alpha")
        .build();
    mem.put_bytes_with_options(b"Q4 planning discussion...", opts)?;
    mem.commit()?;

    // Search
    let response = mem.search(SearchRequest {
        query: "planning".into(),
        top_k: 10,
        snippet_chars: 200,
        ..Default::default()
    })?;

    for hit in response.hits {
        println!("{}: {}", hit.title.unwrap_or_default(), hit.text);
    }

    Ok(())
}

Build

Clone the repository:

git clone https://github.com/memvid/memvid.git
cd memvid

Build in debug mode:

cargo build

Build in release mode (optimized):

cargo build --release

Build with specific features:

cargo build --release --features "lex,vec,temporal_track"

Run Tests

Run all tests:

cargo test

Run tests with output:

cargo test -- --nocapture

Run a specific test:

cargo test test_name

Run integration tests only:

cargo test --test lifecycle
cargo test --test search
cargo test --test mutation

Examples

The examples/ directory contains working examples:

Basic Usage

Demonstrates create, put, search, and timeline operations:

cargo run --example basic_usage

PDF Ingestion

Ingest and search PDF documents (uses the "Attention Is All You Need" paper):

cargo run --example pdf_ingestion

CLIP Visual Search

Image search using CLIP embeddings (requires clip feature):

cargo run --example clip_visual_search --features clip

Whisper Transcription

Audio transcription (requires whisper feature):

cargo run --example test_whisper --features whisper

Text Embedding Models

The vec feature includes local text embedding support using ONNX models. Before using local text embeddings, you need to download the model files manually.

Quick Start: BGE-small (Recommended)

Download the default BGE-small model (384 dimensions, fast and efficient):

mkdir -p ~/.cache/memvid/text-models

# Download ONNX model
curl -L 'https://huggingface.co/BAAI/bge-small-en-v1.5/resolve/main/onnx/model.onnx' \
  -o ~/.cache/memvid/text-models/bge-small-en-v1.5.onnx

# Download tokenizer
curl -L 'https://huggingface.co/BAAI/bge-small-en-v1.5/resolve/main/tokenizer.json' \
  -o ~/.cache/memvid/text-models/bge-small-en-v1.5_tokenizer.json

Available Models

Model                   Dimensions   Size     Best For
bge-small-en-v1.5       384          ~120MB   Default, fast
bge-base-en-v1.5        768          ~420MB   Better quality
nomic-embed-text-v1.5   768          ~530MB   Versatile tasks
gte-large               1024         ~1.3GB   Highest quality

Other Models

BGE-base (768 dimensions):

curl -L 'https://huggingface.co/BAAI/bge-base-en-v1.5/resolve/main/onnx/model.onnx' \
  -o ~/.cache/memvid/text-models/bge-base-en-v1.5.onnx
curl -L 'https://huggingface.co/BAAI/bge-base-en-v1.5/resolve/main/tokenizer.json' \
  -o ~/.cache/memvid/text-models/bge-base-en-v1.5_tokenizer.json

Nomic (768 dimensions):

curl -L 'https://huggingface.co/nomic-ai/nomic-embed-text-v1.5/resolve/main/onnx/model.onnx' \
  -o ~/.cache/memvid/text-models/nomic-embed-text-v1.5.onnx
curl -L 'https://huggingface.co/nomic-ai/nomic-embed-text-v1.5/resolve/main/tokenizer.json' \
  -o ~/.cache/memvid/text-models/nomic-embed-text-v1.5_tokenizer.json

GTE-large (1024 dimensions):

curl -L 'https://huggingface.co/thenlper/gte-large/resolve/main/onnx/model.onnx' \
  -o ~/.cache/memvid/text-models/gte-large.onnx
curl -L 'https://huggingface.co/thenlper/gte-large/resolve/main/tokenizer.json' \
  -o ~/.cache/memvid/text-models/gte-large_tokenizer.json

Usage in Code

use memvid_core::text_embed::{LocalTextEmbedder, TextEmbedConfig};
use memvid_core::types::embedding::EmbeddingProvider;

fn main() -> memvid_core::Result<()> {
    // Use the default model (BGE-small, 384 dimensions)
    let config = TextEmbedConfig::default();
    let embedder = LocalTextEmbedder::new(config)?;

    let embedding = embedder.embed_text("hello world")?;
    assert_eq!(embedding.len(), 384);

    // Use a different model (BGE-base, 768 dimensions)
    let config = TextEmbedConfig::bge_base();
    let embedder = LocalTextEmbedder::new(config)?;
    let embedding = embedder.embed_text("hello world")?;
    assert_eq!(embedding.len(), 768);

    Ok(())
}
See examples/text_embedding.rs for a complete example with similarity computation and search ranking.
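Once texts are embedded, similarity ranking typically boils down to cosine similarity between vectors. A minimal, self-contained sketch of that computation, independent of the memvid API:

```rust
// Cosine similarity between two embedding vectors:
// dot(a, b) / (|a| * |b|), in [-1, 1] for non-zero inputs.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal dimensions");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

fn main() {
    let query = [1.0, 0.0, 1.0];
    let doc_a = [2.0, 0.0, 2.0]; // same direction as the query
    let doc_b = [0.0, 1.0, 0.0]; // orthogonal to the query
    assert!((cosine_similarity(&query, &doc_a) - 1.0).abs() < 1e-6);
    assert!(cosine_similarity(&query, &doc_b).abs() < 1e-6);
}
```

Because similarity depends on direction rather than magnitude, embeddings are usually compared after (implicit) normalization, which is exactly what the division by the two norms does.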


File Format

Everything lives in a single .mv2 file:

┌────────────────────────────┐
│ Header (4KB)               │  Magic, version, capacity
├────────────────────────────┤
│ Embedded WAL (1-64MB)      │  Crash recovery
├────────────────────────────┤
│ Data Segments              │  Compressed frames
├────────────────────────────┤
│ Lex Index                  │  Tantivy full-text
├────────────────────────────┤
│ Vec Index                  │  HNSW vectors
├────────────────────────────┤
│ Time Index                 │  Chronological ordering
├────────────────────────────┤
│ TOC (Footer)               │  Segment offsets
└────────────────────────────┘

No .wal, .lock, .shm, or sidecar files. Ever.

See MV2_SPEC.md for the complete file format specification.
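The general trick behind a single-file container is to write sections first and append a table of contents at the end, so a reader seeks to the footer to locate each section. A toy sketch of that pattern (illustrative only; the real MV2 layout is defined in MV2_SPEC.md):

```rust
// Toy single-file container:
// [section bytes...][TOC entries][8-byte TOC offset][8-byte entry count]
// Illustrative only; not the real MV2 format.
fn pack(sections: &[&[u8]]) -> Vec<u8> {
    let mut file = Vec::new();
    let mut entries = Vec::new();
    for s in sections {
        entries.push((file.len() as u64, s.len() as u64));
        file.extend_from_slice(s);
    }
    let toc_start = file.len() as u64;
    for (off, len) in &entries {
        file.extend_from_slice(&off.to_le_bytes());
        file.extend_from_slice(&len.to_le_bytes());
    }
    // Footer: where the TOC begins and how many entries it holds.
    file.extend_from_slice(&toc_start.to_le_bytes());
    file.extend_from_slice(&(entries.len() as u64).to_le_bytes());
    file
}

fn read_section(file: &[u8], index: usize) -> &[u8] {
    let n = file.len();
    // Parse the fixed-size footer from the end of the file.
    let count = u64::from_le_bytes(file[n - 8..].try_into().unwrap()) as usize;
    assert!(index < count, "section index out of range");
    let toc_start = u64::from_le_bytes(file[n - 16..n - 8].try_into().unwrap()) as usize;
    // Each TOC entry is 16 bytes: (offset, length), little-endian.
    let entry = toc_start + index * 16;
    let off = u64::from_le_bytes(file[entry..entry + 8].try_into().unwrap()) as usize;
    let len = u64::from_le_bytes(file[entry + 8..entry + 16].try_into().unwrap()) as usize;
    &file[off..off + len]
}

fn main() {
    let file = pack(&[b"header", b"data segment", b"index"]);
    assert_eq!(read_section(&file, 1), b"data segment");
    println!("packed {} bytes", file.len());
}
```

Writing the TOC last is what makes appends cheap: new segments go at the end of the data region and only the footer needs rewriting.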


Support

Have questions or feedback? Email: contact@memvid.com

Drop a ⭐ to show support


License

Apache License 2.0 — see the LICENSE file for details.