<img width="2000" height="524" alt="Social Cover (9)" src="https://github.com/user-attachments/assets/cf66f045-c8be-494b-b696-b8d7e4fb709c" />
<p align="center">
<a href="docs/i18n/README.es.md">πͺπΈ EspaΓ±ol</a>
<a href="docs/i18n/README.fr.md">π«π· FranΓ§ais</a>
<a href="docs/i18n/README.so.md">πΈπ΄ Soomaali</a>
<a href="docs/i18n/README.ar.md">πΈπ¦ Ψ§ΩΨΉΨ±Ψ¨ΩΨ©</a>
<a href="docs/i18n/README.nl.md">π§πͺ/π³π± Nederlands</a>
<a href="docs/i18n/README.kr.md">π°π· νκ΅μ΄</a>
</p>
<p align="center">
<strong>Memvid is a single-file memory layer for AI agents with instant retrieval and long-term memory.</strong><br/>
Persistent, versioned, and portable memory, without databases.
</p>
<p align="center">
<a href="https://www.memvid.com">Website</a>
·
<a href="https://sandbox.memvid.com">Try Sandbox</a>
·
<a href="https://docs.memvid.com">Docs</a>
·
<a href="https://github.com/memvid/memvid/discussions">Discussions</a>
</p>
<p align="center">
<a href="https://crates.io/crates/memvid-core"><img src="https://img.shields.io/crates/v/memvid-core?style=flat-square&logo=rust" alt="Crates.io" /></a>
<a href="https://docs.rs/memvid-core"><img src="https://img.shields.io/docsrs/memvid-core?style=flat-square&logo=docs.rs" alt="docs.rs" /></a>
<a href="https://github.com/memvid/memvid/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-blue?style=flat-square" alt="License" /></a>
</p>
<p align="center">
<a href="https://github.com/memvid/memvid/stargazers"><img src="https://img.shields.io/github/stars/memvid/memvid?style=flat-square&logo=github" alt="Stars" /></a>
<a href="https://github.com/memvid/memvid/network/members"><img src="https://img.shields.io/github/forks/memvid/memvid?style=flat-square&logo=github" alt="Forks" /></a>
<a href="https://github.com/memvid/memvid/issues"><img src="https://img.shields.io/github/issues/memvid/memvid?style=flat-square&logo=github" alt="Issues" /></a>
<a href="https://discord.gg/2mynS7fcK7"><img src="https://img.shields.io/discord/1442910055233224745?style=flat-square&logo=discord&label=discord" alt="Discord" /></a>
</p>
<p align="center">
<a href="https://trendshift.io/repositories/17293" target="_blank"><img src="https://trendshift.io/api/badge/repositories/17293" alt="memvid%2Fmemvid | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/</a>
</p>
<h2 align="center">βοΈ Leave a STAR to support the project βοΈ</h2>
## What is Memvid?
Memvid is a portable AI memory system that packages your data, embeddings, search structure, and metadata into a single file.
Instead of running complex RAG pipelines or server-based vector databases, Memvid enables fast retrieval directly from the file.
The result is a model-agnostic, infrastructure-free memory layer that gives AI agents persistent, long-term memory they can carry anywhere.
---
## What are Smart Frames?
Memvid draws inspiration from video encoding: not to store video, but to **organize AI memory as an append-only, ultra-efficient sequence of Smart Frames.**
A Smart Frame is an immutable unit that stores content along with timestamps, checksums, and basic metadata.
Frames are grouped in a way that allows efficient compression, indexing, and parallel reads.
This frame-based design enables:
- Append-only writes without modifying or corrupting existing data
- Queries over past memory states
- Timeline-style inspection of how knowledge evolves
- Crash safety through committed, immutable frames
- Efficient compression using techniques adapted from video encoding
The result is a single file that behaves like a rewindable memory timeline for AI systems.
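In code, the append-only flow is simply put-then-commit: each `commit` seals the pending writes into immutable frames. The sketch below is illustrative only and uses just the calls shown in the Quick Start further down; the file name and contents are placeholders.
```rust
use memvid_core::{Memvid, PutOptions};

fn main() -> memvid_core::Result<()> {
    let mut mem = Memvid::create("timeline.mv2")?;

    // First session: pending writes are sealed into immutable frames on commit.
    mem.put_bytes_with_options(
        b"Initial project notes",
        PutOptions::builder().title("Notes v1").build(),
    )?;
    mem.commit()?;

    // Later writes append new frames; earlier frames are never rewritten.
    mem.put_bytes_with_options(
        b"Updated project notes",
        PutOptions::builder().title("Notes v2").build(),
    )?;
    mem.commit()?;

    Ok(())
}
```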
---
## Core Concepts
- **Living Memory Engine**: Continuously append, branch, and evolve memory across sessions.
- **Capsule Context (`.mv2`)**: Self-contained, shareable memory capsules with rules and expiry.
- **Time-Travel Debugging**: Rewind, replay, or branch any memory state.
- **Smart Recall**: Sub-5ms local memory access with predictive caching.
- **Codec Intelligence**: Auto-selects and upgrades compression over time.
---
## Use Cases
Memvid is a portable, serverless memory layer that gives AI agents persistent memory and fast recall. Because it is model-agnostic, multi-modal, and fully offline, developers are using Memvid across a wide range of real-world applications.
- Long-Running AI Agents
- Enterprise Knowledge Bases
- Offline-First AI Systems
- Codebase Understanding
- Customer Support Agents
- Workflow Automation
- Sales and Marketing Copilots
- Personal Knowledge Assistants
- Medical, Legal, and Financial Agents
- Auditable and Debuggable AI Workflows
- Custom Applications
---
## SDKs & CLI
Use Memvid in your preferred language:
| Package | Install | Registry |
|---|---|---|
| **CLI** | `npm install -g memvid-cli` | [npm](https://www.npmjs.com/package/memvid-cli) |
| **Node.js SDK** | `npm install @memvid/sdk` | [npm](https://www.npmjs.com/package/@memvid/sdk) |
| **Python SDK** | `pip install memvid-sdk` | [PyPI](https://pypi.org/project/memvid-sdk/) |
| **Rust** | `cargo add memvid-core` | [crates.io](https://crates.io/crates/memvid-core) |
---
## Installation (Rust)
### Requirements
- **Rust 1.85.0+**: install from [rustup.rs](https://rustup.rs)
### Add to Your Project
```toml
[dependencies]
memvid-core = "2.0"
```
### Feature Flags
| Feature | Description |
|---|---|
| `lex` | Full-text search with BM25 ranking (Tantivy) |
| `pdf_extract` | Pure Rust PDF text extraction |
| `vec` | Vector similarity search (HNSW + local text embeddings via ONNX) |
| `clip` | CLIP visual embeddings for image search |
| `whisper` | Audio transcription with Whisper |
| `temporal_track` | Natural language date parsing ("last Tuesday") |
| `parallel_segments` | Multi-threaded ingestion |
| `encryption` | Password-based encryption capsules (`.mv2e`) |
Enable features as needed:
```toml
[dependencies]
memvid-core = { version = "2.0", features = ["lex", "vec", "temporal_track"] }
```
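Equivalently, the same features can be enabled from the command line with `cargo add`:
```bash
cargo add memvid-core --features lex,vec,temporal_track
```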
---
## Quick Start
```rust
use memvid_core::{Memvid, PutOptions, SearchRequest};
fn main() -> memvid_core::Result<()> {
// Create a new memory file
let mut mem = Memvid::create("knowledge.mv2")?;
// Add documents with metadata
let opts = PutOptions::builder()
.title("Meeting Notes")
.uri("mv2://meetings/2024-01-15")
.tag("project", "alpha")
.build();
mem.put_bytes_with_options(b"Q4 planning discussion...", opts)?;
mem.commit()?;
// Search
let response = mem.search(SearchRequest {
query: "planning".into(),
top_k: 10,
snippet_chars: 200,
..Default::default()
})?;
for hit in response.hits {
println!("{}: {}", hit.title.unwrap_or_default(), hit.text);
}
Ok(())
}
```
---
## Build
Clone the repository:
```bash
git clone https://github.com/memvid/memvid.git
cd memvid
```
Build in debug mode:
```bash
cargo build
```
Build in release mode (optimized):
```bash
cargo build --release
```
Build with specific features:
```bash
cargo build --release --features "lex,vec,temporal_track"
```
---
## Run Tests
Run all tests:
```bash
cargo test
```
Run tests with output:
```bash
cargo test -- --nocapture
```
Run a specific test:
```bash
cargo test test_name
```
Run integration tests only:
```bash
cargo test --test lifecycle
cargo test --test search
cargo test --test mutation
```
---
## Examples
The `examples/` directory contains working examples:
### Basic Usage
Demonstrates create, put, search, and timeline operations:
```bash
cargo run --example basic_usage
```
### PDF Ingestion
Ingest and search PDF documents (uses the "Attention Is All You Need" paper):
```bash
cargo run --example pdf_ingestion
```
### CLIP Visual Search
Image search using CLIP embeddings (requires `clip` feature):
```bash
cargo run --example clip_visual_search --features clip
```
### Whisper Transcription
Audio transcription (requires `whisper` feature):
```bash
cargo run --example test_whisper --features whisper
```
---
## Text Embedding Models
The `vec` feature includes local text embedding support using ONNX models. Before using local text embeddings, you need to download the model files manually.
### Quick Start: BGE-small (Recommended)
Download the default BGE-small model (384 dimensions, fast and efficient):
```bash
mkdir -p ~/.cache/memvid/text-models
# Download ONNX model
curl -L 'https://huggingface.co/BAAI/bge-small-en-v1.5/resolve/main/onnx/model.onnx' \
-o ~/.cache/memvid/text-models/bge-small-en-v1.5.onnx
# Download tokenizer
curl -L 'https://huggingface.co/BAAI/bge-small-en-v1.5/resolve/main/tokenizer.json' \
-o ~/.cache/memvid/text-models/bge-small-en-v1.5_tokenizer.json
```
### Available Models
| Model | Dimensions | Size | Notes |
|---|---|---|---|
| `bge-small-en-v1.5` | 384 | ~120MB | Default, fast |
| `bge-base-en-v1.5` | 768 | ~420MB | Better quality |
| `nomic-embed-text-v1.5` | 768 | ~530MB | Versatile tasks |
| `gte-large` | 1024 | ~1.3GB | Highest quality |
### Other Models
**BGE-base** (768 dimensions):
```bash
curl -L 'https://huggingface.co/BAAI/bge-base-en-v1.5/resolve/main/onnx/model.onnx' \
-o ~/.cache/memvid/text-models/bge-base-en-v1.5.onnx
curl -L 'https://huggingface.co/BAAI/bge-base-en-v1.5/resolve/main/tokenizer.json' \
-o ~/.cache/memvid/text-models/bge-base-en-v1.5_tokenizer.json
```
**Nomic** (768 dimensions):
```bash
curl -L 'https://huggingface.co/nomic-ai/nomic-embed-text-v1.5/resolve/main/onnx/model.onnx' \
-o ~/.cache/memvid/text-models/nomic-embed-text-v1.5.onnx
curl -L 'https://huggingface.co/nomic-ai/nomic-embed-text-v1.5/resolve/main/tokenizer.json' \
-o ~/.cache/memvid/text-models/nomic-embed-text-v1.5_tokenizer.json
```
**GTE-large** (1024 dimensions):
```bash
curl -L 'https://huggingface.co/thenlper/gte-large/resolve/main/onnx/model.onnx' \
-o ~/.cache/memvid/text-models/gte-large.onnx
curl -L 'https://huggingface.co/thenlper/gte-large/resolve/main/tokenizer.json' \
-o ~/.cache/memvid/text-models/gte-large_tokenizer.json
```
### Usage in Code
```rust
use memvid_core::text_embed::{LocalTextEmbedder, TextEmbedConfig};
use memvid_core::types::embedding::EmbeddingProvider;
// Use default model (BGE-small)
let config = TextEmbedConfig::default();
let embedder = LocalTextEmbedder::new(config)?;
let embedding = embedder.embed_text("hello world")?;
assert_eq!(embedding.len(), 384);
// Use different model
let config = TextEmbedConfig::bge_base();
let embedder = LocalTextEmbedder::new(config)?;
```
See `examples/text_embedding.rs` for a complete example with similarity computation and search ranking.
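As a rough illustration of what that example covers, the sketch below embeds two strings and compares them with cosine similarity. The `cosine_similarity` helper is not part of the crate; it assumes `embed_text` returns a plain `Vec<f32>`, as the length check above suggests.
```rust
use memvid_core::text_embed::{LocalTextEmbedder, TextEmbedConfig};
use memvid_core::types::embedding::EmbeddingProvider;

// Illustrative helper, not part of memvid-core.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() -> memvid_core::Result<()> {
    let embedder = LocalTextEmbedder::new(TextEmbedConfig::default())?;
    let a = embedder.embed_text("quarterly planning meeting")?;
    let b = embedder.embed_text("Q4 roadmap discussion")?;
    println!("similarity: {:.3}", cosine_similarity(&a, &b));
    Ok(())
}
```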
---
## File Format
Everything lives in a single `.mv2` file:
```
┌─────────────────────────────┐
│ Header (4KB)                │ Magic, version, capacity
├─────────────────────────────┤
│ Embedded WAL (1-64MB)       │ Crash recovery
├─────────────────────────────┤
│ Data Segments               │ Compressed frames
├─────────────────────────────┤
│ Lex Index                   │ Tantivy full-text
├─────────────────────────────┤
│ Vec Index                   │ HNSW vectors
├─────────────────────────────┤
│ Time Index                  │ Chronological ordering
├─────────────────────────────┤
│ TOC (Footer)                │ Segment offsets
└─────────────────────────────┘
```
No `.wal`, `.lock`, `.shm`, or sidecar files. Ever.
See [MV2_SPEC.md](MV2_SPEC.md) for the complete file format specification.
---
## Support
Have questions or feedback?
Email: contact@memvid.com
**Drop a ⭐ to show support**
---
## License
Apache License 2.0. See the [LICENSE](LICENSE) file for details.