memvid-core 2.0.113

Core library for Memvid v2, a crash-safe, deterministic, single-file AI memory.
# Memvid

The Rust engine behind Memvid. Store everything in a single `.mv2` file.

[![Crates.io](https://img.shields.io/crates/v/memvid-core.svg)](https://crates.io/crates/memvid-core)
[![Documentation](https://docs.rs/memvid-core/badge.svg)](https://docs.rs/memvid-core)

## What is this?

`Memvid` is a library for building AI memory systems. It packs documents, embeddings, full-text search indices, and a write-ahead log into a single portable file. No database setup. No sidecar files. Just one `.mv2` file you can copy anywhere.

## Why "Frames"?

Memvid borrows from video encoding. Just as video files store sequential frames that can be played, seeked, and edited, Memvid stores your data as an append-only sequence of **frames**.

Each frame contains your content (text, PDF, image, audio) plus metadata, timestamps, and checksums. Frames are grouped into segments for efficient compression and parallel indexing.

This design gives you:

- **Append-only simplicity**: New data never corrupts existing frames
- **Time-travel queries**: Search your memory as it existed at any point
- **Timeline playback**: Browse frames chronologically like scrubbing through video
- **Crash safety**: Incomplete writes don't affect committed frames
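
To make these properties concrete, here is a toy in-memory model of an append-only frame log. It is a sketch only, not how `memvid-core` is implemented: `Frame`, `FrameLog`, `as_of`, `crash`, and the FNV-1a checksum are all hypothetical names and choices made for illustration.

```rust
// Conceptual sketch only: NOT memvid-core's real types or on-disk layout.
// Every name below is hypothetical.

/// One immutable record in the log.
#[derive(Debug, Clone)]
struct Frame {
    id: u64,
    timestamp: i64, // Unix seconds
    payload: Vec<u8>,
    checksum: u32, // toy checksum (FNV-1a), standing in for a real one
}

/// FNV-1a 32-bit hash, used here as a stand-in checksum.
fn fnv1a(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for &b in bytes {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

#[derive(Default)]
struct FrameLog {
    committed: Vec<Frame>, // visible to readers
    pending: Vec<Frame>,   // appended but not yet committed
}

impl FrameLog {
    /// Append a frame; it stays invisible until `commit`.
    fn append(&mut self, timestamp: i64, payload: &[u8]) -> u64 {
        let id = (self.committed.len() + self.pending.len()) as u64;
        self.pending.push(Frame {
            id,
            timestamp,
            payload: payload.to_vec(),
            checksum: fnv1a(payload),
        });
        id
    }

    /// Make pending frames visible. Committed frames are never rewritten.
    fn commit(&mut self) {
        self.committed.append(&mut self.pending);
    }

    /// Simulate a crash: pending frames are lost, committed frames survive.
    fn crash(&mut self) {
        self.pending.clear();
    }

    /// "Time travel": ids of committed frames written at or before `as_of`.
    fn as_of(&self, as_of: i64) -> Vec<u64> {
        self.committed
            .iter()
            .filter(|f| f.timestamp <= as_of)
            .map(|f| f.id)
            .collect()
    }

    /// Recompute every checksum to detect corruption.
    fn verify(&self) -> bool {
        self.committed.iter().all(|f| fnv1a(&f.payload) == f.checksum)
    }
}

fn main() {
    let mut log = FrameLog::default();
    log.append(100, b"first note");
    log.commit();
    log.append(200, b"second note");
    log.crash(); // this frame was never committed
    log.append(300, b"third note");
    log.commit();

    println!("committed frames: {}", log.committed.len());
    println!("visible as of t=150: {:?}", log.as_of(150));
    println!("checksums ok: {}", log.verify());
}
```

In this model the crash discards only the uncommitted frame, and a query "as of" an earlier time simply ignores later frames, which is why an append-only layout makes both properties cheap.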

A minimal end-to-end example:

```rust
use memvid_core::{Memvid, PutOptions, SearchRequest};

// Create a memory file
let mut mem = Memvid::create("knowledge.mv2")?;

// Add documents with metadata
let opts = PutOptions::builder()
    .title("Meeting Notes")
    .uri("mv2://meetings/2024-01-15")
    .tag("project", "alpha")
    .build();
mem.put_bytes_with_options(b"Q4 planning discussion...", opts)?;
mem.commit()?;

// Search
let response = mem.search(SearchRequest {
    query: "planning".into(),
    top_k: 10,
    snippet_chars: 200,
    ..Default::default()
})?;

for hit in response.hits {
    println!("{}: {}", hit.title.unwrap_or_default(), hit.text);
}
```

## Installation

```toml
[dependencies]
memvid-core = { version = "2.0", features = ["lex", "vec", "temporal_track", "parallel_segments"] }
```

### Features

| Feature | Default | What it does |
|---------|---------|--------------|
| `lex` | yes | Full-text search with BM25 ranking (Tantivy) |
| `vec` | no | Vector similarity search (HNSW + ONNX embeddings) |
| `temporal_track` | no | Parse natural language dates ("last Tuesday") |
| `parallel_segments` | no | Multi-threaded ingestion for large imports |
| `pdfium` | no | PDF text extraction |

## Core API

### Create and Open

```rust
// New file
let mut mem = Memvid::create("data.mv2")?;

// Open existing (read-write)
let mut mem = Memvid::open("data.mv2")?;

// Open read-only (no lock contention)
let mem = Memvid::open_read_only("data.mv2")?;
```

### Put Documents

```rust
// Simple
mem.put_bytes(b"Some text content")?;

// With metadata
let content: &[u8] = b"Full text of the API reference..."; // your document bytes
let opts = PutOptions::builder()
    .title("API Reference")
    .uri("mv2://docs/api")
    .tag("version", "2.0")
    .search_text("custom text for indexing".into())
    .build();
mem.put_bytes_with_options(content, opts)?;

// Don't forget to commit
mem.commit()?;
```

### Search

```rust
let response = mem.search(SearchRequest {
    query: "distributed systems".into(),
    top_k: 50,
    snippet_chars: 200,
    scope: Some("mv2://docs/".into()),  // optional: filter by URI prefix
    ..Default::default()
})?;

println!("Found {} results in {}ms", response.total_hits, response.elapsed_ms);
```

### Timeline

Browse documents chronologically:

```rust
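use std::num::NonZeroU64;
// Assumed to be re-exported at the crate root like `Memvid` and `SearchRequest`.
use memvid_core::TimelineQuery;

let entries = mem.timeline(TimelineQuery {
    limit: NonZeroU64::new(100),
    since: Some(1706745600),  // Unix timestamp (2024-02-01 00:00:00 UTC)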
    until: None,
    reverse: false,
    temporal: None,
})?;
```
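
The `since` and `until` bounds are plain Unix timestamps, so a relative window such as "the last 24 hours" has to be computed by the caller. A minimal standard-library sketch follows; `unix_seconds_ago` is a hypothetical helper, and returning `i64` assumes the fields take signed seconds.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Unix timestamp (seconds) for the instant `age` before `now`.
/// Hypothetical helper; `i64` is an assumption about the field type.
fn unix_seconds_ago(now: SystemTime, age: Duration) -> i64 {
    // Fall back to the epoch if the subtraction is not representable.
    let then = now.checked_sub(age).unwrap_or(UNIX_EPOCH);
    match then.duration_since(UNIX_EPOCH) {
        Ok(d) => d.as_secs() as i64,
        Err(_) => 0, // clamp pre-epoch instants to the epoch
    }
}

fn main() {
    let day = Duration::from_secs(24 * 60 * 60);
    let since = unix_seconds_ago(SystemTime::now(), day);
    println!("since = {since}");
}
```

The result can then be passed as `since: Some(since)` in the query above.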

### Stats and Verification

```rust
let stats = mem.stats()?;
println!("Frames: {}, Lex index: {}", stats.frame_count, stats.has_lex_index);

// Verify integrity
let report = Memvid::verify("data.mv2", true)?;  // true = deep check
```

## File Format

Everything lives in the `.mv2` file:

```
┌────────────────────────────┐
│ Header (4KB)               │  Magic, version, capacity
├────────────────────────────┤
│ Embedded WAL (1-64MB)      │  Crash recovery
├────────────────────────────┤
│ Data Segments              │  Compressed frames
├────────────────────────────┤
│ Lex Index                  │  Tantivy full-text
├────────────────────────────┤
│ Vec Index                  │  HNSW vectors
├────────────────────────────┤
│ Time Index                 │  Chronological ordering
├────────────────────────────┤
│ TOC (Footer)               │  Segment offsets
└────────────────────────────┘
```

No `.wal`, `.lock`, `.shm`, or any other files. Ever.

See [MV2_SPEC.md](MV2_SPEC.md) for the complete file format specification.

## Benchmarks

Measured on an Apple M1 Pro with 50K documents:

| Operation | Time |
|-----------|------|
| Search (single term) | 0.8ms |
| Search (multi-term) | 1.2ms |
| Cold start + first search | 190ms |
| Concurrent readers (8x) | 3.5ms total |

Run benchmarks yourself:

```bash
cd crates/memvid-core/benchmarks
cargo bench
```

## Examples

```bash
cargo run --example basic_usage
```

See [`examples/`](examples/) for more.

## Feature Compatibility

Files remember which features were enabled when created. Opening a file requires matching features:

```bash
# Check what a file needs
memvid stats file.mv2 --json | jq '.indexes'
```

If you created a file with the CLI (which enables everything), enable the search and temporal features when opening it from your own code:

```toml
[dependencies]
memvid-core = { version = "2.0", features = ["lex", "vec", "temporal_track"] }
```

## Logging

Memvid logs through the `tracing` crate. Configure a subscriber in your application:

```rust
tracing_subscriber::fmt()
    .with_env_filter("memvid_core=warn")
    .init();
```
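
Note that `with_env_filter` is only available when `tracing-subscriber` is built with its `env-filter` feature. A dependency line along these lines should work; the version shown is an assumption, so pin whatever your project already uses:

```toml
[dependencies]
# `env-filter` enables `with_env_filter`; version is illustrative
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
```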

## License

Apache 2.0