semantic-memory 0.3.0

Hybrid semantic search with SQLite, FTS5, and HNSW — built for AI agents
semantic-memory

Hybrid semantic search library for Rust, backed by SQLite + FTS5 + HNSW. Built for AI agent memory systems.

Combines BM25 full-text search with approximate nearest neighbor vector search via Reciprocal Rank Fusion, giving you the best of both lexical and semantic retrieval in a single query.

Features

  • Hybrid search — BM25 (FTS5) + cosine similarity fused with RRF
  • HNSW indexing — Fast approximate nearest neighbor via hnsw_rs, with brute-force fallback
  • Knowledge store — Add, update, delete, and search facts organized by namespace
  • Document chunking — Ingest long documents with configurable overlap chunking and per-chunk embeddings
  • Conversation memory — Session-based message history with token budgeting and message search
  • SQ8 quantization — Quantized embeddings stored alongside f32 for space-efficient persistence
  • Recency boosting — Optional time-decay weighting with configurable half-life
  • Single-file storage — Everything lives in one SQLite database + optional HNSW sidecar files
  • Async API — All public methods are async, with SQLite I/O on spawn_blocking
  • No hosted services — The only external dependency is a local Ollama instance for embeddings (or use MockEmbedder for testing)
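
To make the SQ8 bullet concrete, here is a minimal sketch of 8-bit scalar quantization in plain Rust. It is an illustration only: the `Sq8Vector` type, the per-vector min/scale encoding, and both function names are assumptions for this example, not the crate's actual storage format.

```rust
/// A vector quantized to one byte per dimension, plus the range needed to decode it.
/// (Hypothetical type for illustration; not part of semantic-memory's API.)
#[derive(Debug, Clone, PartialEq)]
pub struct Sq8Vector {
    pub min: f32,
    /// Step between adjacent codes: (max - min) / 255, or 0.0 for a constant vector.
    pub scale: f32,
    pub codes: Vec<u8>,
}

/// Map each f32 component onto the 0..=255 range spanned by the vector's min and max.
pub fn quantize_sq8(v: &[f32]) -> Sq8Vector {
    let min = v.iter().copied().fold(f32::INFINITY, f32::min);
    let max = v.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let scale = if max > min { (max - min) / 255.0 } else { 0.0 };
    let codes = v
        .iter()
        .map(|&x| {
            if scale == 0.0 {
                0
            } else {
                ((x - min) / scale).round().clamp(0.0, 255.0) as u8
            }
        })
        .collect();
    Sq8Vector { min, scale, codes }
}

/// Reconstruct approximate f32 values; the error per component is at most about scale / 2.
pub fn dequantize_sq8(q: &Sq8Vector) -> Vec<f32> {
    q.codes.iter().map(|&c| q.min + c as f32 * q.scale).collect()
}
```

Each component shrinks from 4 bytes to 1, at the cost of a bounded rounding error, which is why keeping the f32 copy alongside is useful for exact rescoring.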

Quick Start

Prerequisites

  • Rust 1.75+
  • Ollama running locally with an embedding model:
ollama pull nomic-embed-text

Installation

[dependencies]
semantic-memory = "0.3"

Usage

use semantic_memory::{MemoryConfig, MemoryStore};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let store = MemoryStore::open(MemoryConfig::default())?;

    // Store facts
    store.add_fact("general", "Rust was first released in 2015", None, None).await?;
    store.add_fact("general", "Python is great for data science", None, None).await?;

    // Hybrid search (BM25 + vector similarity)
    let results = store.search("systems programming language", Some(5), None, None).await?;
    for r in &results {
        println!("[{:.4}] {}", r.score, r.content);
    }

    // FTS-only search (no embedding model needed)
    let results = store.search_fts_only("Rust", Some(5), None, None).await?;

    Ok(())
}

Conversation Memory

use semantic_memory::{MemoryConfig, MemoryStore, MockEmbedder, Role};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let store = MemoryStore::open_with_embedder(
        MemoryConfig::default(),
        Box::new(MockEmbedder::new(768)),
    )?;

    let session = store.create_session("repl").await?;

    store.add_message(&session, Role::User, "What is Rust?", Some(10), None).await?;
    store.add_message(&session, Role::Assistant, "A systems language.", Some(8), None).await?;

    // Get messages within a token budget
    let messages = store.get_messages_within_budget(&session, 500).await?;

    // Search conversation history for the query (the session is passed as the second argument)
    let results = store.search_messages("systems language", &session, Some(5)).await?;

    Ok(())
}
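
Token budgeting can be pictured with a small std-only sketch. The policy shown here (keep the newest messages that fit, return them oldest first) is an assumption about how such a budget is typically applied; the `Msg` type and `within_budget` function are hypothetical and do not describe what get_messages_within_budget does internally.

```rust
/// Minimal stand-in for a stored message (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq)]
pub struct Msg {
    pub content: String,
    pub tokens: u32,
}

/// Walk backwards from the newest message, keeping messages while the running
/// token total stays within `budget`, then return them in chronological order.
pub fn within_budget(history: &[Msg], budget: u32) -> Vec<Msg> {
    let mut used = 0u32;
    let mut kept = Vec::new();
    for m in history.iter().rev() {
        match used.checked_add(m.tokens) {
            Some(total) if total <= budget => {
                used = total;
                kept.push(m.clone());
            }
            // Stop at the first message that would exceed the budget (or overflow).
            _ => break,
        }
    }
    kept.reverse();
    kept
}
```

With messages costing 10, 8, and 5 tokens and a budget of 13, this sketch keeps the last two messages and drops the oldest.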

Document Ingestion

// Assumes `store` is an open MemoryStore and `long_text` holds the document text.
let doc_id = store.add_document(
    "Rust Book Ch.1",
    &long_text,
    "docs",
    None, // source path
    None, // metadata
).await?;

// Chunks are automatically created, embedded, and searchable
let results = store.search("ownership and borrowing", Some(5), None, None).await?;
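
Overlap chunking itself is easy to sketch. The function below is a simplified, hypothetical version that cuts fixed-size character windows with a fixed overlap; it ignores the target/min/max sizing that ChunkingConfig exposes and makes no claim about how the crate picks chunk boundaries.

```rust
/// Split `text` into windows of at most `size` characters, where each window
/// starts `size - overlap` characters after the previous one.
/// (Hypothetical helper for illustration; not part of semantic-memory's API.)
pub fn chunk_with_overlap(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0, "size must be positive");
    assert!(overlap < size, "overlap must be smaller than size");
    // Work on chars so multi-byte UTF-8 sequences are never split.
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect::<String>());
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}
```

For example, `chunk_with_overlap("abcdefghij", 4, 2)` yields `abcd`, `cdef`, `efgh`, `ghij`: every chunk repeats the last two characters of its predecessor, so text near a boundary appears in two chunks.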

Architecture

MemoryStore
├── SQLite (FTS5)        — BM25 full-text search, f32 + SQ8 embeddings, all metadata
├── HNSW Index           — Approximate nearest neighbor (optional, feature-gated)
└── Ollama / MockEmbedder — Embedding generation

Search pipeline:

  1. FTS5 MATCH produces BM25-ranked candidates
  2. HNSW ANN (or brute-force) produces vector-similarity candidates
  3. Reciprocal Rank Fusion merges both lists into a single scored ranking
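
The fusion step can be sketched in a few lines of plain Rust. This is the standard weighted Reciprocal Rank Fusion formula, score = sum of weight / (k + rank) over the lists an item appears in; the function name and signature are hypothetical, and the parameter names simply mirror the bm25_weight, vector_weight, and rrf_k fields of SearchConfig. How the crate applies them internally is an assumption here.

```rust
use std::collections::HashMap;

/// Fuse two ranked ID lists (best first) with weighted Reciprocal Rank Fusion.
/// (Hypothetical helper for illustration; not part of semantic-memory's API.)
pub fn rrf_fuse(
    bm25_ranked: &[&str],
    vector_ranked: &[&str],
    bm25_weight: f64,
    vector_weight: f64,
    rrf_k: f64,
) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for (weight, list) in [(bm25_weight, bm25_ranked), (vector_weight, vector_ranked)] {
        for (i, id) in list.iter().enumerate() {
            // Ranks are 1-based: the top hit contributes weight / (k + 1).
            let rank = (i + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += weight / (rrf_k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; break ties by ID so the order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

With BM25 order `a, b, c`, vector order `b, d`, equal weights, and k = 60, the fused order is `b, a, d, c`: `b` wins because it appears in both lists, even though it tops only one of them.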

Storage layout:

base_dir/
├── memory.db            — SQLite database (content, metadata, FTS5, embeddings)
├── memory.hnsw.graph    — HNSW graph topology (optional)
└── memory.hnsw.data     — HNSW vector data (optional)

SQLite is the single source of truth. The HNSW index is a performance accelerator that can be rebuilt from SQLite at any time.

Configuration

All configuration is done through MemoryConfig:

use semantic_memory::{MemoryConfig, SearchConfig, EmbeddingConfig, ChunkingConfig};
use std::path::PathBuf;

let config = MemoryConfig {
    base_dir: PathBuf::from("/data/agent-memory"),
    embedding: EmbeddingConfig {
        ollama_url: "http://localhost:11434".into(),
        model: "nomic-embed-text".into(),
        dimensions: 768,
        batch_size: 32,
        timeout_secs: 30,
    },
    search: SearchConfig {
        bm25_weight: 1.0,
        vector_weight: 1.0,
        rrf_k: 60.0,
        default_top_k: 5,
        min_similarity: 0.3,
        recency_half_life_days: Some(30.0), // Enable recency boosting
        ..Default::default()
    },
    chunking: ChunkingConfig {
        target_size: 1000,
        min_size: 100,
        max_size: 2000,
        overlap: 200,
    },
    ..Default::default()
};
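
To give a feel for what a half-life means for recency_half_life_days, here is a small sketch of exponential time decay. Multiplying the search score by the decay factor is an assumption made for illustration; the crate may combine recency with relevance differently, and both function names are hypothetical.

```rust
/// Decay factor in (0, 1]: 1.0 for a brand-new item, 0.5 after one half-life,
/// 0.25 after two, and so on. Negative ages are treated as zero.
pub fn recency_decay(age_days: f64, half_life_days: f64) -> f64 {
    assert!(half_life_days > 0.0, "half-life must be positive");
    0.5_f64.powf(age_days.max(0.0) / half_life_days)
}

/// Scale a relevance score by recency when a half-life is configured;
/// with `None` the score is returned unchanged (recency boosting disabled).
pub fn apply_recency(score: f64, age_days: f64, half_life_days: Option<f64>) -> f64 {
    match half_life_days {
        Some(h) => score * recency_decay(age_days, h),
        None => score,
    }
}
```

Under this sketch, a 30-day half-life means a result that is 30 days old keeps half of its score and one that is 60 days old keeps a quarter.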

Feature Flags

Flag          Default   Description
hnsw          Yes       HNSW approximate nearest neighbor search via hnsw_rs
brute-force   No        Exact cosine similarity search (no external index)
testing       No        Enables test utilities and MockEmbedder helpers

At least one of hnsw or brute-force must be enabled.

# Default (HNSW enabled)
cargo build

# Brute-force only (no HNSW dependency)
cargo build --no-default-features --features brute-force

# Run tests
cargo test --features "hnsw,testing"
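
For intuition about what the brute-force path trades away, exact cosine search is just a full scan. The sketch below is a generic std-only version with hypothetical function names; the `top_k` and `min_similarity` parameters mirror the SearchConfig field names, but whether the crate filters and truncates in this order is an assumption.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Score every stored vector against the query, drop anything below
/// `min_similarity`, and return the `top_k` best (id, similarity) pairs.
pub fn brute_force_top_k(
    query: &[f32],
    items: &[(u64, Vec<f32>)],
    top_k: usize,
    min_similarity: f32,
) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = items
        .iter()
        .map(|(id, v)| (*id, cosine_similarity(query, v)))
        .filter(|(_, s)| *s >= min_similarity)
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(top_k);
    scored
}
```

The scan is O(n) per query but always exact, which is usually acceptable for small stores; an HNSW index gives up exactness for sub-linear lookups as the store grows.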

API Overview

Knowledge Store

Method                                           Description
add_fact(namespace, content, source, metadata)   Store a searchable fact
update_fact(id, content, metadata)               Update fact content and re-embed
delete_fact(id)                                  Remove a fact
get_fact(id)                                     Retrieve a fact by ID
list_facts(namespace, limit, offset)             List facts with pagination

Document Store

Method                                                  Description
add_document(title, content, namespace, source, meta)   Ingest and chunk a document
delete_document(id)                                     Remove document and all its chunks
list_documents(namespace, limit, offset)                List documents with pagination

Search

Method                                                Description
search(query, top_k, namespace, domain)               Hybrid BM25 + vector search
search_fts_only(query, top_k, namespace, domain)      BM25-only search (no embeddings)
search_vector_only(query, top_k, namespace, domain)   Vector-only search
search_messages(query, session_id, top_k)             Search conversation history

Conversations

Method                                              Description
create_session(channel)                             Start a new conversation session
add_message(session, role, content, tokens, meta)   Append a message
get_recent_messages(session, limit)                 Get latest messages
get_messages_within_budget(session, budget)         Get messages fitting a token budget
session_token_count(session)                        Total tokens in a session
list_sessions(limit, offset)                        List all sessions
delete_session(id)                                  Remove a session and its messages

Maintenance

Method                 Description
stats()                Database statistics
rebuild_hnsw_index()   Rebuild HNSW from SQLite (hot-swap)
compact_hnsw()         Clean up HNSW tombstones

License

MIT