# VecLayer

Long-term memory for AI agents. Hierarchical, perspectival, aging knowledge.

**Status:** 0.1.0 — MCP tool & CLI for local use. APIs are evolving.
**Author:** Florian Burka, developed in dialogue with Claude.
## What is VecLayer?
VecLayer organizes knowledge as a hierarchy: summaries over summaries, at arbitrary depth, from different perspectives on the same raw data. A search starts with the overview and drills down on demand — like human remembering.
Instead of flat chunk lists or key-value stores, VecLayer provides structured, aging, self-describing memory. From the statistical shape of all memories — embedding clusters weighted by salience — an identity emerges organically.
## The Core Thesis
Summaries are not a feature alongside others — they are the memory itself. The hierarchy that makes RAG better (overview before detail, navigation instead of flat lists) is the same structure from which identity emerges. And personality is not shaped by what you do often, but by what moved you. That is why salience measures significance, not frequency.
## What You Get
- Semantic search with hierarchy — query your documents and get results organized by document → section → paragraph, not flat chunk lists
- Persistent AI memory — an MCP server that gives coding assistants (Claude, etc.) long-term memory across sessions
- Automatic aging — important knowledge stays present, unused knowledge naturally fades
- Identity from memory — on connect, an agent receives a priming of who it is: core knowledge, open threads, recent learnings
## Core Concepts
- One primitive: Entry — Everything is an Entry. Four types: `raw`, `summary`, `meta`, `impression`. ID = `sha256(content)`, first 7 hex chars in the CLI (like git). Identical content = identical ID = idempotent.
- Seven perspectives — `intentions`, `people`, `temporal`, `knowledge`, `decisions`, `learnings`, `session`. Each perspective has hints for LLMs. Extensible with custom perspectives.
- Memory aging — RRD-inspired access tracking with fixed time windows. Important knowledge stays present, unused knowledge fades. Configurable degradation rules.
- Salience — Measures significance, not frequency. Composite of interaction density (0.5), perspective spread (0.25), and revision activity (0.25). High-salience entries survive aging.
- Identity — Emerges from salience-weighted embedding centroids per perspective. On connect, the agent receives a priming: core knowledge, open threads, recent learnings. The moment an agent wakes up and knows itself.
- Sleep cycle — Optional LLM-powered consolidation: reflect → think → add → compact. Most think actions are mechanical; only reflection and consolidation require an LLM.
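As a rough sketch, the salience composite described above reduces to a weighted sum. The struct and field names here are illustrative, not VecLayer's actual types:

```rust
// Illustrative inputs, each assumed normalized to 0.0..=1.0.
// These names are hypothetical — VecLayer's real types may differ.
struct SalienceInputs {
    interaction_density: f64, // how densely the entry is interacted with
    perspective_spread: f64,  // how many perspectives reference it
    revision_activity: f64,   // how often it has been revised
}

/// Weighted composite: density 0.5, spread 0.25, revisions 0.25.
fn salience(s: &SalienceInputs) -> f64 {
    0.5 * s.interaction_density + 0.25 * s.perspective_spread + 0.25 * s.revision_activity
}

fn main() {
    // An entry touched from several perspectives but rarely revised:
    let entry = SalienceInputs {
        interaction_density: 0.8,
        perspective_spread: 0.4,
        revision_activity: 0.2,
    };
    println!("salience = {:.2}", salience(&entry)); // salience = 0.55
}
```

Note that a rarely-accessed entry can still score high through perspective spread and revisions — that is the "significance, not frequency" property.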
For technical details see `ARCHITECTURE.md`.
## Quick Start

The subcommands below are documented in the CLI overview; arguments are typical examples.

```sh
# Initialize a new VecLayer store
veclayer init

# Store knowledge (text, file, or directory)
veclayer store ./docs/notes.md

# Recall
veclayer recall "authentication design"

# Drill down
veclayer focus <entry-id>

# Start server (MCP/HTTP)
veclayer serve
```
## Configuration
VecLayer resolves configuration in this order (highest priority first): CLI flags → environment variables → user config path overrides → user config globals → project-local config → git auto-detection → defaults.
### User config (`~/.config/veclayer/config.toml`)

```toml
# Global defaults
data_dir = "~/.local/share/veclayer"

[llm]
provider = "ollama"
model = "llama3.2"
base_url = "http://localhost:11434"
# api_key = "sk-..."  # for OpenAI-compatible providers (use HTTPS!)

# Project-specific overrides — matched by path glob and/or git remote
[[projects]]
path = "~/work/myproject"
data_dir = "~/work/myproject/.veclayer-data"

[[projects]]
git_remote = "github\\.com/myorg/.*"
data_dir = "~/.local/share/veclayer/myorg"
```
### Project-local config (`.veclayer/config.toml` in project root)

```toml
[llm]
model = "codellama"
```
## Embedding Models
VecLayer supports local and external embedding backends.
### Built-in models (fastembed, default)
| Model | Dimension | Config value |
|---|---|---|
| BAAI/bge-small-en-v1.5 (default) | 384 | Xenova/bge-small-en-v1.5 |
| BAAI/bge-base-en-v1.5 | 768 | Xenova/bge-base-en-v1.5 |
| BAAI/bge-large-en-v1.5 | 1024 | Xenova/bge-large-en-v1.5 |
| all-MiniLM-L6-v2 | 384 | Xenova/all-MiniLM-L6-v2 |
Models download automatically on first use.
### GPU embedding with TEI or Ollama

Use an external HTTP server for GPU-accelerated embeddings. VecLayer tries the Ollama format first (`/api/embed`), then falls back to OpenAI-compatible (`/v1/embeddings`).
Config (`.veclayer/config.toml` or `~/.config/veclayer/config.toml`):

```toml
[embedder]
type = "ollama"
model = "nomic-embed-text"  # or any model your server supports
base_url = "http://localhost:11434"
dimension = 768  # must match the model's output dimension
```
Environment variables (override config):

```sh
VECLAYER_EMBEDDER=ollama
VECLAYER_OLLAMA_MODEL=nomic-embed-text
VECLAYER_OLLAMA_URL=http://localhost:11434
VECLAYER_OLLAMA_DIMENSION=768
```
TEI example (Hugging Face Text Embeddings Inference):

```toml
# then in config.toml:
[embedder]
type = "ollama"
base_url = "http://localhost:8080"
model = "BAAI/bge-small-en-v1.5"
dimension = 384
```
## MCP Server Setup
VecLayer provides an MCP server for integration with Claude Code and Opencode.
### Installation
Ensure veclayer is installed and available in your PATH:
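One way to verify (the `--version` flag is clap's standard behavior; adjust if the CLI differs):

```sh
# Confirm the binary is on PATH and runnable
veclayer --version
```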
First run downloads the embedding model (~130MB). See First Run for details.
### Claude Code Setup
Single project (store in project directory):
Multi-project setup with shared data directory:
# Add for each project with project-scoped memory (data directory is auto-created)
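With the Claude Code CLI this can look like the following. The server names and `serve` flags are illustrative, not verbatim — compare with the shipped example configs:

```sh
# Single project: store lives in the project directory
claude mcp add veclayer -- veclayer serve

# Multi-project: shared data directory, one instance per project
claude mcp add veclayer-myproject -- veclayer serve --data-dir ~/.veclayer/data --project myproject
```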
### Opencode Setup
Opencode uses a similar MCP configuration format. Check the Opencode documentation for the current config path and schema.
Example configurations are available in `.claude/settings.json.example` (single-project) and `.claude/settings.json.example.multi-project` (multi-project).
Single project:
Multi-project (replace /home/you with your actual home directory — tilde ~ is not expanded in JSON):
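One plausible shape for a multi-project server entry, with absolute paths as required by JSON. The `mcpServers` key follows the common MCP settings schema; the `--data-dir`/`--project` flags are illustrative:

```json
{
  "mcpServers": {
    "veclayer": {
      "command": "veclayer",
      "args": ["serve", "--data-dir", "/home/you/.veclayer/data", "--project", "frontend"]
    }
  }
}
```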
## Multi-Project Setup
Use a single shared data directory with per-project MCP instances for isolation.
### Mental Model
- One shared data directory (`~/.veclayer/data`)
- Each project gets its own MCP instance with `--project <name>`
- Project entries stay scoped to that project
- Personal entries (with `scope: "personal"`) are visible across all projects
- Identity priming is computed from project-scoped + personal entries
### Example Configuration
For example, registering one MCP instance per project (flags are illustrative):

```sh
# Project A: frontend (data directory is auto-created with 0700 permissions)
claude mcp add veclayer-frontend -- veclayer serve --data-dir ~/.veclayer/data --project frontend

# Project B: backend
claude mcp add veclayer-backend -- veclayer serve --data-dir ~/.veclayer/data --project backend
```
### Cross-Project Knowledge
Store knowledge that follows you across projects with `scope: "personal"`:
Project-scoped knowledge:
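A sketch of what this could look like on the CLI — the `--scope` flag here is hypothetical, check `veclayer store --help` for the real syntax:

```sh
# Personal: visible in every project (hypothetical --scope flag)
veclayer store "I prefer table-driven tests" --scope personal

# Project-scoped (default)
veclayer store "This service uses axum with tower middleware"
```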
## CLI Overview
| Command | Description |
|---|---|
| `init` | Initialize a new VecLayer store |
| `store` | Store knowledge (text, file, directory) |
| `recall` | Semantic search with perspective filter |
| `focus` | Drill into an entry, show children |
| `reflect` | Identity snapshot, salience ranking, archive candidates |
| `think` | Curate: promote, demote, relate, discover, aging, LLM consolidation |
| `serve` | Start MCP/HTTP server |
| `status` | Store statistics |
| `perspective` | Manage perspectives (list, add, remove) |
| `history` | Show version/relation history of an entry |
| `archive` | Demote entries to deep_only visibility |
| `export` | Export entries to JSONL |
| `import` | Import entries from JSONL |
Aliases: `add` = `store`, `search`/`s` = `recall`, `f` = `focus`, `id` = `reflect`
## Building from Source
### Prerequisites
- Rust toolchain (stable, edition 2021+)
- `protoc` (Protocol Buffers compiler) — required by LanceDB
  - Debian/Ubuntu: `apt-get install protobuf-compiler`
  - macOS: `brew install protobuf`
- Internet access during first build — `ort-sys` downloads ONNX Runtime (~19 MB)
### Build
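Assuming the standard Cargo layout:

```sh
# Build a release binary
cargo build --release
# The binary lands in target/release/veclayer
```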
### First Run

On first use, VecLayer downloads the embedding model (BAAI/bge-small-en-v1.5, ~130 MB via HuggingFace) to a local cache (`.fastembed_cache/` relative to the working directory). This requires internet access.
## Troubleshooting
**`Failed to initialize FastEmbed: Failed to retrieve onnx/model.onnx`**

The embedding model couldn't be downloaded. Common causes:

- No internet access or a corporate TLS proxy intercepting HTTPS
- Fix: manually download the model files from `Xenova/bge-small-en-v1.5` on HuggingFace and place them in `.fastembed_cache/models--Xenova--bge-small-en-v1.5/snapshots/<commit_hash>/`
**`Could not find protoc`**
Install the Protocol Buffers compiler (see prerequisites above).
**`Failed to connect to Ollama`**

The `think` cycle and cluster summarization require a running Ollama instance. These features are optional — `store`, `recall`, `focus`, and all non-LLM commands work without it.
## Feature Flags
| Feature | Default | What it enables |
|---|---|---|
| `llm` | Yes | LLM-powered summarization, clustering, think/consolidation cycle |
| `sync` | No | Cross-store synchronization (experimental) |
Build without LLM dependencies:
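With Cargo's standard feature flags, this is likely:

```sh
# llm is a default feature; drop defaults to build without it
cargo build --release --no-default-features
```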
All core functionality (store, recall, focus, perspectives, aging, identity) works without the `llm` feature. Only summarization and the `think` command require it.
## Tech Stack
| Component | Technology |
|---|---|
| Language | Rust |
| Storage | LanceDB (local cache & indices) |
| Embeddings | fastembed (CPU, ONNX) — trait-based, swappable |
| Parsing | pulldown-cmark (Markdown), extensible |
| Server | axum (MCP + HTTP) |
| CLI | clap v4 |
| Config | TOML + ENV overrides (12-Factor) |
## Status
Phases 1–5.5 complete: core model, perspectives, aging/salience, identity, think cycle, tool ergonomics. Next up: Phase 6 (UCAN sharing). See Issues for the full roadmap.
## Known Limitations
The following are known issues tracked for future releases:
- HTTP server has no authentication — The REST API (`veclayer serve --http`) binds to localhost with restricted CORS but has no auth tokens. Do not expose it to untrusted networks. Auth is planned for a future release.
- API keys stored as plain strings — LLM API keys (for OpenAI-compatible providers) are held in memory as `String` without zeroing on drop. Acceptable for CLI use; not suitable for long-running shared server deployments without additional safeguards.
- Test env var manipulation — Some tests use `std::env::set_var`, which became `unsafe` in Rust 1.83+. These tests use `serial_test` for isolation but will need `unsafe` blocks in a future Rust edition.
- chunk.rs scope — The core `chunk` module (1000+ lines) is planned for decomposition before v0.2: `ChunkRelation` and relation constants will move to the `relations` module.
## Design Decisions: What VecLayer Does NOT Do
Explicitly rejected approaches — documented and reasoned, not forgotten.
| Rejected | Instead | Why |
|---|---|---|
| JSON annotations on entries | Content carries the semantics | No schema drift from optional fields |
| Paths as sole structure | Perspectives | Same entry, different views |
| Tags | Perspectives with hints | Tags are flat and unexplained |
| Separate vector spaces for emotions | Salience as composite score | One space, different weightings |
| S3 backends | Local files + Turso/pgvector | Simplicity, latency, offline capability |
| ACLs | UCAN | Decentralized, delegatable, offline-verifiable |
| Bearer tokens | UCAN with DID | Cryptographic, attenuatable |
| Static tool descriptions | Dynamic priming | Personalized per agent and session |
| Leaf/node separation | Everything is an Entry | One primitive, four types |
| "Trees" as concept | Perspectives | Trees are rigid, perspectives are views |
| Graph database | Relations on entries | The graph reveals itself in visualization |
| Metadata fields for emotions | Perspectives + content | The perspective is the semantics |
| Tool call hooks for auto-capture | Behavioral hints in priming | Intelligence stays with the agent |
## License
MIT