# Architecture
## Overview
This repository is a Rust **library**, **CLI** (`rag`), and **MCP server** (`rag-mcp-server`) for Retrieval-Augmented Generation. Retrieval is deliberately **multi-signal**:
- **Vectors** — semantic similarity over embedded chunks.
- **Graph** — entities and co-occurrence (and similar) edges for expansion and structure.
- **Search-style operations** — top-k limits, scored ranking, listing, counting, and metadata filters.
The vector-centric entry point is `Retriever`. The combined vector + graph path is `GraphRagEngine` (library) and a parallel set of MCP tools under the `graph_*` namespace.
## Core components
### 1. Embeddings (`src/embeddings.rs`)
**Purpose:** Turn text into dense vectors.
**Trait:**
- `EmbeddingModel`: `embed`, `embed_single`.
**Implementations:**
- `OpenAIEmbeddingModel`
- `OllamaEmbeddingModel`
- **`HttpEmbeddingModel`** — OpenAI-compatible embedding HTTP API.
### 2. Vector store (`src/vector_store.rs`)
**Purpose:** Persist chunks and run similarity search.
**Trait `VectorStore`:** `add`, `add_batch`, `search`, `search_with_filter`, `search_batch`, `get`, `delete`, `list`, `count`, `metric`.
**Types:** `Document`, `Similarity`, `MetadataFilter`.
**Implementations:**
- `InMemoryVectorStore` — `DashMap` + pluggable `Index`; optional **JSON file** `save_to_file` / `load_from_file`.
- `MinimalVectorDB` — `RwLock` map + `FlatIndex`.
- **`JsonPersistentVectorStore`** — wraps `InMemoryVectorStore`, flushes to a path on each mutation.
### 3. Index (`src/index.rs`)
**Purpose:** Pluggable nearest-neighbor logic and distance metrics.
**Trait `Index`:** `add`, `remove`, `search`, `search_batch`, `clear`, `len`, `metric`.
**Metrics:** `Cosine`, `Euclidean`, `DotProduct`, `Manhattan`.
**Implementations:** `FlatIndex` (exact search, parallel batch queries) and `IvfflatIndex` (centroid buckets + probing, approximate; defined in `src/index_ivf.rs`).
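
To make the metric list and the "exact" flat scan concrete, here is a minimal, self-contained sketch. The `Metric` enum, its `score` method, and `flat_search` are illustrative names and not the crate's actual `Index` API; the convention that distances are negated so that "larger is better" is also an assumption.

```rust
/// Illustrative stand-in for the metrics named in this section.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Metric {
    Cosine,
    Euclidean,
    DotProduct,
    Manhattan,
}

impl Metric {
    /// Returns a score where larger always means "more similar".
    /// Distances (Euclidean, Manhattan) are negated to fit that convention.
    pub fn score(self, a: &[f32], b: &[f32]) -> f32 {
        let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
        match self {
            Metric::DotProduct => dot,
            Metric::Cosine => {
                let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
                let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
                // A zero vector has no direction; treat it as unrelated.
                if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
            }
            Metric::Euclidean => {
                -a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
            }
            Metric::Manhattan => -a.iter().zip(b).map(|(x, y)| (x - y).abs()).sum::<f32>(),
        }
    }
}

/// Exact top-k: score every stored vector, then sort. O(n * d) per query.
pub fn flat_search(
    vectors: &[(String, Vec<f32>)],
    query: &[f32],
    k: usize,
    metric: Metric,
) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = vectors
        .iter()
        .map(|(id, v)| (id.clone(), metric.score(v, query)))
        .collect();
    // Descending by score; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

Calling `flat_search(&vectors, &query, 2, Metric::Cosine)` returns the two best `(id, score)` pairs, best first.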
### 4. Chunker (`src/chunker.rs`)
**Purpose:** Split documents before embedding.
**Trait:** `TextChunker::chunk`.
**Implementations:** `FixedSizeChunker`, `ParagraphChunker`, `SentenceChunker`.
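
As a rough illustration of fixed-size chunking, the sketch below splits text into character windows. It is not the code of `FixedSizeChunker`: the function name and the `size` / `overlap` parameters are assumptions, and whether the real chunker supports overlap is not stated here.

```rust
/// Split `text` into windows of `size` characters,
/// each overlapping the previous window by `overlap` characters.
pub fn fixed_size_chunks(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "need size > 0 and overlap < size");
    // Work on chars, not bytes, so multi-byte UTF-8 is never split mid-character.
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect::<String>());
        if end == chars.len() {
            break; // the last window reached the end of the text
        }
        start += step;
    }
    chunks
}
```

Overlap trades some index size for context: a sentence cut at a window boundary still appears whole in at least one neighboring chunk when the overlap is large enough.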
### 5. Retriever (`src/retriever.rs`)
**Purpose:** Orchestrate chunking, embedding, and vector search for classic RAG.
**Methods:** `add_document`, `add_document_with_metadata`, `retrieve`, `retrieve_with_scores`, `retrieve_filtered`, **`retrieve_hybrid`**, **`retrieve_hybrid_dedup`**.
### 6. Graph (`src/graph.rs`)
**Purpose:** In-memory property graph for RAG augmentation.
**Types:** `GraphNode`, `GraphEdge`, `GraphPath`, `Community`.
**`GraphStore`:** add/remove nodes and edges, lookup by id or name, BFS-style reachability, neighbors, degree, density, community detection (label propagation), `save_to_file` / `load_from_file` / `from_persisted` (`GraphPersisted`).
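
Label propagation can be sketched in a few lines. The version below is a simplified, deterministic variant over a plain adjacency list; the function name, the `max_iters` cap, and the tie-breaking rule (smallest label wins) are assumptions, not the behavior of `GraphStore`.

```rust
use std::collections::HashMap;

/// Asynchronous label propagation over an undirected adjacency list.
/// Returns one community label per node.
pub fn label_propagation(adj: &[Vec<usize>], max_iters: usize) -> Vec<usize> {
    // Every node starts in its own community.
    let mut labels: Vec<usize> = (0..adj.len()).collect();
    for _ in 0..max_iters {
        let mut changed = false;
        for node in 0..adj.len() {
            if adj[node].is_empty() {
                continue; // isolated nodes keep their own label
            }
            // Count how often each label occurs among the neighbors.
            let mut counts: HashMap<usize, usize> = HashMap::new();
            for &nb in &adj[node] {
                *counts.entry(labels[nb]).or_insert(0) += 1;
            }
            // Most frequent label; ties go to the smallest label for determinism.
            let best = counts
                .iter()
                .max_by(|a, b| a.1.cmp(b.1).then(b.0.cmp(a.0)))
                .map(|(&label, _)| label)
                .unwrap();
            if labels[node] != best {
                labels[node] = best;
                changed = true;
            }
        }
        if !changed {
            break; // converged
        }
    }
    labels
}
```

Nodes that end with the same label form one community. On a graph made of two disconnected triangles, the sketch assigns one label per triangle.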
### 7. Graph RAG (`src/graph_rag.rs`)
**Purpose:** Single engine that writes to both `VectorStore` and `GraphStore`.
**Pieces:**
- `EntityExtractor` / `SimpleEntityExtractor` — heuristic entities (quoted strings, acronyms, proper nouns).
- `GraphRagEngine` — `add_document`, **`save_snapshot`**, configurable **`co_occurrence_relation`**; `query` merges vector top-k with graph expansion; **`load_from_snapshot_file`** (with `SimpleEntityExtractor` + `InMemoryVectorStore`).
**Types:** `GraphRagResult`, `EntityInfo`, `GraphInfo` (and related) for structured results.
### 8. Ingestion (`src/ingestion.rs`)
**Purpose:** Pull text from external sources into `ExtractedDocument` records for downstream chunking and embedding.
**Trait:** `Source::extract`.
**Sources:** `PdfSource`, `CodebaseSource`, `WikiSource`, and related helpers.
### 9. MCP (`src/mcp.rs`, binary `src/mcp_server.rs`)
**Purpose:** Expose RAG operations over Model Context Protocol (stdio transport, `rmcp`).
**Handler:** `RagMcpServer` — shared `InMemoryVectorStore`, `GraphStore`, embedding backend (`OpenAI` or `Ollama`), `SimpleEntityExtractor`, and side maps from entity names to chunk ids and reverse.
**Vector tools:**
- `rag_add_document` — chunk, embed, index.
- `rag_query` — semantic search with scores.
- `rag_list_documents` — paginated listing.
- `rag_count` — chunk count.
**Graph and hybrid tools:**
- `graph_build` — like add, but also extracts entities, links co-occurrence, stores entity metadata on chunks, updates graph.
- `graph_query` — vector search plus graph expansion from query entities; labels hits as `vector` vs `graph`.
- `graph_get_entity`, `graph_get_neighbors` — introspection.
- `graph_info`, `graph_communities` — stats and community structure.
### 10. Supporting retrieval, ANN, and persistence
| Module or type | Role |
| --- | --- |
| `src/keyword.rs` | `Bm25Index`, `tokenize`. |
| `src/hybrid.rs` | `merge_hybrid` — fuse vector + BM25 scores. |
| `src/dedup.rs` | Word Jaccard near-duplicate filtering. |
| `src/rerank.rs` | `SimilarityReranker` trait and `PassthroughReranker`. |
| `src/index_ivf.rs` | `IvfflatIndex` — IVF-style approximate search implementing `Index`. |
| `JsonPersistentVectorStore` | `VectorStore` that rewrites `vectors.json` after mutations. |
| `GraphPersisted` | Serializable graph (`nodes` + `edges`). |
| `GraphRagSnapshot` | Documents + graph + entity/chunk side maps + engine hyperparameters. |
| `HttpEmbeddingModel` | Configurable OpenAI-compatible `/embeddings` client. |
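
Word-level Jaccard filtering, as listed for `src/dedup.rs`, can be illustrated as follows. This is a sketch under assumptions: the function names, whitespace tokenization, lowercasing, and the greedy "keep first, drop later near-duplicates" policy are not taken from the crate.

```rust
use std::collections::HashSet;

/// Jaccard similarity between the lowercase word sets of two texts.
pub fn word_jaccard(a: &str, b: &str) -> f32 {
    let wa: HashSet<String> = a.split_whitespace().map(|w| w.to_lowercase()).collect();
    let wb: HashSet<String> = b.split_whitespace().map(|w| w.to_lowercase()).collect();
    if wa.is_empty() && wb.is_empty() {
        return 1.0; // two empty texts are treated as identical
    }
    let shared = wa.intersection(&wb).count() as f32;
    let total = wa.union(&wb).count() as f32;
    shared / total
}

/// Keep a text only if it is below `threshold` similarity to every text already kept.
pub fn dedup_near_duplicates(texts: &[&str], threshold: f32) -> Vec<String> {
    let mut kept: Vec<String> = Vec::new();
    for &t in texts {
        if kept.iter().all(|k| word_jaccard(k, t) < threshold) {
            kept.push(t.to_string());
        }
    }
    kept
}
```

For example, `"a b c"` and `"a b d"` share 2 of 4 distinct words, so their similarity is 0.5.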
### 11. Errors (`src/errors.rs`)
**Purpose:** `RagError` and `Result<T>` alias for unified error handling.
## Architecture diagram
```
                   ┌───────────────────────────────────────┐
                   │      CLI (`rag`) / Library / MCP      │
                   └───────────────────┬───────────────────┘
                                       │
         ┌─────────────────────────────┼─────────────────────────────┐
         ▼                             ▼                             ▼
┌─────────────────┐           ┌─────────────────┐           ┌─────────────────┐
│    Retriever    │           │ GraphRagEngine  │           │  RagMcpServer   │
│  (vector RAG)   │           │ (vector+graph)  │           │  (both tools)   │
└────────┬────────┘           └────────┬────────┘           └────────┬────────┘
         │                             │                             │
         │               ┌─────────────┴─────────────┐               │
         │               ▼                           ▼               │
         │         ┌────────────┐              ┌────────────┐        │
         │         │ GraphStore │              │ EntitySide │◄───────┤
         │         │ nodes/edges│              │    maps    │        │
         │         └────────────┘              └────────────┘        │
         │                                                           │
         ▼                             ▼                             ▼
┌─────────────────┐           ┌─────────────────┐           ┌─────────────────┐
│   Embeddings    │           │     Chunker     │           │     Chunker     │
│     OpenAI/     │           │  (strategies)   │           │   (MCP paths)   │
│     Ollama      │           └────────┬────────┘           └────────┬────────┘
└────────┬────────┘                    │                             │
         │                             ▼                             ▼
         │                    ┌─────────────────┐           ┌─────────────────┐
         └───────────────────►│   VectorStore   │◄──────────┤    InMemory*    │
                              │     + Index     │           │   + FlatIndex   │
                              └─────────────────┘           └─────────────────┘

Ingestion: PdfSource / CodebaseSource / WikiSource ──► ExtractedDocument ──► Retriever or GraphRagEngine
```
## Data flow
### Add document (Retriever)
1. Chunk text.
2. Embed chunks.
3. Store in `VectorStore` (index updated).
### Add document with graph (`GraphRagEngine` or MCP `graph_build`)
1. Chunk and embed as above.
2. Extract entities per chunk; ensure nodes in `GraphStore`.
3. Add co-occurrence edges between entities in the same chunk (with weights where supported).
4. Record entity-to-chunk and chunk-to-entity mappings.
5. Persist chunks in `VectorStore`.
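
Steps 2 to 4 above can be sketched with plain standard-library maps. The `CoOccurrence` struct, the `link_co_occurrences` function, and the choice to weight an edge by the number of chunks in which both entities appear are assumptions for illustration; the engine itself stores this state in `GraphStore` and its side maps.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical container for the graph-side output of ingestion.
#[derive(Debug, Default)]
pub struct CoOccurrence {
    /// Undirected edge (a, b) with a < b, weighted by the number of chunks containing both.
    pub edges: BTreeMap<(String, String), u32>,
    /// Entity name -> ids of chunks that mention it.
    pub entity_to_chunks: BTreeMap<String, BTreeSet<String>>,
    /// Chunk id -> entities found in it.
    pub chunk_to_entities: BTreeMap<String, BTreeSet<String>>,
}

/// `chunks` pairs each chunk id with the entities extracted from that chunk.
pub fn link_co_occurrences(chunks: &[(&str, Vec<&str>)]) -> CoOccurrence {
    let mut out = CoOccurrence::default();
    for (chunk_id, entities) in chunks {
        // Deduplicate and sort so each unordered pair is counted once per chunk.
        let unique: BTreeSet<&str> = entities.iter().copied().collect();
        for e in &unique {
            out.entity_to_chunks
                .entry(e.to_string())
                .or_default()
                .insert(chunk_id.to_string());
            out.chunk_to_entities
                .entry(chunk_id.to_string())
                .or_default()
                .insert(e.to_string());
        }
        let sorted: Vec<&str> = unique.into_iter().collect();
        for i in 0..sorted.len() {
            for j in (i + 1)..sorted.len() {
                let key = (sorted[i].to_string(), sorted[j].to_string());
                *out.edges.entry(key).or_insert(0) += 1;
            }
        }
    }
    out
}
```

Storing each pair in sorted order keeps the edge undirected: `("MCP", "Rust")` and `("Rust", "MCP")` land on the same key.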
### Hybrid query (`GraphRagEngine::query` or MCP `graph_query`)
1. Embed query; run vector `search` for top-k.
2. Extract query entities; traverse graph to a configured depth.
3. Collect chunk ids linked to matched entities; merge with vector results (deduplicate, cap at top-k).
4. Return ranked list with scores where available and provenance (`vector` vs `graph` in MCP JSON).
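
The merge in steps 3 and 4 might look like the sketch below. The `Provenance` and `Hit` types, the function name, and the ordering policy (vector hits first, then graph-only chunks) are assumptions; the source only states that results are deduplicated, capped at top-k, and labeled `vector` or `graph`.

```rust
use std::collections::HashSet;

/// Where a result came from.
#[derive(Debug, Clone, PartialEq)]
pub enum Provenance {
    Vector,
    Graph,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub chunk_id: String,
    /// Graph-only hits carry no similarity score.
    pub score: Option<f32>,
    pub provenance: Provenance,
}

/// Merge vector hits with graph-expanded chunk ids, dropping duplicates
/// and capping the result at `top_k`.
pub fn merge_results(
    vector_hits: &[(String, f32)],
    graph_chunk_ids: &[String],
    top_k: usize,
) -> Vec<Hit> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut merged = Vec::new();
    // Vector hits first; they are assumed to arrive sorted by descending score.
    for (id, score) in vector_hits {
        if seen.insert(id.as_str()) {
            merged.push(Hit {
                chunk_id: id.clone(),
                score: Some(*score),
                provenance: Provenance::Vector,
            });
        }
    }
    // Then chunks reached through the graph that vector search did not return.
    for id in graph_chunk_ids {
        if seen.insert(id.as_str()) {
            merged.push(Hit {
                chunk_id: id.clone(),
                score: None,
                provenance: Provenance::Graph,
            });
        }
    }
    merged.truncate(top_k);
    merged
}
```

A chunk found by both channels keeps its vector score and `Vector` label, because the vector pass claims the id first.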
### Hybrid BM25 + vector (`Retriever::retrieve_hybrid`)
1. List all chunks from `VectorStore`; build `Bm25Index`.
2. Run vector `search` and BM25 `search` with an enlarged top-k.
3. `merge_hybrid` normalizes score channels and fuses with `alpha`.
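
One common way to normalize and fuse two score channels is min-max scaling followed by a weighted sum. The sketch below shows that approach; it is not the code of `merge_hybrid`, and the normalization scheme, the treatment of missing ids as 0, and the names `min_max` and `fuse` are all assumptions.

```rust
use std::collections::HashMap;

/// Min-max normalize scores into [0, 1]; a channel with one distinct value maps to 1.0.
fn min_max(scores: &[(String, f32)]) -> HashMap<String, f32> {
    let min = scores.iter().map(|(_, s)| *s).fold(f32::INFINITY, f32::min);
    let max = scores.iter().map(|(_, s)| *s).fold(f32::NEG_INFINITY, f32::max);
    scores
        .iter()
        .map(|(id, s)| {
            let norm = if max > min { (s - min) / (max - min) } else { 1.0 };
            (id.clone(), norm)
        })
        .collect()
}

/// Fuse as alpha * vector + (1 - alpha) * bm25.
/// A chunk missing from one channel contributes 0 for that channel.
pub fn fuse(
    vector: &[(String, f32)],
    bm25: &[(String, f32)],
    alpha: f32,
    top_k: usize,
) -> Vec<(String, f32)> {
    let mut fused: HashMap<String, f32> = HashMap::new();
    for (id, s) in min_max(vector) {
        *fused.entry(id).or_insert(0.0) += alpha * s;
    }
    for (id, s) in min_max(bm25) {
        *fused.entry(id).or_insert(0.0) += (1.0 - alpha) * s;
    }
    let mut out: Vec<(String, f32)> = fused.into_iter().collect();
    // Descending by fused score, then by id so ties are deterministic.
    out.sort_by(|x, y| y.1.total_cmp(&x.1).then_with(|| x.0.cmp(&y.0)));
    out.truncate(top_k);
    out
}
```

With `alpha = 1.0` the ranking is purely semantic, and with `alpha = 0.0` it is purely lexical. Normalizing first matters because raw BM25 scores are unbounded while cosine similarities are not.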
### Snapshot (`GraphRagEngine::save_snapshot`)
1. Serialize all documents (including embeddings), `GraphPersisted`, and entity/chunk maps.
2. `load_from_snapshot_file` rebuilds `InMemoryVectorStore` and in-memory `DashMap` side tables.
## Extensibility
### New embedding models
Implement `EmbeddingModel` with `async_trait` and delegate to your provider.
### New indexes
Implement `Index` (for example, HNSW) and either plug it into `InMemoryVectorStore`, which is built around a pluggable `Index`, or use it inside a custom `VectorStore`.
### New vector stores
Implement `VectorStore` for disk or remote backends.
### New chunkers
Implement `TextChunker`:
```rust
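// Assumes `TextChunker` and the crate's `Result` alias are in scope.
struct MyChunker;

impl TextChunker for MyChunker {
    fn chunk(&self, text: &str) -> Result<Vec<String>> {
        // Example strategy: keep every non-empty line as its own chunk.
        let lines = text.lines().filter(|l| !l.trim().is_empty());
        Ok(lines.map(str::to_string).collect())
    }
}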
```
### Richer graphs
Implement `EntityExtractor` to feed `GraphRagEngine` with NER or model-based entities.
## Concurrency
- Most public APIs are async (`tokio`); `TextChunker::chunk`, as shown above, is synchronous.
- `InMemoryVectorStore` and `GraphStore` use `DashMap` for concurrent access.
- `FlatIndex` can answer batch queries (`search_batch`) in parallel.
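
As an illustration of parallel batch search, the sketch below fans queries out over scoped threads from the standard library. This is only one possible mechanism and an assumption on my part: the crate may use a thread pool or another approach, and `nearest` and `search_batch` here are hypothetical free functions, not the `Index` trait methods.

```rust
use std::thread;

/// Index of the stored vector with the highest dot product against `query`.
fn nearest(vectors: &[Vec<f32>], query: &[f32]) -> Option<usize> {
    vectors
        .iter()
        .map(|v| v.iter().zip(query).map(|(a, b)| a * b).sum::<f32>())
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}

/// Answer every query on its own scoped thread; results keep query order.
/// One thread per query keeps the sketch short; a pool scales better.
pub fn search_batch(vectors: &[Vec<f32>], queries: &[Vec<f32>]) -> Vec<Option<usize>> {
    thread::scope(|s| {
        // Spawn all workers before joining any, so they run concurrently.
        let handles: Vec<_> = queries
            .iter()
            .map(|q| s.spawn(move || nearest(vectors, q)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("search worker panicked"))
            .collect()
    })
}
```

Scoped threads can borrow `vectors` directly, so the stored data is shared read-only without cloning or reference counting.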
## Performance
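- `FlatIndex` compares the query against all n stored vectors, so each query costs O(n) distance computations. `IvfflatIndex` probes only a subset of centroid buckets: results are exact within the probed buckets but can miss neighbors in unprobed ones. For very large n, consider an external HNSW index (see [TODO.md](TODO.md)).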
- Prefer batch embedding where the model allows.
- Normalize vectors to unit length when using the dot-product metric; scores then equal cosine similarity and stay within [-1, 1].
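
The trade-off between a flat scan and IVF-style probing can be seen in a toy index. The sketch below is not `IvfflatIndex`: the `IvfSketch` type, the `nprobe` parameter name, the use of squared Euclidean distance, and caller-supplied centroids (a real index would train them, for example with k-means) are all assumptions.

```rust
/// Squared Euclidean distance.
fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Toy IVF index: each vector is stored in the bucket of its nearest centroid.
pub struct IvfSketch {
    centroids: Vec<Vec<f32>>,
    buckets: Vec<Vec<(usize, Vec<f32>)>>, // (id, vector) per centroid
}

impl IvfSketch {
    pub fn new(centroids: Vec<Vec<f32>>) -> Self {
        assert!(!centroids.is_empty(), "need at least one centroid");
        let buckets = vec![Vec::new(); centroids.len()];
        Self { centroids, buckets }
    }

    /// Centroid indices ordered from nearest to farthest from `v`.
    fn ranked_centroids(&self, v: &[f32]) -> Vec<usize> {
        let mut order: Vec<usize> = (0..self.centroids.len()).collect();
        order.sort_by(|&a, &b| {
            dist2(&self.centroids[a], v).total_cmp(&dist2(&self.centroids[b], v))
        });
        order
    }

    pub fn add(&mut self, id: usize, v: Vec<f32>) {
        let nearest = self.ranked_centroids(&v)[0];
        self.buckets[nearest].push((id, v));
    }

    /// Scan only the `nprobe` nearest buckets: exact inside them, blind outside them.
    pub fn search(&self, query: &[f32], k: usize, nprobe: usize) -> Vec<(usize, f32)> {
        let mut hits: Vec<(usize, f32)> = self
            .ranked_centroids(query)
            .into_iter()
            .take(nprobe)
            .flat_map(|c| self.buckets[c].iter())
            .map(|(id, v)| (*id, dist2(v, query)))
            .collect();
        // Ascending by distance, so the closest vector comes first.
        hits.sort_by(|a, b| a.1.total_cmp(&b.1));
        hits.truncate(k);
        hits
    }
}
```

Raising `nprobe` toward the number of centroids recovers the flat scan's recall at the flat scan's cost; lowering it reduces work per query but can skip buckets that hold true neighbors.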
## Security
- Read API keys from environment variables; avoid printing them.
- Treat ingested paths and URLs as untrusted; validate and sandbox as needed in calling code.
## Persistence and format notes
- `Document` JSON includes `embedding` when present (`#[serde(default)]`). Older files without this field still deserialize with `embedding: None`.
- **`GraphRagSnapshot` `format_version`** is currently `1`; bump it and document the change whenever the on-disk layout changes incompatibly.
## Related docs
- [SPEC.md](SPEC.md) — scope and requirements.
- [TODO.md](TODO.md) — planned enhancements.