argyph-store 1.0.3

Local-first MCP server giving AI coding agents fast, structured, and semantic context over any codebase.
# `argyph-store` — persistence

## Purpose

The persistence layer. It owns the SQLite database (files, symbols, edges, the FTS5 text index, and chunk vectors) and provides the `Store` trait that the rest of the project consumes.

## Owns

- SQLite schema and migrations for the `files`, `symbols`, `chunks`, `edges`, and `vectors` tables, plus the FTS5 virtual table over chunk text.
- The `Store` trait and its default implementation.
- Hybrid search query: BM25 from SQLite FTS5 fused with vector similarity via reciprocal rank fusion (RRF).
- Schema migration runner — runs at boot, before any read/write.
- Integrity check on startup; on detected corruption, drop and rebuild the index with a clear user warning.
- WAL mode configuration on SQLite.
- The on-disk layout under `.argyph/`:
  ```
  .argyph/
    meta.sqlite        # SQLite (files, symbols, chunks-text, edges, FTS5, vectors)
    meta.sqlite-wal
    schema_version
  ```
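
The hybrid search item above names reciprocal rank fusion. A minimal sketch of that fusion step follows. It assumes chunk ids are plain `u64` values (a stand-in for `ChunkId`) and that each input list is already ordered best-first with no repeated ids. The function name `rrf_fuse` and the constant `k` are illustrative, not taken from this crate.

```rust
use std::collections::HashMap;

/// Fuse several best-first rankings with reciprocal rank fusion (RRF).
///
/// `k` is the usual RRF smoothing constant (60 is a common choice in the
/// literature; the value this crate uses is not stated in this document).
/// Hypothetical helper for illustration only.
pub fn rrf_fuse(rankings: &[Vec<u64>], k: f64, limit: usize) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            // Ranks are 1-based, so the top hit contributes 1 / (k + 1).
            let rank = (idx + 1) as f64;
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused.truncate(limit);
    fused
}
```

Because RRF only looks at ranks, the BM25 scores and cosine similarities never need to be normalized onto a common scale before fusing.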

## Must never own

- Embedding generation (lives in `argyph-embed`).
- Parsing or chunking (lives in `argyph-parse`).
- MCP or CLI surfaces.
- Any business logic about *when* to read or write.

## Public surface

```rust
#[async_trait]
pub trait Store: Send + Sync {
    async fn upsert_files(&self, files: &[FileEntry]) -> Result<()>;
    async fn upsert_symbols(&self, symbols: &[Symbol]) -> Result<()>;
    async fn upsert_chunks(&self, chunks: &[Chunk]) -> Result<()>;
    async fn upsert_vectors(&self, vectors: &[(ChunkId, Vec<f32>)]) -> Result<()>;
    async fn upsert_edges(&self, edges: &[Edge]) -> Result<()>;

    async fn delete_file(&self, path: &Utf8Path) -> Result<()>;
    async fn list_files(&self) -> Result<Vec<FileEntry>>;
    async fn file_meta(&self, path: &Utf8Path) -> Result<Option<FileEntry>>;

    async fn find_symbol(&self, name: &str, scope: Option<&str>) -> Result<Vec<Symbol>>;
    async fn find_references(&self, sel: SymbolSelector) -> Result<Vec<Reference>>;
    async fn neighbors(&self, sel: SymbolSelector, kind: EdgeKind) -> Result<Vec<Edge>>;

    async fn search_text(&self, q: &TextQuery) -> Result<Vec<TextHit>>;
    async fn search_hybrid(
        &self,
        query: &str,
        query_vec: &[f32],
        k: usize,
        filter: Filter,
    ) -> Result<Vec<SearchHit>>;

    async fn missing_vectors(&self, model: &str) -> Result<Vec<ChunkId>>;
}

pub struct DefaultStore { /* private */ }

impl DefaultStore {
    pub async fn open(root: &Path) -> Result<Self>;
    pub async fn close(self) -> Result<()>;
}
```
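
To illustrate what `missing_vectors` is presumably for, here is a small synchronous, in-memory sketch of the bookkeeping: which chunks still lack an embedding for a given model. It is not the crate's implementation. `VectorBookkeeping` is a hypothetical type, `ChunkId` is replaced by `u64`, and the sketch assumes vectors are tracked per model, which the real `upsert_vectors` signature does not show.

```rust
use std::collections::{BTreeSet, HashMap};

/// Stand-in for the crate's `ChunkId`; the real type is not shown here.
pub type ChunkId = u64;

/// Hypothetical in-memory bookkeeping, not the crate's `DefaultStore`.
#[derive(Default)]
pub struct VectorBookkeeping {
    chunks: BTreeSet<ChunkId>,
    /// Assumption: embedded chunk ids are tracked per model name.
    embedded: HashMap<String, BTreeSet<ChunkId>>,
}

impl VectorBookkeeping {
    /// Record that these chunks exist.
    pub fn upsert_chunks(&mut self, ids: &[ChunkId]) {
        self.chunks.extend(ids.iter().copied());
    }

    /// Record that these chunks now have a vector for `model`.
    /// Only the ids are kept; the vector payload is ignored in this sketch.
    pub fn upsert_vectors(&mut self, model: &str, vectors: &[(ChunkId, Vec<f32>)]) {
        let done = self.embedded.entry(model.to_string()).or_default();
        done.extend(vectors.iter().map(|(id, _)| *id));
    }

    /// Chunks that have no vector for `model`, in ascending id order.
    pub fn missing_vectors(&self, model: &str) -> Vec<ChunkId> {
        match self.embedded.get(model) {
            Some(done) => self.chunks.difference(done).copied().collect(),
            None => self.chunks.iter().copied().collect(),
        }
    }
}
```

A caller such as an embedding worker could poll `missing_vectors`, embed the returned chunks, and write them back with `upsert_vectors`, which keeps the decision of when to embed outside the store.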

## Internal structure

- `src/lib.rs`: `Store` trait, `DefaultStore` factory.
- `src/sqlite/mod.rs`: SQLite connection pool and `Store` trait implementation.
- `src/meta.rs`: SQLite connection pool, query helpers.
- `src/schema.rs`: **architecture-protected.** Schema definitions, never edited; new versions add migrations.
- `src/migrations/`: **architecture-protected.** Numbered SQL files: `001_initial.sql`, `002_add_symbols.sql`, etc.
- `src/hybrid.rs`: reciprocal rank fusion of BM25 and vector results.
- `src/error.rs`: typed errors.
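
The numbered file names under `src/migrations/` suggest a simple ordering rule for the migration runner. The sketch below shows one way to pick the pending migrations, assuming the numeric prefix is the version and the current version comes from somewhere like `schema_version`. Both function names are hypothetical, and the actual runner may work differently.

```rust
/// Parse the numeric prefix of a migration file name such as `001_initial.sql`.
/// Returns `None` for anything that does not look like `<number>_<name>.sql`.
pub fn migration_version(file_name: &str) -> Option<u32> {
    let stem = file_name.strip_suffix(".sql")?;
    let (prefix, _rest) = stem.split_once('_')?;
    prefix.parse().ok()
}

/// Return the migrations newer than `current`, ordered by version.
/// Hypothetical helper: how `current` is stored and read is an assumption.
pub fn pending_migrations<'a>(files: &[&'a str], current: u32) -> Vec<(u32, &'a str)> {
    let mut pending: Vec<(u32, &'a str)> = files
        .iter()
        .filter_map(|f| migration_version(f).map(|v| (v, *f)))
        .filter(|(v, _)| *v > current)
        .collect();
    // Apply strictly in ascending version order, regardless of input order.
    pending.sort_by_key(|(v, _)| *v);
    pending
}
```

Sorting by the parsed number rather than by file name keeps the order correct even if a later file drops the zero padding.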

## Failure modes

- **AI agents editing existing migrations.** Hard rule: never. Add a new migration file; never edit an old one. AI Agent Rule #2.
- **AI agents bypassing the `Store` trait.** Other crates must never `use rusqlite` directly. The trait is the seam.
- **AI agents writing schema changes without bumping `schema_version`.** CI guards against this: an integration test fails if the schema produced by the migration set diverges from `schema.rs`.
- **Power-loss scenarios.** WAL mode + transactional writes mostly handle this. The integrity check on startup is the backstop.
- **AI agents disabling WAL mode for "simplicity."** Don't.

## Honest limitations

- SQLite is excellent up to ~10M rows for our query patterns. Past that, query plan tuning matters; we'll address it if/when we hit it.
- Vector search uses brute-force cosine similarity in Rust over SQLite BLOBs, keeping the build portable with zero native dependencies beyond SQLite.
- We do not currently support concurrent readers from a different process. The `argyph` binary is the single owner.
- The embedding model's fingerprint is checked on startup; if the configured model doesn't match the index's embeddings, the user is told to reindex Tier 2.
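
As a rough picture of the brute-force vector search mentioned above, the following sketch decodes vectors from BLOBs and ranks them by cosine similarity. The BLOB encoding (packed little-endian `f32`) is an assumption, as are the function names and the use of `u64` for chunk ids. The real storage format is not described in this document.

```rust
/// Decode a BLOB of packed little-endian f32 values (assumed encoding).
/// Returns `None` if the length is not a multiple of 4 bytes.
pub fn decode_f32_blob(blob: &[u8]) -> Option<Vec<f32>> {
    if blob.len() % 4 != 0 {
        return None;
    }
    Some(
        blob.chunks_exact(4)
            .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
            .collect(),
    )
}

/// Cosine similarity; `None` for mismatched lengths, empty or zero-norm vectors.
pub fn cosine(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let (mut dot, mut norm_a, mut norm_b) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a.sqrt() * norm_b.sqrt()))
}

/// Score every row against `query` and keep the `k` most similar.
/// Rows that fail to decode or compare are skipped rather than treated as errors.
pub fn top_k(query: &[f32], rows: &[(u64, Vec<u8>)], k: usize) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = rows
        .iter()
        .filter_map(|(id, blob)| {
            let v = decode_f32_blob(blob)?;
            Some((*id, cosine(query, &v)?))
        })
        .collect();
    // Most similar first; ties broken by id for a stable order.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    scored.truncate(k);
    scored
}
```

The scan is linear in the number of stored vectors, which is the trade-off this limitation describes: no index structure to maintain and no extra native dependency, at the cost of touching every row per query.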

## Stability

- The `Store` trait is the most central inter-crate contract. Adding methods is fine; changing existing method signatures is a coordinated change across multiple crates.
- Schema migrations are append-only.
- The on-disk layout (`.argyph/` contents) is part of the user-visible contract; changes require a migration plan.