argyph 1.0.1

Local-first MCP server giving AI coding agents fast, structured, and semantic context over any codebase.
# Argyph

> The local-first MCP server for serious codebases.
> Zero config, zero cloud, full context.

[![CI](https://github.com/Ezzy1630/argyph/actions/workflows/ci.yml/badge.svg)](https://github.com/Ezzy1630/argyph/actions/workflows/ci.yml)
[![crates.io](https://img.shields.io/crates/v/argyph.svg)](https://crates.io/crates/argyph)
[![npm](https://img.shields.io/npm/v/argyph.svg)](https://www.npmjs.com/package/argyph)
[![License: MIT OR Apache-2.0](https://img.shields.io/badge/license-MIT%20OR%20Apache--2.0-blue.svg)](#license)

Argyph is a single MCP server that gives any AI coding agent fast, structured, and semantic context over a codebase. It runs entirely on your machine, indexes incrementally, and is ready in under a second on previously-indexed repos.

The name is a portmanteau of **Argus** (the hundred-eyed watcher of Greek myth) and **Glyph** (a carved symbol with bound meaning) — a server that watches a codebase and gives the agent its symbols.

---

## What it does

Argyph replaces the half-dozen MCP servers most developers stitch together (one for grep, one for embeddings, one for symbol search, one for repo packing) with a single tool. It exposes three pillars of context behind one MCP endpoint:

0. **Ask-first retrieval**`ask` is the primary agent-facing lookup tool. It routes bare identifiers to symbol search, structured locators to `locate`, and natural-language questions to hybrid search, returning bounded `Span` results instead of whole files.
1. **File and symbol intelligence** — a tree-sitter-driven symbol graph with `find_definition`, `find_references`, callers, callees, imports, and outline tools. Structural queries return in milliseconds.
2. **Semantic search** — hybrid (BM25 + vector) search over AST-aware chunks, backed by an embedded LanceDB store. Bundled local embedding model means no API key is required for full functionality.
3. **Repo packing** — token-budgeted, repomix-style flattening of a repo or subset for agents that need to absorb a codebase quickly.

Everything is read-only. Argyph never edits, commits, or executes code.

---

## Why local-first

Most existing context servers depend on a cloud vector database (Milvus, Pinecone) and a remote embedding API. That's a non-starter for proprietary code at most companies, and a tax on cold starts everywhere else. Argyph runs entirely on the developer's machine: a single binary, an embedded vector store, an optional bundled embedding model, no daemon, no account, no key required to get full functionality.

---

## Install

> The first tagged release is `v1.0.0-rc.1`. If you don't see a published
> package on a given channel yet, use the **Build from source** path below —
> it works on any platform with a current Rust toolchain.

### Claude Code

```bash
claude mcp add argyph -- npx argyph@latest
```

### npm / npx

```bash
npx argyph
```

### Homebrew (macOS / Linux)

```bash
brew install Ezzy1630/argyph/argyph
```

### Universal installer

```bash
curl -fsSL https://raw.githubusercontent.com/Ezzy1630/argyph/main/scripts/install.sh | bash
```

> **Intel Mac (`x86_64-apple-darwin`) users:** the ONNX Runtime backend
> doesn't ship a prebuilt for Intel macOS, so no prebuilt binary is
> produced for that target. Use the **Cargo** path below — it builds
> in a couple of minutes and works fine on Intel Macs.

### Cargo

`cargo install argyph` builds Argyph from source. The bundled local
embedder uses [`ort`](https://crates.io/crates/ort) (ONNX Runtime); on
most platforms `ort` ships a prebuilt dynamic library and there is
nothing to do. On Linux you may need `libssl-dev` and a working C
toolchain; on Windows the MSVC build tools are required. If the build
fails on `ort-sys` linkage, set `ORT_STRATEGY=download` before
re-running.

```bash
cargo install argyph --locked
```

### Build from source

```bash
git clone https://github.com/Ezzy1630/argyph.git
cd argyph
cargo build --release
./target/release/argyph serve
```

### Claude Desktop (DXT)

Download `argyph.dxt` from the [latest release](https://github.com/Ezzy1630/argyph/releases/latest) and double-click.

---

## Quick start

```bash
# In any repo
cd ~/code/your-repo
argyph init
claude mcp add argyph -- npx argyph@latest
claude
```

In the chat:

> What does this codebase do, and where is session expiration controlled?

Argyph indexes Tier 0 in under a second on first run, Tier 1 (symbol graph) in seconds, and Tier 2 (embeddings) in the background. You can query immediately — tools return what's available now plus an `index_coverage` field so the agent knows.

---

## How it works

![Argyph three-tier indexing](docs/assets/three-tier-indexing.svg)

Argyph builds the index in three tiers, each useful before the next completes:

| Tier | What it builds                                | Time on a 1M-LOC repo | Useful for                                  |
|------|-----------------------------------------------|------------------------|---------------------------------------------|
| 0    | File inventory, hashes, .gitignore-aware tree | <1 s                   | Tree views, ripgrep, packing                |
| 1    | Symbol graph (defs, refs, calls, imports)     | ~30 s                  | Go-to-def, find-references, call graphs     |
| 1.5  | Structural index (non-code: md, json, yaml…) | seconds (after Tier 1) | `locate` for structured non-code files       |
| 2    | Embeddings + hybrid index                     | minutes (background)   | Fuzzy semantic queries                      |

The 80/20 insight: most agent queries are structural (`where is parseConfig defined?`, `what calls validateUser?`) and don't need embeddings. Argyph serves those from Tier 1 in milliseconds, even while Tier 2 is still building.

After the first index, the on-disk `.argyph/` directory persists and only changed files are reprocessed on subsequent runs. A filesystem watcher keeps everything live.

---

## Tools

| Tool                  | Description                                              | Tier required |
|-----------------------|----------------------------------------------------------|---------------|
| `ask`                 | Primary lookup router returning bounded spans            | 0/1/1.5/2     |
| `get_index_status`    | Tier readiness, embedding progress, watcher state        | 0             |
| `get_repo_overview`   | Languages, entry points, README excerpt, tree            | 0             |
| `search_text`         | Ripgrep-style regex / literal search                     | 0             |
| `find_definition`     | Locate the definition of a named symbol                  | 1             |
| `find_references`     | Reference sites with surrounding context                 | 1             |
| `get_callers`         | Functions that call a given function                     | 1             |
| `get_callees`         | Functions a given function calls                         | 1             |
| `get_imports`         | Imports of a file, and files that import it              | 1             |
| `get_symbol_outline`  | Hierarchical outline of a file                           | 1             |
| `search_semantic`     | Hybrid BM25 + vector over AST-aware chunks               | 2             |
| `pack_repo`           | Token-budgeted repo flattening (XML or markdown)         | 0+1           |
| `locate`              | Smallest natural span containing the target              | 1.5           |
| `locate_smart`        | Retrieval subagent (opt-in; needs provider config)     | 1.5           |
| `expand_span`         | Expand one truncated span from a session handle          | 0             |
| `memory_save`         | Persist a memory entry under a scope                     | 0             |
| `memory_search`       | FTS5 search over persistent memories                     | 0             |
| `memory_list`         | List memories in a scope                                 | 0             |
| `memory_forget`       | Delete a memory entry by id                              | 0             |

Full schema reference: [docs/tools-reference.md](docs/tools-reference.md).

---

## Optional features

### `locate_smart` — retrieval subagent

`locate_smart` is an opt-in tool that runs a bounded multi-step retrieval loop using an LLM provider. It's **disabled by default**; enable it in `.argyph/config.toml`:

```toml
[locate_smart]
enabled  = true
provider = "openai"        # or "anthropic" | "ollama"
model    = "gpt-4o-mini"
# endpoint = "http://localhost:11434"   # only for local providers
```

Or via env:

```bash
ARGYPH_LOCATE_SMART_ENABLED=1
ARGYPH_LOCATE_SMART_PROVIDER=anthropic
ARGYPH_LOCATE_SMART_MODEL=claude-haiku-4-5
```

When disabled, calls to `locate_smart` return `LOCATE_SMART_DISABLED` immediately and no provider keys are required.

Build with the smart feature:

```bash
cargo install argyph --features smart
```

---

## Configuration

Config is layered (highest priority first): env vars, `.argyph/config.toml` in the repo, built-in defaults. A config file is never required.

```bash
ARGYPH_LOG=info
ARGYPH_EMBED_PROVIDER=local        # local | openai | voyage
OPENAI_API_KEY=...                  # standard provider env vars
ARGYPH_DISABLE_WATCHER=true         # for sandboxed environments
```

Install agent lookup instructions in `CLAUDE.md`, `AGENTS.md`, or `GEMINI.md`:

```bash
argyph init
```

---

## Why Argyph (vs alternatives)

| Tool                       | Symbol graph | Semantic search | Local-first | Single install | Incremental |
|----------------------------|:------------:|:---------------:|:-----------:|:--------------:|:-----------:|
| claude-context (Zilliz)    |              | yes             |             | yes            | yes         |
| GitNexus / CodeGraphContext| yes          |                 |             |                |             |
| repomix                    |              |                 | yes         | yes            |             |
| Serena                     | yes          |                 | yes         | yes            |             |
| **Argyph**                 | **yes**      | **yes**         | **yes**     | **yes**        | **yes**     |

---

## Benchmarks

Reproducible numbers, methodology in [`docs/benchmarks.md`](docs/benchmarks.md). Reproduce locally with `cargo bench --workspace`. Numbers below are the median of three runs on the reference hardware tagged `m3-pro` (Apple M3 Pro, 36 GB RAM, macOS 15) unless stated otherwise.

### Hot paths (criterion, mean times)

| Bench                       | Mean      | What it measures                            |
|-----------------------------|-----------|---------------------------------------------|
| `locate_parse_path_bare`    | ~15 ns    | Parsing a bare path locator                  |
| `locate_parse_path_heading` | ~18 ns    | Parsing `path > Heading` locators            |
| `locate_strategy_path_only` | ~19 ns    | Strategy dispatch for path-only locator      |
| `locate_strategy_scoped`    | ~32 ns    | Strategy dispatch for scoped query locator   |
| `token_count_rust_file`     | ~4.4 µs   | `cl100k_base` tokenization of one source file |
| `walk_project_root`         | ~244 ms   | Full `walkdir` traversal of this repo        |

### System-level

End-to-end timings on an Apple M-series laptop, macOS 15. Reproduce via:

```bash
cargo run --release -p argyph-benches --bin system_bench -- /path/to/repo
```

| Fixture                              | Files  | LOC    | Tier 0 cold | Tier 1 full |
|--------------------------------------|-------:|-------:|------------:|------------:|
| `BurntSushi/ripgrep`                 |    215 |   ~52K |    **71 ms**|   **6.8 s** |
| `microsoft/TypeScript` (`src/`)      |    709 |  ~452K |    **30 ms**|   **8.2 s** |
| `microsoft/TypeScript` (whole repo)  | 81,310 |    ~2M |   **2.1 s** | see note ↓  |

**Tier 0 — the "useful immediately" gate — scales linearly and fast**
(81K files in 2.1 s). `search_text` and `pack_repo` are available the
moment Tier 0 is ready.

**Tier 1 (the symbol graph) is fast on normal-to-large repos.** Two
optimizations landed for v1.0: the within-file reference resolver was
rewritten from an O(symbols²) substring scan to one-pass hash-indexed
lookups (4.2× — TypeScript compiler source: 34.6 s → 8.2 s), and the
per-file symbol/chunk SQL writes are now batched. On *very* large
monorepos (the full 81K-file TypeScript repo, ~2M LOC) Tier 1 still
does not finish within 30 min — the residual cost is raw tree-sitter
parse volume across 79K files, which needs streaming/parallel indexing
(tracked in [ROADMAP.md](ROADMAP.md)). The server never blocks at any
scale: Tier-1 tools return `INDEX_NOT_READY` with a retry hint until
the graph is ready. Full data: [`docs/benchmarks.md`](docs/benchmarks.md).

---

## Architecture

Argyph is a Rust workspace of twelve focused crates with strict module ownership. The full architecture, including the Supervisor lifecycle, the three-tier indexing model, and per-crate responsibility boundaries, is documented in [ARCHITECTURE.md](ARCHITECTURE.md).

---

## Project status

**Release candidate.** `v1.0.0-rc.1` is published; `v1.0.0` will cut once the system-level benchmark targets in [`docs/SPEC.md`](docs/SPEC.md) § 6 are verified on the reference hardware. The build plan and milestones are in [`docs/BUILD_PLAN.md`](docs/BUILD_PLAN.md); the current milestone is tracked at the top of [ROADMAP.md](ROADMAP.md).

---

## Contributing

Argyph is built with substantial AI assistance, but human-architected and human-reviewed. Contribution guide and the (strict) AI agent rules are in [CONTRIBUTING.md](CONTRIBUTING.md). Commit conventions, including the project's attribution policy, are in [`docs/COMMIT_CONVENTIONS.md`](docs/COMMIT_CONVENTIONS.md).

---

## Author

Built by [Ezzy1630](https://github.com/Ezzy1630). See [AUTHORS.md](AUTHORS.md).

---

## License

Dual-licensed under either of:

- MIT License ([LICENSE-MIT]LICENSE-MIT)
- Apache License 2.0 ([LICENSE-APACHE]LICENSE-APACHE)

at your option.