symforge 7.3.0 - Docs.rs

# AGENTS.md

This repository is `symforge`.

It is a Rust-native, coding-first MCP project for code indexing, retrieval, and recovery.

## Mission

Build a world-class MCP for code indexing, retrieval, orchestration, and recovery.

Primary qualities:
- speed
- robustness
- idempotency
- deterministic behavior
- self-healing and self-recovery
- strong edge-case handling
- coding-first ergonomics

## Core Architecture Direction

Use a local-first architecture:
- Rust MCP server for the protocol surface
- in-process LiveIndex as the primary query engine
- local snapshot persistence under `.symforge/` for warm startup and recovery
- tree-sitter-based parsing and symbol/reference extraction in Rust

The read path should stay in-process and memory-resident whenever possible.

Reason:
- code-intelligence queries must be fast and deterministic
- symbol spans depend on exact bytes from the current workspace
- restart recovery should come from local snapshots, not an external control plane

## Product Principles

- Coding-first beats generic document-first behavior.
- Determinism beats convenience.
- Explicit recovery beats hidden retry magic.
- Corruption should be quarantined, not silently served.
- Long-running operations must be resumable.
- Mutating operations must support idempotency.
- Shutdown is not a safe persistence boundary.

## Storage Principles

Use local-first persistence, not an external control plane.

Recommended split:
- In-process LiveIndex:
  - file contents needed for active queries
  - symbol metadata
  - reference metadata
  - reverse indices and search structures
  - watcher and health state
- Local `.symforge/` state:
  - serialized index snapshots
  - temp files and quarantine artifacts
  - sidecar/session coordination metadata
  - future derived artifacts where local persistence is useful

Snapshot and retrieval rules:
- write bytes exactly as read
- never normalize line endings
- never decode and re-encode for persistence
- verify source slices against stored hashes

## Idempotency Rules

Mutating tools must accept an `idempotency_key` when appropriate.

Required behavior:
- normalize request arguments into a canonical hash
- first execution stores `idempotency_key + request_hash + status`
- replay with same key and same hash returns the stored result
- replay with same key and different hash fails deterministically

Likely idempotent tools:
- `index_folder`
- `index_repository`
- `repair_index`
- `checkpoint_now`
- future write or annotation tools

## Recovery Rules

Self-healing means deterministic repair paths.

The system should support:
- startup sweeps for stale leases and temp files
- checkpoint replay for interrupted runs
- quarantine of bad parses or bad spans
- scheduled repair jobs
- integrity verification
- explicit health and repair tools

Failure should degrade safely:
- process crashes should be resumable
- parser failures should isolate a file, not poison a run
- bad symbol spans should never be served silently

## MCP Surface

This project should eventually support:
- tools
- resources
- prompts

Do not design for tools only.

Likely foundation tools:
- `health`
- `index_folder`
- `index_repository`
- `get_index_run`
- `cancel_index_run`
- `checkpoint_now`
- `repair_index`
- `search_symbols`
- `search_text`
- `get_file_outline`
- `get_symbol`
- `get_symbols`
- `get_repo_outline`
- `invalidate_cache`

Likely useful resources:
- repository outline
- repository health
- run status
- symbol metadata views

Likely useful prompts:
- codebase audit
- architecture map
- failure triage
- index repair diagnosis

## Memory Strategy

Project memory should be layered:
- runtime memory:
  - live index state
  - watcher state
  - recent health and verification state
- persisted local memory:
  - snapshot files
  - file metadata
  - symbol metadata
  - hashes and recovery artifacts
- semantic memory:
  - optional embeddings for fuzzy recall over docs, notes, and conversations

The current architecture does not require an external database for query serving.

If semantic search becomes important:
- start simple
- keep the query path local-first
- add a dedicated sidecar only if scale or latency requires it

## Current Known Context

As of 2026-03-06:
- this repo was freshly created and bootstrapped as a Rust project
- there is an `rmcp`-based stdio server scaffold
- an earlier Python prototype found a real Windows byte-offset bug caused by newline translation during raw cache writes
- that bug is a design warning: byte-exact storage is non-negotiable

## Implementation Guidance

- Prefer clean module boundaries:
  - `protocol`
  - `application`
  - `domain`
  - `storage`
  - `indexing`
  - `parsing`
  - `observability`
- Keep domain logic testable without MCP or database runtime dependencies.
- Prefer bounded concurrency and structured shutdown.
- Long-running operations should return durable run ids when appropriate.
- Use Rust everywhere possible.
- If Python tooling is ever needed, use `uv`, not `pip`.

## Working Style

- Be pragmatic, direct, and engineering-focused.
- Avoid unnecessary boilerplate.
- Prefer implementing over theorizing once direction is clear.
- Preserve backward compatibility only when it serves the product.
- This project is ours now; optimize for the best end state, not legacy imitation.

## Tooling Preference

When SymForge MCP is available, prefer its tools for repository and code inspection before falling back to direct file reads.

Use SymForge first for:
- symbol discovery
- text/code search
- file outlines
- repository outlines
- targeted symbol/source retrieval
- inspection of implementation code under `src/`, `tests/`, and similar code-bearing directories

Preferred tools:
- `search_text`
- `search_symbols`
- `get_file_outline`
- `get_repo_outline`
- `get_symbol`
- `get_symbols`

Default rule:
- use SymForge to narrow and target code inspection first
- use direct file reads only when exact full-file source or surrounding context is still required after tool-based narrowing

Direct file reads are still appropriate for:
- exact document text in `docs/` or planning artifacts when literal wording matters
- configuration files where exact raw contents are the point of inspection

Do not default to broad raw file reads for source-code inspection when SymForge can answer the question more directly.