argyph-pack 1.0.4

Local-first MCP server giving AI coding agents fast, structured, and semantic context over any codebase.
# `argyph-pack` — repo packing

## Purpose

Take a repo (or a subset) and produce a token-budgeted, agent-readable flattened representation. Equivalent in spirit to `repomix`, but driven by the symbol graph for smarter prioritization.

## Owns

- The `Packer` trait and default implementation.
- Token counting per provider tokenizer (default `tiktoken-rs` for OpenAI-compatible models; per-provider tokenizers when configured).
- Priority heuristic for inclusion order under a token budget:
  1. Explicitly-requested paths
  2. Entry points (`main.rs`, `lib.rs`, `index.ts`, `__main__.py`, `package.json` `main` field)
  3. README and top-level docs
  4. Recently modified files (within the last 7 days, by mtime)
  5. Files with high in-edge count in the symbol graph
  6. Remaining files in lexicographic order
- Output formats: XML (primary, agent-friendly), markdown (human-friendly).
- Per-file truncation strategy for when a single file exceeds the remaining budget (drop function bodies, keep signatures).
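
The inclusion order above can be read as a sort key: a tier first, then a tie-breaker. The following is a minimal sketch of that idea, not the crate's actual `priority.rs`. `FileFacts`, `Tier`, `HIGH_IN_EDGES`, `tier`, and `order_by_priority` are hypothetical names. The in-edge cutoff of 10, the lexicographic tie-break inside every tier, and the README/`docs/` detection rule are assumptions. Resolving the `package.json` `main` field is left out.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical per-file facts a packer might gather before ordering.
#[derive(Debug, Clone)]
pub struct FileFacts {
    pub path: String,
    pub explicitly_requested: bool,
    pub modified: Option<SystemTime>,
    pub in_edges: usize,
}

/// Inclusion tiers in the order of the numbered list; earlier variants sort first.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Requested,
    EntryPoint,
    Docs,
    Recent,
    HighInEdges,
    Rest,
}

/// "Within the last 7 days, by mtime."
const RECENT_WINDOW: Duration = Duration::from_secs(7 * 24 * 60 * 60);
/// Assumed cutoff for a "high" in-edge count; the source gives no number.
const HIGH_IN_EDGES: usize = 10;

/// Classify one file into the first tier it qualifies for.
pub fn tier(f: &FileFacts, now: SystemTime) -> Tier {
    let name = f.path.rsplit('/').next().unwrap_or(&f.path);
    let is_entry = matches!(name, "main.rs" | "lib.rs" | "index.ts" | "__main__.py");
    // Assumption: "docs" means a top-level README* or anything under docs/.
    let is_doc = (!f.path.contains('/') && name.to_ascii_lowercase().starts_with("readme"))
        || f.path.starts_with("docs/");
    // Files with no mtime, or an mtime in the future, are not treated as recent.
    let is_recent = f
        .modified
        .and_then(|m| now.duration_since(m).ok())
        .map_or(false, |age| age <= RECENT_WINDOW);

    if f.explicitly_requested {
        Tier::Requested
    } else if is_entry {
        Tier::EntryPoint
    } else if is_doc {
        Tier::Docs
    } else if is_recent {
        Tier::Recent
    } else if f.in_edges >= HIGH_IN_EDGES {
        Tier::HighInEdges
    } else {
        Tier::Rest
    }
}

/// Sort by tier, then lexicographically by path within a tier.
pub fn order_by_priority(mut files: Vec<FileFacts>, now: SystemTime) -> Vec<String> {
    files.sort_by_cached_key(|f| (tier(f, now), f.path.clone()));
    files.into_iter().map(|f| f.path).collect()
}
```

Passing `now` in explicitly, rather than calling `SystemTime::now()` inside, keeps the ordering deterministic in tests.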

## Must never own

- Indexing of any kind.
- Storage.
- Embedding.
- Anything user-input-shaped beyond the `PackRequest` struct.
- Direct filesystem I/O. File content arrives through `PackContext`, which `argyph-core` backs with `argyph-core::Index::read_range` or equivalent.

## Public surface

```rust
pub trait Packer {
    fn pack(&self, req: &PackRequest, ctx: &dyn PackContext) -> Result<PackResult>;
}

pub trait PackContext {
    fn list_files(&self, scope: &PackScope) -> Vec<Utf8PathBuf>;
    fn read(&self, file: &Utf8Path) -> Result<String>;
    fn modified(&self, file: &Utf8Path) -> Option<SystemTime>;
    fn in_edges(&self, file: &Utf8Path) -> Result<usize>;
}

pub struct PackRequest {
    pub scope: PackScope,                 // All | Paths(...) | Symbol(name)
    pub format: PackFormat,               // Xml | Markdown
    pub token_budget: usize,
    pub include: PackInclude,             // tests, docs flags
}

pub struct PackResult {
    pub format: PackFormat,
    pub content: String,
    pub token_count: usize,
    pub files_included: Vec<Utf8PathBuf>,
    pub files_truncated: Vec<Utf8PathBuf>,
    pub files_omitted: Vec<Utf8PathBuf>,
}
```
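
To illustrate how the three path lists on `PackResult` relate to `token_budget`, here is a hedged sketch of a greedy fill loop. It is not the `DefaultPacker` implementation. `Plan`, `approx_tokens`, and `plan_pack` are hypothetical names. The whitespace word counter is a stand-in for a real tokenizer. Pre-read `(path, content)` pairs stand in for `PackContext` calls. The sketch also assumes that included, truncated, and omitted files are disjoint sets, and that packing continues past a file that does not fit.

```rust
/// Outcome of a packing pass; mirrors the three path lists on `PackResult`.
#[derive(Debug, Default, PartialEq)]
pub struct Plan {
    pub included: Vec<String>,
    pub truncated: Vec<String>,
    pub omitted: Vec<String>,
    pub token_count: usize,
}

/// Stand-in token counter: whitespace-separated words. NOT a real tokenizer.
pub fn approx_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Walk `files` (already in priority order) and spend `budget` greedily.
/// `truncate` is whatever per-file shortening strategy the caller supplies.
pub fn plan_pack<T>(files: &[(String, String)], budget: usize, truncate: T) -> Plan
where
    T: Fn(&str) -> String,
{
    let mut plan = Plan::default();
    for (path, content) in files {
        // Never underflows: token_count only grows by amounts <= remaining.
        let remaining = budget - plan.token_count;

        let full_cost = approx_tokens(content);
        if full_cost <= remaining {
            plan.token_count += full_cost;
            plan.included.push(path.clone());
            continue;
        }

        // The whole file does not fit; try the shortened form instead.
        let short_cost = approx_tokens(&truncate(content));
        if short_cost > 0 && short_cost <= remaining {
            plan.token_count += short_cost;
            plan.truncated.push(path.clone());
        } else {
            plan.omitted.push(path.clone());
        }
    }
    plan
}
```

Because the loop keeps going after an omission, a small low-priority file can still be included after a large high-priority one was dropped. Whether that is desirable is a design choice the source does not settle.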

## Internal structure

- `src/lib.rs` — trait, `DefaultPacker`.
- `src/priority.rs` — the inclusion-order heuristic.
- `src/render/xml.rs` — XML format.
- `src/render/markdown.rs` — markdown format.
- `src/tokenize.rs` — token counting.
- `src/truncate.rs` — per-file truncation strategy.
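
As an illustration of the "drop function bodies, keep signatures" strategy that `src/truncate.rs` is responsible for, here is a deliberately naive, line-based sketch for Rust-like source. The real implementation is not shown in this document and may well work from the symbol graph or a parse tree instead. `drop_fn_bodies` is a hypothetical name, and the brace counting is purely textual, so braces inside strings or comments will confuse it.

```rust
/// Replace the body of each `fn` whose signature line ends with `{`
/// by a short placeholder, keeping the signature itself.
/// Naive by design: braces are counted textually, line by line.
pub fn drop_fn_bodies(source: &str) -> String {
    let mut out = String::new();
    // Greater than zero while we are inside a body that is being elided.
    let mut skip_depth: usize = 0;

    for line in source.lines() {
        let opens = line.matches('{').count();
        let closes = line.matches('}').count();

        if skip_depth > 0 {
            // Track nesting until the elided body's closing brace is seen.
            skip_depth = (skip_depth + opens).saturating_sub(closes);
            continue;
        }

        let trimmed = line.trim_start();
        let is_fn = trimmed.starts_with("fn ") || trimmed.starts_with("pub fn ");
        if is_fn && line.trim_end().ends_with('{') && opens > closes {
            // Keep indentation and signature, drop the opening brace.
            let signature = line.trim_end().trim_end_matches('{').trim_end();
            out.push_str(signature);
            out.push_str(" { /* body omitted */ }\n");
            skip_depth = opens - closes;
        } else {
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}
```

Non-function items such as `use` lines, structs, and `impl` headers pass through unchanged, so the truncated file still shows the agent the overall shape of the module.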

## Failure modes

- **AI agents adding a JSON renderer "for completeness."** It is deliberately deferred (see `docs/SPEC.md` § 4.2).
- **AI agents counting tokens with the wrong tokenizer.** The default is `tiktoken-rs` (OpenAI-compatible). Per-provider tokenizers are loaded from `argyph-embed`'s tokenizer registry when configured.
- **AI agents reading files directly through `std::fs`.** They must go through `PackContext`, which the orchestrator (`argyph-core`) supplies.
- **AI agents producing malformed XML.** Output is validated by an XSD-equivalent test in CI.
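
A common source of malformed XML is embedding raw source code, which is full of `<`, `>`, and `&`. The sketch below shows one way to escape both attribute values and element text. The `<file path="...">` element is a hypothetical shape chosen for illustration and is not the crate's actual output schema. `escape_xml` and `render_file` are likewise hypothetical names.

```rust
/// Escape the five characters XML predefines entities for.
/// Does not filter control characters that XML 1.0 forbids outright.
pub fn escape_xml(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Render one file as an element; both the path attribute and the body are escaped.
/// The element and attribute names here are illustrative assumptions.
pub fn render_file(path: &str, content: &str) -> String {
    format!(
        "<file path=\"{}\">{}</file>",
        escape_xml(path),
        escape_xml(content)
    )
}
```

Escaping in one helper that every renderer path goes through makes it harder to forget a spot than sprinkling replacements at each call site.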

## Honest limitations

- For non-OpenAI providers, token counts are estimates accurate to within ~5%, so the budget is a soft guarantee.
- The priority heuristic is hand-tuned, not learned. Future work could add a small re-ranker fed by feedback signals.
- We do not currently dedupe near-duplicate files; in monorepos with vendored copies of the same lib, both copies may be packed.
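
Because the budget is soft, a caller that has a hard context limit may want to request a slightly smaller budget. The sketch below is one way to compute that headroom. It assumes "within ~5%" means the true count is at most 1.05 times the estimate, which is an interpretation rather than something the crate states. `budget_with_headroom` is a hypothetical helper, not part of the public surface.

```rust
/// Largest budget such that, even if the true token count overshoots the
/// estimate by `max_error_percent`, the result stays within `hard_limit`.
/// Uses integer arithmetic and rounds down.
pub fn budget_with_headroom(hard_limit: usize, max_error_percent: usize) -> usize {
    // hard_limit / (1 + p/100), rewritten to avoid floating point.
    // saturating_mul guards against overflow for very large limits.
    hard_limit.saturating_mul(100) / (100 + max_error_percent)
}
```

For a hard limit of 8000 tokens and a 5% error bound this yields 7619, since 7619 × 1.05 ≈ 7999.95.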

## Stability

- The `PackResult` shape is part of the public MCP tool schema. Changes require a major version bump after v1.0.
- Output format strings are part of the user contract — a downstream agent's prompt may parse the XML structure. Format changes are breaking.