prx 0.5.9

Praxis — agent-native Unix tools. Single binary replacing grep, cat, find, sed, diff for AI coding agents.
# prx (Praxis)

[![CI](https://github.com/civitas-io/prx/actions/workflows/ci.yml/badge.svg)](https://github.com/civitas-io/prx/actions/workflows/ci.yml)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)
[![Platforms](https://img.shields.io/badge/platforms-Linux%20%7C%20macOS%20%7C%20Windows-lightgrey)](#platform-support)
[![Docs](https://img.shields.io/badge/docs-civitas--io.github.io%2Fprx-blue)](https://civitas-io.github.io/prx/)

**AI coding agents burn most of their context window re-discovering code they've already seen. prx fixes that at the source.**

prx is a single Rust binary that replaces the Unix tools coding agents lean on most — `grep`, `cat`, `find`, `sed`, `diff` — with structured JSON output, hard token budgets, and an embedded semantic search model. One call returns a ranked, budgeted answer instead of a wall of text the agent has to read, parse, and re-read. No shell spawning, no post-hoc compression, no model server.

---

## The problem

Every coding agent runs some version of this loop:

```
1. grep "authenticate" src/          → file paths, line numbers
2. cat src/auth/handler.ts           → entire file (thousands of tokens)
3. grep "authenticate" src/ -A 5     → same noise, wider context
```

Most of those tokens are waste: whole files read to use ten lines, the same file loaded twice in a session, test logs dumped in full to find one failure. The tools aren't broken — they were built for humans reading a terminal, not for an agent paying for every token and working inside a fixed context window. That mismatch is the tax prx removes.

> The token-waste figures previously cited here are being re-sourced. Until they can be tied to a verifiable reference, the per-command savings table below, measured on real sessions, stands in for them.

---

## The fix

```bash
prx search "authenticate" src/
```

```json
{
  "tokens": 487,
  "data": {
    "matches": [
      {
        "file": "src/auth/handler.ts",
        "line": 42,
        "context_name": "handleLogin",
        "snippet": "async handleLogin(req: Request)...",
        "relevance": 0.94
      }
    ],
    "total_matches": 23,
    "returned": 3,
    "budget_used": 487
  }
}
```

Ranked results, metadata included, under a token budget you control. The agent gets the answer, not the haystack.
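As a rough illustration of what "structured" buys the caller, the sketch below parses the response shown above and keeps only the locations worth opening. The helper name `top_locations` and the 0.5 relevance cutoff are illustrative assumptions, not part of prx.

```python
import json

# Response copied from the example above (one match is listed there).
RESPONSE = """
{
  "tokens": 487,
  "data": {
    "matches": [
      {
        "file": "src/auth/handler.ts",
        "line": 42,
        "context_name": "handleLogin",
        "snippet": "async handleLogin(req: Request)...",
        "relevance": 0.94
      }
    ],
    "total_matches": 23,
    "returned": 3,
    "budget_used": 487
  }
}
"""


def top_locations(raw: str, min_relevance: float = 0.5) -> list[str]:
    """Return 'file:line (name)' strings for matches at or above the cutoff."""
    data = json.loads(raw)["data"]
    return [
        f'{m["file"]}:{m["line"]} ({m["context_name"]})'
        for m in data["matches"]
        if m["relevance"] >= min_relevance  # cutoff value is an assumption
    ]


locations = top_locations(RESPONSE)
print(locations)
```

No regex over terminal text is needed: the agent reads fields by name and decides what to open next.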

---

## What makes prx different

**It replaces the tools, it doesn't wrap them.** Compression tools shell out to `grep`/`cat` and squeeze the output afterward. prx does the search, reading, and diffing itself — no subprocess, no re-parsing, no lossy post-processing.

**It covers the whole loop, not just search.** Retrieval-only tools still leave your agent to read, edit, diff, and run tests with the old noisy tools. prx handles search, structured reads, safe edits, semantic diffs, and parsed test/build output behind one consistent JSON envelope.

**It has no runtime dependencies.** One static binary, ~49 MB, no Python, no package manager, no network at runtime. It runs in containers and sandboxes as-is.

**The semantic model is built in.** A 32M-parameter retrieval-optimized embedding model (potion-retrieval-32M, stored as float16) is compiled directly into the binary. Semantic search runs on CPU in milliseconds — no model server, no vector database, no setup step.

**It's fast.** Indexing runs on all CPU cores in parallel (7.6x speedup on 10 cores). Embeddings are memory-mapped with zero-copy access — no heap allocation, no deserialization. A 50-query benchmark suite runs in 0.23 seconds.

---

## Token savings

Measured across real agent sessions on production codebases. Run the numbers on your own repo with `prx stats --compare` and `prx bench .`.

| Feature | Scenario | Savings |
|---|---|---|
| `read --if-changed` (cache hit) | Re-reading an unchanged file | ~99% |
| `read --mode diff` | File with local changes | 98–99% |
| `read --skeleton` | Full file reduced to signatures | ~90% |
| `run` | Passing test suites | 95–99% |
| `read --mode entropy` | Generated / highly repetitive code | ~86% |
| `search` | vs grep + follow-up reads | ~35% |

<p align="center">
  <img src="docs/assets/token-savings.svg" alt="Token savings per command" width="720"/>
</p>

---

## Performance

### Indexing: 7.6x parallel speedup

`prx index` builds a persistent search index (BM25, semantic embeddings, import graph, and symbol definitions) in a single parallel pass. Every stage runs on all available CPU cores via rayon.

| Codebase | Files | Chunks | Time |
|---|---|---|---|
| Flask (Python, 15K LOC) | 259 | 1,225 | **0.3s** |
| ripgrep (Rust, 25K LOC) | 239 | 2,465 | **0.6s** |
| fastify (TypeScript, 15K LOC) | 417 | 2,529 | **0.6s** |
| cargo (Rust, 150K LOC) | 2,815 | 12,118 | **5s** |
| terraform (Go, 2M LOC) | 5,323 | 22,798 | **10s** |
| django (Python, 300K LOC) | 5,690 | 30,944 | **32s** |
| kafka (Java, 500K LOC) | 7,231 | 63,740 | **114s** |
| vscode (TypeScript, 1M LOC) | 14,643 | 136,056 | **340s** |

Measured on 10-core Apple Silicon with rayon parallelism (944% CPU utilization). On CI runners (4 cores), expect ~3-4x speedup over sequential. Incremental rebuilds skip unchanged files entirely.
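The 4-core estimate is roughly what Amdahl's law predicts from the 10-core measurement. The sketch below does that arithmetic. It is a simplification (it ignores memory bandwidth, I/O, and core heterogeneity), and the function names are illustrative.

```python
def serial_fraction(speedup: float, cores: int) -> float:
    """Solve Amdahl's law, speedup = 1 / (s + (1 - s) / cores), for s."""
    return (cores / speedup - 1) / (cores - 1)


def predicted_speedup(s: float, cores: int) -> float:
    """Amdahl's law: speedup for a workload with serial fraction s."""
    return 1.0 / (s + (1.0 - s) / cores)


# Measured figure from this README: 7.6x on 10 cores.
s = serial_fraction(7.6, 10)
efficiency = 7.6 / 10  # fraction of ideal linear scaling

print(f"implied serial fraction: {s:.3f}")
print(f"parallel efficiency on 10 cores: {efficiency:.0%}")
print(f"predicted speedup on 4 cores: {predicted_speedup(s, 4):.2f}x")
```

Under this model about 3.5% of the work is serial, which puts a 4-core runner at roughly 3.6x.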

### Search: zero-copy memory-mapped embeddings

Embedding vectors are memory-mapped directly from disk via `memmap2` and cast to `&[f32]` with zero allocation using `bytemuck`. The OS page cache keeps the index warm across queries — no heap allocation, no deserialization, no repeated file reads.

On an 11K-file codebase with 54 MB of embeddings, this means:

- **Zero bytes** allocated for embedding data (OS manages the pages)
- Queries after the first hit a warm cache, giving sub-millisecond embedding access
- Falls back to owned allocation automatically if mmap isn't available (network FS, etc.)
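The same pattern can be sketched in Python, with the standard library's `mmap` and `memoryview.cast` standing in for `memmap2` and `bytemuck`. The file layout (packed native float32 rows), the 4-dimension vectors, and the function names are assumptions made for the demo, not prx's on-disk format.

```python
import mmap
import os
import struct
import tempfile

DIMS = 4  # tiny demo vectors; an assumption, not prx's real dimensionality


def write_vectors(path: str, vectors: list[list[float]]) -> None:
    """Write vectors as packed native float32, one row after another."""
    with open(path, "wb") as f:
        for v in vectors:
            f.write(struct.pack(f"{DIMS}f", *v))


def _argmax_dot(floats, query: list[float]) -> int:
    """Index of the row with the highest dot product against `query`."""
    rows = len(floats) // DIMS
    scores = [
        sum(floats[r * DIMS + i] * query[i] for i in range(DIMS))
        for r in range(rows)
    ]
    return max(range(rows), key=scores.__getitem__)


def best_match(path: str, query: list[float]) -> int:
    with open(path, "rb") as f:
        try:
            mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except (OSError, ValueError):
            # Fallback: read an owned copy when mapping is unavailable.
            with memoryview(f.read()) as raw, raw.cast("f") as floats:
                return _argmax_dot(floats, query)
        # Reinterpret the mapped bytes as float32 without copying them.
        with mapped, memoryview(mapped) as raw, raw.cast("f") as floats:
            return _argmax_dot(floats, query)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "embeddings.f32")
    write_vectors(path, [[1.0, 0.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0, 0.0],
                         [0.5, 0.5, 0.0, 0.0]])
    winner = best_match(path, [0.1, 0.9, 0.0, 0.0])

print(winner)
```

The views are released before the mapping is closed, which mirrors the lifetime rule a borrowed `&[f32]` enforces in Rust.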

### Benchmarking: 55x speedup with load-once

`prx bench-ndcg` measures search quality (NDCG@10) against labeled datasets. It loads the index once and runs all queries against cached data:

| Benchmark | Before (v0.5.5) | After (v0.5.6) | Speedup |
|---|---|---|---|
| 50-query NDCG suite | 12.76s | **0.23s** | **55x** |

Use `--plain` for human-readable output in the terminal.

---

## The commands agents actually orchestrate around

Most tools stop at "better grep." The two commands below are why prx is useful for agents working inside a tight context window — they answer questions that would otherwise take a dozen `grep`/`cat` calls to reconstruct.

### `prx context` — understand a module in one call

```bash
prx context src/auth/
```

Returns a single structured package for a directory: summary stats, doc/README content, entrypoints, per-file **skeletons** (signatures without bodies), and the **import edges** connecting the files. Instead of the agent running `find`, then `cat README`, then `outline` on each file, then chasing imports by hand, it gets the whole mental model of a module in one budgeted response — ideal for the "load just enough to start a task" step in an agent loop.
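To make "skeleton" concrete, here is a minimal sketch that reduces a Python source string to its signatures. It uses Python's built-in `ast` module as a stand-in for tree-sitter, and the sample source and the `skeleton` helper are made up for illustration; prx's actual output format may differ.

```python
import ast

SOURCE = '''
class Session:
    def login(self, user: str, password: str) -> bool:
        token = self._issue(user)
        return token is not None

def hash_password(raw: str, rounds: int = 12) -> str:
    return raw[::-1] * rounds
'''


def skeleton(source: str) -> list[str]:
    """List class and function signatures in source order, dropping bodies."""
    lines: list[str] = []

    def visit(node: ast.AST, depth: int) -> None:
        indent = "    " * depth
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                lines.append(f"{indent}class {child.name}")
                visit(child, depth + 1)  # descend to pick up methods
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                prefix = "async def" if isinstance(child, ast.AsyncFunctionDef) else "def"
                returns = f" -> {ast.unparse(child.returns)}" if child.returns else ""
                lines.append(
                    f"{indent}{prefix} {child.name}({ast.unparse(child.args)}){returns}"
                )

    visit(ast.parse(source), 0)
    return lines


outline = skeleton(SOURCE)
print("\n".join(outline))
```

The bodies never reach the caller, which is where the token reduction comes from.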

### `prx impact` — know what breaks before you touch it

```bash
prx impact src/auth.ts
```

Reverse-dependency analysis built on prx's import graph: it answers "what depends on this file?" so an agent (or a human) can scope a refactor before making it. Edges are extracted from the AST (see [How search works](#how-search-works)); when an import name is ambiguous across many files, resolution falls back to a directory-proximity heuristic and returns the most likely candidates rather than guessing blindly. Treat its output as a high-quality map, not a formal proof of completeness.
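The core idea can be sketched as inverting the import graph and walking it breadth-first. The edge data and the `dependents` helper below are hypothetical; the sketch also skips the ambiguous-import resolution described above.

```python
from collections import defaultdict, deque

# Hypothetical forward import edges: file -> files it imports.
IMPORTS = {
    "src/routes/login.ts": ["src/auth.ts", "src/db.ts"],
    "src/routes/admin.ts": ["src/routes/login.ts"],
    "src/auth.ts": ["src/crypto.ts"],
    "src/db.ts": [],
    "src/crypto.ts": [],
}


def dependents(imports: dict[str, list[str]], target: str) -> dict[str, int]:
    """Map each file that depends on `target` to its hop distance from it."""
    # Invert the edges: imported file -> set of files importing it.
    reverse = defaultdict(set)
    for importer, imported in imports.items():
        for dep in imported:
            reverse[dep].add(importer)

    # Breadth-first walk so each file gets its shortest distance.
    distance: dict[str, int] = {}
    queue = deque([(target, 0)])
    while queue:
        current, hops = queue.popleft()
        for importer in sorted(reverse[current]):
            if importer != target and importer not in distance:
                distance[importer] = hops + 1
                queue.append((importer, hops + 1))
    return distance


impact = dependents(IMPORTS, "src/auth.ts")
print(impact)
```

Distance 1 is a direct importer; larger distances are transitive dependents that a refactor may still reach.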

---

## All commands

| Command | Replaces | What it does |
|---|---|---|
| `prx search` | grep, rg | Hybrid search: literal + semantic + structural. Ranked, token-budgeted. |
| `prx read` | cat, head, tail | Structured reading. `--if-changed` cache, `--skeleton`, `--mode`, `--snap`. |
| `prx find` | find, ls, tree | Codebase mapping. Tree or flat output, inline metadata, semantic scoring. |
| `prx edit` | sed, awk | Safe edits. Literal matching, dry-run by default, tree-sitter syntax validation. |
| `prx diff` | diff, git diff | Semantic diffs with function-level attribution and natural-language summaries. |
| `prx run` | | Parsed test/build/lint output. 22 parsers; `--auto-json` for tools with structured output. |
| `prx context` | | Module context package: stats, docs, entrypoints, skeletons, import edges. |
| `prx impact` | | Reverse dependency analysis: what depends on a given file. |
| `prx outline` | ctags | Symbol table for a file or directory. |
| `prx exists` | grep -q | Fast bloom-filter existence check, near-zero tokens. |
| `prx index` | | Parallel persistent index: 11K files in ~55s (7.6x speedup via rayon). |
| `prx mcp` | | MCP server over stdio for direct agent integration. |
| `prx batch` | xargs | Parallel JSONL batch execution. |
| `prx init` | | Detects agent frameworks and generates integration configs. |
| `prx stats` | | Token-savings dashboard, with `--compare`. |
| `prx bench` | | Side-by-side benchmark: prx vs grep+cat. |
| `prx bench-ndcg` | | NDCG search quality benchmark against labeled datasets. |

17 commands total. Full reference with examples in the [documentation site](https://civitas-io.github.io/prx/).
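For readers unfamiliar with the data structure behind `prx exists`, the sketch below shows a generic Bloom filter. prx's actual filter layout and hash functions are not described here, so the bit-array size, the number of hashes, and the SHA-256-derived positions are all assumptions.

```python
import hashlib


class BloomFilter:
    """Set membership with no false negatives and rare false positives."""

    def __init__(self, size_bits: int = 1024, hashes: int = 3):
        self.size_bits = size_bits
        self.hashes = hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


symbols = BloomFilter()
for name in ("handleLogin", "hashPassword", "verifyToken"):
    symbols.add(name)

print(symbols.might_contain("handleLogin"))
```

A negative answer is definitive, so a caller can skip a full search when the filter says a symbol is absent.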

---

## Quick start

```bash
# Search by meaning, not just text
prx search "authentication flow" src/

# Get a module's whole shape in one call
prx context src/auth/

# See what depends on a file before refactoring it
prx impact src/auth.ts

# File structure without the bodies (~10% of the tokens)
prx read src/auth.ts --skeleton

# Read just the function you need
prx read src/auth.ts --lines 42 --snap function

# Skip re-reading a file that hasn't changed
prx read src/auth.ts --if-changed a3f9b2c1

# Safe edit with a preview before applying
prx edit src/auth.ts --find "old_api()" --replace "new_api()"

# Run tests, get only failures and a summary
prx run cargo test
```

---

## How search works

prx fuses three retrieval methods into one ranked result:

- **Literal** — regex matching at ripgrep speed.
- **Semantic** — the embedded potion-retrieval-32M Model2Vec model (PCA-reduced to 256 dims, float16); runs on CPU in milliseconds, no server.
- **Structural** — AST pattern matching via tree-sitter, e.g. `fn $NAME($$$) { $$$ }` to match all function definitions.

Results are combined with Reciprocal Rank Fusion and reranked through a multi-stage pipeline: definition boost, identifier-stem matching, file coherence, **import-graph proximity** (favoring files in the dependency neighborhood of strong hits), noise penalties, and saturation decay.

```bash
prx search "authentication flow" src/                  # semantic (auto-detected)
prx search --literal "authenticate(" src/              # exact match, ripgrep speed
prx search --structural 'fn $NAME($$$) { $$$ }' src/   # AST pattern matching
```
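The fusion step can be sketched in a few lines. This is the textbook form of Reciprocal Rank Fusion, not prx's code: the constant `k=60` is the value commonly used in the literature and is an assumption here, the file names are invented, and the reranking stages are left out.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each hit contributes 1 / (k + rank) to its document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by name for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Hypothetical per-method rankings for one query.
literal = ["src/auth/handler.ts", "src/auth/session.ts", "src/util/log.ts"]
semantic = ["src/auth/session.ts", "src/auth/token.ts", "src/auth/handler.ts"]
structural = ["src/auth/session.ts", "src/auth/handler.ts"]

fused = reciprocal_rank_fusion([literal, semantic, structural])
print(fused)
```

Because only ranks are used, the three methods do not need comparable score scales, which is the usual reason for choosing this fusion method.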

The import graph is extracted from the AST (tree-sitter) across 10 language families that have an import concept. Search quality is tracked with NDCG@10 on labeled datasets — see [Search quality](#search-quality) for the honest numbers and methodology.

---

## `prx run` — structured command output

Test runners emit thousands of tokens an agent doesn't need:

```
running 164 tests
test test_one ... ok
test test_two ... ok
[... 162 more lines ...]
test result: ok. 164 passed; 0 failed
```

`prx run` parses that and returns only the signal:

```json
{ "passed": 164, "failed": 0, "duration_ms": 2341, "failures": [] }
```

22 parsers cover Rust, Python, Go, JavaScript/TypeScript, Java, .NET, Docker, Terraform, kubectl, Maven, Gradle, npm, mypy, git, common coverage tools, and a generic fallback for unrecognized commands.
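A minimal sketch of what one such parser does, written against the simplified log format shown above. The regex and the `summarize` helper are illustrative assumptions; prx's real parsers are not shown in this README, and `duration_ms` is omitted because it cannot be derived from this text.

```python
import json
import re

RAW = """running 3 tests
test test_one ... ok
test test_two ... FAILED
test test_three ... ok

test result: FAILED. 2 passed; 1 failed
"""

# Matches per-test lines such as "test test_one ... ok".
TEST_LINE = re.compile(r"^test (\S+) \.\.\. (ok|FAILED)$")


def summarize(output: str) -> dict:
    """Collapse per-test lines into counts plus the names of failing tests."""
    passed = 0
    failures: list[str] = []
    for line in output.splitlines():
        match = TEST_LINE.match(line.strip())
        if not match:
            continue  # skips headers, blanks, and the final result line
        if match.group(2) == "ok":
            passed += 1
        else:
            failures.append(match.group(1))
    return {"passed": passed, "failed": len(failures), "failures": failures}


summary = summarize(RAW)
print(json.dumps(summary))
```

The output size depends on the number of failures, not the number of tests, which is why passing suites compress so well.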

---

## Agent integration

### MCP server

```json
{
  "mcpServers": {
    "prx": { "command": "prx", "args": ["mcp"] }
  }
}
```

Exposes prx over stdio to any MCP-compatible agent. (prx also works equally well as a plain CLI on `PATH` — see the tiers below.)

### Config generation

```bash
prx init                      # detect frameworks, generate configs
prx init --agents-md          # append a usage snippet to AGENTS.md
prx init --agent claude-code  # generate a dedicated Claude Code sub-agent
```

### Integration tiers

| Tier | How | Best for |
|---|---|---|
| **CLI on PATH** | `prx search ...` | Any agent, CI, scripts — the simplest and most portable path |
| **MCP server** | `prx mcp` | Agents that prefer structured tool calls mid-task |
| **Agent definition** | `prx init --agent claude-code` | A dedicated retrieval sub-agent |

### For AI agents

If you're an agent evaluating prx, read [`skills/agents.md`](skills/agents.md): installation, per-command examples, and measured token savings written for machine consumption.

---

## Reliability

If an internal operation fails, prx falls back to the equivalent Unix command and returns results in the same JSON envelope, flagged so the caller can tell a fallback occurred. Errors are logged to `~/.prx/errors.jsonl`. The intent is that prx never hard-breaks an agent's workflow. But because a fallback trades semantic search for plain matching, agents that depend on retrieval quality should check the flag rather than assume every result is a full-quality prx result.
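A caller-side check might look like the sketch below. The field name `fallback` is a placeholder assumption, since this README does not name the flag; check the documentation site for the real key before relying on it.

```python
import json


def is_full_quality(raw: str, flag: str = "fallback") -> bool:
    """True when the envelope does not report a fallback.

    `flag` is a hypothetical field name, not confirmed by the prx docs.
    """
    envelope = json.loads(raw)
    return not envelope.get(flag, False)


normal = '{"tokens": 487, "data": {"matches": []}}'
degraded = '{"tokens": 12, "fallback": true, "data": {"matches": []}}'

print(is_full_quality(normal), is_full_quality(degraded))
```

An agent that cares about ranking quality can retry, widen the query, or lower its trust in the ordering when the check fails.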

---

## Install

### Prebuilt binary (recommended)

Download the binary for your platform from [GitHub Releases](https://github.com/civitas-io/prx/releases):

| Platform | File |
|---|---|
| Linux x86_64 | `prx-x86_64-unknown-linux-gnu.tar.gz` |
| Linux aarch64 | `prx-aarch64-unknown-linux-gnu.tar.gz` |
| macOS Apple Silicon | `prx-aarch64-apple-darwin.tar.gz` |
| Windows x86_64 | `prx-x86_64-pc-windows-msvc.zip` |

```bash
# Example: Linux x86_64
curl -L https://github.com/civitas-io/prx/releases/latest/download/prx-x86_64-unknown-linux-gnu.tar.gz | tar xz
sudo mv prx /usr/local/bin/
prx --version
```

The prebuilt binary already contains the embedded model — nothing else to install.

### Build from source

Requirements: Rust ≥ 1.85, a C compiler (for tree-sitter grammars), and network access on first build (the build script downloads model weights automatically).

```bash
git clone https://github.com/civitas-io/prx.git
cd prx
cargo build --release    # downloads model (~35 MB), converts to float16, builds
```

First build takes 1-2 minutes (model download + compilation). Subsequent builds are fast. The model weights are baked into the binary via `include_bytes!` — no downloads at runtime. Set `PRX_MODELS_DIR` to point to pre-downloaded weights for offline/air-gapped builds.

```bash
cargo test               # run all tests
cargo clippy              # lint
```

See the [Contributing guide](https://civitas-io.github.io/prx/contributing/setup.html) for the full developer setup.

---

## Platform support

| Platform | Status |
|---|---|
| Linux x86_64 | Supported |
| Linux aarch64 | Supported |
| macOS Apple Silicon | Supported |
| Windows x86_64 | Supported |

Single static binary. No runtime dependencies. No network required after build.

---

## Current status

| | |
|---|---|
| Commands | 17 |
| Tests | 442 unit + 80 E2E + 8 MCP |
| Run parsers | 22 (cargo, pytest, go, jest, eslint, tsc, kubectl, terraform, docker, + 13 more) |
| Languages (parsing) | 15 tree-sitter grammars |
| Import graph | 10 language families, tree-sitter AST extraction |
| Symbol index | Definition lookup + reference counting |
| Indexing | Parallel via rayon — 11K files in 54s on 10 cores (7.6x speedup). Zero-copy mmap embeddings. |
| Embedded model | potion-retrieval-32M (Model2Vec, float16, PCA→256 dims) |
| Release binary | ~49 MB |
| CI | GitHub Actions: Linux x86_64 / aarch64, macOS arm64, Windows |

See the [Roadmap](https://civitas-io.github.io/prx/vision/roadmap.html) for what's planned next.

---

## Search quality

NDCG@10 measured on 200 labeled queries across 8 public repositories (5 languages, 3 size tiers). All repos pinned by commit SHA. Ground truth in `benchmarks/repos/`. Methodology in [docs/design/SEARCH-QUALITY.md](docs/design/SEARCH-QUALITY.md).

| Repo | Language | Files | NDCG@10 | Symbol | Semantic |
|---|---|---|---|---|---|
| Flask | Python | 259 | **0.710** | 0.805 | 0.662 |
| ripgrep | Rust | 239 | **0.493** | 0.810 | 0.356 |
| fastify | TypeScript | 417 | **0.432** | 0.822 | 0.321 |
| cargo | Rust | 2,815 | **0.379** | 0.705 | 0.285 |
| kafka | Java | 7,231 | **0.354** | 0.934 | 0.191 |
| terraform | Go | 5,323 | **0.287** | 0.238 | 0.319 |
| django | Python | 5,690 | **0.262** | 0.495 | 0.211 |
| vscode | TypeScript | 14,643 | **0.208** | 0.639 | 0.080 |

Symbol search is strong on most repos (avg 0.681), with terraform (0.238) and django (0.495) as the clear exceptions. Semantic search degrades at scale: the 32M embedded model works best on codebases under 3K files. For larger repos, code-specific model tiers are planned (see [Roadmap](https://civitas-io.github.io/prx/vision/roadmap.html)).

These are honest numbers on codebases we didn't write and don't tune for.
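For reference, NDCG@10 can be computed as in the sketch below. It uses the linear-gain variant (`rel / log2(rank + 1)`); whether prx uses linear or exponential gain is not stated here, so treat that choice, the function names, and the sample grades as assumptions.

```python
import math


def dcg(relevances: list[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked: list[float], judged: list[float], k: int = 10) -> float:
    """DCG of the returned order divided by the DCG of the ideal order.

    ranked: relevance grade of each returned result, in returned order.
    judged: every relevance grade labeled for the query.
    """
    ideal = dcg(sorted(judged, reverse=True), k)
    return dcg(ranked, k) / ideal if ideal > 0 else 0.0


# Two relevant files exist; the search returned them at ranks 2 and 3.
score = ndcg_at_k(ranked=[0, 1, 1], judged=[1, 1])
print(round(score, 3))
```

A score of 1.0 means every relevant file was returned in the best possible order; pushing relevant files down the list lowers it.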

---

## Contributing

See the [Contributing guide](https://civitas-io.github.io/prx/contributing/setup.html) for setup, workflow, and how to add commands, languages, and run parsers.

## License

Apache 2.0

---

Part of the [Civitas](https://github.com/civitas-io) ecosystem — open infrastructure for AI agent tooling.