# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [0.5.9] - 2026-05-31
### Changed
- **crates.io ready** — `cargo publish --dry-run` passes. Build script
downloads models to `OUT_DIR` instead of source tree. Published crate
is 171 KB compressed (models downloaded at build time via `build.rs`).
- **Cargo.toml exclude** — models/, benchmarks/, book/, docs/, .prx/,
.github/ excluded from published crate.
### To publish
```bash
cargo login <your-token>
cargo publish
```
## [0.5.8] - 2026-05-30
### Added
- **mdBook documentation site** — 33 pages across 7 sections (User Guide,
Commands, Architecture, Performance, Reference, Contributing, Vision).
Deployed to `civitas-io.github.io/prx/` via GitHub Pages.
- **`deploy-docs.yml`** — GitHub Actions workflow for automatic docs
deployment on push to main.
### Changed
- **Internal docs reorganized** — sprint-specific and internal design docs
moved to `docs/internal/`. CLAUDE.md removed (redundant with AGENTS.md).
- **Makefile simplified** — removed stale `setup`, `models`, `coverage`
targets. Added `docs` target for mdBook build.
## [0.5.7] - 2026-05-29
### Added
- **Public benchmark suite** — 200 labeled queries across 8 public repos
(flask, ripgrep, fastify, cargo, kafka, django, terraform, vscode).
6 languages, 3 size tiers. Measured NDCG@10 with ground-truth relevance.
- **`benchmark.yml` CI workflow** — runs NDCG benchmark on release tags.
Clones all 8 repos at pinned SHAs, indexes, benchmarks, fails on
regression > 0.05. Results uploaded as artifacts.
- **Versioned baseline results** — `benchmarks/results/v0.5.7-baseline.json`
with per-repo scores, category breakdowns, and miss counts.
- **v0.6.0 Model Tiering milestone** — design doc for code-specific
Model2Vec models with download-on-demand. Based on benchmark findings
showing semantic search degradation at scale.
### Benchmark Results
| Small (<3K files) | flask, ripgrep, fastify | 0.545 |
| Medium (3-10K files) | cargo, kafka, django | 0.332 |
| Large (10K+ files) | terraform, vscode | 0.248 |
## [0.5.6] - 2026-05-29
### Changed
- **Memory-mapped embeddings** — `embeddings.bin` loaded via `memmap2` with
zero-copy `bytemuck::cast_slice`. OS page cache keeps 54 MB of embeddings
warm across queries. Falls back to owned allocation if mmap fails.
- **`bench-ndcg` load-once** — index loaded once, queries run N times.
50-query benchmark: 12.76s → 0.23s (55x speedup).
- **`bench-ndcg --plain`** — human-readable table output with NDCG scores,
per-category breakdown, and miss list.
- **Test helpers module** — shared `ag()`, `test_dir()`, `parse_json()` in
`tests/helpers/mod.rs`. 5 new E2E tests covering semantic search with
index, alpha override, run modes, and context budgets.
### Stats
- 442 unit + 80 E2E + 8 MCP = 530 tests (was 525).
## [0.5.5] - 2026-05-29
### Changed
- **Parallel indexing with rayon** — all 5 stages of `prx index` now run
in parallel. File read/hash/chunk, BM25 enrichment, embedding computation,
import graph, and symbol index all use `par_iter` or `rayon::join`.
BLAS thread limits set at process start to prevent oversubscription.
### Performance
- **7.6x speedup** on large codebases (11K files, 55K chunks):
410s → 54s on 10-core Mac (944% CPU utilization).
- **3x speedup** on small codebases (258 files, 910 chunks):
1.2s → 0.4s.
- Embedding computation (94% of indexing time) parallelized with
shared `&model` reference — no Arc, no Mutex, no cloning.
## [0.5.4] - 2026-05-29
### Changed
- **`define_regex!` macro** — reduces 3-line `LazyLock<Regex>` statics to
1-line macro calls across all 21 runner parsers.
- **`ParsedResult::new()` constructor** — replaces 7-field struct literals
in 8 parsers where `warnings` and `tail` are empty defaults.
- **Shared `workspace` module** — extracted `relative_path()` and
`is_test_file()` from duplicated copies in `context.rs` and `impact.rs`
into `src/workspace.rs`.
### Stats
- Net -159 lines across 27 files. No behavior changes. All 442 tests pass.
## [0.5.3] - 2026-05-28
### Added
- **`prx bench-ndcg`** — Rust-native NDCG benchmark runner. Calls
search::run() directly without spawning external processes. Outputs
structured JSON with per-query scores, category breakdown, and misses.
Suitable for CI regression gate.
## [0.5.2] - 2026-05-27
### Changed
- **Self-contained build** — `cargo build` now works without `make models`
or Python. The build script (`build.rs`) downloads model weights from
HuggingFace, verifies SHA-256 hashes, and converts F32→F16 in pure Rust.
Set `PRX_MODELS_DIR` for offline/air-gapped builds.
- **Migrated bincode → postcard** — replaced unmaintained `bincode`
(RUSTSEC-2025-0141) with `postcard` for all index serialization.
Existing `.prx/index/` directories will auto-rebuild on version mismatch.
## [0.5.1] - 2026-05-27
### Fixed
- **Renamed model file** — `potion-code-16M.safetensors` renamed to
`potion-retrieval-32M.safetensors` to match the actual model (upgraded
in v0.3.0 but file never renamed). Updated `include_bytes!` references,
download script, and all documentation.
### NDCG (v0.5.0 tree-sitter imports)
- External NDCG@10: 0.494 (stable, no regression from v0.4.0)
- prx self-benchmark: NDCG@10 = 0.673 (stable, was 0.681 in v0.4.0)
- 9 complete misses unchanged (all semantic, unrelated to imports)
## [0.5.0] - 2026-05-27
Tree-sitter imports & auto-JSON release. Import extraction rewritten from
regex to AST queries, extended to all languages with import concepts.
### Added
- **`prx run --auto-json`** — auto-injects `--json`/`-o json` flags for
tools that support structured output (kubectl, terraform, npm, eslint,
mypy). Existing JSON detection in parsers handles the output side.
- **Import extraction for bash, CSS, HTML** — `source`/`.` commands,
`@import` rules, `<script src>`/`<link href>` attributes.
- **Tree-sitter import forms** — multi-path `use` (Rust), multiline
imports (Python), re-exports and dynamic `import()` (JS/TS), type
imports (TS).
### Changed
- **Import extraction rewritten from regex to tree-sitter** — all 10
language families now use AST queries instead of line-by-line regex
matching. Captures forms that regex cannot: multi-line imports, aliased
imports, re-exports, dynamic `import()` calls.
### Stats
| Tests | 421 | 435 |
| Import languages | 7 (regex) | 10 (tree-sitter) |
| Import forms captured | basic only | multi-line, aliased, re-export, dynamic |
## [0.4.5] - 2026-05-27
### Fixed
- **Documentation consistency** — softened "tree-sitter for all structural
awareness" claim to reflect that import extraction uses regex. Added build
prerequisites (Python 3, network) to README. Added context/impact to
architecture docs. Added import graph proximity stage to search pipeline docs.
## [0.4.4] - 2026-05-27
### Changed
- **Incremental embeddings** — `prx index` now caches per-chunk content
hashes alongside embeddings. On re-index, only chunks whose content
changed are re-embedded; unchanged chunks reuse cached embeddings.
For a 1-file change in an 11k-file repo, this reduces embedding time
from ~300s (full re-embed) to seconds.
## [0.4.3] - 2026-05-27
### Fixed
- **Import resolution no longer bails on common filenames** — previously,
`resolve_import` gave up (returned no edge) when a name matched >3 files.
Common names like `index.ts`, `utils.py`, `mod.rs` triggered this in large
repos, making the import graph sparser as repos grew. Now uses directory
proximity to pick the closest 1-2 candidates instead of giving up. Threshold
raised from 3 to 5, with proximity-based disambiguation above that.
## [0.4.2] - 2026-05-27
### Fixed
- **Embedding model failure now warns instead of silently degrading** —
when `load_model()` fails during indexing, `prx index` output now includes
a `warnings` field explaining that search will use BM25 only. Previously,
`embeddings_dim: 0` was written silently and search quality halved with
no signal to the user.
## [0.4.1] - 2026-05-27
### Fixed
- **`is_valid` now detects new files** — previously, `prx index` reported
"up_to_date" when new files were added after indexing, because `is_valid()`
only checked that previously-indexed files hadn't changed. Now walks the
tree to detect both new and deleted files.
## [0.4.0] - 2026-05-26
Project Intelligence & Run Parsers release. Symbol index for search quality,
two new commands (context, impact), 13 new run parsers, security CI, and
JSON output detection.
### Added
- **Symbol index** — maps symbol names to definition locations with reference
counts at index time. For symbol queries, boosts definition chunks directly
instead of relying on BM25. Symbol NDCG@10: 0.263 → 0.619 (+135%). 4
previously-complete-miss symbol queries recovered.
- **`prx context`** — assembles a context package for a module in one call:
stats, documentation, entrypoints ranked by reference count, file skeletons,
and 1-hop import graph edges. Replaces outline + find + cat README + grep.
- **`prx impact`** — reverse dependency analysis. Walks the import graph
backwards to find what depends on a file. Supports `--symbol` narrowing,
`--hops` control, fan-in protection, test file filtering.
- **13 new run parsers** — mypy, dotnet, git-log, docker-build, npm-ls,
terraform, kubectl, kubectl-logs, mvn, gradle, pytest-cov, go-cover,
jest-coverage. Total: 22 parsers.
- **JSON output detection** — kubectl, terraform, npm-ls, and eslint parsers
auto-detect JSON responses when user passes `--json`/`-o json` and parse
structurally instead of regex-matching text.
- **Security CI** — `cargo-deny` runs on every push/PR checking advisories
(RustSec), license compliance, source origin, and dependency bans.
- **`deny.toml`** — security policy configuration.
- **`benchmarks/repos.json`** — 8 public repos pinned by SHA for NDCG
regression testing (flask, ripgrep, fastify, cargo, django, kafka,
terraform, vscode).
- **`docs/design/RUN-PARSERS.md`** — design doc for the parser system.
- **`is_symbol_query`** now detects snake_case identifiers (e.g. `feature_impact`).
- Symbol queries routed to hybrid search instead of literal search.
### Changed
- **NDCG measurement corrected** — previous scores were inflated by a
deduplication bug. All docs updated with corrected numbers.
- **`skills/agents.md`** rewritten with tool replacement table, new commands,
recommended workflow.
### Stats
| Commands | 14 | 16 |
| Tests | 372 | 413 unit + 75 E2E + 8 MCP |
| Run parsers | 9 | 22 |
| Index files | 5 | 6 (+symbols.bin) |
| NDCG@10 (self) | 0.639 | 0.681 |
| NDCG@10 (external) | 0.451 | 0.494 |
| Symbol NDCG@10 | 0.263 | 0.619 |
## [0.3.0] - 2026-05-25
Reliability & Search Quality release. NDCG measurement infrastructure, incremental
indexing, persistent dense index, and search ranking improvements.
### Added
- **MCP server E2E tests** — 8 tests covering JSON-RPC initialize, tools/list,
tools/call for all 6 MCP tools, and invalid tool error handling.
- **Incremental indexing** — `prx index` skips unchanged files by comparing content
hashes from the previous index. Reports `files_changed`/`files_unchanged` in output.
Walker now excludes `.prx/` directory.
- **Real criterion benchmarks** — `benches/search.rs` (BM25 build/query, literal search,
persistent index build, incremental no-op) and `benches/chunking.rs` (Rust/Python/plaintext
at 10/50/100/500 functions).
- **NDCG@10 measurement** — labeled relevance datasets for prx (50 queries) and an external
11k-file Python/TypeScript codebase (49 queries). Automated NDCG harness in `tests/ndcg.rs`.
Results tracked per-release in README.
- **Structural search validation** — warns when a pattern compiles but matches 0 files,
or when a pattern fails to compile for all languages. Warning surfaced in search output.
- **Persistent dense index** — chunk embeddings computed at index time and stored as
`embeddings.bin`. At query time, semantic retrieval runs independently of BM25 before
RRF fusion. Unlocks semantic recall for queries where BM25 fails.
- **Chunk header enrichment** — BM25 enrichment prepends `[lang] file_path stem_tokens`
to each chunk. Split identifiers (camelCase/snake_case) indexed as separate terms.
- **Synonym expansion** — 18-pair static dictionary (auth→authentication, db→database,
k8s→kubernetes, etc.) applied to BM25 queries for natural language searches.
- **Chunk overlap** — 200-byte overlap between adjacent chunks, snapped to line boundaries.
- **Configurable reranker** — `RerankConfig` struct enables selective stage toggling for
ablation testing.
- `docs/design/SEARCH-QUALITY.md` — full NDCG analysis, failure mode diagnosis, improvement
roadmap, and symbol graph feasibility assessment.
- `benchmarks/ndcg_dataset.json` — 50 labeled queries for prx codebase.
- `benchmarks/ndcg_dataset_external.json` — 49 labeled queries for external codebase.
### Changed
- **Symbol-query ranking** — definition boost increased from 3x to 12x for symbol queries
(single PascalCase/snake_case tokens). Import-heavy chunks penalized at 0.2x.
- **Alpha tuning** — symbol queries now use alpha=0.1 (near-pure BM25, was 0.3).
Natural language queries use alpha=0.6 (was 0.5). Queries containing synonyms use 0.5.
- **Reranker weights** — definition boost 3→4 (NL), stem match 1.0→1.5, file coherence
0.2→0.15, import penalty 0.3→0.2.
- **Definition detection** — improved pattern matching for Python/TypeScript class and
function definitions (requires space or paren after keyword).
- **Model loading** — extracted `load_model()` to `index/dense.rs` as a public function,
shared between index-time embedding and query-time fallback.
- `is_symbol_query()` made public for use by ranking pipeline.
### Stats
| Tests | 353 (304 unit + 49 E2E) | 372 (315 unit + 49 E2E + 8 MCP) |
| NDCG@10 (self) | — | 0.639 |
| NDCG@10 (external) | — | 0.451 |
| Benchmarks | 2 stubs | 8 real (search + chunking) |
| Index files | 4 (meta, chunks, bm25, imports) | 5 (+embeddings.bin) |
Note: v0.2.0 NDCG scores omitted — measured with a buggy script that did not
deduplicate files across chunks, producing inflated results. Corrected
methodology applied from v0.3.0 onward.
## [0.2.0] - 2026-05-19
Context Intelligence release. Conditional reads, read modes, and import graph proximity boost.
### Added
- **`--if-changed HASH`** flag for `prx read` — stateless conditional read. Agent passes
the `meta.hash` from a previous response; if the file is unchanged, prx returns a
~50-byte cached stub instead of full content. 99% token savings on re-reads.
- **`--mode aggressive`** — strips comments using tree-sitter (14 grammars) and collapses
blank lines. Preserves strings containing comment-like syntax. 1-19% savings depending
on comment density.
- **`--mode diff`** — returns only lines changed vs git HEAD. Falls back to full content
for untracked files. 98-99% savings on files with local modifications.
- **`--mode entropy`** — pattern-based repetitive line filter. Normalizes digits, allows
3 occurrences of each pattern, suppresses the rest. Up to 86% savings on generated code.
- **Import graph proximity boost** for search — extracts `import`/`use`/`require` statements
from 7 languages (Rust, Python, JS/TS, Go, Java, C/C++, Ruby) via regex. Files within
2 hops of top-ranked results get a 0.25x additive boost with hop decay. Graph persisted
to `.prx/index/imports.bin`.
- `docs/USAGE.md` — full command reference with real-world benchmarks.
- `skills/agents.md` — agent-facing skill guide: what prx is, how to use it, how to
install as MCP server / CLI skill / agent definition.
- Token savings SVG chart in README with per-feature measurements.
### Changed
- Search reranking pipeline now has 6 stages (was 5): added import graph proximity
between stem matching and noise penalties.
- `prx index --rebuild` now builds and persists the import graph alongside chunks and BM25.
- Telemetry baseline estimation improved for search, find, exists, and diff commands.
- E2E tests now route `PRX_STATS_FILE` and `PRX_ERRORS_FILE` to `/dev/null` (no more
test pollution in real telemetry).
- GitHub Actions: `softprops/action-gh-release` v2 → v3 (node24).
### Fixed
- `--mode diff` now correctly runs `git` from the file's parent directory, fixing
incorrect results when prx is called with absolute paths from a different CWD.
### Stats
| Tests | 304 (260 unit + 44 E2E) | 353 (304 unit + 49 E2E) |
| Modules | 29 | 32 (+imports.rs, graph.rs, proximity.rs) |
| LOC (src/) | ~8,200 | ~9,600 |
| Reranking stages | 5 | 6 (+ proximity) |
| Import graph languages | — | 7 |
## [0.1.0] - 2026-05-19
Initial release. 14 commands, 304 tests.
### Reliability
- Graceful fallback: on internal errors (panics, parse errors), prx silently falls back to grep/cat/find and returns results with `fallback: true` in the envelope. User errors (file not found) are returned normally.
- Error logging: every fallback logs to `~/.prx/errors.jsonl` for debugging
- Pre-commit hook: mirrors CI checks (fmt + clippy + tests)
### Telemetry & Benchmarks
- Real-world telemetry: every command logs `actual_bytes` vs `baseline_bytes` to `~/.prx/stats.jsonl`
- `prx stats --compare`: per-command savings breakdown from real usage
- `prx bench`: synthetic benchmark runner comparing prx vs grep+cat side-by-side
### Commands
- `prx search` — hybrid search: literal + semantic (Model2Vec) + structural (ast-grep). RRF fusion with adaptive alpha. 5-stage reranking pipeline.
- `prx read` — structured file reading with --skeleton, --snap, --outline, --hash, --budget.
- `prx find` — codebase mapping with tree+flat output, --pattern, --depth, --changed-since, --related-to.
- `prx edit` — find-replace with dry-run default, --apply, --in-function scoping, syntax validation.
- `prx diff` — git diff with semantic summaries, function attribution, --stat-only.
- `prx run` — structured command runner with 9 tool parsers (cargo test/build/clippy, pytest, go test, jest/vitest, tsc, eslint). 95-99% token savings on test output.
- `prx index` — persistent search index with validation, --rebuild, --stats, --watch.
- `prx outline` — symbol table for files and directories with --kind filter.
- `prx exists` — bloom filter O(1) existence check.
- `prx batch` — JSONL batch execution from stdin.
- `prx stats` — token savings dashboard.
- `prx init` — auto-detect agent frameworks, generate MCP configs and AGENTS.md snippets.
- `prx mcp` — MCP server over stdio exposing 6 tools.
### Infrastructure
- 14 tree-sitter language grammars (Rust, Python, JavaScript, TypeScript, Go, Java, C, C++, Ruby, Bash, JSON, HTML, CSS, TSX)
- Model2Vec potion-code-16M embedded in binary (float16, 31MB)
- Real vocabulary loading (61,826 tokens via HuggingFace tokenizer)
- cl100k_base tokenizer for --budget enforcement
- Persistent index serialization to .prx/index/
- Content hashing (xxh3) for change detection
- BM25 with compound identifier tokenization (camelCase/snake_case splitting)
- Cross-platform: Linux x86_64 + aarch64, macOS arm64 + Intel, Windows x86_64
- GitHub Actions CI (lint, test, build) + release pipeline (5 targets)
- Apache 2.0 license
### Documentation
- 21 documentation files (~5,000 lines)
- AGENTS.md with Karpathy coding guidelines
- PRD, roadmap, architecture, CLI spec, output schema, benchmarks plan, implementation plan, testing plan, crate reference, competitive landscape, platform audit, contributing guide
[0.3.0]: https://github.com/civitas-io/prx/releases/tag/v0.3.0
[0.2.0]: https://github.com/civitas-io/prx/releases/tag/v0.2.0
[0.1.0]: https://github.com/civitas-io/prx/releases/tag/v0.1.0