mdx-rust
A Rust-native optimizer for LLM agents.
Point mdx-rust at an existing Rust agent (or crate), give it a behavioral policy, and let it safely improve prompts and simple single-file fallback behavior through structured experimentation, with compile-time safety gates at every step.
An early, safety-first optimizer for Rust LLM agents, with compile-time validation and Rust-aware analysis as its core differentiators.
Why mdx-rust?
- Native Rust understanding: Uses syn + tree-sitter-rust to actually read and reason about your agent’s code instead of treating it as text.
- Safety first: Every accepted change is validated with cargo check and clippy inside isolated git worktrees or copies, then revalidated on the real tree with rollback on failure.
- Policy-driven: Your domain rules, constraints, and quality expectations (policies.md) guide diagnosis, candidate generation, and scoring, not just “maximize the metric.”
- Agent-friendly CLI: Excellent human output by default + first-class --json mode so other coding agents can drive it.
- Observable: First-class trace records plus experiment ledgers, hook decisions, validation command records, score deltas, and provenance fields for accepted changes.
- Disciplined lifecycle: Built-in hook stages for pre-edit, pre-command, post-validation, and pre-accept decisions.
- Audit aware: Deterministic static audit checks surface risky agent surfaces like process execution, secret literals, unsafe code, and MCP/A2A-style integration boundaries.
- Single binary path: Install from this repo today with cargo install --git https://github.com/dhotherm/mdx-rust --package mdx-rust; crates.io publication is prepared but not yet published.
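To make "deterministic static audit" concrete, here is a minimal sketch of the idea. It is deliberately simplified: mdx-rust analyzes code with syn and tree-sitter-rust, while this illustration falls back to plain substring matching so it needs nothing beyond the standard library. The `Risk`, `Finding`, `RULES`, and `audit` names and the rule patterns are hypothetical and not taken from the mdx-rust codebase.

```rust
/// Categories of risky surface, loosely mirroring the audit bullet above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Risk {
    ProcessExecution,
    UnsafeCode,
    SecretLiteral,
}

/// One audit hit: 1-based line number plus the category that matched.
#[derive(Debug, PartialEq, Eq)]
pub struct Finding {
    pub line: usize,
    pub risk: Risk,
}

/// Hypothetical rule table: (substring to look for, category).
/// The "sk-" prefix is only an example of a secret-looking literal.
const RULES: &[(&str, Risk)] = &[
    ("Command::new", Risk::ProcessExecution),
    ("unsafe ", Risk::UnsafeCode),
    ("sk-", Risk::SecretLiteral),
];

/// Scan source text line by line, skipping `//` comment lines.
/// The same input always yields the same findings in the same order.
pub fn audit(source: &str) -> Vec<Finding> {
    let mut findings = Vec::new();
    for (idx, text) in source.lines().enumerate() {
        if text.trim_start().starts_with("//") {
            continue;
        }
        for (needle, risk) in RULES {
            if text.contains(*needle) {
                findings.push(Finding { line: idx + 1, risk: *risk });
            }
        }
    }
    findings
}
```

Because the check uses no model calls and no randomness, running it twice on the same source gives identical findings, which is the property "deterministic" refers to here.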
Quick Start
Clone the repo and try the built-in example (the fastest way to see mdx-rust in action):
# The example rig-minimal-agent starts in a deliberately weak state
# See the improvement
You should see the optimizer detect the weak echo behavior, plan a targeted fallback edit, validate it safely, and accept it only if the score improves.
Full flow for your own agent:
Artifacts (traces, diagnoses, candidates, reports, diffs) live under .mdx-rust/agents/<name>/.
How It Works
- Register: Detects entrypoint (Rig agent, async fn, or generic JSON contract), creates a thin harness if needed, and smoke-tests invocation.
- Spec: Analyzes your agent and produces starter policies.md, eval_spec.json, and dataset.json artifacts.
- Optimize: Runs the agent on a deterministic dataset with lifecycle traces → scores outputs → diagnoses failures → generates targeted candidate patches → validates safely → keeps only net-positive changes with regression guards.
- Gate: Every candidate moves through explicit lifecycle stages: pre-edit, isolated validation, patched evaluation, pre-accept hook, final validation, and rollback on failure.
- Repeat: Multiple iterations, budgeted candidate pools, holdout splits, and experiment ledgers.
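The "keeps only net-positive changes with regression guards" rule from the Optimize step can be sketched roughly as follows. This is an illustration under assumptions, not mdx-rust's actual scorer: the `Candidate` struct, the `max_drop` threshold, and the use of a simple mean are all hypothetical.

```rust
/// A candidate patch reduced to what the acceptance decision needs.
pub struct Candidate {
    pub name: &'static str,
    /// Did isolated validation pass for this candidate?
    pub validated: bool,
    /// Per-sample scores after applying the patch, aligned with the baseline.
    pub scores: Vec<f64>,
}

fn mean(xs: &[f64]) -> f64 {
    if xs.is_empty() {
        0.0
    } else {
        xs.iter().sum::<f64>() / xs.len() as f64
    }
}

/// Net-positive: validated, strictly better on average, and no single
/// sample drops by more than `max_drop` (the regression guard).
pub fn is_net_positive(baseline: &[f64], cand: &Candidate, max_drop: f64) -> bool {
    if !cand.validated || cand.scores.len() != baseline.len() {
        return false;
    }
    let improves = mean(&cand.scores) > mean(baseline);
    let no_regression = baseline
        .iter()
        .zip(&cand.scores)
        .all(|(before, after)| before - after <= max_drop);
    improves && no_regression
}

/// Pick the best-scoring candidate among those that pass the guard, if any.
pub fn select<'a>(
    baseline: &[f64],
    pool: &'a [Candidate],
    max_drop: f64,
) -> Option<&'a Candidate> {
    pool.iter()
        .filter(|c| is_net_positive(baseline, c, max_drop))
        .max_by(|a, b| mean(&a.scores).total_cmp(&mean(&b.scores)))
}
```

With a baseline of `[0.2, 0.8]` and `max_drop = 0.1`, a candidate scoring `[1.0, 0.3]` is rejected even though its mean is higher, because the second sample regresses by 0.5.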
Key Commands
Every command that matters supports --json for coding agents and automation.
Safety Model
The optimizer is built around a conservative lifecycle:
- Analyze source scope with Rust-aware finders.
- Run the agent on a train split and record trace diagnoses.
- Generate typed candidate strategies.
- Build a targeted edit only when a safe planner exists.
- Run built-in lifecycle hooks.
- Apply and validate in an isolated workspace.
- Score the patched workspace.
- Land only net-positive changes.
- Run final validation on the real tree.
- Roll back if final validation fails.
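The lifecycle above can be read as a small state machine: a candidate is rejected at the first stage that fails, and the real tree is touched only after every earlier stage has passed. The sketch below models the tree as a `String` and each check as a function pointer. The `Gate`, `Stage`, and `Outcome` names are hypothetical, and in the real tool the validation checks would be cargo check and clippy runs rather than in-memory predicates.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    PreEdit,
    IsolatedValidation,
    PatchedEvaluation,
    PreAccept,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// Landed and passed final validation.
    Accepted,
    /// Stopped at the given stage; the real tree was never modified.
    Rejected(Stage),
    /// Landed, failed final validation, and was restored.
    RolledBack,
}

/// Stand-ins for hooks, validation commands, and the scorer.
pub struct Gate {
    pub pre_edit_hook: fn(&str) -> bool,
    pub validate_isolated: fn(&str) -> bool,
    pub score: fn(&str) -> f64,
    pub pre_accept_hook: fn(&str) -> bool,
    pub validate_real_tree: fn(&str) -> bool,
}

impl Gate {
    pub fn run(&self, tree: &mut String, patched: &str) -> Outcome {
        if !(self.pre_edit_hook)(patched) {
            return Outcome::Rejected(Stage::PreEdit);
        }
        if !(self.validate_isolated)(patched) {
            return Outcome::Rejected(Stage::IsolatedValidation);
        }
        // Land only net-positive changes: the patch must score strictly higher.
        if (self.score)(patched) <= (self.score)(tree.as_str()) {
            return Outcome::Rejected(Stage::PatchedEvaluation);
        }
        if !(self.pre_accept_hook)(patched) {
            return Outcome::Rejected(Stage::PreAccept);
        }
        // Apply to the real tree, keeping the old contents for rollback.
        let backup = std::mem::replace(tree, patched.to_string());
        if (self.validate_real_tree)(tree.as_str()) {
            Outcome::Accepted
        } else {
            *tree = backup;
            Outcome::RolledBack
        }
    }
}
```

The ordering is the point of the design: the cheap and isolated checks run first, so a failure on the real tree should be rare, and when it does happen the previous contents are restored before `run` returns.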
Experiment records include dataset version/hash, scorer version, git SHAs, validation commands, score deltas, hook decisions, holdout score, and prompt variant ledger entries.
The full acceptance contract is documented in SAFETY_INVARIANTS.md.
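For a sense of what an experiment record might carry, here is a hedged sketch covering a subset of the fields listed above. The struct layout and field names are hypothetical and do not reflect mdx-rust's on-disk format. The source does not say which hash is used for datasets, so FNV-1a (64-bit) is used here purely because it is short and stable across platforms.

```rust
/// 64-bit FNV-1a over raw bytes (an assumed choice of hash, see above).
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Fixed-width hex form, convenient for storing in a ledger.
pub fn dataset_hash_hex(dataset_bytes: &[u8]) -> String {
    format!("{:016x}", fnv1a_64(dataset_bytes))
}

/// Hypothetical record shape; field names are illustrative only.
#[derive(Debug, Clone)]
pub struct ExperimentRecord {
    pub dataset_version: String,
    pub dataset_hash: String,
    pub scorer_version: String,
    pub git_sha_before: String,
    pub git_sha_after: String,
    pub validation_commands: Vec<String>,
    pub score_before: f64,
    pub score_after: f64,
    pub holdout_score: Option<f64>,
}

impl ExperimentRecord {
    /// Positive means the accepted change improved the train score.
    pub fn score_delta(&self) -> f64 {
        self.score_after - self.score_before
    }
}
```

Storing the dataset hash next to the score delta lets a later reader confirm that two runs were scored against byte-identical data before comparing their numbers.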
Current Scope
v1 is deliberately conservative:
- Accepted edits are single-file only.
- Current edit strategies cover prompts and common echo-style fallback behavior.
- eval can load and hash datasets, but scored standalone evaluation is still being built.
- Native Rust contracts currently execute through the process harness; richer in-process harnesses are future work.
- crates.io publication requires publishing the workspace crates in order, so install from GitHub for now. See RELEASE.md.
Status
Active dogfood / private beta (May 2026).
mdx-rust can already:
- Register Rig and generic agents
- Run them with tracing
- Perform deep Rust analysis (prompts, tools, entrypoints)
- Run deterministic diagnosis with structured candidates
- Safely propose, validate (cargo check + clippy in isolation), and accept improvements
- Execute typed strategies for prompts and common single-file fallback behavior
- Split evaluation data into train/holdout sets under light, medium, or heavy budgets
- Record prompt variant ledgers and lifecycle hook decisions
- Run deterministic static security audits
- Support dry-review mode (--review)
- Produce experiment reports and artifacts
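As an illustration of the train/holdout idea in the list above, a deterministic split can be as simple as routing every n-th sample to the holdout set. The `split` function and its `holdout_every` parameter are hypothetical. How mdx-rust maps the light, medium, and heavy budgets onto split sizes is not described here, so the sketch takes the ratio as a plain argument.

```rust
/// Deterministically split samples into (train, holdout).
/// Every `holdout_every`-th sample (1-based) goes to holdout;
/// a value of 0 disables the holdout set entirely.
pub fn split<T: Clone>(samples: &[T], holdout_every: usize) -> (Vec<T>, Vec<T>) {
    let mut train = Vec::new();
    let mut holdout = Vec::new();
    for (i, sample) in samples.iter().enumerate() {
        if holdout_every > 0 && (i + 1) % holdout_every == 0 {
            holdout.push(sample.clone());
        } else {
            train.push(sample.clone());
        }
    }
    (train, holdout)
}
```

Because the assignment depends only on position, the same dataset always produces the same split, so a holdout score from one run can be compared with the next.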
The built-in example demonstrates a real before/after optimization win.
See PROGRESS.md for the detailed build log.
Contributing
We welcome contributions, especially around:
- Better Rust code analysis (tree-sitter queries, syn visitors)
- New candidate generation strategies
- Support for additional agent frameworks
- Evaluation harnesses and scoring functions
See AGENTS.md for guidance on working in this codebase (especially important since this tool is designed to be used by agents).
License
MIT
The machine that improves the machines — in Rust.