mdx-rust
A Rust-native optimizer for LLM agents.
Point mdx-rust at an existing Rust agent (or crate), give it a behavioral policy, and let it safely improve prompts and simple single-file fallback behavior through structured experimentation - with compile-time safety gates at every step.
An early, safety-first optimizer for Rust LLM agents, with compile-time validation and Rust-aware analysis as its core differentiators.
Why mdx-rust?
- Native Rust understanding: Uses `syn` + `tree-sitter-rust` to actually read and reason about your agent's code instead of treating it as text.
- Safety first: Every accepted change is validated with `cargo check` and `clippy` inside isolated git worktrees or copies, then revalidated on the real tree with rollback on failure.
- Policy-driven: Your domain rules, constraints, and quality expectations (`policies.md`) guide diagnosis, candidate generation, and scoring, not just "maximize the metric."
- Agent-friendly CLI: Excellent human output by default + first-class `--json` mode so other coding agents can drive it.
- Observable: First-class trace records plus experiment ledgers, hook decisions, validation command records, score deltas, and provenance fields for accepted changes.
- Disciplined lifecycle: Built-in hook stages for pre-edit, pre-command, post-validation, and pre-accept decisions.
- Audit aware: Deterministic static audit checks surface risky agent surfaces like process execution, secret literals, unsafe code, and MCP/A2A-style integration boundaries.
- Single binary path: Install from crates.io with `cargo install mdx-rust`, or from this repo with `cargo install --git https://github.com/dhotherm/mdx-rust --package mdx-rust`.
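To make the audit idea above concrete, here is a minimal sketch of a deterministic, text-level check for two of the listed surfaces (process execution and unsafe code). It is an illustration only: mdx-rust itself is described as using `syn` and `tree-sitter-rust`, and the `Finding` type, the `audit_source` function, and the pattern list below are hypothetical names invented for this example.

```rust
/// Hypothetical finding type for this sketch; not part of the mdx-rust API.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Finding {
    /// 1-based line number where a pattern matched.
    pub line: usize,
    /// Short identifier of the rule that fired.
    pub rule: &'static str,
}

/// Illustrative (pattern, rule) pairs. A real audit would cover far more.
const RULES: &[(&str, &str)] = &[
    ("Command::new(", "process-execution"),
    ("unsafe {", "unsafe-code"),
    ("unsafe fn", "unsafe-code"),
];

/// Scans source text line by line and returns findings in line order.
/// The same input always yields the same output.
pub fn audit_source(source: &str) -> Vec<Finding> {
    let mut findings = Vec::new();
    for (index, text) in source.lines().enumerate() {
        // Skip whole-line comments so commented-out code is not reported.
        if text.trim_start().starts_with("//") {
            continue;
        }
        for &(pattern, rule) in RULES {
            let finding = Finding { line: index + 1, rule };
            // Report each rule at most once per line.
            if text.contains(pattern) && !findings.contains(&finding) {
                findings.push(finding);
            }
        }
    }
    findings
}
```

Substring matching like this produces false positives (for example, a pattern inside a string literal), which is one reason a syntax-aware analysis is the better foundation.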
Quick Start
Install the CLI from crates.io with `cargo install mdx-rust`.
Clone the repo and try the built-in example, which is the fastest way to see mdx-rust in action. The example rig-minimal-agent starts in a deliberately weak state, so an optimization run shows the improvement.
You should see the optimizer detect the weak echo behavior, plan a targeted fallback edit, validate it safely, and accept it only if the score improves.
Full flow for your own agent:
Artifacts (traces, diagnoses, candidates, reports, diffs) live under .mdx-rust/agents/<name>/.
How It Works
- Register: Detects entrypoint (Rig agent, async fn, or generic JSON contract), creates a thin harness if needed, and smoke-tests invocation.
- Spec: Analyzes your agent and produces starter `policies.md`, `eval_spec.json`, and `dataset.json` artifacts.
- Optimize: Runs the agent on a deterministic dataset with lifecycle traces → scores outputs → diagnoses failures → generates targeted candidate patches → validates safely → keeps only net-positive changes with regression guards.
- Gate: Every candidate moves through explicit lifecycle stages: pre-edit, isolated validation, patched evaluation, pre-accept hook, final validation, and rollback on failure.
- Repeat: Multiple iterations, budgeted candidate pools, holdout splits, and experiment ledgers.
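The "keeps only net-positive changes with regression guards" rule can be read as a small decision function per candidate. The sketch below assumes scores are floats where higher is better and that the regression guard compares holdout scores. Those assumptions, along with the names `Scores`, `Decision`, and `decide`, are hypothetical and not taken from the mdx-rust codebase.

```rust
/// Hypothetical score pair for one workspace state (higher is better).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Scores {
    pub train: f64,
    pub holdout: f64,
}

/// Hypothetical outcome of evaluating one candidate patch.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Accept,
    RejectValidation,
    RejectRegression,
    RejectNoGain,
}

/// Decides whether a candidate may be kept.
pub fn decide(baseline: Scores, patched: Scores, validation_passed: bool) -> Decision {
    // A candidate that fails validation is never scored further.
    if !validation_passed {
        return Decision::RejectValidation;
    }
    // Regression guard: the holdout score must not drop.
    if patched.holdout < baseline.holdout {
        return Decision::RejectRegression;
    }
    // Net-positive rule: the train score must strictly improve.
    if patched.train <= baseline.train {
        return Decision::RejectNoGain;
    }
    Decision::Accept
}
```

Checking validation first means a candidate that does not compile can never be accepted on score alone.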
Key Commands
Every command that matters supports --json for coding agents and automation.
Safety Model
The optimizer is built around a conservative lifecycle:
- Analyze source scope with Rust-aware finders.
- Run the agent on a train split and record trace diagnoses.
- Generate typed candidate strategies.
- Build a targeted edit only when a safe planner exists.
- Run built-in lifecycle hooks.
- Apply and validate in an isolated workspace.
- Score the patched workspace.
- Land only net-positive changes.
- Run final validation on the real tree.
- Roll back if final validation fails.
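The last five steps of this lifecycle (validate in isolation, score, land, final validation, rollback) can be sketched with an in-memory stand-in for the source tree. This is a simplified model under stated assumptions: the tree is a single `String`, validation and scoring are caller-supplied closures rather than `cargo check` or `clippy` runs, and `Outcome` and `try_land` are hypothetical names.

```rust
/// Hypothetical result of pushing one candidate through the gate.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Landed,
    RejectedInIsolation,
    RejectedByScore,
    RolledBack,
}

/// Attempts to land `patched` onto `real_tree`, restoring the original
/// content if final validation fails.
pub fn try_land<VI, VF, S>(
    real_tree: &mut String,
    patched: &str,
    validate_isolated: VI,
    validate_final: VF,
    score: S,
) -> Outcome
where
    VI: Fn(&str) -> bool,
    VF: Fn(&str) -> bool,
    S: Fn(&str) -> f64,
{
    // Apply and validate in an isolated copy; the real tree is untouched.
    let isolated = patched.to_string();
    if !validate_isolated(&isolated) {
        return Outcome::RejectedInIsolation;
    }
    // Land only net-positive changes.
    if score(&isolated) <= score(real_tree.as_str()) {
        return Outcome::RejectedByScore;
    }
    // Land the change, keeping a snapshot for rollback.
    let snapshot = std::mem::replace(real_tree, isolated);
    // Final validation on the real tree; roll back on failure.
    if !validate_final(real_tree.as_str()) {
        *real_tree = snapshot;
        return Outcome::RolledBack;
    }
    Outcome::Landed
}
```

Taking the snapshot at the moment of landing keeps the rollback path trivial: the original content is restored by a single assignment.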
Experiment records include dataset version/hash, scorer version, git SHAs, validation commands, score deltas, hook decisions, holdout score, and prompt variant ledger entries.
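As a rough picture of such a record, the sketch below models a subset of the listed fields and derives a stable dataset hash. The README does not say which hash algorithm or field names mdx-rust uses, so FNV-1a, the `ExperimentRecord` struct, and every field name here are assumptions made for illustration.

```rust
/// 64-bit FNV-1a, chosen here only because it is tiny and stable
/// across platforms and compiler versions (an assumption, not the
/// algorithm mdx-rust is documented to use).
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Hex-encodes the hash of the raw dataset text.
pub fn dataset_hash(dataset_json: &str) -> String {
    format!("{:016x}", fnv1a_64(dataset_json.as_bytes()))
}

/// Hypothetical record shape covering some of the fields listed above.
#[derive(Debug, Clone, PartialEq)]
pub struct ExperimentRecord {
    pub dataset_version: String,
    pub dataset_hash: String,
    pub scorer_version: String,
    pub base_git_sha: String,
    pub validation_commands: Vec<String>,
    pub score_before: f64,
    pub score_after: f64,
    pub holdout_score: Option<f64>,
}

impl ExperimentRecord {
    /// Score delta of the patched workspace relative to the baseline.
    pub fn score_delta(&self) -> f64 {
        self.score_after - self.score_before
    }
}
```

Storing the dataset hash next to the scorer version lets a later reader tell whether two score deltas were measured against the same inputs.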
The full acceptance contract is documented in SAFETY_INVARIANTS.md.
Current Scope
v1 is deliberately conservative:
- Accepted edits are single-file only.
- Current edit strategies cover prompts and common echo-style fallback behavior.
- `eval` can load and hash datasets, but scored standalone evaluation is still being built.
- Native Rust contracts currently execute through the process harness; richer in-process harnesses are future work.
- Published crates are available for `mdx-rust`, `mdx-rust-core`, and `mdx-rust-analysis`. See RELEASE.md.
Status
Early public beta (May 2026).
mdx-rust can already:
- Register Rig and generic agents
- Run them with tracing
- Perform deep Rust analysis (prompts, tools, entrypoints)
- Run deterministic diagnosis with structured candidates
- Safely propose, validate (`cargo check` + `clippy` in isolation), and accept improvements
- Execute typed strategies for prompts and common single-file fallback behavior
- Split evaluation data into train/holdout sets under `light`, `medium`, or `heavy` budgets
- Record prompt variant ledgers and lifecycle hook decisions
- Run deterministic static security audits
- Support dry-review mode (`--review`)
- Produce experiment reports and artifacts
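For the train/holdout item in the list above, a deterministic split can be as simple as sending every n-th example to the holdout set. The sketch below is one possible scheme, not the one mdx-rust is documented to use. The `split_train_holdout` function and its `holdout_every` parameter are hypothetical, and the `light`, `medium`, and `heavy` budgets are not modeled.

```rust
/// Splits `items` so that every `holdout_every`-th element (counting
/// from 1) goes to the holdout set and the rest go to the train set.
/// A value of 0 disables the holdout set entirely.
pub fn split_train_holdout<T: Clone>(items: &[T], holdout_every: usize) -> (Vec<T>, Vec<T>) {
    let mut train = Vec::new();
    let mut holdout = Vec::new();
    for (index, item) in items.iter().enumerate() {
        if holdout_every > 0 && (index + 1) % holdout_every == 0 {
            holdout.push(item.clone());
        } else {
            train.push(item.clone());
        }
    }
    (train, holdout)
}
```

Because the split depends only on position, the same dataset always produces the same two sets, which keeps score comparisons between runs meaningful.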
The built-in example demonstrates a real before/after optimization win.
See ROADMAP.md for current scope and upcoming work.
Project Health
- Published on crates.io as `mdx-rust`.
- CI runs format, clippy, test, release build, crate packaging, and install smoke checks.
- Security and acceptance invariants are documented in SAFETY_INVARIANTS.md.
- Contributions are welcome through focused issues and pull requests.
Contributing
We welcome contributions, especially around:
- Better Rust code analysis (tree-sitter queries, syn visitors)
- New candidate generation strategies
- Support for additional agent frameworks
- Evaluation harnesses and scoring functions
See CONTRIBUTING.md for development workflow and review expectations.
License
MIT
The machine that improves the machines — in Rust.