mdx-rust

A Rust-native optimizer for LLM agents.

Point mdx-rust at an existing Rust agent (or crate), give it a behavioral policy, and let it safely improve prompts and simple single-file fallback behavior through structured experimentation, with compile-time safety gates at every step.

An early, safety-first optimizer for Rust LLM agents, with compile-time validation and Rust-aware analysis as its core differentiators.

Why mdx-rust?

Native Rust understanding: Uses syn and tree-sitter-rust to parse and reason about your agent’s code instead of treating it as text.
Safety first: Every accepted change is validated with cargo check and clippy inside isolated git worktrees or copies, then revalidated on the real tree with rollback on failure.
Policy-driven: Your domain rules, constraints, and quality expectations (policies.md) guide diagnosis, candidate generation, and scoring, not just “maximize the metric.”
Agent-friendly CLI: Readable human output by default, plus a first-class --json mode so other coding agents can drive it.
Observable: First-class trace records plus experiment ledgers, hook decisions, validation command records, score deltas, and provenance fields for accepted changes.
Disciplined lifecycle: Built-in hook stages for pre-edit, pre-command, post-validation, and pre-accept decisions.
Audit aware: Deterministic static audit checks flag risky agent surfaces such as process execution, secret literals, unsafe code, and MCP/A2A-style integration boundaries.
Single binary path: Install from crates.io with cargo install mdx-rust, or from this repo with cargo install --git https://github.com/dhotherm/mdx-rust --package mdx-rust.
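
To make the "audit aware" point concrete, here is a minimal sketch of what a deterministic static check could report. It is not the mdx-rust implementation: the Risk, Finding, and audit_source names are hypothetical, and the scan matches substrings line by line only to stay dependency-free, whereas the tool itself is described as working on parsed Rust.

```rust
// Hypothetical finding categories, loosely mirroring the surfaces named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Risk {
    ProcessExecution,
    UnsafeCode,
    SecretLiteral,
}

#[derive(Debug, PartialEq, Eq)]
struct Finding {
    line: usize, // 1-based line number in the scanned source
    risk: Risk,
}

// Deliberately naive line scan: a real audit would inspect the syntax tree
// (for example with syn) rather than match substrings.
fn audit_source(source: &str) -> Vec<Finding> {
    let mut findings = Vec::new();
    for (idx, text) in source.lines().enumerate() {
        let line = idx + 1;
        if text.contains("Command::new") {
            findings.push(Finding { line, risk: Risk::ProcessExecution });
        }
        if text.contains("unsafe ") {
            findings.push(Finding { line, risk: Risk::UnsafeCode });
        }
        // Illustrative heuristic: a string literal bound to a secret-looking name.
        let upper = text.to_uppercase();
        let secret_name = upper.contains("API_KEY") || upper.contains("TOKEN");
        if secret_name && text.contains("= \"") {
            findings.push(Finding { line, risk: Risk::SecretLiteral });
        }
    }
    findings
}

fn main() {
    let source = "const API_KEY: &str = \"example\";\nfn run() { std::process::Command::new(\"ls\"); }\n";
    for finding in audit_source(source) {
        println!("line {}: {:?}", finding.line, finding.risk);
    }
}
```

Because the scan has no randomness and no network access, the same source always yields the same findings, which is the property "deterministic" refers to.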

Quick Start

Install the CLI:

cargo install mdx-rust

Clone the repo and try the built-in example (the fastest way to see mdx-rust in action):

git clone https://github.com/dhotherm/mdx-rust
cd mdx-rust

# The example rig-minimal-agent starts in a deliberately weak state
cargo run -p mdx-rust -- init
cargo run -p mdx-rust -- register example examples/rig-minimal-agent
cargo run -p mdx-rust -- optimize example --iterations 2
cargo run -p mdx-rust -- audit example

# See the improvement
cargo run -p mdx-rust -- invoke example --input '{"query":"What is 9 + 10?"}'

You should see the optimizer detect the weak echo behavior, plan a targeted fallback edit, validate it safely, and accept it only if the score improves.

Full flow for your own agent:

cd your-rust-agent
/path/to/mdx-rust init
/path/to/mdx-rust register my-agent
/path/to/mdx-rust optimize my-agent --iterations 5 --review

Artifacts (traces, diagnoses, candidates, reports, diffs) live under .mdx-rust/agents/<name>/.

How It Works

Register — Detects entrypoint (Rig agent, async fn, or generic JSON contract), creates a thin harness if needed, and smoke-tests invocation.
Spec — Analyzes your agent and produces starter policies.md, eval_spec.json, and dataset.json artifacts.
Optimize — Runs the agent on a deterministic dataset with lifecycle traces → scores outputs → diagnoses failures → generates targeted candidate patches → validates safely → keeps only net-positive changes with regression guards.
Gate — Every candidate moves through explicit lifecycle stages: pre-edit, isolated validation, patched evaluation, pre-accept hook, final validation, and rollback on failure.
Repeat — Multiple iterations, budgeted candidate pools, holdout splits, and experiment ledgers.
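
The "keeps only net-positive changes with regression guards" rule in the Optimize step can be sketched as a small decision function. This is an illustration under stated assumptions, not the crate's API: CandidateScores, Decision, decide, and the min_gain threshold are hypothetical names, and the actual scoring and guard logic may differ.

```rust
// Scores for one candidate, measured before and after applying its patch.
// All names here are illustrative, not mdx-rust types.
struct CandidateScores {
    train_before: f64,
    train_after: f64,
    holdout_before: f64,
    holdout_after: f64,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Accept { delta: f64 },
    Reject(&'static str),
}

// Accept only when the train score improves by more than `min_gain`
// and the holdout score does not get worse (the regression guard).
fn decide(scores: &CandidateScores, min_gain: f64) -> Decision {
    let delta = scores.train_after - scores.train_before;
    if delta <= min_gain {
        return Decision::Reject("no net-positive gain on the train split");
    }
    if scores.holdout_after < scores.holdout_before {
        return Decision::Reject("holdout score regressed");
    }
    Decision::Accept { delta }
}

fn main() {
    let scores = CandidateScores {
        train_before: 0.5,
        train_after: 0.75,
        holdout_before: 0.5,
        holdout_after: 0.5,
    };
    println!("{:?}", decide(&scores, 0.0));
}
```

Checking the holdout split separately from the train split is what stops a patch that merely overfits the training examples from landing.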

Key Commands

mdx-rust init
mdx-rust register my-agent ./path/to/agent
mdx-rust spec my-agent
mdx-rust doctor my-agent
mdx-rust audit my-agent
mdx-rust eval my-agent --dataset .mdx-rust/agents/my-agent/dataset.json
mdx-rust optimize my-agent --iterations 3 --budget medium --review

Every command that matters supports --json for coding agents and automation.
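
For automation written in Rust, the --json mode can be driven with std::process::Command. The sketch below only assumes that mdx-rust is on PATH and that --json may follow the agent name; the helper name mdx_json_command is hypothetical, and the JSON schema is not documented here, so the output is printed as-is rather than parsed.

```rust
use std::process::Command;

// Build (but do not run) an mdx-rust invocation with --json appended.
// The flag position is an assumption; adjust it if the CLI expects otherwise.
fn mdx_json_command(subcommand: &str, agent: &str) -> Command {
    let mut cmd = Command::new("mdx-rust");
    cmd.arg(subcommand).arg(agent).arg("--json");
    cmd
}

fn main() {
    let mut cmd = mdx_json_command("audit", "my-agent");
    match cmd.output() {
        Ok(out) => {
            println!("exit status: {}", out.status);
            // Raw JSON text; parse it with the JSON library of your choice.
            println!("{}", String::from_utf8_lossy(&out.stdout));
        }
        Err(err) => eprintln!("could not launch mdx-rust: {}", err),
    }
}
```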

Safety Model

The optimizer is built around a conservative lifecycle:

Analyze source scope with Rust-aware finders.
Run the agent on a train split and record trace diagnoses.
Generate typed candidate strategies.
Build a targeted edit only when a safe planner exists.
Run built-in lifecycle hooks.
Apply and validate in an isolated workspace.
Score the patched workspace.
Land only net-positive changes.
Run final validation on the real tree.
Roll back if final validation fails.
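
The gating part of this lifecycle can be pictured as a short state machine. The following is a minimal sketch, not the crate's code: Stage, Outcome, and run_candidate are hypothetical names, and each gate is reduced to a closure returning pass or fail. It assumes that a failure before final validation leaves the real tree untouched, while a failure at final validation triggers a rollback.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    PreEditHook,
    IsolatedValidation,
    PatchedEvaluation,
    PreAcceptHook,
    FinalValidation,
}

#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Landed,
    Rejected(Stage), // stopped before the real tree was modified
    RolledBack,      // landed, then final validation failed
}

const STAGES: [Stage; 5] = [
    Stage::PreEditHook,
    Stage::IsolatedValidation,
    Stage::PatchedEvaluation,
    Stage::PreAcceptHook,
    Stage::FinalValidation,
];

// Run the gates in order and stop at the first failure.
fn run_candidate(mut gate: impl FnMut(Stage) -> bool) -> Outcome {
    for &stage in STAGES.iter() {
        if gate(stage) {
            continue;
        }
        return if stage == Stage::FinalValidation {
            Outcome::RolledBack
        } else {
            Outcome::Rejected(stage)
        };
    }
    Outcome::Landed
}

fn main() {
    // Simulate a candidate whose patch fails validation in the isolated workspace.
    let outcome = run_candidate(|stage| stage != Stage::IsolatedValidation);
    println!("{:?}", outcome);
}
```

Keeping the stages in one ordered list makes it hard to skip a gate by accident, which is the point of a conservative lifecycle.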

Experiment records include dataset version/hash, scorer version, git SHAs, validation commands, score deltas, hook decisions, holdout score, and prompt variant ledger entries.
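
As a rough picture of such a record, the sketch below models a subset of those fields in a plain struct. Every name is hypothetical and the real on-disk format may look quite different. The dataset hash uses FNV-1a purely because it fits in a few lines; the source does not say which hash mdx-rust uses.

```rust
// FNV-1a 64-bit: tiny and deterministic, used here for illustration only.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

// Hypothetical shape of one experiment record.
#[allow(dead_code)]
#[derive(Debug)]
struct ExperimentRecord {
    dataset_version: String,
    dataset_hash: u64,
    scorer_version: String,
    base_git_sha: String,
    validation_commands: Vec<String>,
    score_before: f64,
    score_after: f64,
    holdout_score: Option<f64>,
    hook_decisions: Vec<(String, bool)>, // (hook stage, allowed?)
}

impl ExperimentRecord {
    fn score_delta(&self) -> f64 {
        self.score_after - self.score_before
    }
}

fn main() {
    let dataset = r#"[{"query":"What is 9 + 10?"}]"#;
    let record = ExperimentRecord {
        dataset_version: "1".to_string(),
        dataset_hash: fnv1a64(dataset.as_bytes()),
        scorer_version: "placeholder".to_string(),
        base_git_sha: "placeholder".to_string(),
        validation_commands: vec!["cargo check".to_string(), "cargo clippy".to_string()],
        score_before: 0.5,
        score_after: 0.75,
        holdout_score: Some(0.5),
        hook_decisions: vec![("pre-accept".to_string(), true)],
    };
    println!("{:#?}", record);
    println!("delta = {}", record.score_delta());
}
```

Storing the dataset hash next to the scores lets a later reader confirm that two experiments were actually scored on the same data.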

The full acceptance contract is documented in SAFETY_INVARIANTS.md.

Current Scope

v1 is deliberately conservative:

Accepted edits are single-file only.
Current edit strategies cover prompts and common echo-style fallback behavior.
eval can load and hash datasets, but scored standalone evaluation is still being built.
Native Rust contracts currently execute through the process harness; richer in-process harnesses are future work.
Published crates are available for mdx-rust, mdx-rust-core, and mdx-rust-analysis. See RELEASE.md.

Status

Early public beta (May 2026).

mdx-rust can already:

Register Rig and generic agents
Run them with tracing
Perform deep Rust analysis (prompts, tools, entrypoints)
Run deterministic diagnosis with structured candidates
Safely propose, validate (cargo check + clippy in isolation), and accept improvements
Execute typed strategies for prompts and common single-file fallback behavior
Split evaluation data into train/holdout sets under light, medium, or heavy budgets
Record prompt variant ledgers and lifecycle hook decisions
Run deterministic static security audits
Support dry-review mode (--review)
Produce experiment reports and artifacts
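
The train/holdout split mentioned in the list above can be made deterministic with a simple stride rule. This is only a sketch: how mdx-rust actually maps light, medium, and heavy budgets onto a split is not documented here, so the Budget enum, the holdout_stride values, and the split function are all assumptions chosen for illustration.

```rust
#[derive(Debug, Clone, Copy)]
enum Budget {
    Light,
    Medium,
    Heavy,
}

impl Budget {
    // Hypothetical mapping: every n-th example is held out.
    fn holdout_stride(self) -> usize {
        match self {
            Budget::Light => 5,
            Budget::Medium => 4,
            Budget::Heavy => 3,
        }
    }
}

// Deterministic split: no randomness, so the same dataset and budget
// always produce the same train and holdout sets.
fn split<T: Clone>(items: &[T], budget: Budget) -> (Vec<T>, Vec<T>) {
    let stride = budget.holdout_stride();
    let mut train = Vec::new();
    let mut holdout = Vec::new();
    for (i, item) in items.iter().enumerate() {
        if (i + 1) % stride == 0 {
            holdout.push(item.clone());
        } else {
            train.push(item.clone());
        }
    }
    (train, holdout)
}

fn main() {
    let examples: Vec<u32> = (0..10).collect();
    let (train, holdout) = split(&examples, Budget::Medium);
    println!("train = {:?}", train);
    println!("holdout = {:?}", holdout);
}
```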

The built-in example demonstrates a real before/after optimization win.

See ROADMAP.md for current scope and upcoming work.

Project Health

Published on crates.io as mdx-rust.
CI runs format, clippy, test, release build, crate packaging, and install smoke checks.
Security and acceptance invariants are documented in SAFETY_INVARIANTS.md.
Contributions are welcome through focused issues and pull requests.

Contributing

We welcome contributions, especially around:

Better Rust code analysis (tree-sitter queries, syn visitors)
New candidate generation strategies
Support for additional agent frameworks
Evaluation harnesses and scoring functions

See CONTRIBUTING.md for development workflow and review expectations.

License

MIT

The machine that improves the machines — in Rust.
