MDAP — Multi-Dimensional Adaptive Planning
This module was previously the standalone `brainwires-mdap` crate. It now lives in `brainwires-agent` behind the `mdap` feature flag.
MAKER voting framework — microagents, decomposition, red flags, and scaling for the Brainwires Agent Framework.
Paper
This module is a Rust implementation of MAKER (Maximal Agentic decomposition, first-to-ahead-by-K Error correction, and Red-flagging) as described in:
*Solving a Million-Step LLM Task with Zero Errors*. Elliot Meyerson, Giuseppe Paolo, Roberto Dailey, Hormoz Shahrzad, Olivier Francon, Conor F. Hayes, Xin Qiu, Babak Hodjat, Risto Miikkulainen. arXiv:2511.09030, November 2025. https://arxiv.org/abs/2511.09030
The paper introduces Massively Decomposed Agentic Processes (MDAPs) — a scaling approach that decomposes tasks into minimal subtasks handled by focused microagents, with multi-agent voting for error correction at every step. MAKER achieves zero-error execution on million-step tasks through extreme decomposition, first-to-ahead-by-k voting, and red-flag output validation.
This implementation also integrates techniques from three supplementary papers:
- RASC (arXiv:2408.17017) — early stopping with variance tracking and loss-of-hope detection
- CISC (arXiv:2502.06233v1) — confidence-weighted voting with dynamic confidence extraction
- Ranked Voting (arXiv:2505.10772) — Borda count ranking as an alternative consensus method
Overview
This module provides a complete implementation of the MAKER framework organized around five core components that map directly to the paper's algorithms and equations:
- Voting (Algorithm 2) — first-to-ahead-by-k consensus with three voting methods, early stopping, and confidence weighting
- Microagents (MAD) — focused single-step agents (m=1) that execute one subtask with minimal context
- Red Flags (Algorithm 3) — output validation that catches self-correction, confused reasoning, truncation, and format violations
- Decomposition (Algorithm 4) — binary recursive task decomposition with AI-driven splitting and dependency resolution
- Scaling Laws (Equations 13–19) — cost and probability estimation for choosing optimal k given a budget or reliability target
Design principles:
- Paper-faithful — algorithms, equations, and heuristics follow the MAKER paper directly
- Composable — each component is independent; use voting without decomposition, red flags without microagents, etc.
- Provider-agnostic — generic over the `MicroagentProvider` trait; works with any LLM backend
- Intent-based tool use — microagents express tool intent (deterministic) rather than executing tools (non-deterministic), preserving voting correctness
- Full observability — per-subtask metrics, voting round tracking, red-flag breakdowns, and cost analysis
┌──────────────────────────────────────────────────────────────────────┐
│ brainwires-agent::mdap │
│ │
│ Task ──► Decomposition (Alg.4) ──► Subtask DAG │
│ │ │
│ ┌─────────────────────────┘ │
│ ▼ │
│ ┌─── Per Subtask ──────────────────────────────────────────────┐ │
│ │ │ │
│ │ Microagent ──► Sample k responses ──► Red Flags (Alg.3) │ │
│ │ (m=1 steps) │ │ │ │
│ │ ▼ ▼ │ │
│ │ Valid responses ──► Voting (Alg.2) │ │
│ │ │ │ │
│ │ ┌───────────────────────┘ │ │
│ │ ▼ │ │
│ │ Winner + VoteResult + SubtaskMetric │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Composer ──► Final result Scaling (Eq.13-19) ──► Estimates │
│ Metrics ──► Cost/performance data │
└──────────────────────────────────────────────────────────────────────┘
Quick Start
Add to your Cargo.toml:
[]
= "0.11"
Run a simple voting consensus:
use ;
// Create a voter with k=3, max 20 samples
let voter = new;
// Red-flag validator
let validator = strict;
// Vote with a sampler function that queries an LLM
let result = voter.vote.await?;
println!;
Estimate cost before running:
use ;
let estimate = estimate_mdap?;
println!;
Architecture
Voting (Algorithm 2)
The core consensus mechanism. Multiple independent LLM samples are collected and the first answer to lead by k votes wins.
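The stopping rule can be illustrated with a minimal, self-contained sketch. It is not the crate's implementation: `first_to_ahead_by_k` is a hypothetical free function that takes already-validated answers as plain strings and returns the winner together with the number of votes consumed.

```rust
use std::collections::HashMap;

/// Returns the winning answer and the number of votes consumed, or `None`
/// if the stream ends before any answer leads every other answer by `k`.
/// (Hypothetical helper for illustration only.)
fn first_to_ahead_by_k<I>(votes: I, k: usize) -> Option<(String, usize)>
where
    I: IntoIterator<Item = String>,
{
    let mut counts: HashMap<String, usize> = HashMap::new();
    for (i, vote) in votes.into_iter().enumerate() {
        // Tally the new vote and remember its updated count.
        let mine = {
            let n = counts.entry(vote.clone()).or_insert(0);
            *n += 1;
            *n
        };
        // Highest count among all other answers seen so far.
        let runner_up = counts
            .iter()
            .filter(|(answer, _)| **answer != vote)
            .map(|(_, &n)| n)
            .max()
            .unwrap_or(0);
        if mine >= runner_up + k {
            return Some((vote, i + 1));
        }
    }
    None
}
```

With `k = 2` and the vote stream `A, B, A, A, A`, answer `A` is declared the winner on the fourth vote, when it leads `B` by 3 to 1.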
FirstToAheadByKVoter
| Field | Type | Default | Description |
|---|---|---|---|
| `k` | `usize` | — | Votes-ahead margin required to declare a winner |
| `max_samples` | `usize` | — | Maximum samples before giving up |
| `parallel_limit` | `usize` | `k` | Max concurrent samples |
| `batch_size` | `usize` | `k` | Samples per batch |
| `early_stopping` | `EarlyStoppingConfig` | disabled | RASC-style early stopping |
| `voting_method` | `VotingMethod` | `FirstToAheadByK` | Consensus algorithm |
| `use_confidence_weights` | `bool` | `false` | CISC confidence weighting |
Constructors:
| Method | Description |
|---|---|
| `new(k, max_samples)` | Standard first-to-ahead-by-k voting |
| `with_early_stopping(k, max_samples, config)` | With RASC early stopping |
| `with_confidence_weighting(k, max_samples)` | CISC confidence-weighted voting |
| `with_borda_count(k, max_samples)` | Ranked voting via Borda count |
Methods:
| Method | Description |
|---|---|
| `vote(sampler, validator)` | Execute voting: samples via `sampler`, validates via `validator`, returns `VoteResult` |
| `vote_simple(sampler)` | Simplified voting without red-flag validation |
VoterBuilder — fluent builder: VoterBuilder::new().k(3).max_samples(20).voting_method(BordaCount).build()
VotingMethod
| Variant | Paper | Description |
|---|---|---|
| `FirstToAheadByK` | MAKER Alg. 2 | First answer to lead by k votes wins (default) |
| `BordaCount` | arXiv:2505.10772 | Ranked voting based on confidence scores |
| `ConfidenceWeighted` | arXiv:2502.06233v1 | Votes weighted by response confidence |
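To make the difference between counting votes and weighting them concrete, here is a small sketch of confidence-weighted tallying. It assumes the simplest possible combination rule (summing each answer's confidences); how the crate actually combines weights is not stated here, and `confidence_weighted_winner` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Sums the confidence of every sample per answer and returns the answer
/// with the largest total, plus its share of the overall weight.
/// Assumes at least one sample has a non-zero confidence.
fn confidence_weighted_winner(samples: &[(&str, f64)]) -> Option<(String, f64)> {
    let mut totals: HashMap<&str, f64> = HashMap::new();
    for &(answer, confidence) in samples {
        *totals.entry(answer).or_insert(0.0) += confidence;
    }
    let total_weight: f64 = totals.values().sum();
    totals
        .into_iter()
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(answer, weight)| (answer.to_string(), weight / total_weight))
}
```

Given one sample for `A` at confidence 0.95 and two for `B` at 0.4 each, plain counting favors `B` (2 votes to 1), while the weighted tally favors `A` (0.95 against 0.8).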
EarlyStoppingConfig
RASC-style early stopping (arXiv:2408.17017) to reduce unnecessary samples when consensus is already clear.
| Field | Type | Default | Description |
|---|---|---|---|
| `min_confidence` | `f64` | — | Confidence threshold to stop early |
| `min_votes` | `usize` | — | Minimum votes before early stopping is eligible |
| `enabled` | `bool` | `true` | Master toggle |
| `max_variance_threshold` | `f64` | `0.1` | Maximum vote distribution variance to trigger stop |
| `loss_of_hope_enabled` | `bool` | `true` | Stop if no candidate can possibly win |
| `min_weighted_confidence` | `f64` | `0.0` | Minimum weighted confidence for CISC |
Presets: aggressive() (stop fast), conservative() (higher confidence required), disabled()
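The loss-of-hope condition ("no candidate can possibly win") can be read as a simple bound on the remaining sample budget. The sketch below is one interpretation, not the crate's code: `no_candidate_can_win` is a hypothetical helper that assumes every remaining sample goes to the current leader, which is the best case for reaching a lead of `k`.

```rust
/// `counts` holds the current vote count of each distinct answer,
/// `remaining` is how many more samples may still be drawn.
/// Returns true when even the current leader cannot get `k` ahead.
fn no_candidate_can_win(counts: &[usize], remaining: usize, k: usize) -> bool {
    let mut sorted = counts.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a)); // descending
    let leader = sorted.first().copied().unwrap_or(0);
    let runner_up = sorted.get(1).copied().unwrap_or(0);
    // Best case: all remaining samples vote for the leader.
    leader + remaining < runner_up + k
}
```

For example, with counts `[3, 3]`, one sample left, and `k = 2`, the best reachable lead is 1, so voting can stop without a winner.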
VoteResult<T>
| Field | Type | Description |
|---|---|---|
| `winner` | `T` | The winning answer |
| `winner_votes` | `usize` | Votes for the winner |
| `total_votes` | `usize` | Total valid votes cast |
| `total_samples` | `usize` | Total samples including red-flagged |
| `red_flagged_count` | `usize` | Samples that failed red-flag validation |
| `vote_distribution` | `HashMap<String, usize>` | Vote counts per unique answer |
| `confidence` | `f64` | Voting confidence score |
| `red_flag_reasons` | `Vec<String>` | Reasons for red-flagged responses |
| `early_stopped` | `bool` | Whether early stopping triggered |
| `weighted_confidence` | `Option<f64>` | CISC weighted confidence |
| `voting_method` | `VotingMethod` | Which method was used |
SampledResponse<T>
| Field | Type | Description |
|---|---|---|
| `value` | `T` | The parsed/hashed response value |
| `metadata` | `ResponseMetadata` | Token count, timing, format validity, finish reason, model |
| `raw_response` | `String` | Original LLM response text |
| `confidence` | `f64` | Response confidence (for CISC weighting) |
Microagents (MAD)
Maximal Agentic Decomposition — each microagent executes exactly one subtask (m=1 step) with a focused system prompt that discourages hedging, self-correction, and unnecessary explanation.
Subtask
| Field | Type | Description |
|---|---|---|
| `id` | `String` | Unique subtask identifier |
| `description` | `String` | What to do |
| `input_state` | `Value` | Input data from prior subtasks |
| `expected_output_format` | `Option<OutputFormat>` | Expected output format for red-flag validation |
| `depends_on` | `Vec<String>` | Subtask IDs that must complete first |
| `complexity_estimate` | `f32` | Estimated difficulty (0.0–1.0) |
| `instructions` | `Option<String>` | Additional instructions |
Constructors: atomic(id, description), new(id, description, input_state)
MicroagentConfig
| Field | Type | Default | Description |
|---|---|---|---|
| `max_output_tokens` | `usize` | `750` | Token limit (per paper recommendation) |
| `temperature` | `f32` | `0.1` | Low temperature for consistency |
| `system_prompt_template` | `Option<String>` | Paper default | Custom system prompt |
| `red_flag_config` | `RedFlagConfig` | strict | Red-flag validation settings |
| `timeout_ms` | `Option<u64>` | `None` | Execution timeout |
MicroagentProvider trait
Microagent<P>
| Method | Description |
|---|---|
| `new(subtask, provider, config)` | Create a microagent for a specific subtask |
| `with_defaults(subtask, provider)` | Create with default config |
| `execute_once()` | Single LLM call; returns `SampledResponse` for voting |
| `execute_with_voting(voter, validator)` | Execute with voting consensus; returns `VoteResult` |
Confidence Extraction
extract_response_confidence(text, metadata) computes a 0.1–0.99 confidence score by analyzing:
- Finish reason (`stop` vs `length`)
- Response length relative to token limit
- Hedging language ("maybe", "perhaps", "I think")
- Self-correction patterns ("Wait,", "Actually,", "Let me reconsider")
- Confident assertions ("definitely", "clearly", "the answer is")
- Format validity
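The signals listed above could be combined along the following lines. This is only an illustrative sketch: the `Meta` struct, the base score, and every weight are invented for the example and are not the crate's values; only the phrase lists and the 0.1–0.99 output range come from this document.

```rust
/// Hypothetical stand-in for the response metadata used by the heuristic.
struct Meta {
    finished_by_stop: bool, // false when the finish reason is `length`
    tokens: usize,
    max_tokens: usize,
    format_valid: bool,
}

fn extract_confidence(text: &str, meta: &Meta) -> f64 {
    let lower = text.to_lowercase();
    let has_any = |phrases: &[&str]| phrases.iter().any(|p| lower.contains(*p));

    let mut score: f64 = 0.7; // assumed neutral starting point
    if !meta.finished_by_stop {
        score -= 0.3; // cut off by the token limit
    }
    if meta.max_tokens > 0 && meta.tokens * 10 >= meta.max_tokens * 9 {
        score -= 0.1; // response uses 90% or more of the limit
    }
    if has_any(&["maybe", "perhaps", "i think"]) {
        score -= 0.15; // hedging language
    }
    if has_any(&["wait,", "actually,", "let me reconsider"]) {
        score -= 0.2; // self-correction
    }
    if has_any(&["definitely", "clearly", "the answer is"]) {
        score += 0.15; // confident assertion
    }
    if !meta.format_valid {
        score -= 0.2;
    }
    score.clamp(0.1, 0.99)
}
```

Under these assumed weights, a short, well-formed "The answer is 42" scores higher than a truncated "Wait, maybe it's 41", which falls to the 0.1 floor.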
Red Flags (Algorithm 3)
Output validation that catches unreliable LLM responses before they enter voting.
RedFlagConfig
| Field | Type | Default (strict) | Description |
|---|---|---|---|
| `max_response_tokens` | `usize` | `750` | Maximum response length |
| `require_exact_format` | `bool` | `true` | Enforce expected output format |
| `flag_self_correction` | `bool` | `true` | Flag "Wait,", "Actually,", etc. |
| `confusion_patterns` | `Vec<String>` | 10 patterns | Regex patterns indicating confused reasoning |
| `min_response_length` | `usize` | `1` | Minimum response length |
| `max_empty_line_ratio` | `f64` | `0.5` | Maximum empty line ratio |
Presets: strict() (paper-recommended), relaxed() (fewer false positives)
Strict confusion patterns: "Wait,", "Actually,", "Let me reconsider", "I made a mistake", "On second thought", "Hmm,", "I think I", "Let me correct", "Sorry, I meant", "That's not right"
StandardRedFlagValidator
Implements Algorithm 3. Validation checks (in order):
- Length — token count vs max, minimum length
- Self-correction — confusion pattern regex matching
- Format — expected output format matching
- Truncation — finish reason analysis
- Empty lines — empty line ratio
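The ordering of the checks can be sketched as a chain of early returns. This is a simplified illustration under stated assumptions, not `StandardRedFlagValidator` itself: the `Response` struct, the `validate` signature, the severity values, plain substring matching instead of regexes, and word counts standing in for token counts are all hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum RedFlagResult {
    Valid,
    Flagged { reason: String, severity: f64 },
}

/// Hypothetical input: response text plus whether the model hit its token limit.
struct Response<'a> {
    text: &'a str,
    truncated: bool,
}

fn flagged(reason: impl Into<String>, severity: f64) -> RedFlagResult {
    RedFlagResult::Flagged { reason: reason.into(), severity }
}

fn validate(resp: &Response, max_tokens: usize, allowed: Option<&[&str]>) -> RedFlagResult {
    let text = resp.text.trim();

    // 1. Length: whitespace-separated words stand in for real token counts.
    if text.is_empty() {
        return flagged("response is empty", 1.0);
    }
    if text.split_whitespace().count() > max_tokens {
        return flagged("response too long", 0.8);
    }

    // 2. Self-correction: a few of the strict confusion patterns.
    const PATTERNS: [&str; 4] = ["Wait,", "Actually,", "Let me reconsider", "I made a mistake"];
    if let Some(hit) = PATTERNS.iter().find(|p| text.contains(**p)) {
        return flagged(format!("self-correction: {hit}"), 0.9);
    }

    // 3. Format: here, an optional list of allowed exact answers.
    if let Some(options) = allowed {
        if !options.iter().any(|o| *o == text) {
            return flagged("unexpected format", 0.7);
        }
    }

    // 4. Truncation: the finish reason indicated a cut-off response.
    if resp.truncated {
        return flagged("response was truncated", 0.8);
    }

    // 5. Empty lines: ratio of blank lines to all lines.
    let total = text.lines().count();
    let empty = text.lines().filter(|l| l.trim().is_empty()).count();
    if empty as f64 / total as f64 > 0.5 {
        return flagged("too many empty lines", 0.5);
    }

    RedFlagResult::Valid
}
```

Because the checks run in order, a response that is both self-correcting and truncated is reported with the self-correction reason.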
RedFlagResult
| Variant | Description |
|---|---|
| `Valid` | Response passed all checks |
| `Flagged { reason, severity }` | Response failed validation with reason and 0.0–1.0 severity |
OutputFormat
| Variant | Description |
|---|---|
| `Exact(String)` | Must match exactly |
| `Pattern(String)` | Must match regex |
| `Json` | Must be valid JSON |
| `JsonWithFields(Vec<String>)` | JSON with required fields |
| `Markers { start, end }` | Must contain start/end markers |
| `OneOf(Vec<String>)` | Must be one of the enumerated values |
| `Custom { description, validator_id }` | Custom validation logic |
Other Validators
- `AcceptAllValidator` — always returns `Valid` (useful for testing)
- `CompositeValidator` — chains multiple validators; first failure wins
Decomposition (Algorithm 4)
Breaks complex tasks into a DAG of minimal subtasks.
DecomposeContext
| Field | Type | Description |
|---|---|---|
| `working_directory` | `Option<String>` | Working directory for file operations |
| `available_tools` | `Vec<ToolSchema>` | Tools available to microagents |
| `max_depth` | `usize` | Maximum recursion depth |
| `current_depth` | `usize` | Current recursion depth |
| `additional_context` | `Option<String>` | Extra context for the decomposer |
DecompositionStrategy
| Variant | Description |
|---|---|
| `BinaryRecursive { max_depth }` | Paper's approach: AI-driven binary splitting (Algorithm 4) |
| `Simple { max_depth }` | Text-based splitting (testing only) |
| `Sequential` | Linear step extraction |
| `CodeOperations` | Code-specific decomposition |
| `AIDriven { discriminator_k }` | AI splitting with discriminator voting |
| `None` | Atomic: no decomposition |
DecompositionResult
| Field | Type | Description |
|---|---|---|
| `subtasks` | `Vec<Subtask>` | Ordered subtask list |
| `composition_function` | `CompositionFunction` | How to combine results |
| `is_minimal` | `bool` | Whether the task was already minimal |
| `total_complexity` | `f32` | Sum of subtask complexities |
TaskDecomposer trait
BinaryRecursiveDecomposer<P>
AI-driven implementation of Algorithm 4. Uses the LLM with voting (k consensus) to decide how to split each task, recursing until subtasks are minimal.
Minimal task heuristics: very short (< 50 chars), single-action verbs (return, calculate, get, set, check, etc.), no multi-step conjunctions.
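A rough sketch of such a check follows. How the three heuristics are combined is an assumption (no multi-step conjunction, and either short or led by a single-action verb), the conjunction list is invented for the example, and `looks_minimal` is a hypothetical name.

```rust
/// Heuristic guess at whether a task description is already a single step.
fn looks_minimal(task: &str) -> bool {
    let lower = task.trim().to_lowercase();

    // Very short descriptions are treated as atomic.
    let short = lower.chars().count() < 50;

    // Single-action verbs named in the text above, checked as the first word.
    let first_word = lower.split_whitespace().next().unwrap_or("");
    let single_action = ["return", "calculate", "get", "set", "check"]
        .iter()
        .any(|verb| *verb == first_word);

    // Assumed markers of multi-step work.
    let multi_step = [" and then ", " then ", " after that ", "; "]
        .iter()
        .any(|marker| lower.contains(*marker));

    !multi_step && (short || single_action)
}
```

"Calculate 2 + 2" passes this check, while "Read the config file and then update the database schema accordingly" does not, because of the "and then" conjunction.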
SequentialDecomposer
Non-AI decomposer that extracts numbered steps or splits by sentences. Useful for pre-structured tasks.
Utilities
- `validate_decomposition(result)` — checks non-empty, valid dependencies, no circular references
- `topological_sort(subtasks)` — Kahn's algorithm for dependency ordering
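For readers unfamiliar with Kahn's algorithm, here is a standalone sketch of dependency ordering over subtasks. It uses a pared-down `Subtask` with only `id` and `depends_on`, returns IDs rather than subtasks, and reports errors as strings; the crate's real signature and error types may differ.

```rust
use std::collections::{HashMap, VecDeque};

/// Pared-down subtask for illustration: only the fields needed for ordering.
struct Subtask {
    id: String,
    depends_on: Vec<String>,
}

fn topological_sort(subtasks: &[Subtask]) -> Result<Vec<String>, String> {
    // Number of unmet dependencies per subtask.
    let mut in_degree: HashMap<&str, usize> =
        subtasks.iter().map(|t| (t.id.as_str(), 0)).collect();
    // Reverse edges: dependency -> subtasks waiting on it.
    let mut dependents: HashMap<&str, Vec<&str>> = HashMap::new();

    for task in subtasks {
        for dep in &task.depends_on {
            if !in_degree.contains_key(dep.as_str()) {
                return Err(format!("unknown dependency: {dep}"));
            }
            *in_degree.get_mut(task.id.as_str()).unwrap() += 1;
            dependents.entry(dep.as_str()).or_default().push(task.id.as_str());
        }
    }

    // Start with subtasks that have no dependencies, in input order.
    let mut queue: VecDeque<&str> = subtasks
        .iter()
        .map(|t| t.id.as_str())
        .filter(|id| in_degree[id] == 0)
        .collect();

    let mut order = Vec::with_capacity(subtasks.len());
    while let Some(id) = queue.pop_front() {
        order.push(id.to_string());
        if let Some(waiting) = dependents.get(id) {
            for &next in waiting {
                let remaining = in_degree.get_mut(next).unwrap();
                *remaining -= 1;
                if *remaining == 0 {
                    queue.push_back(next);
                }
            }
        }
    }

    // Anything left unprocessed is part of a cycle.
    if order.len() == subtasks.len() {
        Ok(order)
    } else {
        Err("circular dependency".to_string())
    }
}
```

If `c` depends on `b` and `b` depends on `a`, the result is `a, b, c` regardless of input order; two subtasks that depend on each other produce an error.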
Composer
Combines subtask outputs into a final result.
CompositionFunction
| Variant | Description |
|---|---|
| `Identity` | Return single result as-is |
| `Concatenate` | Join as strings |
| `Sequence` | Collect into JSON array |
| `ObjectMerge` | Merge into JSON object |
| `LastOnly` | Take the last result |
| `Custom(String)` | Custom handler by name |
| `Reduce { operation }` | Reduce: sum, multiply, max, min, and, or, concat |
Composer
| Method | Description |
|---|---|
| `new()` | Create an empty composer |
| `register_handler(name, handler)` | Register a custom `CompositionHandler` |
| `compose(function, outputs)` | Compose subtask outputs using the given function |
CompositionBuilder
Fluent builder with input validation: CompositionBuilder::new(function).add_result(output).compose()
Tool Intent
Microagents express tool intent without executing — this keeps voting deterministic since tool execution has side effects.
ToolSchema
| Field | Type | Description |
|---|---|---|
| `name` | `String` | Tool name |
| `description` | `String` | What the tool does |
| `parameters` | `HashMap<String, String>` | Parameter names and descriptions |
| `required` | `Vec<String>` | Required parameters |
| `category` | `Option<ToolCategory>` | Tool classification |
Converts from brainwires_core::Tool via From trait.
ToolIntent
| Field | Type | Description |
|---|---|---|
| `tool_name` | `String` | Which tool to call |
| `arguments` | `Value` | Tool arguments as JSON |
| `rationale` | `Option<String>` | Why this tool is needed |
ToolCategory
| Variant | Side Effects | Description |
|---|---|---|
| `FileRead` | No | Read files |
| `FileWrite` | Yes | Write/edit files |
| `Search` | No | File/text search |
| `SemanticSearch` | No | Embedding-based search |
| `Bash` | Yes | Shell commands |
| `Git` | Yes | Git operations |
| `Web` | No | HTTP requests |
| `AgentPool` | Yes | Agent management |
| `TaskManager` | Yes | Task management |
| `Mcp` | Yes | MCP server tools |
| `Custom(String)` | — | Custom category |
read_only_categories() returns categories safe for microagents. side_effect_categories() returns categories that modify state.
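The side-effect column of the table translates directly into a predicate. The sketch below mirrors the variant names from the table, but `has_side_effects` and `allowed_for_microagent` are hypothetical helpers, and treating `Custom` categories as side-effecting is a conservative assumption since the table leaves that cell open.

```rust
#[derive(Debug, Clone, PartialEq)]
enum ToolCategory {
    FileRead,
    FileWrite,
    Search,
    SemanticSearch,
    Bash,
    Git,
    Web,
    AgentPool,
    TaskManager,
    Mcp,
    Custom(String),
}

impl ToolCategory {
    /// True unless the table above marks the category as free of side effects.
    /// `Custom` is treated as side-effecting (assumption).
    fn has_side_effects(&self) -> bool {
        !matches!(
            self,
            Self::FileRead | Self::Search | Self::SemanticSearch | Self::Web
        )
    }
}

/// Keeps only the categories that are safe to expose to a microagent.
fn allowed_for_microagent(requested: &[ToolCategory]) -> Vec<ToolCategory> {
    requested
        .iter()
        .filter(|category| !category.has_side_effects())
        .cloned()
        .collect()
}
```

Filtering `[FileRead, Bash, Web, Custom("deploy")]` through `allowed_for_microagent` leaves `[FileRead, Web]`.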
Intent Parsing
parse_tool_intent(response) extracts ToolIntent from LLM responses containing tool_intent JSON blocks. Returns IntentParseResult::NoIntent, WithIntent, or ParseError.
Scaling Laws (Equations 13–19)
Cost and probability estimation from the paper's mathematical framework.
estimate_mdap()
Main estimation function implementing Equations 13–19:
Key Equations
| Function | Equation | Formula |
|---|---|---|
| `calculate_p_full(p, k, s)` | Eq. 13 | `P_full = (1 + ((1-p)/p)^k)^(-s)` |
| `calculate_k_min(p, s, target)` | Eq. 14 | `k_min = ceil(ln(t^(-1/s) - 1) / ln((1-p)/p))` |
| `calculate_expected_votes(p, k)` | — | `E[votes] ≈ k / (2p - 1)` |
| `calculate_expected_cost(...)` | Eq. 19 | `E[cost] ≈ c·s·k / (v·(2p-1))` |
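The formulas in the table can be evaluated directly with a few lines of standalone Rust. The function names below are shortened, hypothetical stand-ins for the crate's `calculate_*` functions, and returning `Option` for invalid inputs is a choice made for this sketch; here `p` is the per-step success probability, `s` the number of steps, and `target` the desired overall success probability `t`.

```rust
/// Eq. 13: probability that all `s` steps succeed when voting with margin `k`.
fn p_full(p: f64, k: u32, s: u64) -> f64 {
    let ratio = (1.0 - p) / p;
    (1.0 + ratio.powi(k as i32)).powf(-(s as f64))
}

/// Eq. 14: smallest `k` whose full-task success probability reaches `target`.
/// Returns `None` when voting cannot converge (p <= 0.5) or inputs are out of range.
fn k_min(p: f64, s: u64, target: f64) -> Option<u32> {
    if !(p > 0.5 && p < 1.0) || !(target > 0.0 && target < 1.0) || s == 0 {
        return None;
    }
    let ratio = (1.0 - p) / p;
    let needed = target.powf(-1.0 / (s as f64)) - 1.0;
    let k = (needed.ln() / ratio.ln()).ceil();
    Some(k.max(1.0) as u32)
}

/// Approximate number of valid votes needed per step for margin `k`.
fn expected_votes(p: f64, k: u32) -> f64 {
    k as f64 / (2.0 * p - 1.0)
}
```

As a worked example, with `p = 0.9`, `s = 10`, and a 99% target, Eq. 14 gives `ln(0.0010055) / ln(1/9) ≈ 3.14`, so `k_min = 4`. Checking against Eq. 13, `k = 3` yields about 0.986 and `k = 4` about 0.998, and reaching a margin of 4 takes roughly `4 / 0.8 = 5` votes per step.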
ModelCosts
| Preset | Input/1K | Output/1K |
|---|---|---|
| `claude_sonnet()` | $0.003 | $0.015 |
| `claude_haiku()` | $0.00025 | $0.00125 |
| `gpt4o()` | $0.0025 | $0.01 |
| `gpt4o_mini()` | $0.00015 | $0.0006 |
MdapEstimate
| Field | Type | Description |
|---|---|---|
| `expected_cost_usd` | `f64` | Estimated total cost |
| `expected_api_calls` | `usize` | Estimated total API calls |
| `success_probability` | `f64` | Overall success probability |
| `recommended_k` | `usize` | Minimum k for target success rate |
| `estimated_time_seconds` | `f64` | Estimated wall-clock time |
| `per_step_success` | `f64` | Per-step success probability used |
| `num_steps` | `usize` | Number of subtasks |
suggest_k_for_budget()
Budget-constrained k selection — finds the largest k affordable within a dollar budget.
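One way to picture the search is to scan candidate values of k against the expected-cost formula `E[cost] ≈ c·s·k / (v·(2p-1))`. The sketch below is an assumption-laden illustration, not the crate's `suggest_k_for_budget`: the function names, the `k_cap` upper bound, and reading `c` as cost per sample and `v` as the fraction of samples that pass red-flag validation are all choices made for this example.

```rust
/// Expected total cost for `s` steps at margin `k`.
/// `c`: cost per sample in USD, `v`: assumed valid-response rate,
/// `p`: per-step success probability.
fn expected_cost(c: f64, s: u64, k: u32, v: f64, p: f64) -> f64 {
    c * s as f64 * k as f64 / (v * (2.0 * p - 1.0))
}

/// Largest `k` in `1..=k_cap` whose expected cost fits the budget,
/// or `None` if even `k = 1` is too expensive.
fn largest_affordable_k(budget: f64, c: f64, s: u64, v: f64, p: f64, k_cap: u32) -> Option<u32> {
    (1..=k_cap)
        .rev()
        .find(|&k| expected_cost(c, s, k, v, p) <= budget)
}
```

With `c = $0.001`, `s = 10`, `v = 0.9`, and `p = 0.9`, each unit of k costs about $0.0139, so a $0.05 budget affords `k = 3`, while a $0.01 budget affords none.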
Metrics
Full observability into MDAP execution.
MdapMetrics
Comprehensive metrics covering execution, steps, sampling, voting, cost, time, and success rate.
| Method | Description |
|---|---|
| `new(execution_id)` | Create new metrics tracker |
| `with_config(config_summary)` | Attach configuration snapshot |
| `start()` | Record start time |
| `finalize(success)` | Record end time and success |
| `record_subtask(metric)` | Record per-subtask metrics |
| `record_voting_round(metric)` | Record per-round metrics |
| `add_sample_cost(input_tokens, output_tokens, cost)` | Accumulate cost |
| `summary()` | Human-readable summary string |
| `red_flag_analysis()` | Red-flag breakdown string |
| `to_json()` / `from_json()` | Serialization |
SubtaskMetric
| Field | Type | Description |
|---|---|---|
| `subtask_id` | `String` | Which subtask |
| `description` | `String` | Subtask description |
| `samples_needed` | `usize` | Samples taken to reach consensus |
| `red_flags_hit` | `usize` | Red-flagged samples |
| `red_flag_reasons` | `Vec<String>` | Why samples were flagged |
| `final_confidence` | `f64` | Voting confidence |
| `execution_time_ms` | `u64` | Wall-clock time |
| `winner_votes` / `total_votes` | `usize` | Vote counts |
| `succeeded` | `bool` | Whether the subtask succeeded |
| `input_tokens` / `output_tokens` | `usize` | Token usage |
| `complexity_estimate` | `f32` | Subtask complexity |
Error Handling
MdapError is a comprehensive error enum with sub-error types for each component:
| Variant | Sub-errors | Description |
|---|---|---|
| `Voting(VotingError)` | `MaxSamplesExceeded`, `AllSamplesRedFlagged`, `InvalidK`, etc. | Voting failures |
| `RedFlag(RedFlagError)` | `ResponseTooLong`, `SelfCorrectionDetected`, `InvalidJson`, etc. | Validation failures |
| `Decomposition(DecompositionError)` | `MaxDepthExceeded`, `CircularDependency`, etc. | Decomposition failures |
| `Microagent(MicroagentError)` | `ExecutionFailed`, `Timeout`, `ContextTooLarge`, etc. | Execution failures |
| `Composition(CompositionError)` | `MissingResult`, `IncompatibleTypes`, etc. | Composition failures |
| `Scaling(ScalingError)` | `InvalidSuccessProbability`, `VotingCannotConverge`, etc. | Estimation failures |
| `Config(MdapConfigError)` | `InvalidK`, `InvalidTargetSuccessRate`, etc. | Configuration errors |
| `ToolRecursionLimit` | — | Tool intent recursion exceeded |
| `ToolExecutionFailed` | — | Tool execution failure |
| `ToolNotAllowed` | — | Tool not permitted for microagent |
Helper methods: is_retryable(), is_user_error(), is_tool_error(), is_red_flag(), should_restart_voting()
Usage Examples
Voting with red-flag validation
use *;
let voter = new;
let validator = strict;
let result = voter.vote.await?;
println!;
println!;
println!;
Voting with early stopping (RASC)
use ;
let voter = with_early_stopping;
let result = voter.vote.await?;
if result.early_stopped
Confidence-weighted voting (CISC)
use FirstToAheadByKVoter;
let voter = with_confidence_weighting;
let result = voter.vote.await?;
println!;
Builder pattern for voter configuration
use ;
let voter = new
.k
.max_samples
.voting_method
.early_stopping
.parallel_limit
.build?;
Cost estimation before execution
use ;
// What k do I need for 99% success on 10 steps?
let estimate = estimate_mdap?;
println!;
// What k can I afford with $0.50?
let k = suggest_k_for_budget?;
println!;
Task decomposition
use ;
let decomposer = new;
let context = new.with_max_depth;
let result = decomposer.decompose.await?;
validate_decomposition?;
let ordered = topological_sort?;
for subtask in &ordered
Composing subtask results
use ;
let composer = new;
let outputs = vec!;
let final_result = composer.compose?;
Tracking metrics
use ;
let mut metrics = new;
metrics.start;
metrics.record_subtask;
metrics.finalize;
println!;
println!;
Integration
Use via the brainwires facade crate with the mdap feature, or depend on brainwires-agent directly:
# Via facade
[dependencies]
brainwires = { version = "0.11", features = ["mdap"] }
# Direct
[dependencies]
brainwires-agent = "0.11"
The prelude module re-exports the most commonly used types:
use *;
License
Licensed under the MIT License. See LICENSE for details.