brainwires-agents 0.8.0

Agent orchestration, coordination, and lifecycle management for the Brainwires Agent Framework
# MDAP: Massively Decomposed Agentic Processes

> This module was previously the standalone `brainwires-mdap` crate. It now lives in `brainwires-agents` behind the `mdap` feature flag.

MAKER voting framework — microagents, decomposition, red flags, and scaling for the Brainwires Agent Framework.

## Paper

This module is a Rust implementation of **MAKER** (Maximal Agentic decomposition, first-to-ahead-by-K Error correction, and Red-flagging) as described in:

> **Solving a Million-Step LLM Task with Zero Errors**
> Elliot Meyerson, Giuseppe Paolo, Roberto Dailey, Hormoz Shahrzad, Olivier Francon, Conor F. Hayes, Xin Qiu, Babak Hodjat, Risto Miikkulainen
> arXiv:2511.09030, November 2025
> https://arxiv.org/abs/2511.09030

The paper introduces **Massively Decomposed Agentic Processes (MDAPs)** — a scaling approach that decomposes tasks into minimal subtasks handled by focused microagents, with multi-agent voting for error correction at every step. MAKER achieves zero-error execution on million-step tasks through extreme decomposition, first-to-ahead-by-k voting, and red-flag output validation.

This implementation also integrates techniques from three supplementary papers:

- **RASC** ([arXiv:2408.17017](https://arxiv.org/abs/2408.17017)): early stopping with variance tracking and loss-of-hope detection
- **CISC** ([arXiv:2502.06233v1](https://arxiv.org/abs/2502.06233v1)): confidence-weighted voting with dynamic confidence extraction
- **Ranked Voting** ([arXiv:2505.10772](https://arxiv.org/abs/2505.10772)): Borda count ranking as an alternative consensus method

## Overview

This module provides a complete implementation of the MAKER framework organized around five core components that map directly to the paper's algorithms and equations:

- **Voting** (Algorithm 2) — first-to-ahead-by-k consensus with three voting methods, early stopping, and confidence weighting
- **Microagents** (MAD) — focused single-step agents (m=1) that execute one subtask with minimal context
- **Red Flags** (Algorithm 3) — output validation that catches self-correction, confused reasoning, truncation, and format violations
- **Decomposition** (Algorithm 4) — binary recursive task decomposition with AI-driven splitting and dependency resolution
- **Scaling Laws** (Equations 13–19) — cost and probability estimation for choosing optimal k given a budget or reliability target

**Design principles:**

- **Paper-faithful** — algorithms, equations, and heuristics follow the MAKER paper directly
- **Composable** — each component is independent; use voting without decomposition, red flags without microagents, etc.
- **Provider-agnostic** — generic over `MicroagentProvider` trait; works with any LLM backend
- **Intent-based tool use** — microagents express tool *intent* (deterministic) rather than executing tools (non-deterministic), preserving voting correctness
- **Full observability** — per-subtask metrics, voting round tracking, red-flag breakdowns, and cost analysis

```text
  ┌──────────────────────────────────────────────────────────────────────┐
  │                    brainwires-agents::mdap                            │
  │                                                                      │
  │  Task ──► Decomposition (Alg.4) ──► Subtask DAG                      │
  │                                         │                            │
  │               ┌─────────────────────────┘                            │
  │               ▼                                                      │
  │  ┌─── Per Subtask ──────────────────────────────────────────────┐    │
  │  │                                                              │    │
  │  │  Microagent ──► Sample k responses ──► Red Flags (Alg.3)     │    │
  │  │  (m=1 steps)        │                      │                 │    │
  │  │                     ▼                      ▼                 │    │
  │  │              Valid responses ──► Voting (Alg.2)              │    │
  │  │                                      │                       │    │
  │  │              ┌───────────────────────┘                       │    │
  │  │              ▼                                               │    │
  │  │  Winner + VoteResult + SubtaskMetric                         │    │
  │  └──────────────────────────────────────────────────────────────┘    │
  │               │                                                      │
  │               ▼                                                      │
  │  Composer ──► Final result        Scaling (Eq.13-19) ──► Estimates   │
  │                                   Metrics ──► Cost/performance data  │
  └──────────────────────────────────────────────────────────────────────┘
```

## Quick Start

Add to your `Cargo.toml`:

```toml
[dependencies]
brainwires-agents = { version = "0.8", features = ["mdap"] }
```

Run a simple voting consensus:

```rust
use brainwires_agents::mdap::{
    FirstToAheadByKVoter, SampledResponse, ResponseMetadata,
    StandardRedFlagValidator, RedFlagValidator,
};

// Create a voter with k=3, max 20 samples
let voter = FirstToAheadByKVoter::new(3, 20);

// Red-flag validator
let validator = StandardRedFlagValidator::strict();

// Vote with a sampler closure that queries an LLM.
// `call_my_llm` is a placeholder for your own client call.
let result = voter.vote(
    || async {
        let response = call_my_llm("What is 2+2?").await?;
        Ok(SampledResponse::new(
            response.text.clone(),
            ResponseMetadata {
                token_count: response.tokens,
                response_time_ms: response.latency,
                format_valid: true,
                finish_reason: Some("stop".into()),
                model: Some("claude-sonnet".into()),
            },
            response.text,
        ))
    },
    &validator,
).await?;

println!("Winner: {} (votes: {}/{})", result.winner, result.winner_votes, result.total_votes);
```

Estimate cost before running:

```rust
use brainwires_agents::mdap::{estimate_mdap, ModelCosts};

let estimate = estimate_mdap(
    10,                           // num_steps (subtasks)
    0.85,                         // per-step success probability
    0.99,                         // target overall success rate
    &ModelCosts::claude_sonnet(), // model pricing
    500,                          // avg input tokens per call
    200,                          // avg output tokens per call
)?;

println!("Recommended k={}, cost=${:.4}, P(success)={:.4}",
    estimate.recommended_k,
    estimate.expected_cost_usd,
    estimate.success_probability,
);
```

## Architecture

### Voting (Algorithm 2)

The core consensus mechanism. Multiple independent LLM samples are collected and the first answer to lead by k votes wins.
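
As a rough illustration of the stopping rule, the sketch below replays a fixed list of already-validated answers and reports when one answer first leads every other answer by `k`. It uses only the standard library. The function name `first_to_ahead_by_k` and the string answers are assumptions made for this example, not the crate's API, which samples asynchronously and applies red-flag validation before counting.

```rust
use std::collections::HashMap;

/// Returns the winning answer and the number of votes consumed, or `None`
/// if no answer ever gets `k` votes ahead of every other answer.
fn first_to_ahead_by_k<'a>(votes: &[&'a str], k: usize) -> Option<(&'a str, usize)> {
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for (i, &vote) in votes.iter().enumerate() {
        // Count the new vote and remember the updated tally for this answer.
        let current = {
            let entry = counts.entry(vote).or_insert(0);
            *entry += 1;
            *entry
        };
        // Highest tally among all other answers (0 if there are none).
        let runner_up = counts
            .iter()
            .filter(|(answer, _)| **answer != vote)
            .map(|(_, &count)| count)
            .max()
            .unwrap_or(0);
        // Only the answer that just received a vote can newly reach the margin.
        if current >= runner_up + k {
            return Some((vote, i + 1));
        }
    }
    None
}
```

With `k = 2` and the votes `a, b, a, a, a`, the sketch declares `a` the winner after the fourth vote (3 votes against 1).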

#### `FirstToAheadByKVoter`

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `k` | `usize` | - | Votes-ahead margin required to declare a winner |
| `max_samples` | `usize` | - | Maximum samples before giving up |
| `parallel_limit` | `usize` | `k` | Max concurrent samples |
| `batch_size` | `usize` | `k` | Samples per batch |
| `early_stopping` | `EarlyStoppingConfig` | disabled | RASC-style early stopping |
| `voting_method` | `VotingMethod` | `FirstToAheadByK` | Consensus algorithm |
| `use_confidence_weights` | `bool` | `false` | CISC confidence weighting |

**Constructors:**

| Method | Description |
|--------|-------------|
| `new(k, max_samples)` | Standard first-to-ahead-by-k voting |
| `with_early_stopping(k, max_samples, config)` | With RASC early stopping |
| `with_confidence_weighting(k, max_samples)` | CISC confidence-weighted voting |
| `with_borda_count(k, max_samples)` | Ranked voting via Borda count |

**Methods:**

| Method | Description |
|--------|-------------|
| `vote(sampler, validator)` | Execute voting — samples via `sampler`, validates via `validator`, returns `VoteResult` |
| `vote_simple(sampler)` | Simplified voting without red-flag validation |

**`VoterBuilder`** is a fluent builder: `VoterBuilder::new().k(3).max_samples(20).voting_method(VotingMethod::BordaCount).build()`

#### `VotingMethod`

| Variant | Paper | Description |
|---------|-------|-------------|
| `FirstToAheadByK` | MAKER Alg. 2 | First answer to lead by k votes wins (default) |
| `BordaCount` | [arXiv:2505.10772](https://arxiv.org/abs/2505.10772) | Ranked voting based on confidence scores |
| `ConfidenceWeighted` | [arXiv:2502.06233v1](https://arxiv.org/abs/2502.06233v1) | Votes weighted by response confidence |
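
For readers unfamiliar with Borda counting, here is a minimal standard-library sketch of the classic rule. It takes explicit ranked ballots. How the crate derives rankings from confidence scores is not shown in this document, so the ballot representation and the name `borda_winner` are assumptions for illustration only.

```rust
use std::collections::HashMap;

/// Classic Borda count: on a ballot ranking `n` candidates, the candidate in
/// position `i` (0-based) earns `n - 1 - i` points. The highest total wins.
fn borda_winner(ballots: &[Vec<&str>]) -> Option<String> {
    let mut scores: HashMap<&str, usize> = HashMap::new();
    for ballot in ballots {
        let n = ballot.len();
        for (position, &candidate) in ballot.iter().enumerate() {
            *scores.entry(candidate).or_insert(0) += n - 1 - position;
        }
    }
    // Break score ties by name so the result is deterministic.
    scores
        .into_iter()
        .max_by(|a, b| a.1.cmp(&b.1).then_with(|| b.0.cmp(a.0)))
        .map(|(candidate, _)| candidate.to_string())
}
```

For the ballots `[a, b, c]`, `[b, a, c]`, and `[a, c, b]`, the totals are a = 5, b = 3, c = 1, so the sketch returns `a`.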

#### `EarlyStoppingConfig`

RASC-style early stopping ([arXiv:2408.17017](https://arxiv.org/abs/2408.17017)) to reduce unnecessary samples when consensus is already clear.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `min_confidence` | `f64` | - | Confidence threshold to stop early |
| `min_votes` | `usize` | - | Minimum votes before early stopping is eligible |
| `enabled` | `bool` | `true` | Master toggle |
| `max_variance_threshold` | `f64` | `0.1` | Maximum vote distribution variance to trigger stop |
| `loss_of_hope_enabled` | `bool` | `true` | Stop if no candidate can possibly win |
| `min_weighted_confidence` | `f64` | `0.0` | Minimum weighted confidence for CISC |

**Presets:** `aggressive()` (stop fast), `conservative()` (higher confidence required), `disabled()`
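
One plausible reading of the loss-of-hope check under first-to-ahead-by-k voting is sketched below: if even the current leader cannot reach a lead of `k` with every remaining sample in its favor, further sampling is pointless. The function name and signature are hypothetical, and the crate's actual test (which also involves variance tracking) may differ.

```rust
/// Returns true when no answer can still reach a lead of `k`
/// within `samples_remaining` additional votes.
fn loss_of_hope(vote_counts: &[usize], k: usize, samples_remaining: usize) -> bool {
    let mut sorted = vote_counts.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a)); // descending order
    let leader = sorted.first().copied().unwrap_or(0);
    let runner_up = sorted.get(1).copied().unwrap_or(0);
    // Best case: every remaining sample votes for the current leader.
    // No other answer can do better, since it starts from a smaller lead.
    leader - runner_up + samples_remaining < k
}
```

With tallies of 4 and 3, `k = 3`, and one sample left, the best achievable lead is 2, so the sketch reports loss of hope. With two samples left it does not.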

#### `VoteResult<T>`

| Field | Type | Description |
|-------|------|-------------|
| `winner` | `T` | The winning answer |
| `winner_votes` | `usize` | Votes for the winner |
| `total_votes` | `usize` | Total valid votes cast |
| `total_samples` | `usize` | Total samples including red-flagged |
| `red_flagged_count` | `usize` | Samples that failed red-flag validation |
| `vote_distribution` | `HashMap<String, usize>` | Vote counts per unique answer |
| `confidence` | `f64` | Voting confidence score |
| `red_flag_reasons` | `Vec<String>` | Reasons for red-flagged responses |
| `early_stopped` | `bool` | Whether early stopping triggered |
| `weighted_confidence` | `Option<f64>` | CISC weighted confidence |
| `voting_method` | `VotingMethod` | Which method was used |

#### `SampledResponse<T>`

| Field | Type | Description |
|-------|------|-------------|
| `value` | `T` | The parsed/hashed response value |
| `metadata` | `ResponseMetadata` | Token count, timing, format validity, finish reason, model |
| `raw_response` | `String` | Original LLM response text |
| `confidence` | `f64` | Response confidence (for CISC weighting) |

### Microagents (MAD)

Maximal Agentic Decomposition — each microagent executes exactly one subtask (m=1 step) with a focused system prompt that discourages hedging, self-correction, and unnecessary explanation.

#### `Subtask`

| Field | Type | Description |
|-------|------|-------------|
| `id` | `String` | Unique subtask identifier |
| `description` | `String` | What to do |
| `input_state` | `Value` | Input data from prior subtasks |
| `expected_output_format` | `Option<OutputFormat>` | Expected output format for red-flag validation |
| `depends_on` | `Vec<String>` | Subtask IDs that must complete first |
| `complexity_estimate` | `f32` | Estimated difficulty (0.0–1.0) |
| `instructions` | `Option<String>` | Additional instructions |

**Constructors:** `atomic(id, description)`, `new(id, description, input_state)`

#### `MicroagentConfig`

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `max_output_tokens` | `usize` | `750` | Token limit (per paper recommendation) |
| `temperature` | `f32` | `0.1` | Low temperature for consistency |
| `system_prompt_template` | `Option<String>` | Paper default | Custom system prompt |
| `red_flag_config` | `RedFlagConfig` | strict | Red-flag validation settings |
| `timeout_ms` | `Option<u64>` | `None` | Execution timeout |

#### `MicroagentProvider` trait

```rust
#[async_trait]
pub trait MicroagentProvider: Send + Sync {
    async fn chat(
        &self,
        system_prompt: &str,
        user_prompt: &str,
        max_tokens: usize,
        temperature: f32,
    ) -> MdapResult<MicroagentResponse>;

    fn available_tools(&self) -> Vec<ToolSchema> { vec![] }
    fn has_tools(&self) -> bool { false }
}
```

#### `Microagent<P>`

| Method | Description |
|--------|-------------|
| `new(subtask, provider, config)` | Create a microagent for a specific subtask |
| `with_defaults(subtask, provider)` | Create with default config |
| `execute_once()` | Single LLM call — returns `SampledResponse` for voting |
| `execute_with_voting(voter, validator)` | Execute with voting consensus — returns `VoteResult` |

#### Confidence Extraction

`extract_response_confidence(text, metadata)` computes a 0.1–0.99 confidence score by analyzing:

- Finish reason (`stop` vs `length`)
- Response length relative to token limit
- Hedging language ("maybe", "perhaps", "I think")
- Self-correction patterns ("Wait,", "Actually,", "Let me reconsider")
- Confident assertions ("definitely", "clearly", "the answer is")
- Format validity

### Red Flags (Algorithm 3)

Output validation that catches unreliable LLM responses before they enter voting.

#### `RedFlagConfig`

| Field | Type | Default (strict) | Description |
|-------|------|-------------------|-------------|
| `max_response_tokens` | `usize` | `750` | Maximum response length |
| `require_exact_format` | `bool` | `true` | Enforce expected output format |
| `flag_self_correction` | `bool` | `true` | Flag "Wait,", "Actually,", etc. |
| `confusion_patterns` | `Vec<String>` | 10 patterns | Regex patterns indicating confused reasoning |
| `min_response_length` | `usize` | `1` | Minimum response length |
| `max_empty_line_ratio` | `f64` | `0.5` | Maximum empty line ratio |

**Presets:** `strict()` (paper-recommended), `relaxed()` (fewer false positives)

**Strict confusion patterns:** `"Wait,"`, `"Actually,"`, `"Let me reconsider"`, `"I made a mistake"`, `"On second thought"`, `"Hmm,"`, `"I think I"`, `"Let me correct"`, `"Sorry, I meant"`, `"That's not right"`

#### `StandardRedFlagValidator`

Implements Algorithm 3. Validation checks (in order):

1. **Length** — token count vs max, minimum length
2. **Self-correction** — confusion pattern regex matching
3. **Format** — expected output format matching
4. **Truncation** — finish reason analysis
5. **Empty lines** — empty line ratio
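
The order of checks can be sketched with the standard library alone. This is a simplified stand-in, not the crate's validator: it counts characters instead of tokens, matches three of the strict patterns as plain substrings rather than regexes, and skips the format check. The names `Verdict` and `check_response` are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Valid,
    Flagged(&'static str),
}

fn check_response(response: &str, finish_reason: &str, max_chars: usize) -> Verdict {
    // Three of the strict patterns, matched as plain substrings.
    const SELF_CORRECTION: [&str; 3] = ["Wait,", "Actually,", "Let me reconsider"];

    // 1. Length: characters stand in for tokens in this sketch.
    let length = response.trim().chars().count();
    if length == 0 {
        return Verdict::Flagged("too short");
    }
    if length > max_chars {
        return Verdict::Flagged("too long");
    }
    // 2. Self-correction phrases.
    if SELF_CORRECTION.iter().any(|p| response.contains(*p)) {
        return Verdict::Flagged("self-correction");
    }
    // 3. Format validation is omitted here.
    // 4. Truncation: the model ran out of tokens.
    if finish_reason == "length" {
        return Verdict::Flagged("truncated");
    }
    // 5. Empty-line ratio above 0.5.
    let total = response.lines().count();
    let empty = response.lines().filter(|l| l.trim().is_empty()).count();
    let ratio = if total == 0 { 0.0 } else { empty as f64 / total as f64 };
    if ratio > 0.5 {
        return Verdict::Flagged("too many empty lines");
    }
    Verdict::Valid
}
```

Because the checks return on the first failure, a response that is both truncated and self-correcting is reported as self-correcting in this sketch.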

#### `RedFlagResult`

| Variant | Description |
|---------|-------------|
| `Valid` | Response passed all checks |
| `Flagged { reason, severity }` | Response failed validation with reason and 0.0–1.0 severity |

#### `OutputFormat`

| Variant | Description |
|---------|-------------|
| `Exact(String)` | Must match exactly |
| `Pattern(String)` | Must match regex |
| `Json` | Must be valid JSON |
| `JsonWithFields(Vec<String>)` | JSON with required fields |
| `Markers { start, end }` | Must contain start/end markers |
| `OneOf(Vec<String>)` | Must be one of the enumerated values |
| `Custom { description, validator_id }` | Custom validation logic |

#### Other Validators

- `AcceptAllValidator` — always returns `Valid` (useful for testing)
- `CompositeValidator` — chains multiple validators; first failure wins

### Decomposition (Algorithm 4)

Breaks complex tasks into a DAG of minimal subtasks.

#### `DecomposeContext`

| Field | Type | Description |
|-------|------|-------------|
| `working_directory` | `Option<String>` | Working directory for file operations |
| `available_tools` | `Vec<ToolSchema>` | Tools available to microagents |
| `max_depth` | `usize` | Maximum recursion depth |
| `current_depth` | `usize` | Current recursion depth |
| `additional_context` | `Option<String>` | Extra context for the decomposer |

#### `DecompositionStrategy`

| Variant | Description |
|---------|-------------|
| `BinaryRecursive { max_depth }` | Paper's approach — AI-driven binary splitting (Algorithm 4) |
| `Simple { max_depth }` | Text-based splitting (testing only) |
| `Sequential` | Linear step extraction |
| `CodeOperations` | Code-specific decomposition |
| `AIDriven { discriminator_k }` | AI splitting with discriminator voting |
| `None` | Atomic — no decomposition |

#### `DecompositionResult`

| Field | Type | Description |
|-------|------|-------------|
| `subtasks` | `Vec<Subtask>` | Ordered subtask list |
| `composition_function` | `CompositionFunction` | How to combine results |
| `is_minimal` | `bool` | Whether the task was already minimal |
| `total_complexity` | `f32` | Sum of subtask complexities |

#### `TaskDecomposer` trait

```rust
#[async_trait]
pub trait TaskDecomposer: Send + Sync {
    async fn decompose(&self, task: &str, context: &DecomposeContext)
        -> MdapResult<DecompositionResult>;
    fn is_minimal(&self, task: &str) -> bool;
    fn strategy(&self) -> DecompositionStrategy;
}
```

#### `BinaryRecursiveDecomposer<P>`

AI-driven implementation of Algorithm 4. Uses the LLM with voting (k consensus) to decide how to split each task, recursing until subtasks are minimal.

**Minimal task heuristics:** very short (< 50 chars), single-action verbs (`return`, `calculate`, `get`, `set`, `check`, etc.), no multi-step conjunctions.
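
A hedged sketch of how such heuristics might be combined follows. The word lists, the way the three signals are combined, and the name `looks_minimal` are all assumptions. The document does not spell out the crate's exact rule.

```rust
/// Rough stand-in for an "is this task already minimal?" test.
fn looks_minimal(task: &str) -> bool {
    // Hypothetical word lists; the crate's actual lists are not shown here.
    const ACTION_VERBS: [&str; 5] = ["return", "calculate", "get", "set", "check"];
    const MULTI_STEP: [&str; 3] = [" and then ", " then ", " after that "];

    let lower = task.trim().to_lowercase();
    // Any multi-step conjunction means the task should be split further.
    if MULTI_STEP.iter().any(|c| lower.contains(*c)) {
        return false;
    }
    let first_word = lower.split_whitespace().next().unwrap_or("");
    let starts_with_action = ACTION_VERBS.iter().any(|v| *v == first_word);
    // Otherwise accept very short tasks or tasks led by a single-action verb.
    lower.chars().count() < 50 || starts_with_action
}
```

Under these assumptions, "Return the sum of a and b" counts as minimal, while any task containing "and then" is sent back for further splitting.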

#### `SequentialDecomposer`

Non-AI decomposer that extracts numbered steps or splits by sentences. Useful for pre-structured tasks.

#### Utilities

- `validate_decomposition(result)` — checks non-empty, valid dependencies, no circular references
- `topological_sort(subtasks)` — Kahn's algorithm for dependency ordering
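
To make the dependency ordering concrete, here is a standard-library sketch of Kahn's algorithm over `(id, depends_on)` pairs. It is not the crate's `topological_sort`. The tuple representation, the name `topo_sort`, and the choice to return `None` on cycles or unknown dependencies are assumptions for this example, and ids are assumed to be unique.

```rust
use std::collections::{HashMap, VecDeque};

/// Orders subtask ids so that every id appears after its dependencies.
/// Returns `None` on a cycle or a dependency on an unknown id.
fn topo_sort<'a>(subtasks: &[(&'a str, Vec<&'a str>)]) -> Option<Vec<String>> {
    let mut in_degree: HashMap<&'a str, usize> = HashMap::new();
    let mut dependents: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    for (id, deps) in subtasks {
        in_degree.insert(*id, deps.len());
        for dep in deps {
            dependents.entry(*dep).or_default().push(*id);
        }
    }
    // Every dependency must itself be a known subtask.
    if dependents.keys().any(|dep| !in_degree.contains_key(dep)) {
        return None;
    }
    // Start with subtasks that have no dependencies, in input order.
    let mut queue: VecDeque<&'a str> = subtasks
        .iter()
        .filter(|(_, deps)| deps.is_empty())
        .map(|(id, _)| *id)
        .collect();
    let mut ordered = Vec::with_capacity(subtasks.len());
    while let Some(id) = queue.pop_front() {
        ordered.push(id.to_string());
        if let Some(children) = dependents.get(id) {
            for child in children {
                if let Some(remaining) = in_degree.get_mut(child) {
                    *remaining -= 1;
                    if *remaining == 0 {
                        queue.push_back(*child);
                    }
                }
            }
        }
    }
    // Any subtask left unprocessed is part of a cycle.
    if ordered.len() == subtasks.len() { Some(ordered) } else { None }
}
```

Given `c` depending on `a` and `b`, and `b` depending on `a`, the sketch yields the order `a, b, c`. Two subtasks that depend on each other yield `None`.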

### Composer

Combines subtask outputs into a final result.

#### `CompositionFunction`

| Variant | Description |
|---------|-------------|
| `Identity` | Return single result as-is |
| `Concatenate` | Join as strings |
| `Sequence` | Collect into JSON array |
| `ObjectMerge` | Merge into JSON object |
| `LastOnly` | Take the last result |
| `Custom(String)` | Custom handler by name |
| `Reduce { operation }` | Reduce: `sum`, `multiply`, `max`, `min`, `and`, `or`, `concat` |
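
The numeric `Reduce` operations can be pictured with a small standard-library sketch. It works on plain `f64` values rather than JSON, covers only four of the listed operations, and the name `reduce_numbers` is hypothetical, so treat it as an illustration of the idea rather than the crate's behavior.

```rust
/// Folds subtask outputs into one number using a named operation.
/// Returns `None` for an empty input or an unsupported operation.
fn reduce_numbers(operation: &str, values: &[f64]) -> Option<f64> {
    let mut rest = values.iter().copied();
    let first = rest.next()?;
    match operation {
        "sum" => Some(rest.fold(first, |acc, v| acc + v)),
        "multiply" => Some(rest.fold(first, |acc, v| acc * v)),
        "max" => Some(rest.fold(first, f64::max)),
        "min" => Some(rest.fold(first, f64::min)),
        _ => None,
    }
}
```

For the outputs `1.0, 2.0, 3.0`, `"sum"` gives `6.0` and `"max"` gives `3.0`.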

#### `Composer`

| Method | Description |
|--------|-------------|
| `new()` | Create an empty composer |
| `register_handler(name, handler)` | Register a custom `CompositionHandler` |
| `compose(function, outputs)` | Compose subtask outputs using the given function |

#### `CompositionBuilder`

Fluent builder with input validation: `CompositionBuilder::new(function).add_result(output).compose()`

### Tool Intent

Microagents express tool *intent* without executing — this keeps voting deterministic since tool execution has side effects.

#### `ToolSchema`

| Field | Type | Description |
|-------|------|-------------|
| `name` | `String` | Tool name |
| `description` | `String` | What the tool does |
| `parameters` | `HashMap<String, String>` | Parameter names and descriptions |
| `required` | `Vec<String>` | Required parameters |
| `category` | `Option<ToolCategory>` | Tool classification |

Converts from `brainwires_core::Tool` via `From` trait.

#### `ToolIntent`

| Field | Type | Description |
|-------|------|-------------|
| `tool_name` | `String` | Which tool to call |
| `arguments` | `Value` | Tool arguments as JSON |
| `rationale` | `Option<String>` | Why this tool is needed |

#### `ToolCategory`

| Variant | Side Effects | Description |
|---------|-------------|-------------|
| `FileRead` | No | Read files |
| `FileWrite` | Yes | Write/edit files |
| `Search` | No | File/text search |
| `SemanticSearch` | No | Embedding-based search |
| `Bash` | Yes | Shell commands |
| `Git` | Yes | Git operations |
| `Web` | No | HTTP requests |
| `AgentPool` | Yes | Agent management |
| `TaskManager` | Yes | Task management |
| `Mcp` | Yes | MCP server tools |
| `Custom(String)` | - | Custom category |

`read_only_categories()` returns categories safe for microagents. `side_effect_categories()` returns categories that modify state.

#### Intent Parsing

`parse_tool_intent(response)` extracts `ToolIntent` from LLM responses containing `tool_intent` JSON blocks. Returns `IntentParseResult::NoIntent`, `WithIntent`, or `ParseError`.

### Scaling Laws (Equations 13–19)

Cost and probability estimation from the paper's mathematical framework.

#### `estimate_mdap()`

Main estimation function implementing Equations 13–19:

```rust
pub fn estimate_mdap(
    num_steps: usize,         // s: number of subtasks
    per_step_success: f64,    // p: per-step success probability (must be > 0.5)
    target_success: f64,      // t: target overall success rate, in the open interval (0, 1)
    model_costs: &ModelCosts, // pricing per 1K tokens
    avg_input_tokens: usize,  // average input tokens per call
    avg_output_tokens: usize, // average output tokens per call
) -> MdapResult<MdapEstimate>
```

#### Key Equations

| Function | Equation | Formula |
|----------|----------|---------|
| `calculate_p_full(p, k, s)` | Eq. 13 | `P_full = (1 + ((1-p)/p)^k)^(-s)` |
| `calculate_k_min(p, s, target)` | Eq. 14 | `k_min = ceil(ln(t^(-1/s) - 1) / ln((1-p)/p))` |
| `calculate_expected_votes(p, k)` | - | `E[votes] ≈ k / (2p - 1)` |
| `calculate_expected_cost(...)` | Eq. 19 | `E[cost] ≈ c·s·k / (v·(2p-1))` |
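
The formulas in the table can be checked numerically with a few lines of standard-library Rust. The functions below are a direct transcription of the formulas as written, not the crate's implementations. The names `p_full`, `k_min`, and `expected_votes`, and the `Option` return for out-of-range inputs, are choices made for this sketch.

```rust
/// Eq. 13: probability that all `s` steps succeed with margin `k`.
fn p_full(p: f64, k: u32, s: u32) -> f64 {
    (1.0 + ((1.0 - p) / p).powi(k as i32)).powf(-(s as f64))
}

/// Eq. 14: smallest `k` whose `p_full` reaches `target`.
/// Requires 0.5 < p < 1, 0 < target < 1, and s > 0.
fn k_min(p: f64, s: u32, target: f64) -> Option<u32> {
    let valid = p > 0.5 && p < 1.0 && target > 0.0 && target < 1.0 && s > 0;
    if !valid {
        return None;
    }
    let numerator = (target.powf(-1.0 / s as f64) - 1.0).ln();
    let denominator = ((1.0 - p) / p).ln();
    // Both logarithms are negative here, so the ratio is positive.
    Some((numerator / denominator).ceil().max(1.0) as u32)
}

/// Approximate expected number of votes per step.
fn expected_votes(p: f64, k: u32) -> f64 {
    k as f64 / (2.0 * p - 1.0)
}
```

For `p = 0.85`, `s = 10`, and a `0.99` target, the sketch gives `k_min = 4`: `P_full` is about 0.990 at `k = 4` but only about 0.947 at `k = 3`, and the expected number of votes per step at `k = 4` is roughly 5.7.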

#### `ModelCosts`

| Preset | Input/1K | Output/1K |
|--------|----------|-----------|
| `claude_sonnet()` | $0.003 | $0.015 |
| `claude_haiku()` | $0.00025 | $0.00125 |
| `gpt4o()` | $0.0025 | $0.01 |
| `gpt4o_mini()` | $0.00015 | $0.0006 |

#### `MdapEstimate`

| Field | Type | Description |
|-------|------|-------------|
| `expected_cost_usd` | `f64` | Estimated total cost |
| `expected_api_calls` | `usize` | Estimated total API calls |
| `success_probability` | `f64` | Overall success probability |
| `recommended_k` | `usize` | Minimum k for target success rate |
| `estimated_time_seconds` | `f64` | Estimated wall-clock time |
| `per_step_success` | `f64` | Per-step success probability used |
| `num_steps` | `usize` | Number of subtasks |

#### `suggest_k_for_budget()`

Budget-constrained k selection — finds the largest k affordable within a dollar budget.
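
A back-of-the-envelope version of this selection can be derived from the expected-cost approximation `E[cost] ≈ c·s·k / (v·(2p-1))` by solving for the largest whole `k` under the budget. The sketch below does that. It reads `c` as the dollar cost of one sample and `v` as the fraction of samples that pass red-flag validation, which is an interpretation not stated in this document, and `largest_affordable_k` is a hypothetical name rather than the crate's function.

```rust
/// Largest `k` whose approximate expected cost stays within `budget_usd`.
/// Returns `None` when the inputs are out of range or even k = 1 is too expensive.
fn largest_affordable_k(
    cost_per_sample: f64, // c: dollars per LLM call (assumed meaning)
    num_steps: u32,       // s: number of subtasks
    p: f64,               // per-step success probability, must exceed 0.5
    valid_rate: f64,      // v: fraction of samples passing validation (assumed meaning)
    budget_usd: f64,
) -> Option<u32> {
    if p <= 0.5 || valid_rate <= 0.0 || cost_per_sample <= 0.0 || num_steps == 0 {
        return None;
    }
    // Expected cost grows linearly in k, so compute the cost per unit of k.
    let cost_per_unit_k = cost_per_sample * num_steps as f64 / (valid_rate * (2.0 * p - 1.0));
    let k = (budget_usd / cost_per_unit_k).floor();
    if k >= 1.0 { Some(k as u32) } else { None }
}
```

Using the `claude_haiku()` prices from the table above, 500 input and 200 output tokens cost about $0.000375 per sample. With 10 steps, `p = 0.85`, `v = 1.0`, and a $0.50 budget, this sketch yields `k = 93`. The crate's function may apply caps or other adjustments that are not modeled here.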

### Metrics

Full observability into MDAP execution.

#### `MdapMetrics`

Comprehensive metrics covering execution, steps, sampling, voting, cost, time, and success rate.

| Method | Description |
|--------|-------------|
| `new(execution_id)` | Create new metrics tracker |
| `with_config(config_summary)` | Attach configuration snapshot |
| `start()` | Record start time |
| `finalize(success)` | Record end time and success |
| `record_subtask(metric)` | Record per-subtask metrics |
| `record_voting_round(metric)` | Record per-round metrics |
| `add_sample_cost(input_tokens, output_tokens, cost)` | Accumulate cost |
| `summary()` | Human-readable summary string |
| `red_flag_analysis()` | Red-flag breakdown string |
| `to_json()` / `from_json()` | Serialization |

#### `SubtaskMetric`

| Field | Type | Description |
|-------|------|-------------|
| `subtask_id` | `String` | Which subtask |
| `description` | `String` | Subtask description |
| `samples_needed` | `usize` | Samples taken to reach consensus |
| `red_flags_hit` | `usize` | Red-flagged samples |
| `red_flag_reasons` | `Vec<String>` | Why samples were flagged |
| `final_confidence` | `f64` | Voting confidence |
| `execution_time_ms` | `u64` | Wall-clock time |
| `winner_votes` / `total_votes` | `usize` | Vote counts |
| `succeeded` | `bool` | Whether the subtask succeeded |
| `input_tokens` / `output_tokens` | `usize` | Token usage |
| `complexity_estimate` | `f32` | Subtask complexity |

### Error Handling

`MdapError` is a comprehensive error enum with sub-error types for each component:

| Variant | Sub-errors | Description |
|---------|-----------|-------------|
| `Voting(VotingError)` | MaxSamplesExceeded, AllSamplesRedFlagged, InvalidK, etc. | Voting failures |
| `RedFlag(RedFlagError)` | ResponseTooLong, SelfCorrectionDetected, InvalidJson, etc. | Validation failures |
| `Decomposition(DecompositionError)` | MaxDepthExceeded, CircularDependency, etc. | Decomposition failures |
| `Microagent(MicroagentError)` | ExecutionFailed, Timeout, ContextTooLarge, etc. | Execution failures |
| `Composition(CompositionError)` | MissingResult, IncompatibleTypes, etc. | Composition failures |
| `Scaling(ScalingError)` | InvalidSuccessProbability, VotingCannotConverge, etc. | Estimation failures |
| `Config(MdapConfigError)` | InvalidK, InvalidTargetSuccessRate, etc. | Configuration errors |
| `ToolRecursionLimit` | - | Tool intent recursion exceeded |
| `ToolExecutionFailed` | - | Tool execution failure |
| `ToolNotAllowed` | - | Tool not permitted for microagent |

**Helper methods:** `is_retryable()`, `is_user_error()`, `is_tool_error()`, `is_red_flag()`, `should_restart_voting()`

## Usage Examples

### Voting with red-flag validation

```rust
use brainwires_agents::mdap::prelude::*;

let voter = FirstToAheadByKVoter::new(3, 20);
let validator = StandardRedFlagValidator::strict();

let result = voter.vote(
    || async { sample_llm_response().await },
    &validator,
).await?;

println!("Winner: {}", result.winner);
println!("Confidence: {:.2}", result.confidence);
println!("Red-flagged: {}", result.red_flagged_count);
```

### Voting with early stopping (RASC)

```rust
use brainwires_agents::mdap::{FirstToAheadByKVoter, EarlyStoppingConfig};

let voter = FirstToAheadByKVoter::with_early_stopping(
    3,
    20,
    EarlyStoppingConfig::aggressive(),
);

let result = voter.vote(sampler, &validator).await?;
if result.early_stopped {
    println!("Stopped early at {} samples", result.total_samples);
}
```

### Confidence-weighted voting (CISC)

```rust
use brainwires_agents::mdap::FirstToAheadByKVoter;

let voter = FirstToAheadByKVoter::with_confidence_weighting(3, 20);

let result = voter.vote(sampler, &validator).await?;
println!("Weighted confidence: {:?}", result.weighted_confidence);
```

### Builder pattern for voter configuration

```rust
use brainwires_agents::mdap::{VoterBuilder, VotingMethod, EarlyStoppingConfig};

let voter = VoterBuilder::new()
    .k(5)
    .max_samples(30)
    .voting_method(VotingMethod::BordaCount)
    .early_stopping(EarlyStoppingConfig::conservative())
    .parallel_limit(4)
    .build()?;
```

### Cost estimation before execution

```rust
use brainwires_agents::mdap::{estimate_mdap, ModelCosts, suggest_k_for_budget};

// What k do I need for 99% success on 10 steps?
let estimate = estimate_mdap(10, 0.85, 0.99, &ModelCosts::claude_sonnet(), 500, 200)?;
println!("Need k={}, cost=${:.4}", estimate.recommended_k, estimate.expected_cost_usd);

// What k can I afford with $0.50?
let k = suggest_k_for_budget(10, 0.85, &ModelCosts::claude_haiku(), 500, 200, 0.50)?;
println!("Budget allows k={}", k);
```

### Task decomposition

```rust
use brainwires_agents::mdap::{
    decomposition::{BinaryRecursiveDecomposer, DecomposeContext, validate_decomposition, topological_sort},
};

let decomposer = BinaryRecursiveDecomposer::new(provider, 4, 3, 15);
let context = DecomposeContext::new().with_max_depth(4);

let result = decomposer.decompose("Implement an LRU cache with get/put", &context).await?;
validate_decomposition(&result)?;

let ordered = topological_sort(&result.subtasks)?;
for subtask in &ordered {
    println!("{}: {}", subtask.id, subtask.description);
}
```

### Composing subtask results

```rust
use brainwires_agents::mdap::{Composer, CompositionFunction, SubtaskOutput};

let composer = Composer::new();

let outputs = vec![
    SubtaskOutput::new("step-1", serde_json::json!("struct definition")),
    SubtaskOutput::new("step-2", serde_json::json!("impl block")),
    SubtaskOutput::new("step-3", serde_json::json!("tests")),
];

let final_result = composer.compose(&CompositionFunction::Concatenate, &outputs)?;
```

### Tracking metrics

```rust
use brainwires_agents::mdap::{MdapMetrics, SubtaskMetric, ConfigSummary};

let mut metrics = MdapMetrics::new("exec-001".into());
metrics.start();

metrics.record_subtask(SubtaskMetric {
    subtask_id: "step-1".into(),
    description: "Parse input".into(),
    samples_needed: 5,
    red_flags_hit: 1,
    final_confidence: 0.95,
    succeeded: true,
    // Remaining fields take their default values.
    ..Default::default()
});

metrics.finalize(true);
println!("{}", metrics.summary());
println!("{}", metrics.red_flag_analysis());
```

## Integration

Use via the `brainwires` facade crate with the `mdap` feature, or depend on `brainwires-agents` directly:

```toml
# Via facade
[dependencies]
brainwires = { version = "0.8", features = ["mdap"] }

# Direct
[dependencies]
brainwires-agents = { version = "0.8", features = ["mdap"] }
```

The `prelude` module re-exports the most commonly used types:

```rust
use brainwires_agents::mdap::prelude::*;
```

## License

Licensed under the MIT License. See [LICENSE](../../LICENSE) for details.