task-graph-mcp 0.2.1

MCP server for agent task workflows with phases, prompts, gates, and multi-agent coordination
# Task Graph MCP - Experiment Metrics


> **Version:** 1.0  
> **Last Updated:** 2026-01-28  
> **Status:** Design Document

This document defines the metrics to capture for multi-agent experiment analysis. These metrics enable evaluation of agent coordination efficiency, resource utilization, and workflow optimization.

---

## Table of Contents


- [Overview](#overview)
- [Time Metrics](#time-metrics)
  - [Wall-Clock Time](#wall-clock-time)
  - [Time Blocked vs Working](#time-blocked-vs-working)
- [Token Metrics](#token-metrics)
- [Task Distribution Metrics](#task-distribution-metrics)
- [Quality Metrics](#quality-metrics)
  - [Rework Rate](#rework-rate)
- [Throughput Metrics](#throughput-metrics)
- [Coordination Overhead](#coordination-overhead)
- [Data Collection](#data-collection)
- [Visualization Approaches](#visualization-approaches)
- [Analysis Use Cases](#analysis-use-cases)

---

## Overview


Metrics fall into six categories:

| Category | Purpose |
|----------|---------|
| **Time** | Measure duration and efficiency of work |
| **Tokens** | Track LLM resource consumption |
| **Distribution** | Analyze workload balance across agents |
| **Quality** | Assess rework and error rates |
| **Throughput** | Measure task completion velocity |
| **Coordination** | Quantify multi-agent overhead |

---

## Time Metrics


### Wall-Clock Time


Wall-clock time measures elapsed real-world time at various granularities.

#### Total Experiment Time


**Definition:** Elapsed time from first task creation to last task completion.

**Data Collection:**
```sql
SELECT 
  MIN(created_at) AS experiment_start,
  MAX(completed_at) AS experiment_end,
  (MAX(completed_at) - MIN(created_at)) AS total_duration_ms
FROM tasks
WHERE status = 'completed';
```

**Use Cases:**
- Compare overall experiment duration across different configurations
- Establish baseline for single-agent vs multi-agent comparisons
- Calculate experiment cost (duration * agent count * rate)
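
The cost formula in the last bullet can be sketched as follows. This is a minimal illustration, assuming the rate is expressed in USD per agent-hour and that durations are in milliseconds as elsewhere in this document. The function `experiment_cost` and its parameter names are hypothetical and not part of task-graph.

```python
MS_PER_HOUR = 3_600_000


def experiment_cost(duration_ms: int, agent_count: int, rate_per_agent_hour: float) -> float:
    """Estimate cost as duration (in hours) * agent count * hourly rate per agent."""
    return duration_ms / MS_PER_HOUR * agent_count * rate_per_agent_hour


# Example: a 2-hour experiment with 3 agents at an assumed 0.50 USD per agent-hour.
print(experiment_cost(7_200_000, 3, 0.50))
```

This treats every agent as billed for the full experiment duration, which overestimates cost when agents join late or sit idle.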

---

#### Per-Task Time


**Definition:** Time spent actively working on each task (from `started_at` to `completed_at` or accumulated in `time_actual_ms`).

**Data Collection:**
- **Estimated:** `time_estimate_ms` (set at task creation)
- **Actual:** `time_actual_ms` (accumulated automatically from timed states)
- **Timestamps:** `started_at`, `completed_at`, `claimed_at`

```sql
SELECT 
  id,
  title,
  time_estimate_ms,
  time_actual_ms,
  (completed_at - started_at) AS elapsed_ms,
  CASE 
    WHEN time_estimate_ms > 0 
    THEN (time_actual_ms * 100.0 / time_estimate_ms)
    ELSE NULL 
  END AS estimate_accuracy_pct
FROM tasks
WHERE status = 'completed';
```

**Use Cases:**
- Identify tasks that exceed estimates (planning improvement)
- Find task types that are consistently fast/slow
- Train estimation models based on task characteristics

---

#### Per-Phase Time


**Definition:** Aggregate time spent in each workflow phase across all tasks.

**Data Collection:**
`task_state_sequence` records status changes in its `event` field, not workflow phases. To attribute time to a phase, tag each task with its phase or add a dedicated phase field. The query below assumes the first tag names the phase.

```sql
-- Using tags to identify phases
SELECT 
  json_extract(tags, '$[0]') AS phase,
  COUNT(*) AS task_count,
  SUM(time_actual_ms) AS total_time_ms,
  AVG(time_actual_ms) AS avg_time_ms
FROM tasks
WHERE status = 'completed'
GROUP BY json_extract(tags, '$[0]');
```

**Use Cases:**
- Identify bottleneck phases in the workflow
- Balance agent allocation across phases
- Optimize phase-specific tooling or prompts

---

### Time Blocked vs Working


**Definition:** Ratio of time tasks spend waiting (blocked by dependencies or unclaimed) versus actively being worked on.

**Data Collection:**
From `task_state_sequence`, calculate time in each state:

```sql
-- Time in each state per task
SELECT 
  task_id,
  event AS state,
  SUM(COALESCE(end_timestamp, strftime('%s','now')*1000) - timestamp) AS duration_ms
FROM task_state_sequence
GROUP BY task_id, event;

-- Aggregate blocked vs working
SELECT 
  CASE 
    WHEN event IN ('pending', 'assigned') THEN 'blocked'
    WHEN event = 'working' THEN 'working'
    ELSE 'other'
  END AS category,
  SUM(COALESCE(end_timestamp, strftime('%s','now')*1000) - timestamp) AS total_ms
FROM task_state_sequence
GROUP BY category;
```

**Metrics Derived:**
- **Blocked Time:** Sum of time in `pending` + `assigned` states
- **Working Time:** Sum of time in `working` state
- **Blocking Ratio:** `blocked_time / (blocked_time + working_time)`
- **Queue Wait Time:** Time from task creation to first claim
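
The derived metrics above can also be computed outside the database once the state intervals are exported. The following is a minimal sketch, assuming each row is an `(event, timestamp, end_timestamp)` tuple in milliseconds and that an open state has `end_timestamp` set to `None`, mirroring the `COALESCE` in the SQL. The name `blocking_ratio` is illustrative only.

```python
from typing import Iterable, Optional, Tuple

# (event, start_ms, end_ms); end_ms is None while the state is still open.
Interval = Tuple[str, int, Optional[int]]

BLOCKED_STATES = {"pending", "assigned"}
WORKING_STATES = {"working"}


def blocking_ratio(intervals: Iterable[Interval], now_ms: int) -> Optional[float]:
    """Return blocked_time / (blocked_time + working_time), or None if both are zero."""
    blocked = 0
    working = 0
    for event, start_ms, end_ms in intervals:
        # Open intervals are measured up to "now", like the SQL COALESCE.
        duration = (end_ms if end_ms is not None else now_ms) - start_ms
        if event in BLOCKED_STATES:
            blocked += duration
        elif event in WORKING_STATES:
            working += duration
        # Any other state (for example completed or failed) is ignored.
    total = blocked + working
    return blocked / total if total > 0 else None
```

Returning `None` for an empty history avoids a division by zero and keeps "no data" distinct from a ratio of zero.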

**Use Cases:**
- Identify dependency bottlenecks (high blocked time)
- Detect under-provisioned agent pools (long queue wait)
- Optimize task ordering to minimize blocking

---

## Token Metrics


### Token Consumption


**Definition:** LLM tokens consumed, categorized by type and attribution.

**Data Collection:**
The `tasks` table tracks tokens at the task level:

| Column | Description |
|--------|-------------|
| `tokens_in` | Input tokens (prompt) |
| `tokens_cached` | Cached/reused tokens |
| `tokens_out` | Output tokens (completion) |
| `tokens_thinking` | Reasoning/chain-of-thought tokens |
| `tokens_image` | Image processing tokens |
| `tokens_audio` | Audio processing tokens |

```sql
-- Total tokens by type
SELECT 
  SUM(tokens_in) AS total_input,
  SUM(tokens_cached) AS total_cached,
  SUM(tokens_out) AS total_output,
  SUM(tokens_thinking) AS total_thinking,
  SUM(tokens_in + tokens_out + tokens_thinking) AS total_billable
FROM tasks;

-- Tokens per agent
SELECT 
  worker_id,
  COUNT(*) AS tasks_completed,
  SUM(tokens_in) AS input_tokens,
  SUM(tokens_out) AS output_tokens,
  SUM(cost_usd) AS total_cost
FROM tasks
WHERE status = 'completed' AND worker_id IS NOT NULL
GROUP BY worker_id;
```

**Metrics Derived:**
- **Cache Hit Rate:** `tokens_cached / (tokens_in + tokens_cached)`
- **Output Ratio:** `tokens_out / tokens_in`
- **Thinking Overhead:** `tokens_thinking / tokens_out`
- **Cost per Task:** `cost_usd / task_count`
- **Cost per Token:** `cost_usd / total_tokens`
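
As a sketch of how these ratios might be derived from aggregated totals: the `TokenTotals` container and `derive_token_metrics` function below are hypothetical helpers, and `total_tokens` is assumed to mean the billable sum used in the SQL above (`tokens_in + tokens_out + tokens_thinking`), since the document does not define it separately.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TokenTotals:
    tokens_in: int
    tokens_cached: int
    tokens_out: int
    tokens_thinking: int
    cost_usd: float
    task_count: int


def safe_div(numerator: float, denominator: float) -> Optional[float]:
    """Divide, returning None instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else None


def derive_token_metrics(t: TokenTotals) -> Dict[str, Optional[float]]:
    # Assumption: "total tokens" is the billable sum from the SQL example.
    total_tokens = t.tokens_in + t.tokens_out + t.tokens_thinking
    return {
        "cache_hit_rate": safe_div(t.tokens_cached, t.tokens_in + t.tokens_cached),
        "output_ratio": safe_div(t.tokens_out, t.tokens_in),
        "thinking_overhead": safe_div(t.tokens_thinking, t.tokens_out),
        "cost_per_task": safe_div(t.cost_usd, t.task_count),
        "cost_per_token": safe_div(t.cost_usd, total_tokens),
    }
```

Each ratio is `None` when its denominator is zero, so an experiment with no logged tokens does not raise.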

**Use Cases:**
- Compare prompt efficiency across different agent configurations
- Identify tasks with unexpectedly high token usage
- Optimize caching strategies (higher cache hit = lower cost)
- Budget forecasting for large experiments

---

### Per-Agent Token Analysis


**Definition:** Token usage attributed to individual agents to identify efficiency differences.

**Data Collection:**
```sql
SELECT 
  worker_id,
  COUNT(*) AS task_count,
  SUM(tokens_in) AS tokens_in,
  SUM(tokens_out) AS tokens_out,
  AVG(tokens_in) AS avg_input_per_task,
  AVG(tokens_out) AS avg_output_per_task,
  SUM(cost_usd) AS total_cost,
  AVG(cost_usd) AS avg_cost_per_task
FROM tasks
WHERE status = 'completed' AND worker_id IS NOT NULL
GROUP BY worker_id
ORDER BY total_cost DESC;
```

**Use Cases:**
- Identify verbose vs concise agents
- Detect agents that may be stuck in loops (high token usage, low completion)
- Balance workload to optimize total cost

---

## Task Distribution Metrics


### Tasks per Agent


**Definition:** Number of tasks claimed, completed, and failed by each agent.

**Data Collection:**
```sql
SELECT 
  worker_id,
  COUNT(*) AS total_claimed,
  SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) AS completed,
  SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) AS failed,
  SUM(CASE WHEN status = 'working' THEN 1 ELSE 0 END) AS in_progress,
  AVG(time_actual_ms) AS avg_time_per_task
FROM tasks
WHERE worker_id IS NOT NULL
GROUP BY worker_id;
```

**Metrics Derived:**
- **Completion Rate:** `completed / total_claimed`
- **Failure Rate:** `failed / total_claimed`
- **Load Balance Score:** Standard deviation of task counts across agents
- **Gini Coefficient:** Inequality measure for task distribution
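
The last two metrics are not expressible in a single SQLite aggregate, so they are likely easier to compute after export. The sketch below assumes a list of per-agent task counts (for example the `total_claimed` column) and uses the population standard deviation; the document does not say whether population or sample deviation is intended. The function names are illustrative.

```python
from statistics import pstdev
from typing import Sequence


def load_balance_score(task_counts: Sequence[int]) -> float:
    """Population standard deviation of task counts; 0 means a perfectly even load."""
    return pstdev(task_counts)  # raises StatisticsError on an empty sequence


def gini(task_counts: Sequence[int]) -> float:
    """Gini coefficient: 0 for an even split, (n - 1) / n when one agent does everything."""
    n = len(task_counts)
    total = sum(task_counts)
    if n == 0 or total == 0:
        return 0.0
    ordered = sorted(task_counts)
    # Standard rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # with ranks i = 1..n over counts sorted in ascending order.
    weighted = sum(rank * count for rank, count in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

The standard deviation scales with the absolute number of tasks, while the Gini coefficient is scale-free, so the latter is more suitable for comparing experiments of different sizes.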

**Use Cases:**
- Detect overloaded or underutilized agents
- Identify agents with high failure rates (may need different task types)
- Tune claiming strategies for better balance

---

### Task Type Distribution


**Definition:** Distribution of task types/phases across agents.

**Data Collection:**
```sql
-- Using tags as task type indicator
SELECT 
  worker_id,
  json_extract(tags, '$[0]') AS task_type,
  COUNT(*) AS count
FROM tasks
WHERE worker_id IS NOT NULL
GROUP BY worker_id, json_extract(tags, '$[0]');
```

**Use Cases:**
- Verify agents are claiming tasks matching their capabilities
- Identify specialization patterns
- Optimize `needed_tags`/`wanted_tags` requirements

---

## Quality Metrics


### Rework Rate


**Definition:** Percentage of tasks that were reopened after being marked complete, indicating quality issues.

**Data Collection:**
Track state transitions in `task_state_sequence`:

```sql
-- Tasks with multiple working periods
SELECT 
  task_id,
  COUNT(*) AS working_periods
FROM task_state_sequence
WHERE event = 'working'
GROUP BY task_id
HAVING COUNT(*) > 1;

-- Rework rate
WITH rework AS (
  SELECT 
    task_id,
    COUNT(*) AS working_periods
  FROM task_state_sequence
  WHERE event = 'working'
  GROUP BY task_id
)
SELECT 
  COUNT(CASE WHEN working_periods > 1 THEN 1 END) AS reworked_tasks,
  COUNT(*) AS total_tasks,
  (COUNT(CASE WHEN working_periods > 1 THEN 1 END) * 100.0 / COUNT(*)) AS rework_rate_pct
FROM rework;
```

**Metrics Derived:**
- **Rework Rate:** Tasks with >1 working period / total tasks
- **Rework Cycles:** Average number of working periods for reworked tasks
- **Rework Time:** Additional time spent on reworked tasks
- **First-Pass Success Rate:** Tasks completed in single working period
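
A minimal sketch of these derived metrics, assuming the input is a mapping from task ID to its number of working periods (the output of the first query above, extended to include tasks with exactly one period). The function name `rework_metrics` is hypothetical.

```python
from typing import Dict, Optional


def rework_metrics(working_periods: Dict[str, int]) -> Dict[str, Optional[float]]:
    """Derive rework rate, first-pass success rate, and average rework cycles."""
    total = len(working_periods)
    if total == 0:
        return {"rework_rate": None, "first_pass_success_rate": None, "avg_rework_cycles": None}
    # A task counts as reworked when it entered the working state more than once.
    reworked = [periods for periods in working_periods.values() if periods > 1]
    return {
        "rework_rate": len(reworked) / total,
        "first_pass_success_rate": (total - len(reworked)) / total,
        # Average number of working periods among reworked tasks only.
        "avg_rework_cycles": sum(reworked) / len(reworked) if reworked else None,
    }
```

As in the SQL, tasks that never entered the `working` state are not part of the denominator.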

**Use Cases:**
- Identify systemic quality issues in task definitions
- Compare agent accuracy (high rework = potential skill mismatch)
- Measure impact of review processes

---

### Failed Task Analysis


**Definition:** Analysis of tasks that ended in `failed` state.

**Data Collection:**
```sql
SELECT 
  t.id,
  t.title,
  t.worker_id,
  t.time_actual_ms,
  tss.reason AS failure_reason
FROM tasks t
LEFT JOIN task_state_sequence tss ON t.id = tss.task_id AND tss.event = 'failed'
WHERE t.status = 'failed';
```

**Use Cases:**
- Categorize failure reasons
- Identify patterns (certain task types, agents, or times)
- Improve task definitions or agent capabilities

---

## Throughput Metrics


### Tasks per Hour


**Definition:** Rate of task completion over time.

**Data Collection:**
```sql
-- Hourly throughput
SELECT 
  strftime('%Y-%m-%d %H:00:00', completed_at / 1000, 'unixepoch') AS hour,
  COUNT(*) AS tasks_completed
FROM tasks
WHERE status = 'completed'
GROUP BY hour
ORDER BY hour;

-- Overall throughput
SELECT 
  COUNT(*) AS total_completed,
  (MAX(completed_at) - MIN(started_at)) / 3600000.0 AS duration_hours,
  COUNT(*) / ((MAX(completed_at) - MIN(started_at)) / 3600000.0) AS tasks_per_hour
FROM tasks
WHERE status = 'completed';
```

**Metrics Derived:**
- **Instantaneous Throughput:** Tasks completed in rolling window
- **Sustained Throughput:** Tasks/hour over full experiment
- **Peak Throughput:** Maximum hourly completion rate
- **Throughput per Agent:** tasks_per_hour / agent_count
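
The following sketch shows one way to compute these four metrics from exported completion timestamps. It assumes timestamps are Unix epoch milliseconds, as in the SQL above, and that hourly buckets align to UTC clock hours. All function names are illustrative.

```python
from collections import Counter
from typing import Optional, Sequence

MS_PER_HOUR = 3_600_000


def rolling_throughput(completed_at_ms: Sequence[int], window_end_ms: int, window_ms: int) -> int:
    """Count completions in the half-open window (window_end - window, window_end]."""
    window_start_ms = window_end_ms - window_ms
    return sum(1 for ts in completed_at_ms if window_start_ms < ts <= window_end_ms)


def peak_throughput(completed_at_ms: Sequence[int]) -> int:
    """Largest number of completions falling into any single clock hour."""
    per_hour = Counter(ts // MS_PER_HOUR for ts in completed_at_ms)
    return max(per_hour.values(), default=0)


def sustained_throughput(first_started_ms: int, completed_at_ms: Sequence[int]) -> Optional[float]:
    """Tasks per hour from the first start to the last completion, or None if undefined."""
    if not completed_at_ms:
        return None
    duration_hours = (max(completed_at_ms) - first_started_ms) / MS_PER_HOUR
    if duration_hours <= 0:
        return None
    return len(completed_at_ms) / duration_hours


def throughput_per_agent(tasks_per_hour: float, agent_count: int) -> Optional[float]:
    """Normalize throughput by the number of agents."""
    return tasks_per_hour / agent_count if agent_count > 0 else None
```

Guarding against a zero duration matters here: the overall-throughput SQL above divides by zero when an experiment contains a single instantaneous task.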

**Use Cases:**
- Measure scaling efficiency (throughput vs agent count)
- Identify throughput degradation over time
- Compare workflow configurations

---

### Points per Hour


**Definition:** Story points or complexity units completed per hour (for weighted throughput).

**Data Collection:**
```sql
SELECT 
  SUM(points) AS total_points,
  (MAX(completed_at) - MIN(started_at)) / 3600000.0 AS duration_hours,
  SUM(points) / ((MAX(completed_at) - MIN(started_at)) / 3600000.0) AS points_per_hour
FROM tasks
WHERE status = 'completed' AND points IS NOT NULL;
```

**Use Cases:**
- Account for task complexity in throughput calculations
- Compare experiments with different task mixes
- Velocity tracking for sprint planning

---

## Coordination Overhead


**Definition:** Time and resources spent on coordination rather than direct task work.

### Components


| Component | Description | Measurement |
|-----------|-------------|-------------|
| **Claim Contention** | Time spent retrying failed claims | Count of claim attempts vs successes |
| **File Lock Waiting** | Time blocked waiting for file access | Time between mark attempt and success |
| **Dependency Blocking** | Time waiting for blockers to complete | Time in `pending` state due to deps |
| **Communication** | Token usage for coordination messages | Tokens in `thinking` calls |
| **Heartbeat Traffic** | Resources for liveness checks | Count of heartbeat updates |

### Data Collection


```sql
-- Claim contention (from state transitions)
SELECT 
  agent_id,
  COUNT(*) AS claim_attempts,
  SUM(CASE WHEN event = 'working' THEN 1 ELSE 0 END) AS successful_claims
FROM task_state_sequence
WHERE event IN ('assigned', 'working')
GROUP BY agent_id;

-- File coordination overhead
SELECT 
  COUNT(*) AS total_marks,
  COUNT(CASE WHEN event = 'released' THEN 1 END) AS releases,
  AVG(CASE 
    WHEN event = 'released' AND claim_id IS NOT NULL 
    THEN timestamp - (SELECT timestamp FROM claim_sequence c2 WHERE c2.id = claim_sequence.claim_id)
    ELSE NULL 
  END) AS avg_hold_time_ms
FROM claim_sequence;

-- Dependency blocking time
WITH blocked_time AS (
  SELECT 
    task_id,
    SUM(COALESCE(end_timestamp, strftime('%s','now')*1000) - timestamp) AS pending_ms
  FROM task_state_sequence
  WHERE event = 'pending'
  GROUP BY task_id
)
SELECT 
  AVG(pending_ms) AS avg_blocked_ms,
  SUM(pending_ms) AS total_blocked_ms
FROM blocked_time;
```

### Overhead Ratio


**Definition:** Proportion of total time spent on coordination vs productive work.

```sql
WITH times AS (
  SELECT 
    SUM(CASE WHEN event = 'working' THEN 
      COALESCE(end_timestamp, strftime('%s','now')*1000) - timestamp 
    ELSE 0 END) AS working_ms,
    SUM(CASE WHEN event IN ('pending', 'assigned') THEN 
      COALESCE(end_timestamp, strftime('%s','now')*1000) - timestamp 
    ELSE 0 END) AS waiting_ms
  FROM task_state_sequence
)
SELECT 
  working_ms,
  waiting_ms,
  (waiting_ms * 100.0 / (working_ms + waiting_ms)) AS overhead_pct
FROM times;
```

**Use Cases:**
- Identify coordination bottlenecks
- Compare pull vs push task assignment efficiency
- Optimize dependency structures to reduce blocking

---

## Data Collection


### Automatic Collection


The following data is collected automatically by the task-graph system:

| Data | Storage | Trigger |
|------|---------|---------|
| Task timestamps | `tasks` table | CRUD operations |
| State transitions | `task_state_sequence` | Status changes |
| Time actual | `tasks.time_actual_ms` | Exit from timed state |
| File marks | `claim_sequence` | mark/unmark operations |
| Worker heartbeats | `workers.last_heartbeat` | Any tool call |

### Manual Collection


Some metrics require explicit logging:

| Data | How to Collect |
|------|----------------|
| Token counts | Call `log_metrics` with token values |
| Cost USD | Call `log_metrics` with cost |
| Custom metrics | Use `metrics` array (8 integer slots) |

```javascript
// Example: Log tokens after task completion
log_metrics({
  agent: "worker-1",
  task: "task-123",
  values: [1500, 500, 0, 0, 0, 0, 0, 0],  // [in, out, cached, thinking, ...]
  cost_usd: 0.0023
});
```

### Export for Analysis


Use the export functionality to extract data for external analysis:

```bash
# Export full database snapshot
task-graph export --format json --output experiment-data.json

# Export specific tables
task-graph query --sql "SELECT * FROM tasks" --format csv > tasks.csv
task-graph query --sql "SELECT * FROM task_state_sequence" --format csv > transitions.csv
```

---

## Visualization Approaches


### Time Metrics


| Metric | Visualization | Tool Suggestions |
|--------|---------------|------------------|
| Total duration | Single value card | Any dashboard |
| Per-task time | Histogram, box plot | Matplotlib, Plotly |
| Phase time | Stacked bar chart | D3.js, Tableau |
| Blocked vs working | Pie chart, timeline | Gantt chart tools |

### Token Metrics


| Metric | Visualization | Tool Suggestions |
|--------|---------------|------------------|
| Token breakdown | Stacked area chart | Plotly |
| Per-agent tokens | Grouped bar chart | Matplotlib |
| Cost over time | Line chart | Grafana |
| Cache hit rate | Gauge | Dashboard widgets |

### Distribution Metrics


| Metric | Visualization | Tool Suggestions |
|--------|---------------|------------------|
| Tasks per agent | Bar chart | Any |
| Load balance | Heatmap | Seaborn |
| Task flow | Sankey diagram | D3.js |
| Agent activity | Timeline/Gantt | Custom |

### Quality Metrics


| Metric | Visualization | Tool Suggestions |
|--------|---------------|------------------|
| Rework rate | Trend line | Time series |
| Failure analysis | Treemap by category | D3.js |
| State transitions | State diagram | Custom |

### Throughput Metrics


| Metric | Visualization | Tool Suggestions |
|--------|---------------|------------------|
| Tasks/hour | Line chart | Any |
| Cumulative completion | Area chart | Plotly |
| Scaling efficiency | Tasks/agent scatter | Matplotlib |

### Coordination Overhead


| Metric | Visualization | Tool Suggestions |
|--------|---------------|------------------|
| Overhead ratio | Pie/donut chart | Any |
| File contention | Heatmap by file | Custom |
| Dependency graph | Network diagram | D3.js, vis.js |
| Blocking cascade | Critical path diagram | MS Project style |

---

## Analysis Use Cases


### Experiment Comparison


Compare different configurations (agent count, task structure, coordination model):

```sql
-- Assumes tags is a JSON object with an "experiment" key; adjust the path if tags is an array, as in the other queries
SELECT 
  json_extract(tags, '$.experiment') AS experiment,
  COUNT(*) AS tasks,
  SUM(time_actual_ms) / 1000.0 AS total_seconds,
  SUM(cost_usd) AS total_cost,
  COUNT(*) / (SUM(time_actual_ms) / 3600000.0) AS tasks_per_hour
FROM tasks
WHERE status = 'completed'
GROUP BY json_extract(tags, '$.experiment');
```

### Bottleneck Identification


Find tasks or phases that slow down the overall system:

```sql
-- Tasks with longest blocked time
SELECT 
  t.id,
  t.title,
  SUM(tss.end_timestamp - tss.timestamp) AS blocked_ms
FROM tasks t
JOIN task_state_sequence tss ON t.id = tss.task_id
WHERE tss.event = 'pending' AND tss.end_timestamp IS NOT NULL
GROUP BY t.id
ORDER BY blocked_ms DESC
LIMIT 10;
```

### Agent Performance Ranking


Evaluate agent efficiency across multiple dimensions:

```sql
SELECT 
  worker_id,
  COUNT(*) AS tasks_completed,
  AVG(time_actual_ms) AS avg_time_ms,
  SUM(cost_usd) / COUNT(*) AS cost_per_task,
  SUM(CASE WHEN points IS NOT NULL THEN points ELSE 1 END) / 
    (SUM(time_actual_ms) / 3600000.0) AS velocity
FROM tasks
WHERE status = 'completed' AND worker_id IS NOT NULL
GROUP BY worker_id
ORDER BY velocity DESC;
```

### Cost Optimization


Identify opportunities to reduce experiment cost:

```sql
-- High-cost tasks
SELECT 
  id,
  title,
  cost_usd,
  tokens_in,
  tokens_out,
  (tokens_in + tokens_out) / cost_usd AS tokens_per_dollar
FROM tasks
WHERE cost_usd > 0
ORDER BY cost_usd DESC
LIMIT 20;

-- Cost by phase
SELECT 
  json_extract(tags, '$[0]') AS phase,
  SUM(cost_usd) AS total_cost,
  AVG(cost_usd) AS avg_cost,
  COUNT(*) AS task_count
FROM tasks
WHERE status = 'completed'
GROUP BY phase
ORDER BY total_cost DESC;
```

---

## Appendix: Metric Summary Table


| Metric | Formula | Unit | Target Direction |
|--------|---------|------|------------------|
| Total Duration | `max(completed_at) - min(created_at)` | ms | Lower |
| Avg Task Time | `mean(time_actual_ms)` | ms | Lower |
| Blocking Ratio | `blocked_time / (blocked_time + working_time)` | % | Lower |
| Token Efficiency | `output_tokens / input_tokens` | ratio | Higher |
| Cache Hit Rate | `cached / (input + cached)` | % | Higher |
| Task Balance | `stddev(tasks_per_agent)` | count | Lower |
| Rework Rate | `reworked_tasks / total_tasks` | % | Lower |
| Throughput | `completed_tasks / duration_hours` | tasks/hr | Higher |
| Coordination Overhead | `waiting_time / (working_time + waiting_time)` | % | Lower |
| Cost per Task | `total_cost / task_count` | USD | Lower |

---

## Document History


| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2026-01-28 | Initial metrics definition |

---

*This document defines the experiment metrics framework. Implementation details for automated collection and reporting will be added as the experiment system is built.*