tinyagents 0.1.0

A Rust LLM orchestration library inspired by LangChain and LangGraph.
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
# RustAgents System Specification

RustAgents is a Rust-native LLM application framework inspired by LangChain,
LangGraph, and CodeAct-style recursive language-model runtimes. The system is
organized around five modules:

1. the harness
2. the graph
3. the registry
4. the expressive language
5. the REPL language

The goal is to make agent systems easy to define, inspect, run, test, and
eventually serialize without hiding the Rust types that make production systems
reliable.

## Reference Positioning

RustAgents should synthesize the reference systems rather than clone any one of
them:

- LangGraph contributes the durable execution model: explicit state graphs,
  virtual `START` and `END`, Pregel-style supersteps, reducers/channels,
  commands, `Send` fanout, checkpointing, interrupts, subgraphs, streaming, and
  time travel.
- LangChain contributes the harness model: provider-neutral models, tools,
  middleware, runtime context, memory, retrieval, structured output, tracing,
  usage, cost, and conformance tests for integrations.
- `rust-langgraph` shows the Rust-facing precedent for a stateful graph runtime
  with nodes, conditional edges, checkpoints, streaming, optional model
  adapters, and ReAct/tool helpers. RustAgents should go deeper on typed state,
  harness composition, registries, and language-backed graph definitions.
- OpenHuman PR #4261 contributes the closest product-shaped precedent: a
  harness-decoupled graph engine, persistent checkpoints, HITL, graph
  observability, blueprints, JSON-RPC run control, and a behavior-preserving
  cutover from an implicit turn loop to an explicit phase machine.
- RLM contributes the REPL/code-act model: context and prompts as runtime
  values, recursive sub-model or sub-agent calls as functions, persistent
  session variables, trajectory logging, and sandbox choices.

The target architecture is therefore layered: the harness owns model/tool
execution and policies, the graph owns deterministic state transition and
durability, the registry owns named capabilities, `.rag` owns serializable graph
blueprints, and `.ragsh` owns capability-bound interactive orchestration. No
layer should bypass another layer's safety, policy, observability, or test
contracts.

## Detailed Module Docs

- [Harness module]../modules/harness/README.md
  - [Context]../modules/harness/context.md
  - [Model and providers]../modules/harness/model.md
  - [Embeddings and retrieval]../modules/harness/embeddings.md
  - [Prompt]../modules/harness/prompt.md
  - [Tool]../modules/harness/tool.md
  - [Middleware]../modules/harness/middleware.md
  - [Sub-agent and orchestrator steering]../modules/harness/subagent-steering.md
  - [Structured output]../modules/harness/structured-output.md
  - [Limits, retry, fallback, and rate limiting]../modules/harness/limits-retry.md
  - [Summarization]../modules/harness/summarization.md
  - [Usage]../modules/harness/usage.md
  - [Cost]../modules/harness/cost.md
  - [Cache]../modules/harness/cache.md
  - [Streaming]../modules/harness/streaming.md
  - [Store]../modules/harness/store.md
  - [Observability and events]../modules/harness/observability.md
  - [Testkit]../modules/harness/testkit.md
- [Graph module]../modules/graph/README.md
  - [Package and core types]../modules/graph/package.md
  - [Builder and compile contract]../modules/graph/builder.md
  - [Node model]../modules/graph/nodes.md
  - [State, channels, and updates]../modules/graph/state-channels.md
  - [Edges, routing, commands, and sends]../modules/graph/routing.md
  - [Execution model and parallelization]../modules/graph/execution.md
  - [Parallel agents and context forking]../modules/graph/parallel-agents-forking.md
  - [Checkpointing, durability, state inspection, and time travel]../modules/graph/checkpointing.md
  - [Interrupts and resume]../modules/graph/interrupts.md
  - [Streaming and events]../modules/graph/streaming.md
  - [Observability and tracing]../modules/graph/observability.md
  - [Runtime context and policies]../modules/graph/runtime-policy.md
  - [Fault tolerance]../modules/graph/fault-tolerance.md
  - [Subgraphs]../modules/graph/subgraphs.md
  - [Sub-agents and recursion]../modules/graph/subagents-recursion.md
  - [Memory and stores boundary]../modules/graph/memory-boundary.md
  - [Visualization, introspection, and testkit]../modules/graph/visualization-testkit.md
  - [Implementation milestones]../modules/graph/milestones.md
- [Registry module]../modules/registry/README.md
  - [Design]../modules/registry/design.md
  - [Model catalog and local snapshots]../modules/registry/model-catalog.md
- [Expressive language module]../modules/expressive-language/README.md
- [REPL language module]../modules/repl-language/README.md
  - [Design]../modules/repl-language/design.md

Docs should follow the module layout. Do not place standalone specification
files directly in `docs/` or `docs/modules/`; each high-level topic should have
its own directory with a `README.md` entrypoint and any supporting files beside
it.

## Design Goals

- Make simple agent workflows concise.
- Make complex workflows explicit, inspectable, and testable.
- Treat graph execution as a first-class runtime, not an incidental callback
  chain.
- Keep model providers, tools, memory, and tracing behind stable traits.
- Support both Rust builder APIs and a compact expressive language for workflow
  definitions.
- Support a capability-bound REPL language for interactive graph and harness
  orchestration.
- Allow agents to author, inspect, compile, and run graph blueprints through the
  same registry-bound compiler path used by human-authored `.rag` files.
- Allow parent orchestrators and humans to steer orchestrator agents and
  sub-agents through typed, policy-checked, observable commands.
- Prefer deterministic state transitions around inherently nondeterministic LLM
  calls.
- Keep every generated or hand-authored graph explainable as topology,
  capabilities, policies, state channels, checkpoints, and events.

## Module 1: Harness

The harness is the outer runtime for LLM applications. In LangChain terms, this
is the layer around a model call that owns the agent loop, prompt/context
assembly, tool execution, middleware, memory, streaming, tracing, retries, and
testability.

The harness must stay composable. It should not be a single monolithic `Agent`
type that hides every behavior. A direct model call, a model-plus-tools loop, and
a graph node that invokes a model should all share the same harness primitives.

### Source Inspiration

The harness design is informed by LangChain's docs on agents, chat models, tools,
runtime context, memory, structured output, middleware, streaming, tracing, and
testing:

- <https://docs.langchain.com/oss/python/langchain/agents>
- <https://docs.langchain.com/oss/python/langchain/models>
- <https://docs.langchain.com/oss/python/langchain/tools>
- <https://docs.langchain.com/oss/python/langchain/runtime>
- <https://docs.langchain.com/oss/python/langchain/short-term-memory>
- <https://docs.langchain.com/oss/python/langchain/structured-output>
- <https://docs.langchain.com/oss/python/langchain/middleware/built-in>
- <https://docs.langchain.com/oss/python/langchain/streaming>
- <https://docs.langchain.com/oss/python/langchain/observability>
- <https://docs.langchain.com/oss/python/langchain/test>

### Responsibilities

- Register chat model providers.
- Resolve model calls from request overrides, reusable state, model hints,
  agent defaults, registry defaults, and fallback policy.
- Register tools and validate tool calls against schemas.
- Build model requests from state, prompts, memory, and runtime context.
- Apply prompt and message templates.
- Preserve provider prompt/KV-cache stability by keeping cacheable prompt
  prefixes deterministic and isolating volatile context near the tail of model
  requests.
- Manage per-run config such as run ids, thread ids, metadata, tags, deadlines,
  max concurrency, model limits, tool limits, and
  cancellation.
- Provide middleware hooks before and after model calls, tool calls, and errors.
- Provide middleware hooks during streaming model calls so compression,
  redaction, observability, and adaptive context algorithms can inspect deltas
  without replacing provider adapters.
- Emit typed events for observability and streaming.
- Write readable run status records for direct model calls, agent loops, and
  graph-node child harness calls.
- Maintain append-only event journals when durable listener replay is
  configured.
- Enforce retry, timeout, model-call, tool-call, and recursion policies.
- Accept sub-agent and orchestrator steering commands from humans, parent
  agents, graph supervisors, middleware, and tests at safe loop boundaries.
- Normalize model and tool errors into framework errors.
- Persist resolved model identity in responses, events, usage/cost records, run
  status, and durable agent or graph state so later calls can reuse it.
- Provide test doubles for models, tools, stores, clocks, and ids.

### Core Types

```rust
pub struct AgentHarness<State, Ctx = ()> {
    models: ModelRegistry<State, Ctx>,
    tools: ToolRegistry<State, Ctx>,
    middleware: MiddlewareStack<State, Ctx>,
    policy: RunPolicy,
}

pub struct RunConfig {
    pub run_id: String,
    pub thread_id: Option<String>,
    pub tags: Vec<String>,
    pub metadata: serde_json::Value,
    pub timeout_ms: Option<u64>,
    pub max_model_calls: usize,
    pub max_tool_calls: usize,
}

pub struct RunContext<Ctx = ()> {
    pub config: RunConfig,
    pub data: Ctx,
    pub stores: StoreRegistry,
    pub events: EventSink,
}
```

`RunConfig` is stable invocation identity and policy. `RunContext` is the
per-run dependency bag. Keeping those separate prevents global state and makes
unit tests straightforward.

### Model Abstraction

Models should be provider-agnostic. The graph layer should never know whether a
node uses OpenAI, Anthropic, Ollama, a local model, or a test fake.

```rust
#[async_trait]
pub trait ChatModel<State>: Send + Sync {
    async fn invoke(
        &self,
        state: &State,
        request: ModelRequest,
    ) -> Result<ModelResponse>;

    async fn stream(
        &self,
        state: &State,
        request: ModelRequest,
    ) -> Result<ModelStream> {
        default_stream_from_invoke(self, state, request).await
    }
}
```

`ModelRequest` should grow beyond the current minimal version:

- model hints and reusable resolved-model policy
- messages
- tools available for this call
- tool choice policy
- response format
- model id/provider override
- temperature
- max tokens
- timeout
- retry policy
- local response cache policy
- provider prompt-cache policy
- cacheable prompt prefix boundaries
- ephemeral/non-cacheable context boundaries
- prompt layout fingerprint
- tags and metadata

`ModelResponse` and agent state should record a `ResolvedModel` with registry
name, provider, provider model id, catalog snapshot/entry when known, resolver
source, and fallback history. This record is the durable answer to which model
actually ran, and it may be reused by later calls when policy allows.

Provider prompt caching is different from local response caching. The harness
must support extreme prompt caching for providers with KV-cache or
prompt-prefix-cache behavior. That means request construction must be able to
mark stable message and tool-schema prefixes, preserve their byte/token order
across turns, and append volatile state, retrieved context, scratchpads, and
per-run metadata after those stable prefixes. Middleware that compresses,
trims, summarizes, or injects context must declare whether it changes the
cacheable prefix, the volatile tail, or only non-model-visible metadata.

The cache contract should prevent accidental KV-cache busting:

- stable system prompts, policy text, tool declarations, schema text, and
  reusable instruction blocks should have explicit prefix segment ids
- volatile values such as timestamps, run ids, retrieved documents, current
  tool results, and user-specific ephemeral context should stay out of the
  cacheable prefix unless a policy explicitly opts in
- request builders should preserve segment order and canonical serialization
- middleware must emit a cache-layout event when it mutates prompt segments
- tests should be able to assert whether a change preserves or invalidates the
  provider prompt-cache prefix

Initial provider implementations should be optional feature flags:

- `openai`
- `anthropic`
- `ollama`
- `mock`

### Message Model

Messages are the internal currency of the harness. The framework should not pass
raw strings after initial user input normalization.

```rust
pub enum Message {
    System(SystemMessage),
    User(UserMessage),
    Assistant(AssistantMessage),
    Tool(ToolMessage),
}

pub enum ContentBlock {
    Text(String),
    Json(serde_json::Value),
    Image(ImageRef),
    ProviderExtension(serde_json::Value),
}
```

The message model should preserve:

- role
- content blocks
- assistant tool calls
- tool call ids
- tool result ids
- usage metadata
- provider extensions

Tool call ids are mandatory once tool execution is implemented because they are
the correlation key between assistant requests and tool messages.

### Tool Abstraction

Tools are typed capabilities exposed to agents. The initial executor can accept
JSON arguments, but the registry should store schema metadata from the start.

```rust
#[async_trait]
pub trait Tool<State>: Send + Sync {
    fn name(&self) -> &str;
    fn description(&self) -> &str;
    fn schema(&self) -> ToolSchema;
    async fn call(&self, state: &State, call: ToolCall) -> Result<ToolResult>;
}
```

Tool calls must be observable and replayable. Each call should record:

- tool name
- arguments
- result content
- raw provider result when available
- elapsed time
- error details

Tool names should be ASCII and `snake_case` by default. This keeps names
portable across providers that are strict about tool naming.

### Agent Loop

The default harness loop should be:

1. Build `RunContext`.
2. Load short-term memory for `thread_id` when configured.
3. Build a `ModelRequest`.
4. Run pre-request middleware that can edit prompts, context, cache layout,
   compression state, and provider options.
5. Run wrap middleware around the invoke or stream call for retry, fallback,
   rate limiting, tracing, and replacement.
6. Run streaming middleware while model deltas arrive, including compression,
   redaction, tool-call reconstruction, usage accounting, and adaptive
   cancellation.
7. Run post-response middleware that can validate, compress, summarize,
   persist, or transform the model response.
8. If the assistant produced tool calls, validate and execute them.
9. Append tool result messages.
10. Repeat until no tool calls remain or limits are reached.
11. Persist updated short-term memory and return the final output.

Limits are not optional. The harness should enforce:

- maximum model calls per run
- maximum tool calls per run
- maximum wall-clock duration
- maximum retries per call
- optional maximum concurrency for parallel tool calls

### Middleware

Middleware is the primary extension point for behavior that should not be baked
into the model or graph APIs.

```rust
#[async_trait]
pub trait Middleware<State, Ctx = ()>: Send + Sync {
    async fn before_agent(&self, ctx: &mut RunContext<Ctx>, state: &State) -> Result<()>;
    async fn after_agent(&self, ctx: &mut RunContext<Ctx>, state: &State, run: &mut AgentRun) -> Result<()>;
    async fn before_model(&self, ctx: &mut RunContext<Ctx>, state: &State, request: &mut ModelRequest) -> Result<()>;
    async fn on_model_delta(&self, ctx: &mut RunContext<Ctx>, state: &State, delta: &mut ModelDelta) -> Result<()>;
    async fn after_model(&self, ctx: &mut RunContext<Ctx>, state: &State, response: &mut ModelResponse) -> Result<()>;
    async fn before_tool(&self, ctx: &mut RunContext<Ctx>, state: &State, call: &mut ToolCall) -> Result<()>;
    async fn on_tool_delta(&self, ctx: &mut RunContext<Ctx>, state: &State, delta: &mut ToolDelta) -> Result<()>;
    async fn after_tool(&self, ctx: &mut RunContext<Ctx>, state: &State, result: &mut ToolResult) -> Result<()>;
    async fn on_error(&self, ctx: &mut RunContext<Ctx>, error: &RustAgentsError) -> Result<()>;
}
```

Wrap middleware should also exist around model calls and tool calls. A
compression algorithm often needs to wrap the entire model operation so it can
prepare context before the call, inspect streaming deltas during the call, and
commit summaries or cache metadata after the final response.

Expected middleware:

- retry and timeout policy
- prompt injection
- prompt cache layout protection
- provider prompt-cache/KV-cache hints
- dynamic tool filtering
- guardrails
- context compression
- transcript compression
- retrieved-context compression
- output compression
- streaming delta compression
- message trimming
- summarization
- structured output validation
- tracing
- rate limiting

### Memory

Memory should be a harness capability. The graph runtime should handle
checkpointed graph execution; the harness should handle conversation and
application memory.

Memory is split into two concepts:

- short-term memory: thread-scoped conversation state, usually backed by graph
  checkpoints or a conversation checkpoint store
- long-term memory: cross-thread application data exposed through a store trait

Memory backends should start with:

- in-memory store for tests
- file-backed store for local development
- trait boundary for external stores

Trimming and summarization should be explicit policies, not hidden behavior.
Compression is a broader middleware family than summarization. The harness
should support pre-call compression of old messages and retrieved context,
during-call compression or redaction of streaming deltas, and post-call
compression of transcripts, tool artifacts, reasoning traces, and memory
records. Compression middleware must preserve provenance: the original source
ids, token estimates, cache segment ids, and enough metadata to explain why a
message was removed, replaced, or summarized.

### Structured Output

The harness should support typed output using two strategies:

- provider-native schema enforcement when the model supports it
- tool-call-based structured output fallback

The user-facing API should allow:

```rust
let output: MyType = harness
    .with_response_format(ResponseFormat::json_schema::<MyType>())
    .invoke(state)
    .await?
    .structured_response()?;
```

The final structured value should be separate from final chat messages so users
can inspect both.

### Observability

Every run should be traceable through typed events and readable through a
compact execution status store. The status store is the answer to "what is this
run doing now?"; the event stream and journal are the answer to "what happened?"

The canonical feature references are:

- [Harness observability and events]../modules/harness/observability.md
- [Harness store]../modules/harness/store.md
- [Harness streaming]../modules/harness/streaming.md
- [Harness cache]../modules/harness/cache.md

At minimum, the harness should emit:

- run started
- model requested
- model token delta
- model responded
- tool requested
- tool token or progress delta
- tool responded
- state update
- middleware started
- middleware completed
- retry scheduled
- route selected
- run completed
- run failed

The event stream should be structured data so it can feed logs,
OpenTelemetry, test recorders, durable JSONL/MongoDB journals, or a custom UI.

```rust
pub enum AgentEvent {
    RunStarted { run_id: String, thread_id: Option<String> },
    ModelStarted { call_id: String, model: String },
    ModelDelta { call_id: String, delta: MessageDelta },
    ModelCompleted { call_id: String, usage: Option<Usage> },
    ToolStarted { call_id: String, tool_name: String },
    ToolCompleted { call_id: String, tool_name: String },
    RetryScheduled { call_id: String, attempt: usize },
    RunCompleted { run_id: String },
    RunFailed { run_id: String, error: String },
}
```

The harness should also expose a compact run-status record:

```rust
pub struct HarnessRunStatus {
    pub run_id: RunId,
    pub parent_run_id: Option<RunId>,
    pub root_run_id: RunId,
    pub thread_id: Option<ThreadId>,
    pub component: ComponentId,
    pub status: ExecutionStatus,
    pub current_phase: HarnessPhase,
    pub model_calls: usize,
    pub tool_calls: usize,
    pub active_model_call: Option<CallId>,
    pub active_tool_calls: Vec<CallId>,
    pub last_event_id: Option<EventId>,
    pub usage: UsageTotals,
    pub cost: CostTotals,
    pub started_at: SystemTime,
    pub updated_at: SystemTime,
    pub ended_at: Option<SystemTime>,
    pub error: Option<HarnessErrorSummary>,
}
```

Status records are operational snapshots. They should not include full prompts,
tool outputs, or raw provider payloads. Event journals are append-only and
should support listener replay by stream offset. Derived observability
projections such as latest status, usage rollups, cost rollups, and timing
summaries may be cached, but every cached projection must include a source event
offset and projection version.

### Testability

The harness should ship a `testkit` module early. It should include:

- fake chat model with scripted responses
- fake streaming model
- fake tool
- in-memory stores
- deterministic run id generator
- deterministic clock
- event recorder
- trajectory assertions that check tool calls and state changes without relying
  on exact LLM prose

## Module 2: Graph

The graph is the workflow runtime. It executes stateful nodes, applies state
updates, follows direct or conditional edges, records execution history, handles
interrupts, and returns a final state.

The first implementation can stay sequential, but the module should be designed
toward LangGraph's durable execution model: compiled graphs, virtual `START` and
`END` nodes, supersteps, reducer-driven state updates, checkpoints, interrupts,
commands, streaming, and subgraphs.

### Source Inspiration

The graph design is informed by LangGraph's docs on the graph API, reducers,
commands, persistence, checkpointers, interrupts, streaming, subgraphs, and fault
tolerance:

- <https://docs.langchain.com/oss/python/langgraph/graph-api>
- <https://docs.langchain.com/oss/python/langgraph/persistence>
- <https://docs.langchain.com/oss/python/langgraph/checkpointers>
- <https://docs.langchain.com/oss/python/langgraph/interrupts>
- <https://docs.langchain.com/oss/python/langgraph/streaming>
- <https://docs.langchain.com/oss/python/langgraph/event-streaming>
- <https://docs.langchain.com/oss/python/langgraph/use-subgraphs>
- <https://docs.langchain.com/oss/python/langgraph/fault-tolerance>

### Responsibilities

- Store named nodes.
- Store direct and conditional edges.
- Validate graph structure at compile time.
- Produce an immutable executable graph.
- Run async node handlers.
- Route based on node output or command output.
- Apply partial state updates through reducers.
- Enforce recursion limits.
- Persist checkpoints at safe boundaries.
- Support interrupts and resume.
- Stream typed execution events.
- Write readable execution status records for graph runs.
- Maintain append-only graph event journals for external listeners.
- Cache derived graph observability projections without making them the source
  of truth.
- Return final state and execution history.
- Support graph visualization and serialization later.

### Core Concepts

`State` is user-owned application state. RustAgents should never require a
specific state shape for hand-written Rust graphs.

`Node<State>` is an async unit of work.

`NodeOutput<State>` controls execution in the current scaffold:

- `Continue(State)` follows a direct edge.
- `Route { state, route }` follows a conditional edge.
- `End(State)` stops execution.

The target design should evolve this into partial updates and commands:

```rust
pub enum NodeResult<Update> {
    Update(Update),
    Command(Command<Update>),
    Interrupt(Interrupt),
}

pub struct Command<Update> {
    pub update: Option<Update>,
    pub goto: Vec<NodeId>,
    pub resume: Option<serde_json::Value>,
}
```

`GraphBuilder<State, Update>` should own graph construction. `CompiledGraph`
should own execution. This separates user-friendly mutation during setup from a
validated immutable runtime.

```rust
let graph = GraphBuilder::new()
    .add_node("agent", agent_node)
    .add_node("tools", tools_node)
    .add_edge(START, "agent")
    .add_conditional_edges("agent", route_agent)
    .add_edge("tools", "agent")
    .compile()?;
```

### State Updates And Reducers

LangGraph nodes return partial state updates. RustAgents should adopt the same
direction because it enables parallel execution, replay, checkpointing, and
clearer node contracts.

The default reducer should be overwrite. Users should be able to opt into
reducers for fields that accumulate values:

- append list
- merge messages by id
- set union
- numeric min/max
- custom reducer

Possible Rust shape:

```rust
pub trait Reducer<T>: Send + Sync {
    fn reduce(&self, current: T, update: T) -> Result<T>;
}

pub trait StateReducer<State, Update>: Send + Sync {
    fn apply(&self, state: State, update: Update) -> Result<State>;
}
```

For milestone 1, whole-state updates are acceptable. For durable parallel graph
execution, partial updates and reducers should be introduced before
checkpoint/resume semantics harden.

### Graph Lifecycle

1. Define state.
2. Define update type if partial updates are enabled.
3. Create graph builder.
4. Add nodes.
5. Add direct or conditional edges.
6. Add `START` edge.
7. Compile and validate the graph.
8. Run graph with initial state and runtime config.
9. Inspect final state, checkpoints, events, and visited nodes.

### Routing Semantics

Direct routing:

```text
START -> agent -> summarize -> END
```

Conditional routing:

```text
START -> agent
agent --tool--> tools
agent --final--> END
tools ---------> agent
```

Conditional routes may start as explicit strings. Later versions should support
typed route enums or route newtypes so Rust users can avoid typo-prone strings.

Nodes should not mix static outgoing edges and dynamic command-based routing in
the same execution mode unless the behavior is deliberately specified. A strict
compile-time validation rule is preferable: a node has either normal outgoing
edges or command routing, not both.

### Supersteps

The target executor should be superstep-based:

1. Take the current active node set.
2. Run all active nodes for the step, respecting concurrency policy.
3. Collect partial state updates, commands, interrupts, and errors.
4. Apply reducers at the step boundary.
5. Persist a checkpoint.
6. Select the next active nodes.
7. Stop when the active set is empty or reaches `END`.

The first implementation can run one node at a time, but checkpointing and
parallel execution should use superstep boundaries as the durable unit. Do not
checkpoint mid-node.

### Checkpointing And Persistence

Graph checkpointing is not the same as harness memory. Checkpoints are
thread-scoped graph execution snapshots used for resume, interrupts, and fault
tolerance.

```rust
#[async_trait]
pub trait Checkpointer<State>: Send + Sync {
    async fn put(&self, checkpoint: Checkpoint<State>) -> Result<CheckpointId>;
    async fn get(&self, thread_id: &str, checkpoint_id: Option<&str>) -> Result<Option<Checkpoint<State>>>;
    async fn list(&self, thread_id: &str) -> Result<Vec<CheckpointMetadata>>;
}
```

A checkpoint should contain:

- thread id
- checkpoint id
- parent checkpoint id
- namespace
- state snapshot
- next active nodes
- completed tasks for the superstep
- pending writes
- interrupts
- metadata

Interrupted or failed nodes may rerun from the beginning. Node authors must make
side effects idempotent or isolate side effects behind tools/middleware that can
record exactly-once intent.

### Interrupts And Resume

Interrupts support human-in-the-loop and external approval flows.

```rust
pub struct Interrupt {
    pub id: String,
    pub node: NodeId,
    pub payload: serde_json::Value,
}
```

Resume should use a command-style API:

```rust
graph.resume(
    RunConfig::thread("support-123"),
    Command::resume(json!({ "approved": true })),
).await?;
```

The default semantic should match LangGraph: resuming restarts the interrupted
node and replays until the interrupt point using stored resume values. That is
more durable than trying to suspend an async Rust stack.

### Streaming

The graph should expose low-level runtime events, higher-level projections, a
status store, and optional durable replay for outside listeners. The canonical
feature references are:

- [Graph streaming and events]../modules/graph/streaming.md
- [Graph observability and tracing]../modules/graph/observability.md
- [Graph checkpointing and state inspection]../modules/graph/checkpointing.md
- [Graph memory and stores boundary]../modules/graph/memory-boundary.md

Low-level events:

- node started
- node completed
- node failed
- state update
- checkpoint saved
- task scheduled
- interrupt emitted
- route selected

High-level stream modes:

- values: full state snapshots
- updates: partial state updates
- messages: model/message deltas emitted by harness nodes
- debug: verbose executor events
- interrupts: interrupt payloads
- custom: user events

The graph should also expose a compact run-status record:

```rust
pub struct GraphRunStatus {
    pub run_id: RunId,
    pub root_run_id: RunId,
    pub parent_run_id: Option<RunId>,
    pub thread_id: Option<ThreadId>,
    pub graph_id: GraphId,
    pub checkpoint_id: Option<CheckpointId>,
    pub checkpoint_namespace: Vec<String>,
    pub status: ExecutionStatus,
    pub current_step: usize,
    pub active_nodes: Vec<NodeId>,
    pub pending_interrupts: Vec<InterruptId>,
    pub last_event_id: Option<EventId>,
    pub started_at: SystemTime,
    pub updated_at: SystemTime,
    pub ended_at: Option<SystemTime>,
    pub error: Option<GraphErrorSummary>,
}
```

Graph status records are not checkpoints. Checkpoints preserve resumable graph
state; status records summarize live and recent execution for observers. A
graph event journal should let listeners subscribe live or replay from a stored
offset by run id, root run id, thread id, graph id, node id, event kind, or
namespace. Derived projections such as latest status by thread, task timing
rollups, checkpoint summaries, and introspection snapshots may be cached when
they include source coordinates: run id, checkpoint id, namespace, step, event
offset, and projection version.

### Subgraphs

Subgraphs should be executable graphs that can be used as nodes.

Two modes are needed:

- shared-state subgraph: parent and child graph use the same state channels
- adapter subgraph: wrapper node maps parent state into child state and maps the
  child result back into parent state

Checkpoint namespaces are required so parent and child checkpoint ids do not
collide.

### Execution Guarantees

The graph runtime should guarantee:

- every visited node existed at validation time
- every configured edge points to an existing node
- conditional routes fail clearly when missing
- recursion limit failures are deterministic
- checkpoint writes happen at configured execution boundaries
- interrupted runs can be resumed only when checkpointing is configured
- final state is returned exactly once

The graph runtime should not guarantee:

- deterministic LLM output
- tool idempotency
- provider-specific retry behavior
- persistence across process restarts unless a checkpointer is configured
- exactly-once side effects inside node code

### Future Graph Features

- graph serialization to JSON
- Mermaid export
- parallel branches
- joins
- typed route enums
- static graph analysis
- graph diffing
- graph snapshots for tests
- durable task queue integration

## Module 3: Expressive Language

The expressive language is a compact way to define agent workflows without
writing all builder calls manually. It should compile into the same graph and
harness types as Rust code.

This language is not meant to replace Rust. It is a workflow definition layer for
fast iteration, examples, documentation, and eventually user-authored agent
plans.

It is also the safe boundary for agent-authored graph plans. A REPL or model may
propose `.rag` source, but that source must pass through the same parser,
diagnostics, registry binding, allowlist checks, review gates, and graph
compiler as human-authored source before it can run.

### Goals

- Make common agent graphs readable at a glance.
- Keep syntax close to graph intent.
- Compile into explicit RustAgents structures.
- Preserve source locations for helpful errors.
- Avoid embedding arbitrary code in the first version.
- Describe state channels, reducers, policies, subgraphs, sub-agents,
  interrupts, joins, and fanout as declarative graph primitives.
- Produce inspectable blueprints that can be reviewed, diffed, registered, and
  tested.

### Non-Goals

- It is not a general-purpose programming language.
- It is not a prompt templating language by itself.
- It should not execute untrusted code.
- It should not bypass Rust type checks for stateful logic.
- It should not install model-generated topology directly into the graph
  runtime.

### Initial Syntax Sketch

```rustagents
graph support_agent {
  defaults {
    recursion_limit 50
    checkpoint inherit
  }

  start agent

  channel messages messages
  channel tool_calls append

  node agent {
    kind agent
    model "default"
    prompt "You are a concise support agent."
    tools ["lookup_user", "create_ticket"]
    routes {
      tool_call -> tools
      final -> END
    }
  }

  node tools {
    kind tool_executor
    next agent
  }
}
```

### Minimal Grammar

```text
program       = graph_decl*
graph_decl    = "graph" ident "{" graph_item* "}"
graph_item    = start_decl | defaults_decl | channel_decl | node_decl | edge_decl
start_decl    = "start" ident
defaults_decl = "defaults" object
channel_decl  = "channel" ident reducer_ref
node_decl     = "node" ident "{" node_item* "}"
node_item     = kind_decl | model_decl | prompt_decl | tools_decl | next_decl | routes_decl
kind_decl     = "kind" ident
model_decl    = "model" string
prompt_decl   = "prompt" string
tools_decl    = "tools" "[" string_list? "]"
next_decl     = "next" ident
routes_decl   = "routes" "{" route_decl* "}"
route_decl    = ident "->" (ident | "END")
edge_decl     = ident "->" ident
```

The full language target is broader than this minimal grammar. It should grow
toward commands, `Send` fanout, joins/barriers, subgraphs, sub-agents,
`repl_agent` nodes, interrupts, registered route functions, graph defaults,
capability allowlists, blueprint provenance, and deterministic graph diffs. See
[the expressive language module](../modules/expressive-language/README.md) for
the canonical target.

### Compilation Pipeline

1. Parse source into an AST.
2. Validate identifiers and route targets.
3. Lower AST into graph builder calls.
4. Bind model and tool references through the harness.
5. Return a compiled workflow object.

### Error Requirements

Errors should include:

- file name when available
- line and column
- invalid token or missing token
- unknown node name
- duplicate node name
- missing start node
- route target that does not exist
- model or tool reference that is not registered in the harness

### Runtime Relationship

The expressive language should produce the same runtime structures as hand-written
Rust:

```text
source -> parser -> AST -> compiler -> StateGraph<State> + Harness bindings
```

The graph runtime should not know whether a graph came from Rust builders or the
expressive language.

For generated source, the runtime relationship is:

```text
REPL/model proposal -> .rag source or AST -> parser -> diagnostics -> resolver
  -> policy/review gate -> compiler -> GraphBuilder + Harness bindings
  -> CompiledGraph -> optional registry registration
```

## Package Layout

Target module layout:

```text
src/
  chat.rs
  error.rs
  graph.rs
  harness.rs
  language/
    ast.rs
    lexer.rs
    parser.rs
    compiler.rs
    mod.rs
  model.rs
  tool.rs
```

Provider implementations should live behind feature flags:

```text
src/providers/
  openai.rs
  anthropic.rs
  ollama.rs
  mock.rs
```

## Milestones

### Milestone 1: Core Runtime

- Chat message primitives.
- Model trait.
- Tool trait.
- State graph with direct and conditional edges.
- Basic tests and examples.

### Milestone 2: Harness

- Harness type.
- Model registry.
- Tool registry.
- Run context.
- Callback events.
- Run status store.
- Durable event journal.
- Cache-backed observability projections.
- Mock model and mock tool utilities.

### Milestone 3: Expressive Language Preview

- AST.
- Parser for a small graph definition language.
- Compiler into `StateGraph`.
- Helpful parse and validation errors.
- Example `.rag` or `.rustagents` workflow file.

### Milestone 4: Provider Integrations

- OpenAI chat model provider.
- Anthropic chat model provider.
- Local/mock provider.
- Provider feature flags.

### Milestone 5: Production Runtime Features

- Streaming events.
- Checkpointing.
- Resume support.
- Graph run status store.
- Graph event journal and listener replay.
- Graph export.
- Tracing integration.

## Open Questions

- Should the expressive language file extension be `.rag`, `.rustagents`, or
  something shorter?
- Should state schemas be declared in the language, or should state remain purely
  Rust-owned?
- Should graph nodes support typed route enums before serialization support?
- Should provider crates live in this crate behind feature flags or in separate
  crates?
- Should memory be synchronous, async, or both?