mx 0.1.178

A Swiss army knife for Claude Code and multi-agent toolkits
#import "lib.typ": *

#page-header("Architecture", "System internals for contributors.")

This page describes how mx is built. It covers the module structure, dispatch
model, storage backends, and encoding pipeline. The audience is contributors
reading the source code, not users running commands.

== Table of contents

- #link(<overview>)[Overview]
- #link(<module-structure>)[Module structure]
- #link(<command-dispatch>)[Command dispatch]
- #link(<path-management>)[Path management]
- #link(<surrealdb-integration>)[SurrealDB integration]
- #link(<knowledge-graph>)[Knowledge graph data model]
- #link(<codex-archive>)[Codex archive format]
- #link(<kv-store>)[KV store format]
- #link(<base-d-integration>)[Base-d integration]
- #link(<testing-patterns>)[Testing patterns]


// =========================================================================
// OVERVIEW
// =========================================================================

== Overview <overview>

mx is a single-binary Rust CLI built on three pillars:

+ *clap derive* for the command tree -- every subcommand, flag, and validation
  rule is expressed as Rust types in `src/cli.rs`.
+ *SurrealDB* for the knowledge graph -- an embedded SurrealKV database (or
  optional network WebSocket connection) stores entries, relationships, tags,
  embeddings, and metadata.
+ *base-d* for commit encoding -- a separate crate that hashes, compresses, and
  encodes commit messages through randomly selected dictionaries.

The binary is `mx`. There is no library crate; `main.rs` declares modules and
calls into handlers. The Rust edition is 2024.

Key dependencies:

#table(
  columns: (auto, auto, 1fr),
  table.header([*Crate*], [*Version*], [*Role*]),
  [`clap`], [4], [CLI parsing with derive macros],
  [`surrealdb`], [2], [Embedded + WebSocket knowledge store],
  [`base-d`], [3], [Dictionary-based hash/compress encoding],
  [`tokio`], [1], [Async runtime for SurrealDB (multi-thread)],
  [`fastembed`], [5.6], [Local vector embeddings (BGE-Base-EN-v1.5, 768-dim)],
  [`serde` / `serde_json` / `toml` / `serde_yaml`], [1 / 1 / 0.8 / 0.9], [Serialization across JSON, TOML, YAML],
  [`chrono`], [0.4], [Timestamps with serde support],
  [`anyhow` / `thiserror`], [1 / 2], [Error handling (anyhow for handlers, thiserror for typed errors)],
  [`reqwest`], [0.12], [HTTP client for GitHub API calls],
  [`jsonwebtoken`], [10], [JWT signing for GitHub App auth],
  [`pulldown-cmark`], [0.13], [Fence-aware heading extraction],
  [`colored`], [2], [Terminal colors],
)


// =========================================================================
// MODULE STRUCTURE
// =========================================================================

== Module structure <module-structure>

All source lives under `src/`. The top-level modules declared in `main.rs` are:

```
src/
 main.rs            # entry point, Cli::parse(), match on Commands
 cli.rs             # the full command tree (clap derive enums)
 paths.rs           # single source of path truth
 handlers/          # command handler routing
   mod.rs           # top-level dispatchers (pr, github, codex, log, show, etc.)
   memory.rs        # mx memory subcommand handler
   kv.rs            # mx kv subcommand handler
   metadata.rs      # metadata subcommand handler (categories, tags, etc.)
   state.rs         # mx state subcommand handler (deprecated)
 commit.rs          # encoding pipeline (hash + compress + encode)
 knowledge.rs       # KnowledgeEntry struct (the core data model)
 store.rs           # KnowledgeStore trait (abstract storage interface)
 surreal_db/        # SurrealDB implementation of KnowledgeStore
   mod.rs           # SurrealDatabase struct, with_db! macro, RecordId
   connection.rs    # SurrealMode, SurrealConfig, SurrealConnection enum
   knowledge.rs     # SurrealKnowledgeRecord DTO, query hydration
   queries.rs       # backup operations, query helpers
   lookups.rs       # lookup table CRUD (categories, agents, projects, etc.)
   relationships.rs # graph edge operations (relates_to)
   trait_impl.rs    # KnowledgeStore impl for SurrealDatabase
   tests.rs         # integration tests
 codex/             # session conversation archival
   mod.rs           # manifest types, re-exports
   archive/         # the archive pipeline
     mod.rs         # ArchiveRequest, ArchiveOptions, entry points
     include.rs     # IncludeSet (--include flag parser)
     write.rs       # per-session writer, --all driver loop
     sources.rs     # source walkers (subagent discovery, etc.)
     paths.rs       # archive-folder naming, short-ID extraction
     backfill.rs    # vault backfill (--backfill flag)
   export/          # mx codex export pipeline
   index/           # codex indexing
   images.rs        # base64 image extraction from JSONL
   transcript.rs    # conversation.md rendering
   read.rs          # list, read, search operations
   migrate.rs       # v1->v2 archive migration
   notices.rs       # vault-present warnings
 embeddings.rs      # EmbeddingProvider trait, FastEmbedProvider
 kv.rs              # KV store engine (schema TOML + data JSON)
 types.rs           # shared domain types (Agent, Category, Project, etc.)
 display.rs         # safe_truncate, formatting helpers
 tensor.rs          # emotional state tensor encode/decode (deprecated, serves mx state)
 github.rs          # GitHub API operations (cleanup, comments)
 sync/              # GitHub sync (issues, wiki)
 convert.rs         # md2yaml / yaml2md conversion
 session.rs         # deprecated session export (forwards to codex)
 index.rs           # legacy index operations
 helpers.rs         # shared utilities
 wake_chunk.rs      # wake ritual chunking
 wake_ritual.rs     # wake ritual flow
 wake_token.rs      # HMAC-signed wake session tokens
 engage.rs          # interactive wake engage mode
 content_ops.rs     # content editing operations (find/replace, append, etc.)
```

=== Module boundaries

The codebase follows a layered pattern:

+ *CLI layer* (`cli.rs`) -- pure data. No logic, no imports beyond clap.
  Every command variant, flag, and validation constraint is a type.
+ *Handler layer* (`handlers/`) -- orchestration. Reads CLI args, calls into
  domain modules, formats output. Handlers own `println!` and `eprintln!`.
  They do not own business logic.
+ *Domain layer* (`commit.rs`, `knowledge.rs`, `store.rs`, `kv.rs`,
  `codex/`, `embeddings.rs`, `tensor.rs`) -- the actual work. Pure functions
  where possible, side effects isolated to well-defined boundaries (git
  subprocesses, database calls, filesystem writes).
+ *Infrastructure layer* (`surreal_db/`, `paths.rs`, `github.rs`) --
  external integrations. SurrealDB, filesystem, GitHub API.


// =========================================================================
// COMMAND DISPATCH
// =========================================================================

== Command dispatch <command-dispatch>

The dispatch path is:

```
main() -> Cli::parse() -> match cli.command { ... }
```

`main.rs` is small by design. It does three things:

+ Emits a legacy-path deprecation note if `MX_MEMORY_PATH` is set.
+ Parses the CLI with `clap::Parser::parse()`.
+ Pattern-matches on the top-level `Commands` enum and calls the appropriate
  handler.

Some commands dispatch directly to domain functions from `main.rs`:

```rust
Commands::Commit { .. } => commit::upload_commit(..),
Commands::Log { .. } => handle_log(..),
Commands::Show { .. } => handle_show(..),
```

Others dispatch through `handlers/mod.rs`:

```rust
Commands::Memory { command } => handle_memory(command, cli.verbose),
Commands::Kv { command } => handle_kv(command, cli.verbose),
Commands::Codex { command } => handle_codex(command),
```

The handler functions in `handlers/mod.rs` then match on the subcommand enum
and call into domain modules. For example, `handle_codex` matches on
`CodexCommands::Archive`, `CodexCommands::Export`, etc., and routes each to the
appropriate function in `codex::archive`, `codex::export`, or `codex::read`.

=== The `Commit` command

The `Commit` variant is handled inline in `main.rs` rather than through a
handler, because it has two distinct modes selected by the `--encode-only` flag:

+ *Normal mode*: calls `commit::upload_commit()` with the message, stage/push
  flags, and display preferences.
+ *Encode-only mode*: calls `commit::encode_commit_message()` with explicit
  title and body text, prints the result, and exits. No git state is touched.

=== Exit codes

Most commands exit 0 on success or propagate an `anyhow::Error` (which prints
the error chain to stderr and exits non-zero). The `kv` subcommand is the
exception: it uses typed exit codes (0 = OK, 1 = key not found, 2 = type
mismatch, 3 = schema missing, 4 = invalid input) so callers can distinguish
failure modes programmatically. The `KvError` enum covers five typed variants:
`KeyNotFound`, `TypeMismatch`, `SchemaMissing`, `EntryNotFound` (a specific
entry ID was not found within a key), and `AmbiguousId` (an ID prefix
matched multiple entries). Both `EntryNotFound` and `AmbiguousId` map to
exit code 4.
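
The mapping above can be sketched as a single `match`. This is a minimal illustration, not the code in `src/kv.rs`: the variants are shown without payloads, and the `exit_code` helper is a hypothetical name.

```rust
/// The five typed failure variants named in this section.
/// The real enum may carry payloads (key names, IDs); they are omitted here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KvError {
    KeyNotFound,
    TypeMismatch,
    SchemaMissing,
    EntryNotFound,
    AmbiguousId,
}

impl KvError {
    /// Hypothetical helper: maps each variant to the documented exit code.
    pub fn exit_code(self) -> i32 {
        match self {
            KvError::KeyNotFound => 1,
            KvError::TypeMismatch => 2,
            KvError::SchemaMissing => 3,
            // Both ID-lookup failures share the "invalid input" code.
            KvError::EntryNotFound | KvError::AmbiguousId => 4,
        }
    }
}
```

Because two variants share code 4, a caller that only sees the exit status cannot tell a missing entry from an ambiguous prefix.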


// =========================================================================
// PATH MANAGEMENT
// =========================================================================

== Path management <path-management>

`src/paths.rs` is the single source of truth for every filesystem path mx
touches. The module is deliberately the _only_ file in the codebase that calls
`dirs::home_dir()`. Every other module that needs a path calls a function from
`paths.rs`.

=== The base directory

All paths derive from `mx_home()`, which resolves once per process via
`OnceLock`:

+ If `MX_HOME` is set and non-empty, use it.
+ Otherwise, use `~/.mx/`.

The result is cached for the lifetime of the process.
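
A minimal sketch of that resolve-once behavior, using only the standard library. The signature of `resolve_mx_home_with` is an assumption, `mx_home_cached` is a hypothetical name, and the user's home directory is passed in as a parameter rather than read via `dirs::home_dir()`:

```rust
use std::path::{Path, PathBuf};
use std::sync::OnceLock;

/// Pure resolution step: the env value and the user's home directory are
/// explicit parameters. The signature is an assumption for illustration.
fn resolve_mx_home_with(env_val: Option<&str>, user_home: &Path) -> PathBuf {
    match env_val {
        // A set, non-empty MX_HOME wins.
        Some(path) if !path.is_empty() => PathBuf::from(path),
        // Unset or empty: fall back to the `.mx` directory under home.
        _ => user_home.join(".mx"),
    }
}

/// Cached accessor (hypothetical name): the closure runs at most once per
/// process, so later changes to MX_HOME are not observed.
fn mx_home_cached(user_home: &Path) -> &'static Path {
    static MX_HOME: OnceLock<PathBuf> = OnceLock::new();
    MX_HOME
        .get_or_init(|| {
            let env_val = std::env::var("MX_HOME").ok();
            resolve_mx_home_with(env_val.as_deref(), user_home)
        })
        .as_path()
}
```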

=== Derived paths

Each subsystem has its own function in `paths.rs`:

#table(
  columns: (auto, auto),
  table.header([*Function*], [*Returns*]),
  [`mx_home()`], [`$MX_HOME` or `~/.mx/`],
  [`kv_schema_path(agent)`], [`$MX_HOME/kv/schema/{agent}.toml`],
  [`kv_data_path(agent)`], [`$MX_HOME/kv/data/{agent}.json`],
  [`surreal_root()`], [`$MX_SURREAL_ROOT` or `$MX_HOME/memory/surreal/`],
  [`codex_dir()`], [`$MX_CODEX_PATH` or `$MX_HOME/codex/`],
  [`fastembed_cache_dir()`], [XDG cache or `$MX_HOME/memory/embed/` when isolated],
  [`memory_seed_agents_dir()`], [`$MX_HOME/memory/seed/agents/`],
  [`memory_seed_knowledge_dir()`], [`$MX_HOME/memory/seed/knowledge/`],
  [`state_schemas_dir()`], [`$MX_HOME/state/schemas/`],
  [`swap_dir()`], [`$MX_HOME/swap/`],
  [`sync_cache_dir(repo)`], [`$MX_HOME/cache/sync/{repo-slug}/`],
)

=== The `_with()` test-seam pattern <with-pattern>

Pure resolution logic is factored into `_with` variants that take
env-var values as explicit parameters instead of reading `std::env`:

```rust
fn codex_dir_with(env_val: Option<&str>, home: &Path) -> PathBuf {
    if let Some(path) = env_val && !path.is_empty() {
        return PathBuf::from(path);
    }
    home.join("codex")
}

pub fn codex_dir() -> PathBuf {
    codex_dir_with(
        std::env::var("MX_CODEX_PATH").ok().as_deref(),
        mx_home(),
    )
}
```

Tests call the `_with` variant directly with controlled inputs. The public
function is a thin wrapper that reads the env var and passes it in. This keeps
tests parallel-safe (no env-var mutation) and the resolution logic unit-testable
in isolation.

The same pattern is used by `surreal_root_with`, `fastembed_cache_dir_with`,
`resolve_mx_home_with`, and `resolve_kv_path_with`.

=== External paths (read-only)

`paths.rs` also provides helpers for locations owned by other tools that mx
reads but never writes:

- `claude_dir()` -- `~/.claude/`
- `claude_projects_dir()` -- `~/.claude/projects/` (override:
  `MX_CLAUDE_PROJECTS_DIR` for tests)
- `claude_subagents_dir(slug, session)` -- subagent JSONL location
- `claude_sessions_dir()` -- per-PID liveness JSONs
- `claude_history_jsonl()` -- slash-command history
- `claude_mcp_logs_dir(slug)` -- MCP server log parent directory
- `wonka_vault_archives_dir()` -- legacy vault snapshots (`~/.wonka/vault/archives/`)

These are centralized in `paths.rs` so the codex archive source walkers have a
single source of truth for Claude's on-disk layout.


// =========================================================================
// SURREALDB INTEGRATION
// =========================================================================

== SurrealDB integration <surrealdb-integration>

The knowledge graph is backed by SurrealDB. The integration supports two
connection modes:

=== Embedded mode (default)

Uses the `SurrealKV` engine -- a local, file-based key-value store compiled
into the mx binary. No external server process is required. The database files
live at `$MX_HOME/memory/surreal/` (override with `MX_SURREAL_ROOT`).

On first connection, the schema file
(`schema/surrealdb-schema.surql`) is applied via `include_str!`. This is
compiled into the binary -- there is no runtime file read. The schema uses
`DEFINE ... IF NOT EXISTS` and `UPSERT` throughout, making it safe to re-apply
on every startup.

=== Network mode

When `MX_SURREAL_MODE=network`, mx connects to an external SurrealDB instance
over WebSocket (`ws://` or `wss://`). The local `surreal_root` path is unused.
Authentication supports three levels (root, namespace, database), configured
via `MX_SURREAL_AUTH_LEVEL`. Password can be provided directly
(`MX_SURREAL_PASS`) or read from a file (`MX_SURREAL_PASS_FILE`, useful for
agenix-managed secrets on NixOS).

=== Connection architecture

The connection is represented as an enum:

```rust
pub enum SurrealConnection {
    Embedded(Surreal<surrealdb::engine::local::Db>),
    Network(Surreal<WsClient>),
}
```

A `with_db!` macro dispatches across both variants:

```rust
macro_rules! with_db {
    ($self:expr, $db:ident, $body:expr) => {
        match &$self.conn {
            SurrealConnection::Embedded($db) => $body,
            SurrealConnection::Network($db) => $body,
        }
    };
}
```

This allows every query function to be written once and work against both
backends. The `SurrealDatabase` struct wraps the connection and exposes
synchronous methods that internally use a `block_on` bridge over a global
`OnceLock<Runtime>` tokio runtime.
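
To see why the macro lets one body serve both variants, here is a self-contained sketch with stand-in types. `EmbeddedDb`, `NetworkDb`, `Connection`, `Database`, and `with_conn!` are hypothetical names; the real variants wrap `Surreal` handles and the real methods go through the `block_on` bridge, which this sketch leaves out:

```rust
// Stand-in client types with the same method name but no shared trait.
struct EmbeddedDb;
struct NetworkDb;

impl EmbeddedDb {
    fn ping(&self) -> &'static str {
        "embedded"
    }
}

impl NetworkDb {
    fn ping(&self) -> &'static str {
        "network"
    }
}

enum Connection {
    Embedded(EmbeddedDb),
    Network(NetworkDb),
}

struct Database {
    conn: Connection,
}

// Same shape as `with_db!`: the body is expanded once per arm, so it is
// type-checked separately against each concrete client type.
macro_rules! with_conn {
    ($self:expr, $db:ident, $body:expr) => {
        match &$self.conn {
            Connection::Embedded($db) => $body,
            Connection::Network($db) => $body,
        }
    };
}

impl Database {
    /// Written once; works against either backend.
    fn ping(&self) -> &'static str {
        with_conn!(self, db, db.ping())
    }
}
```

The trade-off is that the body is compiled twice, once per arm, instead of going through dynamic dispatch.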

=== The `KnowledgeStore` trait

`src/store.rs` defines the `KnowledgeStore` trait -- the abstract interface for
knowledge storage. `SurrealDatabase` implements this trait in
`surreal_db/trait_impl.rs`. The trait surface includes:

- CRUD: `upsert_knowledge`, `get`, `delete`
- Search: `search` (full-text BM25), `semantic_search` (vector cosine
  similarity)
- Listing: `list_by_category`, `count_by_category`, `list_all`, `count`
- Wake cascade: `wake_cascade` (layered identity retrieval)
- Lookups: categories, agents, projects, sessions, relationships, tags
- Reinforcement: `reinforce` (increment resonance, update activation metadata)
- Backups: pre-mutation content snapshots

The trait exists to decouple handler logic from the storage backend. In
practice, `SurrealDatabase` is the only implementation.
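
The decoupling can be illustrated with a cut-down trait and an in-memory stand-in. This is a sketch under loud assumptions: `Entry`, `Store`, `InMemoryStore`, and `describe` are hypothetical, and only the four method names are taken from the list above. The parameter and return types are not the real signatures.

```rust
use std::collections::HashMap;

/// Reduced stand-in for the entry type; only the fields the sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub id: String,
    pub title: String,
}

/// Cut-down trait reusing four method names from the trait surface.
/// Signatures are assumptions made for this illustration.
pub trait Store {
    fn upsert_knowledge(&mut self, entry: Entry);
    fn get(&self, id: &str) -> Option<Entry>;
    fn delete(&mut self, id: &str) -> bool;
    fn count(&self) -> usize;
}

/// In-memory implementation, the kind of double a handler test could use.
#[derive(Default)]
pub struct InMemoryStore {
    entries: HashMap<String, Entry>,
}

impl Store for InMemoryStore {
    fn upsert_knowledge(&mut self, entry: Entry) {
        self.entries.insert(entry.id.clone(), entry);
    }
    fn get(&self, id: &str) -> Option<Entry> {
        self.entries.get(id).cloned()
    }
    fn delete(&mut self, id: &str) -> bool {
        self.entries.remove(id).is_some()
    }
    fn count(&self) -> usize {
        self.entries.len()
    }
}

/// Handler-style code depends on the trait, not on a concrete backend.
pub fn describe(store: &dyn Store, id: &str) -> String {
    match store.get(id) {
        Some(entry) => format!("{}: {}", entry.id, entry.title),
        None => format!("{id}: not found"),
    }
}
```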


// =========================================================================
// KNOWLEDGE GRAPH DATA MODEL
// =========================================================================

== Knowledge graph data model <knowledge-graph>

The schema lives in `schema/surrealdb-schema.surql` and is compiled into the
binary. It defines a SCHEMAFULL relational-graph model.

=== Core entity: `knowledge`

The central table is `knowledge`. Each row represents one knowledge entry with
the following field groups:

*Identity and content:*
- `title` (string), `body` (optional string), `summary` (optional string)
- `content_hash` (string) -- for change detection during seed/import
- `format` -- `markdown`, `json`, or `stele:*` variants

*Classification (record links):*
- `category` (record\<category\>) -- pattern, technique, insight, gotcha,
  reference, decision, bloom, session
- `source_type` (record\<source\_type\>) -- manual, ram, cache, agent\_session
- `entry_type` (record\<entry\_type\>) -- primary, summary, synthesis
- `content_type` (record\<content\_type\>) -- text, code, config, data, binary
- `source_project`, `source_agent`, `session` -- optional record links

*Visibility:*
- `visibility` -- `public` or `private` (ASSERT constraint)
- `owner` -- agent ID for private entries

*Resonance (wake-up cascade):*
- `resonance` (int) -- importance level, nominally 1--10; values above 10 are
  allowed for "transcendent" entries
- `resonance_type` -- foundational, transformative, relational, operational,
  ephemeral, session
- `last_activated` (datetime), `activation_count` (int)
- `decay_rate` (float, 0.0--1.0) -- some memories fade, some do not
- `anchors` (array\<string\>) -- IDs of related blooms this entry connects to
- `wake_phrases` (array\<string\>) -- verification phrases for the wake ritual
- `wake_order` (optional int) -- custom sequence position

*Embeddings:*
- `embedding` (optional array\<float\>) -- 768-dim vector (BGE-Base-EN-v1.5)
- `embedding_model` (optional string), `embedded_at` (optional datetime)

=== Graph relations

SurrealDB's graph relations replace traditional junction tables:

#table(
  columns: (auto, auto, auto),
  table.header([*Relation table*], [*Direction*], [*Purpose*]),
  [`tagged_with`], [knowledge -> tag], [Freeform labels],
  [`applies_to`], [knowledge -> applicability\_type], [Scope constraints (language, platform, domain)],
  [`relates_to`], [knowledge -> knowledge], [Inter-entry graph edges],
  [`project_tagged_with`], [project -> tag], [Project-level tags],
  [`project_applies_to`], [project -> applicability\_type], [Project scope],
)

The `relates_to` relation carries a `relationship_type` field
(record\<relationship\_type\>) and is uniquely indexed on the triple (from, to,
type). Relationship types are: related, supersedes, extends, implements,
contradicts, example\_of.
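
The effect of the unique index can be modeled in plain Rust as a set keyed on the triple. This is only an in-memory analogy: `EdgeSet` and `relate` are hypothetical names, and how SurrealDB reports a duplicate (error or otherwise) is not shown here.

```rust
use std::collections::HashSet;

/// The six relationship types listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum RelationshipType {
    Related,
    Supersedes,
    Extends,
    Implements,
    Contradicts,
    ExampleOf,
}

/// Edges keyed on (from, to, type), mirroring the unique index.
#[derive(Default)]
pub struct EdgeSet {
    edges: HashSet<(String, String, RelationshipType)>,
}

impl EdgeSet {
    /// Returns true if the edge was new, false if the triple already existed.
    pub fn relate(&mut self, from: &str, to: &str, kind: RelationshipType) -> bool {
        self.edges.insert((from.to_string(), to.to_string(), kind))
    }

    pub fn len(&self) -> usize {
        self.edges.len()
    }
}
```

Two entries can therefore be linked more than once as long as the relationship type differs, and direction matters: an edge from A to B is distinct from one from B to A.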

=== Lookup tables

Ten lookup tables provide controlled vocabularies: `category`, `project`,
`agent`, `applicability_type`, `source_type`, `entry_type`, `content_type`,
`relationship_type`, `session_type`, `tag`. Default seed data is applied via
`UPSERT` in the schema file. Users can extend them through
`mx memory categories add`, `mx memory agents add`, etc.

=== Full-text search

A `simple` analyzer (blank + class tokenizers, lowercase filter) powers BM25
search indexes on `title`, `body`, and `summary`. Searches via
`mx memory search` query all three indexes.

=== Vector search

Embeddings are 768-dimensional float arrays generated by FastEmbed
(BGE-Base-EN-v1.5, local inference). The search strategy is brute-force cosine
similarity -- no HNSW index. This is deliberate at the current scale; the
schema comment notes to reconsider when the store exceeds 50K vectors or
100ms query latency.
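
For reference, brute-force cosine search amounts to scoring every stored vector and sorting. The sketch below does this in plain Rust over in-memory slices; the function names are hypothetical, and it does not show how the scan is expressed against SurrealDB.

```rust
/// Cosine similarity of two vectors. Returns 0.0 if the lengths differ or
/// either vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Brute-force scan: score every stored vector, sort by descending score,
/// keep the best `k`. Cost is linear in corpus size times dimension.
pub fn top_k<'a>(
    query: &[f32],
    corpus: &'a [(String, Vec<f32>)],
    k: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&'a str, f32)> = corpus
        .iter()
        .map(|(id, vector)| (id.as_str(), cosine_similarity(query, vector)))
        .collect();
    scored.sort_by(|left, right| right.1.total_cmp(&left.1));
    scored.truncate(k);
    scored
}
```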

The `EmbeddingProvider` trait in `embeddings.rs` abstracts the embedding
backend. `FastEmbedProvider` is the sole implementation. The model cache
location is controlled by `paths::fastembed_cache_dir()`.

=== Backups

The `memory_backup` table stores pre-mutation content snapshots. Before any
update, edit, append, prepend, or delete operation, the current content is
written to a backup row. Backups reference entries by plain string ID (not a
record link) so they survive entry deletion.


// =========================================================================
// CODEX ARCHIVE FORMAT
// =========================================================================

== Codex archive format <codex-archive>

The codex is the session conversation archive. `mx codex archive` captures
Claude Code sessions from `~/.claude/projects/` into permanent storage at
`$MX_HOME/codex/`.

=== Archive directory layout

Each archive is a directory named with the pattern:

```
{date}_{short-session-id}[_{counter}]
```

For example: `2026-04-30_abc12345` or `2026-04-30_abc12345_2` for incremental
saves.
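
A sketch of how such a name could be assembled. Two details are assumptions inferred from the examples rather than from `codex/archive/paths.rs`: the short ID is taken to be the first 8 characters of the session ID, and the counter suffix is omitted on the first save. `archive_dir_name` is a hypothetical name.

```rust
/// Builds `{date}_{short-session-id}[_{counter}]`.
/// Assumptions: the short ID is the first 8 characters of the session ID,
/// and `None` means "first save, no suffix".
pub fn archive_dir_name(date: &str, session_id: &str, counter: Option<u32>) -> String {
    let short_id: String = session_id.chars().take(8).collect();
    match counter {
        Some(n) => format!("{date}_{short_id}_{n}"),
        None => format!("{date}_{short_id}"),
    }
}
```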

Inside each archive directory:

```
{archive}/
  manifest.json       # metadata (version, timestamps, counts, checksums)
  session.jsonl        # raw session JSONL (unless --clean)
  conversation.md      # clean markdown transcript (when --clean or migrated)
  images/              # extracted base64 images (v2+)
    image_001.png
    image_002.png
  agents/              # subagent session JSONLs (when --include subagents)
    agent-{uuid}.jsonl
```

=== Manifest

The manifest is a JSON file tracking archive metadata. The current write
version is 5. All fields added since v2 are `Option` so older archives
deserialize cleanly.

Key fields:

- `version` -- manifest format version (2--5)
- `session_id` -- the Claude session UUID
- `archived_at`, `session_start`, `session_end` -- timestamps
- `project_path` -- the working directory of the session
- `message_count`, `agent_count` -- summary statistics
- `agents` -- array of `AgentInfo` (id, file, message count)
- `size_bytes`, `checksum` -- integrity data
- `image_count`, `images` -- v2: extracted image metadata
- `has_clean_transcript` -- v3: whether `conversation.md` exists
- `user_name`, `assistant_name` -- v4: configurable speaker names
- `source_breakdown` -- v5: per-sidecar byte counts

=== The `IncludeSet`

The `--include` flag on `mx codex archive` controls which optional source
artifacts are captured. It parses a comma-separated string into a struct with
boolean fields:

- `subagents` (default: true) -- capture subagent session JSONLs
- `mcp` -- capture MCP server logs
- `tool_output` -- capture `/tmp` tool outputs
- `history` -- capture `history.jsonl` slice
- `all` / `none` -- shortcuts
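
A minimal parser with this shape might look like the following. It is a sketch, not the code in `include.rs`: the token spellings (notably `tool-output`), the error type, and the rule that an explicit list replaces the defaults are all assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IncludeSet {
    pub subagents: bool,
    pub mcp: bool,
    pub tool_output: bool,
    pub history: bool,
}

const NONE: IncludeSet = IncludeSet { subagents: false, mcp: false, tool_output: false, history: false };
const ALL: IncludeSet = IncludeSet { subagents: true, mcp: true, tool_output: true, history: true };

impl Default for IncludeSet {
    /// With no flag given, only subagents are captured.
    fn default() -> Self {
        IncludeSet { subagents: true, ..NONE }
    }
}

impl IncludeSet {
    /// Parses a comma-separated spec such as "mcp,history".
    /// Assumption: an explicit list starts from the empty set.
    pub fn parse(spec: &str) -> Result<Self, String> {
        let mut set = NONE;
        for token in spec.split(',').map(str::trim).filter(|t| !t.is_empty()) {
            match token {
                "subagents" => set.subagents = true,
                "mcp" => set.mcp = true,
                "tool-output" => set.tool_output = true,
                "history" => set.history = true,
                "all" => set = ALL,
                "none" => set = NONE,
                other => return Err(format!("unknown include source: {other}")),
            }
        }
        Ok(set)
    }
}
```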

=== Source walkers

The archive pipeline uses source walkers to discover files for capture.
Currently `sources.rs` implements subagent discovery
(`find_agent_sessions`). The other source types (MCP, tool-output, history)
are declared in the `IncludeSet` but their walkers are pending implementation
in future PRs.


// =========================================================================
// KV STORE FORMAT
// =========================================================================

== KV store format <kv-store>

The KV store (`src/kv.rs`) is a lightweight local state engine for agents.
No networking, no database -- just a TOML schema file and a JSON data file
per agent.

=== Schema (TOML)

Each agent's schema lives at `$MX_HOME/kv/schema/{agent}.toml` and declares
the keys, types, constraints, and defaults:

```toml
[keys.commit_count]
type = "counter"
min = 0

[keys.recent_files]
type = "history"
max_entries = 50

[keys.current_task]
type = "string"
default = ""

[keys.focus_areas]
type = "list"
description = "Areas of active focus"

[keys.session_state]
type = "state"
fields = ["mode", "context", "priority"]
```

Supported types:

#table(
  columns: (auto, 1fr),
  table.header([*Type*], [*Behavior*]),
  [`counter`], [Integer with optional `min`/`max` bounds. Supports `inc`, `dec`, `set`, `get`],
  [`string`], [Simple string value. Supports `set`, `get`],
  [`history`], [Timestamped append-only log with optional `max_entries` cap. Supports `push`, `last`, `since`, `search`, `count`, `random`, `update`, `migrate`. Each entry gets a numeric index and a stable base58 entry ID (`kv-` prefix). Entries can carry optional structured JSON data (`--data` on push/update, `--where` on queries). The `last`, `search`, `count`, and `random` commands accept time-range flags (`--day`, `--month`, `--week`, `--since`, `--from`/`--to`) for date filtering.],
  [`list`], [Ordered list with timestamps. Supports `push`, `pop`, `remove`, `search`, `count`, `random`, `update`, `migrate`. Each entry gets a numeric index and a stable base58 entry ID. Entries can carry optional structured JSON data. The `last`, `search`, `count`, and `random` commands accept the same time-range flags as history.],
  [`state`], [Named fields (like a struct). Supports single-field set (`set <key> <field> <value>`), batch set (`set <key> field=value ...` or `set <key> --json '{...}'`), tensor positional set (`set <key> --json '[...]'`), and `get`. Batch operations validate all fields against the schema before writing.],
)

=== Data (JSON)

The data file at `$MX_HOME/kv/data/{agent}.json` holds current values. All
writes are atomic: serialize to a temp file, fsync, rename. The format is a
flat JSON object keyed by the key names from the schema.
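
The temp-file, fsync, rename sequence can be sketched with the standard library alone. `write_atomic` is a hypothetical name and the `.tmp` suffix is an assumption; the real writer may name its temp file differently.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Writes `bytes` to `target` via a sibling temp file, so a reader sees
/// either the old contents or the new contents, never a partial write.
pub fn write_atomic(target: &Path, bytes: &[u8]) -> io::Result<()> {
    // Sibling temp path: same directory, so the rename stays on one filesystem.
    let mut tmp_name = target.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    let mut file = File::create(&tmp_path)?;
    file.write_all(bytes)?;
    // Flush contents and metadata to disk before the rename publishes them.
    file.sync_all()?;
    drop(file);

    // Replace the target in a single step.
    fs::rename(&tmp_path, target)
}
```

If the process dies before the rename, the previous data file is left untouched and only a stray temp file remains.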

History and list entries are stored as objects with `id` (the numeric sequence
number), `hash` (the stable entry ID), `value`, `ts`, an optional `data` field
(arbitrary JSON object for structured metadata), and an optional `memory`
field (a `kn-` ID linking the entry to a knowledge node in the memory graph).
The Rust structs use different names: the numeric sequence number is the
`index` field (serialized as `id` on disk) and the stable base58 identifier is
the `id` field (serialized as `hash` on disk). The on-disk names are preserved
via `serde(rename)` for backward compatibility, so no data migration is
needed. The entry ID is a short base58 string generated from
`blake3(key + timestamp + index)` via base-d, providing a stable identifier
independent of numeric ordering. The entry ID, `data`, and `memory` fields all
use `#[serde(default)]`: files written before these fields existed are
back-filled on first load (IDs are generated, `data` and `memory` default to
`None`) and saved automatically.

=== Schema mutation

The `KvStore` struct holds a `schema_path` field alongside the existing
`data_path`. The `add_key_to_schema()` method validates the key name
(alphanumeric, underscores, hyphens; max 128 chars; no dots), appends a
`[keys.<name>]` block to the TOML file without reformatting existing
content, and re-parses the file to update the in-memory `Schema`. This is
exposed through `push --create <type>` at the CLI layer, where the handler
calls `add_key_to_schema` before the normal push path. If the key already
exists, the method is a no-op.
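
The key-name rule can be written as a small predicate. This is a sketch: `is_valid_key_name` is a hypothetical name, and two points are assumptions (that "alphanumeric" means ASCII, and that the empty name is rejected).

```rust
/// Accepts alphanumerics, underscores, and hyphens, up to 128 characters.
/// Dots are rejected because they are not in the allowed set.
/// Assumptions: ASCII-only alphanumerics; empty names are invalid.
pub fn is_valid_key_name(name: &str) -> bool {
    !name.is_empty()
        && name.chars().count() <= 128
        && name
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-')
}
```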

=== Per-agent keying

The active agent is determined by the `MX_CURRENT_AGENT` environment variable.
Schema and data files are resolved via `paths::kv_schema_path(agent)` and
`paths::kv_data_path(agent)`. The path resolution includes a legacy fallback
to `~/.crewu/kv/` for migration purposes.

=== Memory pointers

KV keys can optionally link to a knowledge entry in the SurrealDB store via a
`kn-` ID reference. This allows an agent to associate fast local state with
richer knowledge graph entries. The `--memory` flag on `get`, `last`, `since`,
`search`, `random`, and `dump` resolves these references and displays the
linked entry.

Memory links exist at two levels: key-level (one pointer per key) and
per-entry (one pointer per history or list entry). Per-entry links are set
via `push --memory` at creation time or `set --id --memory` on existing
entries. When resolving, per-entry memory wins over a legacy `kn-` value
prefix, which wins over the key-level fallback. The `SearchHit` struct
(returned by `last`, `random`, `search`, `since`, and `get --id`) carries the
per-entry `memory` field for the handler to resolve.
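
The precedence order can be sketched as a three-step fallback. `resolve_memory` is a hypothetical name, and the way a legacy pointer is detected (the value's first whitespace-delimited token starts with `kn-`) is an assumption.

```rust
/// Resolution order: per-entry memory, then a legacy `kn-` value prefix,
/// then the key-level pointer.
pub fn resolve_memory<'a>(
    entry_memory: Option<&'a str>,
    value: &'a str,
    key_memory: Option<&'a str>,
) -> Option<&'a str> {
    // 1. A pointer stored on the entry itself always wins.
    if let Some(id) = entry_memory {
        return Some(id);
    }
    // 2. Assumption: a legacy pointer is the value's leading `kn-` token.
    if let Some(token) = value.split_whitespace().next() {
        if token.starts_with("kn-") {
            return Some(token);
        }
    }
    // 3. Otherwise fall back to the key-level pointer, if any.
    key_memory
}
```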

`SearchHit` derives `serde::Serialize` to support the `--json` output flag.
The serialized field names are the Rust struct names (`index`, `id`, `value`,
`ts`, `data`, `memory`) -- deliberately different from the on-disk
`serde(rename)` aliases used by `HistoryEntry` and `ListEntry`. The `data`
and `memory` fields use `#[serde(skip_serializing_if = "Option::is_none")]`
so they are omitted from JSON output when not set.


// =========================================================================
// BASE-D INTEGRATION
// =========================================================================

== Base-d integration <base-d-integration>

The `base-d` crate (version 3) provides the encoding layer. It is used in
three places:

=== `commit.rs` -- the encoding pipeline

When `mx commit` runs:

+ `get_staged_diff()` captures the output of `git diff --staged`.
+ `encode_hash_with_registry()` hashes the diff bytes with a random hash
  algorithm and encodes the hash through a random dictionary. This produces the
  commit title.
+ `encode_compress_with_registry()` compresses the commit message with a
  random compression algorithm and encodes the compressed bytes through a
  second random dictionary. This produces the commit body.
+ A footer tag is assembled: `[hash_algo:title_dict|compress_algo:body_dict]`.
+ If both dictionaries are the same (dejavu), the marker `whoa.` is appended.
+ All parts are validated for unsafe characters (NUL, C0/C1 controls). If
  validation fails, the entire encode is retried with freshly rolled
  dictionaries, up to 5 attempts.
+ `git_commit()` writes the three-part message (title, body, footer) as the
  commit message.
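
The footer, dejavu check, and retry loop from the steps above can be sketched without base-d. All function names here are hypothetical, the `encode` closure stands in for one full roll of hash, compression, and dictionaries, and treating newlines as safe is an assumption.

```rust
/// Assembles `[hash_algo:title_dict|compress_algo:body_dict]` and reports
/// whether both dictionaries are the same (the dejavu case).
pub fn build_footer(
    hash_algo: &str,
    title_dict: &str,
    compress_algo: &str,
    body_dict: &str,
) -> (String, bool) {
    let footer = format!("[{hash_algo}:{title_dict}|{compress_algo}:{body_dict}]");
    (footer, title_dict == body_dict)
}

/// True if the text contains a control character other than a newline.
/// `char::is_control` covers NUL, the rest of C0, DEL, and C1.
/// Assumption: newlines are permitted because messages are multi-line.
pub fn has_unsafe_chars(text: &str) -> bool {
    text.chars().any(|c| c.is_control() && c != '\n')
}

pub const MAX_ATTEMPTS: usize = 5;

/// Calls `encode` until it yields a clean (title, body) pair, giving up
/// after `MAX_ATTEMPTS` rolls.
pub fn encode_with_retry<F>(mut encode: F) -> Option<(String, String)>
where
    F: FnMut() -> (String, String),
{
    for _ in 0..MAX_ATTEMPTS {
        let (title, body) = encode();
        if !has_unsafe_chars(&title) && !has_unsafe_chars(&body) {
            return Some((title, body));
        }
    }
    None
}
```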

The `EncodedCommit` struct captures all parts:

```rust
pub struct EncodedCommit {
    pub title: String,
    pub body: String,
    pub footer: String,
    pub dejavu: bool,
    pub title_dict: String,
    pub body_dict: String,
}
```

=== `handlers/mod.rs` -- the decoding pipeline

`mx log` uses a four-phase architecture:

+ *Parse* -- raw CLI arguments (received as trailing varargs) are parsed into a
  structured `LogOptions` with separate fields for count, display mode
  (`Compact`, `Full`, `Oneline`, format presets, or custom format string), diff
  mode (`None`, `Stat`, `ShortStat`, `Patch`), decorate preference, and filter
  arguments. Custom `--format` strings and `--graph` are detected here and
  trigger a passthrough to raw `git log` with a stderr note.
+ *Harvest* -- a single `git log` call with a structured format string
  retrieves commit metadata (full hash, short hash, decorations, parents,
  author, date, committer, commit date, subject, body). Each commit body is
  decoded via `try_decode_commit_body()`.
+ *Attach diffs* -- if a diff mode was requested, a second `git log` call
  retrieves the diff output. Each diff block is matched to its corresponding
  commit by hash and attached as a string field.
+ *Render* -- the display mode selects a renderer. Each renderer prints the
  decoded message with the appropriate header format, followed by any attached
  diff output.

The `-n`/`--count` and `--full` flags are not clap-managed -- they are parsed
internally from the trailing varargs, following the same pattern as `mx show`.
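
A minimal sketch of the parse phase follows. It is deliberately simplified and should not be read as the actual `LogOptions` definition: the `Parsed` enum and `parse_log_args` are hypothetical names, the flag spellings are assumptions, the decorate preference and format presets are omitted, and any `--format=` argument is treated as a custom format.

```rust
#[derive(Debug, PartialEq)]
pub enum DisplayMode { Compact, Full, Oneline }

#[derive(Debug, PartialEq)]
pub enum DiffMode { None, Stat, ShortStat, Patch }

#[derive(Debug, PartialEq)]
pub struct LogOptions {
    pub count: Option<usize>,
    pub display: DisplayMode,
    pub diff: DiffMode,
    /// Everything not recognised is kept as a filter argument.
    pub filters: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum Parsed {
    /// Structured options for the decode-aware path.
    Options(LogOptions),
    /// `--graph` or a custom `--format`: defer to raw `git log`.
    Passthrough,
}

/// Parses the trailing varargs by hand, since these flags are not clap-managed.
pub fn parse_log_args(args: &[&str]) -> Parsed {
    let mut opts = LogOptions {
        count: None,
        display: DisplayMode::Compact,
        diff: DiffMode::None,
        filters: Vec::new(),
    };
    let mut it = args.iter();
    while let Some(&arg) = it.next() {
        match arg {
            "--graph" => return Parsed::Passthrough,
            a if a.starts_with("--format=") => return Parsed::Passthrough,
            // The count is the next argument; an unparsable value is ignored.
            "-n" | "--count" => opts.count = it.next().and_then(|v| v.parse().ok()),
            "--full" => opts.display = DisplayMode::Full,
            "--oneline" => opts.display = DisplayMode::Oneline,
            "--stat" => opts.diff = DiffMode::Stat,
            "--shortstat" => opts.diff = DiffMode::ShortStat,
            "-p" | "--patch" => opts.diff = DiffMode::Patch,
            other => opts.filters.push(other.to_string()),
        }
    }
    Parsed::Options(opts)
}
```

Detecting passthrough during parsing means the later phases only ever see arguments they know how to render.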

`try_decode_commit_body()` scans the commit body for footer-shaped lines,
accepting only those whose compression algorithm is in the known vocabulary.
Everything above the footer is the encoded payload; everything below is
trailing content (dejavu markers, user-appended notes). When several
footer-shaped lines appear (e.g., in a user-amended commit that quotes a prior
footer), a "last wins" heuristic picks the final one. `commit::decode_body()`
then looks up the dictionary named in the footer, decodes the payload, and
decompresses it.
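
The footer scan can be sketched as follows. This is an illustration under stated assumptions rather than the real implementation: `parse_footer`, `split_at_last_footer`, and `SplitBody` are hypothetical names, and `KNOWN_COMPRESSION` is a placeholder vocabulary, not the list the codebase validates against.

```rust
/// Placeholder vocabulary; the real list comes from the compression
/// algorithms the encoder can choose from.
const KNOWN_COMPRESSION: &[&str] = &["gzip", "zstd", "brotli", "lz4"];

/// Parses `[hash_algo:title_dict|compress_algo:body_dict]`, returning the
/// four fields only if the compression algorithm is in the vocabulary.
pub fn parse_footer(line: &str) -> Option<(&str, &str, &str, &str)> {
    let inner = line.trim().strip_prefix('[')?.strip_suffix(']')?;
    let (left, right) = inner.split_once('|')?;
    let (hash_algo, title_dict) = left.split_once(':')?;
    let (compress_algo, body_dict) = right.split_once(':')?;
    let shaped = !hash_algo.is_empty() && !title_dict.is_empty() && !body_dict.is_empty();
    (shaped && KNOWN_COMPRESSION.contains(&compress_algo))
        .then_some((hash_algo, title_dict, compress_algo, body_dict))
}

pub struct SplitBody<'a> {
    /// Lines above the footer: the encoded payload.
    pub payload: Vec<&'a str>,
    pub footer: &'a str,
    /// Lines below the footer: dejavu markers, user notes.
    pub trailing: Vec<&'a str>,
}

/// Splits a commit body at the last footer-shaped line ("last wins").
pub fn split_at_last_footer(body: &str) -> Option<SplitBody<'_>> {
    let lines: Vec<&str> = body.lines().collect();
    // `rposition` searches from the end, so a quoted earlier footer is ignored.
    let idx = lines.iter().rposition(|l| parse_footer(l).is_some())?;
    Some(SplitBody {
        payload: lines[..idx].to_vec(),
        footer: lines[idx],
        trailing: lines[idx + 1..].to_vec(),
    })
}
```

Searching from the end keeps an earlier, quoted footer inside the payload region, where the decoder can reject it, instead of mistaking it for the real footer.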

`handle_show()` uses a two-pass approach: Pass 1 retrieves commit metadata
and the encoded message (with `--no-patch`), decodes it, and prints the
header. Pass 2 retrieves the diff output (with `--format=""`) and streams it
as-is. Passthrough detection skips decoding entirely for `ref:path` syntax
(file content viewing) and `--format`/`--pretty` (user-controlled output).
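
Passthrough detection for `mx show` might look like the sketch below. The function name `is_show_passthrough` is hypothetical, and the `ref:path` check (any non-flag argument containing a colon) is a simplifying assumption that ignores other colon-bearing revision syntaxes git accepts.

```rust
/// Returns true when decoding should be skipped and the arguments handed
/// to `git show` untouched.
pub fn is_show_passthrough(args: &[&str]) -> bool {
    args.iter().any(|arg| {
        if arg.starts_with('-') {
            // User-controlled output: leave formatting to git.
            *arg == "--pretty"
                || arg.starts_with("--pretty=")
                || arg.starts_with("--format=")
        } else {
            // Assumption: `ref:path` is any positional argument with a colon.
            arg.contains(':')
        }
    })
}
```

For example, `is_show_passthrough(&["HEAD:src/main.rs"])` takes the passthrough branch, while `is_show_passthrough(&["HEAD~2", "--stat"])` does not.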

=== `commit.rs` -- PR merge encoding

`mx pr merge` follows the same pipeline but sources the diff from
`gh pr diff` and the message from the PR title and body. The encoded message is
passed to `gh pr merge --subject ... --body ...`.

=== `knowledge.rs` -- content hashing

`KnowledgeEntry` uses base-d's hash encoding for content hashing (via
`base_d::hash` and `base_d::encode`), producing the `content_hash` field used
for change detection during seed/import operations.


// =========================================================================
// TESTING PATTERNS
// =========================================================================

== Testing patterns <testing-patterns>

=== The `_with()` seam

The primary testing pattern in the codebase is the `_with()` test seam
described in #link(<with-pattern>)[Path management]. Any function that reads
from the environment or calls `dirs::home_dir()` is split into:

- A `_with(...)` variant that takes all external inputs as parameters (pure
  function).
- A public wrapper that reads the environment and delegates.

Tests call the `_with` variant directly, avoiding all process-global state.
This means the test suite runs safely in parallel without `#[serial]` except
for the handful of tests that must observe the public wrapper's env-var
behavior.
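
A minimal sketch of the seam, using only the standard library. The function names `projects_dir_with` and `projects_dir`, and the `.claude/projects` default, are assumptions for illustration; the codebase calls `dirs::home_dir()`, for which the `HOME` variable stands in here.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Pure variant: every external input arrives as a parameter, so tests
/// can call it without touching process-global state.
pub fn projects_dir_with(env_override: Option<&str>, home: Option<&Path>) -> Option<PathBuf> {
    match env_override {
        // A non-empty override wins.
        Some(dir) if !dir.is_empty() => Some(PathBuf::from(dir)),
        // Assumed default location, for illustration only.
        _ => home.map(|h| h.join(".claude").join("projects")),
    }
}

/// Public wrapper: reads the environment and delegates.
pub fn projects_dir() -> Option<PathBuf> {
    let env_override = env::var("MX_CLAUDE_PROJECTS_DIR").ok();
    // Stand-in for `dirs::home_dir()` to keep the sketch std-only.
    let home = env::var_os("HOME").map(PathBuf::from);
    projects_dir_with(env_override.as_deref(), home.as_deref())
}
```

A test then passes explicit values, e.g. `projects_dir_with(None, Some(Path::new("/home/u")))`, and never needs to set or clear an environment variable.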

=== `serial_test`

Tests that mutate process environment (e.g., clearing `MX_CLAUDE_PROJECTS_DIR`
to observe the default fallback) are marked with `#[serial]` from the
`serial_test` crate. These are a small minority -- the `_with()` pattern
eliminates the need for serialization in most cases.

=== `proptest`

The `proptest` crate is available in dev-dependencies for property-based
testing. It is used selectively where input domains are large (e.g., Unicode
boundary testing for `safe_truncate`).
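
The properties such a test checks can be illustrated without the `proptest` dependency. The signature of `safe_truncate` below is an assumption (the real function may take different parameters), and `check_truncate_properties` is a hypothetical helper that sweeps every byte limit by hand where `proptest` would generate inputs.

```rust
/// Hypothetical signature: truncates `s` to at most `max_bytes` bytes
/// without splitting a UTF-8 code point.
pub fn safe_truncate(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

/// Checks the two properties for every byte limit up to one past the end:
/// the result never exceeds the limit, and it is always a prefix of the input.
pub fn check_truncate_properties(s: &str) -> bool {
    (0..=s.len() + 1).all(|max| {
        let t = safe_truncate(s, max);
        t.len() <= max && s.starts_with(t)
    })
}
```

Multi-byte inputs such as `"héllo"` or `"日本語"` exercise the boundary cases, since slicing at an arbitrary byte offset would otherwise panic.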

=== Round-trip encoder tests

The `try_decode_commit_body_tests` module in `handlers/mod.rs` tests the
encode-decode round trip by calling `encode_commit()` with known inputs and
verifying that `try_decode_commit_body()` recovers the original message. An
`encode_until` helper retries encoding with different random dictionaries until
a predicate is satisfied (e.g., dejavu vs. non-dejavu), filtering out
dictionary/codec pairings that produce unsafe output or fail round-trip.

=== KV store tests

The KV engine uses the same `_with()` approach for path resolution
(`resolve_kv_path_with`). Store tests operate on temp directories and never
touch the user's real `~/.mx/kv/` state.

=== SurrealDB integration tests

The `surreal_db/tests.rs` module contains integration tests that open a
temporary embedded SurrealKV database, apply the schema, and exercise the full
`KnowledgeStore` trait surface. Each test gets an isolated database directory.