aicx 0.6.0

Operator CLI + MCP server: canonical corpus first, optional semantic index second (Claude Code, Codex, Gemini)
Documentation
# Architecture

`aicx` is the operator front door for agent session logs. It orchestrates a
two-layer pipeline — canonical corpus first, optional semantic index second:

1. **Canonical corpus** (layer 1, `~/.aicx/`): read local agent session logs,
   normalize into a single timeline schema, deduplicate, chunk into steerable
   markdown with frontmatter metadata. This is ground truth.
2. **Optional semantic index** (layer 2, memex): embed the canonical corpus into
   a vector + BM25 index for semantic retrieval by agents and MCP tools. Always
   operator-driven — nothing syncs automatically.

`aicx` owns the canonical corpus; memex is an optional semantic index layered on top.

The pipeline exposes chunks through CLI, MCP, dashboard search surfaces, and an
adjacent Vibecrafted artifact explorer for workflow/marbles reports.

```mermaid
flowchart TD
  CLI[aicx CLI] --> SRC[sources.rs: extract_*]
  SRC --> DEDUP[state.rs: dedup + watermark]
  DEDUP --> RED[redact.rs: redact_secrets]
  RED --> STORE[store.rs: write_context_chunked]
  STORE --> EMIT[stdout: --emit paths/json/none]
  RED --> LOCAL[output.rs: write_report (-o)]
  STORE --> MEMEX[memex.rs: sync_new_chunks (--memex)]
```

## Module Map (Codebase Mapping)

Library modules (see `src/lib.rs`):

- `src/sources.rs`: source discovery + extraction
- `src/state.rs`: dedup hashes + incremental watermarks
- `src/store.rs`: canonical store layout under `~/.aicx/` + `index.json`
- `src/chunker.rs`: semantic windowing chunker (token heuristic + overlap + highlight extraction)
- `src/output.rs`: local report writer (`-o`) + optional loctree snapshot inclusion
- `src/memex.rs`: memex materialization (in-process via `rmcp-memex` library) + sync state
- `src/redact.rs`: secret redaction (regex engine)
- `src/sanitize.rs`: path validation for reads/writes (defense against traversal)
- `src/steer_index.rs`: fast metadata index for steering-aware retrieval
- `src/reports_extractor.rs`: scans `~/.vibecrafted/artifacts` and renders a standalone HTML/JSON dossier for workflow and marbles artifact review

Binary orchestration:
- `src/main.rs`: clap CLI, wires flows together, handles stdout emission (`--emit`).

## Data Flow: Extractors (`claude`, `codex`, `all`)

High-level sequence (see `src/main.rs::run_extraction`):

1. Parse flags and build an `ExtractionConfig` (`src/sources.rs`).
2. Read session sources and parse events:
   - Claude: `~/.claude/projects/*/*.jsonl`
   - Codex: `~/.codex/history.jsonl`
   - Gemini: `~/.gemini/tmp/<hash>/chats/session-*.json`
   - Gemini Antigravity direct extract: `~/.gemini/antigravity/conversations/<uuid>.pb` or `~/.gemini/antigravity/brain/<uuid>/`
3. Normalize into timeline entries.
4. Deduplicate:
   - exact hash: `(agent, timestamp, message)`
   - overlap hash: `(timestamp_bucket_60s, message)` across agents
5. On corpus-building commands, redact secrets by default via `src/redact.rs`
   unless `--no-redact-secrets`.
6. Store-first chunking:
   - use the source-side `--project` filter only to narrow session discovery
   - then group the surviving entries by resolved repo identity `(repo-from-cwd, agent, date)`
   - chunk per group (~1500 tokens, overlap), write canonical `.md` chunks into `~/.aicx/store/` or `~/.aicx/non-repository-contexts/`
7. Stdout emission:
   - `--emit none` prints nothing (default for extractors and `store`)
   - `--emit paths` prints stored chunk paths, one per line
   - `--emit json` prints a single JSON payload including `store_paths`, `requested_source_filters`, and `resolved_store_buckets`
   - `--emit none` prints nothing
8. Optional local output (`-o`): write a report to the given directory.
9. Optional memex materialization (`--memex`): materialize canonical chunks into the optional memex semantic index (see note below).

Note on memex materialization:
- `--memex` reads from the same canonical chunk + sidecar store that the CLI, MCP, and dashboard use.
- Batch import and per-chunk upsert share the same metadata contract from `.meta.json` sidecars.
- Memex is an optional semantic index layered on top of the canonical store — not primary storage. Nothing materializes automatically.

Framework note:
- Repo-local `.ai-context/` artifacts are now owned by higher-level workflow tooling such as `/vc-init`, not by the retired `aicx init` flow.

## Frontmatter Steering Contract

Report files and chunk sidecars can include frontmatter metadata used for **steering** — targeted retrieval and selective re-entry by orchestration frameworks:

```yaml
---
agent: codex
run_id: mrbl-001
prompt_id: api-redesign_20260327
model: claude-3-5-sonnet
started_at: “2026-03-24T10:00:00Z”
completed_at: “2026-03-24T10:30:00Z”
token_usage: 125000
findings_count: 3
---
```

These fields are parsed by `src/frontmatter.rs`, applied during chunking, and persisted as `.meta.json` sidecars alongside each chunk file. The `steer` command (CLI), `aicx_steer` tool (MCP), and `/api/search/steer` endpoint (dashboard) allow retrieval by these fields without filesystem grep.

Frontmatter is not just telemetry — it is part of the steering and selective re-entry contract. Orchestration can use `run_id` to retrieve all chunks from a specific agent run, `prompt_id` to find outputs from a specific prompt, or combine filters to narrow scope precisely.

## Data Flow: `store`

`store` is the “build the canonical corpus from older history” command (see `src/main.rs::run_store`):

1. Extract selected agents + source filters for a lookback window.
2. Redact secrets (default).
3. Chunk and write into the canonical `~/.aicx/` store, which may resolve into multiple repo buckets plus `non-repository-contexts`.
4. Optional memex sync (`--memex`).

## MCP Surface (`src/mcp.rs`)

The MCP server exposes three tools via stdio and streamable HTTP transports:

- `aicx_search` — search stored chunks with quality scoring; widens with memex semantic retrieval when available and otherwise falls back to canonical-store fuzzy search
- `aicx_rank` — rank chunks by signal density for a project as compact JSON
- `aicx_steer` — retrieve chunks by steering metadata (run_id, prompt_id, agent, kind, project, date) using sidecar data; the primary metadata-aware retrieval path for orchestration

Recency filtering in `aicx_search` and `aicx_steer` uses canonical chunk dates from the store layout, not filesystem `mtime` accidents.

## Security Model (Pragmatic)

Two mechanisms protect your machine and your data:
- Path validation (read/write) in `src/sanitize.rs`.
- Best-effort secret redaction in `src/redact.rs` (enabled by default).

Redaction is conservative by design: it’s OK to over-redact sometimes; it’s not OK to leak tokens into committed artifacts. The flag lives only on corpus-building commands that create or rewrite artifacts, not on read-only search and steering surfaces.