netsky-db 0.1.7

# netsky-db

Pure Rust observability database for messages, ticks, sessions, workspaces, clone dispatches, harvests, tool calls, git operations, directives, token usage, and watchdog events.

## Stack

- OLTP: turso SQLite at `~/.netsky/meta.db`, configured with WAL and a 10s busy timeout.
- OLAP: DataFusion SQL over Arrow `RecordBatch` snapshots.
- Bridge: SQLite JSON rows decode into typed Rust structs, then register as DataFusion `MemTable`s.
- Fallback: turso write failures append JSONL to `~/.netsky/logs/meta-db-errors-<date>.jsonl` and return `Ok`.
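
The fallback bullet above can be sketched in a few lines of stdlib Rust. The helper name and the error-record fields here are illustrative, not the crate's actual types:

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

// Sketch of the write-failure fallback: append one JSON line per failed
// write to the dated JSONL file. `log_write_failure` and the record shape
// are assumptions for illustration, not the crate's real API.
fn log_write_failure(log_dir: &Path, date: &str, table: &str, err: &str) -> std::io::Result<()> {
    let path = log_dir.join(format!("meta-db-errors-{date}.jsonl"));
    let mut f = OpenOptions::new().create(true).append(true).open(&path)?;
    // Hand-rolled JSON keeps the sketch dependency-free; the crate would
    // serialize the full failed row instead.
    writeln!(f, "{{\"table\":\"{table}\",\"error\":\"{err}\"}}")?;
    Ok(())
}
```

In the crate, the original write error is swallowed after the append, so callers on hot write paths still see `Ok`.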

## Usage

```rust
use netsky_db::Db;
let db = Db::open()?;
db.migrate()?;
let rows = db.query("SELECT COUNT(*) AS n FROM messages")?;
```

CLI:

```sh
netsky query "SELECT source, COUNT(*) AS n FROM messages GROUP BY source"
```

## Schema v6

- `messages`: inbound and outbound bus, iMessage, email, iroh, and demo source messages.
- `cli_invocations`: command, argv JSON, exit code, duration, and host.
- `crashes`: crash kind, agent, and detail JSON.
- `ticks`: ticker and watchdog tick detail JSON.
- `workspaces`: workspace create/delete lifecycle.
- `sessions`: agent up/down/note events.
- `clone_dispatches`: clone lifecycle, brief metadata, branch, status, exit code, and detail JSON.
- `harvest_events`: source branch, target branch, commit SHA, status, conflicts, and detail JSON.
- `communication_events`: normalized iMessage/email/agent communications with source, tool, direction, IDs, handle, agent, body, status, and detail JSON.
- `mcp_tool_calls`: source, tool, agent, start/end, duration, success, error, timeout-race flag, and request/response JSON.
- `git_operations`: git operation, repo, branch, remote, from/to SHAs, status, and detail JSON.
- `owner_directives`: source, chat ID, raw owner text, resolved action, agent, status, and detail JSON.
- `token_usage`: session ID, agent, runtime, model, input/output/cached tokens, cost in USD micros, and detail JSON.
- `watchdog_events`: watchdog event name, agent, severity, status, and detail JSON.
- `source_errors`: io-serve source error counts by bounded class (`timeout`, `auth_failure`, `rate_limit`, `network_error`, `protocol_error`, `not_found`, `permission_denied`, `unknown`). Each write is count=1 by default; a future bucketing layer can flush rolled-up counts without a schema change.
- `iroh_events`: iroh handshake events by bounded type (`connect`, `evict`, `reconnect`, `handshake_refused`). `peer_id_hash` is the hex encoding of a SHA-256 hash of the raw NodeId, truncated to 8 bytes; the raw id never lands in the analytics surface.
- `source_cursors`: durable per-source cursor. `(source TEXT PRIMARY KEY, cursor_value TEXT, updated_at TEXT)`. Shape is structured (not row_json) because the CLI needs to read `cursor_value` without a JSON parse.
- `events`: per-source delivery log. `(id AUTOINCREMENT, source, ts_utc, payload_json, delivery_status, reason)`. A row is inserted with `delivery_status='pending'` before the delivery adapter runs, then transitioned to `'delivered'` or `'failed'` afterward. `reason` carries the failure text on `'failed'`.
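
The `events` delivery lifecycle above is a small state machine: rows start `pending` and move to `delivered` or `failed` exactly once. A minimal sketch, with names assumed rather than taken from the crate:

```rust
// Sketch of the `events.delivery_status` lifecycle. `DeliveryStatus` and
// `resolve` are illustrative names, not the crate's types.
#[derive(Debug, PartialEq)]
enum DeliveryStatus {
    Pending,
    Delivered,
    Failed { reason: String },
}

impl DeliveryStatus {
    /// Apply the post-adapter transition; only `Pending` rows may move,
    /// and `reason` carries the failure text on the failed branch.
    fn resolve(self, ok: bool, reason: Option<&str>) -> DeliveryStatus {
        match (self, ok) {
            (DeliveryStatus::Pending, true) => DeliveryStatus::Delivered,
            (DeliveryStatus::Pending, false) => DeliveryStatus::Failed {
                reason: reason.unwrap_or("unknown").to_string(),
            },
            (s, _) => s, // already resolved; leave as-is
        }
    }
}
```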

### Derived views

- `clone_lifetimes`: `SELECT id, agent_id, runtime, workspace, branch, status, ts_utc_start, ts_utc_end FROM clone_dispatches WHERE ts_utc_end IS NOT NULL`. Registered at query time; ongoing dispatches appear in the view as soon as their end timestamp is recorded.

### Migration

`SCHEMA_VERSION` is `6`. Migration is forward-only: opening a v1–v5 database adds `source_cursors` and `events` as empty structured tables and bumps the meta / pragma version. The `token_usage` row format gained an optional `runtime` field; rows written by v4 binaries deserialize cleanly because the field is `#[serde(default)]`.
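
The forward-only step application can be sketched without any SQL. The step table and function below are hypothetical stand-ins for the crate's migration runner, which executes DDL against turso; here each step just records the tables it adds:

```rust
// Illustrative schema-version target, mirroring the README's v6.
const SCHEMA_VERSION: u32 = 6;

// Sketch of a linear, forward-only migration runner: apply every step
// above the stored version, then report the new version. `migrate` and
// the step table are assumptions, not the crate's real code.
fn migrate(stored: u32, tables: &mut Vec<&'static str>) -> u32 {
    // (target version, tables added by that step) — illustrative subset.
    let steps: &[(u32, &[&'static str])] = &[(6, &["source_cursors", "events"])];
    for (version, added) in steps {
        if stored < *version {
            tables.extend_from_slice(added);
        }
    }
    SCHEMA_VERSION
}
```

Running the sketch against a "v4" table list adds the two new tables; running it against a list that already has them is a no-op, matching the idempotent upgrade described above.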

## Writer APIs

- `Db::record_message(MessageRecord)`: source message envelope.
- `Db::record_cli(...)`: CLI invocation metadata.
- `Db::record_crash(...)`: crash event.
- `Db::record_tick(...)`: ticker or watchdog tick.
- `Db::record_workspace(...)`: workspace lifecycle event.
- `Db::record_session(...)`: agent session event.
- `Db::record_clone_dispatch(CloneDispatchRecord)`: clone lifecycle and brief metadata.
- `Db::record_harvest_event(HarvestEventRecord)`: cherry-pick and harvest results.
- `Db::record_communication_event(CommunicationEventRecord)`: high-level communication audit events.
- `Db::record_mcp_tool_call(McpToolCallRecord)`: MCP tool timing, success, errors, and timeout races.
- `Db::record_git_operation(GitOperationRecord)`: local git mutations and pushes.
- `Db::record_owner_directive(OwnerDirectiveRecord)`: trusted owner directives and resolved actions.
- `Db::record_token_usage(TokenUsageRecord)`: model token and cost accounting; carries a `runtime` discriminator ("claude", "codex", ...).
- `Db::record_token_usage_batch(impl IntoIterator<Item=TokenUsageRecord>)`: bulk variant for ingest paths. Reserves a contiguous id range in one round-trip and writes the whole batch inside a single connection and transaction; per-row writes open a fresh turso connection per call and are dominated by setup cost in tight loops.
- `Db::record_watchdog_event(WatchdogEventRecord)`: watchdog state transitions and escalations.
- `Db::record_source_error(SourceErrorRecord)`: io-serve source errors keyed by a bounded `SourceErrorClass`.
- `Db::record_iroh_event(IrohEventRecord)`: iroh handshake events keyed by a bounded `IrohEventType`; hash the raw NodeId with `hash_peer_id` before writing.
- `Db::read_source_cursor(source) -> Option<String>` + `Db::update_source_cursor(source, value)` + `Db::reset_source_cursor(source)` + `Db::list_source_cursors()`: durable source-cursor surface.
- `Db::insert_event(source, ts_utc, payload_json) -> id` + `Db::update_event_delivery(id, EventStatus, reason)` + `Db::tail_events(source, limit)`: per-source event log with delivery lifecycle.
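
The contiguous-id reservation behind `record_token_usage_batch` can be sketched with an in-process counter. The crate reserves the range in turso inside the batch transaction; `IdAllocator` here is illustrative only:

```rust
use std::ops::Range;
use std::sync::atomic::{AtomicU64, Ordering};

// Sketch of contiguous id reservation for a batch insert: one fetch_add
// claims the whole range, so every row in the batch gets an adjacent id
// without a per-row round-trip. Names are assumptions for illustration.
struct IdAllocator {
    next: AtomicU64,
}

impl IdAllocator {
    /// Reserve `n` adjacent ids and return them as a half-open range.
    fn reserve(&self, n: u64) -> Range<u64> {
        let start = self.next.fetch_add(n, Ordering::Relaxed);
        start..start + n
    }
}
```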

### Claude token-usage auto-instrumentation

The writer surface exists, but in-process claude-SDK hook wiring is a phase 2 deliverable. Today, `netsky ingest claude-sessions` backfills `token_usage` from `~/.claude/projects/*.jsonl`; the ingestor stamps `runtime = "claude"` automatically.

## Watchdog JSONL trail

The watchdog writes a backend-independent event trail before any `meta.db`
write. Files live at:

```text
~/.netsky/logs/watchdog-events-<YYYY-MM-DD>.jsonl
```

Each line is one JSON object:

```json
{
  "ts": "2026-04-17T00:00:00Z",
  "kind": "failed-revive",
  "detail": {
    "message": "[watchdog-event 2026-04-17T00:00:00Z] failed-revive: pid=42"
  }
}
```

Fields:

- `ts`: UTC event timestamp in RFC 3339 form.
- `kind`: watchdog event class. Legacy string events use the token after
  `[watchdog-event <ts>]` and before the first colon when present.
- `detail`: event-specific JSON. Legacy string events store the original
  text under `detail.message`.
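
The legacy-kind rule above (the token after `[watchdog-event <ts>]` and before the first colon) can be sketched as a small parser; `legacy_kind` is a hypothetical helper, not part of the crate:

```rust
// Sketch of legacy-kind extraction: strip the `[watchdog-event <ts>]`
// prefix, then take everything up to the first colon. Returns None when
// the prefix is absent or no kind token follows it.
fn legacy_kind(line: &str) -> Option<&str> {
    let rest = line.strip_prefix("[watchdog-event ")?;
    // The timestamp itself contains colons, so split off the bracket first.
    let (_ts, after_bracket) = rest.split_once("] ")?;
    let kind = after_bracket.split(':').next()?.trim();
    if kind.is_empty() { None } else { Some(kind) }
}
```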

Operator readers:

```sh
netsky doctor
netsky watchdog events --limit 20
netsky watchdog events --since 6h --json
```

## Query surface

`netsky query` registers every schema table in DataFusion:

```sh
netsky query "SELECT COUNT(*) FROM clone_dispatches"
netsky query "SELECT agent, SUM(input_tokens) FROM token_usage GROUP BY agent"
```

## Compatibility

The `Db::record_*` writer API remains stable.