# netsky-db
Pure Rust observability database for messages, ticks, sessions, workspaces, clone dispatches, harvests, tool calls, git operations, directives, token usage, and watchdog events.
## Stack
- OLTP: turso SQLite at `~/.netsky/meta.db`, configured with WAL and a 10s busy timeout.
- OLAP: DataFusion SQL over Arrow `RecordBatch` snapshots.
- Bridge: SQLite JSON rows decode into typed Rust structs, then register as DataFusion `MemTable`s.
- Fallback: turso write failures append JSONL to `~/.netsky/logs/meta-db-errors-<date>.jsonl` and return `Ok`.
## Usage
```rust
use netsky_db::Db;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let db = Db::open()?;
    db.migrate()?;
    let _rows = db.query("SELECT COUNT(*) AS n FROM messages")?;
    Ok(())
}
```
CLI:
```sh
netsky query "SELECT source, COUNT(*) AS n FROM messages GROUP BY source"
```
## Schema v6
- `messages`: inbound and outbound bus, iMessage, email, iroh, and demo source messages.
- `cli_invocations`: command, argv JSON, exit code, duration, and host.
- `crashes`: crash kind, agent, and detail JSON.
- `ticks`: ticker and watchdog tick detail JSON.
- `workspaces`: workspace create/delete lifecycle.
- `sessions`: agent up/down/note events.
- `clone_dispatches`: clone lifecycle, brief metadata, branch, status, exit code, and detail JSON.
- `harvest_events`: source branch, target branch, commit SHA, status, conflicts, and detail JSON.
- `communication_events`: normalized iMessage/email/agent communications with source, tool, direction, IDs, handle, agent, body, status, and detail JSON.
- `mcp_tool_calls`: source, tool, agent, start/end, duration, success, error, timeout-race flag, and request/response JSON.
- `git_operations`: git operation, repo, branch, remote, from/to SHAs, status, and detail JSON.
- `owner_directives`: source, chat ID, raw owner text, resolved action, agent, status, and detail JSON.
- `token_usage`: session ID, agent, runtime, model, input/output/cached tokens, cost in USD micros, and detail JSON.
- `watchdog_events`: watchdog event name, agent, severity, status, and detail JSON.
- `source_errors`: io-serve source error counts by bounded class (`timeout`, `auth_failure`, `rate_limit`, `network_error`, `protocol_error`, `not_found`, `permission_denied`, `unknown`). Each write is count=1 by default; a future bucketing layer can flush rolled-up counts without a schema change.
- `iroh_events`: iroh handshake events by bounded type (`connect`, `evict`, `reconnect`, `handshake_refused`). `peer_id_hash` is a SHA256-truncated (8 byte) hex hash of the raw NodeId. The raw id never lands in the analytics surface.
- `source_cursors`: durable per-source cursor. `(source TEXT PRIMARY KEY, cursor_value TEXT, updated_at TEXT)`. Shape is structured (not row_json) because the CLI needs to read `cursor_value` without a JSON parse.
- `events`: per-source delivery log. `(id AUTOINCREMENT, source, ts_utc, payload_json, delivery_status, reason)`. A row is inserted with `delivery_status='pending'` before the delivery adapter runs, then transitioned to `'delivered'` or `'failed'` afterward. `reason` carries the failure text on `'failed'`.
### Derived views
- `clone_lifetimes`: `SELECT id, agent_id, runtime, workspace, branch, status, ts_utc_start, ts_utc_end FROM clone_dispatches WHERE ts_utc_end IS NOT NULL`. Registered at query time; an in-flight dispatch appears in the view as soon as its end timestamp is recorded.
### Migration
`SCHEMA_VERSION` is `6`. The migration path is forward-compatible: v1–v5 databases add `source_cursors` + `events` as empty structured tables and bump the meta / pragma version. The `token_usage` row format gained an optional `runtime` field; rows written by v4 binaries deserialize cleanly because the field is `#[serde(default)]`.
## Writer APIs
- `Db::record_message(MessageRecord)`: source message envelope.
- `Db::record_cli(...)`: CLI invocation metadata.
- `Db::record_crash(...)`: crash event.
- `Db::record_tick(...)`: ticker or watchdog tick.
- `Db::record_workspace(...)`: workspace lifecycle event.
- `Db::record_session(...)`: agent session event.
- `Db::record_clone_dispatch(CloneDispatchRecord)`: clone lifecycle and brief metadata.
- `Db::record_harvest_event(HarvestEventRecord)`: cherry-pick and harvest results.
- `Db::record_communication_event(CommunicationEventRecord)`: high-level communication audit events.
- `Db::record_mcp_tool_call(McpToolCallRecord)`: MCP tool timing, success, errors, and timeout races.
- `Db::record_git_operation(GitOperationRecord)`: local git mutations and pushes.
- `Db::record_owner_directive(OwnerDirectiveRecord)`: trusted owner directives and resolved actions.
- `Db::record_token_usage(TokenUsageRecord)`: model token and cost accounting; carries a `runtime` discriminator ("claude", "codex", ...).
- `Db::record_token_usage_batch(impl IntoIterator<Item=TokenUsageRecord>)`: bulk variant for ingest paths. Reserves a contiguous id range in one round-trip and writes the whole batch inside one connection + transaction; per-row writes open a fresh turso connection per call and are dominated by setup cost in tight loops.
- `Db::record_watchdog_event(WatchdogEventRecord)`: watchdog state transitions and escalations.
- `Db::record_source_error(SourceErrorRecord)`: io-serve source errors keyed by a bounded `SourceErrorClass`.
- `Db::record_iroh_event(IrohEventRecord)`: iroh handshake events keyed by a bounded `IrohEventType`; hash the raw NodeId with `hash_peer_id` before writing.
- `Db::read_source_cursor(source) -> Option<String>` + `Db::update_source_cursor(source, value)` + `Db::reset_source_cursor(source)` + `Db::list_source_cursors()`: durable source-cursor surface.
- `Db::insert_event(source, ts_utc, payload_json) -> id` + `Db::update_event_delivery(id, EventStatus, reason)` + `Db::tail_events(source, limit)`: per-source event log with delivery lifecycle.
### Claude token-usage auto-instrumentation
The writer surface exists, but in-process claude-SDK hook wiring is a phase 2 deliverable. Today, `netsky ingest claude-sessions` backfills `token_usage` from `~/.claude/projects/*.jsonl`; the ingestor stamps `runtime = "claude"` automatically.
## Watchdog JSONL trail
The watchdog writes a backend-independent event trail before any `meta.db`
write. Files live at:
```text
~/.netsky/logs/watchdog-events-<YYYY-MM-DD>.jsonl
```
Each line is one JSON object:
```json
{
  "ts": "2026-04-17T00:00:00Z",
  "kind": "failed-revive",
  "detail": {
    "message": "[watchdog-event 2026-04-17T00:00:00Z] failed-revive: pid=42"
  }
}
```
Fields:
- `ts`: UTC event timestamp in RFC 3339 form.
- `kind`: watchdog event class. Legacy string events derive `kind` from the
  token after `[watchdog-event <ts>]` and before the first colon, when one is
  present.
- `detail`: event-specific JSON. Legacy string events store the original
text under `detail.message`.
Operator readers:
```sh
netsky doctor
netsky watchdog events --limit 20
netsky watchdog events --since 6h --json
```
## Query surface
`netsky query` registers every schema table in DataFusion:
```sh
netsky query "SELECT COUNT(*) FROM clone_dispatches"
netsky query "SELECT agent, SUM(input_tokens) FROM token_usage GROUP BY agent"
```
## Compatibility
The `Db::record_*` writer API remains stable.