netsky-db 0.1.7

netsky observability database

Pure Rust observability database for messages, ticks, sessions, workspaces, clone dispatches, harvests, tool calls, git operations, directives, token usage, and watchdog events.

Stack

  • OLTP: turso SQLite at ~/.netsky/meta.db, configured with WAL and a 10s busy timeout.
  • OLAP: DataFusion SQL over Arrow RecordBatch snapshots.
  • Bridge: SQLite JSON rows are decoded into typed Rust structs, then registered as DataFusion MemTables.
  • Fallback: turso write failures append JSONL to ~/.netsky/logs/meta-db-errors-<date>.jsonl and return Ok.

Usage

use netsky_db::Db;
let db = Db::open()?;
db.migrate()?;
let rows = db.query("SELECT COUNT(*) AS n FROM messages")?;

CLI:

netsky query "SELECT source, COUNT(*) AS n FROM messages GROUP BY source"

Schema v6

  • messages: inbound and outbound messages from the bus, iMessage, email, iroh, and demo sources.
  • cli_invocations: command, argv JSON, exit code, duration, and host.
  • crashes: crash kind, agent, and detail JSON.
  • ticks: ticker and watchdog tick detail JSON.
  • workspaces: workspace create/delete lifecycle.
  • sessions: agent up/down/note events.
  • clone_dispatches: clone lifecycle, brief metadata, branch, status, exit code, and detail JSON.
  • harvest_events: source branch, target branch, commit SHA, status, conflicts, and detail JSON.
  • communication_events: normalized iMessage/email/agent communications with source, tool, direction, IDs, handle, agent, body, status, and detail JSON.
  • mcp_tool_calls: source, tool, agent, start/end, duration, success, error, timeout-race flag, and request/response JSON.
  • git_operations: git operation, repo, branch, remote, from/to SHAs, status, and detail JSON.
  • owner_directives: source, chat ID, raw owner text, resolved action, agent, status, and detail JSON.
  • token_usage: session ID, agent, runtime, model, input/output/cached tokens, cost in USD micros, and detail JSON.
  • watchdog_events: watchdog event name, agent, severity, status, and detail JSON.
  • source_errors: io-serve source error counts by bounded class (timeout, auth_failure, rate_limit, network_error, protocol_error, not_found, permission_denied, unknown). Each write is count=1 by default; a future bucketing layer can flush rolled-up counts without a schema change.
  • iroh_events: iroh handshake events by bounded type (connect, evict, reconnect, handshake_refused). peer_id_hash is the hex encoding of the SHA-256 hash of the raw NodeId, truncated to 8 bytes. The raw ID never lands in the analytics surface.
  • source_cursors: durable per-source cursor. (source TEXT PRIMARY KEY, cursor_value TEXT, updated_at TEXT). Shape is structured (not row_json) because the CLI needs to read cursor_value without a JSON parse.
  • events: per-source delivery log. (id AUTOINCREMENT, source, ts_utc, payload_json, delivery_status, reason). A row is inserted with delivery_status='pending' before the delivery adapter runs, then transitioned to 'delivered' or 'failed' afterward. reason carries the failure text on 'failed'.
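The pending → delivered/failed lifecycle of an events row can be modeled as a small state machine. A minimal sketch, assuming nothing beyond the statuses described above (the transition helper is hypothetical, not part of the crate API):

```rust
// Delivery states for rows in the `events` table.
#[derive(Debug, Clone, PartialEq)]
enum EventStatus {
    Pending,
    Delivered,
    Failed { reason: String },
}

impl EventStatus {
    /// A row is inserted as Pending and transitions exactly once after the
    /// delivery adapter runs; terminal states are sticky.
    fn transition(self, outcome: Result<(), String>) -> EventStatus {
        match (self, outcome) {
            (EventStatus::Pending, Ok(())) => EventStatus::Delivered,
            (EventStatus::Pending, Err(reason)) => EventStatus::Failed { reason },
            (state, _) => state,
        }
    }
}

fn main() {
    let ok = EventStatus::Pending.transition(Ok(()));
    let bad = EventStatus::Pending.transition(Err("smtp timeout".to_string()));
    println!("{:?}", ok);
    println!("{:?}", bad);
}
```

On 'failed', the reason payload carries the failure text, matching the reason column.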

Derived views

  • clone_lifetimes: SELECT id, agent_id, runtime, workspace, branch, status, ts_utc_start, ts_utc_end FROM clone_dispatches WHERE ts_utc_end IS NOT NULL. Registered at query time; an in-flight dispatch appears in the view as soon as its end timestamp is recorded.

Migration

SCHEMA_VERSION is 6. Older databases migrate forward cleanly: v1–v5 databases gain source_cursors and events as empty structured tables, and the meta/pragma version is bumped. The token_usage row format gained an optional runtime field; rows written by v4 binaries deserialize cleanly because the field is #[serde(default)].
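The forward path can be sketched as a pure helper that lists the DDL still needed for a given on-disk version. This is an illustration only (the function name and exact DDL are hypothetical; column shapes follow the source_cursors and events descriptions above):

```rust
// Hypothetical sketch of the v1–v5 -> v6 migration step: older databases
// gain the two structured tables; a v6 database needs nothing.
const SCHEMA_VERSION: u32 = 6;

fn pending_migrations(from_version: u32) -> Vec<&'static str> {
    let mut ddl = Vec::new();
    if from_version < SCHEMA_VERSION {
        // Structured shape so the CLI can read cursor_value without a JSON parse.
        ddl.push("CREATE TABLE IF NOT EXISTS source_cursors (source TEXT PRIMARY KEY, cursor_value TEXT, updated_at TEXT)");
        ddl.push("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY AUTOINCREMENT, source TEXT, ts_utc TEXT, payload_json TEXT, delivery_status TEXT, reason TEXT)");
    }
    ddl
}

fn main() {
    println!("v4 -> v{}: {} statements", SCHEMA_VERSION, pending_migrations(4).len());
    println!("v6 -> v{}: {} statements", SCHEMA_VERSION, pending_migrations(6).len());
}
```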

Writer APIs

  • Db::record_message(MessageRecord): source message envelope.
  • Db::record_cli(...): CLI invocation metadata.
  • Db::record_crash(...): crash event.
  • Db::record_tick(...): ticker or watchdog tick.
  • Db::record_workspace(...): workspace lifecycle event.
  • Db::record_session(...): agent session event.
  • Db::record_clone_dispatch(CloneDispatchRecord): clone lifecycle and brief metadata.
  • Db::record_harvest_event(HarvestEventRecord): cherry-pick and harvest results.
  • Db::record_communication_event(CommunicationEventRecord): high-level communication audit events.
  • Db::record_mcp_tool_call(McpToolCallRecord): MCP tool timing, success, errors, and timeout races.
  • Db::record_git_operation(GitOperationRecord): local git mutations and pushes.
  • Db::record_owner_directive(OwnerDirectiveRecord): trusted owner directives and resolved actions.
  • Db::record_token_usage(TokenUsageRecord): model token and cost accounting; carries a runtime discriminator ("claude", "codex", ...).
  • Db::record_token_usage_batch(impl IntoIterator<Item=TokenUsageRecord>): bulk variant for ingest paths. Reserves a contiguous id range in one round-trip and writes the whole batch inside one connection + transaction; per-row writes open a fresh turso connection per call and are dominated by setup cost in tight loops.
  • Db::record_watchdog_event(WatchdogEventRecord): watchdog state transitions and escalations.
  • Db::record_source_error(SourceErrorRecord): io-serve source errors keyed by a bounded SourceErrorClass.
  • Db::record_iroh_event(IrohEventRecord): iroh handshake events keyed by a bounded IrohEventType; hash the raw NodeId with hash_peer_id before writing.
  • Db::read_source_cursor(source) -> Option<String> + Db::update_source_cursor(source, value) + Db::reset_source_cursor(source) + Db::list_source_cursors(): durable source-cursor surface.
  • Db::insert_event(source, ts_utc, payload_json) -> id + Db::update_event_delivery(id, EventStatus, reason) + Db::tail_events(source, limit): per-source event log with delivery lifecycle.
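The batch writer's contiguous-id reservation can be sketched as a pure helper. The function and struct shapes below are illustrative, not the crate's actual internals; the point is that one counter bump covers the whole batch:

```rust
// Hypothetical sketch of contiguous id-range reservation for a batch write:
// the counter advances once for the entire batch, so every row lands in a
// single transaction with a predictable id, instead of one connection per row.
#[derive(Debug)]
struct TokenUsageRecord {
    agent: String,
    input_tokens: u64,
    output_tokens: u64,
}

fn reserve_ids(next_id: &mut i64, batch: Vec<TokenUsageRecord>) -> Vec<(i64, TokenUsageRecord)> {
    let start = *next_id;
    *next_id += batch.len() as i64; // single reservation, one round-trip
    batch
        .into_iter()
        .enumerate()
        .map(|(offset, rec)| (start + offset as i64, rec))
        .collect()
}

fn main() {
    let mut next_id = 100;
    let batch = vec![
        TokenUsageRecord { agent: "alpha".into(), input_tokens: 10, output_tokens: 2 },
        TokenUsageRecord { agent: "beta".into(), input_tokens: 7, output_tokens: 1 },
    ];
    for (id, rec) in reserve_ids(&mut next_id, batch) {
        println!("id={} agent={}", id, rec.agent);
    }
    println!("next_id={}", next_id);
}
```

This is why the bulk variant wins in tight ingest loops: per-row writes pay connection setup on every call, while the batch pays it once.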

Claude token-usage auto-instrumentation

The writer surface exists, but in-process claude-SDK hook wiring is a phase 2 deliverable. Today, netsky ingest claude-sessions backfills token_usage from ~/.claude/projects/*.jsonl; the ingestor stamps runtime = "claude" automatically.

Watchdog JSONL trail

The watchdog writes a backend-independent event trail before any meta.db write. Files live at:

~/.netsky/logs/watchdog-events-<YYYY-MM-DD>.jsonl

Each line is one JSON object:

{
  "ts": "2026-04-17T00:00:00Z",
  "kind": "failed-revive",
  "detail": {
    "message": "[watchdog-event 2026-04-17T00:00:00Z] failed-revive: pid=42"
  }
}

Fields:

  • ts: UTC event timestamp in RFC 3339 form.
  • kind: watchdog event class. For legacy string events, kind is the token between the [watchdog-event <ts>] prefix and the first colon, when one is present.
  • detail: event-specific JSON. Legacy string events store the original text under detail.message.
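The legacy-kind rule above can be sketched as a small parser (the helper name is hypothetical; only std string operations are used):

```rust
// Hypothetical sketch of the legacy-kind rule: take the token after the
// "[watchdog-event <ts>]" prefix and before the first colon, if present.
fn legacy_kind(message: &str) -> Option<String> {
    // Strip everything up to and including the closing bracket of the prefix.
    let rest = message.strip_prefix("[watchdog-event ")?;
    let (_, after_bracket) = rest.split_once(']')?;
    // kind is the token before the first colon when one is present;
    // without a colon, the whole remainder is taken.
    let kind = after_bracket.split(':').next()?.trim();
    if kind.is_empty() { None } else { Some(kind.to_string()) }
}

fn main() {
    let line = "[watchdog-event 2026-04-17T00:00:00Z] failed-revive: pid=42";
    println!("{:?}", legacy_kind(line));
}
```

Note that splitting on the bracket first keeps the colons inside the RFC 3339 timestamp from confusing the kind extraction.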

Operator readers:

netsky doctor
netsky watchdog events --limit 20
netsky watchdog events --since 6h --json

Query surface

netsky query registers every schema table in DataFusion:

netsky query "SELECT COUNT(*) FROM clone_dispatches"
netsky query "SELECT agent, SUM(input_tokens) FROM token_usage GROUP BY agent"

Compatibility

The Db::record_* writer API remains stable.