jetro 0.3.0

Jetro - transform, query, and compare JSON
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Commands

```bash
# Build
cargo build

# All tests
cargo test

# Single test, or every test in a module, by path
cargo test tests::tests::field_access
cargo test db::tests

# Run with output (useful when debugging)
cargo test some_test_name -- --nocapture
```

## Architecture

One query engine plus a persistence layer, all in `src/`.

### Expression engine — root of `src/`

- `grammar.pest` + `parser.rs` — PEG grammar parsed by [pest](https://pest.rs); produces an `Expr` AST (in `ast.rs`)
- `eval/mod.rs` — tree-walking evaluator; `Env` holds vars as `SmallVec<[(Arc<str>, Val); 4]>` + `Arc<MethodRegistry>`; entry points are `evaluate()` and `evaluate_with()`
- `eval/value.rs` — `Val` type: `Arc`-wrapped compound nodes (`Arr(Arc<Vec<Val>>)`, `Obj(Arc<IndexMap<Arc<str>, Val>>)`) — every clone is O(1)
- `eval/func_*.rs` — built-in method implementations grouped by category (strings, arrays, objects, paths, aggregates, csv)
- `eval/methods.rs` — `Method` trait + `MethodRegistry` for user-registered custom methods
- `vm.rs` — bytecode compiler + stack machine; compiles `Expr` → `Program` (`Arc<[Opcode]>`); peephole passes include `RootChain` fusion, `FilterCount` fusion, `ConstFold`; caches compiled programs and resolved pointer paths; `VM` owns both caches
- `graph.rs` — multi-document query using the VM; merges named nodes into a virtual root `{node: value}`, then evaluates
- `analysis.rs` / `schema.rs` / `plan.rs` / `cfg.rs` / `ssa.rs` — optional IR / analysis layers: type + nullness + cardinality, shape inference, logical plan, basic-block CFG, SSA numbering. None are mandatory for correctness; the tree-walker is the reference.
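
To give a feel for what a `ConstFold`-style peephole pass does, here is a minimal, std-only sketch. The `Opcode` variants and the `const_fold` function are hypothetical and much smaller than whatever `vm.rs` actually defines; the sketch only shows the general technique of folding `push, push, op` into a single push on a stack machine.

```rust
/// Hypothetical opcode set for illustration; not the crate's real `Opcode`.
#[derive(Debug, Clone, PartialEq)]
pub enum Opcode {
    PushInt(i64),
    Add,
    Mul,
    LoadField(String),
}

/// Fold `PushInt a, PushInt b, Add|Mul` into one `PushInt`.
/// Folded results land back in `out`, so chains of constants collapse in one pass.
pub fn const_fold(ops: &[Opcode]) -> Vec<Opcode> {
    let mut out: Vec<Opcode> = Vec::with_capacity(ops.len());
    for op in ops {
        let n = out.len();
        if n >= 2 {
            if let (Opcode::PushInt(a), Opcode::PushInt(b)) = (&out[n - 2], &out[n - 1]) {
                // Skip folding on overflow and leave the opcodes as written.
                let folded = match op {
                    Opcode::Add => a.checked_add(*b),
                    Opcode::Mul => a.checked_mul(*b),
                    _ => None,
                };
                if let Some(v) = folded {
                    out.truncate(n - 2);
                    out.push(Opcode::PushInt(v));
                    continue;
                }
            }
        }
        out.push(op.clone());
    }
    out
}
```

Anything that depends on the document (here `LoadField`) blocks the fold, so only operations whose operands are both constants are rewritten.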

**Syntax:** `$.field.subfield`, `$.books.filter(price > 10)`, `$.books.map(title)`, comprehensions, `let` bindings, pipelines with `|`. Root is `$`.

Top-level entry points in `lib.rs`:
- `Jetro::new(doc).collect(expr)` — thread-local VM, cached across calls
- `jetro::query(expr, doc)` / `jetro::query_with(expr, doc, registry)` — one-shot via tree-walker

### `db/` — persistence layer

On-disk B+ tree storage backed by memory-mapped files (`memmap2`). Key files:

- `btree.rs` — B+ tree with COW writes and lock-free reads (bbolt-inspired); reads take a snapshot (`Arc<Mmap>` clone, O(1)) and hold no locks during traversal
- `bucket.rs` — `ExprBucket` (key → expression string) and `JsonBucket` (inserts pre-apply stored expressions and persist results)
- `graph_bucket.rs` — `GraphBucket`: cross-document queries across named node BTrees with secondary indexes
- `link_bucket.rs` — `LinkBucket`: stream-join / blocking bucket; a "link" is complete when one document of every registered `kind` has been inserted with the same id; `get`/`query` block until complete
- `mod.rs` — `Database::open(dir)` facade that opens/creates buckets
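
The blocking behaviour described for `LinkBucket` can be sketched with a `Mutex` and a `Condvar`. This is an in-memory illustration only: `LinkTable`, its fields, and its method signatures are hypothetical, documents are plain strings, and nothing is persisted. The real `link_bucket.rs` may be organised quite differently.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Condvar, Mutex};

/// Hypothetical in-memory stand-in for a link bucket.
pub struct LinkTable {
    kinds: HashSet<String>,
    // id -> (kind -> document)
    docs: Mutex<HashMap<String, HashMap<String, String>>>,
    ready: Condvar,
}

impl LinkTable {
    pub fn new(kinds: &[&str]) -> Self {
        LinkTable {
            kinds: kinds.iter().map(|k| k.to_string()).collect(),
            docs: Mutex::new(HashMap::new()),
            ready: Condvar::new(),
        }
    }

    pub fn insert(&self, kind: &str, id: &str, doc: &str) {
        let mut docs = self.docs.lock().unwrap();
        docs.entry(id.to_string())
            .or_default()
            .insert(kind.to_string(), doc.to_string());
        // Wake every waiter; each one re-checks its own id.
        self.ready.notify_all();
    }

    fn is_complete(&self, docs: &HashMap<String, HashMap<String, String>>, id: &str) -> bool {
        docs.get(id)
            .map_or(false, |by_kind| self.kinds.iter().all(|k| by_kind.contains_key(k)))
    }

    /// Block until every registered kind has a document for `id`.
    pub fn get(&self, id: &str) -> HashMap<String, String> {
        let guard = self.docs.lock().unwrap();
        let guard = self
            .ready
            .wait_while(guard, |docs| !self.is_complete(docs, id))
            .unwrap();
        guard[id].clone()
    }
}
```

A reader thread calling `get("42")` parks on the condition variable until both an `order` and a `payment` document (or whatever kinds were registered) have arrived for id `42`.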

## Key implementation notes

**`shift_remove` not `remove`**: `IndexMap::remove()` is deprecated; always use `shift_remove()` to preserve insertion order. `swap_remove()` is faster but moves the last entry into the vacated slot, which reorders the map.
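
The ordering difference is easiest to see on a plain `Vec`, whose `remove` and `swap_remove` behave like `IndexMap::shift_remove` and `IndexMap::swap_remove` with respect to element order. The sketch below is std-only and uses made-up helper names and entries; it is not code from this repository.

```rust
/// Remove `key` by shifting later entries left: order preserved, O(n).
pub fn shift_remove_entry(entries: &mut Vec<(String, i64)>, key: &str) -> Option<i64> {
    let idx = entries.iter().position(|(k, _)| k == key)?;
    Some(entries.remove(idx).1)
}

/// Remove `key` by moving the last entry into its slot: order changes.
pub fn swap_remove_entry(entries: &mut Vec<(String, i64)>, key: &str) -> Option<i64> {
    let idx = entries.iter().position(|(k, _)| k == key)?;
    Some(entries.swap_remove(idx).1)
}

/// Four entries in insertion order a, b, c, d.
pub fn sample() -> Vec<(String, i64)> {
    vec![
        ("a".to_string(), 1),
        ("b".to_string(), 2),
        ("c".to_string(), 3),
        ("d".to_string(), 4),
    ]
}

/// Keys in their current order.
pub fn keys(entries: &[(String, i64)]) -> Vec<&str> {
    entries.iter().map(|(k, _)| k.as_str()).collect()
}
```

Removing `b` with the shifting variant leaves `a, c, d`; the swapping variant leaves `a, d, c`, which would change the field order of a serialized object.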

**Val cloning is cheap**: `Val::Arr` and `Val::Obj` are `Arc`-wrapped. Cloning bumps a refcount, not the data. Use `Arc::try_unwrap` + fallback clone when you need mutability.
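
A minimal sketch of the `Arc::try_unwrap` + fallback clone pattern, using a cut-down `Val` with only two variants. The enum and the `push` helper are illustrative assumptions, not the definitions in `eval/value.rs`.

```rust
use std::sync::Arc;

/// Simplified stand-in for the crate's `Val`; only two variants are modelled.
#[derive(Clone, Debug, PartialEq)]
pub enum Val {
    Int(i64),
    Arr(Arc<Vec<Val>>),
}

/// Append `item` to an array value. The backing `Vec` is reused when this is
/// the only reference and deep-copied one level when it is shared.
pub fn push(val: Val, item: Val) -> Val {
    match val {
        Val::Arr(arc) => {
            // try_unwrap succeeds only if the strong count is exactly 1.
            let mut vec = Arc::try_unwrap(arc).unwrap_or_else(|shared| (*shared).clone());
            vec.push(item);
            Val::Arr(Arc::new(vec))
        }
        // Non-array values are returned unchanged in this sketch.
        other => other,
    }
}
```

Cloning a `Val::Arr` and then calling `push` on the clone leaves the original untouched, because the shared `Vec` is copied before it is mutated.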

**`MethodRegistry` is `Clone`** (derives it): `Arc<MethodRegistry>` can be passed into `Env` without copying method implementations (they are `Arc<dyn Method>`).
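
Why cloning a registry is cheap can be shown with a small model. The trait signature, the `Double` method, and the registry's internal `HashMap` are assumptions made for this sketch; the real `Method` trait and `MethodRegistry` in `eval/methods.rs` may look different.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical, simplified method trait (the real one operates on `Val`).
pub trait Method: Send + Sync {
    fn call(&self, input: i64) -> i64;
}

pub struct Double;

impl Method for Double {
    fn call(&self, input: i64) -> i64 {
        input * 2
    }
}

/// Deriving `Clone` copies the map and bumps each `Arc`'s refcount;
/// the method implementations themselves are shared, not duplicated.
#[derive(Clone, Default)]
pub struct MethodRegistry {
    methods: HashMap<String, Arc<dyn Method>>,
}

impl MethodRegistry {
    pub fn register(&mut self, name: &str, method: Arc<dyn Method>) {
        self.methods.insert(name.to_string(), method);
    }

    pub fn get(&self, name: &str) -> Option<Arc<dyn Method>> {
        self.methods.get(name).cloned()
    }
}
```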

**Tree-walker vs VM**: The tree-walker in `eval/mod.rs` is the reference implementation and is what `query()` uses. The `VM` in `vm.rs` is an accelerated layer on top with compile and resolution caches; use `VM::run_str()` / `VM::execute()` directly for repeated queries against the same or similar documents. `Jetro::collect` routes through the thread-local VM.
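
The benefit of a compile cache for repeated queries can be sketched as follows. `CompileCache`, the toy `compile` function, and the two-variant `Opcode` are hypothetical; they only illustrate caching an `Arc<[Opcode]>` program keyed by expression text, not the actual layout of `VM`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical opcode set; the real one is much larger.
#[derive(Debug, Clone, PartialEq)]
pub enum Opcode {
    PushRoot,
    GetField(String),
}

pub type Program = Arc<[Opcode]>;

/// Toy compiler that only understands `$.a.b` style paths.
fn compile(expr: &str) -> Program {
    let mut ops = vec![Opcode::PushRoot];
    for field in expr.trim_start_matches('$').split('.').filter(|s| !s.is_empty()) {
        ops.push(Opcode::GetField(field.to_string()));
    }
    ops.into()
}

#[derive(Default)]
pub struct CompileCache {
    programs: HashMap<String, Program>,
    /// Number of cache misses, kept so the effect is observable.
    pub compiles: usize,
}

impl CompileCache {
    /// Return the cached program for `expr`, compiling it on first use.
    pub fn get_or_compile(&mut self, expr: &str) -> Program {
        if let Some(program) = self.programs.get(expr) {
            return Arc::clone(program);
        }
        self.compiles += 1;
        let program = compile(expr);
        self.programs.insert(expr.to_string(), Arc::clone(&program));
        program
    }
}
```

A second lookup of the same expression returns the same `Arc` allocation, so the cost of parsing and compiling is paid once per distinct expression string.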

**Method dispatch**: Methods are called with dot syntax (`$.books.filter(price > 10)`), not pipe syntax. The `|` pipeline operator feeds a value into the next expression; it does not call methods with arguments.

**Path cache key**: `VM::execute` seeds `doc_hash` via `hash_val_structure`, which hashes both structure *and* primitive values (arr/obj bounded by depth 8). Two docs with identical shape but different leaf values must produce different hashes — otherwise the pointer cache returns stale results across distinct docs.
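
A depth-bounded hash that covers both shape and leaf values might look like the sketch below. The simplified `Val`, the tag bytes, and the function names are assumptions; this is not the body of `hash_val_structure`, only an illustration of the property the note requires.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simplified value type for illustration.
pub enum Val {
    Null,
    Int(i64),
    Str(String),
    Arr(Vec<Val>),
    Obj(Vec<(String, Val)>),
}

/// Containers nested deeper than this contribute only their tag and length.
const MAX_DEPTH: usize = 8;

fn hash_into(v: &Val, depth: usize, h: &mut DefaultHasher) {
    match v {
        Val::Null => 0u8.hash(h),
        Val::Int(n) => {
            1u8.hash(h);
            n.hash(h); // leaf value, not just its type
        }
        Val::Str(s) => {
            2u8.hash(h);
            s.hash(h);
        }
        Val::Arr(items) => {
            3u8.hash(h);
            items.len().hash(h);
            if depth < MAX_DEPTH {
                for item in items {
                    hash_into(item, depth + 1, h);
                }
            }
        }
        Val::Obj(fields) => {
            4u8.hash(h);
            fields.len().hash(h);
            if depth < MAX_DEPTH {
                for (key, val) in fields {
                    key.hash(h);
                    hash_into(val, depth + 1, h);
                }
            }
        }
    }
}

pub fn hash_val(v: &Val) -> u64 {
    let mut h = DefaultHasher::new();
    hash_into(v, 0, &mut h);
    h.finish()
}
```

Two documents such as `{"price": 10}` and `{"price": 11}` share a shape but hash differently here, which is the behaviour a pointer cache needs to avoid serving one document's resolved values for another.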