jetro 0.5.12

Jetro - transform, query, and compare JSON
# jetro Semantics (v0.5.5)

Current reference for user-visible runtime behavior. This file should
describe stable language semantics, not optimization details. Backend,
cache, and structural-sharing notes are implementation details unless a
section explicitly says otherwise.

---

## 1. Value model

Six runtime value kinds — `null`, `bool`, `number`, `string`, `array`,
`object`. Internally `Val` carries cheap-clone variants for numeric and
string columns (`IntVec`, `FloatVec`, `StrVec`, `StrSliceVec`, `ObjVec`)
plus borrowed `StrSlice` over a tape; these are an implementation detail
and equate to the public kinds for comparison and dispatch.

| Kind | Type tag (`type()`) | Truthy iff |
|---|---|---|
| `null`   | `"null"`   | always falsy |
| `bool`   | `"bool"`   | `true` |
| `number` | `"number"` | non-zero |
| `string` | `"string"` | non-empty |
| `array`  | `"array"`  | non-empty |
| `object` | `"object"` | non-empty |

Numbers are unified `f64` at the surface; integer-valued numbers are
preserved on output (`1` not `1.0`).
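
The table and the number rule above can be restated as a small Python model. This is only a sketch of the described behavior over JSON-like Python values, not jetro's implementation; the helper names `kind`, `truthy`, and `render_number` are hypothetical.

```python
def kind(v):
    """Return the type tag from the table above for a JSON-like Python value."""
    if v is None:
        return "null"
    if isinstance(v, bool):  # check bool first: bool is an int subclass in Python
        return "bool"
    if isinstance(v, (int, float)):
        return "number"
    if isinstance(v, str):
        return "string"
    if isinstance(v, list):
        return "array"
    if isinstance(v, dict):
        return "object"
    raise TypeError(f"not a JSON value: {v!r}")


def truthy(v):
    """Truthiness per the table: null is falsy, bool is itself, others non-zero / non-empty."""
    k = kind(v)
    if k == "null":
        return False
    if k == "bool":
        return v
    if k == "number":
        return v != 0
    return len(v) > 0  # string, array, object


def render_number(n):
    """Print integer-valued numbers without a fractional part (1, not 1.0)."""
    f = float(n)
    return str(int(f)) if f.is_integer() else repr(f)
```

Note that `truthy([0])` is true in this model: an array is judged by emptiness, not by its contents.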

Compound values are `Arc`-shared, so a clone is an `O(1)` refcount bump.
Patch writes preserve the `Arc` of every untouched subtree.

---

## 2. Roots and current value

| Symbol | Meaning |
|---|---|
| `$` | Document root (the value passed into `collect`) |
| `@` | Current value at this position. Inside a path: the parent step's value. Inside method args: the element under consideration. Inside `lambda`/`=>` body: implicit when no name given. Top-level: same as `$`. |

Exactly one root per chain. Inside method arguments, paths must start
with `$`, `@`, or a bound name; bare-leading `.field` does not parse.

---

## 3. Path expressions

| Step | Form | Notes |
|---|---|---|
| Field | `.name`, `["foo bar"]` | Quoted form for tricky keys. |
| Index | `[3]`, `[-1]` | Negative indexes from the end. |
| Slice | `[a:b]`, `[a:b:c]`, `[::n]` | Half-open. `[::-1]` reverses. |
| Wildcard | `[*]` | Every element. |
| Filtered wildcard | `[* if pred]` | Wildcard restricted by predicate (`@` = element). |
| Descendant | `..name`, `..` | DFS pre-order. |
| Inline filter | `{cond}` | Sugar for `.filter(cond)`. |
| Dynamic key | `[expr]` | `expr` resolves to a key/index. |
| Quantifier | `step?`, `step!` | Optional / exactly-one. |

Bare field access **after** a filtered wildcard collapses to `null`:
`$.books[* if year > 1980].title` → `null`. Use a method instead:
`$.books[* if year > 1980].map(@.title)`.

`?` returns `null` instead of erroring when the step is missing. `!`
errors unless the step yields exactly one result.
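
As a rough illustration of the descendant step and the two quantifiers, the following Python sketch models them over plain dicts and lists. It assumes `..name` yields the values stored under `name` in document order; the function names are hypothetical and nothing here calls jetro.

```python
def descend(value, name):
    """Model of `..name`: yield values stored under key `name`, DFS pre-order."""
    if isinstance(value, dict):
        for key, child in value.items():
            if key == name:
                yield child               # visit the match before its own descendants
            yield from descend(child, name)
    elif isinstance(value, list):
        for child in value:
            yield from descend(child, name)


def optional_field(obj, name):
    """Model of `.name?`: the field value, or None (null) when it is missing."""
    return obj.get(name) if isinstance(obj, dict) else None


def exactly_one(results):
    """Model of `step!`: the single result; error on zero or many."""
    if len(results) != 1:
        raise ValueError(f"expected exactly one result, got {len(results)}")
    return results[0]


doc = {"a": {"id": 1, "b": {"id": 2}}, "c": [{"id": 3}]}
ids = list(descend(doc, "id"))  # outer matches come before nested ones
```

Here `ids` is `[1, 2, 3]`, so `exactly_one(ids)` raises while `optional_field(doc, "missing")` quietly returns `None`.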

---

## 4. Output shape

A method call returns the value produced by that method. Path receivers do
not add a synthetic one-element wrapper around scalar method results.

```text
DOC:    {"x": 10, "s": "hello", "xs": [1, 2, 3]}
$.x.type()         → "number"
$.s.upper()        → "HELLO"
$.x.to_json()      → "10"
$.s.slice(0, 3)    → "hel"
$.xs.map(@ + 1)    → [2, 3, 4]

10.type()          → "number"
"hello".upper()    → "HELLO"
```

Array-returning methods (`map`, `filter`, `keys`, `entries`, `split`,
etc.) return arrays because that is the method's result, not because the
receiver was a path.

---

## 5. Method calls and arguments

### Three argument shapes

1. **Lambda**: `@`-form, named arrow, or `lambda` form, used by
   `filter`, `find`, `map`, `sort`, `unique_by`, etc.
   ```text
   $.users.filter(@.active)
   $.users.filter(u => u.active)
   $.users.filter(lambda u: u.active)
   ```
2. **Bare identifiers / aliases** — used by `pick`, `omit`. Identifiers
   refer to fields of the receiver, not via `@`.
   ```text
   $.user.pick(id, name)         # identifiers
   $.user.pick(uid: id)          # alias: src
   ```
3. **Positional values / paths** — string literals, numbers, paths,
   object literals. Used by `set_path`, `get_path`, `equi_join`,
   `merge`, `update`, etc.
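
To make the identifier / alias shape concrete, here is a minimal Python model of `pick`. It is a sketch only: the function is hypothetical, and behavior for missing fields is not specified in this section, so the model simply lets Python raise `KeyError`.

```python
def pick(obj, *fields, **aliases):
    """Keep the named fields; `alias=src` stores obj[src] under `alias`."""
    out = {name: obj[name] for name in fields}          # pick(id, name)
    out.update({alias: obj[src] for alias, src in aliases.items()})  # pick(uid: id)
    return out


user = {"id": 7, "name": "Ada", "email": "ada@example.com"}
plain = pick(user, "id", "name")   # models $.user.pick(id, name)
renamed = pick(user, uid="id")     # models $.user.pick(uid: id)
```

The identifiers name fields of the receiver directly, which is why the model takes field names rather than a function of the element.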

### Multi-arg lambdas

Two-argument lambdas take a parenthesized parameter list. A single argument can also be destructured as an array:

```text
$.xs.accumulate(0, (a, b) => a + b)
$.entries.map(([k, v]) => {k, v})
```

### Pipe `|`

Pipes a value through a method with the previous stage's value bound to
`@`:

```text
"foo bar" | @.upper()       → "FOO BAR"
$.user | @.pick(id, name)
```

Inside a pipe, `set` / `modify` keep the v1 "return-the-new-value"
shape; outside a pipe, chain-writes return the patched root.

---

## 6. Lambdas

| Form | Body sees |
|---|---|
| `@.expr` (implicit lambda) | `@` = current element |
| `name => body` | `name` bound to current element |
| `(a, b) => body` | Two-arg lambda |
| `([a, b]) => body` | One-arg array-destructure |
| `lambda x: body` | Named, Python-style |

Lambda forms lower to the same compiled body shape. Named and `@`-form emit
identical opcodes.

---

## 7. `let` bindings

`let name = expr in body` — binds `name` lexically inside `body`.
`expr` is evaluated once.

```text
let p = $.user.profile in f"{p.name} <{p.email}>"
let xs = $.numbers in [n*n for n in xs]
```

Useful to:
- avoid repeated path traversal,
- apply a chain to a value computed by an expression that isn't a
  rooted path,
- destructure once, use many times.
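
The two `let` examples above translate almost directly into Python, which makes the "evaluated once" property easy to observe. This is an analogy under that assumption, not jetro code; `profile` is a hypothetical stand-in for the path `$.user.profile`.

```python
calls = 0


def profile(doc):
    """Stand-in for the path `$.user.profile`; counts how often it is evaluated."""
    global calls
    calls += 1
    return doc["user"]["profile"]


doc = {
    "user": {"profile": {"name": "Ada", "email": "ada@example.com"}},
    "numbers": [1, 2, 3],
}

# let p = $.user.profile in f"{p.name} <{p.email}>"
p = profile(doc)                       # bound once, used twice in the body
label = f"{p['name']} <{p['email']}>"

# let xs = $.numbers in [n*n for n in xs]
xs = doc["numbers"]
squares = [n * n for n in xs]
```

After this runs, `calls` is 1 even though the body reads `p` twice.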

---

## 8. Pattern match

`match expr with { pat -> body, ... }` — Maranget decision tree under
the hood.

| Pattern | Meaning |
|---|---|
| `42`, `"x"`, `true`, `null` | Literal match |
| `name` | Bind everything to `name` |
| `_` | Wildcard, no bind |
| `name: kind` | Match if `kind`; bind value (`s: string`, `n: number`, …) |
| `lo..hi`, `lo..=hi` | Numeric range (exclusive / inclusive) |
| `{k1: p1, k2: p2}` | Object pattern; reserved-word keys allowed |
| `[p1, p2]`, `[p1, p2, ...rest]` | Array pattern; rest binds tail |
| `pat when guard` | Guard outside the pattern |

Deep variants `..match` / `..match!` walk descendants in DFS pre-order.
`..match { ... }` collects every truthy arm-body result into an array;
unmatched descendants and falsy arm bodies are skipped. `..match! { ... }`
returns the first truthy arm-body result, or `null` when none matches.

```text
match $.book with {
  {year: y} when y < 1970 -> "classic",
  {year: y} when y < 2000 -> "modern",
  _ -> "current"
}
```

Object-pattern key shorthand `{id, name}` is **not** supported in match
patterns — write `{id: id, name: name}`. Object rest uses `...*rest`;
`..rest` is not an object-pattern form.
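
The difference between the deep variants `..match` and `..match!` described in this section can be sketched in Python. The model below makes two assumptions that the text does not settle: the walk includes the starting node itself, and an arm set is represented by a hypothetical function that returns a `NO_MATCH` sentinel when no arm applies.

```python
NO_MATCH = object()  # hypothetical sentinel: no arm matched this node


def walk(value):
    """Yield `value` and all of its descendants in DFS pre-order."""
    yield value
    if isinstance(value, dict):
        for child in value.values():
            yield from walk(child)
    elif isinstance(value, list):
        for child in value:
            yield from walk(child)


def deep_match(doc, arms):
    """Model of `..match`: collect every truthy arm-body result."""
    results = []
    for node in walk(doc):
        out = arms(node)
        if out is not NO_MATCH and out:   # skip unmatched nodes and falsy bodies
            results.append(out)
    return results


def deep_match_first(doc, arms):
    """Model of `..match!`: first truthy arm-body result, or None (null)."""
    for node in walk(doc):
        out = arms(node)
        if out is not NO_MATCH and out:
            return out
    return None


def arms(node):
    """Models `{ {year: y} when y < 1970 -> "classic", {year: y} -> "later" }`."""
    if isinstance(node, dict) and "year" in node:
        return "classic" if node["year"] < 1970 else "later"
    return NO_MATCH


library = {"books": [{"year": 1965}, {"year": 1999}]}
```

With this data, `deep_match(library, arms)` gathers one result per book, while `deep_match_first` stops at the first book.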

---

## 9. f-strings

`f"...{expr}..."` — embeds expressions. Inside braces:
- bare expression: `{name}`, `{p.email}`
- format spec: `{x:.2}`, `{n:04d}`

Outside braces: regular string-literal escapes (`\n`, `\t`, `\\`,
`\"`, `\xNN`, `\uXXXX`).

---

## 10. Truthiness and comparison

### Truthiness
See §1. `0`, `""`, `[]`, `{}`, `null`, `false` are falsy; everything
else truthy.

### Equality
- Same kind: structural equality (deep on arrays and objects, ordered
  keys).
- Cross-kind: not equal (no implicit coercion).
- `null == null` → `true`.

### Ordering
`<`, `<=`, `>`, `>=` are defined for the scalar cases used by the
runtime comparison operators, notably numbers and strings. Sort/reducer
internals have their own total comparison helpers for ordering compound
values; do not rely on object ordering through comparison operators as a
portable surface semantic.

---

## 11. Type coercion in operators

Arithmetic `+ - * / %`:

| L op R | Result |
|---|---|
| number + number | number |
| string + string | string concat |
| string + scalar | string concat (scalar stringified) |
| array + array | concat |
| object + object | shallow merge (right wins) |
| Other combos | runtime error |
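
Read row by row, the `+` table can be modeled with a small Python dispatcher. This sketch takes the table literally (only a string on the left absorbs a scalar) and assumes scalars are stringified as JSON text; both points are assumptions, and `kind` / `add` are hypothetical helpers rather than jetro functions.

```python
import json


def kind(v):
    """Type tag of a JSON-like Python value (bool is checked before number)."""
    if v is None:
        return "null"
    if isinstance(v, bool):
        return "bool"
    if isinstance(v, (int, float)):
        return "number"
    if isinstance(v, str):
        return "string"
    return "array" if isinstance(v, list) else "object"


def add(left, right):
    """Model of `left + right` following the coercion table."""
    lk, rk = kind(left), kind(right)
    if lk == rk and lk in ("number", "string", "array"):
        return left + right                  # numeric add or concatenation
    if lk == rk == "object":
        return {**left, **right}             # shallow merge, right wins
    if lk == "string" and rk in ("number", "bool", "null"):
        return left + json.dumps(right)      # assumption: scalar becomes JSON text
    raise TypeError(f"cannot add {lk} and {rk}")  # "Other combos" row
```

For example, `add({"a": 1, "b": 1}, {"b": 2})` keeps `a` and takes `b` from the right operand, and `add([1], {"a": 1})` raises.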

Boolean ops `and`, `or` short-circuit and return booleans. `not x`
returns the negation of truthiness.

Inequality: `!=` is the parsed form; `<>` is **not** supported.

`is` / `is not` for kind tests: `x is number`, `x is string`,
`x is array`, `x is object`, `x is null`, `x is bool`.

`as` for explicit casts: `"42" as int`, `1 as str`, `1 as bool`.

---

## 12. Membership

| Form | Meaning |
|---|---|
| `xs.includes(v)` | Method form (arrays/strings) |
| `xs has v` | Postfix operator |

Bare `v in xs` does **not** parse.

---

## 13. Comprehensions

```text
[expr for x in xs]                        # list
[expr for x in xs if c1 if c2]            # multi-if (and-folded)
{k: v for [k, v] in pairs}                # dict
{expr for x in xs}                        # set
(expr for x in xs)                        # generator
```

Source can be a path (`for x in $.items`) or any iterable. `for k, v in pairs`
and `for [k, v] in pairs` both work as 2-var destructure.
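
The comprehension forms above mirror Python closely enough that Python itself can serve as a worked example. The snippet is an analogy only; how jetro represents set and generator results as JSON values is not covered by it.

```python
xs = [1, 2, 3, 4, 5, 6]
pairs = [["a", 1], ["b", 2]]

squares = [x * x for x in xs]                     # list
picked = [x for x in xs if x % 2 == 0 if x > 2]   # multi-if: both conditions must hold
as_dict = {k: v for [k, v] in pairs}              # dict, `[k, v]` destructure
as_dict2 = {k: v for k, v in pairs}               # dict, `k, v` destructure
parities = {x % 2 for x in xs}                    # set
lazy = (x * 10 for x in xs)                       # generator, evaluated on demand
```

The two `if` clauses in `picked` behave like a single `x % 2 == 0 and x > 2` condition, which is what "and-folded" means.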

---

## 14. Writes

There are three write/update surfaces. Rooted chain writes and `patch`
blocks return the patched root. Builtin update methods have their own
method-specific return shape.

### 14.1 Chain-write terminals

Add a write method at the end of a `$`-rooted chain.

| Method | Effect |
|---|---|
| `.set(v)` | Replace value at this path |
| `.modify(expr)` | Replace, with `@` = current value |
| `.delete()` | Remove the leaf |
| `.unset(key)` | Remove `key` from leaf object |
| `.merge({…})` | Shallow merge into leaf object |
| `.deep_merge({…})` | Recursive merge |

The classifier fires only when the chain base is `$`. Inside lambdas,
chain-writes remain regular method calls — this is how
`$.items.map(item => item | set(item.x + 1))` keeps the v1
"return-the-new-value" semantic.

`append(v)` and `prepend(v)` are ordinary array-returning methods today:
`$.vals.append(4)` returns the updated array, not the full patched root.
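
The contrast between a chain-write terminal (returns the patched root) and `append` (returns the updated array) can be modeled in Python. The sketch below also mimics structural sharing by copying only the containers on the written path; `set_at` and `append` are hypothetical helpers, and creation of missing intermediate keys is not modeled.

```python
def set_at(root, path, value):
    """Return a new root with `value` at `path`; only nodes on the path are copied."""
    if not path:
        return value
    key, rest = path[0], path[1:]
    copy = list(root) if isinstance(root, list) else dict(root)
    copy[key] = set_at(root[key], rest, value) if rest else value
    return copy


def append(xs, v):
    """Model of `.append(v)`: returns the updated array, not the root."""
    return xs + [v]


doc = {"user": {"name": "Bob", "tags": ["x"]}, "vals": [1, 2, 3]}
patched = set_at(doc, ["user", "name"], "Ada")   # models $.user.name.set("Ada")
longer = append(doc["vals"], 4)                  # models $.vals.append(4)
```

`patched` is a whole document, `longer` is just `[1, 2, 3, 4]`, and the untouched `vals` and `tags` containers in `patched` are the very same objects as in `doc`.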

### 14.2 `patch $ { ... }` block

```text
patch $ {
  user.name: "Ada",
  user.tags: DELETE,
  user.role: "admin" when $.user.verified,
  users[*].active: true
}
```

| Clause | Meaning |
|---|---|
| `path: value` | Write |
| `path: DELETE` | Remove |
| `path: value when cond` | Conditional write |
| `path[*]: value` | Broadcast over array |

`DELETE` is a sentinel, not a value — it cannot be stored in a binding.
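
A compact Python model of the four clause kinds may help. It is a sketch under several assumptions: paths are pre-split lists with `"*"` standing for `[*]`, only object keys and array broadcast are handled, and `when` conditions are evaluated against the original root. `DELETE`, `write`, and `patch` are hypothetical names in this model.

```python
DELETE = object()  # hypothetical stand-in for the DELETE sentinel


def write(node, path, value):
    """Return a copy of `node` with `value` written (or removed) at `path`."""
    key, rest = path[0], path[1:]
    if key == "*":  # broadcast over every array element
        return [write(item, rest, value) if rest else value for item in node]
    copy = dict(node)
    if rest:
        copy[key] = write(node[key], rest, value)
    elif value is DELETE:
        copy.pop(key, None)
    else:
        copy[key] = value
    return copy


def patch(root, clauses):
    """Apply (path, value, condition) clauses in order; return the patched root."""
    out = root
    for path, value, cond in clauses:
        if cond(root):  # assumption: `when` reads the original root
            out = write(out, path, value)
    return out


always = lambda root: True
doc = {
    "user": {"name": "Bob", "tags": ["a"], "verified": True},
    "users": [{"active": False}, {"active": False}],
}
result = patch(doc, [
    (["user", "name"], "Ada", always),
    (["user", "tags"], DELETE, always),
    (["user", "role"], "admin", lambda r: r["user"]["verified"]),
    (["users", "*", "active"], True, always),
])
```

The clause list mirrors the `patch $ { ... }` example above one entry per line, and the input `doc` is left unchanged.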

### 14.3 Functional `.update`

Two separate update mechanisms exist:

```text
$.update("counters.visits", @ + 1)                              # builtin path update
$.books[*].update({tags: tags.append("test"), reviewed: true})  # selector + body
$.update({"books[*].tags": @.append("x"), active: false})       # root + quoted paths
```

The 2-arg `update(path, expr)` form is a regular builtin. The object-body
forms are first-class functional update batches (`UpdateBatch`) and are
planned as one batched write.

Body keys:

| Form | Meaning |
|---|---|
| `field: expr` | Write `expr` into `field` of each selected target |
| `"a.b.c": expr` | Nested path inside selected target |
| `"books[*].tags": expr` | Quoted root-relative path with wildcards / filters |
| `field: expr when cond` | Skip when `cond` is falsy |
| `field: DELETE` | Remove the field |

Properties:
- **Snapshot reads** — body sees pre-batch values, not partial mid-batch
  state. Two ops on the same target both read original fields.
- **Order** — ops apply in source order, last write wins on overlap.
- **Selectors** — index, wildcard, filtered wildcard, nested chains.
- **Scalar promotion** — applying an object-body update to scalar
  elements promotes them: `[1,2].update({seen: true})` → `[{seen: true}, {seen: true}]`.
- **Untouched subtrees** — preserved by `Arc` sharing.
- **Empty body** — reserved as a no-op shape when accepted by the parser.
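
Snapshot reads, source-order application, and scalar promotion can be shown together in a few lines of Python. This is a simplified model for a single target with flat field names; `update_batch` is a hypothetical helper, and each body expression is represented as a function of the pre-batch snapshot.

```python
def update_batch(target, ops):
    """Apply (field, fn) ops to one target; every fn reads the pre-batch snapshot."""
    # Scalar promotion: a non-object target starts from an empty object.
    snapshot = target if isinstance(target, dict) else {}
    out = dict(snapshot)
    for field, fn in ops:            # source order, so the last write to a field wins
        out[field] = fn(snapshot)    # reads never see earlier writes from this batch
    return out


swapped = update_batch(
    {"a": 1, "b": 2},
    [("a", lambda s: s["b"]), ("b", lambda s: s["a"])],
)
overlap = update_batch(
    {"a": 1},
    [("a", lambda s: 10), ("a", lambda s: s["a"] + 1)],
)
promoted = [update_batch(x, [("seen", lambda s: True)]) for x in [1, 2]]
```

`swapped` exchanges the two fields because both ops read the original values, and `overlap` ends with `a` equal to 2 because the second op wins and still reads the original `a`.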

Object-body `.update` parses to its own AST node (`UpdateBatch`) so the
planner keeps the user-level shape for selector analysis, update-trie
construction, and materialization planning.

### 14.4 Fusion

Multiple compatible writes in a single query may route through patch
fusion or `UpdateBatch` planning so repeated path traversals can share a
write pass. Fusion is an optimization; semantics are source-order writes
with conservative fallback when a shape is unsafe to merge.

---

## 15. Demand model

Pull-based sinks describe input demand; eligible operators propagate that
demand backward to the source. When the source and intervening operators
support it, `.first()`, `.find(p)`, `.take(n)`, etc. can terminate early
or avoid decoding unused payload.

Concrete impact:

| Pattern | Saving |
|---|---|
| `xs.first()` | Source reads 1 element |
| `xs.find(p)` | Source reads up to first match |
| `xs.filter(p).take(k)` | Stops after enough passing outputs |
| `xs.reverse().take(k)` | May become last-input demand |
| `xs.count()` | May avoid decoding payload |

Barriers such as full `sort`, `unique`, `group_by`, `accumulate`, and
`window` generally must see every element unless specific metadata proves
a bounded strategy is safe.
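
Python generators give a convenient way to watch pull-based demand in action. The sketch below is an analogy for the idea, not a description of jetro's planner: `source` is a hypothetical instrumented input that records every element it is asked to produce.

```python
from itertools import islice


def source(items, reads):
    """Yield items one by one, recording each read so demand is observable."""
    for item in items:
        reads.append(item)
        yield item


def first(it):
    """Model of `.first()`: pull a single element."""
    return next(it, None)


def find(it, pred):
    """Model of `.find(p)`: pull until the first match."""
    return next((x for x in it if pred(x)), None)


def filter_take(it, pred, k):
    """Model of `.filter(p).take(k)`: stop once k outputs have passed."""
    return list(islice((x for x in it if pred(x)), k))


reads_first, reads_find, reads_take, reads_sort = [], [], [], []
got_first = first(source(range(100), reads_first))
got_find = find(source(range(100), reads_find), lambda x: x == 3)
got_take = filter_take(source(range(100), reads_take), lambda x: x % 2 == 0, 2)
smallest = sorted(source([3, 1, 2], reads_sort))[:1]   # a barrier: reads everything
```

The first three calls touch 1, 4, and 3 source elements respectively, while the `sorted` barrier has to read all three inputs before it can return one.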

---

## 16. Backend selection

The physical planner attaches a backend preference list to each node.
The router tries preferred backends first and falls back to interpreted
execution when a backend declines the shape or lacks required byte/tape
capabilities.

The choice does not affect semantics — only performance. Any backend
that returns `Some(_)` is a sound implementation of the same operator.
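
The try-then-fall-back routing can be sketched as follows. In this Python model a backend is a hypothetical callable that returns `None` to decline a shape or a one-element tuple `(value,)` to play the role of `Some(value)`; the names are illustrative and do not correspond to jetro types.

```python
def route(node, backends, interpret):
    """Try backends in preference order; fall back to interpreted execution."""
    for backend in backends:
        result = backend(node)
        if result is not None:     # the backend accepted the shape
            return result[0]
    return interpret(node)


def declining(node):
    """A backend that lacks the required capabilities for every shape."""
    return None


def fast_sum(node):
    """A backend that only handles non-empty lists of ints."""
    if node and all(isinstance(x, int) for x in node):
        return (sum(node),)
    return None


def interpret(node):
    """Reference implementation: always works, returns the same answer."""
    total = 0
    for x in node:
        total += x
    return total
```

Whichever route is taken, the answer is the same: `route([1, 2, 3], [declining, fast_sum], interpret)` and `route([1, 2, 3], [declining], interpret)` both yield 6.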

---

## 17. Errors

Two classes:
- **`EvalError`** — runtime evaluation failure (type mismatch, missing
  required field, division by zero).
- **`JetroEngineError`** — engine-only path that can additionally fail
  on JSON parsing.

Errors abort the query. There is no implicit recovery.

`try` recovers: `try expr else fallback` — evaluates `expr`, returns
`fallback` on error.
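
A minimal Python model of `try expr else fallback` looks like this. The `EvalError` class here is a Python stand-in named after the error class above, `expr` is passed as a zero-argument callable so that its failure can be caught, and the fallback is a plain value; whether jetro evaluates the fallback lazily is not modeled.

```python
class EvalError(Exception):
    """Python stand-in for a runtime evaluation failure."""


def divide(a, b):
    """Division that fails the way the text describes for a zero divisor."""
    if b == 0:
        raise EvalError("division by zero")
    return a / b


def try_else(expr, fallback):
    """Model of `try expr else fallback`: return `fallback` when `expr` errors."""
    try:
        return expr()
    except EvalError:
        return fallback


ok = try_else(lambda: divide(6, 3), -1)
recovered = try_else(lambda: divide(1, 0), -1)
```

Without the wrapper, `divide(1, 0)` aborts with `EvalError`, matching the "no implicit recovery" rule.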

---

## 18. Caches

- **Plan cache** (`JetroEngine`): `(query, ctx) → compiled pipeline`.
  Default 256 entries, evicted wholesale.
- **Compile cache** (VM): expr → `Program`.
- **Path cache** (VM): resolved JSON pointer paths per document. Hash
  key includes structure *and* primitive leaf values bounded at depth 8
  — two docs with identical shape but different leaves stay distinct.

Caches are invisible at the language level; they affect throughput, not
results.
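
One plausible reading of "evicted wholesale" is that the plan cache is cleared entirely once it is full, rather than dropping entries one at a time. The Python sketch below models that reading only; it is an assumption about the policy, and `PlanCache` with its `get` method is a hypothetical class, not jetro's API.

```python
class PlanCache:
    """Model of a (query, ctx) -> compiled plan cache with clear-all eviction."""

    def __init__(self, capacity=256):
        self.capacity = capacity
        self.plans = {}
        self.compiles = 0   # counts cache misses, for illustration

    def get(self, query, ctx, compile_fn):
        key = (query, ctx)
        if key not in self.plans:
            if len(self.plans) >= self.capacity:
                self.plans.clear()          # wholesale eviction, no per-entry LRU
            self.plans[key] = compile_fn(query)
            self.compiles += 1
        return self.plans[key]


cache = PlanCache(capacity=2)
compile_fn = lambda q: ("plan", q)
cache.get("$.a", "ctx", compile_fn)
cache.get("$.a", "ctx", compile_fn)   # hit: no recompilation
cache.get("$.b", "ctx", compile_fn)
cache.get("$.c", "ctx", compile_fn)   # cache full: everything is dropped first
```

After the fourth call the cache holds only the `$.c` plan, and every call still returns the same plan it would have returned without a cache, which is the sense in which caches affect throughput but not results.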

---

## 19. Builtin catalog

Builtins are dispatched through a static builtin-method registry and the
`Builtin` trait. `builtins/defs.rs` and `builtins/registry.rs` are the
authoritative planning/runtime metadata during the ongoing registry
migration.

Categories:
- Mapping: `map`, `flat_map`, `transform_keys`, `transform_values`
- Filtering: `filter`, `find`, `compact`, `take_while`, `drop_while`, `remove`
- Expanding: `flatten`, `lines`, `chars`, `entries`
- Reducers: `sum`, `count`, `any`, `all`, `min`, `max`, `min_by`, `max_by`, `count_by`
- Positional: `first`, `last`, `nth`, `find_one`, `collect`
- Barriers: `sort`, `sort_by`, `unique`, `unique_by`, `group_by`, `index_by`, `accumulate`, `partition`, `window`, `pairwise`, `chunk`
- Arrays / sets: `append`, `prepend`, `concat`, `diff`, `union`, `intersect`, `zip`, `zip_shape`
- Objects: `keys`, `values`, `entries`, `pick`, `omit`, `rename`, `merge`, `deep_merge`, `transform_*`, `flatten_keys`, `unflatten_keys`
- Path mutation: `get_path`, `set_path`, `del_path`, `has_path`, `set`, `update`
- Deep traversal: `deep_find`, `deep_shape`, `deep_like`, `rec`
- Predicates: `has`, `missing`, `includes`, `index`, `index_by`, `has_key`
- Tabular: `to_csv`, `to_tsv`
- Relational: `equi_join`
- String: `upper`, `lower`, `trim`, `pad_left`, `pad_right`, `slice`, `replace`, `replace_all`, `split`, `join`, `dedent`, `indent`, `re_match`, `re_find`, `re_replace`, `re_split`, `parse_int`, `parse_float`, `parse_json`, `from_json`, `to_json`
- Math: `abs`, `ceil`, `floor`, `round`, `sqrt`, `pow`, `log`, `exp`, `sin`, `cos`, …
- Statistics: `avg`, `mean`, `median`, `stddev`, `variance`, `zscore`, `cummax`, `cummin`, `lag`, `lead`, `pct_change`, `diff_window`, `approx_count_distinct`
- Type: `type`, `is`, `is not`, `as`

Aliases (`find_first`, `find_all`, `length`/`len`, `count`, …) lower to
the same builtin.

---

## 20. Reserved syntax

| Keyword | Use |
|---|---|
| `let`, `in`, `match`, `with`, `when`, `if`, `else`, `for` | Bindings, control flow, comprehensions |
| `lambda`, `as`, `is`, `not`, `and`, `or`, `try` | Lambda forms, casts, type tests, logic, error handling |
| `true`, `false`, `null` | Literals |
| `patch`, `DELETE` | Write block + sentinel |
| `has` | Postfix membership |

Comments: the query language has no comment syntax. Strip any comments client-side before evaluation.

---

## Versioning

This document reflects jetro 0.5.5. Outstanding semantic gaps and
v0.5-restricted forms are listed in the book at
`reference/limitations.md`. As the engine catches up, those entries are
removed and this file is updated to match.