# jetro Semantics (v0.5.5)
Current reference for user-visible runtime behavior. This file should
describe stable language semantics, not optimization details. Backend,
cache, and structural-sharing notes are implementation details unless a
section explicitly says otherwise.
---
## 1. Value model
There are six runtime value kinds: `null`, `bool`, `number`, `string`,
`array`, and `object`. Internally, `Val` carries cheap-clone variants for
numeric and string columns (`IntVec`, `FloatVec`, `StrVec`, `StrSliceVec`,
`ObjVec`) plus a borrowed `StrSlice` over a tape; these are implementation
details and behave as the public kinds for comparison and dispatch.
| Kind | `type()` | Truthy when |
|---|---|---|
| `null` | `"null"` | always falsy |
| `bool` | `"bool"` | `true` |
| `number` | `"number"` | non-zero |
| `string` | `"string"` | non-empty |
| `array` | `"array"` | non-empty |
| `object` | `"object"` | non-empty |
Numbers are unified `f64` at the surface; integer-valued numbers are
preserved on output (`1` not `1.0`).
Compound values are `Arc`-shared, so a clone is an `O(1)` refcount bump.
Patch writes preserve the `Arc` of every untouched subtree.
---
## 2. Roots and current value
| Token | Meaning |
|---|---|
| `$` | Document root (the value passed into `collect`) |
| `@` | Current value at this position. Inside a path: the parent step's value. Inside method args: the element under consideration. Inside `lambda`/`=>` body: implicit when no name given. Top-level: same as `$`. |
Exactly one root per chain. Inside method arguments, paths must start
with `$`, `@`, or a bound name; bare-leading `.field` does not parse.
---
## 3. Path expressions
| Step | Syntax | Notes |
|---|---|---|
| Field | `.name`, `["foo bar"]` | Quoted form for tricky keys. |
| Index | `[3]`, `[-1]` | Negative indexes from the end. |
| Slice | `[a:b]`, `[a:b:c]`, `[::n]` | Half-open. `[::-1]` reverses. |
| Wildcard | `[*]` | Every element. |
| Filtered wildcard | `[* if pred]` | Wildcard restricted by predicate (`@` = element). |
| Descendant | `..name`, `..` | DFS pre-order. |
| Inline filter | `{cond}` | Sugar for `.filter(cond)`. |
| Dynamic key | `[expr]` | `expr` resolves to a key/index. |
| Quantifier | `step?`, `step!` | Optional / exactly-one. |
Bare field access **after** a filtered wildcard collapses to `null`:
`$.books[* if year > 1980].title` → `null`. Use a method instead:
`$.books[* if year > 1980].map(@.title)`.
`?` returns `null` instead of erroring when the step is missing. `!`
errors unless the step yields exactly one result (that is, on zero or on
more than one).
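
The two quantifiers can be modeled on plain Python values. This is a sketch of the rule as worded above, not jetro's implementation: `StepError`, `field`, `optional`, and `exactly_one` are hypothetical names, and the sketch assumes that an unquantified missing step raises.

```python
class StepError(Exception):
    """Hypothetical stand-in for a runtime evaluation error."""


def field(obj, name):
    """Strict field step (assumption: a missing field is an error)."""
    if not isinstance(obj, dict) or name not in obj:
        raise StepError(f"missing field {name!r}")
    return obj[name]


def optional(step, *args):
    """Model of `step?`: a missing step yields None (null) instead of raising."""
    try:
        return step(*args)
    except StepError:
        return None


def exactly_one(results):
    """Model of `step!`: error unless the step produced exactly one result."""
    if len(results) != 1:
        raise StepError(f"expected exactly one result, got {len(results)}")
    return results[0]


doc = {"user": {"name": "Ada"}}
present = optional(field, doc["user"], "name")
absent = optional(field, doc["user"], "email")
single = exactly_one(["only"])
```

Here `present` is `"Ada"`, `absent` is `None`, and `exactly_one` raises for both an empty list and a list of two or more results.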
---
## 4. Output shape
A method call returns the value produced by that method. Path receivers do
not add a synthetic one-element wrapper around scalar method results.
```text
DOC: {"x": 10, "s": "hello", "xs": [1, 2, 3]}
$.x.type() → "number"
$.s.upper() → "HELLO"
$.x.to_json() → "10"
$.s.slice(0, 3) → "hel"
$.xs.map(@ + 1) → [2, 3, 4]
10.type() → "number"
"hello".upper() → "HELLO"
```
Array-returning methods (`map`, `filter`, `keys`, `entries`, `split`,
etc.) return arrays because that is the method's result, not because the
receiver was a path.
---
## 5. Method calls and arguments
### Three argument shapes
1. **Lambda** — `@`-form, named arrow, or `lambda` form, used by
`filter`, `find`, `map`, `sort`, `unique_by`, etc.
```text
$.users.filter(@.active)
$.users.filter(u => u.active)
$.users.filter(lambda u: u.active)
```
2. **Bare identifiers / aliases** — used by `pick`, `omit`. Identifiers
refer to fields of the receiver, not via `@`.
```text
$.user.pick(id, name) # identifiers
$.user.pick(uid: id) # alias: src
```
3. **Positional values / paths** — string literals, numbers, paths,
object literals. Used by `set_path`, `get_path`, `equi_join`,
`merge`, `update`, etc.
### Multi-arg lambdas
Two-argument lambdas take a parenthesized parameter list. A single
parameter may also destructure an array:
```text
$.xs.accumulate(0, (a, b) => a + b)
$.entries.map(([k, v]) => {k, v})
```
### Pipe `|`
Pipes a value through a method, with the previous stage's value bound to
`@`:
```text
$.items.map(item => item | set(item.x + 1))
```
Inside a pipe, `set` / `modify` keep the v1 "return-the-new-value"
shape; outside a pipe, chain-writes return the patched root.
---
## 6. Lambdas
| Form | Binding |
|---|---|
| `@.expr` (implicit lambda) | `@` = current element |
| `name => body` | `name` bound to current element |
| `(a, b) => body` | Two-arg lambda |
| `([a, b]) => body` | One-arg array-destructure |
| `lambda x: body` | Named, Python-style |
Lambda forms lower to the same compiled body shape. Named and `@`-form emit
identical opcodes.
---
## 7. `let` bindings
`let name = expr in body` — binds `name` lexically inside `body`.
`expr` is evaluated once.
```text
let p = $.user.profile in f"{p.name} <{p.email}>"
let xs = $.numbers in [n*n for n in xs]
```
Useful to:
- avoid repeated path traversal,
- apply a chain to a value computed by an expression that isn't a
rooted path,
- destructure once, use many times.
---
## 8. Pattern match
`match expr with { pat -> body, ... }` — Maranget decision tree under
the hood.
| Pattern | Meaning |
|---|---|
| `42`, `"x"`, `true`, `null` | Literal match |
| `name` | Bind everything to `name` |
| `_` | Wildcard, no bind |
| `name: kind` | Match if `kind`; bind value (`s: string`, `n: number`, …) |
| `lo..hi`, `lo..=hi` | Numeric range (exclusive / inclusive) |
| `{k1: p1, k2: p2}` | Object pattern; reserved-word keys allowed |
| `[p1, p2]`, `[p1, p2, ...rest]` | Array pattern; rest binds tail |
| `pat when guard` | Guard outside the pattern |
Deep variants `..match` / `..match!` walk descendants in DFS pre-order.
`..match { ... }` collects every truthy arm-body result into an array;
unmatched descendants and falsy arm bodies are skipped. `..match! { ... }`
returns the first truthy arm-body result, or `null` when none matches.
```text
match $.book with {
{year: y} when y < 1970 -> "classic",
{year: y} when y < 2000 -> "modern",
_ -> "current"
}
```
Object-pattern key shorthand `{id, name}` is **not** supported in match
patterns — write `{id: id, name: name}`. Object rest uses `...*rest`;
`..rest` is not an object-pattern form.
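
The difference between `..match` and `..match!` can be sketched in Python. This is a model of the description above rather than jetro's decision-tree implementation. The names `walk_preorder`, `deep_match_all`, `deep_match_first`, and `NO_MATCH` are hypothetical, and the sketch assumes the walk visits the root itself and follows object keys in insertion order.

```python
NO_MATCH = object()  # hypothetical marker: no arm matched this node


def walk_preorder(value):
    """Yield a value, then its descendants, depth-first in pre-order."""
    yield value
    children = []
    if isinstance(value, dict):
        children = list(value.values())
    elif isinstance(value, list):
        children = value
    for child in children:
        yield from walk_preorder(child)


def deep_match_all(root, arms):
    """Model of `..match`: collect every truthy arm-body result."""
    results = (arms(node) for node in walk_preorder(root))
    return [r for r in results if r is not NO_MATCH and r]


def deep_match_first(root, arms):
    """Model of `..match!`: first truthy arm-body result, else None."""
    for node in walk_preorder(root):
        result = arms(node)
        if result is not NO_MATCH and result:
            return result
    return None


def arms(node):
    # Stands in for a single arm: {year: y} when y < 1970 -> f"classic:{y}"
    if isinstance(node, dict) and isinstance(node.get("year"), int):
        if node["year"] < 1970:
            return f"classic:{node['year']}"
    return NO_MATCH


doc = {"books": [{"year": 1965}, {"year": 1999}, {"nested": {"year": 1950}}]}
all_hits = deep_match_all(doc, arms)
first_hit = deep_match_first(doc, arms)
none_hit = deep_match_first({"books": []}, arms)
```

With this document, `all_hits` lists the two matches in visit order and `first_hit` is the first of them; a document with no match gives `None`.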
---
## 9. f-strings
`f"...{expr}..."` — embeds expressions. Inside braces:
- bare expression: `{name}`, `{p.email}`
- format spec: `{x:.2}`, `{n:04d}`
Outside braces: regular string-literal escapes (`\n`, `\t`, `\\`,
`\"`, `\xNN`, `\uXXXX`).
---
## 10. Truthiness and comparison
### Truthiness
See §1. `0`, `""`, `[]`, `{}`, `null`, `false` are falsy; everything
else truthy.
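
As a cross-check of the rule, the following Python sketch maps JSON-like Python values onto the truthiness rule above. The function name `truthy` is hypothetical, and the mapping of Python types to jetro kinds is an assumption of the sketch.

```python
def truthy(v) -> bool:
    """Model of the truthiness rule over JSON-like Python values."""
    if v is None:
        return False                 # null is always falsy
    if isinstance(v, bool):          # check bool before number: bool is an int in Python
        return v
    if isinstance(v, (int, float)):
        return v != 0                # number: non-zero
    if isinstance(v, (str, list, dict)):
        return len(v) > 0            # string, array, object: non-empty
    raise TypeError(f"not a JSON-like value: {v!r}")


falsy_values = [0, "", [], {}, None, False]
truthy_values = [1, -0.5, "a", [0], {"k": None}, True]
```

Every entry of `falsy_values` maps to `False` and every entry of `truthy_values` to `True`; note that `[0]` is truthy because only emptiness matters for arrays.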
### Equality
- Same kind: structural equality (deep on arrays and objects, ordered
keys).
- Cross-kind: not equal (no implicit coercion).
- `null == null` → `true`.
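
The equality rules can be sketched the same way. This is a model under stated assumptions, not jetro's comparison code: `kind` and `jetro_eq` are hypothetical names, and the sketch compares objects by key set and value without modeling whether key order participates.

```python
def kind(v) -> str:
    """Map a JSON-like Python value to one of the six public kinds."""
    if v is None:
        return "null"
    if isinstance(v, bool):
        return "bool"
    if isinstance(v, (int, float)):
        return "number"
    if isinstance(v, str):
        return "string"
    if isinstance(v, list):
        return "array"
    if isinstance(v, dict):
        return "object"
    raise TypeError(f"not a JSON-like value: {v!r}")


def jetro_eq(a, b) -> bool:
    """Same kind: deep structural equality. Cross-kind: never equal."""
    if kind(a) != kind(b):
        return False
    if isinstance(a, list):
        return len(a) == len(b) and all(jetro_eq(x, y) for x, y in zip(a, b))
    if isinstance(a, dict):
        return a.keys() == b.keys() and all(jetro_eq(a[k], b[k]) for k in a)
    return a == b
```

The explicit kind check matters in Python because `1 == True` holds there, whereas the rule above says a `number` and a `bool` are never equal.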
### Ordering
`<`, `<=`, `>`, `>=` are defined for the scalar cases used by the
runtime comparison operators, notably numbers and strings. Sort/reducer
internals have their own total comparison helpers for ordering compound
values; do not rely on object ordering through comparison operators as a
portable surface semantic.
---
## 11. Type coercion in operators
Arithmetic `+ - * / %`:
| Operands | Result |
|---|---|
| number + number | number |
| string + string | string concat |
| string + scalar | string concat (scalar stringified) |
| array + array | concat |
| object + object | shallow merge (right wins) |
| Other combos | runtime error |
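
The `+` rows of the table can be read as a dispatch on operand kinds. The Python sketch below models that reading. The names `add` and `stringify` are hypothetical, and two points are assumptions rather than documented behavior: the scalar is only accepted on the right of a string, and scalars stringify in JSON style (`true`, `null`, integral numbers without a fraction).

```python
def stringify(v) -> str:
    """Assumed JSON-style rendering of a scalar."""
    if v is None:
        return "null"
    if isinstance(v, bool):
        return "true" if v else "false"
    if isinstance(v, float) and v.is_integer():
        return str(int(v))
    return str(v)


def is_number(v) -> bool:
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def add(a, b):
    """Model of `a + b` following the operand table."""
    if is_number(a) and is_number(b):
        return a + b
    if isinstance(a, str) and isinstance(b, str):
        return a + b
    if isinstance(a, str) and (b is None or isinstance(b, (bool, int, float))):
        return a + stringify(b)
    if isinstance(a, list) and isinstance(b, list):
        return a + b
    if isinstance(a, dict) and isinstance(b, dict):
        return {**a, **b}            # shallow merge, right operand wins
    raise TypeError(f"unsupported operands for +: {a!r}, {b!r}")
```

For example, `add({"a": 1, "b": 1}, {"b": 2})` keeps `a` from the left and takes `b` from the right, while `add(1, [2])` falls through to the error case.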
Boolean ops `and`, `or` short-circuit and return booleans. `not x`
returns the negation of truthiness.
Inequality: `!=` is the parsed form; `<>` is **not** supported.
`is` / `is not` for kind tests: `x is number`, `x is string`,
`x is array`, `x is object`, `x is null`, `x is bool`.
`as` for explicit casts: `"42" as int`, `1 as str`, `1 as bool`.
---
## 12. Membership
| Form | Notes |
|---|---|
| `xs.includes(v)` | Method form (arrays/strings) |
| `xs has v` | Postfix operator |
Bare `v in xs` does **not** parse.
---
## 13. Comprehensions
```text
[expr for x in xs] # list
[expr for x in xs if c1 if c2] # multi-if (and-folded)
{k: v for [k, v] in pairs} # dict
{expr for x in xs} # set
(expr for x in xs) # generator
```
Source can be a path (`for x in $.items`) or any iterable. `for k, v in pairs`
and `for [k, v] in pairs` both work as 2-var destructure.
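
Python's comprehensions have the same surface shapes, which makes them a convenient analogy for the and-folding of multiple `if` clauses and for the two destructuring spellings. The snippet is plain Python, not jetro, and it says nothing about how jetro represents set or generator results.

```python
xs = [1, 2, 3, 4, 5, 6]

# Two `if` clauses behave like one condition joined with `and`.
multi_if = [x * x for x in xs if x % 2 == 0 if x > 2]
and_form = [x * x for x in xs if x % 2 == 0 and x > 2]

# Both destructuring spellings bind the two elements of each pair.
pairs = [["a", 1], ["b", 2]]
bracketed = {k: v for [k, v] in pairs}
unbracketed = {k: v for k, v in pairs}

# Set form removes duplicates; generator form is consumed lazily.
remainders = {x % 3 for x in xs}
lazy = (x + 1 for x in xs)
first_lazy = next(lazy)
```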
---
## 14. Writes
There are three write/update surfaces. Rooted chain writes and `patch`
blocks return the patched root. Builtin update methods have their own
method-specific return shape.
### 14.1 Chain-write terminals
Add a write method at the end of a `$`-rooted chain.
| Terminal | Effect |
|---|---|
| `.set(v)` | Replace value at this path |
| `.modify(expr)` | Replace, with `@` = current value |
| `.delete()` | Remove the leaf |
| `.unset(key)` | Remove `key` from leaf object |
| `.merge({…})` | Shallow merge into leaf object |
| `.deep_merge({…})` | Recursive merge |
The classifier fires only when the chain base is `$`. Inside lambdas,
chain-writes remain regular method calls — this is how
`$.items.map(item => item | set(item.x + 1))` keeps the v1
"return-the-new-value" semantic.
`append(v)` and `prepend(v)` are ordinary array-returning methods today:
`$.vals.append(4)` returns the updated array, not the full patched root.
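
The contrast between a root-returning chain write and an array-returning `append` can be modeled in Python. This is a sketch only: `set_path` and `append` here are hypothetical Python functions, the path is given as a list of keys, and Python object identity stands in for `Arc` sharing of untouched subtrees.

```python
def set_path(root, path, value):
    """Return a new root with `value` at `path`, copying only the touched spine."""
    if not path:
        return value
    head, rest = path[0], path[1:]
    copy = list(root) if isinstance(root, list) else dict(root)
    copy[head] = set_path(root[head], rest, value)
    return copy


def append(xs, v):
    """Array-returning model: yields the updated array, not a patched root."""
    return xs + [v]


doc = {"user": {"name": "Grace", "tags": ["a"]}, "vals": [1, 2, 3]}
patched = set_path(doc, ["user", "name"], "Ada")   # like $.user.name.set("Ada")
appended = append(doc["vals"], 4)                  # like $.vals.append(4)
```

`patched` is a whole document with the new name, the original `doc` is unchanged, and `patched["vals"]` is the very same object as `doc["vals"]`; `appended` is just the four-element array.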
### 14.2 `patch $ { ... }` block
```text
patch $ {
user.name: "Ada",
user.tags: DELETE,
user.role: "admin" when $.user.verified,
users[*].active: true
}
```
| Entry | Effect |
|---|---|
| `path: value` | Write |
| `path: DELETE` | Remove |
| `path: value when cond` | Conditional write |
| `path[*]: value` | Broadcast over array |
`DELETE` is a sentinel, not a value — it cannot be stored in a binding.
### 14.3 Functional `.update`
Two separate update mechanisms exist:
```text
$.update("counters.visits", @ + 1) # builtin path update
$.books[*].update({tags: tags.append("test"), reviewed: true}) # selector + body
$.update({"books[*].tags": @.append("x"), active: false}) # root + quoted paths
```
The 2-arg `update(path, expr)` form is a regular builtin. The object-body
forms are first-class functional update batches (`UpdateBatch`) and are
planned as one batched write.
Body keys:
| Key form | Effect |
|---|---|
| `field: expr` | Write `expr` into `field` of each selected target |
| `"a.b.c": expr` | Nested path inside selected target |
| `"books[*].tags": expr` | Quoted root-relative path with wildcards / filters |
| `field: expr when cond` | Skip when `cond` is falsy |
| `field: DELETE` | Remove the field |
Properties:
- **Snapshot reads** — body sees pre-batch values, not partial mid-batch
state. Two ops on the same target both read original fields.
- **Order** — ops apply in source order, last write wins on overlap.
- **Selectors** — index, wildcard, filtered wildcard, nested chains.
- **Scalar promotion** — applying an object-body update to scalar
elements promotes them: `[1,2].update({seen: true})` →
`[{seen: true}, {seen: true}]`.
- **Untouched subtrees** — preserved by `Arc` sharing.
- **Empty body** — reserved as a no-op shape when accepted by the parser.
Object-body `.update` parses to its own AST node (`UpdateBatch`) so the
planner keeps the user-level shape for selector analysis, update-trie
construction, and materialization planning.
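
Three of the properties listed above (snapshot reads, source-order application, scalar promotion) can be illustrated with a small Python model. It is a sketch under assumptions, not the `UpdateBatch` planner: `update_one`, `update_each`, and the `(field, expr, cond)` tuple shape are hypothetical, and `DELETE` here is a local sentinel object.

```python
DELETE = object()  # local stand-in for the DELETE sentinel


def update_one(target, ops):
    """Apply ops in source order; every expr/cond reads the pre-batch target."""
    out = dict(target) if isinstance(target, dict) else {}  # scalar promotion
    for field, expr, cond in ops:
        if cond is not None and not cond(target):
            continue                      # `when` guard was falsy: skip this op
        value = expr(target)              # snapshot read, never `out`
        if value is DELETE:
            out.pop(field, None)
        else:
            out[field] = value            # later ops overwrite earlier ones
    return out


def update_each(targets, ops):
    """Model of a wildcard selector: one update per selected element."""
    return [update_one(t, ops) for t in targets]


snapshot = update_one(
    {"n": 1},
    [("n", lambda t: t["n"] + 1, None), ("m", lambda t: t["n"] * 10, None)],
)
last_wins = update_one({}, [("a", lambda t: 1, None), ("a", lambda t: 2, None)])
promoted = update_each([1, 2], [("seen", lambda t: True, None)])
removed = update_one({"x": 1, "y": 2}, [("x", lambda t: DELETE, None)])
```

In `snapshot`, the second op still sees `n` as `1`, so `m` becomes `10` even though `n` was already rewritten to `2`.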
### 14.4 Fusion
Multiple compatible writes in a single query may route through patch
fusion or `UpdateBatch` planning so repeated path traversals can share a
write pass. Fusion is an optimization; semantics are source-order writes
with conservative fallback when a shape is unsafe to merge.
---
## 15. Demand model
Pull-based sinks describe input demand; eligible operators propagate that
demand backward to the source. When the source and intervening operators
support it, `.first()`, `.find(p)`, `.take(n)`, etc. can terminate early
or avoid decoding unused payload.
Concrete impact:
| Chain | Effect |
|---|---|
| `xs.first()` | Source reads 1 element |
| `xs.find(p)` | Source reads up to first match |
| `xs.filter(p).take(k)` | Stops after enough passing outputs |
| `xs.reverse().take(k)` | May become last-input demand |
| `xs.count()` | May avoid decoding payload |
Barriers such as full `sort`, `unique`, `group_by`, `accumulate`, and
`window` generally must see every element unless specific metadata proves
a bounded strategy is safe.
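
Python generators give a convenient analogy for pull-based demand: a sink that needs one element pulls one element. The sketch below counts source reads for a few chain shapes. `CountingSource` and `reads_for` are hypothetical helpers, and the numbers describe this Python model, not measured jetro behavior.

```python
from itertools import islice


class CountingSource:
    """Iterable that records how many elements were actually pulled."""

    def __init__(self, items):
        self.items = list(items)
        self.reads = 0

    def __iter__(self):
        for item in self.items:
            self.reads += 1
            yield item


def reads_for(run):
    """Run a chain against a fresh 100-element source; return (result, reads)."""
    src = CountingSource(range(100))
    result = run(src)
    return result, src.reads


first = reads_for(lambda s: next(iter(s), None))                       # first()
found = reads_for(lambda s: next((x for x in s if x == 5), None))      # find(p)
taken = reads_for(lambda s: list(islice((x for x in s if x % 2 == 0), 3)))  # filter(p).take(3)
barrier = reads_for(lambda s: sorted(s, reverse=True)[0])              # sort then first
```

The first three chains stop after 1, 6, and 5 reads respectively, while the sort acts as a barrier and reads all 100 elements before producing anything.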
---
## 16. Backend selection
The physical planner attaches a backend preference list to each node.
The router tries preferred backends first and falls back to interpreted
execution when a backend declines the shape or lacks required byte/tape
capabilities.
The choice does not affect semantics — only performance. Any backend
that returns `Some(_)` is a sound implementation of the same operator.
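
The routing rule can be sketched as a loop over preferred backends with an interpreter fallback. This Python model is illustrative only: `route`, `tape_count`, `interpret`, and the `(op, xs)` node shape are hypothetical, and the `DECLINED` sentinel plays the role that `None` plays in the Rust `Option` mentioned above.

```python
DECLINED = object()  # stands in for a backend returning `None`


def tape_count(node):
    """Hypothetical fast backend that only handles `count`."""
    op, xs = node
    return len(xs) if op == "count" else DECLINED


def interpret(node):
    """Hypothetical interpreter: handles every shape in this toy model."""
    op, xs = node
    if op == "count":
        return len(xs)
    if op == "sum":
        return sum(xs)
    raise ValueError(f"unknown op {op!r}")


def route(node, preferred, fallback):
    """Try preferred backends in order; fall back when all of them decline."""
    for backend in preferred:
        result = backend(node)
        if result is not DECLINED:
            return result
    return fallback(node)


counted = route(("count", [1, 2, 3]), [tape_count], interpret)
summed = route(("sum", [1, 2, 3]), [tape_count], interpret)
```

`counted` comes from the fast backend and `summed` from the fallback, and either answer matches what the interpreter alone would return, which is the soundness property the text describes.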
---
## 17. Errors
Two classes:
- **`EvalError`** — runtime evaluation failure (type mismatch, missing
required field, division by zero).
- **`JetroEngineError`** — engine-only path that can additionally fail
on JSON parsing.
Errors abort the query; there is no implicit recovery. For explicit
recovery, `try expr else fallback` evaluates `expr` and returns
`fallback` if evaluation fails.
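
In Python terms, `try ... else ...` behaves like a `try`/`except` around the expression. The sketch below uses a local `EvalError` class as a stand-in for the runtime error, passes both sides as thunks, and assumes the fallback is evaluated only when the expression fails; none of these names are jetro API.

```python
class EvalError(Exception):
    """Local stand-in for a runtime evaluation failure."""


def divide(a, b):
    """Toy expression that fails the way the text describes."""
    if b == 0:
        raise EvalError("division by zero")
    return a / b


def try_else(expr, fallback):
    """Model of `try expr else fallback` with lazily evaluated sides."""
    try:
        return expr()
    except EvalError:
        return fallback()


ok = try_else(lambda: divide(6, 3), lambda: -1)
recovered = try_else(lambda: divide(1, 0), lambda: -1)
```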
---
## 18. Caches
- **Plan cache** (`JetroEngine`): `(query, ctx) → compiled pipeline`.
Default 256 entries, evicted wholesale.
- **Compile cache** (VM): expr → `Program`.
- **Path cache** (VM): resolved JSON pointer paths per document. Hash
key includes structure *and* primitive leaf values bounded at depth 8
— two docs with identical shape but different leaves stay distinct.
Caches are invisible at the language level; they affect throughput, not
results.
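
One reading of "evicted wholesale" is that the plan cache is cleared in one step when it is full, instead of evicting entry by entry. The Python sketch below models that reading. `PlanCache` and `compile_fn` are hypothetical names, the clear-on-full policy is an assumption, and only the default of 256 entries comes from the text.

```python
class PlanCache:
    """Toy (query, ctx) -> plan cache that drops everything when full."""

    def __init__(self, compile_fn, capacity=256):
        self.compile_fn = compile_fn
        self.capacity = capacity
        self.entries = {}

    def get(self, query, ctx):
        key = (query, ctx)
        if key not in self.entries:
            if len(self.entries) >= self.capacity:
                self.entries.clear()      # wholesale eviction
            self.entries[key] = self.compile_fn(query, ctx)
        return self.entries[key]


compiled = []


def compile_fn(query, ctx):
    compiled.append(query)                # record every real compilation
    return f"plan<{query}>"


cache = PlanCache(compile_fn, capacity=2)
for q in ["a", "b", "a", "c", "a"]:
    cache.get(q, "ctx")
```

With a capacity of 2, the third lookup of `"a"` is a hit, `"c"` triggers the clear, and the final `"a"` must be compiled again. The returned plans are the same either way, which matches the statement that caches affect throughput and not results.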
---
## 19. Builtin catalog
Builtins are dispatched through a static builtin-method registry and the
`Builtin` trait. `builtins/defs.rs` and `builtins/registry.rs` are the
authoritative planning/runtime metadata during the ongoing registry
migration.
Categories:
- Mapping: `map`, `flat_map`, `transform_keys`, `transform_values`
- Filtering: `filter`, `find`, `compact`, `take_while`, `drop_while`, `remove`
- Expanding: `flatten`, `lines`, `chars`, `entries`
- Reducers: `sum`, `count`, `any`, `all`, `min`, `max`, `min_by`, `max_by`, `count_by`
- Positional: `first`, `last`, `nth`, `find_one`, `collect`
- Barriers: `sort`, `sort_by`, `unique`, `unique_by`, `group_by`, `index_by`, `accumulate`, `partition`, `window`, `pairwise`, `chunk`
- Arrays / sets: `append`, `prepend`, `concat`, `diff`, `union`, `intersect`, `zip`, `zip_shape`
- Objects: `keys`, `values`, `entries`, `pick`, `omit`, `rename`, `merge`, `deep_merge`, `transform_*`, `flatten_keys`, `unflatten_keys`
- Path mutation: `get_path`, `set_path`, `del_path`, `has_path`, `set`, `update`
- Deep traversal: `deep_find`, `deep_shape`, `deep_like`, `rec`
- Predicates: `has`, `missing`, `includes`, `index`, `index_by`, `has_key`
- Tabular: `to_csv`, `to_tsv`
- Relational: `equi_join`
- String: `upper`, `lower`, `trim`, `pad_left`, `pad_right`, `slice`, `replace`, `replace_all`, `split`, `join`, `dedent`, `indent`, `re_match`, `re_find`, `re_replace`, `re_split`, `parse_int`, `parse_float`, `parse_json`, `from_json`, `to_json`
- Math: `abs`, `ceil`, `floor`, `round`, `sqrt`, `pow`, `log`, `exp`, `sin`, `cos`, …
- Statistics: `avg`, `mean`, `median`, `stddev`, `variance`, `zscore`, `cummax`, `cummin`, `lag`, `lead`, `pct_change`, `diff_window`, `approx_count_distinct`
- Type: `type`, `is`, `is not`, `as`
Aliases (`find_first`, `find_all`, `length`/`len`, `count`, …) lower to
the same builtin as the name they alias.
---
## 20. Reserved syntax
| Tokens | Role |
|---|---|
| `let`, `in`, `match`, `with`, `when`, `if`, `else`, `for` | Bindings, control flow, comprehensions |
| `lambda`, `as`, `is`, `not`, `and`, `or`, `try` | Lambda forms, casts, type tests, logic, error handling |
| `true`, `false`, `null` | Literals |
| `patch`, `DELETE` | Write block + sentinel |
| `has` | Postfix membership |
Comments: the language has no comment syntax. Strip comments client-side
before submitting a query.
---
## Versioning
This document reflects jetro 0.5.5. Outstanding semantic gaps and
v0.5-restricted forms are listed in the book at
`reference/limitations.md`. As the engine catches up, those entries are
dropped and this file is updated to match.