# `has` array-RHS overload — implementation spec

Status: proposal, unimplemented.
Owner: TBD.
Affected crates: `jetro-core` (parser, parse tests, builtin ops).

## 1. Problem

`lhs has rhs` is parsed today (`parse/parser.rs::parse_contains`, lines 531-544)
as syntactic sugar for `lhs.includes(rhs)`. `includes` is single-needle
membership — it expects `rhs` to be a scalar.

When a user writes an array literal on the right:

```text
$..books!.find_all(@.attribute_code has ["a", "b"])
```

the predicate silently degenerates:

- If `@.attribute_code` is a **string**: `includes_apply` (ops/misc.rs:162)
  hits the `Val::Str` arm and computes `s.contains(item.as_str().unwrap_or_default())`.
  `as_str()` on `Val::Arr` returns `None`, `unwrap_or_default()` yields `""`,
  every string contains `""`, predicate is `true` for every row, `find_all`
  returns the entire input. This is the bug we want to fix.
- If `@.attribute_code` is an **array**: `val_to_key` serialises the literal
  to the JSON string `["a","b"]`, no array element matches that string, the
  predicate is `false` for every row.

Either branch is wrong relative to user intent ("the field contains both `a`
and `b`"). The string branch is the dangerous one because failure is silent.
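
The string-branch failure can be reproduced in isolation. The following is a minimal sketch, assuming a toy two-variant `Val` and a hypothetical `includes_str` function that reduces the `Val::Str` arm to the single expression quoted above; neither is jetro's actual definition.

```rust
// Toy stand-in for jetro's `Val` (assumption): only the variants the bug needs.
#[derive(Debug, Clone, PartialEq)]
enum Val {
    Str(String),
    Arr(Vec<Val>),
}

impl Val {
    /// Mirrors the behaviour described above: `None` for anything but a string.
    fn as_str(&self) -> Option<&str> {
        match self {
            Val::Str(s) => Some(s.as_str()),
            _ => None,
        }
    }
}

/// Hypothetical reduction of the `Val::Str` arm of `includes_apply`.
fn includes_str(haystack: &str, item: &Val) -> bool {
    // For a `Val::Arr` needle, `as_str()` is `None`, the default `&str` is "",
    // and `str::contains("")` is true for every haystack.
    haystack.contains(item.as_str().unwrap_or_default())
}
```

With this sketch, `includes_str("zzz", &Val::Arr(...))` is `true` for any array needle, while a scalar needle such as `Val::Str("a")` still behaves as expected.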

## 2. Proposed semantics

Make `has` accept an array literal on the right and dispatch to a containment
check that means "every element of RHS is present in LHS". Keep existing
scalar-RHS behaviour unchanged.

| Receiver type      | `lhs has scalar` (today)              | `lhs has [a, b, ...]` (new)                     |
|--------------------|---------------------------------------|-------------------------------------------------|
| `Arr` / typed vec  | element equality (`includes`)         | every literal appears as an element (subset)    |
| `Str` / `StrSlice` | substring containment (`includes`)    | every needle is a substring                     |
| `Obj` / `ObjSmall` | key existence (`includes`)            | every key exists                                |
| other              | `false`                               | `false`                                         |

Design rules:

1. Overload `has`, do not introduce a second keyword. `in` stays reserved for
   `for x in xs` / `let ... in ...` (grammar.pest:14, 185-215).
2. The overload triggers only when the RHS *is syntactically an array literal*
   (`Expr::Array(...)`) at parse time. A runtime array produced by a
   sub-expression keeps `includes` semantics — we do not want runtime branching
   inside `includes` itself, and we do not want to make `has` polymorphic on a
   value whose shape isn't visible to the planner.
3. RHS element types must be scalar literals (`Val::Str`, `Val::Int`, `Val::Float`,
   `Val::Bool`, `Val::Null`). Reject non-literal or nested array elements at
   parse time with a clear error — do not silently fall through to `includes`.
4. Empty RHS (`lhs has []`) is `true` (vacuous truth — every element of the
   empty set is present). Document this explicitly in the parser test.
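
As a reference model for the table and rules above, the intended semantics can be sketched over a toy value type. This is an illustration only: the `Val` enum and the `has_all` function below are assumptions, typed vectors and `StrSlice` / `ObjSmall` are omitted, and the model compares array elements by strict equality rather than by the string coercion the runtime helper uses.

```rust
use std::collections::BTreeMap;

// Toy stand-in for jetro's `Val` (assumption); not the real definition.
#[derive(Debug, Clone, PartialEq)]
enum Val {
    Null,
    Int(i64),
    Str(String),
    Arr(Vec<Val>),
    Obj(BTreeMap<String, Val>),
}

/// Reference model of `lhs has [a, b, ...]`: every needle must be present.
fn has_all(lhs: &Val, needles: &[Val]) -> bool {
    // `Iterator::all` returns true for an empty iterator, which gives
    // rule 4 (vacuous truth) without a special case.
    needles.iter().all(|n| match (lhs, n) {
        // Array receiver: the needle must appear as an element.
        (Val::Arr(items), _) => items.contains(n),
        // String receiver: the needle must be a substring.
        (Val::Str(s), Val::Str(sub)) => s.contains(sub.as_str()),
        // Object receiver: the needle must be an existing key.
        (Val::Obj(map), Val::Str(key)) => map.contains_key(key),
        // Any other receiver/needle pairing is "not present".
        _ => false,
    })
}
```

Note that under this model an empty needle list is `true` even for receivers in the "other" row, which matches rule 4 rather than the table's blanket `false`.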

## 3. Parser change

File: `jetro-core/src/parse/parser.rs`, function `parse_contains`
(currently lines 531-544).

Today:

```rust
fn parse_contains(pair: Pair<Rule>) -> Expr {
    let mut inner = pair.into_inner();
    let lhs = parse_expr(inner.next().unwrap());
    match inner.next() {
        None => lhs,
        Some(_op_pair) => {
            let rhs = parse_expr(inner.next().unwrap());
            Expr::Chain(
                Box::new(lhs),
                vec![Step::Method("includes".to_string(), vec![Arg::Pos(rhs)])],
            )
        }
    }
}
```

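Change: when `rhs` is `Expr::Array(elems)` and every element is a scalar
literal, lower the chain step to a new builtin `has_all` taking the literal
vector as its argument. When `rhs` is an array literal containing any
non-literal or nested element, raise a parse error. For every other `rhs`,
keep the existing `includes` lowering.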

Pseudocode:

```rust
Some(_op_pair) => {
    let rhs = parse_expr(inner.next().unwrap());
    let step = match &rhs {
        Expr::Array(elems) if elems.iter().all(is_scalar_literal) => {
            let lits = elems.iter().map(literal_to_val).collect::<Vec<_>>();
            Step::Method("has_all".into(), vec![Arg::Pos(Expr::Literal(Val::arr(lits)))])
        }
        Expr::Array(_) => {
            // Non-literal element inside `has [...]`. Reject loudly.
            return Expr::ParseError(
                "has [...] requires scalar literal elements; \
                 use contains_all/contains_any for dynamic arrays".into(),
            );
        }
        _ => Step::Method("includes".into(), vec![Arg::Pos(rhs)]),
    };
    Expr::Chain(Box::new(lhs), vec![step])
}
```

Helpers to add in the same file (private):

- `fn is_scalar_literal(e: &Expr) -> bool`: `true` for `Expr::Literal` whose
  inner `Val` is `Str | Int | Float | Bool | Null`.
- `fn literal_to_val(e: &Expr) -> Val`: unwrap the literal; panic with a
  clear message if called on a non-literal (the call site already filtered).

Check whether the codebase has an `Expr::ParseError` variant. If not, emit the
error through the parser's existing error channel (look at how other lowering
failures are surfaced — e.g. `classify_chain_write` in `parse/write_terminal.rs`
for a precedent of raising parse-time errors during AST rewriting).
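
One possible shape for the three-way dispatch, in case no `Expr::ParseError` variant exists, is to return a `Result` from the lowering step. The sketch below runs against toy `Val` / `Expr` enums; `Lowered`, `lower_has_rhs`, `scalar_literal`, and `Expr::Ident` are hypothetical names introduced for illustration, not part of the codebase. It also merges the two helpers into one function returning `Option<Val>`, which removes the panic path from `literal_to_val`.

```rust
// Toy stand-ins (assumptions) for the real AST and value types.
#[derive(Debug, Clone, PartialEq)]
enum Val { Null, Bool(bool), Int(i64), Float(f64), Str(String), Arr(Vec<Val>) }

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(Val),
    Array(Vec<Expr>),
    Ident(String), // stands in for any non-literal expression such as `@.x`
}

/// Hypothetical result of lowering the right-hand side of `has`.
#[derive(Debug, Clone, PartialEq)]
enum Lowered {
    Includes(Expr),   // scalar or runtime RHS: existing lowering
    HasAll(Vec<Val>), // array literal of scalar literals: new lowering
}

/// Returns the inner value for a scalar literal, `None` for anything else.
fn scalar_literal(e: Expr) -> Option<Val> {
    match e {
        Expr::Literal(Val::Arr(_)) => None, // nested arrays are rejected
        Expr::Literal(v) => Some(v),
        _ => None,
    }
}

fn lower_has_rhs(rhs: Expr) -> Result<Lowered, String> {
    match rhs {
        // Array literal: every element must be a scalar literal, else error.
        Expr::Array(elems) => elems
            .into_iter()
            .map(|e| {
                scalar_literal(e)
                    .ok_or_else(|| "has [...] requires scalar literal elements".to_string())
            })
            .collect::<Result<Vec<_>, _>>()
            .map(Lowered::HasAll),
        // Everything else keeps the `includes` lowering.
        other => Ok(Lowered::Includes(other)),
    }
}
```

Collecting into `Result<Vec<_>, _>` stops at the first offending element, so an empty array literal lowers to `HasAll(vec![])` and a single non-literal element rejects the whole expression.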

## 4. Builtin: `has_all`

Two implementation strategies were considered. Strategy A is rejected for the
reasons given below; implement **B**.

### A. Reuse existing builtins (parser-only change)

Lower to different existing builtins depending on a syntactic guess:
`contains_all` if the elements are strings, a synthesised `.includes(x) &&
.includes(y) && ...` chain for arrays. Fragile because the planner can't see
the receiver type at parse time and `contains_all` is string-only
(`ops/regex.rs:186-189`). Rejected.

### B. New builtin `has_all` (recommended)

Add a single builtin that does the right thing per receiver type at runtime.

#### B.1 Builtin definition

File: `jetro-core/src/builtins/defs.rs`. Add a new struct in the same style as
`Has` (lines 3469-3494):

```rust
/// `has_all([a, b, ...])` — every literal in the argument is present in
/// the receiver. Arrays: element equality. Strings: substring. Objects:
/// key existence. Empty argument: always `true`. Returns `Val::Bool`.
pub(crate) struct HasAll;
impl Builtin for HasAll {
    const METHOD: BuiltinMethod = BuiltinMethod::HasAll;
    const NAME: &'static str = "has_all";
    fn spec() -> BuiltinSpec {
        BuiltinSpec::new(BuiltinCategory::Scalar, BuiltinCardinality::OneToOne)
            .indexed()
            .view_native()
            .demand_law(BuiltinDemandLaw::MapLike)
            .order_effect(BuiltinPipelineOrderEffect::Preserves)
    }
    #[inline]
    fn apply_args(
        recv: &crate::data::value::Val,
        args: &super::BuiltinArgs,
    ) -> Option<crate::data::value::Val> {
        match args {
            super::BuiltinArgs::Val(v) => super::has_all_apply(recv, v),
            _ => None,
        }
    }
}
```

The `BuiltinMethod::HasAll` variant must be registered:

1. Add `HasAll` to the `BuiltinMethod` enum in `builtins/mod.rs`.
2. Add `HasAll` to the `for_each_builtin!` macro list in `builtins/mod.rs`.

Follow the existing `Has` registration as a template — both registrations
appear next to each other.

Do not add an alias for `has_all`. Users should reach it only through the
`has [...]` sugar; exposing it as a dotted method is a separate decision.
(If we later expose it, do it intentionally with a CHANGELOG note.)

#### B.2 Runtime helper

File: `jetro-core/src/builtins/ops/path.rs` (next to `has_apply` at line 258).

```rust
/// Returns `Val::Bool(true)` when every element of `needles` is present
/// in `recv`. Receiver dispatch is delegated to `has_apply`:
/// - `Arr` / typed vecs: element equality (string-coerced).
/// - `Str` / `StrSlice`: substring containment per needle.
/// - `Obj` / `ObjSmall`: key existence per needle.
/// - other: `Val::Bool(false)`, because any `has_apply` result other than
///   `Some(Val::Bool(true))` counts as "not present".
///
/// Empty `needles` returns `Val::Bool(true)` (vacuous truth) for every
/// receiver. Returns `None` only when `needles` is not a `Val::Arr`.
#[inline]
pub fn has_all_apply(recv: &Val, needles: &Val) -> Option<Val> {
    let Val::Arr(items) = needles else { return None };
    if items.is_empty() {
        return Some(Val::Bool(true));
    }
    let all_present = items.iter().all(|n| {
        let key = crate::util::val_to_key(n);
        matches!(has_apply(recv, &key), Some(Val::Bool(true)))
    });
    Some(Val::Bool(all_present))
}
```

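This delegates per-needle to the existing `has_apply` so receiver-type rules
stay in one place. Cost is O(N·M), where N is the number of needles and M is
the cost of one `has_apply` lookup on the receiver (linear in its length for
arrays and strings). N is statically bounded by the array literal, so this is
fine.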

If profiling shows the per-needle `val_to_key` allocation matters, pre-convert
the literal array once at parse time into a `Val::StrVec` and add a fast path
that takes `BuiltinArgs::StrVec`. Out of scope for the initial change.
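
The idea behind that fast path can be sketched independently of the builtin machinery: convert the needles to string keys once, then reuse them for every row. The `Needles` type and its methods below are hypothetical and only model string and string-array receivers; how this would map onto `Val::StrVec` and `BuiltinArgs::StrVec` is left open.

```rust
/// Hypothetical pre-converted needle set: keys are built once (for example
/// at parse time) instead of once per needle per row.
struct Needles {
    keys: Vec<String>,
}

impl Needles {
    /// Converts each literal to its string key a single time.
    fn new<I, S>(lits: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: ToString,
    {
        Needles { keys: lits.into_iter().map(|l| l.to_string()).collect() }
    }

    /// String receiver: every key must be a substring.
    fn all_in_str(&self, recv: &str) -> bool {
        self.keys.iter().all(|k| recv.contains(k.as_str()))
    }

    /// Array-of-strings receiver: every key must equal some element.
    fn all_in_slice(&self, recv: &[String]) -> bool {
        self.keys.iter().all(|k| recv.iter().any(|r| r == k))
    }
}
```

For example, `Needles::new(vec!["a", "b"])` built once can be checked against each row's `attribute_code` with `all_in_str`, with no allocation in the per-row path.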

## 5. Tests

All tests live under `jetro-core/src/tests/`. Add a new file
`tests/has_array_rhs.rs` (or extend the existing `has` tests if there is a
matching file — grep for `has_apply` / `fn test_has` first).

Required cases:

1. **String receiver, all needles present**: `{"a":"hello world"}` with `$.a has ["hello","world"]` → `true`.
2. **String receiver, one needle missing**: `$.a has ["hello","xyz"]` → `false`.
3. **Array receiver, subset**: `{"a":["x","y","z"]}` with `$.a has ["x","y"]` → `true`.
4. **Array receiver, not a subset**: `$.a has ["x","q"]` → `false`.
5. **Array receiver, numeric elements coerced**: `{"a":[1,2,3]}` with `$.a has [1,2]` → `true`.
6. **Object receiver, key existence**: `{"a":{"x":1,"y":2}}` with `$.a has ["x","y"]` → `true`,
   `$.a has ["x","z"]` → `false`.
7. **Empty array**: any receiver with `has []` → `true` (vacuous truth).
8. **Scalar RHS untouched**: `$.a has "x"` still lowers to `includes` and behaves as today.
9. **The original bug reproduction**: `$..books!.find_all(@.attribute_code has ["a","b"])` over a fixture where
   `attribute_code` is a string equal to `"abc"`:
   - Old behaviour (regression guard if we keep it): all books returned.
   - New behaviour: only books whose `attribute_code` contains both `a` and `b`.
   Write the test against the new behaviour; do not preserve the bug.
10. **Non-literal array RHS is rejected at parse time**: `@.field has [@.x, @.y]` produces a parse error pointing at the array.
    Confirms strategy B's parser guard (Section 3).

Also rerun `cargo test --lib parse::` and `cargo test --lib builtins::` to
catch incidental breakage.

## 6. Documentation

Update:

- `jetro-core/README.md` (or whichever doc enumerates operators) — note that
  `has` accepts an array literal on the right and document semantics per
  receiver type.
- `CLAUDE.md` "v2 Tier 1" section under `has`/membership — one-line note.
- `CHANGELOG.md` for the next release — under "Language": "`has` now accepts
  an array literal RHS, lowering to `has_all`."

## 7. Non-goals

- No new `in` operator. Grammar comment at line 162 already declares `has`
  the membership keyword; adding a second spelling is bikeshed bait.
- No runtime polymorphism inside `includes` for `Val::Arr` arguments. Keep
  `includes` strictly single-needle so its semantics are predictable.
- No "any" variant via punctuation. If users want OR semantics they write
  `lhs has "a" || lhs has "b"` or call `contains_any` directly. A future
  `has_any([...])` sugar (e.g. `has ?[a, b]`) is a separate proposal.
- No change to the `contains_all` / `contains_any` builtins themselves; they
  remain string-only barrier ops.

## 8. Risk and rollout

- **Risk of regression**: low. Scalar-RHS `has` is the dominant case and is
  untouched. The change is gated on `Expr::Array` at parse time.
- **Backwards compatibility**: anyone relying on the current
  always-true-on-string behaviour (which is a bug) breaks. That's intended.
  Call it out in the CHANGELOG.
- **Rollout**: single PR. Parser change, builtin registration, runtime
  helper, tests, docs. No flag.

## 9. Estimated size

Around 200 LOC including tests and docs: parser (~30), builtin def +
registration (~40), runtime helper (~25), tests (~80), docs (~25). One
focused PR.