Module scan


SIMD byte-scan over raw JSON bytes.

Locates "key": occurrences in a JSON document without first parsing the document into a tree. memchr (which uses AVX2 internally when available) jumps straight to the next structurally relevant byte (a " outside strings, a " or \\ inside strings), so the scanner traverses the document at near-memory-bandwidth speed.

§When to use

For $..key (all descendants by name) or $..find(@.key op lit) shapes over a large JSON document where the caller retained the raw bytes (see Jetro::from_bytes). Skips the tree walk entirely — scan cost is bounded by byte length, not node count.

§When not to use

  • Document already parsed; raw bytes discarded — fall back to the tree walker in eval/mod.rs::collect_desc.
  • Document is tiny (< a few KB): serde_json per-hit parse cost overtakes the scan win.

§Correctness

The scanner respects JSON string-literal escape rules: an unescaped " toggles in_string, and a \\ inside a string skips the next byte. A needle match must begin at a " encountered while in_string is false — exactly where a JSON object-key literal can legally appear. Hits inside string values ("comment":"the \"test\" case") are therefore rejected.
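
The escape-aware rule above can be sketched as a plain byte loop. This is a minimal illustration, not the module's implementation: `key_positions_sketch` is a hypothetical name, the loop steps one byte at a time instead of using memchr, and it assumes the key needs no escaping and that whitespace may sit between the key and the colon.

```rust
/// Hypothetical sketch: offsets of every `"key":` that starts outside a
/// string literal. Plain byte loop, no SIMD.
fn key_positions_sketch(doc: &[u8], key: &str) -> Vec<usize> {
    // The needle is the quoted key, e.g. `"test"`.
    let needle = format!("\"{}\"", key).into_bytes();
    let mut hits = Vec::new();
    let mut in_string = false;
    let mut i = 0;
    while i < doc.len() {
        match doc[i] {
            // Inside a string, a backslash escapes the next byte: skip both.
            b'\\' if in_string => {
                i += 2;
                continue;
            }
            // A quote outside a string is the only place a key can begin.
            b'"' if !in_string => {
                if doc[i..].starts_with(&needle) {
                    // Assumption: optional whitespace is allowed before ':'.
                    let mut j = i + needle.len();
                    while j < doc.len() && doc[j].is_ascii_whitespace() {
                        j += 1;
                    }
                    if j < doc.len() && doc[j] == b':' {
                        hits.push(i);
                    }
                    // The needle is a complete string literal, so after
                    // skipping it we are still outside any string.
                    i += needle.len();
                    continue;
                }
                in_string = true;
            }
            // A quote inside a string (and not escaped) closes it.
            b'"' => in_string = false,
            _ => {}
        }
        i += 1;
    }
    hits
}
```

Run against `{"test":1,"comment":"the \"test\" case","a":{"test":2}}`, the sketch reports the two real `"test"` keys and ignores the escaped occurrence inside the comment string, because that occurrence is only ever reached while `in_string` is true.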

Structs§

NumFold
Fold numeric values over spans into (int_sum, float_sum, is_float, n). Integer spans accumulate into int_sum; a single float promotes the whole fold to float_sum (which tracks the running total as f64). Spans that don’t parse as numbers are skipped.
ValueSpan
Span of a single JSON value in bytes: start offset inclusive, end offset exclusive. Produced by find_key_value_spans; the caller may compare raw bytes against a literal without allocating a Value.
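
The promotion rule described for NumFold can be illustrated with a small stand-alone accumulator. This is a sketch under assumptions: `NumFoldSketch` and `push` are hypothetical names, the field layout only mirrors the tuple named above, and the number check is looser than the JSON grammar (it leans on Rust's own `i64` and `f64` parsers).

```rust
/// Hypothetical accumulator mirroring (int_sum, float_sum, is_float, n).
#[derive(Debug, Default, PartialEq)]
struct NumFoldSketch {
    int_sum: i64,
    float_sum: f64,
    is_float: bool,
    n: usize,
}

impl NumFoldSketch {
    /// Fold one raw span into the accumulator; non-numeric spans are skipped.
    fn push(&mut self, span: &[u8]) {
        let text = match std::str::from_utf8(span) {
            Ok(t) => t.trim(),
            Err(_) => return,
        };
        // Reject words such as "inf" or "NaN" that Rust's f64 parser accepts
        // but JSON does not.
        let numeric_bytes = text
            .bytes()
            .all(|b| b.is_ascii_digit() || b"+-.eE".contains(&b));
        if text.is_empty() || !numeric_bytes {
            return;
        }
        if let Ok(i) = text.parse::<i64>() {
            // Integers feed both sums, so a later float can promote the
            // fold without revisiting earlier spans (an assumption here).
            self.int_sum = self.int_sum.wrapping_add(i);
            self.float_sum += i as f64;
            self.n += 1;
        } else if let Ok(f) = text.parse::<f64>() {
            // A single float promotes the whole fold.
            self.is_float = true;
            self.float_sum += f;
            self.n += 1;
        }
    }
}
```

A caller would read `int_sum` while `is_float` is false and `float_sum` once it is true; `n` counts only the spans that parsed, which is what an average needs.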

Enums§

ScanCmp
Comparison operator for numeric-range byte scans. Mirrors the subset of ast::BinOp that makes sense against a canonical JSON number literal.
ScanPred
A single predicate against the value paired with a key inside an enclosing object. Drives find_enclosing_objects_mixed.
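
To make the predicate model concrete, here is a hedged sketch of how an equality literal and a numeric comparison might be checked against raw value bytes, with a bitmask recording which conjuncts an object has satisfied. The names `CmpSketch`, `PredSketch`, and `object_matches` and the variant shapes are hypothetical; the real ScanCmp and ScanPred variants are not shown on this page.

```rust
/// Hypothetical comparison operators for numeric-range checks.
#[derive(Clone, Copy, Debug)]
enum CmpSketch { Lt, Lte, Gt, Gte }

/// Hypothetical predicate: bytewise equality or a numeric comparison.
enum PredSketch<'a> {
    Eq(&'a [u8]),
    NumCmp(CmpSketch, f64),
}

impl CmpSketch {
    fn holds(self, lhs: f64, rhs: f64) -> bool {
        match self {
            CmpSketch::Lt => lhs < rhs,
            CmpSketch::Lte => lhs <= rhs,
            CmpSketch::Gt => lhs > rhs,
            CmpSketch::Gte => lhs >= rhs,
        }
    }
}

impl PredSketch<'_> {
    /// Test the predicate against the raw bytes of one JSON value.
    fn matches(&self, raw: &[u8]) -> bool {
        match self {
            // Bytewise equality is only safe for canonical primitives.
            PredSketch::Eq(lit) => raw == *lit,
            PredSketch::NumCmp(op, threshold) => {
                // Only bytes that can occur in a JSON number are allowed,
                // which keeps "inf" and "NaN" out of the f64 parser.
                let numeric = !raw.is_empty()
                    && raw.iter().all(|b| b.is_ascii_digit() || b"+-.eE".contains(b));
                numeric
                    && std::str::from_utf8(raw)
                        .ok()
                        .and_then(|s| s.parse::<f64>().ok())
                        .map_or(false, |v| op.holds(v, *threshold))
            }
        }
    }
}

/// True iff every conjunct is satisfied by some direct field of one object.
/// `fields` stands in for the (key, raw value) pairs of a single frame.
fn object_matches(conjuncts: &[(&str, PredSketch<'_>)], fields: &[(&str, &[u8])]) -> bool {
    assert!(conjuncts.len() <= 64, "the bitmask holds at most 64 conjuncts");
    let mut seen: u64 = 0;
    for (key, raw) in fields {
        for (bit, (ckey, pred)) in conjuncts.iter().enumerate() {
            if ckey == key && pred.matches(raw) {
                seen |= 1u64 << bit;
            }
        }
    }
    let full = if conjuncts.len() == 64 { u64::MAX } else { (1u64 << conjuncts.len()) - 1 };
    seen == full
}
```

A `u64` mask is a natural fit for a 64-conjunct limit: setting a bit is idempotent, so a key that appears twice in an object cannot be counted twice, and the final check is a single integer comparison.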

Functions§

count_key_value_eq
Count the values paired with key whose raw bytes equal lit after leading whitespace is trimmed. lit is expected to be pre-serialised JSON (e.g. br#""action""#, b"42"). Bytewise comparison is safe for JSON primitives with canonical serialisation; it is not correct for objects/arrays where key order or whitespace may differ.
extract_values
Extract every value paired with key at any depth. Uses find_key_positions to locate each "key": site and then parses the single value that follows via a streaming serde_json::Deserializer (stops at the end of the first value — not the whole document).
extract_values_eq
Extract the parsed Value for every key site whose value's raw bytes equal lit. Matches by bytewise equality on the span, which is safe for JSON primitives (strings, numbers, bools, null) that serialise canonically but not for objects/arrays. Non-matching sites are skipped without paying the serde_json parse cost.
find_direct_field
Extract the span of the direct child named key inside the object whose byte span is obj_bytes (so obj_bytes[0] == b'{'). Depth-aware: matches only keys at the top level of the object, not keys nested inside arrays or sub-objects. Returned span is relative to obj_bytes.
find_enclosing_objects_cmp
Locate the byte span of every enclosing object whose key field is a JSON number satisfying op threshold. Powers the fast path for $..find(@.key op num) where op is one of <, <=, >, >=.
find_enclosing_objects_eq
Locate the byte span of every enclosing object whose key field equals the canonical-serialised literal lit. Powers the SIMD fast path for $..find(@.key == lit).
find_enclosing_objects_eq_multi
Like find_enclosing_objects_eq but accepts N (key, lit) conjuncts. An object is emitted iff it directly contains every listed key with the matching canonical literal value. Each frame carries a bitmask of which conjuncts have matched so far (max 64 conjuncts).
find_enclosing_objects_mixed
Mixed multi-conjunct scan: each conjunct is (key, ScanPred) and an enclosing object is emitted iff every conjunct matches on the same {...} frame. Generalises find_enclosing_objects_eq_multi to allow equality literals and numeric-range comparisons in the same query. Frames carry a bitmask of satisfied conjuncts (max 64).
find_first_key_value_span
Early-exit variant of find_key_value_spans — returns the first span paired with key encountered in document order, or None if the key does not appear. Powers the Descendant(k) + .first() fast path: walks only as far as needed to find one match, rather than scanning the entire byte buffer.
find_key_positions
Scan raw JSON bytes for every "key": occurrence that starts at a structural position (i.e. not inside a string literal). Returns the byte offset of each matching opening " in document order.
find_key_value_spans
Locate the byte span of every value paired with key. Skips whitespace between : and the value and then walks the value to its end — strings obey escape rules, containers track nesting depth, scalars run until the next structural terminator.
fold_direct_field_nums
Fold the direct child named key of each enclosing object span into a single NumFold. Combines find_direct_field + parse_num_span without materialising any intermediate Val. Spans missing the key or whose value is non-numeric are skipped.
fold_nums
Fold numeric spans for sum/avg/min/max. Walks each span, parses as number, updates the accumulators. Non-numeric spans are skipped.
parse_num_span
Parse a span of JSON numeric bytes. Returns Some((as_i64, as_f64, is_int)) or None if not a valid number. Canonical JSON numbers only: -?\d+(\.\d+)?(e±\d+)?.
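
The value walk described under find_key_value_spans (skip whitespace, then follow string, container, or scalar rules) can be sketched as follows. `value_span_sketch` is a hypothetical helper, not the module's function: it takes the offset just past a `:` and returns a (start, end) pair with an exclusive end, matching the ValueSpan convention described above.

```rust
/// Hypothetical sketch: span of the JSON value at or after `from`.
/// Returns None if the input ends before the value does.
fn value_span_sketch(doc: &[u8], from: usize) -> Option<(usize, usize)> {
    // Skip whitespace between ':' and the value.
    let mut i = from;
    while i < doc.len() && doc[i].is_ascii_whitespace() {
        i += 1;
    }
    let start = i;
    match *doc.get(i)? {
        // Strings obey escape rules: a backslash hides the next byte.
        b'"' => {
            i += 1;
            while i < doc.len() {
                match doc[i] {
                    b'\\' => i += 2,
                    b'"' => return Some((start, i + 1)),
                    _ => i += 1,
                }
            }
            None
        }
        // Containers track nesting depth and ignore brackets inside strings.
        b'{' | b'[' => {
            let mut depth = 0usize;
            let mut in_string = false;
            while i < doc.len() {
                let b = doc[i];
                if in_string {
                    match b {
                        b'\\' => i += 1, // extra step skips the escaped byte
                        b'"' => in_string = false,
                        _ => {}
                    }
                } else {
                    match b {
                        b'"' => in_string = true,
                        b'{' | b'[' => depth += 1,
                        b'}' | b']' => {
                            depth -= 1;
                            if depth == 0 {
                                return Some((start, i + 1));
                            }
                        }
                        _ => {}
                    }
                }
                i += 1;
            }
            None
        }
        // Scalars run until the next structural terminator or whitespace.
        _ => {
            while i < doc.len()
                && !matches!(doc[i], b',' | b'}' | b']')
                && !doc[i].is_ascii_whitespace()
            {
                i += 1;
            }
            if i > start { Some((start, i)) } else { None }
        }
    }
}
```

Because the result is only a pair of offsets, a caller can compare `&doc[start..end]` against a literal such as `b"42"` without allocating or parsing anything, which is the property the equality fast paths rely on.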