jetro-experimental
Structural-index substrate for fast JSON queries. Standalone stage-1
scanner + Mison-style key-bitmap layer. Built to slot under
jetro for $..find(k == lit) shapes
without materialising the value tree.
What it does
Given a JSON byte buffer, build:
- Stage-1 columns (from simd-json): byte offset, kind, depth for every structural character ({, }, [, ], ", :, ",", scalar starts). AVX2 fast path; scalar/memchr fallback elsewhere.
- Sidecar: parent[], close_of[] for tree navigation.
- Mison key bitmaps: interned key dictionary with a Roaring bitmap per key over token positions. Compound predicates via bitmap-AND.
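To make the sidecar concrete, here is a minimal sketch of how parent[] and close_of[] could be derived from a token kind column with an explicit stack. The `Kind` enum, the `NONE` sentinel, and `build_sidecar` are hypothetical names used for illustration only and are not the crate's API; the convention that a closing token's parent is its matching opener is also an assumption.

```rust
/// Hypothetical token kinds; the crate's real stage-1 `kind` column may differ.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Kind {
    Open,  // `{` or `[`
    Close, // `}` or `]`
    Other, // quotes, colons, commas, scalar starts
}

/// Sentinel for "no parent" / "not an opener" (an assumption of this sketch).
pub const NONE: u32 = u32::MAX;

/// Build parent[] and close_of[] in one pass over the token kinds.
/// parent[i]   = token id of the enclosing opener (NONE at the root).
/// close_of[i] = token id of the matching closer, set only for openers.
pub fn build_sidecar(kinds: &[Kind]) -> (Vec<u32>, Vec<u32>) {
    let mut parent = vec![NONE; kinds.len()];
    let mut close_of = vec![NONE; kinds.len()];
    let mut stack: Vec<u32> = Vec::new();

    for (i, &k) in kinds.iter().enumerate() {
        let id = i as u32;
        match k {
            Kind::Open => {
                parent[i] = stack.last().copied().unwrap_or(NONE);
                stack.push(id);
            }
            Kind::Close => {
                // An unbalanced closer (empty stack) is left as NONE.
                if let Some(open) = stack.pop() {
                    close_of[open as usize] = id;
                    parent[i] = open;
                }
            }
            Kind::Other => {
                parent[i] = stack.last().copied().unwrap_or(NONE);
            }
        }
    }
    (parent, close_of)
}
```

With these two arrays, skipping a whole container is a single jump from an opener to close_of[opener], and walking ancestors is a loop over parent[].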
Built once, reusable. Queries like $..find(x == "test") use the
bitmap to skip every JSON object that doesn't contain key x, and a
SIMD byte compare to validate the value. No Val tree allocation.
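The key-bitmap idea can be illustrated with a small standalone sketch. It uses a plain `Vec<u64>` bitset as a stand-in for a Roaring bitmap, and it assumes each bit is indexed by the token id of the object that owns the key; `Bitset`, `KeyIndex`, `record`, and `objects_with` are hypothetical names, not the crate's API.

```rust
use std::collections::HashMap;

/// Growable bitset over token ids; a simplified stand-in for a Roaring bitmap.
#[derive(Clone, Debug, PartialEq, Default)]
pub struct Bitset {
    words: Vec<u64>,
}

impl Bitset {
    pub fn insert(&mut self, bit: u32) {
        let w = (bit / 64) as usize;
        if self.words.len() <= w {
            self.words.resize(w + 1, 0);
        }
        self.words[w] |= 1u64 << (bit % 64);
    }

    /// Bitwise AND; bits beyond the shorter operand are zero by definition.
    pub fn and(&self, other: &Bitset) -> Bitset {
        let words = self.words.iter().zip(&other.words).map(|(a, b)| a & b).collect();
        Bitset { words }
    }

    /// Popcount: number of set bits, without visiting any JSON value.
    pub fn count(&self) -> u32 {
        self.words.iter().map(|w| w.count_ones()).sum()
    }

    /// Iterate set bits in ascending order.
    pub fn iter(&self) -> impl Iterator<Item = u32> + '_ {
        self.words.iter().enumerate().flat_map(|(wi, &w)| {
            (0..64u32)
                .filter(move |&b| w & (1u64 << b) != 0)
                .map(move |b| wi as u32 * 64 + b)
        })
    }
}

/// Key dictionary mapping each key to the objects that contain it.
#[derive(Default)]
pub struct KeyIndex {
    by_key: HashMap<String, Bitset>,
}

impl KeyIndex {
    /// Record that the object starting at `object_tok` has `key`.
    pub fn record(&mut self, key: &str, object_tok: u32) {
        self.by_key.entry(key.to_string()).or_default().insert(object_tok);
    }

    /// Objects containing `key`; empty if the key never occurs.
    pub fn objects_with(&self, key: &str) -> Bitset {
        self.by_key.get(key).cloned().unwrap_or_default()
    }
}
```

A query for objects with both x and y then reduces to `idx.objects_with("x").and(&idx.objects_with("y"))`, and only the surviving candidates need their values compared against the literal.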
Why
Following Mison (Li et al., VLDB 2017), speculative key-bitmap indices outperform tree walks on selective queries by 10-100×. jetro-experimental is a self-contained Mison-style substrate that builds the structural columns and key bitmaps directly from raw JSON in a single pass.
Architecture
┌──────────────────────────────┐
│ JSON bytes (&[u8]) │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────────────────┐
│ stage-1 SIMD/scalar scan │
│ • emit offset/kind/depth columns │
└──────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────┐
│ index build (one fused walk) │
│ • parent / close_of │
│ • role classification │
│ • key interner + Roaring bitmaps │
└──────────────────────────────────────┘
│
▼
StructuralIndex + KeyBitmaps
│
▼
┌─────────┬────────┬──────────┬─────────┐
│ find_eq │ count │ ancestors│ slice() │
│ (Mison)│ (1) │ (parents)│ raw │
└─────────┴────────┴──────────┴─────────┘
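As a rough illustration of the first box in the diagram, the sketch below is a scalar-only scan that emits offset/kind/depth columns. It is not the crate's scanner (which has an AVX2 path and comes from simd-json); `Columns`, `scan`, the `b's'` tag for scalar starts, and the choice to record only the opening quote of each string are assumptions made for this example. It also performs no validation.

```rust
/// Column-oriented output of the scan (hypothetical layout).
#[derive(Debug, PartialEq, Default)]
pub struct Columns {
    pub offset: Vec<u32>, // byte offset of each token
    pub kind: Vec<u8>,    // the structural byte itself, or b's' for a scalar start
    pub depth: Vec<u16>,  // nesting depth; an opener and its closer share a depth
}

fn push(c: &mut Columns, off: usize, kind: u8, depth: u16) {
    c.offset.push(off as u32);
    c.kind.push(kind);
    c.depth.push(depth);
}

/// Scalar scan over a JSON buffer. Assumes well-formed input.
pub fn scan(bytes: &[u8]) -> Columns {
    let mut cols = Columns::default();
    let mut depth: u16 = 0;
    let mut i = 0;
    while i < bytes.len() {
        let b = bytes[i];
        match b {
            b'{' | b'[' => {
                push(&mut cols, i, b, depth);
                depth += 1;
            }
            b'}' | b']' => {
                depth = depth.saturating_sub(1);
                push(&mut cols, i, b, depth);
            }
            b':' | b',' => push(&mut cols, i, b, depth),
            b'"' => {
                push(&mut cols, i, b, depth);
                // Advance to the closing quote, stepping over escaped bytes.
                i += 1;
                while i < bytes.len() && bytes[i] != b'"' {
                    if bytes[i] == b'\\' {
                        i += 1;
                    }
                    i += 1;
                }
            }
            b' ' | b'\t' | b'\n' | b'\r' => {}
            _ => {
                // Scalar start (number, true, false, null): record once,
                // then skip to the last byte of the scalar.
                push(&mut cols, i, b's', depth);
                while i + 1 < bytes.len()
                    && !matches!(bytes[i + 1], b',' | b'}' | b']' | b' ' | b'\t' | b'\n' | b'\r')
                {
                    i += 1;
                }
            }
        }
        i += 1;
    }
    cols
}
```

Keeping the three columns as parallel vectors rather than a vector of structs is what lets later passes (sidecar build, key interning) stream over just the column they need.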
Quick start
use ;
let bytes = br#"{"a":{"x":"test"},"b":{"x":"nope"},"c":{"x":"test"}}"#;
let idx = from_bytes.unwrap;
// Mison-style: every object whose `x` key equals "test".
for obj_tok in find_eq
// → {"x":"test"}
// → {"x":"test"}
// Popcount only — no value walk.
let n = count_key;
assert_eq!;
Public API
Entry points
;
;
Per-token
Byte-position lookup
;
;
Mison key layer
;
;
+ '_;
;
KeyHits — lazy composable
Fused query primitives
;
;
;
SIMD helpers
; // memchr escape probe
; // numeric semantics
; // fast-float when "fast-numbers"
;
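The "escape probe" noted above can be sketched as follows: if the raw bytes between the quotes contain no backslash, the JSON string is byte-identical to its decoded form and a plain slice compare is enough. This is a sketch under assumptions, not the crate's json_string_eq; `string_body_eq` is a hypothetical name, and `slice::contains` from the standard library stands in for a memchr call.

```rust
/// Compare the raw body of a JSON string (the bytes between the quotes)
/// against an already-decoded literal.
///
/// Returns Some(result) on the fast path (no escapes in the raw body) and
/// None when a backslash is present, meaning the caller must decode the
/// string before comparing. Hypothetical helper, for illustration only.
pub fn string_body_eq(raw_body: &[u8], literal: &[u8]) -> Option<bool> {
    // Escape probe: one linear scan for `\`. A memchr-style routine would
    // do the same job with SIMD.
    if raw_body.contains(&b'\\') {
        return None;
    }
    // No escapes: raw bytes equal decoded bytes, so compare directly.
    Some(raw_body == literal)
}
```

Returning `None` instead of guessing keeps the fast path honest: an escaped body such as `te\u0073t` decodes to `test` but is not byte-equal to it, so it must take the slow path.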
Feature flags
| feature | adds |
|---|---|
| fast-numbers | fast-float SIMD f64 parser (~3× over str::parse) |
| multi-key | Aho-Corasick + Teddy multi-pattern matcher for raw byte scans |
| validate-utf8 | simdutf8 (~5-10 GB/s) |
Default: minimal. No external simd-json dependency: the stage-1 scanner taken from simd-json is included in this crate and is fully self-contained.
Status
Pre-1.0. Public API is stable (StructuralIndex, TokenId, fused
helpers). Internals (Stage1, KeyBitmaps, StructIndex) are
#[doc(hidden)] and may change without semver bumps.
Roadmap
- Stage-1 SIMD scanner (AVX2 + scalar/memchr fallback)
- StructuralIndex + Mison KeyBitmaps
- Public API facade with opaque types
- SIMD helpers: json_string_eq, parse_f64, multi-key, utf8 validation
- NEON stage-1 port (currently scalar+memchr fallback on aarch64)
- Streaming IndexBuilder for NDJSON / huge inputs
License
This repository contains a direct copy of the stage-1 scanner from the simd-json project.
MIT/Apache-2.0.