chadmespath 0.3.0

JMESPath for JSON — a fork of jmespath 0.3 (MIT) adding Expression::search_cached, which evaluates against a pre-converted Rcvar so a document is converted once, not per query. Used by chadselect.

chadmespath

A minimal fork of jmespath 0.3.0 (the Rust JMESPath implementation by Michael Dowling, MIT) that adds one method — Expression::search_cached — so a JSON document can be converted into JMESPath's value tree once and queried many times, instead of being re-converted on every query.

Everything else is upstream jmespath 0.3, unchanged.

Why this fork exists

jmespath::Expression::search is generic over its input:

pub fn search<T: ToJmespath>(&self, data: T) -> SearchResult {
    let mut ctx = Context::new(&self.expression, self.runtime);
    interpret(&data.to_jmespath()?, &self.ast, &mut ctx)   // <-- to_jmespath() runs every call
}

data.to_jmespath() runs on every call. When you pass a serde_json::Value (the common case), on stable Rust that call dispatches to the blanket impl<T: serde::Serialize> ToJmespath, which calls Variable::from_serializable: a full serde walk that allocates a brand-new Rc<Variable> tree for the whole document, every time. That means an Rc for every value, a String for every object key and string value, and BTreeMap nodes for every object.

So a program that runs N JMESPath queries against one document pays that document-sized conversion N times. In a real workload (a scraper extracting a few dozen fields from each page's embedded JSON) this was measured as ~64% of all allocations for a single page — far more than the actual query work.
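To make the N-times cost concrete, here is a toy model in plain std Rust. Doc and Tree are hypothetical stand-ins for serde_json::Value and jmespath's Rc<Variable> tree, convert is a simplified deep conversion that counts one Rc per value, and field is a one-key lookup standing in for a query. None of these are jmespath APIs; the sketch only shows where the repeated work comes from.

```rust
use std::collections::BTreeMap;
use std::rc::Rc;

// Hypothetical stand-in for a parsed JSON document (plays serde_json::Value).
pub enum Doc {
    Str(String),
    Obj(Vec<(String, Doc)>),
}

// Hypothetical stand-in for an Rc-based query tree (plays Rc<Variable>).
pub enum Tree {
    Str(String),
    Obj(BTreeMap<String, Rc<Tree>>),
}

/// Deep-converts a Doc into a fresh Tree, counting one Rc per value.
pub fn convert(doc: &Doc, allocs: &mut usize) -> Rc<Tree> {
    *allocs += 1;
    match doc {
        Doc::Str(s) => Rc::new(Tree::Str(s.clone())),
        Doc::Obj(fields) => {
            let mut map = BTreeMap::new();
            for (key, value) in fields {
                map.insert(key.clone(), convert(value, allocs));
            }
            Rc::new(Tree::Obj(map))
        }
    }
}

/// Toy "query": look up one top-level key in an already-converted tree.
pub fn field(tree: &Rc<Tree>, key: &str) -> Option<Rc<Tree>> {
    match &**tree {
        Tree::Obj(map) => map.get(key).cloned(),
        Tree::Str(_) => None,
    }
}

/// Upstream-style: every query re-converts the whole document first.
pub fn run_per_query(doc: &Doc, keys: &[&str], allocs: &mut usize) -> usize {
    let mut hits = 0;
    for key in keys {
        let tree = convert(doc, allocs); // document-sized work, N times
        if field(&tree, key).is_some() {
            hits += 1;
        }
    }
    hits
}

/// search_cached-style: convert once, then query the shared tree.
pub fn run_cached(doc: &Doc, keys: &[&str], allocs: &mut usize) -> usize {
    let tree = convert(doc, allocs); // document-sized work, once
    let mut hits = 0;
    for key in keys {
        if field(&tree, key).is_some() {
            hits += 1;
        }
    }
    hits
}
```

With a three-field object and two queries, run_per_query counts 8 node allocations and run_cached counts 4; in this model the gap grows with both document size and query count.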

Why you can't fix it from the outside

The obvious fix, converting the document to an Rc<Variable> once and reusing it, is not reachable through upstream's public API on stable Rust:

  • The zero-cost identity impl ToJmespath for Rcvar (which would let search() accept an already-converted tree for free) is gated behind jmespath's specialized feature, which requires #![feature(specialization)] and is therefore nightly-only. On stable, even passing an Rcvar to search() re-serializes the whole tree.
  • The lower-level interpret(&Rcvar, &Ast, &mut Context) that would let you evaluate against a cached tree lives in a private module (mod interpreter) and is not re-exported.

So the cache has nowhere to plug in. This fork adds that plug.
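The coherence problem in the first bullet can be reproduced in a few lines of stable Rust. The sketch below is a hypothetical analog, not jmespath's code: Describe stands in for serde::Serialize, ToTree for ToJmespath, Tree for Variable, and CONVERSIONS is an illustrative counter. Because the converted tree is itself describable, it falls under the blanket impl, so passing an already-built tree to the generic search still builds a new one, and the identity impl that would short-circuit this is rejected as a conflicting implementation.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical analog, not jmespath's code: `Describe` plays serde::Serialize,
// `ToTree` plays ToJmespath, and `Tree` plays Variable.
pub struct Tree(pub String);

pub trait Describe {
    fn describe(&self) -> String;
}

// Like Rcvar being Serialize: the converted tree is itself describable.
impl Describe for Rc<Tree> {
    fn describe(&self) -> String {
        self.0.clone()
    }
}

thread_local! {
    // Counts how often the blanket conversion builds a new tree.
    pub static CONVERSIONS: Cell<usize> = Cell::new(0);
}

pub trait ToTree {
    fn to_tree(self) -> Rc<Tree>;
}

// Blanket impl: every describable input is converted into a fresh tree.
impl<T: Describe> ToTree for T {
    fn to_tree(self) -> Rc<Tree> {
        CONVERSIONS.with(|c| c.set(c.get() + 1));
        Rc::new(Tree(self.describe()))
    }
}

// The identity impl below would overlap the blanket impl, so stable Rust
// rejects it (E0119). Only specialization (nightly) lets both coexist.
// impl ToTree for Rc<Tree> { fn to_tree(self) -> Rc<Tree> { self } }

/// Generic entry point, shaped like Expression::search.
pub fn search<T: ToTree>(data: T) -> Rc<Tree> {
    data.to_tree()
}
```

Each call to search counts one conversion and returns a distinct Rc, even though the input was already a tree. A method that takes &Rc<Tree> directly, as search_cached does with &Rcvar, sidesteps the trait entirely.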

What it adds

impl<'a> Expression<'a> {
    /// Evaluate against an already-converted `Rcvar` instead of re-converting
    /// the input on every call.
    pub fn search_cached(&self, data: &Rcvar) -> SearchResult {
        let mut ctx = Context::new(&self.expression, self.runtime);
        interpret(data, &self.ast, &mut ctx)
    }
}

Usage — convert once, query many times:

use chadmespath as jmespath;
use jmespath::{compile, Rcvar, Variable};

fn extract(json_text: &str, selectors: &[&str]) -> Vec<Rcvar> {
    // Convert the document to the JMESPath tree exactly once and keep it.
    let data: Rcvar = Rcvar::new(Variable::from_json(json_text).unwrap());

    let mut results = Vec::with_capacity(selectors.len());
    for expr_str in selectors {
        let expr = compile(expr_str).unwrap();
        // Evaluate against the shared tree: no document-sized allocation.
        results.push(expr.search_cached(&data).unwrap());
    }
    results
}

search_cached allocates nothing document-sized: the Rcvar is shared by reference and only the query's own results are built.
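Sharing is cheap because an Rc-based tree can hand out pieces of itself without copying them. The toy below uses a hypothetical Node type and get_field helper, not jmespath's Variable; it shows that a lookup on a borrowed tree can return an Rc::clone of an existing child, which only bumps a reference count. Whether a particular JMESPath expression returns shared subtrees or freshly built values depends on the expression.

```rust
use std::collections::BTreeMap;
use std::rc::Rc;

// Hypothetical toy tree, not jmespath::Variable: objects map keys to shared
// children, as an Rc-based value tree does.
pub enum Node {
    Leaf(String),
    Obj(BTreeMap<String, Rc<Node>>),
}

/// Toy field lookup on a borrowed, already-built tree. The result is an
/// Rc::clone of the existing child: a reference-count bump, not a copy.
pub fn get_field(data: &Rc<Node>, key: &str) -> Option<Rc<Node>> {
    match &**data {
        Node::Obj(map) => map.get(key).cloned(),
        Node::Leaf(_) => None,
    }
}

/// Builds a small object whose "title" child is a leaf.
pub fn sample() -> Rc<Node> {
    let mut map = BTreeMap::new();
    map.insert("title".to_string(), Rc::new(Node::Leaf("Widget".to_string())));
    Rc::new(Node::Obj(map))
}
```

Two lookups of "title" return pointers to the same child (Rc::ptr_eq is true), and the document itself is only borrowed, so its own strong count stays at 1.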

Measured effect

Running a 24-selector extraction against one document, upstream search() vs. search_cached() with a cached tree:

Document size | Allocations     | Bytes allocated     | CPU per pass
50 specs      | 17,902 → 550    | 1,057 KiB → 21 KiB  | 1,377 µs → 40 µs
200 specs     | 65,612 → 1,460  | 3,930 KiB → 64 KiB  | 6,733 µs → 88 µs
400 specs     | 129,217 → 2,665 | 7,766 KiB → 121 KiB | 12,888 µs → 168 µs

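Allocations drop by roughly 97–98% at every size, and the document-sized conversion no longer happens per query; the small remainder is just building each query's result strings, so it grows only as the results themselves grow. CPU per pass improves roughly 34–77×.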
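The figures above do not say how they were collected. One common way to get allocation counts and byte totals in Rust is a counting global allocator; the sketch below is an illustrative assumption (Counting, ALLOCS, BYTES, and measure are hypothetical names), not the harness behind the table.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Process-wide counters: allocation calls and bytes requested.
pub static ALLOCS: AtomicUsize = AtomicUsize::new(0);
pub static BYTES: AtomicUsize = AtomicUsize::new(0);

/// Forwards to the system allocator and counts every allocation
/// (the default realloc routes through alloc, so it is counted too).
pub struct Counting;

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: Counting = Counting;

/// Runs `f` and returns its result with the allocations and bytes made meanwhile.
/// Counts are process-wide, so other threads' allocations are included too.
pub fn measure<R>(f: impl FnOnce() -> R) -> (R, usize, usize) {
    let allocs_before = ALLOCS.load(Ordering::Relaxed);
    let bytes_before = BYTES.load(Ordering::Relaxed);
    let result = f();
    let allocs = ALLOCS.load(Ordering::Relaxed) - allocs_before;
    let bytes = BYTES.load(Ordering::Relaxed) - bytes_before;
    (result, allocs, bytes)
}
```

Wrapping one full pass of the selectors in measure, once with search() and once with search_cached() on a pre-built Rcvar, would give before/after numbers of the same shape as the table; CPU time would need a separate measurement, for example with std::time::Instant.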

Relationship to upstream

  • Library-only. Upstream's build.rs (which generated the compliance tests and benches from the JMESPath compliance suite) and the tests/ and benches/ directories are removed. The library source is otherwise upstream 0.3.0.
  • Additive. search_cached is the only API change; search, compile, Variable, Runtime, Context, and everything else behave exactly as in jmespath 0.3.0.
  • Drop-in. Rename it to jmespath in Cargo.toml (jmespath = { package = "chadmespath", version = "0.3" }) and existing jmespath 0.3 code compiles unchanged; reach for search_cached wherever you hold a cached Rcvar.

License

MIT, preserving the original jmespath.rs copyright. See LICENSE.