chadmespath
A minimal fork of jmespath 0.3.0 (the Rust
JMESPath implementation by Michael Dowling, MIT) that adds
one method — Expression::search_cached — so a JSON document can be converted
into JMESPath's value tree once and queried many times, instead of being
re-converted on every query.
Everything else is upstream jmespath 0.3, unchanged.
Why this fork exists
jmespath::Expression::search is generic over its input:
data.to_jmespath() runs on every call. When you pass a serde_json::Value
(the common case), that dispatches — on stable Rust — to the blanket
impl<T: serde::Serialize> ToJmespath, which calls Variable::from_serializable:
a full serde walk that allocates a brand-new Rc<Variable> tree for the whole document, every time (one Rc per value, plus a String for each string value and object key, a Vec for each array, and BTreeMap nodes for each object).
So a program that runs N JMESPath queries against one document pays that document-sized conversion N times. In a real workload (a scraper extracting a few dozen fields from each page's embedded JSON), this conversion accounted for ~64% of all allocations for a single page, far more than the query work itself.
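To make the per-call cost concrete, here is a std-only toy model of that dispatch. It is a sketch, not jmespath's code: `Tree`, `Doc`, `Serializable`, `ToTree` and `Expression` are hypothetical stand-ins for `Variable`, `serde_json::Value`, `serde::Serialize`, `ToJmespath` and the real `Expression`, and a node counter stands in for an allocation counter. With only a blanket impl available, every `search` call walks the whole input and builds a fresh `Rc` tree.

```rust
use std::cell::Cell;
use std::collections::BTreeMap;
use std::rc::Rc;

// Hypothetical stand-in for jmespath's `Variable` tree.
#[derive(Debug)]
pub enum Tree {
    Num(f64),
    Str(String),
    Object(BTreeMap<String, Rc<Tree>>),
}

// Hypothetical stand-in for a `serde_json::Value` input document.
pub enum Doc {
    Num(f64),
    Str(String),
    Object(Vec<(String, Doc)>),
}

thread_local! {
    // Counts tree nodes built; a stand-in for an allocation counter.
    static NODES_BUILT: Cell<usize> = Cell::new(0);
}

pub fn nodes_built() -> usize {
    NODES_BUILT.with(|c| c.get())
}

// Stand-in for `serde::Serialize`: anything that can be walked into a fresh tree.
pub trait Serializable {
    fn walk(&self) -> Rc<Tree>;
}

impl Serializable for Doc {
    fn walk(&self) -> Rc<Tree> {
        NODES_BUILT.with(|c| c.set(c.get() + 1));
        Rc::new(match self {
            Doc::Num(n) => Tree::Num(*n),
            Doc::Str(s) => Tree::Str(s.clone()),
            Doc::Object(fields) => {
                Tree::Object(fields.iter().map(|(k, v)| (k.clone(), v.walk())).collect())
            }
        })
    }
}

// Mirrors serde's `impl Serialize for &T`, so `&Doc` is accepted too.
impl<T: Serializable + ?Sized> Serializable for &T {
    fn walk(&self) -> Rc<Tree> {
        (**self).walk()
    }
}

// Stand-in for `ToJmespath` on stable Rust: only the blanket impl exists.
pub trait ToTree {
    fn to_tree(self) -> Rc<Tree>;
}

impl<T: Serializable> ToTree for T {
    fn to_tree(self) -> Rc<Tree> {
        self.walk()
    }
}

pub struct Expression;

impl Expression {
    // Generic like upstream `search`: the input is converted on every call,
    // before any query work would begin.
    pub fn search<T: ToTree>(&self, data: T) -> Rc<Tree> {
        data.to_tree()
    }
}
```

Running five searches over a three-node document builds fifteen nodes: the conversion cost is queries × document size, independent of what the queries actually select.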
Why you can't fix it from the outside
The obvious fix — convert the document to an Rc<Variable> once and reuse it — is
not reachable through upstream's public API on stable Rust:
- The zero-cost identity `impl ToJmespath for Rcvar` (which would let `search()` accept an already-converted tree for free) is gated behind jmespath's `specialized` feature, which requires `#![feature(specialization)]` and therefore nightly Rust. On stable, even passing an `Rcvar` to `search()` re-serializes the whole tree.
- The lower-level `interpret(&Rcvar, &Ast, &mut Context)`, which would let you evaluate against a cached tree, lives in a private module (`mod interpreter`) and is not re-exported.
So the cache has nowhere to plug in. This fork adds that plug.
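The visibility rule behind this can be modeled in a few lines of std-only Rust. This is a toy under stated assumptions, not the fork's implementation: `Tree`, `Ast` and this `interpret` are hypothetical, and the "query" is only a dotted field path. It shows that a `pub fn` inside a private module is callable anywhere in its own crate but unreachable downstream, so only a method added inside the crate, like `search_cached`, can hand the interpreter an existing shared tree.

```rust
use std::collections::BTreeMap;
use std::rc::Rc;

// Hypothetical toy value tree (stands in for jmespath's `Variable`).
#[derive(Debug, PartialEq)]
pub enum Tree {
    Null,
    Num(f64),
    Object(BTreeMap<String, Rc<Tree>>),
}

// Hypothetical toy AST: a chain of field names, e.g. ["a", "b"] for `a.b`.
pub struct Ast(Vec<String>);

mod interpreter {
    use super::{Ast, Tree};
    use std::rc::Rc;

    // `pub` inside a private module: usable within this crate only.
    pub fn interpret(data: &Rc<Tree>, ast: &Ast) -> Rc<Tree> {
        let mut current = Rc::clone(data);
        for field in &ast.0 {
            let next = match &*current {
                Tree::Object(map) => map.get(field).cloned(),
                _ => None,
            };
            current = next.unwrap_or_else(|| Rc::new(Tree::Null));
        }
        current
    }
}

pub struct Expression {
    ast: Ast,
}

impl Expression {
    pub fn new(path: &[&str]) -> Self {
        Expression { ast: Ast(path.iter().map(|s| s.to_string()).collect()) }
    }

    // Upstream-style entry point: gets a freshly built tree on every call
    // (in jmespath, the result of converting the input).
    pub fn search(&self, data: Tree) -> Rc<Tree> {
        interpreter::interpret(&Rc::new(data), &self.ast)
    }

    // The added entry point: evaluate against an already-converted,
    // shared tree, with no conversion step at all.
    pub fn search_cached(&self, data: &Rc<Tree>) -> Rc<Tree> {
        interpreter::interpret(data, &self.ast)
    }
}
```

Because `search_cached` lives inside the crate, it can call `interpreter::interpret` directly; code outside the crate has no such path.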
What it adds
Usage — convert once, query many times:
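```rust
use chadmespath as jmespath;
use jmespath::{Rcvar, Variable};
// Convert the document to the JMESPath tree exactly once and keep it.
let data: Rcvar = Rcvar::new(Variable::from_serializable(&doc)?);
for expr_str in selectors {
    let expr = jmespath::compile(expr_str)?;
    let result = expr.search_cached(&data)?; // shares `data`; no re-conversion
    // ... use `result` ...
}
```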
search_cached allocates nothing document-sized: the Rcvar is shared by
reference and only the query's own results are built.
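Why the cached path stays cheap can be seen with `Rc` alone. In this hypothetical toy (not jmespath's interpreter), a query shaped like JMESPath's `values(@)` allocates one new array node for its result, while the array's elements are `Rc` clones that point into the cached document instead of copies of it.

```rust
use std::collections::BTreeMap;
use std::rc::Rc;

// Hypothetical toy tree; stands in for jmespath's `Variable`.
#[derive(Debug)]
pub enum Tree {
    Str(String),
    Array(Vec<Rc<Tree>>),
    Object(BTreeMap<String, Rc<Tree>>),
}

// Toy analogue of `values(@)`: builds one new Array node, but its
// elements are reference-count bumps on nodes of the cached document.
pub fn object_values(data: &Rc<Tree>) -> Rc<Tree> {
    match &**data {
        Tree::Object(map) => Rc::new(Tree::Array(map.values().cloned().collect())),
        _ => Rc::new(Tree::Array(Vec::new())),
    }
}

// Builds a small cached document: {"name": "widget", "sku": "W-1"}.
pub fn build_doc() -> Rc<Tree> {
    let mut map = BTreeMap::new();
    map.insert("name".to_string(), Rc::new(Tree::Str("widget".to_string())));
    map.insert("sku".to_string(), Rc::new(Tree::Str("W-1".to_string())));
    Rc::new(Tree::Object(map))
}
```

Querying the same cached tree twice yields two distinct result arrays whose elements are the very same leaf nodes held by the document.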
Measured effect
Running a 24-selector extraction against one document, upstream search() vs. search_cached() with a cached tree (each cell reads upstream → fork):
| document size | allocations | bytes allocated | CPU time / pass |
|---|---|---|---|
| 50 specs | 17,902 → 550 | 1,057 KiB → 21 KiB | 1,377 µs → 40 µs |
| 200 specs | 65,612 → 1,460 | 3,930 KiB → 64 KiB | 6,733 µs → 88 µs |
| 400 specs | 129,217 → 2,665 | 7,766 KiB → 121 KiB | 12,888 µs → 168 µs |
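Allocations drop by about 97–98% and bytes by about 98%. The document-sized conversion is gone entirely: what remains grows only with the size of the query results (about 6 extra allocations per added spec, versus about 318 upstream) and comes from building each query's result strings. CPU time per pass improves by about 34–77×.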
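The summary figures can be rechecked directly from the table. This small helper only encodes the three measured rows; the per-spec growth is a two-point difference, so treat it as a rough estimate of how the remaining allocations scale rather than a fitted model.

```rust
// One row of the measurement table: each pair is (upstream, search_cached).
pub struct Row {
    pub specs: u32,
    pub allocs: (f64, f64),
    pub kib: (f64, f64),
    pub cpu_us: (f64, f64),
}

pub const ROWS: [Row; 3] = [
    Row { specs: 50, allocs: (17_902.0, 550.0), kib: (1_057.0, 21.0), cpu_us: (1_377.0, 40.0) },
    Row { specs: 200, allocs: (65_612.0, 1_460.0), kib: (3_930.0, 64.0), cpu_us: (6_733.0, 88.0) },
    Row { specs: 400, allocs: (129_217.0, 2_665.0), kib: (7_766.0, 121.0), cpu_us: (12_888.0, 168.0) },
];

/// Percentage reduction going from `before` to `after`.
pub fn reduction_pct((before, after): (f64, f64)) -> f64 {
    (1.0 - after / before) * 100.0
}

/// Speedup factor: how many times less time `after` takes.
pub fn speedup((before, after): (f64, f64)) -> f64 {
    before / after
}

/// Marginal allocations per added spec between two rows.
pub fn allocs_per_spec(a: &Row, b: &Row, cached: bool) -> f64 {
    let pick = |r: &Row| if cached { r.allocs.1 } else { r.allocs.0 };
    (pick(b) - pick(a)) / f64::from(b.specs - a.specs)
}
```

For example, the 50-spec row gives a 96.9% allocation reduction and a 34.4× CPU speedup, and the 400-spec row gives 97.9% and 76.7×.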
Relationship to upstream
- Library-only. Upstream's `build.rs` (which generated the compliance tests and benches from the JMESPath compliance suite) and the `tests/` and `benches/` directories are removed. The library source is otherwise upstream 0.3.0.
- Additive. `search_cached` is the only API change; `search`, `compile`, `Variable`, `Runtime`, `Context`, and everything else behave exactly as in jmespath 0.3.0.
- Drop-in. Depend on it under the `jmespath` name in Cargo.toml (`jmespath = { package = "chadmespath", version = "0.3" }`) and existing jmespath 0.3 code compiles unchanged; reach for `search_cached` wherever you have a cached `Rcvar`.
License
MIT, preserving the original jmespath.rs copyright. See LICENSE.