This crate provides a query language for JSON data. A query describes a set of regular paths in the JSON tree, using a syntax derived from regular expressions, and the engine searches the document for the paths that match.
§Overview
The engine is implemented as a deterministic finite automaton (DFA). The DFA is constructed from a query AST, which is a tree-like structure that represents the query. The DFA is then used to search for matches in the input JSON data.
A JSON data structure is represented as a tree, where each node is a JSON value (string, number, boolean, null, or object/array) and each edge is either a field name or an index. For example, let’s consider the following JSON data:
{
  "name": "John Doe",
  "age": 30,
  "foo": [1, 2, 3]
}

The corresponding tree structure has a root node with three outgoing edges: "name", "age", and "foo". The "name" edge points to the string "John Doe" and the "age" edge points to the number 30. The "foo" edge points to a node with three array-access edges, [0], [1], and [2], which point to the numbers 1, 2, and 3, respectively.
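To make the tree view concrete, the following sketch enumerates the edge path from the root to every node of the example document. The `Json` and `Edge` types and the `collect_paths`, `render`, and `example_paths` functions are hypothetical names introduced for illustration only; they are not taken from this crate's API.

```rust
// Hypothetical tree model for illustration; not the crate's actual types.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>), // key/value pairs in document order
}

// An edge is either a field name or an array index.
#[derive(Debug, Clone, PartialEq)]
enum Edge {
    Field(String),
    Index(usize),
}

/// Records the path to every non-root node, depth first.
fn collect_paths(node: &Json, prefix: &mut Vec<Edge>, out: &mut Vec<Vec<Edge>>) {
    match node {
        Json::Object(fields) => {
            for (key, child) in fields {
                prefix.push(Edge::Field(key.clone()));
                out.push(prefix.clone());
                collect_paths(child, prefix, out);
                prefix.pop();
            }
        }
        Json::Array(items) => {
            for (i, child) in items.iter().enumerate() {
                prefix.push(Edge::Index(i));
                out.push(prefix.clone());
                collect_paths(child, prefix, out);
                prefix.pop();
            }
        }
        // Scalars are leaves: no outgoing edges.
        _ => {}
    }
}

/// Renders a path with `.` separators and `[i]` for array indices.
fn render(path: &[Edge]) -> String {
    path.iter()
        .map(|edge| match edge {
            Edge::Field(name) => name.clone(),
            Edge::Index(i) => format!("[{i}]"),
        })
        .collect::<Vec<_>>()
        .join(".")
}

/// Builds the example document and lists all of its paths.
fn example_paths() -> Vec<String> {
    let doc = Json::Object(vec![
        ("name".to_string(), Json::String("John Doe".to_string())),
        ("age".to_string(), Json::Number(30.0)),
        (
            "foo".to_string(),
            Json::Array(vec![Json::Number(1.0), Json::Number(2.0), Json::Number(3.0)]),
        ),
    ]);
    let mut out = Vec::new();
    collect_paths(&doc, &mut Vec::new(), &mut out);
    out.iter().map(|path| render(path)).collect()
}
```

Calling `example_paths()` yields six paths: the three root edges plus the three index edges under "foo".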
To query the JSON document, the query and the document are both parsed into intermediary ASTs. The query AST is first compiled into a non-deterministic finite automaton (NFA), which is then determinized into a DFA that can be simulated directly against the input JSON document.
For more details on the automaton constructions, see the dfa and
nfa modules of the query module.
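As a rough illustration of the determinization step, here is a textbook subset construction over a tiny NFA. It is a sketch under simplifying assumptions: symbols are plain `char`s rather than field names or indices, there are no epsilon transitions, and the `Nfa`, `Dfa`, `determinize`, `accepts`, and `ends_with_ab_dfa` names are hypothetical. The crate's own dfa and nfa modules may be organized quite differently.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

type State = usize;
type StateSet = BTreeSet<State>;

// Hypothetical NFA without epsilon transitions.
struct Nfa {
    start: State,
    accepting: BTreeSet<State>,
    delta: BTreeMap<(State, char), StateSet>, // (state, symbol) -> next states
}

// Hypothetical DFA; each state stands for a set of NFA states.
struct Dfa {
    start: usize,
    accepting: BTreeSet<usize>,
    delta: BTreeMap<(usize, char), usize>,
}

/// Subset construction: explores reachable sets of NFA states breadth first.
fn determinize(nfa: &Nfa) -> Dfa {
    let alphabet: BTreeSet<char> = nfa.delta.keys().map(|&(_, c)| c).collect();
    let start_set: StateSet = BTreeSet::from([nfa.start]);
    let mut ids: BTreeMap<StateSet, usize> = BTreeMap::new();
    ids.insert(start_set.clone(), 0);
    let mut queue = VecDeque::from([start_set]);
    let mut delta = BTreeMap::new();
    let mut accepting = BTreeSet::new();

    while let Some(set) = queue.pop_front() {
        let id = ids[&set];
        // A DFA state accepts if any NFA state inside it accepts.
        if set.iter().any(|s| nfa.accepting.contains(s)) {
            accepting.insert(id);
        }
        for &sym in &alphabet {
            // Union of all NFA moves on `sym` from the current set.
            let mut target = StateSet::new();
            for &s in &set {
                if let Some(next) = nfa.delta.get(&(s, sym)) {
                    target.extend(next.iter().copied());
                }
            }
            if target.is_empty() {
                continue; // no transition; the DFA rejects on this symbol
            }
            let existing = ids.get(&target).copied();
            let next_id = match existing {
                Some(known) => known,
                None => {
                    let fresh = ids.len();
                    ids.insert(target.clone(), fresh);
                    queue.push_back(target);
                    fresh
                }
            };
            delta.insert((id, sym), next_id);
        }
    }
    Dfa { start: 0, accepting, delta }
}

/// Simulates the DFA: one table lookup per input symbol.
fn accepts(dfa: &Dfa, input: &str) -> bool {
    let mut state = dfa.start;
    for c in input.chars() {
        match dfa.delta.get(&(state, c)) {
            Some(&next) => state = next,
            None => return false,
        }
    }
    dfa.accepting.contains(&state)
}

/// Example: an NFA for strings over {a, b} that end in "ab".
fn ends_with_ab_dfa() -> Dfa {
    let mut delta = BTreeMap::new();
    delta.insert((0, 'a'), BTreeSet::from([0, 1]));
    delta.insert((0, 'b'), BTreeSet::from([0]));
    delta.insert((1, 'b'), BTreeSet::from([2]));
    let nfa = Nfa { start: 0, accepting: BTreeSet::from([2]), delta };
    determinize(&nfa)
}
```

The benefit of determinizing up front is that simulation needs no backtracking: each step of `accepts` follows exactly one transition.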
§Query Language
The query language relies on regular expression syntax, with some modifications to support JSON.
§Grammar
The grammar for the query language is defined in the query.pest file in the
grammar directory.
§Examples
Here are some example queries and their meanings:
- name: Matches the name field in the root object (e.g., "John Doe").
- address.street: Matches the street field inside the address object.
- address.*: Matches any field in the address object (e.g., street, city, etc.).
- address.[*]: Matches all elements in an array if address were an array.
- (name|age): Matches either the name or age field in the root object.
- address.([*] | *)*: Matches any value at any depth under address.
We can also use ranges to match specific indices in arrays:
- foo.[2:4]: Matches elements at indices 2 and 3 in the foo array.
- foo.[2:]: Matches all elements in the foo array from index 2 onward.
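The two range examples suggest half-open semantics: the start index is included and the end index is excluded. A minimal sketch of that reading follows; the `IndexRange` type and `matching_indices` function are hypothetical and only model the behaviour described above.

```rust
/// Hypothetical model of a range such as `[2:4]` (end given) or `[2:]` (open end).
struct IndexRange {
    start: usize,
    end: Option<usize>, // exclusive upper bound; `None` means "to the end"
}

impl IndexRange {
    /// True if `index` falls in the half-open interval [start, end).
    fn matches(&self, index: usize) -> bool {
        index >= self.start && self.end.map_or(true, |end| index < end)
    }
}

/// Lists the indices of an array of length `len` selected by `range`.
fn matching_indices(range: &IndexRange, len: usize) -> Vec<usize> {
    (0..len).filter(|&i| range.matches(i)).collect()
}
```

Under this assumption, `[2:4]` applied to a six-element array selects indices 2 and 3, and `[2:]` applied to a five-element array selects indices 2, 3, and 4.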
Finally, we can use wildcards to match any field or index:
- *: Matches any single field in the root object.
- [*]: Matches any single array index in the root array.
- [*].*: Matches any field inside each element of an array.
- ([*] | *)*: Matches any field or index at any level of the JSON tree.
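To show how such queries can be read as regular expressions over edges, the sketch below matches a small query AST against a single path. It tracks sets of positions rather than building an automaton, so it is not the crate's DFA engine, and the `Query`, `Edge`, `ends`, `path_matches`, and `demo_matches` names are hypothetical.

```rust
use std::collections::BTreeSet;

// Hypothetical edge and query types for illustration only.
#[allow(dead_code)]
#[derive(Debug)]
enum Edge {
    Field(String),
    Index(usize),
}

#[allow(dead_code)]
enum Query {
    Field(String),     // e.g. `address`
    AnyField,          // `*`
    AnyIndex,          // `[*]`
    Seq(Vec<Query>),   // `a.b`
    Alt(Vec<Query>),   // `(a | b)`
    Star(Box<Query>),  // `(a)*`
}

/// Returns every position at which `q` can finish after starting at `start`.
fn ends(q: &Query, path: &[Edge], start: usize) -> BTreeSet<usize> {
    let mut out = BTreeSet::new();
    match q {
        Query::Field(name) => {
            if let Some(Edge::Field(f)) = path.get(start) {
                if f == name {
                    out.insert(start + 1);
                }
            }
        }
        Query::AnyField => {
            if let Some(Edge::Field(_)) = path.get(start) {
                out.insert(start + 1);
            }
        }
        Query::AnyIndex => {
            if let Some(Edge::Index(_)) = path.get(start) {
                out.insert(start + 1);
            }
        }
        Query::Seq(parts) => {
            // Thread the set of reachable positions through each part in turn.
            let mut current = BTreeSet::from([start]);
            for part in parts {
                current = current.iter().flat_map(|&p| ends(part, path, p)).collect();
            }
            out = current;
        }
        Query::Alt(options) => {
            for option in options {
                out.extend(ends(option, path, start));
            }
        }
        Query::Star(inner) => {
            // Zero repetitions always match; then expand until no new positions appear.
            out.insert(start);
            let mut frontier = vec![start];
            while let Some(p) = frontier.pop() {
                for next in ends(inner, path, p) {
                    if out.insert(next) {
                        frontier.push(next);
                    }
                }
            }
        }
    }
    out
}

/// A path matches when the query can consume all of its edges.
fn path_matches(q: &Query, path: &[Edge]) -> bool {
    ends(q, path, 0).contains(&path.len())
}

/// Evaluates `address.([*] | *)*` against three sample paths.
fn demo_matches() -> Vec<bool> {
    let query = Query::Seq(vec![
        Query::Field("address".to_string()),
        Query::Star(Box::new(Query::Alt(vec![Query::AnyIndex, Query::AnyField]))),
    ]);
    let field = |s: &str| Edge::Field(s.to_string());
    let paths = vec![
        vec![field("address"), field("street")],
        vec![field("address"), field("phones"), Edge::Index(0)],
        vec![field("name")],
    ];
    paths.iter().map(|p| path_matches(&query, p)).collect()
}
```

In this sketch the first two paths match and the third does not. Note that, with standard Kleene-star semantics, the star also accepts zero repetitions, so the bare path address would match too; whether the crate reports that case is not stated here.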
Modules§
- commands: Available subcommands for the jsongrep binary.
- query: The main query engine implementation, the parser for the query language, and the intermediary AST representations of queries.
- utils: Miscellaneous utility functions.
Macros§
- field: Constructs a query that matches a specific field name.