1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
/*!
This crate provides a query language for JSON data that can be used to search
for matching **regular** paths in the JSON tree, using a derivation of [regular
expressions].
[regular expressions]: https://en.wikipedia.org/wiki/Regular_expression
# Overview
The engine is implemented as a [deterministic finite automaton (DFA)]. The DFA
is constructed from a query AST, which is a tree-like structure that represents
the query. The DFA is then used to search for matches in the input JSON data.
[deterministic finite automaton (DFA)]: https://en.wikipedia.org/wiki/Deterministic_finite_automaton
A JSON data structure is represented as a tree, where each node is a JSON value
(string, number, boolean, null, or object/array) and each edge is either a field
name or an index. For example, let's consider the following JSON data:
```json
{
"name": "John Doe",
"age": 30,
"foo": [1, 2, 3]
}
```
The corresponding tree structure would be the root node, with three edges:
`"name"`, `"age"`, and `"foo"`. The `"name"` edge would point to the string
`"John Doe"` and the `"age"` edge would point to the number `30`. The `"foo"`
edge would point to a node with three edges of the array access `[0]`, `[1]`,
and `[2]`, which point to the numbers `1`, `2`, and `3`, respectively.
To query the JSON document, the query and document are both parsed into intermediary
ASTs. The query AST is then used to construct first a non-deterministic finite
automaton (NFA) which is then determinized into a deterministic finite automaton
(DFA) that can be directly simulated against the input JSON document.
For more details on the automaton constructions, see the [`dfa`] and
[`nfa`] modules of the [`query`] module.
# Query Language
The query language relies on regular expression syntax, with some modifications
to support JSON.
## Grammar
The grammar for the query language is defined in the `query.pest` file in the
`grammar` directory.
# Examples
Here are some example queries and their meanings:
- `name`: Matches the `name` field in the root object (e.g., ```"John Doe"```).
- `address.street`: Matches the `street` field inside the `address` object.
- `address.*`: Matches any field in the `address` object (e.g., `street`, `city`, etc.).
- `address.[*]`: Matches all elements in an array if `address` were an array.
- `(name|age)`: Matches either the `name` or `age` field in the root object.
- `address.([*] | *)*`: Matches any value at any depth under `address`.
We can also use ranges to match specific indices in arrays:
- `foo.[2:4]`: Matches elements at indices 2 and 3 in the `foo` array.
- `foo.[2:]`: Matches all elements in the `foo` array from index 2 onward.
Finally, we can use wildcards to match any field or index:
- `*`: Matches any single field in the root object.
- `[*]`: Matches any single array index in the root array.
- `[*].*`: Matches any field inside each element of an array.
- `([*] | *)*`: Matches any field or index at any level of the JSON tree.
[`nfa`]: crate::query::nfa
[`dfa`]: crate::query::dfa
[`query`]: crate::query
*/