1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
/*!
A query language for JSON data that searches for matching **regular** paths in
the JSON tree, using a derivation of [regular expressions].
[regular expressions]: https://en.wikipedia.org/wiki/Regular_expression
# Quick Start
```
let input = r#"{"users": [{"name": "Alice"}, {"name": "Bob"}]}"#;
let json: jsongrep::Value = serde_json::from_str(input).unwrap();
let results = jsongrep::grep(&json, "users[*].name").unwrap();
assert_eq!(results.len(), 2);
assert_eq!(results[0].value.to_string(), r#""Alice""#);
assert_eq!(results[1].value.to_string(), r#""Bob""#);
```
For more examples, see the [`examples`](https://github.com/micahkepe/jsongrep/tree/main/examples)
directory in the repository.
# Overview
The engine is implemented as a [deterministic finite automaton (DFA)]. The DFA
is constructed from a query AST, which is a tree-like structure that represents
the query. The DFA is then used to search for matches in the input JSON data.
[deterministic finite automaton (DFA)]: https://en.wikipedia.org/wiki/Deterministic_finite_automaton
A JSON data structure is represented as a tree, where each node is a JSON value
(string, number, boolean, null, or object/array) and each edge is either a field
name or an index. For example, let's consider the following JSON data:
```json
{
"name": "John Doe",
"age": 30,
"foo": [1, 2, 3]
}
```
The corresponding tree structure would be the root node, with three edges:
`"name"`, `"age"`, and `"foo"`. The `"name"` edge would point to the string
`"John Doe"` and the `"age"` edge would point to the number `30`. The `"foo"`
edge would point to a node with three edges of the array access `[0]`, `[1]`,
and `[2]`, which point to the numbers `1`, `2`, and `3`, respectively.
To query the JSON document, the query and document are both parsed into intermediary
ASTs. The query AST is then used to construct first a non-deterministic finite
automaton (NFA) which is then determinized into a deterministic finite automaton
(DFA) that can be directly simulated against the input JSON document.
For more details on the automaton constructions, see the [`dfa`] and
[`nfa`] modules of the [`query`] module.
# Query Language
The query language relies on regular expression syntax, with some modifications
to support JSON.
| Operator | Example | Description |
| ------------ | -------------------- | ------------------------------------------------------------- |
| Sequence | `foo.bar.baz` | **Concatenation**: match path `foo` → `bar` → `baz` |
| Disjunction | `foo \| bar` | **Union**: match either `foo` or `bar` |
| Kleene star | `**` | Match zero or more field accesses |
| Repetition | `foo*` | Repeat the preceding step zero or more times |
| Wildcards | `*` or `[*]` | Match any single field or array index |
| Optional | `foo?.bar` | Optional `foo` field access |
| Field access | `foo` or `"foo bar"` | Match a specific field (quote if spaces) |
| Array index | `[0]` or `[1:3]` | Match specific index or slice (exclusive end) |
These queries can be arbitrarily nested with parentheses. For example,
`foo.(bar|baz).qux` matches `foo.bar.qux` or `foo.baz.qux`.
This also means that you can recursively descend **any** path with `(* | [*])*`,
e.g., `(* | [*])*.foo` to find all paths matching `foo` field at any
depth.
Here are some example queries and their meanings:
- `name`: Matches the `name` field in the root object (e.g., ```"John Doe"```).
- `address.street`: Matches the `street` field inside the `address` object.
- `address.*`: Matches any field in the `address` object (e.g., `street`, `city`, etc.).
- `address.[*]`: Matches all elements in an array if `address` were an array.
- `(name|age)`: Matches either the `name` or `age` field in the root object.
- `address.([*] | *)*`: Matches any value at any depth under `address`.
We can also use ranges to match specific indices in arrays:
- `foo.[2:4]`: Matches elements at indices 2 and 3 in the `foo` array.
- `foo.[2:]`: Matches all elements in the `foo` array from index 2 onward.
Finally, we can use wildcards to match any field or index:
- `*`: Matches any single field in the root object.
- `[*]`: Matches any single array index in the root array.
- `[*].*`: Matches any field inside each element of an array.
- `([*] | *)*`: Matches any field or index at any level of the JSON tree.
## Playground
You can try queries interactively in the [playground](https://micahkepe.com/jsongrep/playground/).
[`nfa`]: crate::query::nfa
[`dfa`]: crate::query::dfa
[`query`]: crate::query
*/
/// Re-export [`serde_json_borrow::Value`] so downstream users don't need to
/// depend on `serde_json_borrow` directly.
pub use Value;
/// Query a JSON document with a query string, returning all matches.
///
/// This is the simplest entry point for the library. For repeated queries
/// against different documents, prefer compiling the query once with
/// [`query::QueryDFA::from_query_str`] and calling [`query::QueryDFA::find`].
///
/// # Errors
///
/// Returns an error if the query string is invalid.
///
/// # Examples
///
/// ```
/// let json: jsongrep::Value = serde_json::from_str(r#"{"a": 1, "b": 2}"#).unwrap();
/// let results = jsongrep::grep(&json, "a").unwrap();
/// assert_eq!(results.len(), 1);
/// assert_eq!(results[0].value.to_string(), "1");
/// ```