#![allow(rustdoc::private_intra_doc_links)]
/*!
This crate provides a query language for JSON data that can be used to search
for matching **regular** paths in the JSON tree, using a variant of
[regular expressions].

[regular expressions]: https://en.wikipedia.org/wiki/Regular_expression

# Overview

The matching engine is implemented as a [deterministic finite automaton (DFA)].
The DFA is constructed from the query's abstract syntax tree (AST), a tree
structure that represents the parsed query, and is then run against the input
JSON data to find matches.

[deterministic finite automaton (DFA)]: https://en.wikipedia.org/wiki/Deterministic_finite_automaton
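
As an illustration, the following is a minimal, hand-built DFA for the query
`address.*`, simulated one edge at a time along a root-to-node path. The `Edge`
type, the numbered states, and the `step` and `accepts` functions are
hypothetical and are not part of this crate's API; the crate builds its
automaton from the query AST rather than writing it by hand.

```rust
/// One edge of the JSON tree: an object field name or an array index.
enum Edge {
    Field(String),
    Index(usize),
}

// Hypothetical hand-built DFA for `address.*`.
// States: 0 = start, 1 = after `address`, 2 = accepting, 3 = dead.
const ACCEPTING: usize = 2;
const DEAD: usize = 3;

/// Transition function: exactly one next state per (state, edge) pair.
fn step(state: usize, edge: &Edge) -> usize {
    match (state, edge) {
        (0, Edge::Field(name)) if name == "address" => 1,
        (1, Edge::Field(_)) => ACCEPTING, // `*` matches any field name
        _ => DEAD,                        // every other move is a dead end
    }
}

/// Runs the DFA over a root-to-node path.
fn accepts(path: &[Edge]) -> bool {
    let mut state = 0;
    for edge in path {
        state = step(state, edge);
        if state == DEAD {
            return false; // no continuation of this path can match
        }
    }
    state == ACCEPTING
}

fn main() {
    let street = [Edge::Field("address".into()), Edge::Field("street".into())];
    let indexed = [Edge::Field("address".into()), Edge::Index(0)];
    assert!(accepts(&street));
    assert!(!accepts(&indexed)); // `*` matches fields, not array indices
    assert!(!accepts(&[Edge::Field("address".into())]));
}
```

In this sketch, reaching the dead state ends the simulation early; during a tree
traversal, the same check would allow a whole subtree to be skipped.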

A JSON document is represented as a tree in which each node is a JSON value
(string, number, boolean, null, object, or array) and each edge is labeled with
either an object field name or an array index. For example, consider the
following JSON data:

```json
{
    "name": "John Doe",
    "age": 30,
    "foo": [1, 2, 3]
}
```

The corresponding tree has a root node with three outgoing edges: `"name"`,
`"age"`, and `"foo"`. The `"name"` edge points to the string `"John Doe"`, and
the `"age"` edge points to the number `30`. The `"foo"` edge points to an array
node with three index edges, `[0]`, `[1]`, and `[2]`, which point to the
numbers `1`, `2`, and `3`, respectively.
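
The following sketch builds the example document as a minimal, hypothetical
`Json` enum (not the representation this crate uses) and enumerates every
root-to-node path by depth-first traversal, rendering each path in the dotted
notation used by the example queries.

```rust
/// Minimal JSON value; object fields keep their insertion order.
enum Json {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// Pushes every root-to-node path below `node` onto `out`, depth first.
fn paths(node: &Json, prefix: &str, out: &mut Vec<String>) {
    // Outgoing edges as (label, child); scalar values have none.
    let children: Vec<(String, &Json)> = match node {
        Json::Object(fields) => fields.iter().map(|(k, v)| (k.clone(), v)).collect(),
        Json::Array(items) => items
            .iter()
            .enumerate()
            .map(|(i, v)| (format!("[{i}]"), v))
            .collect(),
        _ => Vec::new(),
    };
    for (label, child) in children {
        let path = if prefix.is_empty() { label } else { format!("{prefix}.{label}") };
        out.push(path.clone());
        paths(child, &path, out);
    }
}

/// The example document from the text.
fn example() -> Json {
    Json::Object(vec![
        ("name".into(), Json::String("John Doe".into())),
        ("age".into(), Json::Number(30.0)),
        (
            "foo".into(),
            Json::Array(vec![Json::Number(1.0), Json::Number(2.0), Json::Number(3.0)]),
        ),
    ])
}

fn main() {
    let mut out = Vec::new();
    paths(&example(), "", &mut out);
    assert_eq!(out, ["name", "age", "foo", "foo.[0]", "foo.[1]", "foo.[2]"]);
}
```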

To query a JSON document, both the query and the document are first parsed into
intermediate ASTs. The query AST is compiled into a non-deterministic finite
automaton (NFA), which is then determinized into a deterministic finite
automaton (DFA) that can be simulated directly against the input JSON document.

For more details on the automaton constructions, see the [`nfa`] and [`dfa`]
modules of the [`query`] module.
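
A compact sketch of these two steps follows, assuming a toy alphabet of literal
field names: a Thompson-style NFA for `(name|age)` with epsilon moves,
determinized by the classic subset construction, in which each DFA state
stands for a set of NFA states. The `Nfa` struct and the `closure`,
`determinize`, and `run` functions are hypothetical names. A real construction
must also cope with wildcards and ranges, which match open-ended sets of edge
labels; this sketch omits them.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// DFA transition table: one row per DFA state, label -> next state.
type Table = Vec<BTreeMap<&'static str, usize>>;

/// `edges[s]` lists `(label, target)`; `None` is an epsilon move.
struct Nfa {
    edges: Vec<Vec<(Option<&'static str>, usize)>>,
    start: usize,
    accept: usize,
}

/// All NFA states reachable from `states` through epsilon moves alone.
fn closure(nfa: &Nfa, states: BTreeSet<usize>) -> BTreeSet<usize> {
    let mut seen = states.clone();
    let mut stack: Vec<usize> = states.into_iter().collect();
    while let Some(s) = stack.pop() {
        for &(label, t) in &nfa.edges[s] {
            if label.is_none() && seen.insert(t) {
                stack.push(t);
            }
        }
    }
    seen
}

/// Subset construction; DFA state 0 is the start state.
fn determinize(nfa: &Nfa, alphabet: &[&'static str]) -> (Table, Vec<bool>) {
    let start = closure(nfa, BTreeSet::from([nfa.start]));
    let mut ids = BTreeMap::from([(start.clone(), 0)]);
    let mut sets = vec![start];
    let mut table = Table::new();
    let mut i = 0;
    while i < sets.len() {
        let mut row = BTreeMap::new();
        for &sym in alphabet {
            // Follow `sym` from every NFA state in the set, then close.
            let moved: BTreeSet<usize> = sets[i]
                .iter()
                .flat_map(|&s| nfa.edges[s].iter())
                .filter(|(label, _)| *label == Some(sym))
                .map(|&(_, t)| t)
                .collect();
            if moved.is_empty() {
                continue; // no entry: implicit dead state
            }
            let target = closure(nfa, moved);
            let id = *ids.entry(target.clone()).or_insert_with(|| {
                sets.push(target);
                sets.len() - 1
            });
            row.insert(sym, id);
        }
        table.push(row);
        i += 1;
    }
    let accepting: Vec<bool> = sets.iter().map(|set| set.contains(&nfa.accept)).collect();
    (table, accepting)
}

/// Simulates the DFA over a sequence of edge labels.
fn run(table: &[BTreeMap<&'static str, usize>], accepting: &[bool], input: &[&'static str]) -> bool {
    let mut state = 0;
    for sym in input {
        match table[state].get(sym) {
            Some(&next) => state = next,
            None => return false,
        }
    }
    accepting[state]
}

fn main() {
    // Thompson-style NFA for `(name|age)`: 0 -e-> 1 -name-> 2 -e-> 5 and
    // 0 -e-> 3 -age-> 4 -e-> 5, where 5 is the accepting state.
    let nfa = Nfa {
        edges: vec![
            vec![(None, 1), (None, 3)],
            vec![(Some("name"), 2)],
            vec![(None, 5)],
            vec![(Some("age"), 4)],
            vec![(None, 5)],
            vec![],
        ],
        start: 0,
        accept: 5,
    };
    let (table, accepting) = determinize(&nfa, &["name", "age", "foo"]);
    assert_eq!(table.len(), 3); // DFA states {0,1,3}, {2,5}, {4,5}
    assert!(run(&table, &accepting, &["name"]));
    assert!(run(&table, &accepting, &["age"]));
    assert!(!run(&table, &accepting, &["foo"]));
    assert!(!run(&table, &accepting, &["name", "age"]));
}
```

A missing entry in a row acts as an implicit dead state, so the simulation
rejects as soon as no transition applies.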
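
# Query Language

The query language is based on regular expression syntax, extended for JSON:
queries combine field names, the field wildcard `*`, the array-index wildcard
`[*]`, and index ranges such as `[2:4]`, using `.` for sequencing, `|` for
alternation, parentheses for grouping, and a postfix `*` for repetition.

## Grammar

The grammar for the query language is defined in the `query.pest` file in the
`grammar` directory.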

# Examples

Here are some example queries and their meanings:

- `name`: Matches the `name` field in the root object (e.g., `"John Doe"` in the document above).
- `address.street`: Matches the `street` field inside the `address` object.
- `address.*`: Matches any field in the `address` object (e.g., `street` or `city`).
- `address.[*]`: Matches every element of `address` when `address` is an array.
- `(name|age)`: Matches either the `name` or the `age` field in the root object.
- `address.([*] | *)*`: Matches any value at any depth under `address`.
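
The example queries above can be modeled as a small AST. The sketch below uses
a hypothetical `Query` enum and matches a query against a single root-to-node
path by tracking the set of path positions the query could have reached, which
amounts to simulating the automaton without determinizing it. It is an
illustration, not the crate's DFA-based engine. Following standard
regular-expression semantics, the star here also allows zero repetitions, so
for `address.([*] | *)*` this sketch accepts the path to `address` itself.

```rust
use std::collections::BTreeSet;

/// One edge of the JSON tree: an object field name or an array index.
enum Edge {
    Field(&'static str),
    Index(usize),
}

/// Hypothetical query AST covering the example queries.
enum Query {
    Field(&'static str), // `name`
    AnyField,            // `*`
    AnyIndex,            // `[*]`
    Seq(Vec<Query>),     // `a.b`
    Alt(Vec<Query>),     // `a | b`
    Star(Box<Query>),    // `(q)*`
}

/// Every position in `path` where `q` can finish if it starts at `at`.
fn ends(q: &Query, path: &[Edge], at: usize) -> BTreeSet<usize> {
    let mut out = BTreeSet::new();
    match q {
        Query::Field(name) => {
            if matches!(path.get(at), Some(Edge::Field(f)) if f == name) {
                out.insert(at + 1);
            }
        }
        Query::AnyField => {
            if matches!(path.get(at), Some(Edge::Field(_))) {
                out.insert(at + 1);
            }
        }
        Query::AnyIndex => {
            if matches!(path.get(at), Some(Edge::Index(_))) {
                out.insert(at + 1);
            }
        }
        Query::Seq(parts) => {
            out.insert(at);
            for part in parts {
                out = out.iter().flat_map(|&p| ends(part, path, p)).collect();
            }
        }
        Query::Alt(options) => {
            for option in options {
                out.extend(ends(option, path, at));
            }
        }
        Query::Star(inner) => {
            // Zero or more repetitions: grow the set until it stops changing.
            out.insert(at);
            let mut todo = vec![at];
            while let Some(p) = todo.pop() {
                for e in ends(inner, path, p) {
                    if out.insert(e) {
                        todo.push(e);
                    }
                }
            }
        }
    }
    out
}

/// A path matches when the query can consume all of it.
fn matches_path(q: &Query, path: &[Edge]) -> bool {
    ends(q, path, 0).contains(&path.len())
}

fn main() {
    // `address.([*] | *)*`
    let deep = Query::Seq(vec![
        Query::Field("address"),
        Query::Star(Box::new(Query::Alt(vec![Query::AnyIndex, Query::AnyField]))),
    ]);
    let lines = [Edge::Field("address"), Edge::Field("lines"), Edge::Index(0)];
    assert!(matches_path(&deep, &lines));
    assert!(matches_path(&deep, &[Edge::Field("address")])); // zero repetitions
    assert!(!matches_path(&deep, &[Edge::Field("name")]));

    // `(name|age)`
    let either = Query::Alt(vec![Query::Field("name"), Query::Field("age")]);
    assert!(matches_path(&either, &[Edge::Field("age")]));
    assert!(!matches_path(&either, &[Edge::Field("age"), Edge::Index(0)]));
}
```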

We can also use ranges to match specific indices in arrays; the start index is
inclusive and the end index is exclusive:

- `foo.[2:4]`: Matches the elements at indices 2 and 3 of the `foo` array.
- `foo.[2:]`: Matches all elements of the `foo` array from index 2 onward.
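
The range examples imply a half-open interval: `[2:4]` covers indices 2 and 3,
and an omitted end means "through the end of the array". The hypothetical
`in_range` and `selected` helpers below encode that reading. In this sketch, an
end bound past the array's length simply selects fewer elements.

```rust
/// Is array index `i` inside the range `[start:end]`?
/// `end` is exclusive; `None` models an open end, as in `[2:]`.
fn in_range(i: usize, start: usize, end: Option<usize>) -> bool {
    i >= start && end.map_or(true, |e| i < e)
}

/// Indices of a `len`-element array selected by a range step.
fn selected(len: usize, start: usize, end: Option<usize>) -> Vec<usize> {
    (0..len).filter(|&i| in_range(i, start, end)).collect()
}

fn main() {
    // A six-element array.
    assert_eq!(selected(6, 2, Some(4)), vec![2, 3]); // `[2:4]`
    assert_eq!(selected(6, 2, None), vec![2, 3, 4, 5]); // `[2:]`
    // The three-element `foo` from the overview: `[2:4]` selects only index 2.
    assert_eq!(selected(3, 2, Some(4)), vec![2]);
}
```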

Finally, we can use wildcards to match any field or index:

- `*`: Matches any single field of the root object.
- `[*]`: Matches any single element of the root array.
- `[*].*`: Matches any field inside each element of the root array.
- `([*] | *)*`: Matches any sequence of fields and indices, i.e., any node at any depth of the JSON tree.
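
Wildcard steps branch on every edge of the matching kind: `*` follows every
field of an object, and `[*]` follows every element of an array. The sketch
below uses a hypothetical `Json` enum (with only the variants it needs) and a
hypothetical `Step` enum to evaluate a sequence of wildcard steps by walking the
tree directly, collecting the nodes reached; it is shown here for `[*].*`.

```rust
/// Minimal JSON value; object fields keep their insertion order.
enum Json {
    Number(i64),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// One wildcard step: `*` (any field) or `[*]` (any array index).
enum Step {
    AnyField,
    AnyIndex,
}

/// Follows `steps` from `node`, branching on every edge a wildcard
/// accepts, and returns the nodes reached after the last step.
fn select<'a>(node: &'a Json, steps: &[Step]) -> Vec<&'a Json> {
    let (first, rest) = match steps.split_first() {
        Some(pair) => pair,
        None => return vec![node], // no steps left: this node is a match
    };
    let children: Vec<&'a Json> = match (first, node) {
        (Step::AnyField, Json::Object(fields)) => fields.iter().map(|(_, v)| v).collect(),
        (Step::AnyIndex, Json::Array(items)) => items.iter().collect(),
        _ => Vec::new(), // the wildcard does not apply to this node kind
    };
    children.into_iter().flat_map(|child| select(child, rest)).collect()
}

fn main() {
    // [{"a": 1, "b": 2}, {"c": 3}]
    let doc = Json::Array(vec![
        Json::Object(vec![("a".into(), Json::Number(1)), ("b".into(), Json::Number(2))]),
        Json::Object(vec![("c".into(), Json::Number(3))]),
    ]);
    // `[*].*`: every field value inside every array element.
    let values: Vec<i64> = select(&doc, &[Step::AnyIndex, Step::AnyField])
        .into_iter()
        .filter_map(|v| match v {
            Json::Number(n) => Some(*n),
            _ => None,
        })
        .collect();
    assert_eq!(values, vec![1, 2, 3]);
}
```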

[`nfa`]: crate::query::nfa
[`dfa`]: crate::query::dfa
[`query`]: crate::query
*/
pub mod commands;
pub mod query;
pub mod utils;