Skip to main content

Crate jsongrep

Crate jsongrep 

Source
Expand description

This crate provides a query language for JSON data that can be used to search for matching regular paths in the JSON tree, using a derivation of regular expressions.

§Overview

The engine is implemented as a deterministic finite automaton (DFA). The DFA is constructed from a query AST, which is a tree-like structure that represents the query. The DFA is then used to search for matches in the input JSON data.

A JSON data structure is represented as a tree, where each node is a JSON value (string, number, boolean, null, or object/array) and each edge is either a field name or an index. For example, let’s consider the following JSON data:

{
    "name": "John Doe",
    "age": 30,
    "foo": [1, 2, 3]
}

The corresponding tree structure would be the root node, with three edges: "name", "age", and "foo". The "name" edge would point to the string "John Doe" and the "age" edge would point to the number 30. The "foo" edge would point to a node with three edges of the array access [0], [1], and [2], which point to the numbers 1, 2, and 3, respectively.

To query the JSON document, the query and document are both parsed into intermediary ASTs. The query AST is then used to construct first a non-deterministic finite automaton (NFA) which is then determinized into a deterministic finite automaton (DFA) that can be directly simulated against the input JSON document.

For more details on the automaton constructions, see the dfa and nfa modules of the query module.

§Query Language

The query language relies on regular expression syntax, with some modifications to support JSON.

§Grammar

The grammar for the query language is defined in the query.pest file in the grammar directory.

§Examples

Here are some example queries and their meanings:

  • name: Matches the name field in the root object (e.g., "John Doe").
  • address.street: Matches the street field inside the address object.
  • address.*: Matches any field in the address object (e.g., street, city, etc.).
  • address.[*]: Matches all elements in an array if address were an array.
  • (name|age): Matches either the name or age field in the root object.
  • address.([*] | *)*: Matches any value at any depth under address.

We can also use ranges to match specific indices in arrays:

  • foo.[2:4]: Matches elements at indices 2 and 3 in the foo array.
  • foo.[2:]: Matches all elements in the foo array from index 2 onward.

Finally, we can use wildcards to match any field or index:

  • *: Matches any single field in the root object.
  • [*]: Matches any single array index in the root array.
  • [*].*: Matches any field inside each element of an array.
  • ([*] | *)*: Matches any field or index at any level of the JSON tree.

Modules§

commands
Available subcommands for jsongrep binary.
query
This module provides the main query engine implementation, as well as the parser for the query language and the intermediary AST representations of queries.
utils
Miscellaneous utility functions.

Macros§

field
Constructs a query that matches a specific field name.