drivel
drivel is a command-line tool written in Rust for inferring a schema from an example JSON (or JSON lines) file, and generating synthetic data (the drivel in question)
based on the inferred schema.
Features
- Schema Inference: drivel can analyze JSON input and infer its schema, including data types, array lengths, and object structures.
- Data Generation: Based on the inferred schema, drivel can generate synthetic data that adheres to the inferred structure.
- Easy to integrate: drivel reads JSON input from stdin and writes its output to stdout, allowing for easy integration into pipelines and workflows.
Installation
To install the drivel executable, ensure you have the Rust toolchain installed and run:
To add drivel as a dependency to your project, e.g., to use the schema inference engine, run:
Usage
Infer a schema from JSON input, and generate synthetic data based on the inferred schema.
Usage: drivel <COMMAND>
Commands:
describe Describe the inferred schema for the input data
produce Produce synthetic data adhering to the inferred schema
help Print this message or the help of the given subcommand(s)
Options:
-h, --help Print help
-V, --version Print version
Describe
In 'describe' mode, drivel infers the schema from the input JSON and prints a human-readable description of the schema. This mode is useful for understanding the structure and data types present in JSON data.
|
Produce
In 'produce' mode, drivel infers the schema from the input JSON, generates synthetic data based on the inferred schema, and outputs the generated data in JSON format. This is useful for generating test data or sample datasets.
At present, drivel is moderately smart about schema inference. It makes some attempt at inferring semantic meaning for fields, particularly for String types, but it is fairly limited at doing this. I would welcome any work to improve drivel's ability to understand the data it sees, to have more semantic understanding baked into the generated schema.
|
You can also specify the number of times to repeat the generated data:
|
Examples
Consider a JSON file input.json:
Running drivel in 'describe' mode:
|
Output:
{
"age": int (30),
"address": {
"city": string (8),
"zip_code": string (5)
},
"is_student": boolean,
"grades": [
int (78-90)
] (3),
"name": string (8),
"id": string (uuid)
}
Running drivel in 'produce' mode:
|
Output:
Contributing
We welcome contributions from anyone interested in improving or extending drivel! Whether you have ideas for new features, bug fixes, or improvements to the documentation, feel free to open an issue or submit a pull request.
License
This project is licensed under the MIT License - see the LICENSE file for details.