# json-five-rs
This project provides a handwritten JSON5 tokenizer and recursive descent parser compatible with `serde`.
[](https://crates.io/crates/json-five/) [](https://docs.rs/json-five/latest/json_five/)
# Key Features
- Compatible with `serde` data model
- Supports round-trip use cases with preservation/editing of whitespace and comments
- Supports formatting (indent, compact formats, etc.) in serialization
- Supports both model-based (AST) edits and token-based round-tripping
- Performance-focused default tokenizer/parser that avoids copying input
- Ergonomics-focused round-trip tokenizer/parser that produce structures with solely owned types for ease of editing
- Supports basic parsing and serialization without `serde` (you may disable the default `serde` feature!)
# Usage
You can use this lib with `serde` in the typical way:
```rust
use json_five::from_str;
use serde::Deserialize;
#[derive(Debug, PartialEq, Deserialize)]
struct MyData {
name: String,
count: i64,
maybe: Option<f64>,
}
fn main() {
let source = r#"
// A possible JSON5 input
{
name: 'Hello',
count: 42,
maybe: null
}
"#;
let parsed = from_str::<MyData>(source).unwrap();
let expected = MyData {name: "Hello".to_string(), count: 42, maybe: None};
assert_eq!(parsed, expected);
}
```
Serializing also works in the usual way. The re-exported `to_string` function comes from the `ser` module and works
how you'd expect with default formatting.
```rust
use serde::Serialize;
use json_five::to_string;
#[derive(Serialize)]
struct Test {
int: u32,
seq: Vec<&'static str>,
}
let test = Test {
int: 1,
seq: vec!["a", "b"],
};
let serialized = to_string(&test).unwrap();
let expected = r#"{"int": 1, "seq": ["a", "b"]}"#;
assert_eq!(serialized, expected);
```
You may also use the `to_string_formatted` with a `FormatConfiguration` to control the output format, including
indentation, trailing commas, and key/item separators. A few useful constructors are available, including
`::compact()` for the most compact format (no whitespace).
```rust
use serde::Serialize;
use json_five::{to_string_formatted, FormatConfiguration, TrailingComma};
#[derive(Serialize)]
struct Test {
int: u32,
seq: Vec<&'static str>,
}
let test = Test {
int: 1,
seq: vec!["a", "b"],
};
let config = FormatConfiguration::with_indent(4, TrailingComma::ALL);
let formatted_doc = to_string_formatted(&test, config).unwrap();
let expected = r#"{
"int": 1,
"seq": [
"a",
"b",
],
}"#;
assert_eq!(formatted_doc, expected);
```
Tip: you may use `serde_json::Value` as a target type for deserializing arbitrary JSON5 documents. In the future
this crate may provide an equivalent type directly (or maybe vendor this).
## Examples
See the `examples/` directory for examples of programs that utilize round-tripping features.
- `examples/json5-doublequote-fixer` gives an example of tokenization-based round-tripping edits
- `examples/json5-trailing-comma-formatter` gives an example of model-based round-tripping edits
# Benchmarking
Benchmarks are available in the `benches/` directory. Test data is in the `data/` directory. A couple of benchmarks use
big files that are not committed to this repo. So run `./data/setupdata.sh` to download the required data files
so that you don't skip the big benchmarks. The benchmarks compare `json_five` (this crate) to
[serde_json](https://github.com/serde-rs/json) and [json5-rs](https://github.com/callum-oakley/json5-rs).
Notwithstanding the general caveats of benchmarks, in initial testing, `json_five` definitively outperforms `json5-rs`.
In typical scenarios observations have been 3-4x performance, and up to 20x faster in some synthetic tests.
At time of writing (pre- v0) no performance optimizations have been done. I expect performance to improve,
if at least marginally, in the future.
These benchmarks were run on Windows on an i9-10900K with rustc 1.83.0 (90b35a623 2024-11-26). This table won't be updated unless significant changes happen.
| big (25MB) | 580.31 ms | 3.0861 s | 150.39 ms |
| medium-ascii (5MB) | 199.88 ms | 706.94 ms | 59.008 ms |
| empty | 228.62 ns | 708.00 ns | 38.786 ns |
| arrays | 578.24 ns | 1.3228 µs | 100.95 ns |
| objects | 922.91 ns | 2.0748 µs | 205.75 ns |
| nested-array | 22.990 µs | 29.356 µs | 5.0483 µs |
| nested-objects | 50.659 µs | 132.75 µs | 14.755 µs |
| string | 421.17 ns | 3.5691 µs | 91.051 ns |
| number | 238.75 ns | 779.13 ns | 36.179 ns |
| deserialize (size 10) | 6.9898µs | 58.398µs | 886.33ns |
| deserialize (size 100) | 66.005µs | 830.79µs | 9.9705µs |
| deserialize (size 1000) | 599.39µs | 8.4952ms | 69.110µs |
| deserialize (size 10000) | 5.9841ms | 82.591ms | 734.40µs |
| deserialize (size 100000) | 66.841ms | 955.37ms | 11.638ms |
| deserialize (size 1000000) | 674.13ms | 9.5758s | 119.03ms |
| serialize (size 10) | 2.3496µs | 48.915µs | 891.85ns |
| serialize (size 100) | 19.602µs | 458.98µs | 6.7109µs |
| serialize (size 1000) | 194.19µs | 4.6035ms | 62.667µs |
| serialize (size 10000) | 2.2104ms | 47.253ms | 761.10µs |
| serialize (size 100000) | 24.418ms | 502.35ms | 11.410ms |
| serialize (size 1000000) | 245.26ms | 4.6211s | 115.84ms |
# Notes
## Status
This project is in **very** early phases. While the crate is usable right now, more thorough testing is needed to
ensure that the tokenizer/parser rejects invalid documents.
Questions, discussions, and contributions are welcome. Right now, things are moving fast, so the best way to contribute
is likely to just [open an issue](https://github.com/spyoungtech/json-five-rs/issues).
Expect breaking changes for now, even in patch releases.
## Serde is optional
Using `serde` is actually optional. Some use cases may not require the use of serde's various deserialization methods
and may only need to rely on the tokenizer and/or AST tree features. By default, the serde feature is enabled,
but this can be disabled. Even without the `serde` feature, the parser modules provide functions and methods for
parsing and serialization, including the ability to customize the style.