Crate json_five

Source
Expand description

§json-five-rs

This project provides a handwritten JSON5 tokenizer and recursive descent parser compatible with serde.

Crates.io Version docs.rs

§Key Features

  • Compatible with serde data model
  • Supports round-trip use cases with preservation/editing of whitespace and comments
  • Supports formatting (indent, compact formats, etc.) in serialization
  • Supports both model-based (AST) edits and token-based round-tripping
  • Performance-focused default tokenizer/parser that avoids copying input
  • Ergonomics-focused round-trip tokenizer/parser that produce structures with solely owned types for ease of editing
  • Supports basic parsing and serialization without serde (you may disable the default serde feature!)

§Usage

You can use this lib with serde in the typical way:

use json_five::from_str;
use serde::Deserialize;
#[derive(Debug, PartialEq, Deserialize)]
struct MyData {
    name: String,
    count: i64,
    maybe: Option<f64>,
}

fn main() {
    let source = r#"
    // A possible JSON5 input
    {
      name: 'Hello',
      count: 42,
      maybe: null
    }
"#;

    let parsed = from_str::<MyData>(source).unwrap();
    let expected = MyData {name: "Hello".to_string(), count: 42, maybe: None};
    assert_eq!(parsed, expected);
}

Serializing also works in the usual way. The re-exported to_string function comes from the ser module and works how you’d expect with default formatting.

use serde::Serialize;
use json_five::to_string;
#[derive(Serialize)]
struct Test {
    int: u32,
    seq: Vec<&'static str>,
}
let test = Test {
    int: 1,
    seq: vec!["a", "b"],
};

let serialized = to_string(&test).unwrap();

let expected = r#"{"int": 1, "seq": ["a", "b"]}"#;
assert_eq!(serialized, expected);

You may also use the to_string_formatted with a FormatConfiguration to control the output format, including indentation, trailing commas, and key/item separators. A few useful constructors are available, including ::compact() for the most compact format (no whitespace).

use serde::Serialize;
use json_five::{to_string_formatted, FormatConfiguration, TrailingComma};
#[derive(Serialize)]
struct Test {
    int: u32,
    seq: Vec<&'static str>,
}
let test = Test {
    int: 1,
    seq: vec!["a", "b"],
};

let config = FormatConfiguration::with_indent(4, TrailingComma::ALL);
let formatted_doc = to_string_formatted(&test, config).unwrap();

let expected = r#"{
    "int": 1,
    "seq": [
        "a",
        "b",
    ],
}"#;

assert_eq!(formatted_doc, expected);

Tip: you may use serde_json::Value as a target type for deserializing arbitrary JSON5 documents. In the future this crate may provide an equivalent type directly (or maybe vendor this).

§Examples

See the examples/ directory for examples of programs that utilize round-tripping features.

  • examples/json5-doublequote-fixer gives an example of tokenization-based round-tripping edits
  • examples/json5-trailing-comma-formatter gives an example of model-based round-tripping edits

§Benchmarking

Benchmarks are available in the benches/ directory. Test data is in the data/ directory. A couple of benchmarks use big files that are not committed to this repo. So run ./data/setupdata.sh to download the required data files so that you don’t skip the big benchmarks. The benchmarks compare json_five (this crate) to serde_json and json5-rs.

Notwithstanding the general caveats of benchmarks, in initial testing, json_five definitively outperforms json5-rs. In typical scenarios observations have been 3-4x performance, and up to 20x faster in some synthetic tests. At time of writing (pre- v0) no performance optimizations have been done. I expect performance to improve, if at least marginally, in the future.

These benchmarks were run on Windows on an i9-10900K with rustc 1.83.0 (90b35a623 2024-11-26). This table won’t be updated unless significant changes happen.

testjson_fivejson5serde_json
big (25MB)580.31 ms3.0861 s150.39 ms
medium-ascii (5MB)199.88 ms706.94 ms59.008 ms
empty228.62 ns708.00 ns38.786 ns
arrays578.24 ns1.3228 µs100.95 ns
objects922.91 ns2.0748 µs205.75 ns
nested-array22.990 µs29.356 µs5.0483 µs
nested-objects50.659 µs132.75 µs14.755 µs
string421.17 ns3.5691 µs91.051 ns
number238.75 ns779.13 ns36.179 ns
deserialize (size 10)6.9898µs58.398µs886.33ns
deserialize (size 100)66.005µs830.79µs9.9705µs
deserialize (size 1000)599.39µs8.4952ms69.110µs
deserialize (size 10000)5.9841ms82.591ms734.40µs
deserialize (size 100000)66.841ms955.37ms11.638ms
deserialize (size 1000000)674.13ms9.5758s119.03ms
serialize (size 10)2.3496µs48.915µs891.85ns
serialize (size 100)19.602µs458.98µs6.7109µs
serialize (size 1000)194.19µs4.6035ms62.667µs
serialize (size 10000)2.2104ms47.253ms761.10µs
serialize (size 100000)24.418ms502.35ms11.410ms
serialize (size 1000000)245.26ms4.6211s115.84ms

§Notes

§Status

This project is in very early phases. While the crate is usable right now, more thorough testing is needed to ensure that the tokenizer/parser rejects invalid documents.

Questions, discussions, and contributions are welcome. Right now, things are moving fast, so the best way to contribute is likely to just open an issue.

Expect breaking changes for now, even in patch releases.

§Serde is optional

Using serde is actually optional. Some use cases may not require the use of serde’s various deserialization methods and may only need to rely on the tokenizer and/or AST tree features. By default, the serde feature is enabled, but this can be disabled. Even without the serde feature, the parser modules provide functions and methods for parsing and serialization, including the ability to customize the style.

Re-exports§

pub use de::from_str;
pub use de::from_bytes;
pub use de::JSONValueDeserializer;
pub use ser::to_string;
pub use ser::to_string_formatted;
pub use ser::Serializer;
pub use parser::from_str as model_from_str;
pub use parser::from_bytes as model_from_bytes;
pub use parser::from_tokens as model_from_tokens;
pub use parser::FormatConfiguration;
pub use parser::TrailingComma;
pub use tokenize::tokenize_bytes;
pub use tokenize::tokenize_str;
pub use tokenize::tokenize_rt_str;
pub use tokenize::tokenize_rt_bytes;
pub use rt::tokenize::tokens_to_source;
pub use rt::tokenize::source_to_tokens;

Modules§

de
The deserialization module, for serde compatibility (optional feature)
parser
The default performance-focused parser
rt
The round-tripping module
ser
The serialization module, for serde compatibility (optional feature)
tokenize
The default performance-focused tokenizer