jsonbb
jsonbb is a binary representation of JSON values. It is inspired by JSONB in PostgreSQL and optimized for fast parsing.
Usage
jsonbb provides an API similar to serde_json for constructing and querying JSON values.
// Deserialize a JSON value from a string of JSON text.
let value: jsonbb::Value = r#"{"name": ["foo", "bar"]}"#.parse().unwrap();
// Serialize a JSON value into JSON text.
let json = value.to_string();
assert_eq!(json, r#"{"name":["foo","bar"]}"#);
Because jsonbb is a binary format, you can get the underlying byte slice of a JSON value or read a JSON value directly from a byte slice.
// Get the underlying byte slice of a JSON value.
let jsonbb = value.as_bytes();
// Read a JSON value from a byte slice.
let value = jsonbb::ValueRef::from_bytes(jsonbb);
You can query JSON values with the usual accessor API and then build new JSON values with the Builder API.
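// Indexing: look up an object field by key, then an array element by position.
let name = value.get("name").unwrap();
let foo = name.get(0).unwrap();
assert_eq!(foo.as_str().unwrap(), "foo");
// Build a JSON value.
let mut builder = jsonbb::Builder::<Vec<u8>>::new();
builder.begin_object();
builder.add_string("name");
builder.add_value(foo);
builder.end_object();
let value = builder.finish();
assert_eq!(value.to_string(), r#"{"name":"foo"}"#);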
Format
jsonbb stores JSON values in contiguous memory. By avoiding dynamic memory allocation, it is more cache-friendly and provides efficient parsing and querying performance.
It has the following key features:
- Memory Continuity: The content of any JSON subtree is stored contiguously, allowing for efficient copying through memcpy. This leads to highly efficient indexing operations.
- Post-Order Traversal: JSON nodes are stored in post-order traversal sequence. When parsing JSON strings, output can be sequentially written to the buffer without additional memory allocation and movement. This results in highly efficient parsing operations.
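
The two properties above can be illustrated with a toy encoder. This is a minimal sketch and not jsonbb's actual byte layout: the `Node` type, the tag constants, the 5-byte trailer, and the helper names are all assumptions made up for this example. It only shows how post-order output is appended to one buffer, and why every subtree ends up as one contiguous byte range.

```rust
/// A tiny JSON-like tree, hypothetical and used only for this illustration.
enum Node {
    Str(String),
    Array(Vec<Node>),
}

const TAG_STR: u8 = 0;
const TAG_ARRAY: u8 = 1;
/// Trailer = 1 tag byte + 4 bytes holding the subtree size (little-endian).
const TRAILER: usize = 5;

/// Appends `node` to `buf` in post-order: children first, then the parent's
/// trailer. Bytes are only ever appended; nothing already written is moved.
fn encode(node: &Node, buf: &mut Vec<u8>) {
    let start = buf.len();
    let tag = match node {
        Node::Str(s) => {
            buf.extend_from_slice(s.as_bytes());
            TAG_STR
        }
        Node::Array(items) => {
            for item in items {
                encode(item, buf);
            }
            TAG_ARRAY
        }
    };
    buf.push(tag);
    // Total size of this subtree, including the 4 size bytes about to be written.
    let size = (buf.len() - start + 4) as u32;
    buf.extend_from_slice(&size.to_le_bytes());
}

/// Reads the (tag, subtree size) trailer of the node that ends at the end of `bytes`.
fn trailer(bytes: &[u8]) -> (u8, usize) {
    let n = bytes.len();
    let size = u32::from_le_bytes([bytes[n - 4], bytes[n - 3], bytes[n - 2], bytes[n - 1]]);
    (bytes[n - TRAILER], size as usize)
}

/// Returns each child of an encoded array as a contiguous sub-slice.
fn children(bytes: &[u8]) -> Vec<&[u8]> {
    let (tag, size) = trailer(bytes);
    assert_eq!(tag, TAG_ARRAY);
    let start = bytes.len() - size;
    let mut end = bytes.len() - TRAILER;
    let mut out = Vec::new();
    // Walk backwards: each child's trailer tells us where that child begins.
    while end > start {
        let (_, child_size) = trailer(&bytes[..end]);
        out.push(&bytes[end - child_size..end]);
        end -= child_size;
    }
    out.reverse();
    out
}

/// Returns the text of an encoded string node.
fn as_str(bytes: &[u8]) -> &str {
    let (tag, size) = trailer(bytes);
    assert_eq!(tag, TAG_STR);
    let payload = &bytes[bytes.len() - size..bytes.len() - TRAILER];
    std::str::from_utf8(payload).unwrap()
}
```

In this sketch the root node sits at the end of the buffer, and any slice returned by `children` is itself a complete encoded value. Copying a subtree is therefore a single slice copy (for example with `to_vec()`), which is the property the memcpy remark refers to.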
Performance Comparison
| item[^0] | jsonbb | jsonb | serde_json | simd_json |
|---|---|---|---|---|
| `canada.parse()` | 4.7394 ms | 12.640 ms | 10.806 ms | 6.0767 ms [^1] |
| `canada.to_json()` | 5.7694 ms | 20.420 ms | 5.5702 ms | 3.0548 ms |
| `canada.size()` | 2,117,412 B | 1,892,844 B | | |
| `canada["type"]`[^2] | 39.181 ns[^2.1] | 316.51 ns[^2.2] | 67.202 ns [^2.3] | 27.102 ns [^2.4] |
| `citm_catalog["areaNames"]` | 92.363 ns | 328.70 ns | 2.1190 µs [^3] | 1.9012 µs [^3] |
| `from("1234567890")` | 26.840 ns | 91.037 ns | 45.130 ns | 21.513 ns |
| `a == b` | 66.513 ns | 115.89 ns | 39.213 ns | 41.675 ns |
| `a < b` | 71.793 ns | 120.77 ns | not supported | not supported |
[^0]: JSON files for benchmark: canada, citm_catalog
[^1]: Parsed to simd_json::OwnedValue for a fair comparison.
[^2]: canada["type"] returns a string, so the primary overhead of this operation lies in indexing.
[^2.1]: jsonbb uses binary search on sorted keys
[^2.2]: jsonb uses linear search on unsorted keys
[^2.3]: serde_json uses BTreeMap
[^2.4]: simd_json uses HashMap
[^3]: citm_catalog["areaNames"] returns an object with 17 key-value string pairs. However, both serde_json and simd_json exhibit slower performance due to dynamic memory allocation for each string. In contrast, jsonb employs a flat representation, allowing for direct memcpy operations, resulting in better performance.
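
The indexing notes above contrast binary search on sorted keys with linear search on unsorted keys. The difference can be sketched as follows, assuming an object is modelled as a plain slice of `(key, value)` string pairs. The function names and this representation are hypothetical and do not reflect how either library stores objects.

```rust
/// Looks up `key` in entries that are sorted by key.
/// Binary search needs O(log n) key comparisons.
fn get_sorted<'a>(entries: &[(&'a str, &'a str)], key: &str) -> Option<&'a str> {
    entries
        .binary_search_by(|(k, _)| k.cmp(&key))
        .ok()
        .map(|i| entries[i].1)
}

/// Looks up `key` in entries stored in arbitrary order.
/// A linear scan needs up to n key comparisons.
fn get_linear<'a>(entries: &[(&'a str, &'a str)], key: &str) -> Option<&'a str> {
    entries
        .iter()
        .find(|(k, _)| *k == key)
        .map(|(_, v)| *v)
}
```

Both functions return the same result on the same data. The sorted variant only works if the keys were sorted when the object was built, which moves some cost from lookup time to construction time.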