Crate jiter

§jiter

Fast iterable JSON parser.

Documentation is available at docs.rs/jiter.

jiter has three interfaces:

  • JsonValue: an enum representing JSON data
  • Jiter: an iterator over JSON data
  • PythonParse: parses a JSON string into a Python object

§JsonValue Example

See the JsonValue docs for more details.

use jiter::JsonValue;

let json_data = r#"
    {
        "name": "John Doe",
        "age": 43,
        "phones": [
            "+44 1234567",
            "+44 2345678"
        ]
    }"#;
let json_value = JsonValue::parse(json_data.as_bytes(), true).unwrap();
println!("{:#?}", json_value);

prints:

Object(
    {
        "name": Str("John Doe"),
        "age": Int(43),
        "phones": Array(
            [
                Str("+44 1234567"),
                Str("+44 2345678"),
            ],
        ),
    },
)

§Jiter Example

To use Jiter, you need to know what schema you’re expecting:

use jiter::{Jiter, NumberInt, Peek};

let json_data = r#"
    {
        "name": "John Doe",
        "age": 43,
        "phones": [
            "+44 1234567",
            "+44 2345678"
        ]
    }"#;
let mut jiter = Jiter::new(json_data.as_bytes()).with_allow_inf_nan();
assert_eq!(jiter.next_object().unwrap(), Some("name"));
assert_eq!(jiter.next_str().unwrap(), "John Doe");
assert_eq!(jiter.next_key().unwrap(), Some("age"));
assert_eq!(jiter.next_int().unwrap(), NumberInt::Int(43));
assert_eq!(jiter.next_key().unwrap(), Some("phones"));
assert_eq!(jiter.next_array().unwrap(), Some(Peek::String));
// we know the next value is a string because we just asserted it, so `known_str` can be used
assert_eq!(jiter.known_str().unwrap(), "+44 1234567");
assert_eq!(jiter.array_step().unwrap(), Some(Peek::String));
// same again
assert_eq!(jiter.known_str().unwrap(), "+44 2345678");
// next we'll get `None` from `array_step` as the array is finished
assert_eq!(jiter.array_step().unwrap(), None);
// and `None` from `next_key` as the object is finished
assert_eq!(jiter.next_key().unwrap(), None);
// and we check there's nothing else in the input
jiter.finish().unwrap();

§Benchmarks

There are lies, damned lies and benchmarks.

In particular, the serde-json benchmarks use serde_json::Value, which is significantly slower than deserializing to a string.

For more details, see the benchmarks.

running 48 tests
test big_jiter_iter                    ... bench:   3,662,616 ns/iter (+/- 88,878)
test big_jiter_value                   ... bench:   6,998,605 ns/iter (+/- 292,383)
test big_serde_value                   ... bench:  29,793,191 ns/iter (+/- 576,173)
test bigints_array_jiter_iter          ... bench:      11,836 ns/iter (+/- 414)
test bigints_array_jiter_value         ... bench:      28,979 ns/iter (+/- 938)
test bigints_array_serde_value         ... bench:     129,797 ns/iter (+/- 5,096)
test floats_array_jiter_iter           ... bench:      19,302 ns/iter (+/- 631)
test floats_array_jiter_value          ... bench:      31,083 ns/iter (+/- 921)
test floats_array_serde_value          ... bench:     208,932 ns/iter (+/- 6,167)
test lazy_map_lookup_1_10              ... bench:         615 ns/iter (+/- 15)
test lazy_map_lookup_2_20              ... bench:       1,776 ns/iter (+/- 36)
test lazy_map_lookup_3_50              ... bench:       4,291 ns/iter (+/- 77)
test massive_ints_array_jiter_iter     ... bench:      62,244 ns/iter (+/- 1,616)
test massive_ints_array_jiter_value    ... bench:      82,889 ns/iter (+/- 1,916)
test massive_ints_array_serde_value    ... bench:     498,650 ns/iter (+/- 47,759)
test medium_response_jiter_iter        ... bench:           0 ns/iter (+/- 0)
test medium_response_jiter_value       ... bench:       3,521 ns/iter (+/- 101)
test medium_response_jiter_value_owned ... bench:       6,088 ns/iter (+/- 180)
test medium_response_serde_value       ... bench:       9,383 ns/iter (+/- 342)
test pass1_jiter_iter                  ... bench:           0 ns/iter (+/- 0)
test pass1_jiter_value                 ... bench:       3,048 ns/iter (+/- 79)
test pass1_serde_value                 ... bench:       6,588 ns/iter (+/- 232)
test pass2_jiter_iter                  ... bench:         384 ns/iter (+/- 9)
test pass2_jiter_value                 ... bench:       1,259 ns/iter (+/- 44)
test pass2_serde_value                 ... bench:       1,237 ns/iter (+/- 38)
test sentence_jiter_iter               ... bench:         283 ns/iter (+/- 10)
test sentence_jiter_value              ... bench:         357 ns/iter (+/- 15)
test sentence_serde_value              ... bench:         428 ns/iter (+/- 9)
test short_numbers_jiter_iter          ... bench:           0 ns/iter (+/- 0)
test short_numbers_jiter_value         ... bench:      18,085 ns/iter (+/- 613)
test short_numbers_serde_value         ... bench:      87,253 ns/iter (+/- 1,506)
test string_array_jiter_iter           ... bench:         615 ns/iter (+/- 18)
test string_array_jiter_value          ... bench:       1,410 ns/iter (+/- 44)
test string_array_jiter_value_owned    ... bench:       2,863 ns/iter (+/- 151)
test string_array_serde_value          ... bench:       3,467 ns/iter (+/- 60)
test true_array_jiter_iter             ... bench:         299 ns/iter (+/- 8)
test true_array_jiter_value            ... bench:         995 ns/iter (+/- 29)
test true_array_serde_value            ... bench:       1,207 ns/iter (+/- 36)
test true_object_jiter_iter            ... bench:       2,482 ns/iter (+/- 84)
test true_object_jiter_value           ... bench:       2,058 ns/iter (+/- 45)
test true_object_serde_value           ... bench:       7,991 ns/iter (+/- 370)
test unicode_jiter_iter                ... bench:         315 ns/iter (+/- 7)
test unicode_jiter_value               ... bench:         389 ns/iter (+/- 6)
test unicode_serde_value               ... bench:         445 ns/iter (+/- 6)
test x100_jiter_iter                   ... bench:          12 ns/iter (+/- 0)
test x100_jiter_value                  ... bench:          20 ns/iter (+/- 1)
test x100_serde_iter                   ... bench:          72 ns/iter (+/- 3)
test x100_serde_value                  ... bench:          83 ns/iter (+/- 3)
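
When reading the table above, it can help to turn pairs of rows into ratios. The helper below is a minimal sketch for that; `speedup` is a hypothetical name, not part of jiter, and it declines to compute a ratio for rows reported as 0 ns/iter, since those carry no usable timing.

```rust
/// Ratio of a baseline timing to a candidate timing, both in ns/iter.
/// A result above 1.0 means the candidate was faster than the baseline.
/// Returns `None` when the candidate is reported as 0 ns/iter, because
/// no meaningful ratio can be formed from such a row.
/// (Hypothetical helper for reading the table; not part of jiter.)
fn speedup(baseline_ns: u64, candidate_ns: u64) -> Option<f64> {
    if candidate_ns == 0 {
        None
    } else {
        Some(baseline_ns as f64 / candidate_ns as f64)
    }
}
```

With the figures listed above, `speedup(29_793_191, 3_662_616)` (big_serde_value against big_jiter_iter) comes out at about 8.1, while `speedup(1_237, 1_259)` (pass2_serde_value against pass2_jiter_value) is just under 1.0, so the advantage is not uniform across inputs.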

Structs§

Jiter
A JSON iterator.
JiterError
An error from the Jiter iterator.
JsonError
Represents an error from parsing JSON.
LinePosition
Represents a line and column in a file or input string, used for both errors and value positions.
LosslessFloat
Represents a float from JSON by holding the underlying bytes of its JSON representation.
Peek
PythonParse
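
The idea behind LosslessFloat in the list above, keeping the original bytes rather than an f64, can be illustrated with a small standard-library sketch. `RawFloat` and its methods are hypothetical stand-ins chosen for illustration; they are not jiter's type or API.

```rust
/// Hypothetical stand-in for a "lossless" float: it stores the original
/// JSON bytes and only converts to f64 on request. Not jiter's actual type.
#[derive(Debug, Clone, PartialEq)]
struct RawFloat {
    bytes: Vec<u8>,
}

impl RawFloat {
    /// Copy the raw bytes of the number as they appeared in the input.
    fn new(raw: &[u8]) -> Self {
        RawFloat { bytes: raw.to_vec() }
    }

    /// The exact text from the input, if it is valid UTF-8.
    fn as_str(&self) -> Option<&str> {
        std::str::from_utf8(&self.bytes).ok()
    }

    /// Lossy conversion: the f64 nearest to the decimal text.
    fn to_f64(&self) -> Option<f64> {
        self.as_str()?.parse::<f64>().ok()
    }
}
```

For an input such as `0.10000000000000000001`, `as_str` still yields every digit, whereas `to_f64` yields the same value as the literal `0.1`. That difference is the reason to hold on to the bytes.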

Enums§

FloatMode
JiterErrorType
Enum representing either a JsonErrorType or a WrongType error.
JsonErrorType
Enum representing all possible errors in JSON syntax.
JsonType
Enum representing all JSON types.
JsonValue
Enum representing a JSON value.
NumberAny
A number that can be either a NumberInt or an f64.
NumberInt
A number that can be either an i64 or a BigInt.
PartialMode
StringCacheMode
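
The "i64 or BigInt" split described for NumberInt above can be pictured as a fast path with a fallback. The following is a minimal sketch under stated assumptions: `IntOrBig` and `parse_int` are hypothetical names, the big case simply keeps the decimal digits instead of using an arbitrary-precision type, and this is not how jiter itself is implemented.

```rust
/// Hypothetical two-tier integer mirroring the "i64 or big integer" idea.
/// The `Big` case keeps the decimal text; a real implementation would use
/// an arbitrary-precision integer type instead.
#[derive(Debug, PartialEq)]
enum IntOrBig {
    Small(i64),
    Big(String),
}

/// Parse an optionally negative run of ASCII digits.
/// Values that fit in an i64 take the fast path; anything larger falls
/// back to the `Big` variant. Returns `None` for non-integer text.
/// Note: this sketch does not enforce JSON's rule against leading zeros.
fn parse_int(text: &str) -> Option<IntOrBig> {
    let digits = text.strip_prefix('-').unwrap_or(text);
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    match text.parse::<i64>() {
        Ok(n) => Some(IntOrBig::Small(n)),
        // Only digits remain, so the failure here is an overflow.
        Err(_) => Some(IntOrBig::Big(text.to_string())),
    }
}
```

Under this sketch, `parse_int("43")` gives `Small(43)`, while `parse_int("9223372036854775808")` (one past `i64::MAX`) gives the `Big` variant.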

Functions§

cache_clear
cache_usage
cached_py_string
Create a cached Python str from a string slice.
cached_py_string_ascii
Create a cached Python str from a string slice.
map_json_error
Map a JsonError to a PyErr which can be raised as an exception in Python as a ValueError.
pystring_ascii_new
Faster creation of PyString from an ASCII string, inspired by https://github.com/ijl/orjson/blob/3.10.0/src/str/create.rs#L41

Type Aliases§

JiterResult
JsonArray
Parsed JSON array.
JsonObject
Parsed JSON object. Note that jiter does not attempt to deduplicate keys, so a key may occur multiple times in the object.
JsonResult
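
The note on JsonObject above means callers should decide how to treat repeated keys. The sketch below models an object as a plain list of pairs to show the options; `PairList` and its methods are hypothetical and use only the standard library, so they say nothing about which entry jiter's own lookup returns.

```rust
/// Hypothetical model of an object that keeps every key/value pair in
/// input order, duplicates included. Not jiter's actual JsonObject type.
struct PairList<'a> {
    pairs: Vec<(&'a str, i64)>,
}

impl<'a> PairList<'a> {
    /// Every value stored under `key`, in input order.
    fn get_all(&self, key: &str) -> Vec<i64> {
        self.pairs
            .iter()
            .filter(|(k, _)| *k == key)
            .map(|(_, v)| *v)
            .collect()
    }

    /// "First occurrence wins" policy.
    fn first(&self, key: &str) -> Option<i64> {
        self.pairs.iter().find(|(k, _)| *k == key).map(|(_, v)| *v)
    }

    /// "Last occurrence wins" policy.
    fn last(&self, key: &str) -> Option<i64> {
        self.pairs.iter().rev().find(|(k, _)| *k == key).map(|(_, v)| *v)
    }
}
```

For pairs equivalent to `{"a": 1, "b": 2, "a": 3}`, `get_all("a")` returns both values, `first("a")` returns 1 and `last("a")` returns 3.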