Streaming Serde JSON
This project is still experimental and not intended for production use. It has not been optimized to use any hardware capabilities like SIMD. The main serde_json crate is a better fit for all non-streaming use cases.
Goal
This library aims to solve a simple problem: a JSON file can be larger than the memory of the machine processing it. For example, a file holding 300 GB of JSON u64 values from a hardware device will not fit in memory on a machine with only 64 GB of RAM. Even when that JSON is well-formed and consists of a single extremely large array or object, the standard serde_json crate will not suffice, because it eagerly parses all 300 GB of values. This crate provides functions and iterator implementations for lazily parsing JSON.
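To make the idea of lazy parsing concrete, here is a minimal sketch that uses only the standard library. It is not this crate's implementation: `LazyU64Array` is a hypothetical type made up for illustration, and it understands only a flat array of unsigned integers. The point it shows is that each call to `next` consumes just enough characters to produce one value, so the whole input never has to be in memory at once.

```rust
use std::iter::Peekable;

/// Hypothetical illustration type: lazily yields the numbers of a flat
/// JSON array such as `[1, 22, 333]`, one value per call to `next`.
struct LazyU64Array<I: Iterator<Item = char>> {
    chars: Peekable<I>,
    started: bool,
    done: bool,
}

impl<I: Iterator<Item = char>> LazyU64Array<I> {
    fn new(chars: I) -> Self {
        LazyU64Array { chars: chars.peekable(), started: false, done: false }
    }

    /// Consumes whitespace without touching the next significant character.
    fn skip_whitespace(&mut self) {
        while matches!(self.chars.peek(), Some(c) if c.is_whitespace()) {
            self.chars.next();
        }
    }

    /// Marks the stream as finished and reports the error once.
    fn fail(&mut self, message: String) -> Option<Result<u64, String>> {
        self.done = true;
        Some(Err(message))
    }
}

impl<I: Iterator<Item = char>> Iterator for LazyU64Array<I> {
    type Item = Result<u64, String>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        self.skip_whitespace();
        if !self.started {
            // The first call consumes the opening bracket.
            self.started = true;
            match self.chars.next() {
                Some('[') => {}
                other => return self.fail(format!("expected '[', found {:?}", other)),
            }
            self.skip_whitespace();
            if self.chars.peek() == Some(&']') {
                // Empty array: nothing to yield.
                self.chars.next();
                self.done = true;
                return None;
            }
        }
        // Read only the digits of the next value.
        let mut digits = String::new();
        while let Some(c) = self.chars.peek().copied() {
            if !c.is_ascii_digit() {
                break;
            }
            digits.push(c);
            self.chars.next();
        }
        let value = match digits.parse::<u64>() {
            Ok(value) => value,
            Err(error) => return self.fail(format!("invalid number {:?}: {}", digits, error)),
        };
        // A comma means more values follow; a closing bracket ends the stream.
        self.skip_whitespace();
        match self.chars.next() {
            Some(',') => {}
            Some(']') => self.done = true,
            other => return self.fail(format!("expected ',' or ']', found {:?}", other)),
        }
        Some(Ok(value))
    }
}
```

For example, `LazyU64Array::new("[1, 22, 333]".chars())` can be summed or folded without ever collecting the values into a vector.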
The Public API
This crate exposes only four functions. Each returns an iterator that performs the desired conversion: either a character stream to a value stream, or a value stream to a stream of JSON string segments.
Functions:
- from_key_value_pair_stream - takes in a stream of characters and parses it as a single JSON object. The key-value pairs are returned one at a time from the returned iterator.
- from_value_stream - takes in a stream of characters and parses it as a single JSON array. The values are returned one at a time from the returned iterator.
- key_value_pairs_to_json_stream - takes in an iterator of key-value pairs and returns an iterator of JSON string segments. These segments can then be written incrementally to an output target (typically a file or database).
- values_to_json_stream - takes in an iterator of serializable values and returns an iterator of JSON string segments. These segments can then be written incrementally to an output target (typically a file or database).
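The output direction can be sketched with the standard library alone. The code below is an illustration of the "iterator of JSON string segments" idea, not the crate's implementation: `u64s_to_json_segments` and `write_segments` are hypothetical names, the input is restricted to `u64` so no serializer is needed, and the exact segment boundaries are an assumption.

```rust
use std::io::{self, Write};

/// Hypothetical helper: lazily turns numbers into the text segments of a
/// JSON array. Nothing is formatted until the consumer asks for a segment.
fn u64s_to_json_segments<I>(values: I) -> impl Iterator<Item = String>
where
    I: IntoIterator<Item = u64>,
{
    let body = values.into_iter().enumerate().map(|(index, value)| {
        if index == 0 {
            value.to_string()
        } else {
            // Every value after the first carries its leading comma.
            format!(",{}", value)
        }
    });
    std::iter::once(String::from("["))
        .chain(body)
        .chain(std::iter::once(String::from("]")))
}

/// Hypothetical helper: writes each segment as soon as it is produced, so
/// only one segment is held in memory at a time.
fn write_segments<W: Write>(
    mut out: W,
    segments: impl Iterator<Item = String>,
) -> io::Result<()> {
    for segment in segments {
        out.write_all(segment.as_bytes())?;
    }
    out.flush()
}
```

Any `Write` target works here, for instance a `BufWriter` around a file, or a `Vec<u8>` when testing.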
Examples
Values API
The simplest example is reading from a file, aggregating, and writing to a file.
Let's say I have a 300 GB stream of object values where each object holds the
minimum, maximum, and average over a 5-minute window of data. This is placed in
myFile.json, which might look something like this:
If I want to find the global minimum and maximum values in this stream, I could do that with the following code:
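use std::fs::{File, OpenOptions};
use std::io::{BufReader, BufWriter, Write};
use utf8_chars::BufReadCharsExt;
use serde::{Deserialize, Serialize};
use streaming_serde_json::{from_value_stream, values_to_json_stream};

// The struct definitions and field names below are assumed; they are
// not part of the surviving text.
#[derive(Deserialize)]
struct InputData {
    min: i32,
    max: i32,
}

#[derive(Serialize)]
struct OutputData {
    global_min: i32,
    global_max: i32,
}

let mut reader =
    BufReader::new(File::open("./myFile.json").unwrap());
let chars = reader.chars();

// The value stream has a PassThroughError type parameter since most
// sources with data large enough to need this will be fallible to
// read from. The type arguments shown here are assumed.
let values = from_value_stream::<InputData, _, _>(chars);

let mut global_min = i32::MAX;
let mut global_max = i32::MIN;
for result in values {
    // Assumed loop body: stop on the first read or parse error.
    let data = result.unwrap();
    global_min = global_min.min(data.min);
    global_max = global_max.max(data.max);
}
let output_data = OutputData { global_min, global_max };

// Since the output data will be only one item, using the
// buffered writers and streaming JSON output functions
// is overkill, but I am using them to demonstrate their
// usage.
let output_file = OpenOptions::new()
    .read(true)
    .write(true)
    .create(true)
    .truncate(true)
    .open("./myResult.json") // placeholder path, not from the original
    .unwrap();
let mut writer = BufWriter::new(output_file);

// I am using an array, but the values_to_json_stream
// function will accept any type that implements IntoIterator
// and has an Item type that implements Serialize.
let iter = [output_data];
for str in values_to_json_stream(iter) {
    // Assumed loop body: each segment is written as UTF-8 text.
    writer.write_all(str.as_bytes()).unwrap();
}
writer.flush().unwrap();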
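The aggregation step can also be written without `unwrap` and without sentinel starting values. The sketch below uses only the standard library; `Window` and `global_min_max` are hypothetical names, and the field names are assumptions. It propagates the first read or parse error to the caller (the pass-through error idea) and returns `None` for an empty stream.

```rust
/// Hypothetical per-window summary; the field names are assumed.
struct Window {
    min: i32,
    max: i32,
}

/// Folds a fallible stream of windows into the global minimum and maximum.
/// Returns the first error encountered, or `Ok(None)` if the stream is empty.
fn global_min_max<I, E>(windows: I) -> Result<Option<(i32, i32)>, E>
where
    I: IntoIterator<Item = Result<Window, E>>,
{
    let mut extremes: Option<(i32, i32)> = None;
    for result in windows {
        // `?` hands the source's error straight back to the caller.
        let window = result?;
        extremes = Some(match extremes {
            None => (window.min, window.max),
            Some((low, high)) => (low.min(window.min), high.max(window.max)),
        });
    }
    Ok(extremes)
}
```

Because the function accepts any `IntoIterator` of `Result` items, it can be fed by a lazy value stream just as easily as by a `Vec` in a test.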
Key Value Pairs API
The simplest example is, again, reading from a file, making some modification, and writing to a file.
Let's say I have a 300 GB stream of key-value pairs where each value holds the
population of a city in the world. This is placed in worldPopulations.json,
which might look something like this:
If I want to find the global average population of a city, I could do that with the following code:
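use std::fs::{File, OpenOptions};
use std::io::{BufReader, BufWriter, Write};
use utf8_chars::BufReadCharsExt;
use streaming_serde_json::{from_key_value_pair_stream, key_value_pairs_to_json_stream};

let mut reader =
    BufReader::new(File::open("./worldPopulations.json").unwrap());
let chars = reader.chars();

// The value stream has a PassThroughError type parameter since most
// sources with data large enough to need this API will be fallible to
// read from. The type arguments shown here are assumed.
let values = from_key_value_pair_stream::<u64, _, _>(chars);

let mut global_total = 0_u128;
let mut city_count = 0_u32;
for result in values {
    // Assumed loop body and item shape: one (key, value) pair per city.
    let (_city, population) = result.unwrap();
    global_total += u128::from(population);
    city_count += 1;
}
// Note: this division panics if the input object is empty.
let global_average = global_total / city_count as u128;

// Since the output data will be only one item, using the
// buffered writers and streaming JSON output functions
// is overkill, but I am using them to demonstrate their
// usage.
let output_file = OpenOptions::new()
    .read(true)
    .write(true)
    .create(true)
    .truncate(true)
    .open("./myResult.json") // placeholder path, not from the original
    .unwrap();
let mut writer = BufWriter::new(output_file);

// I am using an array, but the key_value_pairs_to_json_stream
// function will accept any type that implements IntoIterator
// and yields key-value pairs (the exact item type is assumed here).
let iter = [("global_average", global_average)];
for str in key_value_pairs_to_json_stream(iter) {
    // Assumed loop body: each segment is written as UTF-8 text.
    writer.write_all(str.as_bytes()).unwrap();
}
writer.flush().unwrap();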
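One detail worth guarding against in the averaging step is an empty input object, which would make the division panic. The sketch below uses only the standard library and shows one way to handle it; `average_population` is a hypothetical name, and the `(String, u64)` item shape is an assumption made for illustration.

```rust
/// Hypothetical helper: computes the integer average population from a
/// fallible stream of (city, population) pairs.
/// Returns the first error encountered, or `Ok(None)` if there are no cities.
fn average_population<I, E>(pairs: I) -> Result<Option<u128>, E>
where
    I: IntoIterator<Item = Result<(String, u64), E>>,
{
    // u128 leaves ample headroom when summing many u64 values.
    let mut total = 0_u128;
    let mut count = 0_u128;
    for result in pairs {
        let (_city, population) = result?;
        total += u128::from(population);
        count += 1;
    }
    if count == 0 {
        // Avoid dividing by zero for an empty object.
        Ok(None)
    } else {
        Ok(Some(total / count))
    }
}
```

The running total and count are the only state kept between items, so memory use stays constant no matter how many pairs the stream yields.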