arrow-json 57.1.0

Support for parsing JSON format to and from the Arrow format

Transfer data between the Arrow memory format and JSON line-delimited records.

See the module level documentation for the [reader] and [writer] for usage examples.

Binary Data uses Base16 Encoding

As per RFC 7159, JSON cannot encode arbitrary binary data. This crate works around that limitation by encoding and decoding binary data as a hexadecimal string (i.e. Base16 encoding).
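
To make the hexadecimal representation concrete, here is a minimal standalone sketch of Base16 encoding and decoding using only the standard library. The helper names `hex_encode` and `hex_decode` are hypothetical and are not part of this crate's API, and the lowercase output is an assumption of the sketch rather than a statement about what the writer emits.

```rust
/// Hypothetical helper: encode bytes as a lowercase hexadecimal (Base16) string.
/// Each input byte becomes exactly two output characters.
fn hex_encode(bytes: &[u8]) -> String {
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    let mut out = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        out.push(DIGITS[(b >> 4) as usize] as char); // high nibble
        out.push(DIGITS[(b & 0x0f) as usize] as char); // low nibble
    }
    out
}

/// Hypothetical helper: decode a hexadecimal string back into bytes.
/// Returns `None` for odd-length input or non-hex characters.
fn hex_decode(s: &str) -> Option<Vec<u8>> {
    if s.len() % 2 != 0 {
        return None;
    }
    s.as_bytes()
        .chunks(2)
        .map(|pair| {
            let hi = (pair[0] as char).to_digit(16)?;
            let lo = (pair[1] as char).to_digit(16)?;
            Some((hi * 16 + lo) as u8)
        })
        .collect()
}
```

With these helpers, the three bytes `b"\xDE\x00\xFF"` encode to a six-character string, and decoding that string returns the original bytes.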

Note that Base16 only has 50% space efficiency (i.e., the encoded data is twice as large as the original). If that is an issue, we recommend converting binary data to and from a different encoding such as Base64 instead. See the Base64 encoding example below for details.
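
The space-efficiency figures can be checked with a little arithmetic. The sketch below computes encoded lengths for a payload of `n` bytes; the function names are hypothetical, and the Base64 formula assumes the padded standard alphabet, where every group of up to three input bytes becomes four output characters.

```rust
/// Hypothetical helper: length of the Base16 (hex) encoding of `n` bytes.
/// Two characters per byte.
fn base16_len(n: usize) -> usize {
    n * 2
}

/// Hypothetical helper: length of the padded standard Base64 encoding of `n` bytes.
/// Four characters per group of three bytes, rounding the group count up.
fn base64_padded_len(n: usize) -> usize {
    (n + 2) / 3 * 4
}
```

For a 3000-byte payload, `base16_len` gives 6000 characters (3000 / 6000 = 50% efficiency), while `base64_padded_len` gives 4000 characters (3000 / 4000 = 75% efficiency). For very small inputs the padding makes Base64 less favourable: a single byte still occupies four characters.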

Base64 Encoding Example

Base64 is a common binary-to-text encoding scheme with a space efficiency of 75%. The following example shows how to use the [arrow_cast] crate to encode binary data to Base64 before converting it to JSON and how to decode it back.

use std::io::Cursor;
use std::sync::Arc;
use arrow_array::{BinaryArray, RecordBatch, StringArray};
use arrow_array::cast::AsArray;
use arrow_cast::base64::{b64_decode, b64_encode, BASE64_STANDARD};
use arrow_json::{LineDelimitedWriter, ReaderBuilder};

// The data we want to write
let input = BinaryArray::from(vec![b"\xDE\x00\xFF".as_ref()]);

// Base64 encode it to a string
let encoded: StringArray = b64_encode(&BASE64_STANDARD, &input);

// Write the StringArray to JSON
let batch = RecordBatch::try_from_iter([("col", Arc::new(encoded) as _)]).unwrap();
let mut buf = Vec::with_capacity(1024);
let mut writer = LineDelimitedWriter::new(&mut buf);
writer.write(&batch).unwrap();
writer.finish().unwrap();

// Read the JSON data
let cursor = Cursor::new(buf);
let mut reader = ReaderBuilder::new(batch.schema()).build(cursor).unwrap();
let batch = reader.next().unwrap().unwrap();

// Reverse the base64 encoding
let col: BinaryArray = batch.column(0).as_string::<i32>().clone().into();
let output = b64_decode(&BASE64_STANDARD, &col).unwrap();

assert_eq!(input, output);