Trait JsonHandler

pub trait JsonHandler: AsAny {
    // Required methods
    fn parse_json(
        &self,
        json_strings: Box<dyn EngineData>,
        output_schema: SchemaRef,
    ) -> DeltaResult<Box<dyn EngineData>>;
    fn read_json_files(
        &self,
        files: &[FileMeta],
        physical_schema: SchemaRef,
        predicate: Option<PredicateRef>,
    ) -> DeltaResult<FileDataReadResultIterator>;
    fn write_json_file(
        &self,
        path: &Url,
        data: Box<dyn Iterator<Item = DeltaResult<FilteredEngineData>> + Send + '_>,
        overwrite: bool,
    ) -> DeltaResult<()>;
}

Provides JSON handling functionality to Delta Kernel.

Delta Kernel can use this handler to parse JSON strings into rows of EngineData or to read content from JSON files. Connectors can implement this trait to provide their best JSON parsing implementation to Delta Kernel.

Required Methods§


fn parse_json( &self, json_strings: Box<dyn EngineData>, output_schema: SchemaRef, ) -> DeltaResult<Box<dyn EngineData>>

Parse the given JSON strings and return the fields requested by output_schema as columns in EngineData. json_strings MUST be a single-column batch of engine data, and the column's type must be string.
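The single-column contract above can be sketched with stand-in types. Note that `Column` and `validate_json_strings` below are hypothetical illustrations, not part of the kernel API; the real `EngineData` is an opaque, engine-defined type:

```rust
// Hypothetical stand-in for an engine data batch: a list of typed columns.
// The real `EngineData` is opaque and engine-defined.
#[derive(Debug)]
enum Column {
    Strings(Vec<String>),
    Ints(Vec<i64>),
}

// Sketch of the precondition `parse_json` places on `json_strings`:
// exactly one column, and that column must be string-typed.
fn validate_json_strings(batch: &[Column]) -> Result<&[String], String> {
    match batch {
        [Column::Strings(s)] => Ok(s.as_slice()),
        [_] => Err("the single column must be string-typed".to_string()),
        _ => Err(format!("expected 1 column, got {}", batch.len())),
    }
}
```

An engine implementation would typically apply such a check up front and then hand the string column to its JSON parser.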


fn read_json_files( &self, files: &[FileMeta], physical_schema: SchemaRef, predicate: Option<PredicateRef>, ) -> DeltaResult<FileDataReadResultIterator>

Read and parse the JSON format file at the given locations and return the data as EngineData with the columns requested by physical_schema.

Note: The FileDataReadResultIterator must emit data from files in the order that files is given. For example, if files ["a", "b"] is provided, then the engine data iterator must first return all the engine data from file "a", then all the engine data from file "b". Moreover, for a given file, all of its EngineData and constituent rows must be in the order that they occur in the file. Consider a file with rows (1, 2, 3). The following are legal iterator batches:

  • iter: [EngineData(1, 2), EngineData(3)]
  • iter: [EngineData(1), EngineData(2, 3)]
  • iter: [EngineData(1, 2, 3)]

The following are illegal batches:

  • iter: [EngineData(3), EngineData(1, 2)]
  • iter: [EngineData(1), EngineData(3, 2)]
  • iter: [EngineData(2, 1, 3)]

§Parameters
  • files - File metadata for files to be read.
  • physical_schema - Select list of columns to read from the JSON file.
  • predicate - Optional push-down predicate hint (engine is free to ignore it).
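The ordering contract can be checked mechanically. The sketch below is hypothetical (the real iterator yields opaque EngineData batches, not row ids); it models each batch as a (file, row ids) pair, with row ids assumed to be numbered in file order:

```rust
// Hypothetical model of an emitted batch: the source file name plus the
// row ids the batch carries (ids assumed numbered in file order).
type Batch = (&'static str, Vec<u32>);

// Checks the `read_json_files` ordering contract: batches must be grouped
// by file, files in the order given, and rows within a file in file order.
fn batches_in_order(files: &[&str], batches: &[Batch]) -> bool {
    let mut file_idx = 0; // position in `files` currently being emitted
    let mut last_row: Option<u32> = None;
    for (file, rows) in batches {
        // Advance to this batch's file; we may only move forward.
        while file_idx < files.len() && files[file_idx] != *file {
            file_idx += 1;
            last_row = None; // row ordering restarts for each new file
        }
        if file_idx == files.len() {
            return false; // file emitted out of order (or unknown file)
        }
        for &r in rows {
            if last_row.map_or(false, |prev| r <= prev) {
                return false; // rows out of order within a file
            }
            last_row = Some(r);
        }
    }
    true
}
```

For files ["a", "b"], the batches [("a", [1, 2]), ("a", [3]), ("b", [1])] pass this check, while [("b", [1]), ("a", [1, 2])] and [("a", [2, 1])] fail.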

fn write_json_file( &self, path: &Url, data: Box<dyn Iterator<Item = DeltaResult<FilteredEngineData>> + Send + '_>, overwrite: bool, ) -> DeltaResult<()>

Atomically (!) write a single JSON file. Each row of the input data should be written as a new JSON object appended to the file. This write must: (1) serialize the data to newline-delimited JSON (each row is a JSON object literal), and (2) write the data to storage atomically (i.e., if the file already exists, fail unless the overwrite flag is set).

For example, the JSON data should be written as { "column1": "val1", "column2": "val2", .. } with each row on a new line.

NOTE: Null columns should not be written to the JSON file. For example, if a row has columns ["a", "b"] and the value of "b" is null, the JSON object should be written as { "a": "..." }. Including nulls is technically valid JSON, but it would bloat the log; we therefore recommend omitting them.

§Parameters
  • path - URL specifying the location to write the JSON file
  • data - Iterator of FilteredEngineData to write to the JSON file. Each row should be written as a new JSON object appended to the file (that is, the file is newline-delimited JSON, with each row as a JSON object on a single line).
  • overwrite - If true, overwrite the file if it exists. If false, the call must fail if the file exists.
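A minimal sketch of these requirements on a local filesystem, assuming a simplified, hypothetical row type (the real API takes opaque FilteredEngineData batches; proper JSON escaping and typed values are omitted here):

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

// Hypothetical row shape: column name -> optional string value.
type Row = Vec<(String, Option<String>)>;

fn write_ndjson(path: &Path, rows: &[Row], overwrite: bool) -> std::io::Result<()> {
    let mut opts = OpenOptions::new();
    opts.write(true);
    if overwrite {
        opts.create(true).truncate(true);
    } else {
        // create_new maps to O_EXCL: the open fails with AlreadyExists if
        // the file is present, giving the required atomic "fail if exists"
        // behavior on a local filesystem. Object stores need their own
        // conditional-put primitive for the same guarantee.
        opts.create_new(true);
    }
    let mut file = opts.open(path)?;
    for row in rows {
        // Null columns are omitted entirely rather than written as `"k": null`.
        let fields: Vec<String> = row
            .iter()
            .filter_map(|(k, v)| v.as_ref().map(|v| format!(r#""{}":"{}""#, k, v)))
            .collect();
        // One JSON object literal per line (newline-delimited JSON).
        writeln!(file, "{{{}}}", fields.join(","))?;
    }
    Ok(())
}
```

With overwrite set to false, a second call against the same path returns an error instead of clobbering the existing file, which is the behavior the trait requires.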

Implementors§


impl<E: TaskExecutor> JsonHandler for DefaultJsonHandler<E>

Available on crate feature default-engine-base and (crate features default-engine-native-tls or default-engine-rustls or arrow-conversion) only.