Struct JsonFormat 

pub struct JsonFormat { /* private fields */ }

JSON FileFormat implementation supporting both line-delimited and array formats.

§Supported Formats

§Line-Delimited JSON (default, newline_delimited = true)

{"key1": 1, "key2": "val"}
{"key1": 2, "key2": "vals"}

§JSON Array Format (newline_delimited = false)

[
    {"key1": 1, "key2": "val"},
    {"key1": 2, "key2": "vals"}
]

Note: JSON array format is processed using streaming conversion, which is memory-efficient even for large files.
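One way to picture the streaming conversion mentioned in the note is a single pass over the array that tracks nesting depth and string state, emitting each top-level element as its own line. The sketch below is illustrative only: `array_to_ndjson` and `flush` are hypothetical helpers, not DataFusion's implementation, and the function works on an in-memory `&str` for brevity where a real reader would consume chunks of bytes.

```rust
/// Moves the buffered element (if any) into the output list.
fn flush(current: &mut String, records: &mut Vec<String>) {
    if !current.is_empty() {
        records.push(std::mem::take(current));
    }
}

/// Splits a JSON array into one string per top-level element.
/// Hypothetical helper for illustration; it does not validate the JSON
/// beyond bracket depth and does not check that bracket kinds match.
fn array_to_ndjson(input: &str) -> Result<Vec<String>, String> {
    let mut records = Vec::new();
    let mut current = String::new();
    let mut depth = 0usize; // 1 means "directly inside the outer array"
    let mut started = false; // true once the outer '[' has been seen
    let mut in_string = false;
    let mut escaped = false;

    for c in input.chars() {
        if in_string {
            // Inside a string literal, brackets and commas are plain data.
            current.push(c);
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        if depth == 0 {
            if c.is_whitespace() {
                continue;
            }
            if c == '[' && !started {
                started = true;
                depth = 1;
                continue;
            }
            return Err(format!("unexpected character {:?} outside the array", c));
        }
        match c {
            '"' => {
                in_string = true;
                current.push(c);
            }
            '[' | '{' => {
                depth += 1;
                current.push(c);
            }
            ']' | '}' => {
                depth -= 1;
                if depth == 0 {
                    // The outer array closed: emit the last element.
                    flush(&mut current, &mut records);
                } else {
                    current.push(c);
                }
            }
            // A comma directly inside the outer array separates elements.
            ',' if depth == 1 => flush(&mut current, &mut records),
            // Whitespace between elements is dropped.
            _ if depth == 1 && c.is_whitespace() => {}
            _ => current.push(c),
        }
    }
    if !started || depth != 0 {
        return Err("input is not a complete JSON array".to_string());
    }
    Ok(records)
}
```

Because only the current element is buffered, memory use in this sketch grows with the largest single element rather than with the whole file, which is the property the note describes.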

Implementations§

impl JsonFormat

pub fn with_options(self, options: JsonOptions) -> Self

Sets the JSON options.

pub fn options(&self) -> &JsonOptions

Returns a reference to the JSON options.

pub fn with_schema_infer_max_rec(self, max_rec: usize) -> Self

Sets the maximum number of records to scan when inferring the schema.

  • defaults to DEFAULT_SCHEMA_INFER_MAX_RECORD
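To illustrate why the record limit matters, here is a minimal sketch of bounded inference. Everything in it is hypothetical (`SketchType`, `merge`, `infer_schema_sketch`), records are assumed to be already parsed into field/type pairs, and the widening rule is an assumption made for the example rather than DataFusion's actual merge behaviour.

```rust
use std::collections::BTreeMap;

/// Simplified value types, for this sketch only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchType {
    Int,
    Float,
    Text,
}

/// Widening rule assumed for this sketch: Int and Float merge to Float,
/// any other mismatch falls back to Text.
fn merge(a: SketchType, b: SketchType) -> SketchType {
    use SketchType::*;
    match (a, b) {
        (x, y) if x == y => x,
        (Int, Float) | (Float, Int) => Float,
        _ => Text,
    }
}

/// Infers a field-to-type map from at most `max_rec` records.
fn infer_schema_sketch<'a, I>(records: I, max_rec: usize) -> BTreeMap<&'a str, SketchType>
where
    I: IntoIterator<Item = Vec<(&'a str, SketchType)>>,
{
    let mut schema = BTreeMap::new();
    // Records beyond the limit are never inspected.
    for record in records.into_iter().take(max_rec) {
        for (field, ty) in record {
            schema
                .entry(field)
                .and_modify(|seen| *seen = merge(*seen, ty))
                .or_insert(ty);
        }
    }
    schema
}
```

In this sketch a small limit keeps inference cheap but can miss fields or wider types that only appear later in the data, which is the trade-off to keep in mind when choosing `max_rec`.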

pub fn with_file_compression_type(self, file_compression_type: FileCompressionType) -> Self

Sets the FileCompressionType used for JSON files.

  • defaults to FileCompressionType::UNCOMPRESSED

pub fn with_newline_delimited(self, newline_delimited: bool) -> Self

Set whether to read as newline-delimited JSON (NDJSON).

When true (default), expects newline-delimited format:

{"a": 1}
{"a": 2}

When false, expects JSON array format:

[{"a": 1}, {"a": 2}]

pub fn is_newline_delimited(&self) -> bool

Returns whether this format expects newline-delimited JSON.
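The `with_*` methods above take `self` by value and return `Self`, so they chain in builder style. The following sketch shows the pattern with a stand-in type so that it compiles without external crates. `JsonFormatSketch` and its fields are hypothetical; only the method names, signatures, and the `true` default for `newline_delimited` are taken from this page.

```rust
/// Stand-in for illustration; the real struct has private fields.
#[derive(Debug, Clone)]
struct JsonFormatSketch {
    newline_delimited: bool,
    schema_infer_max_rec: Option<usize>,
}

impl Default for JsonFormatSketch {
    fn default() -> Self {
        Self {
            // Line-delimited JSON is the documented default.
            newline_delimited: true,
            // `None` stands in for "use the default limit" in this sketch.
            schema_infer_max_rec: None,
        }
    }
}

impl JsonFormatSketch {
    /// Consumes the value, updates one setting, and returns it for chaining.
    fn with_newline_delimited(mut self, newline_delimited: bool) -> Self {
        self.newline_delimited = newline_delimited;
        self
    }

    fn with_schema_infer_max_rec(mut self, max_rec: usize) -> Self {
        self.schema_infer_max_rec = Some(max_rec);
        self
    }

    fn is_newline_delimited(&self) -> bool {
        self.newline_delimited
    }
}
```

With the real type, the equivalent call to read JSON array files would presumably be `JsonFormat::default().with_newline_delimited(false)`, using the `Default` implementation listed under Trait Implementations.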

Trait Implementations§

impl Debug for JsonFormat

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for JsonFormat

fn default() -> JsonFormat

Returns the “default value” for a type.

impl FileFormat for JsonFormat

fn as_any(&self) -> &dyn Any

Returns this file format as Any so that it can be downcast to a specific implementation.
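The `as_any` pattern lets code holding a trait object recover the concrete type. The sketch below uses only `std::any::Any`; the trait `FormatSketch` and the types `JsonSketch` and `CsvSketch` are hypothetical stand-ins for `FileFormat` and its implementors.

```rust
use std::any::Any;

/// Hypothetical stand-in for a trait that exposes `as_any`.
trait FormatSketch {
    fn as_any(&self) -> &dyn Any;
}

struct JsonSketch {
    newline_delimited: bool,
}

struct CsvSketch;

impl FormatSketch for JsonSketch {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl FormatSketch for CsvSketch {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Returns the JSON-specific setting if `format` is a `JsonSketch`,
/// and `None` for any other implementor.
fn newline_delimited_of(format: &dyn FormatSketch) -> Option<bool> {
    format
        .as_any()
        .downcast_ref::<JsonSketch>()
        .map(|json| json.newline_delimited)
}
```

`downcast_ref` returns `None` when the concrete type does not match, so callers can probe for a specific format without panicking.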

fn get_ext(&self) -> String

Returns the extension for this FileFormat, e.g. “file.csv” -> csv

fn get_ext_with_compression(&self, file_compression_type: &FileCompressionType) -> Result<String>

Returns the extension for this FileFormat when compressed, e.g. “file.csv.gz” -> csv

fn compression_type(&self) -> Option<FileCompressionType>

Returns the compression type used by this instance, if applicable.

fn infer_schema<'life0, 'life1, 'life2, 'life3, 'async_trait>( &'life0 self, _state: &'life1 dyn Session, store: &'life2 Arc<dyn ObjectStore>, objects: &'life3 [ObjectMeta], ) -> Pin<Box<dyn Future<Output = Result<SchemaRef>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait, 'life2: 'async_trait, 'life3: 'async_trait,

Infer the common schema of the provided objects. The objects are usually analysed up to a given number of records or files (as specified in the format config), and the estimated common schema is returned. Inference can fail if the files have schemas that cannot be merged.

fn infer_stats<'life0, 'life1, 'life2, 'life3, 'async_trait>( &'life0 self, _state: &'life1 dyn Session, _store: &'life2 Arc<dyn ObjectStore>, table_schema: SchemaRef, _object: &'life3 ObjectMeta, ) -> Pin<Box<dyn Future<Output = Result<Statistics>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait, 'life2: 'async_trait, 'life3: 'async_trait,

Infer the statistics for the provided object. The cost and accuracy of the estimated statistics might vary greatly between file formats.

fn create_physical_plan<'life0, 'life1, 'async_trait>( &'life0 self, _state: &'life1 dyn Session, conf: FileScanConfig, ) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait,

Take a list of files and convert it to the appropriate executor according to this file format.

fn create_writer_physical_plan<'life0, 'life1, 'async_trait>( &'life0 self, input: Arc<dyn ExecutionPlan>, _state: &'life1 dyn Session, conf: FileSinkConfig, order_requirements: Option<LexRequirement>, ) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait,

Take a list of files and the configuration to convert it to the appropriate writer executor according to this file format.

fn file_source(&self, table_schema: TableSchema) -> Arc<dyn FileSource>

Return the related FileSource such as CsvSource, JsonSource, etc.

fn infer_ordering<'life0, 'life1, 'life2, 'life3, 'async_trait>( &'life0 self, _state: &'life1 dyn Session, _store: &'life2 Arc<dyn ObjectStore>, _table_schema: Arc<Schema>, _object: &'life3 ObjectMeta, ) -> Pin<Box<dyn Future<Output = Result<Option<LexOrdering>, DataFusionError>> + Send + 'async_trait>>
where 'life0: 'async_trait, 'life1: 'async_trait, 'life2: 'async_trait, 'life3: 'async_trait, Self: 'async_trait,

Infer the ordering (sort order) for the provided object from file metadata.

fn infer_stats_and_ordering<'life0, 'life1, 'life2, 'life3, 'async_trait>( &'life0 self, state: &'life1 dyn Session, store: &'life2 Arc<dyn ObjectStore>, table_schema: Arc<Schema>, object: &'life3 ObjectMeta, ) -> Pin<Box<dyn Future<Output = Result<FileMeta, DataFusionError>> + Send + 'async_trait>>
where 'life0: 'async_trait, 'life1: 'async_trait, 'life2: 'async_trait, 'life3: 'async_trait, Self: 'async_trait,

Infer both statistics and ordering from a single metadata read.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,