pub trait DFExtensionType: Debug + Send + Sync {
    // Required methods
    fn storage_type(&self) -> DataType;
    fn serialize_metadata(&self) -> Option<String>;

    // Provided method
    fn create_array_formatter<'fmt>(
        &self,
        _array: &'fmt dyn Array,
        _options: &FormatOptions<'fmt>,
    ) -> Result<Option<ArrayFormatter<'fmt>>> { ... }
}
Represents an implementation of a DataFusion extension type, including the storage DataType.
While, in general, an extension type can support several different storage types, a specific
instance of it is always locked into just one exact storage type and metadata pairing.
This trait allows users to customize the behavior of DataFusion for certain types. This is necessary because an extension type affects how the query engine should treat a column: which operations are possible on it and what results those operations are expected to produce. The extension type mechanism lets users define how these operations apply to a particular extension type.
For example, adding two values of Int64 is a sensible
thing to do. However, if the same column is annotated with an extension type like custom.id,
the correct interpretation of the column changes. Adding two custom.id values together, even
though they are stored as integers, may no longer make sense.
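The custom.id example can be sketched as a small check on a column description. This is only an illustration of the idea, not DataFusion behavior: `StorageSketch`, `ColumnSketch`, and `addition_is_meaningful` are hypothetical names, and the policy that any extension annotation disables addition is an assumption made for the sketch.

```rust
/// Hypothetical, reduced stand-in for a storage type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StorageSketch {
    Int64,
    Utf8,
}

/// Hypothetical column description: a storage type plus an optional
/// extension type name such as "custom.id".
#[derive(Debug)]
pub struct ColumnSketch {
    pub storage: StorageSketch,
    pub extension_name: Option<&'static str>,
}

/// Assumed policy for this sketch only: addition is meaningful for plain
/// Int64 columns, but not once an extension type is attached.
pub fn addition_is_meaningful(col: &ColumnSketch) -> bool {
    matches!(
        (col.storage, col.extension_name),
        (StorageSketch::Int64, None)
    )
}
```

The point is that the storage type alone (Int64 in both cases) is not enough to decide how a column should be treated; the extension annotation has to be consulted as well.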
Note that DataFusion’s extension type support is still young and therefore might not cover all relevant use cases. Currently, the following operations can be customized:
- Pretty-printing values in record batches
§Relation to Arrow’s ExtensionType
The purpose of Arrow’s ExtensionType trait, for the
time being, is to allow reading and writing extension type metadata in a type-safe manner. The
trait does not provide any customization options. Therefore, downstream users (such as
DataFusion) have the flexibility to implement the extension type mechanism according to their
needs. DFExtensionType is DataFusion’s implementation of this extension type mechanism.
Furthermore, the current trait in arrow-rs is not dyn-compatible, which we need for implementing extension type registries. In the future, the two implementations may increasingly converge.
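To illustrate why dyn compatibility matters for a registry, here is a minimal sketch. It does not use the real DataFusion or arrow-rs types: `DynExtensionType` is a simplified stand-in trait, and `RegistrySketch`, `CustomId`, and `Tagged` are hypothetical names. Because the stand-in trait only has `&self` methods, differently typed implementations can be stored together behind `Arc<dyn DynExtensionType>`.

```rust
use std::collections::HashMap;
use std::fmt::Debug;
use std::sync::Arc;

/// Simplified stand-in trait (not the real `DFExtensionType`).
/// It has only `&self` methods, so it can be used as a trait object.
pub trait DynExtensionType: Debug + Send + Sync {
    fn serialize_metadata(&self) -> Option<String>;
}

/// Hypothetical extension type without metadata.
#[derive(Debug)]
pub struct CustomId;

impl DynExtensionType for CustomId {
    fn serialize_metadata(&self) -> Option<String> {
        None
    }
}

/// Hypothetical extension type that carries a metadata string.
#[derive(Debug)]
pub struct Tagged {
    pub tag: String,
}

impl DynExtensionType for Tagged {
    fn serialize_metadata(&self) -> Option<String> {
        Some(self.tag.clone())
    }
}

/// Hypothetical registry: maps an extension type name to a trait object.
#[derive(Debug, Default)]
pub struct RegistrySketch {
    types: HashMap<String, Arc<dyn DynExtensionType>>,
}

impl RegistrySketch {
    pub fn register(&mut self, name: &str, ty: Arc<dyn DynExtensionType>) {
        self.types.insert(name.to_string(), ty);
    }

    pub fn get(&self, name: &str) -> Option<Arc<dyn DynExtensionType>> {
        self.types.get(name).cloned()
    }
}
```

A trait that requires `Sized`, associated constants, or constructors returning `Self` cannot be stored this way, which is the limitation the paragraph above refers to.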
Another difference is that DFExtensionType represents a fully resolved extension type that
has a fixed storage type (i.e., DataType). This is different from arrow-rs, which only
stores the extension type’s metadata. For example, an instance of DataFusion’s JSON extension
type fixes one of the three possible storage types: DataType::Utf8,
DataType::LargeUtf8, or DataType::Utf8View. This fixed storage type is returned by
DFExtensionType::storage_type. This is not possible in arrow-rs’ extension type instances.
This is why DataFusion defines its own types (e.g., DFJson instead of Json), which usually
delegate the metadata processing to the underlying arrow-rs extension type instance.
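The idea of a fully resolved instance can be sketched without the real crates. Everything below is a stand-in: `DataType` is a reduced local enum rather than Arrow's `DataType`, `ExtensionTypeSketch` mirrors only the two required methods, and `JsonSketch` is a hypothetical type, not the actual DFJson implementation.

```rust
use std::fmt::Debug;

/// Stand-in for Arrow's `DataType`, reduced to the variants needed here.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
    LargeUtf8,
    Utf8View,
}

/// Simplified mirror of the two required methods of `DFExtensionType`.
pub trait ExtensionTypeSketch: Debug + Send + Sync {
    fn storage_type(&self) -> DataType;
    fn serialize_metadata(&self) -> Option<String>;
}

/// Hypothetical JSON extension type instance, resolved to one storage type.
#[derive(Debug)]
pub struct JsonSketch {
    storage: DataType,
}

impl JsonSketch {
    /// Accepts only the three string-like storage types named in the text.
    pub fn try_new(storage: DataType) -> Result<Self, String> {
        match storage {
            DataType::Utf8 | DataType::LargeUtf8 | DataType::Utf8View => {
                Ok(Self { storage })
            }
            other => Err(format!("unsupported storage type for JSON: {other:?}")),
        }
    }
}

impl ExtensionTypeSketch for JsonSketch {
    fn storage_type(&self) -> DataType {
        // The instance always reports the single storage type it was built with.
        self.storage.clone()
    }

    fn serialize_metadata(&self) -> Option<String> {
        // Assumption: this sketch carries no metadata payload.
        None
    }
}
```

The extension type as a concept supports three storage types, but each `JsonSketch` value is pinned to exactly one of them at construction time.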
§Examples
Examples of using the extension type machinery can be found in the DataFusion examples directory.
Required Methods§
fn storage_type(&self) -> DataType
Returns the underlying storage type.
fn serialize_metadata(&self) -> Option<String>
Returns the serialized metadata.
Provided Methods§
fn create_array_formatter<'fmt>(
    &self,
    _array: &'fmt dyn Array,
    _options: &FormatOptions<'fmt>,
) -> Result<Option<ArrayFormatter<'fmt>>>
Returns an ArrayFormatter that can format values of this type.
If Ok(None) is returned, the default implementation will be used.
If an error is returned, creating the formatter failed.
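The three outcomes described above (a custom formatter, Ok(None) for the default, or an error) can be sketched with stand-in types. This is not the real signature: `FormatterSketch` is a boxed closure standing in for ArrayFormatter, errors are plain strings, and `FormatHook`, `PlainInt`, `CustomId`, `Broken`, and `render` are hypothetical names.

```rust
/// Stand-in for a formatter: turns one stored value into display text.
pub type FormatterSketch = Box<dyn Fn(i64) -> String>;

/// Simplified mirror of the provided method: the default returns `Ok(None)`.
pub trait FormatHook {
    fn create_formatter(&self) -> Result<Option<FormatterSketch>, String> {
        Ok(None)
    }
}

/// Keeps the provided default, so callers fall back to default formatting.
pub struct PlainInt;
impl FormatHook for PlainInt {}

/// Overrides the hook to supply a custom formatter.
pub struct CustomId;
impl FormatHook for CustomId {
    fn create_formatter(&self) -> Result<Option<FormatterSketch>, String> {
        Ok(Some(Box::new(|v: i64| format!("id-{v}"))))
    }
}

/// Simulates a failure while creating the formatter.
pub struct Broken;
impl FormatHook for Broken {
    fn create_formatter(&self) -> Result<Option<FormatterSketch>, String> {
        Err("cannot create formatter".to_string())
    }
}

/// Caller side: use the custom formatter if present, otherwise the default;
/// errors are propagated with `?`.
pub fn render(ty: &dyn FormatHook, value: i64) -> Result<String, String> {
    match ty.create_formatter()? {
        Some(custom) => Ok(custom(value)),
        None => Ok(value.to_string()),
    }
}
```

Returning `Option` inside `Result` keeps "no customization" distinct from "customization failed", so a caller never mistakes an error for a request to use the default.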
Dyn Compatibility§
This trait is dyn compatible.
In older versions of Rust, dyn compatibility was called "object safety".