pub trait ObjectStoreRegistry: Send + Sync + Debug + 'static {
    // Required methods
    fn register_store(
        &self,
        url: &Url,
        store: Arc<dyn ObjectStore>
    ) -> Option<Arc<dyn ObjectStore>>;
    fn get_store(
        &self,
        url: &Url
    ) -> Result<Arc<dyn ObjectStore>, DataFusionError>;
}

ObjectStoreRegistry maps a URL to an ObjectStore instance, which allows DataFusion to read from several different ObjectStore instances. For example, DataFusion might be configured so that:

  1. s3://my_bucket/lineitem/ is mapped to the /lineitem path on an AWS S3 object store bound to my_bucket

  2. s3://my_other_bucket/lineitem/ is mapped to the (same) /lineitem path on a different AWS S3 object store bound to my_other_bucket

When given a ListingTableUrl, DataFusion tries to find an appropriate ObjectStore. For example:

create external table unicorns stored as parquet location 's3://my_bucket/lineitem/';

In this particular case, the URL s3://my_bucket/lineitem/ will be passed to ObjectStoreRegistry::get_store, and one of three things will happen:

  • If an ObjectStore has been registered for s3://my_bucket with ObjectStoreRegistry::register_store, that ObjectStore will be returned.

  • If an AWS S3 object store can be discovered ad hoc from the URL s3://my_bucket/lineitem/, that object store will be registered with the key s3://my_bucket and returned.

  • Otherwise, an error will be returned, indicating that no suitable ObjectStore could be found.
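The three outcomes above can be sketched with a small, dependency-free model. This is an illustration only, not DataFusion's implementation: `LazyRegistry`, `discover`, and the `Store` alias are hypothetical names, a string label stands in for `Arc<dyn ObjectStore>`, a `String` stands in for `DataFusionError`, and the lookup takes an already-derived key such as s3://my_bucket rather than a full URL.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Stand-in for `Arc<dyn ObjectStore>` (assumption: a label is enough here).
type Store = Arc<str>;

/// Hypothetical registry that creates stores lazily on lookup.
#[derive(Default)]
struct LazyRegistry {
    stores: Mutex<HashMap<String, Store>>,
}

impl LazyRegistry {
    /// Hypothetical discovery rule: only `s3://` keys can be created on demand.
    fn discover(key: &str) -> Option<Store> {
        key.starts_with("s3://")
            .then(|| Arc::from(format!("discovered {}", key)))
    }

    /// Looks up `key`, falling back to discovery, then to an error.
    fn get_store(&self, key: &str) -> Result<Store, String> {
        let mut stores = self.stores.lock().unwrap();
        // Outcome 1: a store was registered explicitly for this key.
        if let Some(store) = stores.get(key) {
            return Ok(Arc::clone(store));
        }
        // Outcome 2: discovery succeeds; register the store for later lookups.
        if let Some(store) = Self::discover(key) {
            stores.insert(key.to_string(), Arc::clone(&store));
            return Ok(store);
        }
        // Outcome 3: nothing suitable could be found.
        Err(format!("no suitable object store found for {}", key))
    }
}
```

In this model a second lookup of a discovered key takes the first branch, because the discovered store was inserted into the map.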

This allows for two different use-cases:

  1. Systems where object store buckets are explicitly created using DDL can register these buckets using ObjectStoreRegistry::register_store

  2. Systems relying on ad-hoc discovery, without corresponding DDL, can create ObjectStore instances lazily by providing a custom implementation of ObjectStoreRegistry

Required Methods


fn register_store( &self, url: &Url, store: Arc<dyn ObjectStore> ) -> Option<Arc<dyn ObjectStore>>

Registers store for the given URL. If a store was already registered under the same key, it is replaced and the previous store is returned; otherwise None is returned.
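A minimal sketch of the replace-and-return behavior, assuming a map behind a lock. `SketchRegistry` and the `Store` alias are hypothetical and are not DataFusion types; a string label stands in for `Arc<dyn ObjectStore>`, and the key is passed in directly instead of being derived from a `Url`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Stand-in for `Arc<dyn ObjectStore>` (assumption: a label is enough here).
type Store = Arc<str>;

/// Hypothetical registry, not DataFusion's implementation.
#[derive(Debug, Default)]
struct SketchRegistry {
    // The method takes `&self`, so the map needs interior mutability.
    stores: Mutex<HashMap<String, Store>>,
}

impl SketchRegistry {
    /// Registers `store` under `key` and returns the store it displaced, if any.
    fn register_store(&self, key: &str, store: Store) -> Option<Store> {
        // `HashMap::insert` already returns the previous value for the key.
        self.stores.lock().unwrap().insert(key.to_string(), store)
    }
}
```

Because the trait method takes `&self`, any implementation that stores registrations needs some form of interior mutability; a `Mutex` is just one choice for this sketch.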


fn get_store(&self, url: &Url) -> Result<Arc<dyn ObjectStore>, DataFusionError>

Get a suitable store for the provided URL. For example:

  • A URL of the form file:///, or one with no scheme, will return the default LocalFS store
  • A URL of the form s3://bucket/ will return the S3 store
  • A URL of the form hdfs://hostname:port/ will return the hdfs store

If no ObjectStore is found for the URL, ad-hoc discovery may be performed, depending on the URL and the ObjectStoreRegistry implementation. In that case an ObjectStore may be lazily created and registered.

Implementors


impl ObjectStoreRegistry for DefaultObjectStoreRegistry

Stores are registered by the scheme, host, and port of the provided URL; the path is not part of the key. A LocalFileSystem created with LocalFileSystem::new is registered automatically for file:// (unless the target architecture is wasm32).

For example:

  • file:///my_path will return the default LocalFS store
  • s3://bucket/path will return a store registered with s3://bucket if any
  • hdfs://host:port/path will return a store registered with hdfs://host:port if any
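The mapping from a URL to its registration key in the examples above can be approximated as follows. This is a simplified sketch on plain strings: `store_key` is a hypothetical helper, not a DataFusion function, and it does not handle details such as user info that a real URL parser would.

```rust
/// Hypothetical helper, not part of DataFusion: derives a registry key of the
/// form "scheme://host[:port]" from a URL string by dropping everything else.
fn store_key(url: &str) -> Option<String> {
    // Split "scheme://rest" into its two halves; reject strings without "://".
    let (scheme, rest) = url.split_once("://")?;
    if scheme.is_empty() {
        return None;
    }
    // The host and optional port end at the first '/', '?' or '#'.
    let end = rest
        .find(|c: char| matches!(c, '/' | '?' | '#'))
        .unwrap_or(rest.len());
    Some(format!("{}://{}", scheme, &rest[..end]))
}
```

Under this model, s3://bucket/path and s3://bucket/other/path produce the same key, so both would resolve to the one store registered for s3://bucket.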