Trait dbcrossbarlib::Locator

pub trait Locator: Debug + Display + Send + Sync + 'static {
    fn as_any(&self) -> &dyn Any;

    fn schema(&self, _ctx: Context) -> BoxFuture<Option<Table>> { ... }

    fn write_schema(
        &self,
        _ctx: Context,
        _schema: Table,
        _if_exists: IfExists
    ) -> BoxFuture<()> { ... }

    fn local_data(
        &self,
        _ctx: Context,
        _shared_args: SharedArguments<Unverified>,
        _source_args: SourceArguments<Unverified>
    ) -> BoxFuture<Option<BoxStream<CsvStream>>> { ... }

    fn display_output_locators(&self) -> DisplayOutputLocators { ... }

    fn write_local_data(
        &self,
        _ctx: Context,
        _data: BoxStream<CsvStream>,
        _shared_args: SharedArguments<Unverified>,
        _dest_args: DestinationArguments<Unverified>
    ) -> BoxFuture<BoxStream<BoxFuture<BoxLocator>>> { ... }

    fn supports_write_remote_data(&self, _source: &dyn Locator) -> bool { ... }

    fn write_remote_data(
        &self,
        _ctx: Context,
        source: BoxLocator,
        _shared_args: SharedArguments<Unverified>,
        _source_args: SourceArguments<Unverified>,
        _dest_args: DestinationArguments<Unverified>
    ) -> BoxFuture<Vec<BoxLocator>> { ... }
}

Specify the location of data or a schema.

Required methods

fn as_any(&self) -> &dyn Any

Provide a mechanism for casting a dyn Locator back to the underlying, concrete locator type using Rust's Any type.

See this StackOverflow question for a discussion of the technical details, and why we need a Locator::as_any method to use Any.

This is a somewhat sketchy feature to provide, but we need it for supports_write_remote_data and write_remote_data. Certain locator pairs (e.g., Google Cloud Storage and BigQuery) use these methods to bypass our normal local_data and write_local_data transfers in favor of an external, optimized transfer method (such as direct loads from Google Cloud Storage into BigQuery).

This should always be implemented as follows:

impl Locator for MyLocator {
    fn as_any(&self) -> &dyn Any {
        self
    }
}
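Once as_any is implemented, callers can recover the concrete locator type with downcast_ref. A minimal, self-contained sketch (MyLocator, its path field, and the simplified trait here are hypothetical stand-ins for illustration only):

```rust
use std::any::Any;

// Hypothetical locator type, for illustration only.
#[derive(Debug)]
struct MyLocator {
    path: String,
}

// Simplified stand-in for the real Locator trait.
trait Locator {
    fn as_any(&self) -> &dyn Any;
}

impl Locator for MyLocator {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

fn main() {
    let locator: Box<dyn Locator> = Box::new(MyLocator {
        path: "/tmp/data.csv".to_owned(),
    });
    // Cast the trait object back to the concrete type.
    if let Some(concrete) = locator.as_any().downcast_ref::<MyLocator>() {
        println!("{}", concrete.path);
    }
}
```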

Provided methods

fn schema(&self, _ctx: Context) -> BoxFuture<Option<Table>>

Return a table schema, if available.

fn write_schema(
    &self,
    _ctx: Context,
    _schema: Table,
    _if_exists: IfExists
) -> BoxFuture<()>

Write a table schema to this locator, if that's the sort of thing that we can do.

fn local_data(
    &self,
    _ctx: Context,
    _shared_args: SharedArguments<Unverified>,
    _source_args: SourceArguments<Unverified>
) -> BoxFuture<Option<BoxStream<CsvStream>>>

If this locator can be used as a local data source, return a stream of CSV streams. This function type is a bit hairy:

  1. The outermost BoxFuture is essentially an async Result, returning either a value or an error. It's boxed because we don't know what concrete type it will actually be, just that it will implement Future.
  2. The Option will be None if we have no local data, or Some if we can provide one or more CSV streams.
  3. The BoxStream returns a "stream of streams". This could be a Vec<CsvStream>, but that would force us to, say, open up hundreds of CSV files or S3 objects at once, causing us to run out of file descriptors. By returning a stream, we allow our caller to open up files or start downloads only when needed.
  4. The innermost CsvStream is a stream of raw CSV data plus some other information, like the original filename.
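The laziness described in point 3 is the same idea as a lazy iterator of iterators: no inner source is opened until the caller pulls it. A synchronous analogy using std iterators (the real API uses async streams, and open_source here is a hypothetical helper):

```rust
// Synchronous analogy for the "stream of streams" design: each inner
// data source is opened only when the caller actually reaches it.
fn open_source(name: &str) -> Vec<u8> {
    // In dbcrossbar, this is where a file would be opened or an S3
    // download started -- on demand, not all at once.
    name.bytes().collect()
}

fn main() {
    let names = ["a.csv", "b.csv", "c.csv"];
    // `map` is lazy: no source has been opened yet.
    let sources = names.iter().map(|name| open_source(name));
    // Only the sources we pull are ever opened.
    for data in sources.take(2) {
        println!("{} bytes", data.len());
    }
}
```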

fn display_output_locators(&self) -> DisplayOutputLocators

Should we display the individual output locations?

fn write_local_data(
    &self,
    _ctx: Context,
    _data: BoxStream<CsvStream>,
    _shared_args: SharedArguments<Unverified>,
    _dest_args: DestinationArguments<Unverified>
) -> BoxFuture<BoxStream<BoxFuture<BoxLocator>>>

If this locator can be used as a local data sink, write data to it.

This function takes a stream of data as input, the elements of which are individual CsvStream values. An implementation should normally use map or and_then to write those CSV streams to storage associated with the locator, and return a stream of BoxFuture<BoxLocator> values:

// Pseudocode for parallel output. Each item in the returned stream is
// a future that writes one CSV stream and resolves to the locator of
// the data it wrote.
data.map(|csv_stream| {
    async move {
        let written_to = write(csv_stream).await?;
        Ok(written_to)
    }
    .boxed()
})

For cases where output must be serialized, it's OK to consume the entire data stream and return a single-item stream containing the destination locator.

The caller of write_local_data will pull several items at a time from the returned BoxStream<BoxFuture<BoxLocator>> and evaluate them in parallel.

fn supports_write_remote_data(&self, _source: &dyn Locator) -> bool

Can we access the data at source directly using write_remote_data?
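A typical implementation uses as_any to check the source's concrete type. A hedged sketch with a simplified trait, where GsLocator and BigQueryLocator are stand-ins for the real types and mirror the Cloud Storage to BigQuery example above:

```rust
use std::any::Any;

// Simplified stand-in for the real Locator trait.
trait Locator {
    fn as_any(&self) -> &dyn Any;
    fn supports_write_remote_data(&self, _source: &dyn Locator) -> bool {
        false
    }
}

// Hypothetical locator types, for illustration only.
struct GsLocator;
struct BigQueryLocator;

impl Locator for GsLocator {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Locator for BigQueryLocator {
    fn as_any(&self) -> &dyn Any {
        self
    }

    // BigQuery can load directly from Cloud Storage, so report support
    // whenever the source is a GsLocator.
    fn supports_write_remote_data(&self, source: &dyn Locator) -> bool {
        source.as_any().downcast_ref::<GsLocator>().is_some()
    }
}

fn main() {
    let dest = BigQueryLocator;
    assert!(dest.supports_write_remote_data(&GsLocator));
    assert!(!dest.supports_write_remote_data(&BigQueryLocator));
}
```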

fn write_remote_data(
    &self,
    _ctx: Context,
    source: BoxLocator,
    _shared_args: SharedArguments<Unverified>,
    _source_args: SourceArguments<Unverified>,
    _dest_args: DestinationArguments<Unverified>
) -> BoxFuture<Vec<BoxLocator>>

Take the data at source, and write to this locator directly, without passing it through the local system.

This is used to bypass source.local_data and dest.write_local_data when we don't need them.


Implementors

impl Locator for BigQueryLocator

impl Locator for BigQuerySchemaLocator

impl Locator for DbcrossbarSchemaLocator

impl Locator for PostgresLocator

impl Locator for PostgresSqlLocator

impl Locator for RedshiftLocator
