Trait FileSink

Source
pub trait FileSink: DataSink {
    // Required methods
    fn config(&self) -> &FileSinkConfig;
    fn spawn_writer_tasks_and_join<'life0, 'life1, 'async_trait>(
        &'life0 self,
        context: &'life1 Arc<TaskContext>,
        demux_task: SpawnedTask<Result<()>>,
        file_stream_rx: DemuxedStreamReceiver,
        object_store: Arc<dyn ObjectStore>,
    ) -> Pin<Box<dyn Future<Output = Result<u64>> + Send + 'async_trait>>
       where Self: 'async_trait,
             'life0: 'async_trait,
             'life1: 'async_trait;

    // Provided method
    fn write_all<'life0, 'life1, 'async_trait>(
        &'life0 self,
        data: SendableRecordBatchStream,
        context: &'life1 Arc<TaskContext>,
    ) -> Pin<Box<dyn Future<Output = Result<u64>> + Send + 'async_trait>>
       where Self: Sync + 'async_trait,
             'life0: 'async_trait,
             'life1: 'async_trait { ... }
}
Expand description

General behaviors for files that do DataSink operations

Required Methods§

Source

fn config(&self) -> &FileSinkConfig

Retrieves the file sink configuration.

Source

fn spawn_writer_tasks_and_join<'life0, 'life1, 'async_trait>( &'life0 self, context: &'life1 Arc<TaskContext>, demux_task: SpawnedTask<Result<()>>, file_stream_rx: DemuxedStreamReceiver, object_store: Arc<dyn ObjectStore>, ) -> Pin<Box<dyn Future<Output = Result<u64>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait,

Spawns writer tasks and joins them to perform file writing operations. Is a critical part of FileSink trait, since it’s the very last step for write_all.

This function handles the process of writing data to files by:

  1. Spawning tasks for writing data to individual files.
  2. Coordinating the tasks using a demuxer to distribute data among files.
  3. Collecting results using tokio::join, ensuring that all tasks complete successfully.
§Parameters
  • context: The execution context (TaskContext) that provides resources like memory management and runtime environment.
  • demux_task: A spawned task that handles demuxing, responsible for splitting an input SendableRecordBatchStream into dynamically determined partitions. See start_demuxer_task()
  • file_stream_rx: A receiver that yields streams of record batches and their corresponding file paths for writing. See start_demuxer_task()
  • object_store: A handle to the object store where the files are written.
§Returns
  • Result<u64>: Returns the total number of rows written across all files.

Provided Methods§

Source

fn write_all<'life0, 'life1, 'async_trait>( &'life0 self, data: SendableRecordBatchStream, context: &'life1 Arc<TaskContext>, ) -> Pin<Box<dyn Future<Output = Result<u64>> + Send + 'async_trait>>
where Self: Sync + 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait,

File sink implementation of the DataSink::write_all method.

Implementors§