pub trait FileSink: DataSink {
// Required methods
fn config(&self) -> &FileSinkConfig;
fn spawn_writer_tasks_and_join<'life0, 'life1, 'async_trait>(
&'life0 self,
context: &'life1 Arc<TaskContext>,
demux_task: SpawnedTask<Result<()>>,
file_stream_rx: DemuxedStreamReceiver,
object_store: Arc<dyn ObjectStore>,
) -> Pin<Box<dyn Future<Output = Result<u64>> + Send + 'async_trait>>
where Self: 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait;
// Provided method
fn write_all<'life0, 'life1, 'async_trait>(
&'life0 self,
data: SendableRecordBatchStream,
context: &'life1 Arc<TaskContext>,
) -> Pin<Box<dyn Future<Output = Result<u64>> + Send + 'async_trait>>
where Self: Sync + 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait { ... }
}
Expand description
General behaviors for files that do DataSink
operations
Required Methods§
Sourcefn config(&self) -> &FileSinkConfig
fn config(&self) -> &FileSinkConfig
Retrieves the file sink configuration.
Sourcefn spawn_writer_tasks_and_join<'life0, 'life1, 'async_trait>(
&'life0 self,
context: &'life1 Arc<TaskContext>,
demux_task: SpawnedTask<Result<()>>,
file_stream_rx: DemuxedStreamReceiver,
object_store: Arc<dyn ObjectStore>,
) -> Pin<Box<dyn Future<Output = Result<u64>> + Send + 'async_trait>>where
Self: 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait,
fn spawn_writer_tasks_and_join<'life0, 'life1, 'async_trait>(
&'life0 self,
context: &'life1 Arc<TaskContext>,
demux_task: SpawnedTask<Result<()>>,
file_stream_rx: DemuxedStreamReceiver,
object_store: Arc<dyn ObjectStore>,
) -> Pin<Box<dyn Future<Output = Result<u64>> + Send + 'async_trait>>where
Self: 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait,
Spawns writer tasks and joins them to perform file writing operations.
Is a critical part of FileSink
trait, since it’s the very last step for write_all
.
This function handles the process of writing data to files by:
- Spawning tasks for writing data to individual files.
- Coordinating the tasks using a demuxer to distribute data among files.
- Collecting results using
tokio::join
, ensuring that all tasks complete successfully.
§Parameters
context
: The execution context (TaskContext
) that provides resources like memory management and runtime environment.demux_task
: A spawned task that handles demuxing, responsible for splitting an inputSendableRecordBatchStream
into dynamically determined partitions. Seestart_demuxer_task()
file_stream_rx
: A receiver that yields streams of record batches and their corresponding file paths for writing. Seestart_demuxer_task()
object_store
: A handle to the object store where the files are written.
§Returns
Result<u64>
: Returns the total number of rows written across all files.
Provided Methods§
Sourcefn write_all<'life0, 'life1, 'async_trait>(
&'life0 self,
data: SendableRecordBatchStream,
context: &'life1 Arc<TaskContext>,
) -> Pin<Box<dyn Future<Output = Result<u64>> + Send + 'async_trait>>where
Self: Sync + 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait,
fn write_all<'life0, 'life1, 'async_trait>(
&'life0 self,
data: SendableRecordBatchStream,
context: &'life1 Arc<TaskContext>,
) -> Pin<Box<dyn Future<Output = Result<u64>> + Send + 'async_trait>>where
Self: Sync + 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait,
File sink implementation of the DataSink::write_all
method.