pub struct FileGroupPartitioner { /* private fields */ }
Repartitions input files into target_partitions partitions if the total file size exceeds repartition_file_min_size.
This partitions evenly by file byte range and has no knowledge of how data is laid out within specific files. Each FileOpener is responsible for the actual partitioning for its data source type (e.g. the CsvOpener reads the lines that overlap its byte range and handles boundaries so that every line is read exactly once).
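One way an opener can make raw byte ranges safe is to treat a line as owned by the range that contains its first byte. The sketch below illustrates that rule only; `lines_in_range` is a hypothetical helper written for this example and is not the CsvOpener implementation.

```rust
/// Returns the lines owned by the byte range `start..end` of `data`.
/// Assumption for this sketch: a line belongs to the range that contains
/// its first byte, so a range may read past `end` to finish its last line.
fn lines_in_range(data: &[u8], start: usize, end: usize) -> Vec<&[u8]> {
    let start = start.min(data.len());
    let end = end.min(data.len());
    // If `start` falls in the middle of a line, that line belongs to the
    // previous range, so skip ahead to the next line start.
    let mut pos = if start == 0 || data[start - 1] == b'\n' {
        start
    } else {
        match data[start..].iter().position(|&b| b == b'\n') {
            Some(i) => start + i + 1,
            None => return Vec::new(),
        }
    };
    let mut lines = Vec::new();
    while pos < end {
        // Read to the end of the line, even if it lies beyond `end`.
        let line_end = data[pos..]
            .iter()
            .position(|&b| b == b'\n')
            .map(|i| pos + i)
            .unwrap_or(data.len());
        lines.push(&data[pos..line_end]);
        pos = line_end + 1;
    }
    lines
}

fn main() {
    let data = b"id,name\n1,alpha\n2,beta\n3,gamma\n";
    // Three contiguous ranges whose boundaries cut through lines.
    let ranges = [(0, 10), (10, 20), (20, data.len())];
    let mut all = Vec::new();
    for (start, end) in ranges {
        for line in lines_in_range(data, start, end) {
            all.push(String::from_utf8_lossy(line).into_owned());
        }
    }
    // Every line appears exactly once, in order.
    assert_eq!(all, ["id,name", "1,alpha", "2,beta", "3,gamma"]);
    println!("{all:?}");
}
```

Because every line start falls in exactly one of the contiguous ranges, no line is dropped or read twice, regardless of where the byte boundaries land.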
§Example
For example, if there are two files, A and B, that we wish to read with 4
partitions (using 4 threads), they will be divided as follows:
                                    ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
                                      ┌─────────────────┐
                                    │ │                 │ │
                                      │     File A      │
                                    │ │  Range: 0-2MB   │ │
                                      │                 │
                                    │ └─────────────────┘ │
                                     ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
┌─────────────────┐                 ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
│                 │                   ┌─────────────────┐
│                 │                 │ │                 │ │
│                 │                   │     File A      │
│                 │                 │ │   Range 2-4MB   │ │
│                 │                   │                 │
│                 │                 │ └─────────────────┘ │
│  File A (7MB)   │   ────────▶      ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
│                 │                 ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
│                 │                   ┌─────────────────┐
│                 │                 │ │                 │ │
│                 │                   │     File A      │
│                 │                 │ │  Range: 4-6MB   │ │
│                 │                   │                 │
│                 │                 │ └─────────────────┘ │
└─────────────────┘                  ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
┌─────────────────┐                 ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
│  File B (1MB)   │                   ┌─────────────────┐
│                 │                 │ │ File A          │ │
└─────────────────┘                   │ Range: 6-7MB    │
                                    │ └─────────────────┘ │
                                      ┌─────────────────┐
                                    │ │  File B (1MB)   │ │
                                      │                 │
                                    │ └─────────────────┘ │
                                     ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
If target_partitions = 4, divides into 4 groups
§Maintaining Order
Within each group, files are read sequentially. Thus, if the overall order of tuples must be preserved, multiple files cannot be mixed in the same group.
In this case, the code splits the largest files evenly across any available empty groups, but the overall distribution may not be as even as it would be if the order did not need to be preserved.
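The order-preserving strategy can be sketched as a greedy loop: each empty group is handed to whichever file currently has the largest per-range size. This is a simplified illustration under stated assumptions (one file per ordered input group, equal-sized ranges); `ranges_per_file` is a hypothetical helper, not the crate's implementation. It uses the 6MB and 1MB files from this section's diagram.

```rust
use std::collections::BinaryHeap;

/// For each input file (assumed: one file per ordered group), returns how
/// many equal byte ranges it is split into so that `target_partitions`
/// groups are used. Returns `None` when there are no empty groups to fill.
fn ranges_per_file(file_sizes: &[u64], target_partitions: usize) -> Option<Vec<usize>> {
    if file_sizes.is_empty() || file_sizes.len() >= target_partitions {
        return None;
    }
    let mut counts = vec![1usize; file_sizes.len()];
    // Max-heap keyed on the current size of each file's ranges.
    let mut heap: BinaryHeap<(u64, usize)> = file_sizes
        .iter()
        .enumerate()
        .map(|(index, &size)| (size, index))
        .collect();
    // Hand each spare group to the file with the largest ranges.
    for _ in 0..target_partitions - file_sizes.len() {
        let (_, index) = heap.pop()?;
        counts[index] += 1;
        heap.push((file_sizes[index] / counts[index] as u64, index));
    }
    Some(counts)
}

fn main() {
    const MB: u64 = 1 << 20;
    // File A (6MB) and File B (1MB), read with 4 partitions.
    let counts = ranges_per_file(&[6 * MB, MB], 4);
    // File A is split into three 2MB ranges; File B stays whole.
    assert_eq!(counts, Some(vec![3, 1]));
    println!("{counts:?}");
}
```

Files are never mixed within a group here, so the resulting groups hold 2MB, 2MB, 2MB, and 1MB rather than an even 1.75MB each.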
                                    ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
                                      ┌─────────────────┐
                                    │ │                 │ │
                                      │     File A      │
                                    │ │  Range: 0-2MB   │ │
                                      │                 │
┌─────────────────┐                 │ └─────────────────┘ │
│                 │                  ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
│                 │                 ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
│                 │                   ┌─────────────────┐
│                 │                 │ │                 │ │
│                 │                   │     File A      │
│                 │                 │ │   Range 2-4MB   │ │
│  File A (6MB)   │   ────────▶       │                 │
│    (ordered)    │                 │ └─────────────────┘ │
│                 │                  ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
│                 │                 ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
│                 │                   ┌─────────────────┐
│                 │                 │ │                 │ │
│                 │                   │     File A      │
│                 │                 │ │  Range: 4-6MB   │ │
└─────────────────┘                   │                 │
┌─────────────────┐                 │ └─────────────────┘ │
│  File B (1MB)   │                  ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
│    (ordered)    │                 ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
└─────────────────┘                   ┌─────────────────┐
                                    │ │  File B (1MB)   │ │
                                      │                 │
                                    │ └─────────────────┘ │
                                     ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
If target_partitions = 4, divides into 4 groups
Implementations§
impl FileGroupPartitioner
pub fn new() -> Self
Creates a new FileGroupPartitioner with default values:
target_partitions = 1
repartition_file_min_size = 10MB
preserve_order_within_groups = false
pub fn with_target_partitions(self, target_partitions: usize) -> Self
Set the target partitions
pub fn with_repartition_file_min_size(self, repartition_file_min_size: usize) -> Self
Set the minimum size at which to repartition a file
pub fn with_preserve_order_within_groups(self, preserve_order_within_groups: bool) -> Self
Set whether the order of tuples within a file must be preserved
pub fn repartition_file_groups(&self, file_groups: &[FileGroup]) -> Option<Vec<FileGroup>>
Repartitions input files according to the settings on this FileGroupPartitioner.
Returns None if no repartitioning is needed or possible.
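The contract above (a new set of groups, or None when nothing should change) can be illustrated with a standalone model of the even byte-range split. This is a sketch under assumptions, not the crate's code: `repartition_evenly` and `Piece` are hypothetical names, files are represented only by their sizes, and the exact conditions under which the real method returns None may differ.

```rust
use std::ops::Range;

/// One piece of work: (file index, byte range within that file).
type Piece = (usize, Range<u64>);

/// Splits the files evenly by byte range into `target_partitions` groups.
/// Assumed for this sketch: returns `None` when there is nothing to split,
/// only one partition is requested, or the total size is below `min_size`.
fn repartition_evenly(
    file_sizes: &[u64],
    target_partitions: usize,
    min_size: u64,
) -> Option<Vec<Vec<Piece>>> {
    let total: u64 = file_sizes.iter().sum();
    if target_partitions <= 1 || total == 0 || total < min_size {
        return None;
    }
    // Bytes per group, rounded up so that every byte fits.
    let n = target_partitions as u64;
    let target = (total + n - 1) / n;
    let mut groups: Vec<Vec<Piece>> = vec![Vec::new(); target_partitions];
    let (mut group, mut filled) = (0usize, 0u64);
    for (file, &size) in file_sizes.iter().enumerate() {
        let mut offset = 0;
        while offset < size {
            // Take as much of this file as the current group can hold.
            let take = (size - offset).min(target - filled);
            groups[group].push((file, offset..offset + take));
            offset += take;
            filled += take;
            if filled == target {
                group += 1;
                filled = 0;
            }
        }
    }
    Some(groups)
}

fn main() {
    const MB: u64 = 1 << 20;
    // File A (7MB) and File B (1MB) with a 1MB minimum size.
    let groups = repartition_evenly(&[7 * MB, MB], 4, MB).unwrap();
    assert_eq!(groups[0], vec![(0, 0..2 * MB)]);
    assert_eq!(groups[3], vec![(0, 6 * MB..7 * MB), (1, 0..MB)]);
    // Below the minimum size, the input is left as it is.
    assert!(repartition_evenly(&[7 * MB, MB], 4, 10 * MB).is_none());
    println!("{groups:?}");
}
```

In this model, the 8MB total from the example is only split when the minimum size is lowered below it; with a 10MB threshold the function returns None and the caller keeps the original groups.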
Trait Implementations§
impl Clone for FileGroupPartitioner
fn clone(&self) -> FileGroupPartitioner
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for FileGroupPartitioner
impl Default for FileGroupPartitioner
impl Copy for FileGroupPartitioner
Auto Trait Implementations§
impl Freeze for FileGroupPartitioner
impl RefUnwindSafe for FileGroupPartitioner
impl Send for FileGroupPartitioner
impl Sync for FileGroupPartitioner
impl Unpin for FileGroupPartitioner
impl UnwindSafe for FileGroupPartitioner
Blanket Implementations§
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where T: Clone
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.