pub struct GenerationPlan { /* private fields */ }
A list of generator “parts” (chunks of data generator output, not rows of the TPC-H PART table) for a single output file.
Controls the parallelization and layout of Parquet files in tpchgen-cli.
§Background
A “part” is a logical partition of a particular output table. Each data generator can create parts individually.
For example, the parameters to OrderGenerator::new (scale_factor, part, and part_count) together define a partition of the Order
table.
The entire output table results from generating each of the part_count parts. For
example, if part_count is 10, appending parts 1 to 10 results in a
complete Order table.
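To make the partitioning concrete, here is a minimal sketch of one way to split a table into contiguous, 1-based parts. The helper part_row_range is hypothetical and does not necessarily match how tpchgen's generators assign rows. It only shows why appending parts 1 through part_count reproduces the whole table.

```rust
/// Hypothetical helper: the half-open row range [start, end) covered by
/// `part` (1-based) when `total_rows` rows are split into `part_count`
/// contiguous parts. The first `total_rows % part_count` parts get one extra row.
fn part_row_range(total_rows: u64, part: u64, part_count: u64) -> (u64, u64) {
    assert!(part_count > 0 && (1..=part_count).contains(&part));
    let base = total_rows / part_count;
    let extra = total_rows % part_count;
    let idx = part - 1; // convert to a 0-based index
    // Every earlier part with index < extra was one row longer than `base`.
    let start = idx * base + idx.min(extra);
    let len = base + if idx < extra { 1 } else { 0 };
    (start, start + len)
}

fn main() {
    let total_rows = 1_500_000; // e.g. ORDERS rows at scale factor 1
    let part_count = 10;
    let mut next = 0;
    for part in 1..=part_count {
        let (start, end) = part_row_range(total_rows, part, part_count);
        // Each part starts exactly where the previous one ended...
        assert_eq!(start, next);
        next = end;
    }
    // ...so appending parts 1..=part_count yields the whole table.
    assert_eq!(next, total_rows);
    println!("{part_count} parts cover {next} rows");
}
```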
Interesting properties of parts:
- They are independent of each other, so they can be generated in parallel.
- They compose across part counts. For example, parts 1 through 10 with a part_count of 50 generate the same data as part 1 with a part_count of 5.
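The scaling property can be checked with simple arithmetic, assuming (hypothetically) equal-sized contiguous parts and a row count divisible by the part count. The covered_range helper is illustrative only: it models which rows a set of parts covers, not the generated values themselves.

```rust
/// Hypothetical: the row range [start, end) covered by parts first..=last
/// (1-based) out of `part_count` equal, contiguous parts. Assumes
/// `total_rows` is divisible by `part_count` so the arithmetic is exact.
fn covered_range(total_rows: u64, first: u64, last: u64, part_count: u64) -> (u64, u64) {
    assert!(total_rows % part_count == 0, "sketch assumes exact division");
    assert!(1 <= first && first <= last && last <= part_count);
    let rows_per_part = total_rows / part_count;
    ((first - 1) * rows_per_part, last * rows_per_part)
}

fn main() {
    let total_rows = 1_500_000;
    // Parts 1..=10 out of 50 ...
    let fine = covered_range(total_rows, 1, 10, 50);
    // ... cover the same rows as part 1 out of 5.
    let coarse = covered_range(total_rows, 1, 1, 5);
    assert_eq!(fine, coarse);
    println!("{:?} == {:?}", fine, coarse);
}
```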
§Implication for tpchgen-cli
For tbl and csv files, tpchgen-cli generates num-threads parts in
parallel.
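Because parts are independent, they can be generated on separate threads and then appended in part order. The sketch below assumes a toy generate_part function (hypothetical, standing in for a real data generator) that emits tbl-style lines, and it uses std::thread::scope from the Rust standard library.

```rust
use std::thread;

/// Hypothetical stand-in for a data generator: renders one part as
/// `|`-terminated text lines, where each "row" is just its global index.
fn generate_part(total_rows: u64, part: u64, part_count: u64) -> String {
    let rows_per_part = total_rows / part_count; // assumes exact division
    let start = (part - 1) * rows_per_part;
    (start..start + rows_per_part)
        .map(|row| format!("{row}|\n"))
        .collect()
}

fn main() {
    let total_rows = 12;
    let num_threads = 4; // plays the role of num-threads (illustrative value)

    // Spawn one thread per part; parts do not depend on each other.
    let chunks: Vec<String> = thread::scope(|s| {
        let handles: Vec<_> = (1..=num_threads)
            .map(|part| s.spawn(move || generate_part(total_rows, part, num_threads)))
            .collect();
        // Join in part order so the appended output keeps rows in order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    let output = chunks.concat();
    assert_eq!(output.lines().count(), 12);
    print!("{output}");
}
```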
For Parquet files, the output file has one row group for each “part”.
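Since each part becomes one row group, the number of parts fixes the number of row groups. One plausible way to choose it, sketched here under the assumption that the table's encoded size can be estimated up front, is to divide that estimate by the target row group size and round up. parts_for_row_group_target is a hypothetical helper, not part of tpchgen-cli.

```rust
/// Hypothetical sizing rule: pick enough parts that each part (and so each
/// Parquet row group) stays near `row_group_bytes`.
fn parts_for_row_group_target(estimated_table_bytes: u64, row_group_bytes: u64) -> u64 {
    assert!(row_group_bytes > 0, "row group target must be positive");
    // Ceiling division, with a floor of one part for tiny or empty tables.
    estimated_table_bytes.div_ceil(row_group_bytes).max(1)
}

fn main() {
    let mib = 1024 * 1024;
    // e.g. a table estimated at 100 MiB with a 7 MiB row group target
    let parts = parts_for_row_group_target(100 * mib, 7 * mib);
    assert_eq!(parts, 15); // ceil(100 / 7) = 15
    assert_eq!(parts_for_row_group_target(10, 7 * mib), 1);
    println!("{parts} parts -> {parts} row groups");
}
```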
§Example
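use tpchgen_cli::{GenerationPlan, OutputFormat, Table};

// Plan part 1 of 10 of the Orders table at scale factor 1, written as Parquet.
let plan = GenerationPlan::try_new(
    Table::Orders,
    OutputFormat::Parquet,
    1.0,             // scale factor
    Some(1),         // cli_part (--part, 1-based)
    Some(10),        // cli_part_count (--parts)
    7 * 1024 * 1024, // parquet_row_group_bytes (illustrative value)
)
.expect("arguments should describe a valid plan");

// Iterate the plan itself (not the Result) to get (part_number, num_parts) pairs.
let parts = plan.into_iter().collect::<Vec<_>>();
assert!(!parts.is_empty());
Implementations§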
impl GenerationPlan
pub fn try_new(
table: Table,
format: OutputFormat,
scale_factor: f64,
cli_part: Option<i32>,
cli_part_count: Option<i32>,
parquet_row_group_bytes: i64,
) -> Result<Self, String>
Returns a GenerationPlan describing the parts to generate, or an error message if the arguments are invalid.
§Arguments
- cli_part: optional part number to generate (1-based), the --part CLI argument
- cli_part_count: optional total number of parts, the --parts CLI argument
- parquet_row_group_bytes: target Parquet row group size in bytes, the --parquet-row-group-size CLI argument
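The argument rules above (1-based part numbers bounded by the total part count) can be sketched as a small validator. validate_parts is hypothetical, and the requirement that --part and --parts be given together is an assumption of this sketch, not documented behavior of try_new.

```rust
/// Hypothetical validation of the --part / --parts pair. Returns `Ok(None)`
/// when neither is given, leaving the part count to the planner.
fn validate_parts(
    cli_part: Option<i32>,
    cli_part_count: Option<i32>,
) -> Result<Option<(i32, i32)>, String> {
    match (cli_part, cli_part_count) {
        (None, None) => Ok(None),
        (Some(part), Some(count)) => {
            if count < 1 {
                return Err(format!("--parts must be at least 1, got {count}"));
            }
            if !(1..=count).contains(&part) {
                return Err(format!("--part must be in 1..={count}, got {part}"));
            }
            Ok(Some((part, count)))
        }
        // Assumption: a part number without a count (or vice versa) is rejected.
        _ => Err("--part and --parts must be given together".to_string()),
    }
}

fn main() {
    assert_eq!(validate_parts(None, None), Ok(None));
    assert_eq!(validate_parts(Some(2), Some(10)), Ok(Some((2, 10))));
    assert!(validate_parts(Some(-1), Some(-1)).is_err());
    assert!(validate_parts(Some(11), Some(10)).is_err());
    assert!(validate_parts(Some(1), None).is_err());
    println!("all checks passed");
}
```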
pub fn partitioned_table(table: Table) -> bool
Returns true if the table is partitionable, that is, if its generator is parameterized by part count.
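In TPC-H, the NATION and REGION tables have fixed sizes (25 and 5 rows) regardless of scale factor, which makes them natural candidates for tables that are not split into parts. The sketch below encodes that assumption with a local TpchTable enum and a hypothetical is_partitioned helper; it does not reproduce the crate's Table type or its actual rule.

```rust
/// Local stand-in for the eight TPC-H tables (not tpchgen_cli::Table).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TpchTable { Nation, Region, Part, Supplier, PartSupp, Customer, Orders, LineItem }

/// Hypothetical rule: NATION and REGION have a fixed size in TPC-H,
/// so only the remaining tables are split into parts.
fn is_partitioned(table: TpchTable) -> bool {
    !matches!(table, TpchTable::Nation | TpchTable::Region)
}

fn main() {
    use TpchTable::*;
    for t in [Nation, Region, Part, Supplier, PartSupp, Customer, Orders, LineItem] {
        println!("{t:?}: partitioned = {}", is_partitioned(t));
    }
    assert!(!is_partitioned(Nation));
    assert!(is_partitioned(Orders));
}
```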
pub fn chunk_count(&self) -> usize
Returns the number of parts (partitions) this plan will generate.
Trait Implementations§
impl Clone for GenerationPlan
fn clone(&self) -> GenerationPlan
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for GenerationPlan
impl Display for GenerationPlan
impl IntoIterator for GenerationPlan
Converts the GenerationPlan into an iterator of (part_number, num_parts)
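A simplified, hypothetical MiniPlan illustrates the shape of this IntoIterator contract: each item is a (part_number, num_parts) pair, and the item count matches what a chunk_count-style method reports. MiniPlan and its fields are illustrative assumptions, not the private fields of GenerationPlan.

```rust
use std::iter::{Repeat, Zip};
use std::ops::RangeInclusive;

/// Hypothetical, simplified plan: which parts to generate, out of how many.
#[derive(Debug, Clone)]
struct MiniPlan {
    part_numbers: RangeInclusive<i32>, // 1-based part numbers to generate
    num_parts: i32,                    // total number of parts in the table
}

impl MiniPlan {
    /// Number of parts this plan generates (compare chunk_count).
    fn chunk_count(&self) -> usize {
        self.part_numbers.clone().count()
    }
}

impl IntoIterator for MiniPlan {
    type Item = (i32, i32);
    type IntoIter = Zip<RangeInclusive<i32>, Repeat<i32>>;

    /// Pairs every part number with the (constant) total part count.
    fn into_iter(self) -> Self::IntoIter {
        self.part_numbers.zip(std::iter::repeat(self.num_parts))
    }
}

fn main() {
    // e.g. the whole table split into 4 parts
    let whole = MiniPlan { part_numbers: 1..=4, num_parts: 4 };
    assert_eq!(whole.chunk_count(), 4);
    let pairs: Vec<(i32, i32)> = whole.into_iter().collect();
    assert_eq!(pairs, vec![(1, 4), (2, 4), (3, 4), (4, 4)]);
    println!("{pairs:?}");
}
```

Zipping the part range with std::iter::repeat keeps the iterator type nameable, so no boxing is needed.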