pub struct ParquetCdcOptions {
    pub enabled: bool,
    pub min_chunk_size: usize,
    pub max_chunk_size: usize,
    pub norm_level: i32,
}
Options for content-defined chunking (CDC) when writing parquet files.
Mirrors parquet::file::properties::CdcOptions.
Carried as a ParquetCdcOptions in ParquetOptions::content_defined_chunking
with an explicit enabled flag, so it can be toggled with dotted config
keys (content_defined_chunking.enabled = true|false) and the result is
independent of the order in which the keys are set.
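The order-independence claim can be illustrated with a minimal sketch. Everything here is a stand-in written for illustration: `CdcSketch` mirrors the documented fields and defaults, and `parse`, `set` and `apply` are hypothetical helpers, not DataFusion's actual `ConfigField` implementation.

```rust
use std::fmt::Display;
use std::str::FromStr;

/// Stand-in mirror of the documented struct (not the DataFusion type itself).
#[derive(Debug, Clone, PartialEq)]
pub struct CdcSketch {
    pub enabled: bool,
    pub min_chunk_size: usize,
    pub max_chunk_size: usize,
    pub norm_level: i32,
}

impl Default for CdcSketch {
    fn default() -> Self {
        Self {
            enabled: false,
            min_chunk_size: 256 * 1024,  // documented default: 256 KiB
            max_chunk_size: 1024 * 1024, // documented default: 1 MiB
            norm_level: 0,               // documented default
        }
    }
}

/// Hypothetical helper: parse a config value, naming the key on failure.
fn parse<T>(key: &str, value: &str) -> Result<T, String>
where
    T: FromStr,
    T::Err: Display,
{
    value
        .parse::<T>()
        .map_err(|e| format!("invalid value for {key}: {e}"))
}

impl CdcSketch {
    /// Hypothetical setter keyed by the last segment of a dotted config key.
    /// Each key writes exactly one field, so the final state does not depend
    /// on the order in which keys arrive.
    pub fn set(&mut self, key: &str, value: &str) -> Result<(), String> {
        match key {
            "enabled" => self.enabled = parse(key, value)?,
            "min_chunk_size" => self.min_chunk_size = parse(key, value)?,
            "max_chunk_size" => self.max_chunk_size = parse(key, value)?,
            "norm_level" => self.norm_level = parse(key, value)?,
            other => return Err(format!("unknown key: {other}")),
        }
        Ok(())
    }
}

/// Apply a sequence of (key, value) pairs to the default options.
pub fn apply(pairs: &[(&str, &str)]) -> Result<CdcSketch, String> {
    let mut opts = CdcSketch::default();
    for (key, value) in pairs {
        opts.set(key, value)?;
    }
    Ok(opts)
}
```

Because the flag is an ordinary field rather than being implied by the presence of the other parameters, setting `min_chunk_size` before or after `enabled` leaves the struct in the same state.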
Fields§
enabled: bool
(writing) EXPERIMENTAL: Enable content-defined chunking (CDC) when writing parquet files. When enabled, parallel writing is automatically disabled since the chunker state must persist across row groups.
min_chunk_size: usize
Minimum chunk size in bytes. The rolling hash will not trigger a split until this many bytes have been accumulated. Default is 256 KiB.
max_chunk_size: usize
Maximum chunk size in bytes. A split is forced when the accumulated size exceeds this value. Default is 1 MiB.
norm_level: i32
Normalization level. Increasing this improves deduplication ratio but increases fragmentation. Recommended range is [-3, 3], default is 0.
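The interaction of the two size bounds can be sketched as a toy decision rule. This is an illustration of the documented semantics only, not the chunker used by parquet-rs: `should_split` and `chunk_sizes` are hypothetical names, `hash_hit` stands in for "the rolling hash matched its boundary pattern", and the effect of `norm_level` on that match is deliberately not modeled.

```rust
/// Toy boundary rule following the documented min/max semantics.
/// `accumulated` is the number of bytes gathered since the last split.
pub fn should_split(accumulated: usize, hash_hit: bool, min: usize, max: usize) -> bool {
    if accumulated < min {
        // Below the minimum: a rolling-hash match is ignored.
        return false;
    }
    if accumulated > max {
        // Above the maximum: the split is forced regardless of the hash.
        return true;
    }
    // Between the bounds: the (stand-in) rolling hash decides.
    hash_hit
}

/// Walk a stream of values, given each value's byte size and whether the
/// stand-in hash matched after it, and return the resulting chunk sizes.
pub fn chunk_sizes(value_sizes: &[usize], hits: &[bool], min: usize, max: usize) -> Vec<usize> {
    let mut chunks = Vec::new();
    let mut accumulated = 0usize;
    for (size, hit) in value_sizes.iter().zip(hits) {
        accumulated += size;
        if should_split(accumulated, *hit, min, max) {
            chunks.push(accumulated);
            accumulated = 0;
        }
    }
    // Flush whatever remains as a final chunk.
    if accumulated > 0 {
        chunks.push(accumulated);
    }
    chunks
}
```

With the default bounds (256 KiB and 1 MiB), a hash match after only 128 KiB is ignored, while a run with no matches at all is still cut once it grows past 1 MiB.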
Implementations§
impl ParquetCdcOptions
pub fn enabled() -> Self
Returns enabled CDC options with the default chunking parameters.
Shorthand for ParquetCdcOptions { enabled: true, ..Default::default() };
combine with struct-update syntax to override parameters, e.g.
ParquetCdcOptions { min_chunk_size: 4096, ..ParquetCdcOptions::enabled() }.
pub fn disabled() -> Self
Returns disabled CDC options (equivalent to ParquetCdcOptions::default).
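The two constructors and the struct-update pattern described above can be sketched as follows. `CdcSketch` is a stand-in that mirrors the documented fields and defaults so the example compiles on its own, and `small_chunks` is a hypothetical helper; with the real crate the same expressions would be written against `ParquetCdcOptions`.

```rust
/// Stand-in mirror of the documented struct (not the DataFusion type itself).
#[derive(Debug, Clone, PartialEq)]
pub struct CdcSketch {
    pub enabled: bool,
    pub min_chunk_size: usize,
    pub max_chunk_size: usize,
    pub norm_level: i32,
}

impl Default for CdcSketch {
    fn default() -> Self {
        Self {
            enabled: false,
            min_chunk_size: 256 * 1024,  // documented default: 256 KiB
            max_chunk_size: 1024 * 1024, // documented default: 1 MiB
            norm_level: 0,               // documented default
        }
    }
}

impl CdcSketch {
    /// Enabled options with the default chunking parameters.
    pub fn enabled() -> Self {
        Self { enabled: true, ..Default::default() }
    }

    /// Disabled options; identical to `Default::default()`.
    pub fn disabled() -> Self {
        Self::default()
    }
}

/// Hypothetical helper: override one parameter with struct-update syntax
/// and take every other field from the enabled defaults.
pub fn small_chunks() -> CdcSketch {
    CdcSketch { min_chunk_size: 4096, ..CdcSketch::enabled() }
}
```

Starting from `enabled()` rather than `Default::default()` in the struct-update expression is what keeps the flag set while only the named parameter changes.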
Trait Implementations§
impl Clone for ParquetCdcOptions
fn clone(&self) -> ParquetCdcOptions
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl ConfigField for ParquetCdcOptions
impl Debug for ParquetCdcOptions
impl Default for ParquetCdcOptions
impl From<&ParquetCdcOptions> for Option<CdcOptions>
Available on crate feature parquet only.
Convert DataFusion’s ParquetCdcOptions into parquet-rs’s Option<CdcOptions>.
parquet-rs has no enabled flag; CDC is on when the option is Some. So a
disabled ParquetCdcOptions maps to None, and an enabled one to Some
with the chunking parameters.
fn from(value: &ParquetCdcOptions) -> Self
impl From<Option<&CdcOptions>> for ParquetCdcOptions
Available on crate feature parquet only.
Convert parquet-rs’s Option<&CdcOptions> back into DataFusion’s ParquetCdcOptions.
The presence of parquet-rs options means CDC was enabled, so Some maps to
enabled: true; None yields the disabled default.
fn from(value: Option<&CdcOptions>) -> Self
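The mapping between the flag-based struct and the `Option`-based form can be sketched in both directions. Both types below are stand-ins so that the example compiles without the parquet feature: `WriterCdc` plays the role of parquet-rs's `CdcOptions` (its three fields are assumed from the parameters documented here), and `CdcSketch` mirrors `ParquetCdcOptions`.

```rust
/// Stand-in for parquet-rs's CdcOptions: parameters only, no enabled flag.
#[derive(Debug, Clone, PartialEq)]
pub struct WriterCdc {
    pub min_chunk_size: usize,
    pub max_chunk_size: usize,
    pub norm_level: i32,
}

/// Stand-in mirror of the documented struct (not the DataFusion type itself).
#[derive(Debug, Clone, PartialEq)]
pub struct CdcSketch {
    pub enabled: bool,
    pub min_chunk_size: usize,
    pub max_chunk_size: usize,
    pub norm_level: i32,
}

impl Default for CdcSketch {
    fn default() -> Self {
        Self {
            enabled: false,
            min_chunk_size: 256 * 1024,  // documented default: 256 KiB
            max_chunk_size: 1024 * 1024, // documented default: 1 MiB
            norm_level: 0,               // documented default
        }
    }
}

/// Disabled maps to None; enabled maps to Some with the chunking parameters.
impl From<&CdcSketch> for Option<WriterCdc> {
    fn from(value: &CdcSketch) -> Self {
        value.enabled.then(|| WriterCdc {
            min_chunk_size: value.min_chunk_size,
            max_chunk_size: value.max_chunk_size,
            norm_level: value.norm_level,
        })
    }
}

/// Some maps to enabled: true; None yields the disabled default.
impl From<Option<&WriterCdc>> for CdcSketch {
    fn from(value: Option<&WriterCdc>) -> Self {
        match value {
            Some(w) => CdcSketch {
                enabled: true,
                min_chunk_size: w.min_chunk_size,
                max_chunk_size: w.max_chunk_size,
                norm_level: w.norm_level,
            },
            None => CdcSketch::default(),
        }
    }
}
```

In this sketch an enabled value survives a round trip unchanged. A disabled value with non-default parameters does not, since `None` carries no parameters and converts back to the disabled default.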
impl PartialEq for ParquetCdcOptions
fn eq(&self, other: &ParquetCdcOptions) -> bool
Tests for self and other values to be equal, and is used by ==.