Struct ParquetCdcOptions 

pub struct ParquetCdcOptions {
    pub enabled: bool,
    pub min_chunk_size: usize,
    pub max_chunk_size: usize,
    pub norm_level: i32,
}
Options for content-defined chunking (CDC) when writing parquet files. Mirrors parquet::file::properties::CdcOptions.

Carried as a ParquetCdcOptions in ParquetOptions::content_defined_chunking with an explicit enabled flag, so it can be toggled with dotted config keys (content_defined_chunking.enabled = true|false) and the result is independent of the order in which the keys are set.
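The order-independence described above can be illustrated with a minimal sketch. It uses a local stand-in struct (`CdcOptionsSketch`) and a hypothetical `set` method loosely modelled on `ConfigField::set`; neither is the DataFusion implementation, and the key names are assumed to be the sub-keys that follow the `content_defined_chunking.` prefix.

```rust
/// Local stand-in mirroring the fields of `ParquetCdcOptions` (not the DataFusion type).
#[derive(Clone, Debug, PartialEq)]
struct CdcOptionsSketch {
    enabled: bool,
    min_chunk_size: usize,
    max_chunk_size: usize,
    norm_level: i32,
}

impl Default for CdcOptionsSketch {
    fn default() -> Self {
        // Defaults as documented: disabled, 256 KiB, 1 MiB, level 0.
        Self { enabled: false, min_chunk_size: 256 * 1024, max_chunk_size: 1024 * 1024, norm_level: 0 }
    }
}

impl CdcOptionsSketch {
    /// Hypothetical key/value setter; each key touches exactly one field.
    fn set(&mut self, key: &str, value: &str) -> Result<(), String> {
        let err = |e: String| format!("invalid value {value:?} for key {key:?}: {e}");
        match key {
            "enabled" => self.enabled = value.parse::<bool>().map_err(|e| err(e.to_string()))?,
            "min_chunk_size" => self.min_chunk_size = value.parse::<usize>().map_err(|e| err(e.to_string()))?,
            "max_chunk_size" => self.max_chunk_size = value.parse::<usize>().map_err(|e| err(e.to_string()))?,
            "norm_level" => self.norm_level = value.parse::<i32>().map_err(|e| err(e.to_string()))?,
            other => return Err(format!("unknown key {other:?}")),
        }
        Ok(())
    }
}

/// Applies key/value pairs in the given order, starting from the defaults.
fn apply(pairs: &[(&str, &str)]) -> Result<CdcOptionsSketch, String> {
    let mut opts = CdcOptionsSketch::default();
    for &(key, value) in pairs {
        opts.set(key, value)?;
    }
    Ok(opts)
}

fn main() {
    let a = apply(&[("enabled", "true"), ("min_chunk_size", "4096")]).unwrap();
    let b = apply(&[("min_chunk_size", "4096"), ("enabled", "true")]).unwrap();
    // Because `enabled` is its own field, the two orders give the same value.
    assert_eq!(a, b);
    println!("{a:?}");
}
```

Because the flag and the chunking parameters live in separate fields, setting a parameter never implicitly enables CDC, and enabling CDC never resets a parameter.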

Fields§

§enabled: bool

(writing) EXPERIMENTAL: Enable content-defined chunking (CDC) when writing parquet files. When enabled, parallel writing is automatically disabled since the chunker state must persist across row groups.

§min_chunk_size: usize

Minimum chunk size in bytes. The rolling hash will not trigger a split until this many bytes have been accumulated. Default is 256 KiB.

§max_chunk_size: usize

Maximum chunk size in bytes. A split is forced when the accumulated size exceeds this value. Default is 1 MiB.

§norm_level: i32

Normalization level. Increasing this improves the deduplication ratio but increases fragmentation. The recommended range is [-3, 3]; the default is 0.
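The interplay of the two size bounds can be sketched as a single decision function. This is an illustration of the rules as worded above, not the parquet-rs chunker: `ChunkBounds`, `should_split`, and the `hash_boundary` input (standing in for "the rolling hash signalled a boundary") are hypothetical names, and `norm_level` is left out.

```rust
/// Hypothetical holder for the two documented size bounds, in bytes.
struct ChunkBounds {
    min_chunk_size: usize,
    max_chunk_size: usize,
}

/// Decides whether to end the current chunk.
/// `accumulated` is the number of bytes in the chunk so far;
/// `hash_boundary` says whether the rolling hash signalled a boundary here.
fn should_split(bounds: &ChunkBounds, accumulated: usize, hash_boundary: bool) -> bool {
    if accumulated < bounds.min_chunk_size {
        // Below the minimum: hash boundaries are ignored.
        return false;
    }
    if accumulated > bounds.max_chunk_size {
        // Above the maximum: a split is forced regardless of the hash.
        return true;
    }
    // Between the bounds the rolling hash decides.
    hash_boundary
}

fn main() {
    // Documented defaults: 256 KiB minimum, 1 MiB maximum.
    let bounds = ChunkBounds { min_chunk_size: 256 * 1024, max_chunk_size: 1024 * 1024 };
    println!("{}", should_split(&bounds, 1_000, true));
    println!("{}", should_split(&bounds, 300 * 1024, true));
    println!("{}", should_split(&bounds, 1024 * 1024 + 1, false));
}
```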

Implementations§

Source§

impl ParquetCdcOptions

Source

pub fn enabled() -> Self

Returns enabled CDC options with the default chunking parameters.

Shorthand for ParquetCdcOptions { enabled: true, ..Default::default() }; combine with struct-update syntax to override parameters, e.g. ParquetCdcOptions { min_chunk_size: 4096, ..ParquetCdcOptions::enabled() }.

Source

pub fn disabled() -> Self

Returns disabled CDC options (equivalent to ParquetCdcOptions::default).
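As a sketch of how the two constructors relate to `Default` and to struct-update syntax, the following uses a local stand-in (`CdcOptionsSketch`) with the same field names and documented defaults. It is not the DataFusion type, so the constructor bodies here are assumptions based on the descriptions above.

```rust
/// Local stand-in mirroring the fields of `ParquetCdcOptions` (not the DataFusion type).
#[derive(Clone, Debug, PartialEq)]
struct CdcOptionsSketch {
    enabled: bool,
    min_chunk_size: usize,
    max_chunk_size: usize,
    norm_level: i32,
}

impl Default for CdcOptionsSketch {
    fn default() -> Self {
        // Documented defaults: disabled, 256 KiB, 1 MiB, level 0.
        Self { enabled: false, min_chunk_size: 256 * 1024, max_chunk_size: 1024 * 1024, norm_level: 0 }
    }
}

impl CdcOptionsSketch {
    /// Enabled options with default chunking parameters.
    fn enabled() -> Self {
        Self { enabled: true, ..Default::default() }
    }

    /// Disabled options; the same value as `default()`.
    fn disabled() -> Self {
        Self::default()
    }
}

fn main() {
    // Override one parameter and take the rest from `enabled()`.
    let opts = CdcOptionsSketch { min_chunk_size: 4096, ..CdcOptionsSketch::enabled() };
    assert!(opts.enabled);
    assert_eq!(opts.min_chunk_size, 4096);
    assert_eq!(CdcOptionsSketch::disabled(), CdcOptionsSketch::default());
    println!("{opts:?}");
}
```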

Trait Implementations§

Source§

impl Clone for ParquetCdcOptions

Source§

fn clone(&self) -> ParquetCdcOptions

Returns a duplicate of the value.
1.0.0 (const: unstable) · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.
Source§

impl ConfigField for ParquetCdcOptions

Source§

fn set(&mut self, key: &str, value: &str) -> Result<()>

Source§

fn visit<V: Visit>( &self, v: &mut V, key_prefix: &str, _description: &'static str, )

Source§

fn reset(&mut self, key: &str) -> Result<()>

Source§

impl Debug for ParquetCdcOptions

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
Source§

impl Default for ParquetCdcOptions

Source§

fn default() -> Self

Returns the “default value” for a type.
Source§

impl From<&ParquetCdcOptions> for Option<CdcOptions>

Available on crate feature parquet only.

Convert DataFusion’s ParquetCdcOptions into parquet-rs’s Option<CdcOptions>.

parquet-rs has no enabled flag; CDC is on when the option is Some. So a disabled ParquetCdcOptions maps to None, and an enabled one to Some with the chunking parameters.

Source§

fn from(value: &ParquetCdcOptions) -> Self

Converts to this type from the input type.
Source§

impl From<Option<&CdcOptions>> for ParquetCdcOptions

Available on crate feature parquet only.

Convert parquet-rs’s Option<&CdcOptions> back into DataFusion’s ParquetCdcOptions.

The presence of parquet-rs options means CDC was enabled, so Some maps to enabled: true; None yields the disabled default.

Source§

fn from(value: Option<&CdcOptions>) -> Self

Converts to this type from the input type.
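The two conversions can be sketched together as a round trip. Both types below are local stand-ins: `DfCdc` mirrors `ParquetCdcOptions`, and `WriterCdc` stands in for parquet-rs's `CdcOptions`, whose field list is assumed here to be the three chunking parameters. The impl bodies follow the descriptions above rather than the actual source.

```rust
/// Stand-in for DataFusion's `ParquetCdcOptions`.
#[derive(Clone, Debug, PartialEq)]
struct DfCdc {
    enabled: bool,
    min_chunk_size: usize,
    max_chunk_size: usize,
    norm_level: i32,
}

impl Default for DfCdc {
    fn default() -> Self {
        Self { enabled: false, min_chunk_size: 256 * 1024, max_chunk_size: 1024 * 1024, norm_level: 0 }
    }
}

/// Stand-in for parquet-rs's `CdcOptions` (field list is an assumption).
#[derive(Clone, Debug, PartialEq)]
struct WriterCdc {
    min_chunk_size: usize,
    max_chunk_size: usize,
    norm_level: i32,
}

impl From<&DfCdc> for Option<WriterCdc> {
    fn from(value: &DfCdc) -> Self {
        // Disabled maps to `None`; enabled maps to `Some` with the parameters.
        value.enabled.then(|| WriterCdc {
            min_chunk_size: value.min_chunk_size,
            max_chunk_size: value.max_chunk_size,
            norm_level: value.norm_level,
        })
    }
}

impl From<Option<&WriterCdc>> for DfCdc {
    fn from(value: Option<&WriterCdc>) -> Self {
        match value {
            // Presence of writer options means CDC was enabled.
            Some(w) => DfCdc {
                enabled: true,
                min_chunk_size: w.min_chunk_size,
                max_chunk_size: w.max_chunk_size,
                norm_level: w.norm_level,
            },
            // Absence yields the disabled default.
            None => DfCdc::default(),
        }
    }
}

fn main() {
    let on = DfCdc { enabled: true, min_chunk_size: 4096, ..DfCdc::default() };
    let writer = Option::<WriterCdc>::from(&on);
    assert_eq!(DfCdc::from(writer.as_ref()), on);

    let off = DfCdc::default();
    assert!(Option::<WriterCdc>::from(&off).is_none());
    println!("{writer:?}");
}
```

One consequence visible in this sketch: a disabled value with non-default chunk sizes converts to `None`, so converting back yields the default parameters rather than the original ones.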
Source§

impl PartialEq for ParquetCdcOptions

Source§

fn eq(&self, other: &ParquetCdcOptions) -> bool

Tests for self and other values to be equal, and is used by ==.
1.0.0 (const: unstable) · Source§

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
Source§

impl StructuralPartialEq for ParquetCdcOptions

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.