pub type DeduplicationConfig = ConfigValueGroup;
Aliased Type
pub struct DeduplicationConfig {
pub nranges_in_streaming_fragmentation_estimator: usize,
pub min_n_chunks_per_range_hysteresis_factor: f32,
pub min_n_chunks_per_range: f32,
pub global_dedup_query_enabled: bool,
}
Fields
nranges_in_streaming_fragmentation_estimator: usize
Number of ranges to use when estimating fragmentation.
The default value is 128.
Use the environment variable HF_XET_DEDUPLICATION_NRANGES_IN_STREAMING_FRAGMENTATION_ESTIMATOR to set this value.
min_n_chunks_per_range_hysteresis_factor: f32
Minimum number of chunks per range, used to control fragmentation. This targets an average of 1 MB per range. The hysteresis factor multiplied by the target chunks per range (CPR) sets the low end of the hysteresis range: dedupe stops when CPR drops below hysteresis * target_cpr, and starts again when CPR rises above the target CPR.
The default value is 0.5.
Use the environment variable HF_XET_DEDUPLICATION_MIN_N_CHUNKS_PER_RANGE_HYSTERESIS_FACTOR to set this value.
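The stop/start rule described for the hysteresis factor can be sketched as a small state machine. This is a minimal illustration, not the crate's implementation: the `DedupGate` type, its fields, and the `update` method are hypothetical names, and the sketch assumes the target CPR equals `min_n_chunks_per_range`.

```rust
/// Hypothetical helper (not part of the crate): tracks whether dedupe is
/// currently active, using a hysteresis band on chunks per range (CPR).
struct DedupGate {
    target_cpr: f32,        // assumed to correspond to min_n_chunks_per_range
    hysteresis_factor: f32, // corresponds to min_n_chunks_per_range_hysteresis_factor
    active: bool,
}

impl DedupGate {
    fn new(target_cpr: f32, hysteresis_factor: f32) -> Self {
        // Assumption: dedupe starts out enabled.
        Self { target_cpr, hysteresis_factor, active: true }
    }

    /// Feed the current CPR estimate; returns whether dedupe is active afterwards.
    fn update(&mut self, cpr: f32) -> bool {
        let low = self.hysteresis_factor * self.target_cpr;
        if self.active && cpr < low {
            // CPR fell below the low end of the band: stop deduplicating.
            self.active = false;
        } else if !self.active && cpr > self.target_cpr {
            // CPR recovered above the target: resume deduplicating.
            self.active = true;
        }
        self.active
    }
}

fn main() {
    // Documented defaults: target 8.0, factor 0.5, so the low end is 4.0.
    let mut gate = DedupGate::new(8.0, 0.5);
    for cpr in [9.0, 5.0, 3.9, 6.0, 8.1] {
        println!("cpr = {cpr}: dedupe active = {}", gate.update(cpr));
    }
}
```

With the default values, a CPR between 4.0 and 8.0 leaves the current state unchanged, which is what prevents rapid toggling around a single threshold.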
min_n_chunks_per_range: f32
Minimum number of chunks per range.
The default value is 8.0.
Use the environment variable HF_XET_DEDUPLICATION_MIN_N_CHUNKS_PER_RANGE to set this value.
global_dedup_query_enabled: bool
Whether to enable global deduplication queries to the server. When enabled, the system queries the server for deduplication shards based on chunk hashes, which enables cross-repository deduplication.
The default value is true.
Use the environment variable HF_XET_DEDUPLICATION_GLOBAL_DEDUP_QUERY_ENABLED to set this value.
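The pattern of a documented default overridden by an environment variable can be illustrated with the standard library alone. This is a sketch only: `parse_or_default` and `env_or_default` are hypothetical helpers, not the crate's loading mechanism, and the crate may accept value formats (for example for booleans) that Rust's `FromStr` implementations do not.

```rust
use std::env;
use std::str::FromStr;

/// Hypothetical helper: parse a raw string, falling back to `default`
/// when the value is absent or cannot be parsed as `T`.
fn parse_or_default<T: FromStr>(raw: Option<&str>, default: T) -> T {
    raw.and_then(|s| s.trim().parse().ok()).unwrap_or(default)
}

/// Hypothetical helper: read `name` from the process environment.
fn env_or_default<T: FromStr>(name: &str, default: T) -> T {
    parse_or_default(env::var(name).ok().as_deref(), default)
}

fn main() {
    // Variable names and defaults are the ones documented above.
    let nranges: usize = env_or_default(
        "HF_XET_DEDUPLICATION_NRANGES_IN_STREAMING_FRAGMENTATION_ESTIMATOR",
        128,
    );
    let hysteresis: f32 = env_or_default(
        "HF_XET_DEDUPLICATION_MIN_N_CHUNKS_PER_RANGE_HYSTERESIS_FACTOR",
        0.5,
    );
    let min_cpr: f32 = env_or_default("HF_XET_DEDUPLICATION_MIN_N_CHUNKS_PER_RANGE", 8.0);
    // Note: `bool::from_str` accepts only "true" and "false".
    let global_dedup: bool =
        env_or_default("HF_XET_DEDUPLICATION_GLOBAL_DEDUP_QUERY_ENABLED", true);

    println!("nranges = {nranges}, hysteresis = {hysteresis}");
    println!("min_cpr = {min_cpr}, global_dedup = {global_dedup}");
}
```

Separating the parsing step from the environment lookup keeps the fallback logic testable without mutating process-wide state.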