pub struct ConfigValueGroup {
pub min_spacing_between_global_dedup_queries: usize,
pub local_cas_scheme: String,
pub max_concurrent_file_ingestion: usize,
pub max_concurrent_file_downloads: usize,
pub ingestion_block_size: ByteSize,
pub progress_update_interval: Duration,
pub progress_update_speed_sampling_window: Duration,
pub progress_update_speed_min_observations: u32,
pub session_xorb_metadata_flush_interval: Duration,
pub session_xorb_metadata_flush_max_count: usize,
pub default_cas_endpoint: String,
pub aggregate_progress: bool,
pub default_prefix: String,
pub staging_subdir: String,
}
ConfigValueGroup struct containing all configurable values
Fields
min_spacing_between_global_dedup_queries: usize
The minimum spacing, in number of chunks, between global dedup queries sent to the server. This limits the number of simultaneous queries.
The default value is 256, which means that the server receives at most one query for every 256 chunks, or 4MB of data.
Use the environment variable HF_XET_DATA_MIN_SPACING_BETWEEN_GLOBAL_DEDUP_QUERIES to set this value.
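The spacing rule can be pictured with a small gate that tracks the chunk index of the last permitted query. This is only a minimal sketch: `DedupQueryGate` and its methods are hypothetical names that do not appear in this crate, and the real implementation may apply further eligibility checks.

```rust
/// Hypothetical gate (not part of the crate) that permits a global dedup
/// query only when at least `min_spacing` chunks have passed since the
/// last permitted query.
pub struct DedupQueryGate {
    min_spacing: usize,
    /// Index of the chunk that last triggered a query, if any.
    last_query_chunk: Option<usize>,
}

impl DedupQueryGate {
    pub fn new(min_spacing: usize) -> Self {
        Self { min_spacing, last_query_chunk: None }
    }

    /// Returns true if a query may be sent for the chunk at `chunk_index`,
    /// and records that chunk as the new reference point when it is.
    pub fn should_query(&mut self, chunk_index: usize) -> bool {
        let allowed = match self.last_query_chunk {
            None => true,
            Some(last) => chunk_index >= last + self.min_spacing,
        };
        if allowed {
            self.last_query_chunk = Some(chunk_index);
        }
        allowed
    }
}
```

With a spacing of 256, such a gate would permit a query at chunk 0, reject chunks 1 through 255, and permit the next one at chunk 256.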
local_cas_scheme: String
Scheme for a local filesystem-based CAS server.
The default value is “local://”.
Use the environment variable HF_XET_DATA_LOCAL_CAS_SCHEME to set this value.
max_concurrent_file_ingestion: usize
The maximum number of files to ingest at once on the upload path. High performance mode (enabled via HF_XET_HIGH_PERFORMANCE or HF_XET_HP) automatically sets this to 100 via XetConfig::with_high_performance().
The default value is 8.
Use the environment variable HF_XET_DATA_MAX_CONCURRENT_FILE_INGESTION to set this value.
max_concurrent_file_downloads: usize
The maximum number of files to download at once.
The default value is 8.
Use the environment variable HF_XET_DATA_MAX_CONCURRENT_FILE_DOWNLOADS to set this value.
ingestion_block_size: ByteSize
The maximum block size from a file to process at once.
The default value is 8mb.
Use the environment variable HF_XET_DATA_INGESTION_BLOCK_SIZE to set this value.
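To illustrate what a maximum block size means in practice, here is a minimal sketch of block-wise reading. The function `ingest_in_blocks` is a hypothetical helper, and it takes the block size as a plain `usize` rather than the `ByteSize` type used by this field; the crate's actual ingestion loop is not shown in this page and may differ.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads `reader` in blocks of at most `block_size`
/// bytes and passes each block to `process`. Returns the number of blocks.
pub fn ingest_in_blocks<R: Read>(
    mut reader: R,
    block_size: usize,
    mut process: impl FnMut(&[u8]),
) -> io::Result<usize> {
    assert!(block_size > 0, "block_size must be non-zero");
    let mut buf = vec![0u8; block_size];
    let mut blocks = 0;
    loop {
        // Fill the buffer as far as possible; `read` may return short reads.
        let mut filled = 0;
        while filled < block_size {
            let n = reader.read(&mut buf[filled..])?;
            if n == 0 {
                break; // end of input
            }
            filled += n;
        }
        if filled == 0 {
            return Ok(blocks);
        }
        process(&buf[..filled]);
        blocks += 1;
        if filled < block_size {
            return Ok(blocks); // final partial block
        }
    }
}
```

Only one buffer of `block_size` bytes is held per file, so memory use per concurrent ingestion stays bounded by the block size in this sketch.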
progress_update_interval: Duration
How often to send updates on file progress, in milliseconds. Disables batching if set to 0.
The default value is 200ms.
Use the environment variable HF_XET_DATA_PROGRESS_UPDATE_INTERVAL to set this value.
progress_update_speed_sampling_window: Duration
Half-life duration for the exponentially weighted moving average used to estimate progress completion speed. Older rate observations are exponentially decayed with this half-life.
The default value is 10sec.
Use the environment variable HF_XET_DATA_PROGRESS_UPDATE_SPEED_SAMPLING_WINDOW to set this value.
progress_update_speed_min_observations: u32
Minimum number of speed observations before reporting a rate. Until this many updates have been recorded, the completion rate is reported as unknown (None). This avoids displaying noisy initial estimates.
The default value is 4.
Use the environment variable HF_XET_DATA_PROGRESS_UPDATE_SPEED_MIN_OBSERVATIONS to set this value.
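The half-life and the minimum observation count interact as in the following sketch. `SpeedEstimator` is a hypothetical type written for illustration, and the particular weighting (decayed byte and time totals) is one common way to build a half-life based moving average; the crate's actual estimator may be structured differently.

```rust
use std::time::Duration;

/// Hypothetical speed estimator (not the crate's type). Older observations
/// lose half their weight for every `half_life` of elapsed time.
pub struct SpeedEstimator {
    half_life_secs: f64,
    min_observations: u32,
    observations: u32,
    weighted_bytes: f64,
    weighted_secs: f64,
}

impl SpeedEstimator {
    pub fn new(half_life: Duration, min_observations: u32) -> Self {
        Self {
            half_life_secs: half_life.as_secs_f64(),
            min_observations,
            observations: 0,
            weighted_bytes: 0.0,
            weighted_secs: 0.0,
        }
    }

    /// Records that `bytes` were transferred during the last `elapsed` interval.
    pub fn observe(&mut self, bytes: u64, elapsed: Duration) {
        let dt = elapsed.as_secs_f64();
        // Decay factor 0.5^(dt / half_life); a zero half-life keeps no history.
        let decay = if self.half_life_secs > 0.0 {
            0.5f64.powf(dt / self.half_life_secs)
        } else {
            0.0
        };
        self.weighted_bytes = self.weighted_bytes * decay + bytes as f64;
        self.weighted_secs = self.weighted_secs * decay + dt;
        self.observations += 1;
    }

    /// Bytes per second, or None until enough observations were recorded.
    pub fn rate(&self) -> Option<f64> {
        if self.observations < self.min_observations || self.weighted_secs <= 0.0 {
            return None;
        }
        Some(self.weighted_bytes / self.weighted_secs)
    }
}
```

With the defaults above (a 10 second half-life and 4 observations), such an estimator would report None for the first three updates and a smoothed rate from the fourth onward.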
session_xorb_metadata_flush_interval: Duration
How often to flush new xorb data to disk during a long-running upload session.
The default value is 20sec.
Use the environment variable HF_XET_DATA_SESSION_XORB_METADATA_FLUSH_INTERVAL to set this value.
session_xorb_metadata_flush_max_count: usize
Forces a flush of the xorb metadata once this many xorbs have accumulated, if that many are created before the flush interval elapses.
The default value is 64.
Use the environment variable HF_XET_DATA_SESSION_XORB_METADATA_FLUSH_MAX_COUNT to set this value.
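The two flush settings combine into a "time or count, whichever comes first" rule. The sketch below shows one way to express that; `FlushPolicy` and its methods are hypothetical and are not taken from the crate, which may track pending xorbs differently.

```rust
use std::time::{Duration, Instant};

/// Hypothetical flush policy (not the crate's type): flush when the interval
/// has elapsed or when `max_count` xorbs are pending, whichever comes first.
pub struct FlushPolicy {
    interval: Duration,
    max_count: usize,
    last_flush: Instant,
    pending: usize,
}

impl FlushPolicy {
    pub fn new(interval: Duration, max_count: usize, now: Instant) -> Self {
        Self { interval, max_count, last_flush: now, pending: 0 }
    }

    /// Records one new xorb; returns true if the caller should flush now.
    pub fn record_xorb(&mut self, now: Instant) -> bool {
        self.pending += 1;
        self.should_flush(now)
    }

    /// True if there is pending metadata and either limit has been reached.
    pub fn should_flush(&self, now: Instant) -> bool {
        self.pending > 0
            && (self.pending >= self.max_count
                || now.duration_since(self.last_flush) >= self.interval)
    }

    /// Resets the counters after the caller has written metadata to disk.
    pub fn mark_flushed(&mut self, now: Instant) {
        self.pending = 0;
        self.last_flush = now;
    }
}
```

Passing `now` explicitly keeps the policy deterministic, which makes it easy to exercise without sleeping.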
default_cas_endpoint: String
Default CAS endpoint.
The default value is “http://localhost:8080”.
Use the environment variable HF_XET_DATA_DEFAULT_CAS_ENDPOINT to set this value.
aggregate_progress: bool
Whether to aggregate progress updates before sending them. When enabled, progress updates are batched and sent at regular intervals to reduce overhead.
The default value is true.
Use the environment variable HF_XET_DATA_AGGREGATE_PROGRESS to set this value.
default_prefix: String
Default prefix used for CAS and shard operations.
The default value is “default”.
Use the environment variable HF_XET_DATA_DEFAULT_PREFIX to set this value.
staging_subdir: String
Subdirectory name for staging data within the endpoint cache directory.
The default value is “staging”.
Use the environment variable HF_XET_DATA_STAGING_SUBDIR to set this value.
Implementations
impl ConfigValueGroup
pub fn new() -> Self
Create a new instance with default values only (no environment variable overrides).
pub fn apply_env_overrides(&mut self)
Apply environment variable overrides to this configuration group.
The group name is derived from the module path. For example, in module xet_config::groups::data,
the env var for TEST_INT would be HF_XET_DATA_TEST_INT.
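The naming rule can be sketched as follows. Both functions are hypothetical helpers written for illustration. They assume that the group name is the last segment of the module path, that the prefix is `HF_XET_`, and that an unparsable value leaves the field unchanged; the last point in particular is an assumption, since this page does not say how invalid values are handled.

```rust
use std::str::FromStr;

/// Hypothetical helper: builds the environment variable name from a module
/// path and a field name, assuming the group is the last path segment.
pub fn env_var_name(module_path: &str, field: &str) -> String {
    let group = module_path.rsplit("::").next().unwrap_or(module_path);
    format!("HF_XET_{}_{}", group.to_uppercase(), field.to_uppercase())
}

/// Hypothetical helper: replaces `current` with the parsed override if one
/// is present and parses cleanly; otherwise leaves `current` untouched.
pub fn apply_override<T: FromStr>(current: &mut T, raw: Option<&str>) {
    if let Some(parsed) = raw.and_then(|s| s.trim().parse::<T>().ok()) {
        *current = parsed;
    }
}
```

In a real program the raw value would typically come from `std::env::var(&name).ok()`, passed on with `.as_deref()` to obtain the `Option<&str>` expected here.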
pub fn field_names() -> &'static [&'static str]
Returns the list of field names in this configuration group.
Trait Implementations
impl AsRef<ConfigValueGroup> for ConfigValueGroup
fn as_ref(&self) -> &ConfigValueGroup
impl Clone for ConfigValueGroup
fn clone(&self) -> ConfigValueGroup
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.