pub struct SamplerConfig {
    pub seed: u64,
    pub batch_size: usize,
    pub ingestion_max_records: usize,
    pub chunking: ChunkingStrategy,
    pub recipes: Vec<TripletRecipe>,
    pub text_recipes: Vec<TextRecipe>,
    pub split: SplitRatios,
    pub allowed_splits: Vec<SplitLabel>,
}
Top-level sampler configuration.
Fields
seed: u64
RNG seed that controls deterministic sampling order.
batch_size: usize
Target number of samples per batch.
ingestion_max_records: usize
Maximum number of records kept in the ingestion cache for candidate sampling.
This is intentionally decoupled from batch_size so anchors/negatives can
be drawn from a broader rolling pool.
Practical tuning: values above batch_size usually improve diversity and
reduce short-horizon repetition; gains taper off once source/recipe/split
constraints become the limiting factor. Higher values also increase memory use.
For remote shard-backed sources (for example Hugging Face), a larger value may require fetching more shards before the first batch, so startup latency can increase depending on shard sizes and network throughput.
chunking: ChunkingStrategy
Chunking behavior for long sections.
recipes: Vec<TripletRecipe>
Triplet recipes to use; empty means sources may provide defaults.
text_recipes: Vec<TextRecipe>
Text recipes to use; empty means derived from triplet recipes if available.
split: SplitRatios
Split ratios used when assigning records to train/val/test.
allowed_splits: Vec<SplitLabel>
Splits allowed for sampling requests.
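The relationship between seed, batch_size, and ingestion_max_records can be illustrated with a small standalone sketch. It is not the crate's implementation: RollingPool, push, and draw_batch are hypothetical names, records are stand-in u64 ids, and the seeded generator is a plain linear congruential generator chosen only to keep the example dependency-free. It shows a bounded pool that evicts its oldest record once the cap is reached, and a seeded draw of batch_size distinct records from that pool.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded pool standing in for an ingestion cache.
/// Not part of triplets_core; records are plain u64 ids here.
pub struct RollingPool {
    max_records: usize,
    records: VecDeque<u64>,
}

impl RollingPool {
    /// `max_records` plays the role of `ingestion_max_records`.
    pub fn new(max_records: usize) -> Self {
        Self {
            max_records,
            records: VecDeque::with_capacity(max_records),
        }
    }

    /// Add a record, evicting the oldest one once the cap is reached.
    pub fn push(&mut self, record: u64) {
        if self.max_records == 0 {
            return;
        }
        if self.records.len() == self.max_records {
            self.records.pop_front();
        }
        self.records.push_back(record);
    }

    pub fn len(&self) -> usize {
        self.records.len()
    }

    /// Draw up to `batch_size` distinct records. The same `seed` and the
    /// same pool contents always yield the same batch.
    pub fn draw_batch(&self, seed: u64, batch_size: usize) -> Vec<u64> {
        let mut indices: Vec<usize> = (0..self.records.len()).collect();
        let take = batch_size.min(indices.len());
        let mut state = seed;
        // Partial Fisher-Yates shuffle driven by a simple LCG (assumption:
        // any seeded RNG would do; this one avoids external crates).
        for i in 0..take {
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            let remaining = indices.len() - i;
            let j = i + ((state >> 33) as usize) % remaining;
            indices.swap(i, j);
        }
        indices[..take].iter().map(|&i| self.records[i]).collect()
    }
}
```

In this sketch, a pool cap larger than the batch size gives each draw more candidates to choose from, at the cost of holding more records in memory, which mirrors the tuning note on ingestion_max_records.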
Implementations
impl SamplerConfig
pub fn with_denoiser(self, config: DenoiserConfig) -> Self
Consuming builder that enables the built-in OCR/markdown denoiser on the sampler’s chunking strategy.
It lets denoiser setup be chained onto SamplerConfig construction, and it combines with
struct update syntax to customize other fields at the same time:
use triplets_core::{SamplerConfig, config::DenoiserConfig};

// Enable denoiser with all other fields at their defaults:
let config = SamplerConfig::default()
    .with_denoiser(DenoiserConfig { enabled: true, ..DenoiserConfig::default() });

// Or customize other fields first, then add the denoiser:
let config = SamplerConfig { batch_size: 32, ..SamplerConfig::default() }
    .with_denoiser(DenoiserConfig { enabled: true, ..DenoiserConfig::default() });
Trait Implementations
impl Clone for SamplerConfig
fn clone(&self) -> SamplerConfig
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for SamplerConfig
Auto Trait Implementations
impl Freeze for SamplerConfig
impl !RefUnwindSafe for SamplerConfig
impl Send for SamplerConfig
impl Sync for SamplerConfig
impl Unpin for SamplerConfig
impl UnsafeUnpin for SamplerConfig
impl !UnwindSafe for SamplerConfig
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.