pub struct TripletRecipe {
    pub name: Cow<'static, str>,
    pub anchor: Selector,
    pub positive_selector: Selector,
    pub negative_selector: Selector,
    pub negative_strategy: NegativeStrategy,
    pub weight: f32,
    pub instruction: Option<Cow<'static, str>>,
    pub allow_same_anchor_positive: bool,
}
Defines a triplet recipe (anchor/positive/negative selection + weighting).
§Split-isolation contract
All three chunk slots (anchor, positive, negative) must resolve to records
whose IDs hash to the same split as the request split. The sampler enforces
this automatically for Selector::Role, Selector::Paragraph, and
Selector::Random — those selectors always read from the record that was
already confirmed to be in the correct split.
Selector::TemporalOffset crosses a record boundary (it picks a different
record by proximity in time) and the split check is re-applied inside
select_temporal_neighbor. No additional care is required on your side,
but you should be aware that in pools with few same-split neighbors the
selector will return None and fall back to skipping a slot rather than
contaminating splits.
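The fallback behavior described above can be sketched as follows. This is a simplified illustration, not the crate's `select_temporal_neighbor`: the `Record` struct, its `split` field, and the `nearest_same_split` function are hypothetical names, and "nearest in time" stands in for whatever offset rule the real selector applies.

```rust
/// Hypothetical record shape used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub id: String,
    pub timestamp: i64,
    /// Assumed to be the split already derived from the record ID.
    pub split: u8,
}

/// Picks the record closest in time to `anchor` that lives in the same split.
/// Returns `None` when no other same-split record exists, so the caller can
/// skip the slot instead of borrowing a record from another split.
pub fn nearest_same_split<'a>(anchor: &Record, pool: &'a [Record]) -> Option<&'a Record> {
    pool.iter()
        // Re-apply the split check to every candidate; never return the anchor itself.
        .filter(|r| r.id != anchor.id && r.split == anchor.split)
        // `abs_diff` avoids overflow on extreme timestamps.
        .min_by_key(|r| r.timestamp.abs_diff(anchor.timestamp))
}
```

In this sketch a closer record in a different split is ignored in favor of a more distant same-split record, and an anchor with no same-split neighbors yields `None`.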
§Stable IDs
Record IDs must be stable across runs. Split assignment is derived deterministically from the record ID and the sampler seed; changing an ID changes its split assignment, which invalidates any persisted split state. IDs should also be globally unique — if two records from different sources share the same ID, only one will be kept in the sampler, and the discarded record’s split assignment silently goes with it.
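To illustrate why a changed ID changes the split, here is a minimal sketch of hash-based split assignment. Everything in it is an assumption made for illustration: the `Split` enum, the `assign_split` function, the FNV-1a hash, the way the seed is mixed in, and the 80/10/10 ratios. The crate's actual hashing scheme may differ.

```rust
/// Hypothetical split labels; the crate's real split type may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Split {
    Train,
    Validation,
    Test,
}

/// 64-bit FNV-1a with the seed folded into the initial state.
/// Chosen here only because it is simple and gives the same result on every run.
fn fnv1a64(bytes: &[u8], seed: u64) -> u64 {
    let mut hash = 0xcbf29ce484222325u64 ^ seed;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Derives the split from the record ID and the sampler seed alone.
pub fn assign_split(record_id: &str, seed: u64) -> Split {
    // Map the hash to a bucket in 0..100 and cut it 80/10/10 (assumed ratios).
    match fnv1a64(record_id.as_bytes(), seed) % 100 {
        0..=79 => Split::Train,
        80..=89 => Split::Validation,
        _ => Split::Test,
    }
}
```

Because the result depends only on the ID bytes and the seed, the same record lands in the same split on every run, while renaming a record may move it to a different split.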
Fields§
name: Cow<'static, str>
Unique name for this recipe.
anchor: Selector
Selector used for anchor chunks.
positive_selector: Selector
Selector used for positive chunks (same record).
negative_selector: Selector
Selector used for negative chunks (different record).
negative_strategy: NegativeStrategy
Strategy used to pick negatives.
weight: f32
Relative weight controlling how often this recipe is selected versus other recipes.
Each recipe with a positive weight receives a number of slots in the shuffled selection
order proportional to weight / min_positive_weight across all active recipes, so a
recipe with weight = 2.0 is drawn approximately twice as often as one with weight = 1.0. The weight also scales the weight field on every crate::SampleTriplet
returned by this recipe, which the caller’s training loop can use for loss weighting.
Recipes with weight <= 0.0 are excluded from selection entirely and no samples
are produced for them.
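The slot arithmetic can be sketched as below. This is an illustration under assumptions, not the sampler's code: the function name `slot_counts` is hypothetical, and rounding the ratio to the nearest whole slot is a guess, since the text above only says the slot count is proportional to `weight / min_positive_weight`.

```rust
/// Returns the number of selection slots for each recipe weight, in input order.
/// Weights that are not strictly positive get zero slots.
pub fn slot_counts(weights: &[f32]) -> Vec<usize> {
    // Smallest strictly positive weight; stays at infinity if there is none.
    let min_positive = weights
        .iter()
        .copied()
        .filter(|w| *w > 0.0)
        .fold(f32::INFINITY, f32::min);

    weights
        .iter()
        .map(|&w| {
            if w > 0.0 {
                // Assumed rounding rule: nearest whole slot, at least one.
                (w / min_positive).round().max(1.0) as usize
            } else {
                0
            }
        })
        .collect()
}
```

With weights `[1.0, 2.0, 0.0]` this yields one slot, two slots, and no slots respectively, matching the "twice as often" behavior described above.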
instruction: Option<Cow<'static, str>>
Optional instruction text attached to samples from this recipe.
allow_same_anchor_positive: bool
Allow anchor and positive to carry identical text (SimCSE / dropout-trick mode).
When true, the sampler will emit triplets even when the anchor and positive
sections resolve to the same text. This enables the unsupervised SimCSE
training pattern: the same text string feeds both slots, and the model’s
dropout layers produce two slightly different embeddings at training time.
Negatives are still required to differ from both anchor and positive.
Defaults to false; set true only for recipes whose anchor and positive
selectors intentionally resolve to the same content (e.g. text-only sources).
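The acceptance rule described above amounts to a small predicate. The sketch below is hypothetical (the function name `accept_triplet` and the plain string comparison are assumptions) and only mirrors the documented behavior: negatives must always differ from both other slots, while anchor and positive may match only when the flag is set.

```rust
/// Decides whether a candidate triplet may be emitted.
pub fn accept_triplet(
    anchor: &str,
    positive: &str,
    negative: &str,
    allow_same_anchor_positive: bool,
) -> bool {
    // The negative must differ from both anchor and positive, regardless of the flag.
    if negative == anchor || negative == positive {
        return false;
    }
    // Identical anchor/positive text is accepted only in SimCSE mode.
    allow_same_anchor_positive || anchor != positive
}
```

With the flag at its default of `false`, a triplet whose anchor and positive carry the same text is rejected; with `true` it is emitted and the training loop relies on dropout to separate the two embeddings.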
Trait Implementations§
impl Clone for TripletRecipe
fn clone(&self) -> TripletRecipe
fn clone_from(&mut self, source: &Self)
impl Debug for TripletRecipe
Auto Trait Implementations§
impl Freeze for TripletRecipe
impl RefUnwindSafe for TripletRecipe
impl Send for TripletRecipe
impl Sync for TripletRecipe
impl Unpin for TripletRecipe
impl UnsafeUnpin for TripletRecipe
impl UnwindSafe for TripletRecipe
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.