Skip to main content

PROACTIVE_CONFLICT_SIM_THRESHOLD

Constant PROACTIVE_CONFLICT_SIM_THRESHOLD 

Source
pub const PROACTIVE_CONFLICT_SIM_THRESHOLD: f32 = 0.95;
Expand description

Cosine-similarity threshold above which a candidate is treated as a near-duplicate for the purpose of proactive_conflict_check.

Empirically tuned for the MiniLM-L6-v2 / Nomic embedder pair: rows whose (title, content) paraphrase the query at this level are already considered “the same memory” by the existing duplicate machinery (DUPLICATE_THRESHOLD_DEFAULT sits at 0.85 for the merge-suggestion surface). 0.95 is the stricter “this is the same fact, restated” bar; combined with the textual contradiction signal below, we surface only writes that proactively conflict with an established near-duplicate.

Known miss class (pre-existing; deliberately unchanged by the #1579 A5 remediation): genuine paraphrases can embed just BELOW this bar — the P2-audit probe pair (“deadline is june 15” vs “deadline is june 22” in otherwise-identical sentences) scored 0.945 cosine on the release MiniLM and is therefore not detected. Safe direction for an advisory gate (the write is ALLOWED; nothing is wrongly refused); lowering the bar instead would re-open the false-409 epidemic the PROACTIVE_CONFLICT_CONTENT_JACCARD_FLOOR corroboration exists to close. The deeper detect_contradiction tooling remains the surface for sub-threshold contradictions.