pub const PROACTIVE_CONFLICT_SIM_THRESHOLD: f32 = 0.95;Expand description
Cosine-similarity threshold above which a candidate is treated as a
near-duplicate for the purpose of proactive_conflict_check.
Empirically tuned for the MiniLM-L6-v2 / Nomic embedder pair: rows
whose (title, content) paraphrase the query at this level are
already considered “the same memory” by the existing duplicate
machinery (DUPLICATE_THRESHOLD_DEFAULT sits at 0.85 for the
merge-suggestion surface). 0.95 is the stricter “this is the same
fact, restated” bar; combined with the textual contradiction signal
below, we surface only writes that proactively conflict with an
established near-duplicate.
Known miss class (pre-existing; deliberately unchanged by the
#1579 A5 remediation): genuine paraphrases can embed just BELOW
this bar — the P2-audit probe pair (“deadline is june 15” vs
“deadline is june 22” in otherwise-identical sentences) scored
0.945 cosine on the release MiniLM and is therefore not detected.
Safe direction for an advisory gate (the write is ALLOWED; nothing
is wrongly refused); lowering the bar instead would re-open the
false-409 epidemic the
PROACTIVE_CONFLICT_CONTENT_JACCARD_FLOOR corroboration exists
to close. The deeper detect_contradiction tooling remains the
surface for sub-threshold contradictions.