//! Mention-Ranking Coreference Resolution.
//!
//! A simpler alternative to E2E-Coref that uses external mention detection
//! (from NER/parser) and ranks antecedent candidates.
//!
//! # Research Foundations
//!
//! This module is primarily a practical implementation of mention-ranking coreference:
//! score candidate antecedents for each mention, then cluster by transitive closure.
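//!
//! The clustering step can be pictured as a transitive closure over antecedent links.
//! Below is a minimal union-find sketch. It assumes each mention already carries at most
//! one chosen antecedent index, with `None` meaning the mention starts a new entity.
//! `cluster_links` is a hypothetical helper for illustration, not this module's API.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Groups mentions into clusters by transitive closure over antecedent links.
//! /// `links[i] == Some(j)` means mention `i` was linked to earlier mention `j`.
//! /// (Hypothetical helper for illustration.)
//! fn cluster_links(links: &[Option<usize>]) -> Vec<Vec<usize>> {
//!     // Union-find: each mention starts as its own set representative.
//!     let mut parent: Vec<usize> = (0..links.len()).collect();
//!
//!     fn find(parent: &mut [usize], mut x: usize) -> usize {
//!         while parent[x] != x {
//!             parent[x] = parent[parent[x]]; // path halving
//!             x = parent[x];
//!         }
//!         x
//!     }
//!
//!     for (i, link) in links.iter().enumerate() {
//!         if let Some(j) = *link {
//!             let (ri, rj) = (find(&mut parent, i), find(&mut parent, j));
//!             if ri != rj {
//!                 parent[ri] = rj;
//!             }
//!         }
//!     }
//!
//!     // Emit clusters ordered by their first mention.
//!     let mut clusters: Vec<Vec<usize>> = Vec::new();
//!     let mut cluster_of_root: HashMap<usize, usize> = HashMap::new();
//!     for i in 0..links.len() {
//!         let root = find(&mut parent, i);
//!         let idx = *cluster_of_root.entry(root).or_insert_with(|| {
//!             clusters.push(Vec::new());
//!             clusters.len() - 1
//!         });
//!         clusters[idx].push(i);
//!     }
//!     clusters
//! }
//! ```
//!
//! Because links are merged transitively, a chain such as `[None, Some(0), Some(1)]`
//! yields a single cluster even though mention 2 never links to mention 0 directly.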
//!
//! Where this module cites papers, treat those citations as *context* for ideas that are
//! instantiated here (feature hooks, configuration defaults). If a comment cannot be
//! traced to a cited source, it should be removed rather than treated as authoritative.
//!
//! # Clinical heuristics (inspired by clinical-coref literature)
//!
//! This implementation includes optional heuristics commonly discussed in clinical-coref
//! settings (acronym expansion, “be-phrase” patterns, local context filtering). When
//! enabled, they should be validated on your target dataset; defaults aim to be conservative.
//!
//! ## "Be Phrase" Detection
//!
//! Identity patterns like "Resolution of X is Y" strongly indicate coreference.
//! From the paper: "if there is a 'be phrase' between two concepts of the same
//! type, they are probably saying 'something is something'."
//!
//! Enabled via [`MentionRankingConfig::enable_be_phrase_detection`].
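//!
//! Here is a minimal sketch of such a check. It assumes the caller already has the two
//! mentions' semantic types and the text between them. The list of "be" forms, the
//! five-token window, and the names `has_be_phrase` and `be_phrase_link` are
//! illustrative assumptions, not this module's implementation.
//!
//! ```rust
//! /// Forms of "to be" treated as identity links (illustrative, not exhaustive).
//! const BE_FORMS: &[&str] = &["is", "are", "was", "were", "be", "been"];
//!
//! /// True if the text between two mentions is a short phrase containing a
//! /// form of "to be", as in "Resolution of X is Y". (Hypothetical helper.)
//! /// Note: passive constructions ("was treated with") also match, so a real
//! /// check would need more context than this.
//! fn has_be_phrase(between: &str, max_tokens: usize) -> bool {
//!     let tokens: Vec<String> = between
//!         .split_whitespace()
//!         .map(|t| t.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
//!         .filter(|t| !t.is_empty())
//!         .collect();
//!     tokens.len() <= max_tokens && tokens.iter().any(|t| BE_FORMS.contains(&t.as_str()))
//! }
//!
//! /// The quoted condition also requires both concepts to share a type.
//! fn be_phrase_link(type_a: &str, type_b: &str, between: &str) -> bool {
//!     type_a == type_b && has_be_phrase(between, 5)
//! }
//! ```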
//!
//! ## Acronym Matching
//!
//! Medical acronyms frequently corefer with their expansions:
//! - "MRSA" ↔ "Methicillin-resistant Staphylococcus aureus"
//! - "CHF" ↔ "Congestive Heart Failure"
//!
//! From the paper: "The first letters of each word in concepts that have two
//! or more words are taken and compared to whole words in other concepts."
//!
//! Enabled via [`MentionRankingConfig::enable_acronym_matching`].
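//!
//! The quoted rule can be sketched as follows. This sketch adds one assumption: words
//! are split on hyphens as well as whitespace, so that "Methicillin-resistant" contributes
//! both "M" and "R". `initials` and `acronym_match` are hypothetical names, not this
//! module's API.
//!
//! ```rust
//! /// Uppercased initials of a concept with two or more words, or None for
//! /// single-word concepts. Words are split on whitespace and hyphens.
//! fn initials(concept: &str) -> Option<String> {
//!     let words: Vec<&str> = concept
//!         .split(|c: char| c.is_whitespace() || c == '-')
//!         .filter(|w| !w.is_empty())
//!         .collect();
//!     if words.len() < 2 {
//!         return None;
//!     }
//!     Some(
//!         words
//!             .iter()
//!             .filter_map(|w| w.chars().next())
//!             .flat_map(char::to_uppercase)
//!             .collect(),
//!     )
//! }
//!
//! /// True if one mention's initials equal a whole word of the other mention.
//! fn acronym_match(a: &str, b: &str) -> bool {
//!     let check = |concept: &str, other: &str| {
//!         initials(concept).map_or(false, |acr| {
//!             other
//!                 .split_whitespace()
//!                 // Ignore surrounding punctuation such as "CHF." or "(MRSA)".
//!                 .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()))
//!                 .any(|w| w.eq_ignore_ascii_case(&acr))
//!         })
//!     };
//!     check(a, b) || check(b, a)
//! }
//! ```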
//!
//! ## Context-Based Link Filtering
//!
//! Mentions surrounded by different dates or locations likely refer to different
//! entities. From the paper:
//! "eliminate links that actually refer to two different entities based on
//! clues found in the sentences surrounding the mentions."
//!
//! Enabled via [`MentionRankingConfig::enable_context_filtering`].
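//!
//! The following rough sketch covers only the date clue. It assumes each mention comes
//! with its surrounding sentence and drops the link when both sentences contain dates but
//! share none. The `M/D` or `M/D/Y` token pattern and the helper names are illustrative
//! assumptions. A real filter would also need date normalization and location handling.
//!
//! ```rust
//! use std::collections::BTreeSet;
//!
//! /// Date-like tokens such as "3/12" or "3/12/2021": all-digit parts joined by '/'.
//! fn dates_in(sentence: &str) -> BTreeSet<String> {
//!     sentence
//!         .split_whitespace()
//!         // Strip surrounding punctuation, e.g. "5/20." -> "5/20".
//!         .map(|t| t.trim_matches(|c: char| !c.is_ascii_digit()))
//!         .filter(|t| {
//!             let parts: Vec<&str> = t.split('/').collect();
//!             parts.len() >= 2
//!                 && parts
//!                     .iter()
//!                     .all(|p| !p.is_empty() && p.chars().all(|c| c.is_ascii_digit()))
//!         })
//!         .map(String::from)
//!         .collect()
//! }
//!
//! /// True if both contexts mention dates but share none, suggesting the two
//! /// mentions describe different entities. (Hypothetical helper.)
//! fn context_blocks_link(context_a: &str, context_b: &str) -> bool {
//!     let (da, db) = (dates_in(context_a), dates_in(context_b));
//!     !da.is_empty() && !db.is_empty() && da.is_disjoint(&db)
//! }
//! ```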
//!
//! ## Synonym matching
//!
//! This module supports synonym-aware matching via pluggable sources (see code), but
//! avoids shipping large hardcoded domain synonym tables by default.
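//!
//! A pluggable source could be as small as a trait with one pairwise query. The sketch
//! below assumes that shape. `SynonymSource` and `MapSynonyms` are hypothetical names,
//! not types defined by this module.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// A pluggable synonym lookup (hypothetical trait for illustration).
//! trait SynonymSource {
//!     /// True if `a` and `b` should be treated as the same concept.
//!     fn are_synonyms(&self, a: &str, b: &str) -> bool;
//! }
//!
//! /// In-memory source that maps each term to the first term of its group.
//! #[derive(Default)]
//! struct MapSynonyms {
//!     canonical: HashMap<String, String>,
//! }
//!
//! impl MapSynonyms {
//!     /// Registers a group of interchangeable terms (case-insensitive).
//!     fn with_group(mut self, terms: &[&str]) -> Self {
//!         if let Some(first) = terms.first() {
//!             let head = first.to_lowercase();
//!             for term in terms {
//!                 self.canonical.insert(term.to_lowercase(), head.clone());
//!             }
//!         }
//!         self
//!     }
//!
//!     /// Canonical form of a term; unknown terms map to themselves.
//!     fn canon(&self, term: &str) -> String {
//!         let key = term.to_lowercase();
//!         self.canonical.get(&key).cloned().unwrap_or(key)
//!     }
//! }
//!
//! impl SynonymSource for MapSynonyms {
//!     fn are_synonyms(&self, a: &str, b: &str) -> bool {
//!         self.canon(a) == self.canon(b)
//!     }
//! }
//! ```
//!
//! In this design, the caller populates the source from its own vocabulary. Domain tables
//! therefore stay out of the default build.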
//!
//! ## Clinical Configuration
//!
//! Use [`MentionRankingConfig::clinical()`] for clinical/biomedical text:
//!
//! ```rust
//! use anno::backends::coref::mention_ranking::{MentionRankingConfig, MentionRankingCoref};
//!
//! let config = MentionRankingConfig::clinical();
//! let coref = MentionRankingCoref::with_config(config);
//!
//! let text = "The patient is John Smith. Pt was admitted with MRSA.";
//! let clusters = coref.resolve(text).unwrap();
//! ```
//!
//! # Long-document notes
//!
//! Long-document coreference is difficult. The long-doc literature is a good source of
//! evaluation benchmarks and error modes, but this implementation does not aim to
//! reproduce specific reported numbers in its doc comments.
//!
//! # Historical Context
//!
//! Coreference resolution approaches evolved through distinct paradigms:
//!
//! ```text
//! 1978-2010  Rule-based: Hobbs algorithm, centering theory
//! 1997       Kehler: Probabilistic coref with Dempster-Shafer (IE context)
//! 2010-2016  Mention-pair: Classify (m_i, m_j) independently
//! 2013-2017  Mention-ranking: Rank antecedents for each mention
//! 2017+      E2E-Coref: Joint mention detection + clustering
//! 2022       G2GT: Graph refinement with global decisions
//! 2024       Maverick: Efficient E2E with 500M params
//! ```
//!
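//! Mention-ranking sits between mention-pair and E2E approaches. Unlike mention-pair
//! models, which classify each pair in isolation, it lets candidate antecedents compete
//! for each mention. Unlike E2E models, it relies on external mention detection, which
//! keeps it simpler. It is still valuable for: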
//! - Interpretable, feature-based debugging
//! - Fast inference without GPU
//! - Scenarios with good external mention detection
//!
//! ## Configuration-level uncertainty
//!
//! Some classic probabilistic formulations treat coreference as a distribution over
//! clusterings/configurations. This implementation is greedy and does not attempt to
//! represent full configuration uncertainty.
//!
//! ## Graph refinement (separate implementation)
//!
//! If you want iterative/global graph refinement, use the dedicated graph-coref backend
//! (separate module) rather than treating this mention-ranking implementation as equivalent.
//!
//! # Architecture
//!
//! ```text
//! Input: "John saw Mary. He waved."
//!                             ↓
//! ┌─────────────────────────────────────────────────────────┐
//! │ 1. External Mention Detection                           │
//! │    Use NER/parser to find NPs, pronouns, named entities │
//! │    Mentions: [John, Mary, He]                           │
//! └─────────────────────────────────────────────────────────┘
//!                             ↓
//! ┌─────────────────────────────────────────────────────────┐
//! │ 2. Mention Representation                               │
//! │    Extract features for each mention:                   │
//! │    - Surface form, head word                            │
//! │    - Type (pronoun, proper, nominal)                    │
//! │    - Gender, number, animacy                            │
//! └─────────────────────────────────────────────────────────┘
//!                             ↓
//! ┌─────────────────────────────────────────────────────────┐
//! │ 3. Antecedent Ranking                                   │
//! │    For each mention, rank all previous mentions         │
//! │    Features: string match, distance, type compatibility │
//! │    Link to highest-scoring antecedent above threshold   │
//! └─────────────────────────────────────────────────────────┘
//!                             ↓
//! ┌─────────────────────────────────────────────────────────┐
//! │ 4. Clustering                                           │
//! │    Group linked mentions into clusters via transitivity │
//! └─────────────────────────────────────────────────────────┘
//!                             ↓
//! Output: {[John, He], [Mary]}
//! ```
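//!
//! Step 3 can be sketched for the example input above. The features and weights are
//! illustrative assumptions, not the scoring this module uses: exact string match, gender
//! agreement for pronouns, a 0.1-per-mention distance penalty, and a 0.3 threshold.
//! `Mention` and `rank_antecedents` are hypothetical names.
//!
//! ```rust
//! #[derive(Clone, Copy, PartialEq)]
//! enum Gender {
//!     Masc,
//!     Fem,
//!     Unknown,
//! }
//!
//! struct Mention<'a> {
//!     text: &'a str,
//!     is_pronoun: bool,
//!     gender: Gender,
//! }
//!
//! /// Scores candidate antecedent `a` for mention `m`, `dist` mentions back.
//! fn score(m: &Mention<'_>, a: &Mention<'_>, dist: usize) -> f64 {
//!     let base = if m.text.eq_ignore_ascii_case(a.text) {
//!         1.0 // exact string match
//!     } else if m.is_pronoun
//!         && !a.is_pronoun
//!         && m.gender != Gender::Unknown
//!         && m.gender == a.gender
//!     {
//!         0.6 // gender-compatible pronoun -> named/nominal antecedent
//!     } else {
//!         0.0
//!     };
//!     base - 0.1 * dist as f64 // prefer closer antecedents
//! }
//!
//! /// For each mention, links to the highest-scoring previous mention whose
//! /// score exceeds `threshold`; `None` means the mention starts a new entity.
//! fn rank_antecedents(mentions: &[Mention<'_>], threshold: f64) -> Vec<Option<usize>> {
//!     (0..mentions.len())
//!         .map(|i| {
//!             (0..i)
//!                 .map(|j| (j, score(&mentions[i], &mentions[j], i - j)))
//!                 .filter(|&(_, s)| s > threshold)
//!                 .max_by(|x, y| x.1.total_cmp(&y.1))
//!                 .map(|(j, _)| j)
//!         })
//!         .collect()
//! }
//! ```
//!
//! With these toy weights, "John saw Mary. He waved." produces `[None, None, Some(0)]`.
//! "He" links to "John" with a score of 0.6 - 0.2, and "Mary" stays unlinked. Grouping
//! those links into clusters is step 4.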
//!
//! # Compared to other approaches
//!
//! Mention-ranking is typically simpler than end-to-end span models and can be faster to
//! debug and iterate on. For accuracy claims, rely on the evaluation harness and dataset
//! reports rather than prose numbers in docs.
//!
//! # References
//!
//! - NeuralCoref (HuggingFace): <https://github.com/huggingface/neuralcoref>
//! - Clark & Manning 2016: "Deep Reinforcement Learning for Mention-Ranking Coreference Models"
//! - Miculicich & Henderson 2022: "Graph Refinement for Coreference Resolution"
//! [arXiv:2203.16574](https://arxiv.org/abs/2203.16574)
//!
//! # Salience Integration
//!
//! Entity salience (importance) can inform coreference decisions:
//! - Salient entities are mentioned more often (stronger evidence)
//! - Linking to salient antecedents is more likely correct
//! - Helps break ties between equally-scored candidates
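//!
//! Here is one way to use salience as a tie-breaker, sketched under two assumptions.
//! Salience scores are keyed by lowercased mention text, as in the examples below. They
//! are added to the coreference score with a small weight. `pick_antecedent` and the
//! weight are hypothetical, not this module's formula.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Picks the best `(text, score)` candidate after adding a small salience
//! /// bonus, so near-ties favor more salient entities. Candidates without a
//! /// salience entry get no bonus. (Hypothetical helper.)
//! fn pick_antecedent<'a>(
//!     candidates: &[(&'a str, f64)],
//!     salience: &HashMap<String, f64>,
//!     weight: f64,
//! ) -> Option<&'a str> {
//!     candidates
//!         .iter()
//!         .map(|&(text, score)| {
//!             let bonus = salience.get(&text.to_lowercase()).copied().unwrap_or(0.0);
//!             (text, score + weight * bonus)
//!         })
//!         .max_by(|a, b| a.1.total_cmp(&b.1))
//!         .map(|(text, _)| text)
//! }
//! ```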
//!
//! Use `with_salience` to provide pre-computed salience scores. Two approaches:
//!
//! **Option 1: TextRank/YAKE salience** (keyword-based)
//!
//! ```rust,ignore
//! use std::collections::HashMap;
//!
//! use anno::salience::{EntityRanker, TextRankSalience};
//! use anno::backends::coref::mention_ranking::MentionRankingCoref;
//!
//! // Assumes `text: &str` and the detected `entities` are already in scope.
//! let ranker = TextRankSalience::default();
//! let ranked = ranker.rank(text, &entities);
//!
//! // Key salience by lowercased entity text.
//! let salience_scores: HashMap<String, f64> = ranked
//!     .into_iter()
//!     .map(|(e, score)| (e.text.to_lowercase(), score))
//!     .collect();
//!
//! let coref = MentionRankingCoref::new().with_salience(salience_scores);
//! ```
//!
//! **Option 2: Chain-feature salience** (uses mention frequency, spread, type)
//!
//! ```rust,ignore
//! use anno::salience::features_to_salience_scores;
//! use anno::backends::coref::mention_ranking::MentionRankingCoref;
//!
//! // Assumes `text: &str` and the detected `entities` are already in scope.
//! let salience_scores = features_to_salience_scores(text, &entities);
//! let coref = MentionRankingCoref::new().with_salience(salience_scores);
//! ```
use crate::;
use crate::;
use ;
pub use *;
pub use MentionRankingCoref;