anno/backends/graph_coref/mod.rs
1//! Graph-Based Coreference Resolution with Iterative Refinement.
2//!
3//! This module implements a graph-based approach to coreference resolution,
4//! inspired by the Graph-to-Graph Transformer (G2GT) architecture from
5//! Miculicich & Henderson (2022). The key insight: model coreference as a
6//! graph where nodes are mentions and edges are coref links, then iteratively
7//! refine predictions until convergence.
8//!
9//! # Historical Context
10//!
11//! Coreference resolution evolved through distinct paradigms:
12//!
13//! ```text
14//! 1995-2010 Rule-based: Hobbs algorithm, centering theory
15//! 2010-2016 Mention-pair: Classify (m_i, m_j) independently
16//! 2017 Lee et al.: End-to-end span-based, O(N⁴) complexity
17//! 2018 Lee et al.: Higher-order with representation refinement
18//! 2022 G2GT: Graph refinement with global decisions, O(N² × T)
19//! ```
20//!
21//! **The core problem with pairwise models**: Decisions are independent.
22//! If P(A~B)=0.9 and P(B~C)=0.9, transitivity implies A~C, but pairwise
23//! models can output P(A~C)=0.1. The G2GT approach addresses this by
24//! conditioning each iteration on the full predicted graph from the previous
25//! iteration, enabling global consistency.
26//!
27//! # Architecture
28//!
29//! ```text
30//! Input: Detected mentions M = [m₁, m₂, ..., mₙ]
31//! ↓
32//! ┌─────────────────────────────────────────────────────────┐
33//! │ Iteration 0: Initialize empty graph G₀ │
34//! │ - Nodes: all mentions │
35//! │ - Edges: none │
36//! └─────────────────────────────────────────────────────────┘
37//! ↓
38//! ┌─────────────────────────────────────────────────────────┐
39//! │ Iteration t: Refine graph Gₜ₋₁ → Gₜ │
40//! │ For each mention pair (mᵢ, mⱼ) where j < i: │
41//! │ 1. Compute pairwise score s(mᵢ, mⱼ) │
42//! │ 2. Add graph context from Gₜ₋₁ (transitivity bonus) │
43//! │ 3. Update edge if score exceeds threshold │
44//! └─────────────────────────────────────────────────────────┘
45//! ↓ (repeat until Gₜ = Gₜ₋₁ or t = max_iterations)
46//! ┌─────────────────────────────────────────────────────────┐
47//! │ Extract clusters via connected components │
48//! └─────────────────────────────────────────────────────────┘
49//! ↓
50//! Output: Coreference chains (clusters)
51//! ```
52//!
53//! # Approximation vs. Full G2GT
54//!
55//! This implementation is a **heuristic approximation**, not a full neural model.
56//!
57//! | Aspect | G2GT (Miculicich 2022) | This Implementation |
58//! |--------|------------------------|---------------------|
59//! | **Graph nodes** | Tokens | Mentions (pre-detected) |
60//! | **Graph encoding** | Attention modification: `Lk = E(G)·Wk` | Explicit transitivity bonus |
61//! | **Pairwise scoring** | Learned neural scorer | String/head heuristics |
62//! | **Refinement** | Full neural re-prediction | Score adjustment |
63//! | **Training** | End-to-end backprop | None (heuristic) |
64//!
65//! ## What We Preserve
66//!
67//! The key insight from G2GT: **iterative refinement with graph conditioning**
68//! enables global consistency that independent pairwise models lack. Even with
69//! heuristic scoring, the refinement loop propagates transitivity constraints.
70//!
71//! ## What We Lose
72//!
73//! - **Learned representations**: G2GT embeds graph structure directly into
74//! transformer attention. We approximate this with explicit bonuses.
75//! - **End-to-end optimization**: G2GT trains the full system. We use fixed heuristics.
76//! - **Token-level granularity**: G2GT operates on tokens; we operate on mentions.
77//!
78//! For production use with high accuracy requirements, consider a full neural
79//! implementation or the T5-based coreference in `crate::backends::coref_t5`.
80//!
81//! # Usage with MentionType
82//!
83//! For best results, provide mentions with `mention_type` set:
84//!
85//! ```rust,ignore
86//! use anno::backends::graph_coref::GraphCoref;
87//! use anno::eval::coref::{Mention, MentionType};
88//!
89//! // Properly annotated mentions work better
90//! let mut john = Mention::new("John", 0, 4);
91//! john.mention_type = Some(MentionType::Proper);
92//!
93//! let mut he = Mention::new("he", 20, 22);
94//! he.mention_type = Some(MentionType::Pronominal);
95//!
96//! let coref = GraphCoref::new();
97//! let chains = coref.resolve(&[john, he]);
98//! ```
99//!
100//! # Graph Initialization: Syntactic vs Semantic
101//!
102//! SpanEIT (Hossain et al. 2025) constructs a combined graph:
103//! - **Syntactic edges** (`E_syn`): From dependency parse (adjectival modifiers, etc.)
104//! - **Semantic edges** (`E_sem`): From co-occurrence statistics
105//!
106//! Use [`CorefGraph::seed_cooccurrence_edges`] to initialize with proximity-based
107//! priors before running iterative refinement.
108//!
109//! # References
110//!
111//! - Miculicich & Henderson (2022): "Graph Refinement for Coreference Resolution"
112//! [arXiv:2203.16574](https://arxiv.org/abs/2203.16574)
113//! - Lee et al. (2017): "End-to-end Neural Coreference Resolution"
114//! - Lee et al. (2018): "Higher-Order Coreference Resolution"
115//! - Mohammadshahi & Henderson (2021): "Graph-to-Graph Transformer for Dependency Parsing"
116//! - Hossain et al. (2025): "SpanEIT: Dynamic Span Interaction and Graph-Aware Memory"
117//! [arXiv:2509.11604](https://arxiv.org/abs/2509.11604)
118//!
119//! # Future Direction: Sheaf Neural Networks
120//!
121//! This implementation uses explicit transitivity bonuses to approximate global consistency.
122//! A more principled approach: **Sheaf Neural Networks** replace scalar edge weights with
123//! learned linear maps (restriction maps) and minimize the sheaf Dirichlet energy:
124//!
125//! ```text
126//! E(x) = Σ_{(u,v) ∈ E} || F(u→v) · x_u - F(v→u) · x_v ||²
127//! ```
128//!
129//! This enforces transitivity at the gradient level, not post-hoc. See:
130//! - `archive/geometric-2024-12/sheaf.rs` for stub implementation and trait definitions
131//! - Bodnar et al. (2023): "Neural Sheaf Diffusion" - NeurIPS
132//! - twitter-research/neural-sheaf-diffusion (Apache 2.0): reference implementation
133
134use anno_core::{CorefChain, Mention, MentionType};
135use std::collections::{HashMap, HashSet, VecDeque};
136use std::hash::{Hash, Hasher};
137
138// =============================================================================
139// Types
140// =============================================================================
141
142/// Edge type in the coreference graph.
143///
144/// Following G2GT's three-way classification:
145/// - 0 (None): No relationship
146/// - 1 (Mention): Within-mention link (not used here since we operate on mentions, not tokens)
147/// - 2 (Coref): Coreference link between mentions
148#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
149#[repr(u8)]
150pub enum EdgeType {
151 /// No link between mentions.
152 None = 0,
153 /// Coreference link (both mentions refer to the same entity).
154 Coref = 2,
155}
156
157/// A coreference graph representing mention relationships.
158///
159/// This is the core data structure for iterative refinement. Nodes are mention
160/// indices, edges are coreference links. The graph is stored as an adjacency
161/// set for O(1) edge lookup during refinement.
162///
163/// # Invariants
164///
165/// - Graph is symmetric: if edge(i,j) exists, edge(j,i) exists
166/// - No self-loops: edge(i,i) is always None
167/// - Indices are valid mention indices: 0 <= i,j < num_mentions
168///
169/// # Example
170///
171/// ```rust
172/// use anno::backends::graph_coref::CorefGraph;
173///
174/// let mut graph = CorefGraph::new(3);
175/// graph.add_edge(0, 1);
176/// graph.add_edge(1, 2);
177///
178/// assert!(graph.has_edge(0, 1));
179/// assert!(graph.transitively_connected(0, 2)); // via 0-1-2
180///
181/// let clusters = graph.extract_clusters();
182/// assert_eq!(clusters.len(), 1); // All connected
183/// ```
184pub mod types;
185pub use types::*;
186
187/// Graph-based coreference resolver.
188pub struct GraphCoref {
189 config: GraphCorefConfig,
190}
191
192impl GraphCoref {
193 /// Create a new graph coref resolver with default configuration.
194 #[must_use]
195 pub fn new() -> Self {
196 Self::with_config(GraphCorefConfig::default())
197 }
198
199 /// Create with custom configuration.
200 #[must_use]
201 pub fn with_config(config: GraphCorefConfig) -> Self {
202 Self { config }
203 }
204
205 fn graph_fingerprint(graph: &CorefGraph) -> u64 {
206 // Order-independent fingerprint of the edge set.
207 // We sort per-edge hashes to avoid HashSet iteration nondeterminism.
208 let mut edge_hashes: Vec<u64> = graph
209 .edges
210 .iter()
211 .map(|e| {
212 let mut h = std::collections::hash_map::DefaultHasher::new();
213 e.hash(&mut h);
214 h.finish()
215 })
216 .collect();
217 edge_hashes.sort_unstable();
218
219 let mut h = std::collections::hash_map::DefaultHasher::new();
220 graph.num_mentions.hash(&mut h);
221 for eh in edge_hashes {
222 eh.hash(&mut h);
223 }
224 h.finish()
225 }
226
227 /// Resolve coreferences among mentions using iterative graph refinement.
228 ///
229 /// # Arguments
230 ///
231 /// * `mentions` - Pre-detected mentions (from NER or mention detector).
232 /// For best results, set `mention_type` on each mention.
233 ///
234 /// # Returns
235 ///
236 /// Coreference chains (clusters) where each chain contains mentions
237 /// referring to the same entity. By default, singletons are filtered;
238 /// set `config.include_singletons = true` to include them.
239 ///
240 /// # Panics
241 ///
242 /// Does not panic. Empty input returns empty output.
243 ///
244 /// # Example
245 ///
246 /// ```rust
247 /// use anno::backends::graph_coref::GraphCoref;
248 /// use anno::{Mention, MentionType};
249 ///
250 /// let coref = GraphCoref::new();
251 ///
252 /// let mut john = Mention::new("John", 0, 4);
253 /// john.mention_type = Some(MentionType::Proper);
254 ///
255 /// let mut he = Mention::new("he", 20, 22);
256 /// he.mention_type = Some(MentionType::Pronominal);
257 ///
258 /// let chains = coref.resolve(&[john, he]);
259 /// ```
260 #[must_use]
261 pub fn resolve(&self, mentions: &[Mention]) -> Vec<CorefChain> {
262 if mentions.is_empty() {
263 return vec![];
264 }
265
266 // Validate mentions (filter empty/invalid)
267 let valid_mentions: Vec<&Mention> = mentions
268 .iter()
269 .filter(|m| !m.text.trim().is_empty() && m.start < m.end)
270 .collect();
271
272 if valid_mentions.is_empty() {
273 return vec![];
274 }
275
276 // Initialize empty graph
277 let mut graph = CorefGraph::new(valid_mentions.len());
278
279 // Iterative refinement
280 let mut stagnation: usize = 0;
281 let mut last_edge_count = graph.edge_count();
282 let mut seen: HashMap<u64, usize> = HashMap::new();
283 let mut history: VecDeque<u64> = VecDeque::new();
284 if let Some(cfg) = &self.config.early_stop {
285 if cfg.detect_cycles {
286 let fp0 = Self::graph_fingerprint(&graph);
287 seen.insert(fp0, 0);
288 history.push_back(fp0);
289 }
290 }
291
292 for iteration in 0..self.config.max_iterations {
293 let prev_graph = graph.clone();
294 graph = self.refine_iteration(&valid_mentions, &graph);
295
296 // Check convergence
297 if graph == prev_graph {
298 break;
299 }
300
301 // Optional early stop: stagnation / cycles.
302 if let Some(cfg) = &self.config.early_stop {
303 // Stagnation: edge count stops changing.
304 let ec = graph.edge_count();
305 if ec == last_edge_count {
306 stagnation += 1;
307 } else {
308 stagnation = 0;
309 last_edge_count = ec;
310 }
311 if cfg.stagnation_patience > 0 && stagnation >= cfg.stagnation_patience {
312 break;
313 }
314
315 if cfg.detect_cycles {
316 let fp = Self::graph_fingerprint(&graph);
317 if seen.contains_key(&fp) {
318 break;
319 }
320 seen.insert(fp, iteration + 1);
321 history.push_back(fp);
322 if cfg.cycle_history > 0 {
323 while history.len() > cfg.cycle_history {
324 if let Some(old) = history.pop_front() {
325 seen.remove(&old);
326 }
327 }
328 }
329 }
330 }
331 }
332
333 // Extract clusters and convert to CorefChains
334 self.graph_to_chains(&graph, &valid_mentions)
335 }
336
337 /// Perform one iteration of graph refinement.
338 ///
339 /// For each mention pair, computes a score incorporating:
340 /// 1. Base pairwise similarity (string match, head match, type compatibility)
341 /// 2. Graph context (transitivity bonus from shared neighbors)
342 ///
343 /// Edges are added if score exceeds threshold.
344 fn refine_iteration(&self, mentions: &[&Mention], prev_graph: &CorefGraph) -> CorefGraph {
345 let mut new_graph = CorefGraph::new(mentions.len());
346
347 for i in 0..mentions.len() {
348 for j in 0..i {
349 // Distance filter
350 if let Some(max_dist) = self.config.max_distance {
351 let dist = mentions[i].start.saturating_sub(mentions[j].end);
352 if dist > max_dist {
353 continue;
354 }
355 }
356
357 // Compute score with graph context
358 let base_score = self.pairwise_score(mentions[i], mentions[j]);
359 let context_bonus = self.graph_context_bonus(i, j, prev_graph);
360 let total_score = base_score + context_bonus;
361
362 if total_score > self.config.link_threshold {
363 new_graph.add_edge(i, j);
364 }
365 }
366 }
367
368 new_graph
369 }
370
371 /// Compute base pairwise similarity score between two mentions.
372 ///
373 /// Uses multiple signals:
374 /// - Exact string match (highest weight)
375 /// - Substring containment
376 /// - Head word match (uses `head_start`/`head_end` if available)
377 /// - Pronoun-to-proper linking (uses `mention_type` if available)
378 /// - Distance penalty
379 fn pairwise_score(&self, m1: &Mention, m2: &Mention) -> f64 {
380 let mut score = 0.0;
381
382 let t1 = m1.text.to_lowercase();
383 let t2 = m2.text.to_lowercase();
384
385 // Exact match (strongest signal)
386 if t1 == t2 {
387 score += self.config.string_similarity_weight * 1.0;
388 }
389 // Substring containment
390 else if t1.contains(&t2) || t2.contains(&t1) {
391 score += self.config.string_similarity_weight * 0.6;
392 }
393 // Head word match
394 else {
395 let h1 = self.get_head_text(m1);
396 let h2 = self.get_head_text(m2);
397 if !h1.is_empty() && h1.to_lowercase() == h2.to_lowercase() {
398 score += self.config.head_match_weight;
399 }
400 }
401
402 // Mention type compatibility
403 score += self.type_compatibility_score(m1, m2);
404
405 // Distance penalty (log scale)
406 let distance = m1.start.abs_diff(m2.end).min(m2.start.abs_diff(m1.end));
407 if distance > 0 {
408 score -= self.config.distance_weight * (distance as f64).ln();
409 }
410
411 score
412 }
413
414 /// Get head text for a mention.
415 ///
416 /// Uses `head_start`/`head_end` if available, otherwise falls back to
417 /// last word heuristic (common in English NPs where head is rightmost).
418 fn get_head_text<'a>(&self, mention: &'a Mention) -> &'a str {
419 // Use explicit head span if available
420 if let (Some(head_start), Some(head_end)) = (mention.head_start, mention.head_end) {
421 // Head offsets are relative to document, need to extract from text
422 // This is complex; fall back to heuristic for now
423 // In a full implementation, we'd have the document text available
424 let _ = (head_start, head_end);
425 }
426
427 // Fallback: last word (head-final assumption for English NPs)
428 mention.text.split_whitespace().last().unwrap_or("")
429 }
430
431 /// Compute type compatibility score between two mentions.
432 ///
433 /// Uses `MentionType` field if available, otherwise uses heuristics.
434 fn type_compatibility_score(&self, m1: &Mention, m2: &Mention) -> f64 {
435 let type1 = m1
436 .mention_type
437 .unwrap_or_else(|| self.infer_mention_type(m1));
438 let type2 = m2
439 .mention_type
440 .unwrap_or_else(|| self.infer_mention_type(m2));
441
442 match (type1, type2) {
443 // Pronoun linking to proper noun: boost
444 (MentionType::Pronominal, MentionType::Proper)
445 | (MentionType::Proper, MentionType::Pronominal) => self.config.pronoun_proper_bonus,
446
447 // Pronoun linking to nominal: smaller boost
448 (MentionType::Pronominal, MentionType::Nominal)
449 | (MentionType::Nominal, MentionType::Pronominal) => {
450 self.config.pronoun_proper_bonus * 0.5
451 }
452
453 // Same type: neutral
454 _ if type1 == type2 => 0.0,
455
456 // Different non-pronoun types: slight penalty
457 _ => -0.1,
458 }
459 }
460
461 /// Infer mention type from text when not explicitly set.
462 ///
463 /// This is a fallback heuristic. For best results, set `mention_type`
464 /// on mentions before calling `resolve()`.
465 fn infer_mention_type(&self, mention: &Mention) -> MentionType {
466 let text_lower = mention.text.to_lowercase();
467
468 // Check for pronouns
469 const PRONOUNS: &[&str] = &[
470 "i",
471 "me",
472 "my",
473 "mine",
474 "myself",
475 "you",
476 "your",
477 "yours",
478 "yourself",
479 "yourselves",
480 "he",
481 "him",
482 "his",
483 "himself",
484 "she",
485 "her",
486 "hers",
487 "herself",
488 "it",
489 "its",
490 "itself",
491 "we",
492 "us",
493 "our",
494 "ours",
495 "ourselves",
496 "they",
497 "them",
498 "their",
499 "theirs",
500 "themselves",
501 "who",
502 "whom",
503 "whose",
504 "which",
505 "that",
506 "this",
507 "these",
508 "those",
509 ];
510
511 if PRONOUNS.contains(&text_lower.as_str()) {
512 return MentionType::Pronominal;
513 }
514
515 // Check for proper noun (starts with uppercase, not sentence-initial heuristic)
516 let first_char = mention.text.chars().next();
517 if first_char.is_some_and(|c| c.is_uppercase()) {
518 // Additional check: not a common word
519 let common_words = ["the", "a", "an", "this", "that", "these", "those"];
520 if !common_words.contains(&text_lower.as_str()) {
521 return MentionType::Proper;
522 }
523 }
524
525 // Default to nominal
526 MentionType::Nominal
527 }
528
529 /// Compute graph context bonus based on previous iteration's structure.
530 ///
531 /// This is our approximation of G2GT's graph-conditioned attention.
532 ///
533 /// # Transitivity Bonus
534 ///
535 /// If A~C and B~C in the previous graph, A and B should likely be linked.
536 /// We add a bonus proportional to the number of shared neighbors.
537 ///
538 /// # Already Connected Bonus
539 ///
540 /// If A and B are already transitively connected (through a chain of
541 /// coreference links), add a bonus to preserve and strengthen the connection.
542 fn graph_context_bonus(&self, i: usize, j: usize, prev_graph: &CorefGraph) -> f64 {
543 let mut bonus = 0.0;
544
545 // Bonus for shared neighbors (transitivity signal)
546 let shared = prev_graph.shared_neighbors(i, j);
547 bonus += (shared as f64) * self.config.per_shared_neighbor_bonus;
548
549 // Bonus if already transitively connected
550 if prev_graph.transitively_connected(i, j) {
551 bonus += self.config.transitivity_bonus;
552 }
553
554 bonus
555 }
556
557 /// Convert graph clusters to CorefChain format.
558 fn graph_to_chains(&self, graph: &CorefGraph, mentions: &[&Mention]) -> Vec<CorefChain> {
559 let clusters = graph.extract_clusters();
560
561 clusters
562 .into_iter()
563 .filter(|cluster| self.config.include_singletons || cluster.len() > 1)
564 .enumerate()
565 .map(|(id, indices)| {
566 let chain_mentions: Vec<Mention> = indices
567 .into_iter()
568 .map(|i| (*mentions[i]).clone())
569 .collect();
570
571 let mut chain = CorefChain::new(chain_mentions);
572 chain.cluster_id = Some((id as u64).into());
573
574 // Set entity type from first proper mention
575 chain.entity_type = chain
576 .mentions
577 .iter()
578 .find(|m| m.mention_type == Some(MentionType::Proper))
579 .and_then(|m| m.entity_type.clone());
580
581 chain
582 })
583 .collect()
584 }
585
586 /// Get configuration.
587 #[must_use]
588 pub fn config(&self) -> &GraphCorefConfig {
589 &self.config
590 }
591}
592
593impl Default for GraphCoref {
594 fn default() -> Self {
595 Self::new()
596 }
597}
598
599// =============================================================================
600// Metrics and Diagnostics
601// =============================================================================
602
603/// Statistics from a graph coref run for debugging and analysis.
604///
605/// Use [`GraphCoref::resolve_with_stats`] to get these alongside results.
606#[derive(Debug, Clone, Default)]
607pub struct GraphCorefStats {
608 /// Number of iterations until convergence (1 to max_iterations).
609 pub iterations: usize,
610 /// Number of edges in final graph.
611 pub final_edges: usize,
612 /// Number of clusters (including singletons).
613 pub num_clusters: usize,
614 /// Number of non-singleton clusters.
615 pub num_chains: usize,
616 /// Per-iteration edge counts, starting from 0.
617 pub edge_history: Vec<usize>,
618 /// Whether the algorithm converged before max_iterations.
619 pub converged: bool,
620 /// Whether we stopped early for a non-fixed-point reason (cycle/stagnation).
621 pub early_stopped: bool,
622 /// Cycle detected (graph fingerprint repeated).
623 pub cycle_detected: bool,
624 /// Stagnation detected (edge count stopped changing).
625 pub stagnation_detected: bool,
626}
627
628impl GraphCoref {
629 /// Resolve coreferences and return detailed statistics.
630 ///
631 /// Useful for debugging, tuning parameters, and understanding convergence.
632 ///
633 /// # Example
634 ///
635 /// ```rust
636 /// use anno::backends::graph_coref::GraphCoref;
637 /// use anno::Mention;
638 ///
639 /// let coref = GraphCoref::new();
640 /// let mentions = vec![
641 /// Mention::new("John", 0, 4),
642 /// Mention::new("John", 50, 54),
643 /// ];
644 ///
645 /// let (chains, stats) = coref.resolve_with_stats(&mentions);
646 /// println!("Converged in {} iterations", stats.iterations);
647 /// println!("Edge history: {:?}", stats.edge_history);
648 /// ```
649 #[must_use]
650 pub fn resolve_with_stats(&self, mentions: &[Mention]) -> (Vec<CorefChain>, GraphCorefStats) {
651 let mut stats = GraphCorefStats::default();
652
653 if mentions.is_empty() {
654 return (vec![], stats);
655 }
656
657 let valid_mentions: Vec<&Mention> = mentions
658 .iter()
659 .filter(|m| !m.text.trim().is_empty() && m.start < m.end)
660 .collect();
661
662 if valid_mentions.is_empty() {
663 return (vec![], stats);
664 }
665
666 let mut graph = CorefGraph::new(valid_mentions.len());
667 stats.edge_history.push(0);
668
669 let mut stagnation: usize = 0;
670 let mut last_edge_count = graph.edge_count();
671 let mut seen: HashMap<u64, usize> = HashMap::new();
672 let mut history: VecDeque<u64> = VecDeque::new();
673 if let Some(cfg) = &self.config.early_stop {
674 if cfg.detect_cycles {
675 let fp0 = Self::graph_fingerprint(&graph);
676 seen.insert(fp0, 0);
677 history.push_back(fp0);
678 }
679 }
680
681 for iteration in 0..self.config.max_iterations {
682 let prev_graph = graph.clone();
683 graph = self.refine_iteration(&valid_mentions, &graph);
684 stats.edge_history.push(graph.edge_count());
685 stats.iterations = iteration + 1;
686
687 if graph == prev_graph {
688 stats.converged = true;
689 break;
690 }
691
692 if let Some(cfg) = &self.config.early_stop {
693 let ec = graph.edge_count();
694 if ec == last_edge_count {
695 stagnation += 1;
696 } else {
697 stagnation = 0;
698 last_edge_count = ec;
699 }
700 if cfg.stagnation_patience > 0 && stagnation >= cfg.stagnation_patience {
701 stats.early_stopped = true;
702 stats.stagnation_detected = true;
703 break;
704 }
705
706 if cfg.detect_cycles {
707 let fp = Self::graph_fingerprint(&graph);
708 if seen.contains_key(&fp) {
709 stats.early_stopped = true;
710 stats.cycle_detected = true;
711 break;
712 }
713 seen.insert(fp, iteration + 1);
714 history.push_back(fp);
715 if cfg.cycle_history > 0 {
716 while history.len() > cfg.cycle_history {
717 if let Some(old) = history.pop_front() {
718 seen.remove(&old);
719 }
720 }
721 }
722 }
723 }
724 }
725
726 let clusters = graph.extract_clusters();
727 stats.final_edges = graph.edge_count();
728 stats.num_clusters = clusters.len();
729 stats.num_chains = clusters.iter().filter(|c| c.len() > 1).count();
730
731 let chains = self.graph_to_chains(&graph, &valid_mentions);
732 (chains, stats)
733 }
734}
735
736// =============================================================================
737// Evaluation Helpers
738// =============================================================================
739
740/// Convert GraphCoref output to format suitable for CoNLL evaluation.
741///
742/// This produces a `CorefDocument` that can be evaluated with standard
743/// coreference metrics (MUC, B³, CEAF, LEA).
744///
745/// # Example
746///
747/// ```rust
748/// use anno::backends::graph_coref::{GraphCoref, chains_to_document};
749/// use anno::Mention;
750///
751/// let coref = GraphCoref::new();
752/// let mentions = vec![
753/// Mention::new("John", 0, 4),
754/// Mention::new("he", 20, 22),
755/// ];
756///
757/// let chains = coref.resolve(&mentions);
758/// let doc = chains_to_document("John went to work. He was late.", chains);
759/// ```
760pub fn chains_to_document(
761 text: impl Into<String>,
762 chains: Vec<CorefChain>,
763) -> anno_core::CorefDocument {
764 anno_core::CorefDocument::new(text, chains)
765}
766
767// =============================================================================
768// Tests
769// =============================================================================
770
771#[cfg(test)]
772mod tests;