pub struct GroundedDocument {
    pub id: String,
    pub text: String,
    pub signals: Vec<Signal<Location>>,
    pub tracks: HashMap<TrackId, Track>,
    pub identities: HashMap<IdentityId, Identity>,
    /* private fields */
}
A document with grounded entity annotations using the three-level hierarchy.
§Entity-Centric Design
Traditional document representations store entities as a flat list. This design uses an entity-centric representation where:
- Signals are the atomic detections (Level 1)
- Tracks cluster signals into within-document entities (Level 2)
- Identities link tracks to global KB entities (Level 3)
This enables efficient:
- Streaming signal processing (add signals incrementally)
- Incremental coreference (cluster signals as they arrive)
- Lazy entity linking (resolve identities only when needed)
§Usage
use anno_core::grounded::{GroundedDocument, Signal, Track, Identity, Location};
let mut doc = GroundedDocument::new("doc1", "Marie Curie won the Nobel Prize. She was a physicist.");
// Add signals (Level 1)
doc.add_signal(Signal::new(0, Location::text(0, 11), "Marie Curie", "Person", 0.95));
doc.add_signal(Signal::new(1, Location::text(33, 36), "She", "Person", 0.88));
// Form track (Level 2)
let mut track = Track::new(0, "Marie Curie");
track.add_signal(0, 0);
track.add_signal(1, 1);
doc.add_track(track);
// Link identity (Level 3)
let identity = Identity::from_kb(0, "Marie Curie", "wikidata", "Q7186");
doc.add_identity(identity);
doc.link_track_to_identity(0, 0);
§Invariants
GroundedDocument maintains internal indices (signal_to_track, track_to_identity)
that must be consistent with the public collections. The following invariants hold:
- Signal ID uniqueness: All signals in signals have distinct id values.
- Track signal references: Every SignalRef in a Track.signals points to a valid signal ID in signals.
- Signal-to-track consistency: If signal_to_track[s] == t, then the track t contains a SignalRef pointing to s.
- Track-to-identity consistency: If track_to_identity[t] == i, then tracks[t].identity_id == Some(i) and identities contains i.
- Signal offsets validity: Signal text locations should match self.text.
Prefer mutation via provided methods (add_signal, add_track, add_signal_to_track,
link_track_to_identity) rather than direct field manipulation to preserve invariants.
Use validate_invariants() to check structural consistency
after external modifications.
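To make the structural invariants concrete, here is a minimal, self-contained sketch of how such checks could be written. It does not use the crate's types: ToySignal, ToyTrack, ToyDoc, and check_invariants are hypothetical stand-ins that model only the ID wiring, and the real validate_invariants() may check more or report differently.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical, simplified stand-ins for the real types; only IDs are modeled.
struct ToySignal {
    id: u64,
}

struct ToyTrack {
    signal_ids: Vec<u64>,
}

struct ToyDoc {
    signals: Vec<ToySignal>,
    tracks: HashMap<u64, ToyTrack>,
    signal_to_track: HashMap<u64, u64>,
}

/// Collect human-readable violations; an empty result means the checks passed.
fn check_invariants(doc: &ToyDoc) -> Vec<String> {
    let mut violations = Vec::new();

    // Signal ID uniqueness.
    let mut seen = HashSet::new();
    for s in &doc.signals {
        if !seen.insert(s.id) {
            violations.push(format!("duplicate signal id {}", s.id));
        }
    }

    // Track signal references must point at existing signals.
    for (tid, track) in &doc.tracks {
        for sid in &track.signal_ids {
            if !seen.contains(sid) {
                violations.push(format!("track {} references missing signal {}", tid, sid));
            }
        }
    }

    // Signal-to-track consistency: each index entry must be backed by the track itself.
    for (sid, tid) in &doc.signal_to_track {
        let backed = doc
            .tracks
            .get(tid)
            .map_or(false, |t| t.signal_ids.contains(sid));
        if !backed {
            violations.push(format!("signal {} is indexed to track {} but not contained in it", sid, tid));
        }
    }

    violations
}
```

Returning a list of violations rather than failing on the first one mirrors the Vec<String> shape of validate_invariants(), so a caller can report every problem in one pass.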
Fields§
id: String
Document identifier
text: String
Raw text content
signals: Vec<Signal<Location>>
Level 1: Raw signals (detections)
tracks: HashMap<TrackId, Track>
Level 2: Tracks (within-document coreference chains)
identities: HashMap<IdentityId, Identity>
Level 3: Global identities (KB-linked entities)
Implementations§
impl GroundedDocument
pub fn new(id: impl Into<String>, text: impl Into<String>) -> Self
Create a new grounded document.
pub fn add_signal(&mut self, signal: Signal<Location>) -> SignalId
Add a signal and return its ID.
pub fn get_signal(&self, id: impl Into<SignalId>) -> Option<&Signal<Location>>
Get a signal by ID.
pub fn get_track_mut(&mut self, id: impl Into<TrackId>) -> Option<&mut Track>
Get a mutable reference to a track by ID.
pub fn add_signal_to_track(
    &mut self,
    signal_id: impl Into<SignalId>,
    track_id: impl Into<TrackId>,
    position: u32,
) -> bool
Add a signal to an existing track.
This also updates the signal_to_track index. Returns true if the signal was added, or false if the track does not exist.
pub fn track_for_signal(&self, signal_id: SignalId) -> Option<&Track>
Get the track containing a signal.
pub fn add_identity(&mut self, identity: Identity) -> IdentityId
Add an identity and return its ID.
pub fn link_track_to_identity(
    &mut self,
    track_id: impl Into<TrackId>,
    identity_id: impl Into<IdentityId>,
)
Link a track to an identity.
pub fn get_identity(&self, id: IdentityId) -> Option<&Identity>
Get an identity by ID.
pub fn identity_for_track(&self, track_id: TrackId) -> Option<&Identity>
Get the identity for a track.
pub fn identity_for_signal(&self, signal_id: SignalId) -> Option<&Identity>
Get the identity for a signal (transitively through track).
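The transitive lookup can be pictured as two chained map lookups. The sketch below is an illustration only: ToyIndex, its fields, and sample_index are hypothetical, plain integers stand in for SignalId, TrackId, and IdentityId, and a String stands in for Identity.

```rust
use std::collections::HashMap;

// Hypothetical, simplified model: integer IDs and a name per identity.
struct ToyIndex {
    signal_to_track: HashMap<u64, u64>,
    track_to_identity: HashMap<u64, u64>,
    identities: HashMap<u64, String>,
}

impl ToyIndex {
    /// Signal -> track -> identity; any missing hop yields None.
    fn identity_for_signal(&self, signal_id: u64) -> Option<&String> {
        let track_id = self.signal_to_track.get(&signal_id)?;
        let identity_id = self.track_to_identity.get(track_id)?;
        self.identities.get(identity_id)
    }
}

/// Wiring that mirrors the usage example: signals 0 and 1 belong to track 0,
/// which is linked to identity 0. Signal 2 is left untracked on purpose.
fn sample_index() -> ToyIndex {
    ToyIndex {
        signal_to_track: HashMap::from([(0, 0), (1, 0)]),
        track_to_identity: HashMap::from([(0, 0)]),
        identities: HashMap::from([(0, "Marie Curie".to_string())]),
    }
}
```

In this model an untracked signal, or a tracked signal whose track is not yet linked, both resolve to None, which is why the `?` operator fits each hop.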
pub fn identities(&self) -> impl Iterator<Item = &Identity>
Get all identities.
pub fn track_ref(&self, track_id: TrackId) -> Option<TrackRef>
Get a TrackRef for a track in this document.
Returns None if the track doesn’t exist in this document.
This validates that the track is still present (tracks can be removed).
pub fn to_entities(&self) -> Vec<Entity>
Convert to legacy Entity format for backwards compatibility.
pub fn from_entities(
    id: impl Into<String>,
    text: impl Into<String>,
    entities: &[Entity],
) -> Self
Create from legacy Entity slice.
pub fn signals_with_label(&self, label: &str) -> Vec<&Signal<Location>>
Get signals filtered by label.
pub fn confident_signals(&self, threshold: f32) -> Vec<&Signal<Location>>
Get signals above a confidence threshold.
pub fn linked_tracks(&self) -> impl Iterator<Item = &Track>
Get tracks that are linked to an identity.
pub fn unlinked_tracks(&self) -> impl Iterator<Item = &Track>
Get tracks that are NOT linked to any identity (need resolution).
pub fn untracked_signal_count(&self) -> usize
Count of signals that are not yet assigned to any track.
pub fn untracked_signals(&self) -> Vec<&Signal<Location>>
Get untracked signals (need coreference resolution).
pub fn signals_by_modality(&self, modality: Modality) -> Vec<&Signal<Location>>
Get signals filtered by modality.
pub fn text_signals(&self) -> Vec<&Signal<Location>>
Get all text-based signals (symbolic modality).
pub fn visual_signals(&self) -> Vec<&Signal<Location>>
Get all visual signals (iconic modality).
pub fn overlapping_signals(&self, location: &Location) -> Vec<&Signal<Location>>
Find signals that overlap with a given location.
pub fn signals_in_range(
    &self,
    start: usize,
    end: usize,
) -> Vec<&Signal<Location>>
Find signals within a text range.
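The difference between an overlap query and a range query comes down to two interval predicates. The sketch below assumes half-open spans [start, end) and assumes that "within a range" means full containment; neither is confirmed by this page. Span, overlaps, contained_in, and spans_in_range are hypothetical names.

```rust
/// Half-open text span [start, end); the half-open convention is an assumption.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// True when the two spans share at least one position.
fn overlaps(a: Span, b: Span) -> bool {
    a.start < b.end && b.start < a.end
}

/// True when `inner` lies entirely inside `outer`.
fn contained_in(inner: Span, outer: Span) -> bool {
    outer.start <= inner.start && inner.end <= outer.end
}

/// Keep only the spans that lie fully inside [start, end).
fn spans_in_range(spans: &[Span], start: usize, end: usize) -> Vec<Span> {
    let range = Span { start, end };
    spans
        .iter()
        .copied()
        .filter(|s| contained_in(*s, range))
        .collect()
}
```

Under these assumptions, adjacent spans such as [0, 11) and [11, 15) do not overlap, and a span that straddles the range boundary overlaps the range without being contained in it.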
pub fn negated_signals(&self) -> Vec<&Signal<Location>>
Get signals that are negated.
pub fn quantified_signals(
    &self,
    quantifier: Quantifier,
) -> Vec<&Signal<Location>>
Get signals with a specific quantifier.
pub fn validate(&self) -> Vec<SignalValidationError>
Validate all signals against the document text.
Returns a list of validation errors. An empty list means every signal is valid.
§Example
use anno_core::grounded::{GroundedDocument, Signal, Location};
let mut doc = GroundedDocument::new("test", "Marie Curie was a physicist.");
doc.add_signal(Signal::new(0, Location::text(0, 11), "Marie Curie", "PER", 0.9));
assert!(doc.validate().is_empty());
// Bad signal: wrong text at offset
doc.add_signal(Signal::new(1, Location::text(0, 5), "WRONG", "PER", 0.9));
assert!(!doc.validate().is_empty());
pub fn validate_invariants(&self) -> Vec<String>
Validate structural invariants of the document.
Returns a list of invariant violations. An empty list means the document is structurally consistent.
This checks:
- Signal ID uniqueness
- Track signal references point to existing signals
- signal_to_track index consistency
- track_to_identity index consistency
- Track identity references point to existing identities
Use this after any direct field manipulation to ensure consistency.
§Example
use anno_core::grounded::{GroundedDocument, Signal, Location};
let mut doc = GroundedDocument::new("test", "Marie Curie was a physicist.");
doc.add_signal(Signal::new(0, Location::text(0, 11), "Marie Curie", "PER", 0.9));
assert!(doc.validate_invariants().is_empty());
pub fn invariants_hold(&self) -> bool
Check if all structural invariants hold.
pub fn add_signal_validated(
    &mut self,
    signal: Signal<Location>,
) -> Result<SignalId, SignalValidationError>
Add a signal, validating it first.
Returns Err if the signal’s offsets don’t match the document text.
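As an illustration of the kind of check involved, the sketch below compares a claimed surface string against the document text at the given offsets. It assumes byte offsets and a half-open range, which this page does not confirm, and OffsetError and check_offsets are hypothetical; the real SignalValidationError is not reproduced here.

```rust
/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq)]
enum OffsetError {
    /// The range falls outside the text or splits a UTF-8 character.
    OutOfBounds { start: usize, end: usize, len: usize },
    /// The text at the range differs from the claimed surface form.
    Mismatch { expected: String, found: String },
}

/// Check that `surface` is exactly the text at [start, end) (byte offsets assumed).
fn check_offsets(text: &str, start: usize, end: usize, surface: &str) -> Result<(), OffsetError> {
    // `str::get` returns None instead of panicking on bad ranges.
    let found = text.get(start..end).ok_or(OffsetError::OutOfBounds {
        start,
        end,
        len: text.len(),
    })?;

    if found == surface {
        Ok(())
    } else {
        Err(OffsetError::Mismatch {
            expected: surface.to_string(),
            found: found.to_string(),
        })
    }
}
```

Running a check like this before insertion is what lets a validated add reject a bad signal up front instead of leaving it to be discovered later by validate().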
pub fn add_signal_from_text(
    &mut self,
    surface: &str,
    label: impl Into<TypeLabel>,
    confidence: f32,
) -> Option<SignalId>
Add a signal by locating its surface text in the document (safe construction).
Returns the signal ID, or None if the text is not found.
§Example
use anno_core::grounded::GroundedDocument;
let mut doc = GroundedDocument::new("test", "Marie Curie was a physicist.");
let id = doc.add_signal_from_text("Marie Curie", "PER", 0.95);
assert!(id.is_some());
pub fn add_signal_from_text_nth(
    &mut self,
    surface: &str,
    label: impl Into<TypeLabel>,
    confidence: f32,
    occurrence: usize,
) -> Option<SignalId>
Add a signal by finding the nth occurrence of text.
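Locating the nth occurrence of a surface string can be sketched with the standard library alone. The helper below is hypothetical (find_nth is not part of this API), and it assumes zero-based counting, non-overlapping matches, and byte offsets; the crate may count or measure differently.

```rust
/// Byte offsets [start, end) of the `occurrence`-th match of `surface` in `text`.
/// Zero-based counting and non-overlapping matches are assumptions of this sketch.
fn find_nth(text: &str, surface: &str, occurrence: usize) -> Option<(usize, usize)> {
    // An empty needle matches everywhere, which is never a useful signal.
    if surface.is_empty() {
        return None;
    }
    text.match_indices(surface)
        .nth(occurrence)
        .map(|(start, matched)| (start, start + matched.len()))
}
```

Deriving the offsets from the text itself is what makes this style of construction safe: the resulting span matches the document by construction.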
pub fn stats(&self) -> DocumentStats
Get statistics about the document.
pub fn add_signals(
    &mut self,
    signals: impl IntoIterator<Item = Signal<Location>>,
) -> Vec<SignalId>
Add multiple signals at once.
Returns the IDs of all added signals.
pub fn create_track_from_signals(
    &mut self,
    canonical: impl Into<String>,
    signal_ids: &[SignalId],
) -> Option<TrackId>
Create a track from a list of signal IDs.
Automatically sets positions based on order.
pub fn merge_tracks(&mut self, track_ids: &[TrackId]) -> Option<TrackId>
Merge multiple tracks into one.
The resulting track has all signals from the input tracks. The canonical surface comes from the first track.
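The merge semantics described above (all signals kept, canonical surface taken from the first track) can be sketched on a toy model. ToyTrack and this free-standing merge_tracks are hypothetical, and the choices to reuse the first track's ID, to drop duplicate signal IDs, and to return None for unknown IDs are assumptions rather than documented behavior.

```rust
use std::collections::HashMap;

// Hypothetical, simplified track: a canonical surface plus member signal IDs.
#[derive(Debug, Clone, PartialEq)]
struct ToyTrack {
    canonical: String,
    signal_ids: Vec<u64>,
}

/// Merge the listed tracks into the first one and return the surviving track ID.
/// Returns None for an empty list or an unknown ID, leaving the map untouched.
fn merge_tracks(tracks: &mut HashMap<u64, ToyTrack>, ids: &[u64]) -> Option<u64> {
    let (&first, rest) = ids.split_first()?;
    if !ids.iter().all(|id| tracks.contains_key(id)) {
        return None;
    }

    for id in rest {
        if *id == first {
            continue; // tolerate the first ID being repeated
        }
        // A repeated ID was already absorbed, so `remove` yields None and is skipped.
        if let Some(absorbed) = tracks.remove(id) {
            let target = tracks.get_mut(&first)?;
            for sid in absorbed.signal_ids {
                if !target.signal_ids.contains(&sid) {
                    target.signal_ids.push(sid);
                }
            }
        }
    }
    Some(first)
}
```

A real implementation would also need to repoint the signal_to_track index entries of the absorbed tracks, which this sketch leaves out.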
pub fn find_overlapping_signal_pairs(&self) -> Vec<(SignalId, SignalId)>
Find all pairs of overlapping signals (potential duplicates or nested entities).
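One common way to enumerate overlapping pairs is to sort spans by start offset and stop scanning as soon as a later span begins past the current one's end. The sketch below shows that technique on plain tuples; overlapping_pairs and the (signal_id, start, end) layout are hypothetical, spans are assumed half-open and non-empty, and the crate may use a different algorithm.

```rust
/// Return ID pairs whose half-open spans overlap; each pair lists the
/// earlier-starting signal first. `spans` holds (signal_id, start, end).
/// Assumes every span is non-empty (start < end).
fn overlapping_pairs(spans: &[(u64, usize, usize)]) -> Vec<(u64, u64)> {
    // Sort by start so the inner loop can stop at the first span that begins too late.
    let mut sorted = spans.to_vec();
    sorted.sort_by_key(|&(_, start, _)| start);

    let mut pairs = Vec::new();
    for (i, &(id_a, _, end_a)) in sorted.iter().enumerate() {
        for &(id_b, start_b, _) in &sorted[i + 1..] {
            if start_b >= end_a {
                break; // every later span starts even further right
            }
            pairs.push((id_a, id_b));
        }
    }
    pairs
}
```

A nested mention such as "Curie" inside "Marie Curie" shows up as an overlapping pair here, which is why the result covers both duplicates and nested entities.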
impl GroundedDocument
pub fn build_text_index(&self) -> TextSpatialIndex
Build a spatial index for efficient text range queries.
This is useful for documents with many signals where you need to frequently query by text position.
§Example
use anno_core::grounded::{GroundedDocument, Signal, Location};
let mut doc = GroundedDocument::new("doc", "Some text with entities.");
doc.add_signal(Signal::new(0, Location::text(0, 4), "Some", "T", 0.9));
doc.add_signal(Signal::new(1, Location::text(10, 14), "with", "T", 0.9));
let index = doc.build_text_index();
let in_range = index.query_contained_in(0, 20);
assert_eq!(in_range.len(), 2);
pub fn query_signals_in_range_indexed(
    &self,
    start: usize,
    end: usize,
) -> Vec<&Signal<Location>>
Query signals using the spatial index (builds index if needed).
For repeated queries, build the index once with build_text_index()
and reuse it.
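The build-once, query-many pattern can be illustrated with a deliberately simple index: spans sorted by start offset and searched with a binary search. SortedSpanIndex is a hypothetical stand-in, not the crate's TextSpatialIndex, whose internal structure is not described here; the method name query_contained_in is borrowed from the example above, but returning signal IDs is an assumption.

```rust
/// Hypothetical stand-in for a text index: (start, end, signal_id) sorted by start.
struct SortedSpanIndex {
    spans: Vec<(usize, usize, u64)>,
}

impl SortedSpanIndex {
    /// Build once, in O(n log n). Input tuples are (signal_id, start, end).
    fn build(signals: &[(u64, usize, usize)]) -> Self {
        let mut spans: Vec<_> = signals.iter().map(|&(id, s, e)| (s, e, id)).collect();
        spans.sort_unstable();
        Self { spans }
    }

    /// IDs of spans fully inside [start, end).
    fn query_contained_in(&self, start: usize, end: usize) -> Vec<u64> {
        // Binary search skips every span that starts before `start`.
        let first = self.spans.partition_point(|&(s, _, _)| s < start);
        self.spans[first..]
            .iter()
            .take_while(|&&(s, _, _)| s <= end) // spans starting past `end` cannot fit
            .filter(|&&(_, e, _)| e <= end)
            .map(|&(_, _, id)| id)
            .collect()
    }
}
```

Because the sort happens only in build, repeated queries pay just the binary search plus the scan over candidate spans, which is the saving that reusing a prebuilt index is meant to capture.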
pub fn query_overlapping_signals_indexed(
    &self,
    start: usize,
    end: usize,
) -> Vec<&Signal<Location>>
Query overlapping signals using spatial index.
pub fn to_coref_document(&self) -> CorefDocument
Convert this grounded document into a coreference document for evaluation.
This is a lightweight bridge between the production pipeline types
(Signal/Track/Identity) and the evaluation-oriented coreference types
(CorefDocument, CorefChain, Mention).
- Each Track becomes a super::coref::CorefChain
- Each track mention is derived from the track’s signal locations
- Non-text signals (iconic-only locations) are skipped
Note: Mention typing (proper/nominal/pronominal) is left unset; callers doing mention-type evaluation should compute that separately.
Trait Implementations§
impl Clone for GroundedDocument
fn clone(&self) -> GroundedDocument
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.