pub struct GroundedDocument {
    pub id: String,
    pub text: String,
    pub signals: Vec<Signal>,
    pub tracks: HashMap<TrackId, Track>,
    pub identities: HashMap<IdentityId, Identity>,
    /* private fields */
}
A document with grounded entity annotations using the three-level hierarchy.
§Entity-Centric Design
Traditional document representations store entities as a flat list. This design uses an entity-centric representation where:
- Signals are the atomic detections (Level 1)
- Tracks cluster signals into within-document entities (Level 2)
- Identities link tracks to global KB entities (Level 3)
This enables efficient:
- Streaming signal processing (add signals incrementally)
- Incremental coreference (cluster signals as they arrive)
- Lazy entity linking (resolve identities only when needed)
§Usage
use anno_core::grounded::{GroundedDocument, Signal, Track, Identity, Location};
let mut doc = GroundedDocument::new("doc1", "Marie Curie won the Nobel Prize. She was a physicist.");
// Add signals (Level 1)
// Offsets are half-open ranges into the document text
doc.add_signal(Signal::new(0, Location::text(0, 11), "Marie Curie", "Person", 0.95));
doc.add_signal(Signal::new(1, Location::text(33, 36), "She", "Person", 0.88));
// Form track (Level 2)
let mut track = Track::new(0, "Marie Curie");
track.add_signal(0, 0);
track.add_signal(1, 1);
doc.add_track(track);
// Link identity (Level 3)
let identity = Identity::from_kb(0, "Marie Curie", "wikidata", "Q7186");
doc.add_identity(identity);
doc.link_track_to_identity(0, 0);
§Invariants
GroundedDocument maintains internal indices (signal_to_track, track_to_identity)
that must be consistent with the public collections. The following invariants hold:
- Signal ID uniqueness: All signals in signals have distinct id values.
- Track signal references: Every SignalRef in a Track.signals points to a valid signal ID in signals.
- Signal-to-track consistency: If signal_to_track[s] == t, then the track t contains a SignalRef pointing to s.
- Track-to-identity consistency: If track_to_identity[t] == i, then tracks[t].identity_id == Some(i) and identities contains i.
- Signal offsets validity: Signal text locations should match self.text.
Prefer mutation via provided methods (add_signal, add_track, add_signal_to_track,
link_track_to_identity) rather than direct field manipulation to preserve invariants.
Use validate_invariants() to check structural consistency
after external modifications.
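The index-consistency invariants listed above can be illustrated with a minimal, self-contained sketch. The types below (MiniDoc, MiniTrack, plain u64 IDs) are hypothetical stand-ins rather than the anno_core types, and check_invariants only approximates what a structural check such as validate_invariants() might cover; the actual implementation is not shown on this page.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical, simplified stand-ins for the real types (assumption).
struct MiniTrack {
    signal_ids: Vec<u64>,
    identity_id: Option<u64>,
}

struct MiniDoc {
    signal_ids: Vec<u64>,                 // IDs of the stored signals
    tracks: HashMap<u64, MiniTrack>,      // track ID -> track
    identities: HashSet<u64>,             // known identity IDs
    signal_to_track: HashMap<u64, u64>,   // index: signal ID -> track ID
    track_to_identity: HashMap<u64, u64>, // index: track ID -> identity ID
}

// Collects one human-readable message per violation found.
fn check_invariants(doc: &MiniDoc) -> Vec<String> {
    let mut errors = Vec::new();

    // Signal ID uniqueness.
    let mut seen = HashSet::new();
    for id in &doc.signal_ids {
        if !seen.insert(*id) {
            errors.push(format!("duplicate signal id {id}"));
        }
    }

    // Track signal references must point at existing signals.
    for (tid, track) in &doc.tracks {
        for sid in &track.signal_ids {
            if !seen.contains(sid) {
                errors.push(format!("track {tid} references missing signal {sid}"));
            }
        }
    }

    // signal_to_track[s] == t implies that track t contains s.
    for (sid, tid) in &doc.signal_to_track {
        let ok = doc.tracks.get(tid).map_or(false, |t| t.signal_ids.contains(sid));
        if !ok {
            errors.push(format!("signal {sid} is indexed to track {tid}, which does not contain it"));
        }
    }

    // track_to_identity[t] == i implies tracks[t].identity_id == Some(i) and i exists.
    for (tid, iid) in &doc.track_to_identity {
        let linked = doc.tracks.get(tid).map_or(false, |t| t.identity_id == Some(*iid));
        if !linked || !doc.identities.contains(iid) {
            errors.push(format!("track {tid} is indexed to identity {iid} inconsistently"));
        }
    }

    errors
}
```

An empty result corresponds to a structurally consistent document; editing an index without updating the matching collection (the situation the mutation methods are meant to prevent) produces at least one message.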
Fields§
id: String
Document identifier
text: String
Raw text content
signals: Vec<Signal>
Level 1: Raw signals (detections)
tracks: HashMap<TrackId, Track>
Level 2: Tracks (within-document coreference chains)
identities: HashMap<IdentityId, Identity>
Level 3: Global identities (KB-linked entities)
Implementations§
impl GroundedDocument
pub fn new(id: impl Into<String>, text: impl Into<String>) -> GroundedDocument
Create a new grounded document.
pub fn add_signal(&mut self, signal: Signal) -> SignalId
Add a signal and return its ID.
pub fn get_track_mut(&mut self, id: impl Into<TrackId>) -> Option<&mut Track>
Get a mutable reference to a track by ID.
pub fn add_signal_to_track(
    &mut self,
    signal_id: impl Into<SignalId>,
    track_id: impl Into<TrackId>,
    position: u32,
) -> bool
Add a signal to an existing track.
This also updates the signal_to_track index. Returns true if the signal was added, or false if the track doesn’t exist.
pub fn track_for_signal(&self, signal_id: SignalId) -> Option<&Track>
Get the track containing a signal.
pub fn add_identity(&mut self, identity: Identity) -> IdentityId
Add an identity and return its ID.
pub fn link_track_to_identity(
    &mut self,
    track_id: impl Into<TrackId>,
    identity_id: impl Into<IdentityId>,
)
Link a track to an identity.
pub fn get_identity(&self, id: IdentityId) -> Option<&Identity>
Get an identity by ID.
pub fn identity_for_track(&self, track_id: TrackId) -> Option<&Identity>
Get the identity for a track.
pub fn identity_for_signal(&self, signal_id: SignalId) -> Option<&Identity>
Get the identity for a signal (transitively through track).
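The transitive lookup can be pictured as two index hops followed by a table lookup. This is a minimal sketch under stated assumptions: the Indices struct, the u64 IDs, and the String payload are hypothetical, and only the index names signal_to_track and track_to_identity come from this page.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the two private indices plus the identity table.
struct Indices {
    signal_to_track: HashMap<u64, u64>,
    track_to_identity: HashMap<u64, u64>,
    identities: HashMap<u64, String>, // identity ID -> canonical name
}

impl Indices {
    // Signal -> track -> identity; any missing hop yields None.
    fn identity_for_signal(&self, signal_id: u64) -> Option<&String> {
        let track_id = self.signal_to_track.get(&signal_id)?;
        let identity_id = self.track_to_identity.get(track_id)?;
        self.identities.get(identity_id)
    }
}
```

A signal that is untracked, or whose track has not been linked yet, resolves to None, which matches the lazy entity linking described in the design notes.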
pub fn identities(&self) -> impl Iterator<Item = &Identity>
Get all identities.
pub fn track_ref(&self, track_id: TrackId) -> Option<TrackRef>
Get a TrackRef for a track in this document.
Returns None if the track doesn’t exist in this document.
This validates that the track is still present (tracks can be removed).
pub fn to_entities(&self) -> Vec<Entity>
Convert to legacy Entity format for backwards compatibility.
pub fn from_entities(
    id: impl Into<String>,
    text: impl Into<String>,
    entities: &[Entity],
) -> GroundedDocument
Create from legacy Entity slice.
pub fn signals_with_label(&self, label: &str) -> Vec<&Signal>
Get signals filtered by label.
pub fn confident_signals(&self, threshold: f32) -> Vec<&Signal>
Get signals above a confidence threshold.
pub fn linked_tracks(&self) -> impl Iterator<Item = &Track>
Get tracks that are linked to an identity.
pub fn unlinked_tracks(&self) -> impl Iterator<Item = &Track>
Get tracks that are NOT linked to any identity (need resolution).
pub fn untracked_signal_count(&self) -> usize
Count of signals that are not yet assigned to any track.
pub fn untracked_signals(&self) -> Vec<&Signal>
Get untracked signals (need coreference resolution).
pub fn signals_by_modality(&self, modality: Modality) -> Vec<&Signal>
Get signals filtered by modality.
pub fn text_signals(&self) -> Vec<&Signal>
Get all text-based signals (symbolic modality).
pub fn visual_signals(&self) -> Vec<&Signal>
Get all visual signals (iconic modality).
pub fn overlapping_signals(&self, location: &Location) -> Vec<&Signal>
Find signals that overlap with a given location.
pub fn signals_in_range(&self, start: usize, end: usize) -> Vec<&Signal>
Find signals within a text range.
pub fn negated_signals(&self) -> Vec<&Signal>
Get signals that are negated.
pub fn quantified_signals(&self, quantifier: Quantifier) -> Vec<&Signal>
Get signals with a specific quantifier.
pub fn validate(&self) -> Vec<SignalValidationError>
Validate all signals against the document text.
Returns a list of validation errors. Empty means all valid.
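A per-signal check of this kind can be sketched as follows. This is an illustration under assumptions, not the library's code: offsets are treated as half-open character ranges, and OffsetError and check_span are hypothetical names (the real SignalValidationError is not shown on this page).

```rust
// Hypothetical error type standing in for the real validation error.
#[derive(Debug, PartialEq)]
enum OffsetError {
    OutOfBounds { start: usize, end: usize, text_len: usize },
    TextMismatch { expected: String, actual: String },
}

// Checks that the half-open character range [start, end) of `text`
// spells out exactly `surface`.
fn check_span(text: &str, start: usize, end: usize, surface: &str) -> Result<(), OffsetError> {
    let text_len = text.chars().count();
    if start > end || end > text_len {
        return Err(OffsetError::OutOfBounds { start, end, text_len });
    }
    // Count in characters rather than bytes so multi-byte text is handled.
    let actual: String = text.chars().skip(start).take(end - start).collect();
    if actual == surface {
        Ok(())
    } else {
        Err(OffsetError::TextMismatch { expected: surface.to_string(), actual })
    }
}
```

Running such a check over every text signal and collecting the errors gives a list that is empty exactly when all offsets agree with the document text.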
§Example
use anno_core::grounded::{GroundedDocument, Signal, Location};
let mut doc = GroundedDocument::new("test", "Marie Curie was a physicist.");
doc.add_signal(Signal::new(0, Location::text(0, 11), "Marie Curie", "PER", 0.9));
assert!(doc.validate().is_empty());
// Bad signal: wrong text at offset
doc.add_signal(Signal::new(0, Location::text(0, 5), "WRONG", "PER", 0.9));
assert!(!doc.validate().is_empty());
pub fn validate_invariants(&self) -> Vec<String>
Validate structural invariants of the document.
Returns a list of invariant violations. An empty list means the document is structurally consistent.
This checks:
- Signal ID uniqueness
- Track signal references point to existing signals
- signal_to_track index consistency
- track_to_identity index consistency
- Track identity references point to existing identities
Use this after any direct field manipulation to ensure consistency.
§Example
use anno_core::grounded::{GroundedDocument, Signal, Location};
let mut doc = GroundedDocument::new("test", "Marie Curie was a physicist.");
doc.add_signal(Signal::new(0, Location::text(0, 11), "Marie Curie", "PER", 0.9));
assert!(doc.validate_invariants().is_empty());
pub fn invariants_hold(&self) -> bool
Check if all structural invariants hold.
pub fn add_signal_validated(
    &mut self,
    signal: Signal,
) -> Result<SignalId, SignalValidationError>
Add a signal, validating it first.
Returns Err if the signal’s offsets don’t match the document text.
pub fn add_signal_from_text(
    &mut self,
    surface: &str,
    label: impl Into<TypeLabel>,
    confidence: f32,
) -> Option<SignalId>
Add a signal by finding text in document (safe construction).
Returns the signal ID, or None if text not found.
§Example
use anno_core::grounded::GroundedDocument;
let mut doc = GroundedDocument::new("test", "Marie Curie was a physicist.");
let id = doc.add_signal_from_text("Marie Curie", "PER", 0.95);
assert!(id.is_some());
pub fn add_signal_from_text_nth(
    &mut self,
    surface: &str,
    label: impl Into<TypeLabel>,
    confidence: f32,
    occurrence: usize,
) -> Option<SignalId>
Add a signal by finding the nth occurrence of text.
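Locating the nth occurrence can be sketched with the standard library alone. The helper below is hypothetical (find_nth is not part of this API), and it assumes that occurrence is 0-based and that offsets are half-open character ranges; neither is stated on this page.

```rust
// Returns the half-open character range of the `occurrence`-th
// non-overlapping match of `surface` in `text` (0-based, by assumption).
fn find_nth(text: &str, surface: &str, occurrence: usize) -> Option<(usize, usize)> {
    if surface.is_empty() {
        return None; // an empty surface form cannot anchor a signal
    }
    // match_indices yields byte offsets of non-overlapping matches.
    let (byte_start, _) = text.match_indices(surface).nth(occurrence)?;
    // Convert the byte offset into a character offset.
    let start = text[..byte_start].chars().count();
    let end = start + surface.chars().count();
    Some((start, end))
}
```

Deriving the offsets from the text itself is what makes this style of construction safe: the resulting location cannot disagree with the surface form.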
pub fn stats(&self) -> DocumentStats
Get statistics about the document.
pub fn add_signals(
    &mut self,
    signals: impl IntoIterator<Item = Signal>,
) -> Vec<SignalId>
Add multiple signals at once.
Returns the IDs of all added signals.
pub fn create_track_from_signals(
    &mut self,
    canonical: impl Into<String>,
    signal_ids: &[SignalId],
) -> Option<TrackId>
Create a track from a list of signal IDs.
Automatically sets positions based on order.
pub fn merge_tracks(&mut self, track_ids: &[TrackId]) -> Option<TrackId>
Merge multiple tracks into one.
The resulting track has all signals from the input tracks. The canonical surface comes from the first track.
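The merge semantics described here (all signals kept, canonical surface taken from the first track) can be sketched with a simplified track type. SketchTrack and merge are hypothetical names; skipping repeated signal IDs is an assumption, and a real implementation would presumably also update the signal_to_track and track_to_identity indices, which this sketch leaves out.

```rust
// Hypothetical simplified track: canonical surface plus member signal IDs.
#[derive(Debug, Clone, PartialEq)]
struct SketchTrack {
    canonical: String,
    signal_ids: Vec<u64>,
}

// Merges tracks in the order given; returns None for an empty input.
fn merge(tracks: &[SketchTrack]) -> Option<SketchTrack> {
    let first = tracks.first()?;
    let mut merged = SketchTrack {
        canonical: first.canonical.clone(), // canonical surface from the first track
        signal_ids: Vec::new(),
    };
    for track in tracks {
        for &sid in &track.signal_ids {
            // Keep each signal once, preserving first-seen order (assumption).
            if !merged.signal_ids.contains(&sid) {
                merged.signal_ids.push(sid);
            }
        }
    }
    Some(merged)
}
```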
pub fn find_overlapping_signal_pairs(&self) -> Vec<(SignalId, SignalId)>
Find all pairs of overlapping signals (potential duplicates or nested entities).
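One common way to find overlapping pairs is to sort spans by start offset and scan forward until a span starts at or after the current one ends. The sketch below assumes half-open [start, end) offsets and uses a hypothetical (id, start, end) tuple in place of Signal; it is not necessarily how this method is implemented.

```rust
// Hypothetical stand-in for a text signal: (signal ID, start, end),
// with half-open [start, end) offsets.
type Span = (u64, usize, usize);

fn overlapping_pairs(spans: &[Span]) -> Vec<(u64, u64)> {
    // Sort by start so the inner scan can stop early.
    let mut sorted: Vec<Span> = spans.to_vec();
    sorted.sort_by_key(|&(_, start, _)| start);

    let mut pairs = Vec::new();
    for (i, &(id_a, _, end_a)) in sorted.iter().enumerate() {
        for &(id_b, start_b, _) in &sorted[i + 1..] {
            if start_b >= end_a {
                break; // every later span starts even further right
            }
            pairs.push((id_a, id_b));
        }
    }
    pairs
}
```

With "Marie Curie" at 0..11 and a nested "Curie" at 6..11, the pair is reported; spans that merely touch (one ends where the next begins) are not.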
impl GroundedDocument
pub fn build_text_index(&self) -> TextSpatialIndex
Build a spatial index for efficient text range queries.
This is useful for documents with many signals where you need to frequently query by text position.
§Example
use anno_core::grounded::{GroundedDocument, Signal, Location};
let mut doc = GroundedDocument::new("doc", "Some text with entities.");
doc.add_signal(Signal::new(0, Location::text(0, 4), "Some", "T", 0.9));
doc.add_signal(Signal::new(0, Location::text(10, 14), "with", "T", 0.9));
let index = doc.build_text_index();
let in_range = index.query_contained_in(0, 20);
assert_eq!(in_range.len(), 2);
pub fn query_signals_in_range_indexed(
    &self,
    start: usize,
    end: usize,
) -> Vec<&Signal>
Query signals using the spatial index (builds index if needed).
For repeated queries, build the index once with build_text_index() and reuse it.
pub fn query_overlapping_signals_indexed(
    &self,
    start: usize,
    end: usize,
) -> Vec<&Signal>
Query overlapping signals using spatial index.
pub fn to_coref_document(&self) -> CorefDocument
Convert this grounded document into a coreference document for evaluation.
This is a lightweight bridge between the production pipeline types
(Signal/Track/Identity) and the evaluation-oriented coreference types
(CorefDocument, CorefChain, Mention).
- Each Track becomes a super::coref::CorefChain
- Each track mention is derived from the track’s signal locations
- Non-text signals (iconic-only locations) are skipped
Note: Mention typing (proper/nominal/pronominal) is left unset; callers doing mention-type evaluation should compute that separately.
Trait Implementations§
impl Clone for GroundedDocument
fn clone(&self) -> GroundedDocument
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for GroundedDocument
impl<'de> Deserialize<'de> for GroundedDocument
fn deserialize<__D>(
    __deserializer: __D,
) -> Result<GroundedDocument, <__D as Deserializer<'de>>::Error>
where
    __D: Deserializer<'de>,
impl Serialize for GroundedDocument
fn serialize<__S>(
    &self,
    __serializer: __S,
) -> Result<<__S as Serializer>::Ok, <__S as Serializer>::Error>
where
    __S: Serializer,
Auto Trait Implementations§
impl Freeze for GroundedDocument
impl RefUnwindSafe for GroundedDocument
impl Send for GroundedDocument
impl Sync for GroundedDocument
impl Unpin for GroundedDocument
impl UnsafeUnpin for GroundedDocument
impl UnwindSafe for GroundedDocument
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.