Struct GroundedDocument 

pub struct GroundedDocument {
    pub id: String,
    pub text: String,
    pub signals: Vec<Signal<Location>>,
    pub tracks: HashMap<TrackId, Track>,
    pub identities: HashMap<IdentityId, Identity>,
    /* private fields */
}

A document with grounded entity annotations using the three-level hierarchy.

§Entity-Centric Design

Traditional document representations store entities as a flat list. This design uses an entity-centric representation where:

  1. Signals are the atomic detections (Level 1)
  2. Tracks cluster signals into within-document entities (Level 2)
  3. Identities link tracks to global KB entities (Level 3)

This enables efficient:

  • Streaming signal processing (add signals incrementally)
  • Incremental coreference (cluster signals as they arrive)
  • Lazy entity linking (resolve identities only when needed)

§Usage

use anno_core::grounded::{GroundedDocument, Signal, Track, Identity, Location};

let mut doc = GroundedDocument::new("doc1", "Marie Curie won the Nobel Prize. She was a physicist.");

// Add signals (Level 1)
doc.add_signal(Signal::new(0, Location::text(0, 11), "Marie Curie", "Person", 0.95));
doc.add_signal(Signal::new(1, Location::text(33, 36), "She", "Person", 0.88));

// Form track (Level 2)
let mut track = Track::new(0, "Marie Curie");
track.add_signal(0, 0);
track.add_signal(1, 1);
doc.add_track(track);

// Link identity (Level 3)
let identity = Identity::from_kb(0, "Marie Curie", "wikidata", "Q7186");
doc.add_identity(identity);
doc.link_track_to_identity(0, 0);

§Invariants

GroundedDocument maintains internal indices (signal_to_track, track_to_identity) that must be consistent with the public collections. The following invariants hold:

  1. Signal ID uniqueness: All signals in signals have distinct id values.
  2. Track signal references: Every SignalRef in a Track.signals points to a valid signal ID in signals.
  3. Signal-to-track consistency: If signal_to_track[s] == t, then the track t contains a SignalRef pointing to s.
  4. Track-to-identity consistency: If track_to_identity[t] == i, then tracks[t].identity_id == Some(i) and identities contains i.
  5. Signal offsets validity: The text at each signal's location should equal the signal's surface form. This is checked by validate(), not by validate_invariants().

Prefer mutation via provided methods (add_signal, add_track, add_signal_to_track, link_track_to_identity) rather than direct field manipulation to preserve invariants.

Use validate_invariants() to check structural consistency after external modifications.
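
The first four invariants can be illustrated with a small self-contained sketch. The types below (MiniDoc, MiniTrack) and the check function are hypothetical stand-ins using plain integer IDs. They are not the crate's types, and the real validate_invariants() may differ in detail.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for illustration only; not the anno_core types.
struct MiniTrack {
    signal_ids: Vec<u64>,
    identity_id: Option<u64>,
}

struct MiniDoc {
    signal_ids: Vec<u64>,
    tracks: HashMap<u64, MiniTrack>,
    identities: HashSet<u64>,
    signal_to_track: HashMap<u64, u64>,
    track_to_identity: HashMap<u64, u64>,
}

/// Returns one message per violation; an empty list means consistent.
fn check(doc: &MiniDoc) -> Vec<String> {
    let mut errors = Vec::new();

    // 1. Signal ID uniqueness.
    let mut seen = HashSet::new();
    for id in &doc.signal_ids {
        if !seen.insert(*id) {
            errors.push(format!("duplicate signal id {id}"));
        }
    }

    // 2. Every track member refers to an existing signal.
    for (tid, track) in &doc.tracks {
        for sid in &track.signal_ids {
            if !seen.contains(sid) {
                errors.push(format!("track {tid} references missing signal {sid}"));
            }
        }
    }

    // 3. The signal_to_track index agrees with track membership.
    for (sid, tid) in &doc.signal_to_track {
        let ok = doc.tracks.get(tid).map_or(false, |t| t.signal_ids.contains(sid));
        if !ok {
            errors.push(format!("signal {sid} is indexed to track {tid}, which does not contain it"));
        }
    }

    // 4. The track_to_identity index agrees with the track and the identity set.
    for (tid, iid) in &doc.track_to_identity {
        let linked = doc.tracks.get(tid).map_or(false, |t| t.identity_id == Some(*iid));
        if !linked || !doc.identities.contains(iid) {
            errors.push(format!("track {tid} is indexed to identity {iid} inconsistently"));
        }
    }

    errors
}
```

In this sketch, adding an entry to signal_to_track without also updating the track's own signal list yields exactly one violation. That is the kind of drift direct field manipulation can introduce.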

Fields§

§id: String

Document identifier

§text: String

Raw text content

§signals: Vec<Signal<Location>>

Level 1: Raw signals (detections)

§tracks: HashMap<TrackId, Track>

Level 2: Tracks (within-document coreference chains)

§identities: HashMap<IdentityId, Identity>

Level 3: Global identities (KB-linked entities)

Implementations§

Source§

impl GroundedDocument

Source

pub fn new(id: impl Into<String>, text: impl Into<String>) -> Self

Create a new grounded document.

Source

pub fn add_signal(&mut self, signal: Signal<Location>) -> SignalId

Add a signal and return its ID.

Source

pub fn get_signal(&self, id: impl Into<SignalId>) -> Option<&Signal<Location>>

Get a signal by ID.

Source

pub fn signals(&self) -> &[Signal<Location>]

Get all signals.

Source

pub fn add_track(&mut self, track: Track) -> TrackId

Add a track and return its ID.

Source

pub fn get_track(&self, id: impl Into<TrackId>) -> Option<&Track>

Get a track by ID.

Source

pub fn get_track_mut(&mut self, id: impl Into<TrackId>) -> Option<&mut Track>

Get a mutable reference to a track by ID.

Source

pub fn add_signal_to_track( &mut self, signal_id: impl Into<SignalId>, track_id: impl Into<TrackId>, position: u32, ) -> bool

Add a signal to an existing track.

This updates the signal_to_track index along with the track itself. Returns true if the signal was added, false if the track doesn't exist.

Source

pub fn track_for_signal(&self, signal_id: SignalId) -> Option<&Track>

Get the track containing a signal.

Source

pub fn tracks(&self) -> impl Iterator<Item = &Track>

Get all tracks.

Source

pub fn add_identity(&mut self, identity: Identity) -> IdentityId

Add an identity and return its ID.

The companion method link_track_to_identity links a track to an identity.

Source

pub fn get_identity(&self, id: IdentityId) -> Option<&Identity>

Get an identity by ID.

Source

pub fn identity_for_track(&self, track_id: TrackId) -> Option<&Identity>

Get the identity for a track.

Source

pub fn identity_for_signal(&self, signal_id: SignalId) -> Option<&Identity>

Get the identity for a signal (transitively through track).
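
As a rough illustration of the transitive lookup, the sketch below chains two maps. The Indices struct and its integer IDs are hypothetical. The real private indices carry the same names but may be laid out differently.

```rust
use std::collections::HashMap;

// Hypothetical index layout for illustration; the real private fields may differ.
struct Indices {
    signal_to_track: HashMap<u64, u64>,
    track_to_identity: HashMap<u64, u64>,
}

impl Indices {
    /// Follows signal -> track -> identity, stopping at the first missing link.
    fn identity_for_signal(&self, signal_id: u64) -> Option<u64> {
        let track_id = self.signal_to_track.get(&signal_id)?;
        self.track_to_identity.get(track_id).copied()
    }
}
```

A signal that is untracked, or whose track is not yet linked, resolves to None in this sketch.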

Source

pub fn identities(&self) -> impl Iterator<Item = &Identity>

Get all identities.

Source

pub fn track_ref(&self, track_id: TrackId) -> Option<TrackRef>

Get a TrackRef for a track in this document.

Returns None if the track doesn’t exist in this document. This validates that the track is still present (tracks can be removed).

Source

pub fn to_entities(&self) -> Vec<Entity>

Convert to legacy Entity format for backwards compatibility.

Source

pub fn from_entities( id: impl Into<String>, text: impl Into<String>, entities: &[Entity], ) -> Self

Create from legacy Entity slice.

Source

pub fn signals_with_label(&self, label: &str) -> Vec<&Signal<Location>>

Get signals filtered by label.

Source

pub fn confident_signals(&self, threshold: f32) -> Vec<&Signal<Location>>

Get signals above a confidence threshold.

Source

pub fn linked_tracks(&self) -> impl Iterator<Item = &Track>

Get tracks that are linked to an identity.

Source

pub fn unlinked_tracks(&self) -> impl Iterator<Item = &Track>

Get tracks that are NOT linked to any identity (need resolution).

Source

pub fn untracked_signal_count(&self) -> usize

Count of signals that are not yet assigned to any track.

Source

pub fn untracked_signals(&self) -> Vec<&Signal<Location>>

Get untracked signals (need coreference resolution).

Source

pub fn signals_by_modality(&self, modality: Modality) -> Vec<&Signal<Location>>

Get signals filtered by modality.

Source

pub fn text_signals(&self) -> Vec<&Signal<Location>>

Get all text-based signals (symbolic modality).

Source

pub fn visual_signals(&self) -> Vec<&Signal<Location>>

Get all visual signals (iconic modality).

Source

pub fn overlapping_signals(&self, location: &Location) -> Vec<&Signal<Location>>

Find signals that overlap with a given location.

Source

pub fn signals_in_range( &self, start: usize, end: usize, ) -> Vec<&Signal<Location>>

Find signals within a text range.
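
The difference between "overlaps" and "within a range" can be sketched with two predicates on half-open spans. The Span type and the half-open convention are assumptions made for illustration. The crate's Location type and its exact range semantics are not shown here.

```rust
/// Half-open span [start, end); an assumption, the crate may define spans differently.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// True when the two spans share at least one position.
fn overlaps(a: Span, b: Span) -> bool {
    a.start < b.end && b.start < a.end
}

/// True when `inner` lies entirely inside `outer`.
fn contained_in(inner: Span, outer: Span) -> bool {
    outer.start <= inner.start && inner.end <= outer.end
}
```

Under this convention, spans that merely touch, such as [0, 11) and [11, 15), do not overlap.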

Source

pub fn negated_signals(&self) -> Vec<&Signal<Location>>

Get signals that are negated.

Source

pub fn quantified_signals( &self, quantifier: Quantifier, ) -> Vec<&Signal<Location>>

Get signals with a specific quantifier.

Source

pub fn validate(&self) -> Vec<SignalValidationError>

Validate all signals against the document text.

Returns a list of validation errors. Empty means all valid.

§Example
use anno_core::grounded::{GroundedDocument, Signal, Location};

let mut doc = GroundedDocument::new("test", "Marie Curie was a physicist.");
doc.add_signal(Signal::new(0, Location::text(0, 11), "Marie Curie", "PER", 0.9));
assert!(doc.validate().is_empty());

// Bad signal: wrong text at offset
doc.add_signal(Signal::new(0, Location::text(0, 5), "WRONG", "PER", 0.9));
assert!(!doc.validate().is_empty());
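
The core of such a check can be sketched as a comparison between a slice of the document text and the claimed surface form. The helper surface_matches is hypothetical, and it treats offsets as byte offsets, which is an assumption. The crate may use character offsets instead.

```rust
/// Hypothetical check: does `text[start..end]` equal the claimed surface form?
/// Offsets are treated as byte offsets here, which is an assumption.
fn surface_matches(text: &str, start: usize, end: usize, surface: &str) -> bool {
    // `str::get` returns None for out-of-range or non-char-boundary offsets
    // instead of panicking, so malformed spans simply fail the check.
    text.get(start..end) == Some(surface)
}
```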

pub fn validate_invariants(&self) -> Vec<String>

Validate structural invariants of the document.

Returns a list of invariant violations. An empty list means the document is structurally consistent.

This checks:

  1. Signal ID uniqueness
  2. Track signal references point to existing signals
  3. signal_to_track index consistency
  4. track_to_identity index consistency
  5. Track identity references point to existing identities

Use this after any direct field manipulation to ensure consistency.

§Example
use anno_core::grounded::{GroundedDocument, Signal, Location};

let mut doc = GroundedDocument::new("test", "Marie Curie was a physicist.");
doc.add_signal(Signal::new(0, Location::text(0, 11), "Marie Curie", "PER", 0.9));
assert!(doc.validate_invariants().is_empty());

pub fn invariants_hold(&self) -> bool

Check if all structural invariants hold.

Source

pub fn is_valid(&self) -> bool

Check if all signals are valid against document text.

Source

pub fn add_signal_validated( &mut self, signal: Signal<Location>, ) -> Result<SignalId, SignalValidationError>

Add a signal, validating it first.

Returns Err if the signal’s offsets don’t match the document text.

Source

pub fn add_signal_from_text( &mut self, surface: &str, label: impl Into<TypeLabel>, confidence: f32, ) -> Option<SignalId>

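Add a signal by locating its surface text in the document, so that the offsets are derived from the text rather than supplied by the caller (safe construction).

Returns the signal ID, or None if the text is not found in the document.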

§Example
use anno_core::grounded::GroundedDocument;

let mut doc = GroundedDocument::new("test", "Marie Curie was a physicist.");
let id = doc.add_signal_from_text("Marie Curie", "PER", 0.95);
assert!(id.is_some());

pub fn add_signal_from_text_nth( &mut self, surface: &str, label: impl Into<TypeLabel>, confidence: f32, occurrence: usize, ) -> Option<SignalId>

Add a signal by finding the nth occurrence of text.
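
One way to locate the nth occurrence is sketched below with the standard library's str::match_indices. The helper find_nth is hypothetical, and it assumes 0-based counting and non-overlapping matches. The source does not state which convention add_signal_from_text_nth uses.

```rust
/// Returns the byte range of the `occurrence`-th match of `surface` in `text`
/// (0-based, an assumption), or None if there are not that many matches.
fn find_nth(text: &str, surface: &str, occurrence: usize) -> Option<(usize, usize)> {
    if surface.is_empty() {
        return None;
    }
    text.match_indices(surface)
        .nth(occurrence)
        .map(|(start, matched)| (start, start + matched.len()))
}
```

Matching is case-sensitive here, so "she" and "She" count as different surfaces.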

Source

pub fn stats(&self) -> DocumentStats

Get statistics about the document.

Source

pub fn add_signals( &mut self, signals: impl IntoIterator<Item = Signal<Location>>, ) -> Vec<SignalId>

Add multiple signals at once.

Returns the IDs of all added signals.

Source

pub fn create_track_from_signals( &mut self, canonical: impl Into<String>, signal_ids: &[SignalId], ) -> Option<TrackId>

Create a track from a list of signal IDs.

Automatically sets positions based on order.

Source

pub fn merge_tracks(&mut self, track_ids: &[TrackId]) -> Option<TrackId>

Merge multiple tracks into one.

The resulting track has all signals from the input tracks. The canonical surface comes from the first track.
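
A minimal sketch of the merge described above follows, using a hypothetical MiniTrack and integer IDs. It keeps the first track's ID and canonical surface. Reusing that ID and returning None for unknown IDs are assumptions, since the source only states where the canonical surface comes from.

```rust
use std::collections::HashMap;

// Hypothetical minimal track; not the crate's Track type.
#[derive(Debug, Clone, PartialEq)]
struct MiniTrack {
    canonical: String,
    signal_ids: Vec<u64>,
}

/// Merges the listed tracks into the first one and returns its ID.
/// Returns None if the list is empty or any ID is unknown (an assumed policy).
fn merge_tracks(tracks: &mut HashMap<u64, MiniTrack>, ids: &[u64]) -> Option<u64> {
    let (&first, rest) = ids.split_first()?;
    if !ids.iter().all(|id| tracks.contains_key(id)) {
        return None;
    }
    for id in rest {
        if *id == first {
            continue;
        }
        // A repeated ID was already removed on its first visit, so skip it.
        if let Some(removed) = tracks.remove(id) {
            let target = tracks.get_mut(&first)?;
            for sid in removed.signal_ids {
                if !target.signal_ids.contains(&sid) {
                    target.signal_ids.push(sid);
                }
            }
        }
    }
    Some(first)
}
```

A real implementation would also have to repoint signal_to_track entries for the absorbed tracks, which this sketch leaves out.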

Source

pub fn find_overlapping_signal_pairs(&self) -> Vec<(SignalId, SignalId)>

Find all pairs of overlapping signals (potential duplicates or nested entities).
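
One possible way to enumerate overlapping pairs is sketched below on plain (start, end) tuples with half-open ends. The function overlapping_pairs is hypothetical. Its sort-then-scan strategy is one common approach and is not necessarily what the crate does.

```rust
/// Returns index pairs (i, j) with i < j whose half-open spans overlap.
/// Sorting by start lets the inner loop stop early.
fn overlapping_pairs(spans: &[(usize, usize)]) -> Vec<(usize, usize)> {
    let mut order: Vec<usize> = (0..spans.len()).collect();
    order.sort_by_key(|&i| spans[i].0);

    let mut pairs = Vec::new();
    for (pos, &i) in order.iter().enumerate() {
        for &j in &order[pos + 1..] {
            // Every later span starts at or after spans[j].0, so once one
            // starts past the end of span i, none of the rest can overlap it.
            if spans[j].0 >= spans[i].1 {
                break;
            }
            pairs.push((i.min(j), i.max(j)));
        }
    }
    pairs.sort();
    pairs
}
```

Nested spans, such as a first name inside a full name, show up as overlapping pairs in this sketch, which matches the "potential duplicates or nested entities" reading.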

Source§

impl GroundedDocument

Source

pub fn build_text_index(&self) -> TextSpatialIndex

Build a spatial index for efficient text range queries.

This is useful for documents with many signals where you need to frequently query by text position.

§Example
use anno_core::grounded::{GroundedDocument, Signal, Location};

let mut doc = GroundedDocument::new("doc", "Some text with entities.");
doc.add_signal(Signal::new(0, Location::text(0, 4), "Some", "T", 0.9));
doc.add_signal(Signal::new(0, Location::text(10, 14), "with", "T", 0.9));

let index = doc.build_text_index();
let in_range = index.query_contained_in(0, 20);
assert_eq!(in_range.len(), 2);

pub fn query_signals_in_range_indexed( &self, start: usize, end: usize, ) -> Vec<&Signal<Location>>

Query signals using the spatial index (builds index if needed).

For repeated queries, build the index once with build_text_index() and reuse it.
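
The "build once, query many times" idea can be sketched with a sorted vector and binary search. SortedSpans is a hypothetical stand-in and says nothing about how TextSpatialIndex is implemented. Spans are assumed to be half-open (start, end) pairs.

```rust
/// Hypothetical stand-in for a text index: spans kept sorted by start offset.
struct SortedSpans {
    spans: Vec<(usize, usize)>, // half-open (start, end), sorted by start
}

impl SortedSpans {
    /// Sorting happens once, at build time.
    fn build(mut spans: Vec<(usize, usize)>) -> Self {
        spans.sort();
        Self { spans }
    }

    /// Spans fully contained in [start, end).
    fn contained_in(&self, start: usize, end: usize) -> Vec<(usize, usize)> {
        // Binary search for the first span starting at or after `start`.
        let first = self.spans.partition_point(|s| s.0 < start);
        self.spans[first..]
            .iter()
            .take_while(|s| s.0 <= end)
            .filter(|s| s.1 <= end)
            .copied()
            .collect()
    }
}
```

Each query then skips everything before the range with a binary search instead of scanning all spans, which is the benefit of reusing a prebuilt index.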

Source

pub fn query_overlapping_signals_indexed( &self, start: usize, end: usize, ) -> Vec<&Signal<Location>>

Query overlapping signals using spatial index.

Source

pub fn to_coref_document(&self) -> CorefDocument

Convert this grounded document into a coreference document for evaluation.

This is a lightweight bridge between the production pipeline types (Signal/Track/Identity) and the evaluation-oriented coreference types (CorefDocument, CorefChain, Mention).

  • Each Track becomes a CorefChain
  • Each track mention is derived from the track’s signal locations
  • Non-text signals (iconic-only locations) are skipped

Note: Mention typing (proper/nominal/pronominal) is left unset; callers doing mention-type evaluation should compute that separately.
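
The skipping of non-text signals can be sketched as a filter over a simplified location type. Loc and text_mentions are hypothetical. The crate's Location, Mention, and CorefChain types are richer than the tuples used here.

```rust
// Hypothetical simplified location type; the crate's Location is richer.
enum Loc {
    Text { start: usize, end: usize },
    Image, // stands in for any non-text (iconic) location
}

/// Collects the text spans of one track's signals, skipping non-text ones.
fn text_mentions(track_locations: &[Loc]) -> Vec<(usize, usize)> {
    track_locations
        .iter()
        .filter_map(|loc| match loc {
            Loc::Text { start, end } => Some((*start, *end)),
            Loc::Image => None,
        })
        .collect()
}
```

In this sketch, a track whose signals are all non-text yields an empty mention list. How the real conversion treats such tracks is not stated in the source.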

Trait Implementations§

Source§

impl Clone for GroundedDocument

Source§

fn clone(&self) -> GroundedDocument

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for GroundedDocument

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<'de> Deserialize<'de> for GroundedDocument

Source§

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl Serialize for GroundedDocument

Source§

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,