Struct Entity

pub struct Entity {
    pub text: String,
    pub entity_type: EntityType,
    pub start: usize,
    pub end: usize,
    pub confidence: f64,
    pub normalized: Option<String>,
    pub provenance: Option<Provenance>,
    pub kb_id: Option<String>,
    pub canonical_id: Option<CanonicalId>,
    pub hierarchical_confidence: Option<HierarchicalConfidence>,
    pub visual_span: Option<Span>,
    pub discontinuous_span: Option<DiscontinuousSpan>,
    pub valid_from: Option<DateTime<Utc>>,
    pub valid_until: Option<DateTime<Utc>>,
    pub viewport: Option<EntityViewport>,
}

A recognized named entity or relation trigger.

§Entity Structure

"Contact John at john@example.com on Jan 15"
         ^^^^    ^^^^^^^^^^^^^^^^    ^^^^^^
         PER     EMAIL               DATE
         |       |                   |
         Named   Contact             Temporal
         (ML)    (Pattern)           (Pattern)

§Core Fields (Stable API)

  • text, entity_type, start, end, confidence — always present
  • normalized, provenance — commonly used optional fields
  • kb_id, canonical_id — knowledge graph and coreference support

§Extended Fields (Research/Experimental)

The following fields support advanced research applications but may evolve:

Field                     Purpose                                Status
visual_span               Multi-modal (ColPali) extraction       Experimental
discontinuous_span        W2NER non-contiguous entities          Experimental
valid_from, valid_until   Temporal knowledge graphs              Research
viewport                  Multi-faceted entity representation    Research
hierarchical_confidence   Coarse-to-fine NER                     Experimental

These fields are annotated with #[serde(skip_serializing_if = "Option::is_none")], so they are omitted from serialized output when unset and add no serialization overhead.

§Knowledge Graph Support

For GraphRAG and coreference resolution, entities support:

  • kb_id: External knowledge base identifier (e.g., Wikidata Q-ID)
  • canonical_id: Local coreference cluster ID (links “John” and “he”)

§Normalization

Entities can have a normalized form for downstream processing:

  • Dates: “Jan 15” → “2024-01-15” (ISO 8601)
  • Money: “$1.5M” → “1500000 USD”
  • Locations: “NYC” → “New York City”

Fields§

§text: String

Entity text (surface form as it appears in source)

§entity_type: EntityType

Entity type classification

§start: usize

Start position (character offset, NOT byte offset).

For Unicode text, character offsets differ from byte offsets. Use anno::offset::bytes_to_chars to convert if needed.

§end: usize

End position (character offset, exclusive).

For Unicode text, character offsets differ from byte offsets. Use anno::offset::bytes_to_chars to convert if needed.
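
To illustrate how the two offset kinds differ, here is a minimal std-only sketch of a byte-to-character conversion. The helper name byte_to_char_offset is hypothetical; it is not the crate's anno::offset::bytes_to_chars, whose exact signature is not shown on this page.

```rust
/// Hypothetical helper: converts a byte offset into a character
/// (Unicode scalar value) offset. Returns None when the byte offset is
/// past the end of the text or falls inside a multi-byte character.
fn byte_to_char_offset(text: &str, byte_offset: usize) -> Option<usize> {
    // is_char_boundary is false for offsets beyond text.len().
    if !text.is_char_boundary(byte_offset) {
        return None;
    }
    // Count the scalar values that precede the byte offset.
    Some(text[..byte_offset].chars().count())
}
```

In "Hello, 日本!" each CJK character occupies three bytes, so the closing "!" sits at byte 13 but at character 9.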

§confidence: f64

Confidence score (0.0-1.0, calibrated)

§normalized: Option<String>

Normalized/canonical form (e.g., “Jan 15” → “2024-01-15”)

§provenance: Option<Provenance>

Provenance: which backend/method produced this entity

§kb_id: Option<String>

External knowledge base ID (e.g., “Q7186” for Marie Curie in Wikidata). Used for entity linking and GraphRAG applications.

§canonical_id: Option<CanonicalId>

Local coreference cluster ID. Multiple mentions with the same canonical_id refer to the same entity. Example: “Marie Curie” and “she” might share canonical_id = CanonicalId(42).

§hierarchical_confidence: Option<HierarchicalConfidence>

Hierarchical confidence (coarse-to-fine). Provides linkage, type, and boundary scores separately.

§visual_span: Option<Span>

Visual span for multi-modal (ColPali) extraction. When set, provides bounding box location in addition to text offsets.

§discontinuous_span: Option<DiscontinuousSpan>

Discontinuous span for non-contiguous entity mentions (W2NER support). When set, overrides start/end for length calculations. Example: “New York and LA [airports]” where “airports” modifies both.

§valid_from: Option<DateTime<Utc>>

Start of temporal validity interval for this entity assertion.

Entity assertions are facts that may change over time:

  • “Satya Nadella is CEO of Microsoft” is valid over the interval [2014, present]
  • “Steve Ballmer was CEO of Microsoft” was valid over the interval [2000, 2014]

When None, the assertion is either:

  • Open at the start (no known start date)
  • Atemporal (timeless fact like “Paris is in France”)

§Example

use anno_core::{Entity, EntityType};
use chrono::{TimeZone, Utc};

let mut entity = Entity::new("CEO of Microsoft", EntityType::Person, 0, 16, 0.9);
entity.valid_from = Some(Utc.with_ymd_and_hms(2008, 10, 1, 0, 0, 0).unwrap());
§valid_until: Option<DateTime<Utc>>

End of temporal validity interval for this entity assertion.

When None and valid_from is set, the fact is currently valid. When both are None, the entity is atemporal.

§viewport: Option<EntityViewport>

Viewport context for multi-faceted entity representation.

The same real-world entity can have different “faces” in different contexts:

  • “Marie Curie” in an academic context: professor, researcher
  • “Marie Curie” in a scientific context: physicist, chemist
  • “Marie Curie” in a personal context: mother, educator

This enables “holographic” entity projection at query time: given a query context, project the entity manifold to the relevant viewport.

§Example

use anno_core::{Entity, EntityType, EntityViewport};

let mut entity = Entity::new("Marie Curie", EntityType::Person, 0, 11, 0.9);
entity.viewport = Some(EntityViewport::Academic);

Implementations§

impl Entity

pub fn new(text: impl Into<String>, entity_type: EntityType, start: usize, end: usize, confidence: f64) -> Self

Create a new entity.

pub fn with_provenance(text: impl Into<String>, entity_type: EntityType, start: usize, end: usize, confidence: f64, provenance: Provenance) -> Self

Create a new entity with provenance information.

pub fn with_hierarchical_confidence(text: impl Into<String>, entity_type: EntityType, start: usize, end: usize, confidence: HierarchicalConfidence) -> Self

Create an entity with hierarchical confidence scores.

pub fn from_visual(text: impl Into<String>, entity_type: EntityType, bbox: Span, confidence: f64) -> Self

Create an entity from a visual bounding box (ColPali multi-modal).

pub fn with_type(text: impl Into<String>, entity_type: EntityType, start: usize, end: usize) -> Self

Create an entity with default confidence (1.0).

Link this entity to an external knowledge base via link_to_kb.

§Examples
use anno_core::{Entity, EntityType};
let mut e = Entity::new("Marie Curie", EntityType::Person, 0, 11, 0.95);
e.link_to_kb("Q7186"); // Wikidata ID

pub fn set_canonical(&mut self, canonical_id: impl Into<CanonicalId>)

Assign this entity to a coreference cluster.

Entities with the same canonical_id refer to the same real-world entity.
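
As a rough illustration of how shared cluster IDs can be used downstream, the sketch below groups mention texts by ID. It is std-only and uses plain (text, Option<u64>) tuples as a hypothetical stand-in for Entity and CanonicalId; the function name group_by_cluster is likewise an assumption.

```rust
use std::collections::BTreeMap;

/// Groups mention texts by coreference cluster ID.
/// Mentions without an ID are skipped.
fn group_by_cluster<'a>(mentions: &[(&'a str, Option<u64>)]) -> BTreeMap<u64, Vec<&'a str>> {
    let mut clusters: BTreeMap<u64, Vec<&'a str>> = BTreeMap::new();
    for &(text, id) in mentions {
        if let Some(id) = id {
            // Mentions sharing an ID end up in the same bucket.
            clusters.entry(id).or_default().push(text);
        }
    }
    clusters
}
```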

pub fn with_canonical_id(self, canonical_id: impl Into<CanonicalId>) -> Self

Builder-style method to set canonical ID.

§Example
use anno_core::{CanonicalId, Entity, EntityType};
let entity = Entity::new("John", EntityType::Person, 0, 4, 0.9)
    .with_canonical_id(42);
assert_eq!(entity.canonical_id, Some(CanonicalId::new(42)));

pub fn is_linked(&self) -> bool

Check if this entity is linked to a knowledge base.

pub fn has_coreference(&self) -> bool

Check if this entity has coreference information.

pub fn is_discontinuous(&self) -> bool

Check if this entity has a discontinuous span.

Discontinuous entities span non-contiguous text regions. Example: “New York and LA airports” contains “New York airports” as a discontinuous entity.

pub fn discontinuous_segments(&self) -> Option<Vec<Range<usize>>>

Get the discontinuous segments if present.

Returns None if this is a contiguous entity.

pub fn set_discontinuous_span(&mut self, span: DiscontinuousSpan)

Set a discontinuous span for this entity.

This is used by W2NER and similar models that detect non-contiguous mentions.

pub fn total_len(&self) -> usize

Get the total length covered by this entity, in characters.

  • Contiguous: end - start
  • Discontinuous: sum of segment lengths

This is intentionally consistent: all offsets in anno::core entity spans are character offsets (Unicode scalar values), not byte offsets.
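
The two cases above can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: segments are passed as a slice of Range<usize> rather than a DiscontinuousSpan, and the free function total_len here is a hypothetical stand-in for the method.

```rust
use std::ops::Range;

/// Total covered length in characters: the sum of segment lengths when
/// segments are present, otherwise end - start.
fn total_len(start: usize, end: usize, segments: Option<&[Range<usize>]>) -> usize {
    match segments {
        Some(segs) => segs.iter().map(|r| r.end.saturating_sub(r.start)).sum(),
        None => end.saturating_sub(start),
    }
}
```

For “New York and LA airports”, the discontinuous entity “New York airports” covers characters 0..8 and 16..24, so its total length is 16 even though the enclosing span is 24 characters long.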

pub fn set_normalized(&mut self, normalized: impl Into<String>)

Set the normalized form for this entity.

§Examples
use anno_core::{Entity, EntityType};

let mut entity = Entity::new("Jan 15", EntityType::Date, 0, 6, 0.95);
entity.set_normalized("2024-01-15");
assert_eq!(entity.normalized.as_deref(), Some("2024-01-15"));

pub fn normalized_or_text(&self) -> &str

Get the normalized form, or the original text if not normalized.
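
One plausible way to express this fallback is shown below. MiniEntity is a hypothetical two-field stand-in for Entity, used only to keep the sketch self-contained.

```rust
/// Hypothetical stand-in carrying only the two fields involved.
struct MiniEntity {
    text: String,
    normalized: Option<String>,
}

impl MiniEntity {
    /// Borrows the normalized form when present, else the surface text.
    fn normalized_or_text(&self) -> &str {
        self.normalized.as_deref().unwrap_or(&self.text)
    }
}
```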

pub fn method(&self) -> ExtractionMethod

Get the extraction method, if known.

pub fn source(&self) -> Option<&str>

Get the source backend name, if known.

pub fn category(&self) -> EntityCategory

Get the entity category.

pub fn is_structured(&self) -> bool

Returns true if this entity was detected via patterns (not ML).

pub fn is_named(&self) -> bool

Returns true if this entity required ML for detection.

pub fn overlaps(&self, other: &Entity) -> bool

Check if this entity overlaps with another.

pub fn overlap_ratio(&self, other: &Entity) -> f64

Calculate overlap ratio (IoU) with another entity.
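
IoU (intersection over union) for one-dimensional spans can be sketched as below. The code assumes half-open (start, end) character spans, consistent with the exclusive end offset documented for this struct, and returns 0.0 for an empty union; both the tuple representation and that edge-case choice are assumptions, not confirmed crate behavior.

```rust
/// Half-open spans [start, end) overlap when each starts before the other ends.
fn overlaps(a: (usize, usize), b: (usize, usize)) -> bool {
    a.0 < b.1 && b.0 < a.1
}

/// Intersection over union of two half-open spans.
fn overlap_ratio(a: (usize, usize), b: (usize, usize)) -> f64 {
    let len_a = a.1.saturating_sub(a.0);
    let len_b = b.1.saturating_sub(b.0);
    // Intersection length is zero when the spans are disjoint.
    let inter = a.1.min(b.1).saturating_sub(a.0.max(b.0));
    let union = len_a + len_b - inter;
    if union == 0 {
        0.0
    } else {
        inter as f64 / union as f64
    }
}
```

Spans 0..4 and 2..6 share two characters out of six covered in total, giving a ratio of one third; adjacent spans such as 0..4 and 4..8 do not overlap.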

pub fn set_hierarchical_confidence(&mut self, confidence: HierarchicalConfidence)

Set hierarchical confidence scores.

pub fn linkage_confidence(&self) -> f32

Get the linkage confidence (coarse filter score).

pub fn type_confidence(&self) -> f32

Get the type classification confidence.

pub fn boundary_confidence(&self) -> f32

Get the boundary confidence.

pub fn is_visual(&self) -> bool

Check if this entity has visual location (multi-modal).

pub const fn text_span(&self) -> (usize, usize)

Get the text span (start, end).

pub const fn span_len(&self) -> usize

Get the span length.

pub fn set_visual_span(&mut self, span: Span)

Set visual span for multi-modal extraction.

pub fn extract_text(&self, source_text: &str) -> String

Safely extract text from source using character offsets.

Entity stores character offsets, not byte offsets. This method correctly extracts text by iterating over characters.

§Arguments
  • source_text - The original text from which this entity was extracted
§Returns

The extracted text, or empty string if offsets are invalid

§Example
use anno_core::{Entity, EntityType};

let text = "Hello, 日本!";
let entity = Entity::new("日本", EntityType::Location, 7, 9, 0.95);
assert_eq!(entity.extract_text(text), "日本");
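
For readers who want to see what character-based extraction involves, here is a minimal std-only sketch. The free function extract_by_char_offsets is hypothetical and only approximates the documented behavior (empty string on invalid offsets); the crate's internals may differ.

```rust
/// Returns the substring covering characters [start, end), or an empty
/// string when the offsets are reversed or run past the end of the text.
fn extract_by_char_offsets(source: &str, start: usize, end: usize) -> String {
    if start >= end || end > source.chars().count() {
        return String::new();
    }
    // Walk scalar values rather than bytes so multi-byte text is safe.
    source.chars().skip(start).take(end - start).collect()
}
```

Slicing the same text by bytes with &text[7..9] would panic here, because byte 9 falls inside the three-byte encoding of “日”.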

pub fn extract_text_with_len(&self, source_text: &str, text_char_count: usize) -> String

Extract text with pre-computed text length (performance optimization).

Use this when validating/clamping multiple entities from the same text to avoid recalculating text.chars().count() for each entity.

§Arguments
  • source_text - The original text
  • text_char_count - Pre-computed character count (from text.chars().count())
§Returns

The extracted text, or empty string if offsets are invalid
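
The benefit of passing a pre-computed count is that the source text is scanned once rather than once per entity. The sketch below shows that pattern with std only; extract_with_len and extract_all are hypothetical names operating on (start, end) tuples instead of Entity values.

```rust
/// Extracts characters [start, end) using a caller-supplied character count.
fn extract_with_len(source: &str, char_count: usize, start: usize, end: usize) -> String {
    if start >= end || end > char_count {
        return String::new();
    }
    source.chars().skip(start).take(end - start).collect()
}

/// Extracts many spans from one text, counting its characters only once.
fn extract_all(source: &str, spans: &[(usize, usize)]) -> Vec<String> {
    let char_count = source.chars().count(); // computed once, reused below
    spans
        .iter()
        .map(|&(start, end)| extract_with_len(source, char_count, start, end))
        .collect()
}
```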

pub fn set_valid_from(&mut self, dt: DateTime<Utc>)

Set the temporal validity start for this entity assertion.

§Example
use anno_core::{Entity, EntityType};
use chrono::{TimeZone, Utc};

let mut entity = Entity::new("CEO", EntityType::Person, 0, 3, 0.9);
entity.set_valid_from(Utc.with_ymd_and_hms(2008, 10, 1, 0, 0, 0).unwrap());
assert!(entity.is_temporal());

pub fn set_valid_until(&mut self, dt: DateTime<Utc>)

Set the temporal validity end for this entity assertion.

pub fn set_temporal_range(&mut self, from: DateTime<Utc>, until: DateTime<Utc>)

Set both temporal bounds at once.

pub fn is_temporal(&self) -> bool

Check if this entity has temporal validity information.

pub fn valid_at(&self, timestamp: &DateTime<Utc>) -> bool

Check if this entity was valid at a specific point in time.

Returns true if:

  • No temporal bounds are set (atemporal entity)
  • The timestamp falls within [valid_from, valid_until]
§Example
use anno_core::{Entity, EntityType};
use chrono::{TimeZone, Utc};

let mut entity = Entity::new("CEO of Microsoft", EntityType::Person, 0, 16, 0.9);
entity.set_valid_from(Utc.with_ymd_and_hms(2008, 1, 1, 0, 0, 0).unwrap());
entity.set_valid_until(Utc.with_ymd_and_hms(2023, 12, 31, 0, 0, 0).unwrap());

let query_2015 = Utc.with_ymd_and_hms(2015, 6, 1, 0, 0, 0).unwrap();
let query_2005 = Utc.with_ymd_and_hms(2005, 6, 1, 0, 0, 0).unwrap();

assert!(entity.valid_at(&query_2015));
assert!(!entity.valid_at(&query_2005));
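
The interval check itself can be sketched without chrono. In the version below, timestamps are plain i64 Unix seconds rather than DateTime<Utc>, both bounds are treated as inclusive, and a missing bound is treated as unbounded on that side. That last point is an assumption about one-sided intervals, not something this page states outright.

```rust
/// Returns true when ts lies within [valid_from, valid_until].
/// A None bound is treated as open on that side (assumption).
fn valid_at(valid_from: Option<i64>, valid_until: Option<i64>, ts: i64) -> bool {
    let after_start = valid_from.map_or(true, |from| ts >= from);
    let before_end = valid_until.map_or(true, |until| ts <= until);
    after_start && before_end
}
```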

pub fn is_currently_valid(&self) -> bool

Check if this entity is currently valid (at the current time).

pub fn set_viewport(&mut self, viewport: EntityViewport)

Set the viewport context for this entity.

§Example
use anno_core::{Entity, EntityType, EntityViewport};

let mut entity = Entity::new("Marie Curie", EntityType::Person, 0, 11, 0.9);
entity.set_viewport(EntityViewport::Academic);
assert!(entity.has_viewport());

pub fn has_viewport(&self) -> bool

Check if this entity has a viewport context.

pub fn viewport_or_default(&self) -> EntityViewport

Get the viewport, defaulting to General if not set.

pub fn matches_viewport(&self, query_viewport: &EntityViewport) -> bool

Check if this entity matches a viewport context.

Returns true if:

  • The entity has no viewport (matches any)
  • The entity’s viewport matches the query
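
The two rules above amount to a wildcard match on an optional value. A minimal sketch, assuming a hypothetical Viewport enum in place of EntityViewport (whose full variant list is not shown on this page):

```rust
/// Hypothetical stand-in for EntityViewport.
#[derive(Debug, Clone, PartialEq)]
enum Viewport {
    General,
    Academic,
}

/// An entity without a viewport matches any query; otherwise the
/// viewports must be equal.
fn matches_viewport(entity_viewport: Option<&Viewport>, query: &Viewport) -> bool {
    entity_viewport.map_or(true, |v| v == query)
}
```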

pub fn builder(text: impl Into<String>, entity_type: EntityType) -> EntityBuilder

Create a builder for fluent entity construction.

pub fn validate(&self, source_text: &str) -> Vec<ValidationIssue>

Validate this entity against the source text.

Returns a list of validation issues. Empty list means the entity is valid.

§Checks Performed
  1. Span bounds: start < end, both within text length
  2. Text match: text matches the span in source
  3. Confidence range: confidence in [0.0, 1.0]
  4. Type consistency: Custom types have non-empty names
  5. Discontinuous consistency: If present, segments are valid
§Example
use anno_core::{Entity, EntityType};

let text = "John works at Apple";
let entity = Entity::new("John", EntityType::Person, 0, 4, 0.95);

let issues = entity.validate(text);
assert!(issues.is_empty(), "Entity should be valid");

// Invalid entity: span doesn't match text
let bad = Entity::new("Jane", EntityType::Person, 0, 4, 0.95);
let issues = bad.validate(text);
assert!(!issues.is_empty(), "Entity text doesn't match span");
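
To make the first three checks concrete, here is a rough std-only sketch. The Issue enum and the check function are hypothetical; the crate's ValidationIssue type and its variants are not shown on this page and may look different.

```rust
/// Hypothetical issue kinds covering checks 1 to 3.
#[derive(Debug, PartialEq)]
enum Issue {
    InvalidSpan,
    TextMismatch,
    ConfidenceOutOfRange,
}

fn check(source: &str, text: &str, start: usize, end: usize, confidence: f64) -> Vec<Issue> {
    let mut issues = Vec::new();
    let char_count = source.chars().count();
    if start >= end || end > char_count {
        // Check 1: span must be non-empty and inside the text.
        issues.push(Issue::InvalidSpan);
    } else {
        // Check 2: the surface form must equal the characters in the span.
        let actual: String = source.chars().skip(start).take(end - start).collect();
        if actual != text {
            issues.push(Issue::TextMismatch);
        }
    }
    // Check 3: confidence must lie in [0.0, 1.0]; NaN fails this test too.
    if !(0.0..=1.0).contains(&confidence) {
        issues.push(Issue::ConfidenceOutOfRange);
    }
    issues
}
```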

pub fn validate_with_len(&self, source_text: &str, text_char_count: usize) -> Vec<ValidationIssue>

Validate entity with pre-computed text length (performance optimization).

Use this when validating multiple entities from the same text to avoid recalculating text.chars().count() for each entity.

§Arguments
  • source_text - The original text
  • text_char_count - Pre-computed character count (from text.chars().count())
§Returns

Vector of validation issues (empty if valid)

pub fn is_valid(&self, source_text: &str) -> bool

Check if this entity is valid against the source text.

Convenience method that returns true if validate() returns empty.

pub fn validate_batch(entities: &[Entity], source_text: &str) -> HashMap<usize, Vec<ValidationIssue>>

Validate a batch of entities efficiently.

Returns a map of entity index -> validation issues. Only entities with issues are included.

§Example
use anno_core::{Entity, EntityType};

let text = "John and Jane work at Apple";
let entities = vec![
    Entity::new("John", EntityType::Person, 0, 4, 0.95),
    Entity::new("Wrong", EntityType::Person, 9, 13, 0.8),
];

let issues = Entity::validate_batch(&entities, text);
assert!(issues.is_empty() || issues.contains_key(&1)); // Second entity might fail
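
The shape of the returned map (index to issues, failing entries only) can be sketched as below. This is an illustration, not the crate's code: entities are (text, start, end) tuples, issues are plain strings, only span and text checks are performed, and the function name validate_spans is hypothetical.

```rust
use std::collections::HashMap;

/// Checks each (text, start, end) tuple against the source and reports
/// only the indices that have at least one issue.
fn validate_spans(source: &str, entities: &[(&str, usize, usize)]) -> HashMap<usize, Vec<String>> {
    // Decode the source once so every entity reuses the same character buffer.
    let chars: Vec<char> = source.chars().collect();
    let mut report = HashMap::new();
    for (idx, &(text, start, end)) in entities.iter().enumerate() {
        let mut issues = Vec::new();
        if start >= end || end > chars.len() {
            issues.push(format!("invalid span {start}..{end}"));
        } else if chars[start..end].iter().collect::<String>() != text {
            issues.push(format!("text mismatch at {start}..{end}"));
        }
        if !issues.is_empty() {
            report.insert(idx, issues);
        }
    }
    report
}
```

Under these assumptions, the “Wrong” entity at 9..13 is reported under key 1 because the span actually contains “Jane”, while the valid first entity is absent from the map.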

Trait Implementations§

impl Clone for Entity

fn clone(&self) -> Entity

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Entity

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<'de> Deserialize<'de> for Entity

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl From<&Entity> for Mention

fn from(entity: &Entity) -> Self

Converts to this type from the input type.

impl Serialize for Entity

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

This is a nightly-only experimental API (clone_to_uninit).
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,