Struct Entity

pub struct Entity {
    pub text: String,
    pub entity_type: EntityType,
    pub start: usize,
    pub end: usize,
    pub confidence: Confidence,
    pub normalized: Option<String>,
    pub provenance: Option<Provenance>,
    pub kb_id: Option<String>,
    pub canonical_id: Option<CanonicalId>,
    pub hierarchical_confidence: Option<HierarchicalConfidence>,
    pub visual_span: Option<Span>,
    pub discontinuous_span: Option<DiscontinuousSpan>,
    pub valid_from: Option<DateTime<Utc>>,
    pub valid_until: Option<DateTime<Utc>>,
    pub viewport: Option<EntityViewport>,
    pub phi_features: Option<PhiFeatures>,
    pub mention_type: Option<MentionType>,
}

A recognized named entity or relation trigger.

§Entity Structure

"Contact John at john@example.com on Jan 15"
         ^^^^    ^^^^^^^^^^^^^^^^    ^^^^^^
         PER     EMAIL               DATE
         |       |                   |
         Named   Contact             Temporal
         (ML)    (Pattern)           (Pattern)

§Core Fields (Stable API)

text, entity_type, start, end, confidence — always present
normalized, provenance — commonly used optional fields
kb_id, canonical_id — knowledge graph and coreference support

§Extended Fields (Research/Experimental)

The following fields support advanced research applications but may evolve:

| Field | Purpose | Status |
|-------|---------|--------|
| `visual_span` | Multi-modal (ColPali) extraction | Experimental |
| `discontinuous_span` | W2NER non-contiguous entities | Experimental |
| `valid_from`, `valid_until` | Temporal knowledge graphs | Research |
| `viewport` | Multi-faceted entity representation | Research |
| `hierarchical_confidence` | Coarse-to-fine NER | Experimental |

These fields are annotated with #[serde(skip_serializing_if = "Option::is_none")], so they add nothing to the serialized output when unused.

§Knowledge Graph Support

For GraphRAG and coreference resolution, entities support:

kb_id: External knowledge base identifier (e.g., Wikidata Q-ID)
canonical_id: Local coreference cluster ID (links “John” and “he”)

§Normalization

Entities can have a normalized form for downstream processing:

Dates: “Jan 15” → “2024-01-15” (ISO 8601)
Money: “$1.5M” → “1500000 USD”
Locations: “NYC” → “New York City”

Fields§

§text: String

Entity text (surface form as it appears in source)

§entity_type: EntityType

Entity type classification

§start: usize

Start position (character offset, NOT byte offset).

For Unicode text, character offsets differ from byte offsets. Use anno::offset::bytes_to_chars to convert if needed.

§end: usize

End position (character offset, exclusive).

For Unicode text, character offsets differ from byte offsets. Use anno::offset::bytes_to_chars to convert if needed.
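
To see why the distinction matters, here is a minimal standard-library sketch that maps a character span to a byte span. The helper name char_span_to_byte_span is hypothetical and is not part of anno or anno_core; the crate's own conversion utilities may treat invalid input differently.

```rust
/// Hypothetical helper: map a half-open character span to a half-open byte span.
/// Returns None if the span is inverted or extends past the end of the text.
fn char_span_to_byte_span(text: &str, start: usize, end: usize) -> Option<(usize, usize)> {
    if start > end {
        return None;
    }
    // Byte index of every character boundary, including the end of the string.
    let boundaries: Vec<usize> = text
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(text.len()))
        .collect();
    let byte_start = *boundaries.get(start)?;
    let byte_end = *boundaries.get(end)?;
    Some((byte_start, byte_end))
}

fn main() {
    let text = "Hello, 日本!";
    // "日本" occupies characters 7..9 but bytes 7..13 (three bytes per character in UTF-8).
    let (b0, b1) = char_span_to_byte_span(text, 7, 9).unwrap();
    assert_eq!((b0, b1), (7, 13));
    assert_eq!(&text[b0..b1], "日本");
}
```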

§confidence: Confidence

Confidence score (0.0-1.0, calibrated).

Construction via Confidence::new clamps to [0.0, 1.0]. Use .value() or Into<f64> to extract the raw score.
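
The clamping behavior can be pictured with a small stand-in type. This is only a sketch: Score is a hypothetical newtype, not the crate's Confidence, and the handling of NaN shown here is an assumption.

```rust
/// Hypothetical stand-in for a clamped confidence newtype; not the anno_core type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Score(f64);

impl Score {
    /// Clamp the input into [0.0, 1.0]. NaN is mapped to 0.0 (an assumption of this sketch).
    fn new(raw: f64) -> Self {
        if raw.is_nan() {
            Score(0.0)
        } else {
            Score(raw.clamp(0.0, 1.0))
        }
    }

    /// Return the raw score.
    fn value(self) -> f64 {
        self.0
    }
}

impl From<Score> for f64 {
    fn from(s: Score) -> f64 {
        s.0
    }
}

fn main() {
    assert_eq!(Score::new(1.7).value(), 1.0);
    assert_eq!(Score::new(-0.2).value(), 0.0);
    let raw: f64 = Score::new(0.95).into();
    assert_eq!(raw, 0.95);
}
```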

§normalized: Option<String>

Normalized/canonical form (e.g., “Jan 15” → “2024-01-15”)

§provenance: Option<Provenance>

Provenance: which backend/method produced this entity

§kb_id: Option<String>

External knowledge base ID (e.g., “Q7186” for Marie Curie in Wikidata). Used for entity linking and GraphRAG applications.

§canonical_id: Option<CanonicalId>

Local coreference cluster ID. Multiple mentions with the same canonical_id refer to the same entity. Example: “Marie Curie” and “she” might share canonical_id = CanonicalId(42).

§hierarchical_confidence: Option<HierarchicalConfidence>

Hierarchical confidence (coarse-to-fine). Provides linkage, type, and boundary scores separately.

§visual_span: Option<Span>

Visual span for multi-modal (ColPali) extraction. When set, provides bounding box location in addition to text offsets.

§discontinuous_span: Option<DiscontinuousSpan>

Discontinuous span for non-contiguous entity mentions (W2NER support). When set, overrides start/end for length calculations. Example: “New York and LA [airports]” where “airports” modifies both.

§valid_from: Option<DateTime<Utc>>

Start of temporal validity interval for this entity assertion.

Entities are facts that may change over time:

“Satya Nadella is CEO of Microsoft” is valid from [2014, present]
“Steve Ballmer was CEO of Microsoft” was valid from [2000, 2014]

When None, the entity is either:

Currently valid (no known end date)
Atemporal (timeless fact like “Paris is in France”)

§Example

use anno_core::{Entity, EntityType};
use chrono::{TimeZone, Utc};

let mut entity = Entity::new("CEO of Microsoft", EntityType::Person, 0, 16, 0.9);
entity.valid_from = Some(Utc.with_ymd_and_hms(2008, 10, 1, 0, 0, 0).unwrap());

§valid_until: Option<DateTime<Utc>>

End of temporal validity interval for this entity assertion.

When None and valid_from is set, the fact is currently valid. When both are None, the entity is atemporal.

§viewport: Option<EntityViewport>

Viewport context for multi-faceted entity representation.

The same real-world entity can have different “faces” in different contexts:

“Marie Curie” in an academic context: professor, researcher
“Marie Curie” in a scientific context: physicist, chemist
“Marie Curie” in a personal context: mother, educator

This enables “holographic” entity projection at query time: given a query context, project the entity manifold to the relevant viewport.

§Example

use anno_core::{Entity, EntityType, EntityViewport};

let mut entity = Entity::new("Marie Curie", EntityType::Person, 0, 11, 0.9);
entity.viewport = Some(EntityViewport::Academic);

§phi_features: Option<PhiFeatures>

Phi-features (person, number, gender) for morphological agreement.

Used for coreference constraints and zero pronoun resolution. In pro-drop languages (Arabic, Spanish, Japanese), verb morphology encodes subject features even when the pronoun is dropped.

§mention_type: Option<MentionType>

Mention type classification (Proper, Nominal, Pronominal, Zero).

Classifies the referring expression type for coreference resolution. Follows the Accessibility Hierarchy (Ariel 1990): Proper > Nominal > Pronominal > Zero.

Implementations§

impl Entity

pub fn new( text: impl Into<String>, entity_type: EntityType, start: usize, end: usize, confidence: impl Into<Confidence>, ) -> Self

Create a new entity.

use anno_core::{Entity, EntityType};

let e = Entity::new("Berlin", EntityType::Location, 10, 16, 0.95);
assert_eq!(e.text, "Berlin");
assert_eq!(e.entity_type, EntityType::Location);
assert_eq!((e.start, e.end), (10, 16));

pub fn with_provenance( text: impl Into<String>, entity_type: EntityType, start: usize, end: usize, confidence: impl Into<Confidence>, provenance: Provenance, ) -> Self

Create a new entity with provenance information.

pub fn with_hierarchical_confidence( text: impl Into<String>, entity_type: EntityType, start: usize, end: usize, confidence: HierarchicalConfidence, ) -> Self

Create an entity with hierarchical confidence scores.

pub fn from_visual( text: impl Into<String>, entity_type: EntityType, bbox: Span, confidence: impl Into<Confidence>, ) -> Self

Create an entity from a visual bounding box (ColPali multi-modal).

pub fn with_type( text: impl Into<String>, entity_type: EntityType, start: usize, end: usize, ) -> Self

Create an entity with default confidence (1.0).

pub fn link_to_kb(&mut self, kb_id: impl Into<String>)

Link this entity to an external knowledge base.

§Examples

use anno_core::{Entity, EntityType};
let mut e = Entity::new("Marie Curie", EntityType::Person, 0, 11, 0.95);
e.link_to_kb("Q7186");
assert_eq!(e.kb_id.as_deref(), Some("Q7186"));

pub fn set_canonical(&mut self, canonical_id: impl Into<CanonicalId>)

Assign this entity to a coreference cluster.

Entities with the same canonical_id refer to the same real-world entity.
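
As a rough illustration of what sharing a cluster ID buys downstream, the sketch below groups mention texts by an optional cluster ID. The function clusters and the use of plain u64 IDs are hypothetical simplifications; they stand in for Entity and CanonicalId and are not part of the crate.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: group mention texts by an optional cluster id.
/// Mentions without a cluster id are skipped.
fn clusters<'a>(mentions: &[(&'a str, Option<u64>)]) -> BTreeMap<u64, Vec<&'a str>> {
    let mut out: BTreeMap<u64, Vec<&'a str>> = BTreeMap::new();
    for &(text, id) in mentions {
        if let Some(id) = id {
            out.entry(id).or_default().push(text);
        }
    }
    out
}

fn main() {
    let mentions = [
        ("Marie Curie", Some(42)),
        ("Paris", None),
        ("she", Some(42)),
    ];
    let grouped = clusters(&mentions);
    // Both mentions carry id 42, so they land in the same cluster.
    assert_eq!(grouped[&42], vec!["Marie Curie", "she"]);
    assert_eq!(grouped.len(), 1);
}
```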

pub fn with_canonical_id(self, canonical_id: impl Into<CanonicalId>) -> Self

Builder-style method to set canonical ID.

§Example

use anno_core::{CanonicalId, Entity, EntityType};
let entity = Entity::new("John", EntityType::Person, 0, 4, 0.9)
    .with_canonical_id(42);
assert_eq!(entity.canonical_id, Some(CanonicalId::new(42)));

pub fn is_linked(&self) -> bool

Check if this entity is linked to a knowledge base.

pub fn has_coreference(&self) -> bool

Check if this entity has coreference information.

pub fn is_discontinuous(&self) -> bool

Check if this entity has a discontinuous span.

Discontinuous entities span non-contiguous text regions. Example: “New York and LA airports” contains “New York airports” as a discontinuous entity.

pub fn discontinuous_segments(&self) -> Option<Vec<Range<usize>>>

Get the discontinuous segments if present.

Returns None if this is a contiguous entity.

pub fn set_discontinuous_span(&mut self, span: DiscontinuousSpan)

Set a discontinuous span for this entity.

This is used by W2NER and similar models that detect non-contiguous mentions.

pub fn total_len(&self) -> usize

Get the total length covered by this entity, in characters.

Contiguous: end - start
Discontinuous: sum of segment lengths

This is intentionally consistent: all offsets in anno::core entity spans are character offsets (Unicode scalar values), not byte offsets.
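
The two cases can be sketched with a free function over character ranges. The function below is a hypothetical illustration of the rule stated above, not the crate's implementation; it takes the segments as an optional slice rather than a DiscontinuousSpan.

```rust
use std::ops::Range;

/// Hypothetical helper mirroring the two cases described above.
/// `segments` is None for a contiguous entity.
fn total_len(start: usize, end: usize, segments: Option<&[Range<usize>]>) -> usize {
    match segments {
        // Discontinuous: sum of the individual segment lengths.
        Some(segs) => segs.iter().map(|r| r.end.saturating_sub(r.start)).sum(),
        // Contiguous: plain span length.
        None => end.saturating_sub(start),
    }
}

fn main() {
    // In "New York and LA airports", "New York" is chars 0..8 and "airports" is chars 16..24.
    let segs: [Range<usize>; 2] = [0..8, 16..24];
    assert_eq!(total_len(0, 24, Some(&segs[..])), 16);
    // The same outer span treated as contiguous covers all 24 characters.
    assert_eq!(total_len(0, 24, None), 24);
}
```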

pub fn set_normalized(&mut self, normalized: impl Into<String>)

Set the normalized form for this entity.

§Examples

use anno_core::{Entity, EntityType};

let mut entity = Entity::new("Jan 15", EntityType::Date, 0, 6, 0.95);
entity.set_normalized("2024-01-15");
assert_eq!(entity.normalized.as_deref(), Some("2024-01-15"));

pub fn normalized_or_text(&self) -> &str

Get the normalized form, or the original text if not normalized.
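
A fallback of this kind is typically a one-liner over Option. The sketch below uses a hypothetical, pared-down MiniEntity with only the two fields involved; it illustrates the described behavior and is not the crate's code.

```rust
/// Hypothetical, pared-down entity with only the two fields involved.
struct MiniEntity {
    text: String,
    normalized: Option<String>,
}

impl MiniEntity {
    /// Prefer the normalized form; fall back to the surface text.
    fn normalized_or_text(&self) -> &str {
        self.normalized.as_deref().unwrap_or(&self.text)
    }
}

fn main() {
    let date = MiniEntity {
        text: "Jan 15".to_string(),
        normalized: Some("2024-01-15".to_string()),
    };
    let name = MiniEntity {
        text: "John".to_string(),
        normalized: None,
    };
    assert_eq!(date.normalized_or_text(), "2024-01-15");
    assert_eq!(name.normalized_or_text(), "John");
}
```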

pub fn method(&self) -> ExtractionMethod

Get the extraction method, if known.

pub fn source(&self) -> Option<&str>

Get the source backend name, if known.

pub fn category(&self) -> EntityCategory

Get the entity category.

pub fn is_structured(&self) -> bool

Returns true if this entity was detected via patterns (not ML).

pub fn is_named(&self) -> bool

Returns true if this entity required ML for detection.

pub fn overlaps(&self, other: &Entity) -> bool

Check if this entity overlaps with another.

pub fn overlap_ratio(&self, other: &Entity) -> f64

Calculate overlap ratio (IoU) with another entity.
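
For reference, intersection over union for two half-open character spans can be computed as below. This is a sketch under stated assumptions: span_iou is a hypothetical helper, and how the crate treats empty or discontinuous spans is not specified here.

```rust
/// Hypothetical helper: IoU of two half-open character spans [start, end).
fn span_iou(a: (usize, usize), b: (usize, usize)) -> f64 {
    let inter_start = a.0.max(b.0);
    let inter_end = a.1.min(b.1);
    // Zero when the spans do not overlap.
    let intersection = inter_end.saturating_sub(inter_start);
    let len_a = a.1.saturating_sub(a.0);
    let len_b = b.1.saturating_sub(b.0);
    let union = len_a + len_b - intersection;
    if union == 0 {
        0.0 // two empty spans: defined as 0.0 in this sketch
    } else {
        intersection as f64 / union as f64
    }
}

fn main() {
    // Identical spans overlap completely.
    assert_eq!(span_iou((2, 6), (2, 6)), 1.0);
    // Adjacent spans share no characters because `end` is exclusive.
    assert_eq!(span_iou((0, 4), (4, 8)), 0.0);
    // 5 shared characters out of 15 covered in total.
    assert!((span_iou((0, 10), (5, 15)) - 1.0 / 3.0).abs() < 1e-12);
}
```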

pub fn set_hierarchical_confidence( &mut self, confidence: HierarchicalConfidence, )

Set hierarchical confidence scores.

pub fn linkage_confidence(&self) -> Confidence

Get the linkage confidence (coarse filter score).

pub fn type_confidence(&self) -> Confidence

Get the type classification confidence.

pub fn boundary_confidence(&self) -> Confidence

Get the boundary confidence.

pub fn is_visual(&self) -> bool

Check if this entity has visual location (multi-modal).

pub const fn text_span(&self) -> (usize, usize)

Get the text span (start, end).

pub const fn span_len(&self) -> usize

Get the span length.

pub fn set_visual_span(&mut self, span: Span)

Set visual span for multi-modal extraction.

§Note

Entity does not currently offer a method that returns a unified TextSpan with both byte and char offsets. Computing byte offsets requires the original source text from which the entity was extracted, plus the offset conversion utilities from the anno crate. Use anno::offset::char_to_byte_offsets() directly for now.

§Example

use anno::offset::char_to_byte_offsets;
use anno_core::{Entity, EntityType};

let text = "Hello, 日本!";
let entity = Entity::new("日本", EntityType::Location, 7, 9, 0.95);
let (byte_start, byte_end) = char_to_byte_offsets(text, entity.start, entity.end);

pub fn extract_text(&self, source_text: &str) -> String

Safely extract text from source using character offsets.

Entity stores character offsets, not byte offsets. This method correctly extracts text by iterating over characters.

§Arguments

source_text - The original text from which this entity was extracted

§Returns

The extracted text, or empty string if offsets are invalid

§Example

use anno_core::{Entity, EntityType};

let text = "Hello, 日本!";
let entity = Entity::new("日本", EntityType::Location, 7, 9, 0.95);
assert_eq!(entity.extract_text(text), "日本");
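
The following standard-library sketch shows what goes wrong if the same offsets are mistaken for byte offsets. The helper slice_by_chars is hypothetical and only approximates the idea of walking characters; it is not the crate's implementation.

```rust
/// Hypothetical helper: extract a half-open character span by walking chars.
fn slice_by_chars(source: &str, start: usize, end: usize) -> String {
    source
        .chars()
        .skip(start)
        .take(end.saturating_sub(start))
        .collect()
}

fn main() {
    let text = "Hello, 日本!";
    // Read as byte offsets, 7..9 ends in the middle of "日" (bytes 7..10),
    // so the checked slice is rejected.
    assert_eq!(text.get(7..9), None);
    // Read as character offsets, 7..9 yields the intended mention.
    assert_eq!(slice_by_chars(text, 7, 9), "日本");
}
```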

pub fn extract_text_with_len( &self, source_text: &str, text_char_count: usize, ) -> String

Extract text with pre-computed text length (performance optimization).

Use this when validating/clamping multiple entities from the same text to avoid recalculating text.chars().count() for each entity.

§Arguments

source_text - The original text
text_char_count - Pre-computed character count (from text.chars().count())

§Returns

The extracted text, or empty string if offsets are invalid
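
The intended usage pattern is to count characters once and reuse the count for every entity. The sketch below imitates that with a hypothetical free function, extract_with_len, operating on (start, end) pairs instead of Entity values; the validity rule it applies is an assumption.

```rust
/// Hypothetical free function: extract chars start..end given a precomputed
/// character count. Returns an empty string for invalid spans.
fn extract_with_len(source: &str, char_count: usize, start: usize, end: usize) -> String {
    if start >= end || end > char_count {
        return String::new();
    }
    source.chars().skip(start).take(end - start).collect()
}

fn main() {
    let text = "東京 and 京都 are cities";
    let char_count = text.chars().count(); // counted once, reused below
    let spans = [(0, 2), (7, 9), (18, 30)];
    let found: Vec<String> = spans
        .iter()
        .map(|&(s, e)| extract_with_len(text, char_count, s, e))
        .collect();
    // The last span runs past the end of the text, so it yields an empty string.
    assert_eq!(found, ["東京", "京都", ""]);
}
```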

pub fn set_valid_from(&mut self, dt: DateTime<Utc>)

Set the temporal validity start for this entity assertion.

§Example

use anno_core::{Entity, EntityType};
use chrono::{TimeZone, Utc};

let mut entity = Entity::new("CEO", EntityType::Person, 0, 3, 0.9);
entity.set_valid_from(Utc.with_ymd_and_hms(2008, 10, 1, 0, 0, 0).unwrap());
assert!(entity.is_temporal());

pub fn set_valid_until(&mut self, dt: DateTime<Utc>)

Set the temporal validity end for this entity assertion.

pub fn set_temporal_range(&mut self, from: DateTime<Utc>, until: DateTime<Utc>)

Set both temporal bounds at once.

pub fn is_temporal(&self) -> bool

Check if this entity has temporal validity information.

pub fn valid_at(&self, timestamp: &DateTime<Utc>) -> bool

Check if this entity was valid at a specific point in time.

Returns true if:

No temporal bounds are set (atemporal entity)
The timestamp falls within [valid_from, valid_until]

§Example

use anno_core::{Entity, EntityType};
use chrono::{TimeZone, Utc};

let mut entity = Entity::new("CEO of Microsoft", EntityType::Person, 0, 16, 0.9);
entity.set_valid_from(Utc.with_ymd_and_hms(2008, 1, 1, 0, 0, 0).unwrap());
entity.set_valid_until(Utc.with_ymd_and_hms(2023, 12, 31, 0, 0, 0).unwrap());

let query_2015 = Utc.with_ymd_and_hms(2015, 6, 1, 0, 0, 0).unwrap();
let query_2005 = Utc.with_ymd_and_hms(2005, 6, 1, 0, 0, 0).unwrap();

assert!(entity.valid_at(&query_2015));
assert!(!entity.valid_at(&query_2005));
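
The interval rule itself is easy to state without chrono. The sketch below uses plain integers (years here, but Unix seconds would work the same way) and a hypothetical free function. Treating a missing bound as unbounded and both ends as inclusive is an assumption drawn from the description above, not a confirmed detail of the implementation.

```rust
/// Hypothetical sketch of the interval check using integer timestamps
/// in place of chrono's DateTime<Utc>.
fn valid_at(valid_from: Option<i64>, valid_until: Option<i64>, ts: i64) -> bool {
    // A missing bound places no constraint on that side of the interval.
    let after_start = valid_from.map_or(true, |from| ts >= from);
    let before_end = valid_until.map_or(true, |until| ts <= until);
    after_start && before_end
}

fn main() {
    // Bounded on both sides.
    assert!(valid_at(Some(2008), Some(2023), 2015));
    assert!(!valid_at(Some(2008), Some(2023), 2005));
    // Open-ended: valid from 2014 onward.
    assert!(valid_at(Some(2014), None, 2030));
    // No bounds at all: always valid.
    assert!(valid_at(None, None, 1900));
}
```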

pub fn is_currently_valid(&self) -> bool

Check if this entity is currently valid (at the current time).

pub fn set_viewport(&mut self, viewport: EntityViewport)

Set the viewport context for this entity.

§Example

use anno_core::{Entity, EntityType, EntityViewport};

let mut entity = Entity::new("Marie Curie", EntityType::Person, 0, 11, 0.9);
entity.set_viewport(EntityViewport::Academic);
assert!(entity.has_viewport());

pub fn has_viewport(&self) -> bool

Check if this entity has a viewport context.

pub fn viewport_or_default(&self) -> EntityViewport

Get the viewport, defaulting to General if not set.

pub fn matches_viewport(&self, query_viewport: &EntityViewport) -> bool

Check if this entity matches a viewport context.

Returns true if:

The entity has no viewport (matches any)
The entity’s viewport matches the query
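
These two rules amount to "an unset viewport is a wildcard". A minimal sketch, with a hypothetical Viewport enum standing in for EntityViewport (whose full set of variants is not listed here):

```rust
/// Hypothetical stand-in for EntityViewport; the variant list is illustrative only.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Viewport {
    General,
    Academic,
    Personal,
}

/// An entity without a viewport matches any query; otherwise the viewports must be equal.
fn matches_viewport(entity_viewport: Option<&Viewport>, query: &Viewport) -> bool {
    entity_viewport.map_or(true, |v| v == query)
}

fn main() {
    assert!(matches_viewport(None, &Viewport::Academic));
    assert!(matches_viewport(Some(&Viewport::Academic), &Viewport::Academic));
    assert!(!matches_viewport(Some(&Viewport::Personal), &Viewport::Academic));
}
```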

pub fn builder( text: impl Into<String>, entity_type: EntityType, ) -> EntityBuilder

Create a builder for fluent entity construction.

pub fn validate(&self, source_text: &str) -> Vec<ValidationIssue>

Validate this entity against the source text.

Returns a list of validation issues. Empty list means the entity is valid.

§Checks Performed

Span bounds: start < end, both within text length
Text match: text matches the span in source
Confidence range: confidence in [0.0, 1.0]
Type consistency: Custom types have non-empty names
Discontinuous consistency: If present, segments are valid

§Example

use anno_core::{Entity, EntityType};

let text = "John works at Apple";
let entity = Entity::new("John", EntityType::Person, 0, 4, 0.95);

let issues = entity.validate(text);
assert!(issues.is_empty(), "Entity should be valid");

// Invalid entity: span doesn't match text
let bad = Entity::new("Jane", EntityType::Person, 0, 4, 0.95);
let issues = bad.validate(text);
assert!(!issues.is_empty(), "Entity text doesn't match span");
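
To make the first three checks concrete, here is a standalone sketch that applies them to loose values instead of an Entity. The Issue enum and the check function are hypothetical; the real ValidationIssue type may have different variants and carry more detail.

```rust
/// Hypothetical issue type; not the crate's ValidationIssue.
#[derive(Debug, PartialEq)]
enum Issue {
    InvalidSpan,
    OutOfBounds,
    TextMismatch,
    ConfidenceOutOfRange,
}

/// Apply span-bounds, text-match, and confidence-range checks.
fn check(text: &str, start: usize, end: usize, confidence: f64, source: &str) -> Vec<Issue> {
    let mut issues = Vec::new();
    let char_count = source.chars().count();
    if start >= end {
        issues.push(Issue::InvalidSpan);
    } else if end > char_count {
        issues.push(Issue::OutOfBounds);
    } else {
        // Only compare text when the span is usable.
        let actual: String = source.chars().skip(start).take(end - start).collect();
        if actual != text {
            issues.push(Issue::TextMismatch);
        }
    }
    if !(0.0..=1.0).contains(&confidence) {
        issues.push(Issue::ConfidenceOutOfRange);
    }
    issues
}

fn main() {
    let source = "John works at Apple";
    assert!(check("John", 0, 4, 0.95, source).is_empty());
    assert_eq!(check("Jane", 0, 4, 0.95, source), vec![Issue::TextMismatch]);
    assert_eq!(check("Apple", 14, 40, 0.95, source), vec![Issue::OutOfBounds]);
}
```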

pub fn validate_with_len( &self, source_text: &str, text_char_count: usize, ) -> Vec<ValidationIssue>

Validate entity with pre-computed text length (performance optimization).

Use this when validating multiple entities from the same text to avoid recalculating text.chars().count() for each entity.

§Arguments

source_text - The original text
text_char_count - Pre-computed character count (from text.chars().count())

§Returns

Vector of validation issues (empty if valid)

pub fn is_valid(&self, source_text: &str) -> bool

Check if this entity is valid against the source text.

Convenience method that returns true if validate() returns an empty list.

pub fn validate_batch( entities: &[Entity], source_text: &str, ) -> HashMap<usize, Vec<ValidationIssue>>

Validate a batch of entities efficiently.

Returns a map of entity index -> validation issues. Only entities with issues are included.

§Example

use anno_core::{Entity, EntityType};

let text = "John and Jane work at Apple";
let entities = vec![
    Entity::new("John", EntityType::Person, 0, 4, 0.95),
    Entity::new("Wrong", EntityType::Person, 9, 13, 0.8),
];

let issues = Entity::validate_batch(&entities, text);
assert!(issues.is_empty() || issues.contains_key(&1)); // Second entity might fail
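
The shape of the return value (index to issues, with clean entities omitted) can be sketched as follows. Everything here is hypothetical: mismatches works on (text, start, end) triples and reports plain strings, whereas the real method takes entities and returns ValidationIssue values.

```rust
use std::collections::HashMap;

/// Hypothetical batch check over (text, start, end) triples against one source text.
/// Only entries with at least one issue appear in the returned map.
fn mismatches(entities: &[(&str, usize, usize)], source: &str) -> HashMap<usize, Vec<String>> {
    let chars: Vec<char> = source.chars().collect(); // decoded once for the whole batch
    let mut report = HashMap::new();
    for (idx, &(text, start, end)) in entities.iter().enumerate() {
        let mut issues = Vec::new();
        if start >= end || end > chars.len() {
            issues.push(format!("span {start}..{end} is out of bounds"));
        } else {
            let actual: String = chars[start..end].iter().collect();
            if actual != text {
                issues.push(format!("expected {text:?}, found {actual:?}"));
            }
        }
        if !issues.is_empty() {
            report.insert(idx, issues);
        }
    }
    report
}

fn main() {
    let text = "John and Jane work at Apple";
    let report = mismatches(&[("John", 0, 4), ("Wrong", 9, 13)], text);
    // The first entry is clean, so it has no key in the map.
    assert!(!report.contains_key(&0));
    // Characters 9..13 of the source are "Jane", not "Wrong".
    assert_eq!(report[&1], vec![r#"expected "Wrong", found "Jane""#.to_string()]);
}
```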

Trait Implementations§

impl Clone for Entity

fn clone(&self) -> Entity

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Entity

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<'de> Deserialize<'de> for Entity

fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl From<&Entity> for Mention

fn from(entity: &Entity) -> Self

Converts to this type from the input type.

impl From<&Entity> for Signal<Location>

Convert an Entity to a Signal<Location>, mapping Entity’s f64 confidence to Signal’s f32 (clamped to [0,1]).

Uses Location::Text for the span and preserves normalized, provenance, and hierarchical_confidence fields. Discontinuous and visual spans are not handled; use GroundedDocument::from_entities for full fidelity.