pub struct Entity {
    pub text: String,
    pub entity_type: EntityType,
    pub start: usize,
    pub end: usize,
    pub confidence: Confidence,
    pub normalized: Option<String>,
    pub provenance: Option<Provenance>,
    pub kb_id: Option<String>,
    pub canonical_id: Option<CanonicalId>,
    pub hierarchical_confidence: Option<HierarchicalConfidence>,
    pub visual_span: Option<Span>,
    pub discontinuous_span: Option<DiscontinuousSpan>,
    pub valid_from: Option<DateTime<Utc>>,
    pub valid_until: Option<DateTime<Utc>>,
    pub viewport: Option<EntityViewport>,
    pub phi_features: Option<PhiFeatures>,
    pub mention_type: Option<MentionType>,
}
A recognized named entity or relation trigger.
§Entity Structure
"Contact John at john@example.com on Jan 15"
         ^^^^    ^^^^^^^^^^^^^^^^    ^^^^^^
         PER     EMAIL               DATE
         |       |                   |
         Named   Contact             Temporal
         (ML)    (Pattern)           (Pattern)
§Core Fields (Stable API)
- text, entity_type, start, end, confidence: always present
- normalized, provenance: commonly used optional fields
- kb_id, canonical_id: knowledge graph and coreference support
§Extended Fields (Research/Experimental)
The following fields support advanced research applications but may evolve:
| Field | Purpose | Status |
|---|---|---|
| visual_span | Multi-modal (ColPali) extraction | Experimental |
| discontinuous_span | W2NER non-contiguous entities | Experimental |
| valid_from, valid_until | Temporal knowledge graphs | Research |
| viewport | Multi-faceted entity representation | Research |
| hierarchical_confidence | Coarse-to-fine NER | Experimental |
These fields are annotated with #[serde(skip_serializing_if = "Option::is_none")], so they
are omitted from serialized output when unset.
§Knowledge Graph Support
For GraphRAG and coreference resolution, entities support:
- kb_id: External knowledge base identifier (e.g., Wikidata Q-ID)
- canonical_id: Local coreference cluster ID (links “John” and “he”)
§Normalization
Entities can have a normalized form for downstream processing:
- Dates: “Jan 15” → “2024-01-15” (ISO 8601)
- Money: “$1.5M” → “1500000 USD”
- Locations: “NYC” → “New York City”
Fields§
text: String
Entity text (surface form as it appears in source)
entity_type: EntityType
Entity type classification
start: usize
Start position (character offset, NOT byte offset).
For Unicode text, character offsets differ from byte offsets.
Use anno::offset::bytes_to_chars to convert if needed.
end: usize
End position (character offset, exclusive).
For Unicode text, character offsets differ from byte offsets.
Use anno::offset::bytes_to_chars to convert if needed.
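The gap between character and byte offsets can be illustrated with a minimal, standard-library-only sketch. The helper name char_range_to_byte_range is hypothetical and is not part of anno; the crate's own anno::offset utilities may have different signatures and edge-case behavior.

```rust
/// Map a half-open character range to the matching byte range.
/// Returns `None` when the range is reversed or runs past the end of `text`.
/// (Hypothetical helper for illustration; not part of the anno API.)
fn char_range_to_byte_range(text: &str, start: usize, end: usize) -> Option<(usize, usize)> {
    // Byte index of every character boundary, plus the end of the string.
    let boundaries: Vec<usize> = text
        .char_indices()
        .map(|(byte_idx, _)| byte_idx)
        .chain(std::iter::once(text.len()))
        .collect();
    if start > end || end >= boundaries.len() {
        return None;
    }
    Some((boundaries[start], boundaries[end]))
}
```

For "Hello, 日本!", the character range 7..9 maps to the byte range 7..13, because each of the two characters occupies three bytes in UTF-8.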
confidence: Confidence
Confidence score (0.0-1.0, calibrated).
Construction via Confidence::new clamps to [0.0, 1.0].
Use .value() or Into<f64> to extract the raw score.
normalized: Option<String>
Normalized/canonical form (e.g., “Jan 15” → “2024-01-15”)
provenance: Option<Provenance>
Provenance: which backend/method produced this entity
kb_id: Option<String>
External knowledge base ID (e.g., “Q7186” for Marie Curie in Wikidata). Used for entity linking and GraphRAG applications.
canonical_id: Option<CanonicalId>
Local coreference cluster ID.
Multiple mentions with the same canonical_id refer to the same entity.
Example: “Marie Curie” and “she” might share canonical_id = CanonicalId(42).
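As a rough sketch of how shared cluster IDs can be consumed downstream, the snippet below groups mentions by ID. The Mention struct and the cluster_mentions function are hypothetical stand-ins with a plain u64 ID; they are not the crate's Entity or CanonicalId types.

```rust
use std::collections::HashMap;

/// Hypothetical, pared-down mention: surface text plus an optional cluster ID.
struct Mention {
    text: &'static str,
    canonical_id: Option<u64>,
}

/// Group mention texts by cluster ID; mentions without an ID are skipped.
fn cluster_mentions(mentions: &[Mention]) -> HashMap<u64, Vec<&'static str>> {
    let mut clusters: HashMap<u64, Vec<&'static str>> = HashMap::new();
    for mention in mentions {
        if let Some(id) = mention.canonical_id {
            clusters.entry(id).or_default().push(mention.text);
        }
    }
    clusters
}
```

With “Marie Curie” and “she” both tagged 42, the map holds a single cluster containing both mentions in input order.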
hierarchical_confidence: Option<HierarchicalConfidence>
Hierarchical confidence (coarse-to-fine). Provides linkage, type, and boundary scores separately.
visual_span: Option<Span>
Visual span for multi-modal (ColPali) extraction. When set, provides bounding box location in addition to text offsets.
discontinuous_span: Option<DiscontinuousSpan>
Discontinuous span for non-contiguous entity mentions (W2NER support).
When set, overrides start/end for length calculations.
Example: “New York and LA [airports]” where “airports” modifies both.
valid_from: Option<DateTime<Utc>>
Start of temporal validity interval for this entity assertion.
Entities are facts that may change over time:
- “Satya Nadella is CEO of Microsoft” is valid from [2014, present]
- “Steve Ballmer was CEO of Microsoft” was valid from [2000, 2014]
When None, the entity is either:
- Currently valid (no known end date)
- Atemporal (timeless fact like “Paris is in France”)
§Example
use anno_core::{Entity, EntityType};
use chrono::{TimeZone, Utc};
let mut entity = Entity::new("CEO of Microsoft", EntityType::Person, 0, 16, 0.9);
entity.valid_from = Some(Utc.with_ymd_and_hms(2008, 10, 1, 0, 0, 0).unwrap());
valid_until: Option<DateTime<Utc>>
End of temporal validity interval for this entity assertion.
When None and valid_from is set, the fact is currently valid.
When both are None, the entity is atemporal.
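The None semantics of the two bounds can be sketched as a small predicate. This is an illustration under stated assumptions: it uses plain i64 timestamps instead of DateTime<Utc>, treats both bounds as inclusive, and treats a missing bound as open-ended. The standalone function is hypothetical and is not the crate's valid_at implementation.

```rust
/// Inclusive validity check over optional bounds.
/// A missing bound is treated as open-ended (an assumption of this sketch),
/// so an entity with neither bound is valid at every timestamp.
fn valid_at(valid_from: Option<i64>, valid_until: Option<i64>, timestamp: i64) -> bool {
    let after_start = valid_from.map_or(true, |from| from <= timestamp);
    let before_end = valid_until.map_or(true, |until| timestamp <= until);
    after_start && before_end
}
```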
viewport: Option<EntityViewport>
Viewport context for multi-faceted entity representation.
The same real-world entity can have different “faces” in different contexts:
- “Marie Curie” in an academic context: professor, researcher
- “Marie Curie” in a scientific context: physicist, chemist
- “Marie Curie” in a personal context: mother, educator
This enables “holographic” entity projection at query time: given a query context, project the entity manifold to the relevant viewport.
§Example
use anno_core::{Entity, EntityType, EntityViewport};
let mut entity = Entity::new("Marie Curie", EntityType::Person, 0, 11, 0.9);
entity.viewport = Some(EntityViewport::Academic);
phi_features: Option<PhiFeatures>
Phi-features (person, number, gender) for morphological agreement.
Used for coreference constraints and zero pronoun resolution. In pro-drop languages (Arabic, Spanish, Japanese), verb morphology encodes subject features even when the pronoun is dropped.
mention_type: Option<MentionType>
Mention type classification (Proper, Nominal, Pronominal, Zero).
Classifies the referring expression type for coreference resolution. Follows the Accessibility Hierarchy (Ariel 1990): Proper > Nominal > Pronominal > Zero.
Implementations§
impl Entity
pub fn new(
    text: impl Into<String>,
    entity_type: EntityType,
    start: usize,
    end: usize,
    confidence: impl Into<Confidence>,
) -> Self
Create a new entity.
use anno_core::{Entity, EntityType};
let e = Entity::new("Berlin", EntityType::Location, 10, 16, 0.95);
assert_eq!(e.text, "Berlin");
assert_eq!(e.entity_type, EntityType::Location);
assert_eq!((e.start, e.end), (10, 16));
pub fn with_provenance(
    text: impl Into<String>,
    entity_type: EntityType,
    start: usize,
    end: usize,
    confidence: impl Into<Confidence>,
    provenance: Provenance,
) -> Self
Create a new entity with provenance information.
pub fn with_hierarchical_confidence(
    text: impl Into<String>,
    entity_type: EntityType,
    start: usize,
    end: usize,
    confidence: HierarchicalConfidence,
) -> Self
Create an entity with hierarchical confidence scores.
pub fn from_visual(
    text: impl Into<String>,
    entity_type: EntityType,
    bbox: Span,
    confidence: impl Into<Confidence>,
) -> Self
Create an entity from a visual bounding box (ColPali multi-modal).
pub fn with_type(
    text: impl Into<String>,
    entity_type: EntityType,
    start: usize,
    end: usize,
) -> Self
Create an entity with default confidence (1.0).
pub fn link_to_kb(&mut self, kb_id: impl Into<String>)
Link this entity to an external knowledge base.
§Examples
use anno_core::{Entity, EntityType};
let mut e = Entity::new("Marie Curie", EntityType::Person, 0, 11, 0.95);
e.link_to_kb("Q7186");
assert_eq!(e.kb_id.as_deref(), Some("Q7186"));
pub fn set_canonical(&mut self, canonical_id: impl Into<CanonicalId>)
Assign this entity to a coreference cluster.
Entities with the same canonical_id refer to the same real-world entity.
pub fn with_canonical_id(self, canonical_id: impl Into<CanonicalId>) -> Self
Builder-style method to set canonical ID.
§Example
use anno_core::{CanonicalId, Entity, EntityType};
let entity = Entity::new("John", EntityType::Person, 0, 4, 0.9)
    .with_canonical_id(42);
assert_eq!(entity.canonical_id, Some(CanonicalId::new(42)));
pub fn has_coreference(&self) -> bool
Check if this entity has coreference information.
pub fn is_discontinuous(&self) -> bool
Check if this entity has a discontinuous span.
Discontinuous entities span non-contiguous text regions. Example: “New York and LA airports” contains “New York airports” as a discontinuous entity.
pub fn discontinuous_segments(&self) -> Option<Vec<Range<usize>>>
Get the discontinuous segments if present.
Returns None if this is a contiguous entity.
pub fn set_discontinuous_span(&mut self, span: DiscontinuousSpan)
Set a discontinuous span for this entity.
This is used by W2NER and similar models that detect non-contiguous mentions.
pub fn total_len(&self) -> usize
Get the total length covered by this entity, in characters.
- Contiguous: end - start
- Discontinuous: sum of segment lengths
This is intentionally consistent: all offsets in anno::core entity spans
are character offsets (Unicode scalar values), not byte offsets.
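The two cases can be sketched as follows. The free function and its parameters are assumptions made for illustration (the real method reads these values from the entity itself), and the segment offsets in the usage note are computed by hand for the “New York and LA airports” example.

```rust
use std::ops::Range;

/// Total covered length in characters.
/// `segments` stands in for an optional discontinuous span (hypothetical shape).
fn total_len(start: usize, end: usize, segments: Option<&[Range<usize>]>) -> usize {
    match segments {
        // Discontinuous: sum the length of each segment.
        Some(segs) => segs.iter().map(|r| r.end.saturating_sub(r.start)).sum(),
        // Contiguous: plain end - start.
        None => end.saturating_sub(start),
    }
}
```

For “New York airports” inside “New York and LA airports”, the segments 0..8 and 16..24 give a total length of 16, while the enclosing contiguous span 0..24 would give 24.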
pub fn set_normalized(&mut self, normalized: impl Into<String>)
Set the normalized form for this entity.
§Examples
use anno_core::{Entity, EntityType};
let mut entity = Entity::new("Jan 15", EntityType::Date, 0, 6, 0.95);
entity.set_normalized("2024-01-15");
assert_eq!(entity.normalized.as_deref(), Some("2024-01-15"));
pub fn normalized_or_text(&self) -> &str
Get the normalized form, or the original text if not normalized.
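A fallback of this kind is typically a one-liner over Option. The sketch below uses a hypothetical Extracted struct with only the two relevant fields; it is not the crate's implementation.

```rust
/// Hypothetical stand-in holding only the two fields involved in the fallback.
struct Extracted {
    text: String,
    normalized: Option<String>,
}

impl Extracted {
    /// Prefer the normalized form; fall back to the surface text.
    fn normalized_or_text(&self) -> &str {
        self.normalized.as_deref().unwrap_or(self.text.as_str())
    }
}
```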
pub fn method(&self) -> ExtractionMethod
Get the extraction method, if known.
pub fn category(&self) -> EntityCategory
Get the entity category.
pub fn is_structured(&self) -> bool
Returns true if this entity was detected via patterns (not ML).
pub fn overlap_ratio(&self, other: &Entity) -> f64
Calculate overlap ratio (IoU) with another entity.
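For reference, intersection-over-union on half-open character spans can be sketched as below. The tuple representation, the function name span_iou, and the choice to return 0.0 when both spans are empty are assumptions; the crate may handle empty or discontinuous spans differently.

```rust
/// Intersection-over-union of two half-open spans given as (start, end).
/// Returns 0.0 when the union is empty (an assumption of this sketch).
fn span_iou(a: (usize, usize), b: (usize, usize)) -> f64 {
    let len_a = a.1.saturating_sub(a.0);
    let len_b = b.1.saturating_sub(b.0);
    // Overlap is the distance between the later start and the earlier end.
    let intersection = a.1.min(b.1).saturating_sub(a.0.max(b.0));
    let union = len_a + len_b - intersection;
    if union == 0 {
        0.0
    } else {
        intersection as f64 / union as f64
    }
}
```

Spans 0..10 and 5..15 share 5 characters out of a union of 15, so the ratio is 1/3; identical spans give 1.0 and adjacent spans give 0.0.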
pub fn set_hierarchical_confidence(
    &mut self,
    confidence: HierarchicalConfidence,
)
Set hierarchical confidence scores.
pub fn linkage_confidence(&self) -> f32
Get the linkage confidence (coarse filter score).
pub fn type_confidence(&self) -> f32
Get the type classification confidence.
pub fn boundary_confidence(&self) -> f32
Get the boundary confidence.
pub fn set_visual_span(&mut self, span: Span)
Set visual span for multi-modal extraction.
Create a unified TextSpan with both byte and char offsets.
This is useful when you need to work with both offset systems.
The text parameter must be the original source text from which
this entity was extracted.
§Arguments
source_text - The original text (needed to compute byte offsets)
§Returns
A TextSpan with both byte and char offsets.
§Note
This method requires the offset conversion utilities from the anno crate.
Use anno::offset::char_to_byte_offsets() directly for now.
§Example
use anno_core::{Entity, EntityType};
// `text` is the original source text and `entity` is an Entity extracted from it.
let (byte_start, byte_end) = char_to_byte_offsets(text, entity.start, entity.end);
pub fn extract_text(&self, source_text: &str) -> String
Safely extract text from source using character offsets.
Entity stores character offsets, not byte offsets. This method correctly extracts text by iterating over characters.
§Arguments
source_text - The original text from which this entity was extracted
§Returns
The extracted text, or empty string if offsets are invalid
§Example
use anno_core::{Entity, EntityType};
let text = "Hello, 日本!";
let entity = Entity::new("日本", EntityType::Location, 7, 9, 0.95);
assert_eq!(entity.extract_text(text), "日本");
pub fn extract_text_with_len(
    &self,
    source_text: &str,
    text_char_count: usize,
) -> String
Extract text with pre-computed text length (performance optimization).
Use this when validating/clamping multiple entities from the same text
to avoid recalculating text.chars().count() for each entity.
§Arguments
source_text - The original text
text_char_count - Pre-computed character count (from text.chars().count())
§Returns
The extracted text, or empty string if offsets are invalid
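The idea of reusing a pre-computed character count can be sketched with the standard library alone. The function extract_chars is hypothetical, and treating start >= end as invalid is an assumption; the crate's own bounds handling (for example clamping) may differ.

```rust
/// Extract the characters in `start..end`, or an empty string when the
/// offsets are invalid. `char_count` is computed once by the caller so it
/// does not have to be recomputed for every entity from the same text.
fn extract_chars(source: &str, start: usize, end: usize, char_count: usize) -> String {
    if start >= end || end > char_count {
        return String::new();
    }
    // Iterate over characters, not bytes, so multi-byte text is handled correctly.
    source.chars().skip(start).take(end - start).collect()
}
```

A caller would compute source.chars().count() once and pass the same value for every span taken from that text.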
pub fn set_valid_from(&mut self, dt: DateTime<Utc>)
Set the temporal validity start for this entity assertion.
§Example
use anno_core::{Entity, EntityType};
use chrono::{TimeZone, Utc};
let mut entity = Entity::new("CEO", EntityType::Person, 0, 3, 0.9);
entity.set_valid_from(Utc.with_ymd_and_hms(2008, 10, 1, 0, 0, 0).unwrap());
assert!(entity.is_temporal());
pub fn set_valid_until(&mut self, dt: DateTime<Utc>)
Set the temporal validity end for this entity assertion.
pub fn set_temporal_range(&mut self, from: DateTime<Utc>, until: DateTime<Utc>)
Set both temporal bounds at once.
pub fn is_temporal(&self) -> bool
Check if this entity has temporal validity information.
pub fn valid_at(&self, timestamp: &DateTime<Utc>) -> bool
Check if this entity was valid at a specific point in time.
Returns true if:
- No temporal bounds are set (atemporal entity)
- The timestamp falls within [valid_from, valid_until]
§Example
use anno_core::{Entity, EntityType};
use chrono::{TimeZone, Utc};
let mut entity = Entity::new("CEO of Microsoft", EntityType::Person, 0, 16, 0.9);
entity.set_valid_from(Utc.with_ymd_and_hms(2008, 1, 1, 0, 0, 0).unwrap());
entity.set_valid_until(Utc.with_ymd_and_hms(2023, 12, 31, 0, 0, 0).unwrap());
let query_2015 = Utc.with_ymd_and_hms(2015, 6, 1, 0, 0, 0).unwrap();
let query_2005 = Utc.with_ymd_and_hms(2005, 6, 1, 0, 0, 0).unwrap();
assert!(entity.valid_at(&query_2015));
assert!(!entity.valid_at(&query_2005));
pub fn is_currently_valid(&self) -> bool
Check if this entity is currently valid (at the current time).
pub fn set_viewport(&mut self, viewport: EntityViewport)
Set the viewport context for this entity.
§Example
use anno_core::{Entity, EntityType, EntityViewport};
let mut entity = Entity::new("Marie Curie", EntityType::Person, 0, 11, 0.9);
entity.set_viewport(EntityViewport::Academic);
assert!(entity.has_viewport());
pub fn has_viewport(&self) -> bool
Check if this entity has a viewport context.
pub fn viewport_or_default(&self) -> EntityViewport
Get the viewport, defaulting to General if not set.
pub fn matches_viewport(&self, query_viewport: &EntityViewport) -> bool
Check if this entity matches a viewport context.
Returns true if:
- The entity has no viewport (matches any)
- The entity’s viewport matches the query
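The matching rule above amounts to treating a missing viewport as a wildcard. A minimal sketch, assuming a hypothetical two-variant Viewport enum in place of EntityViewport (whose full variant list is not shown here):

```rust
/// Hypothetical stand-in for EntityViewport with two illustrative variants.
#[derive(Debug, Clone, PartialEq)]
enum Viewport {
    Academic,
    General,
}

/// An entity without a viewport matches any query; otherwise the
/// viewports must be equal.
fn viewport_matches(entity_viewport: Option<&Viewport>, query: &Viewport) -> bool {
    entity_viewport.map_or(true, |viewport| viewport == query)
}
```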
pub fn builder(
    text: impl Into<String>,
    entity_type: EntityType,
) -> EntityBuilder
Create a builder for fluent entity construction.
pub fn validate(&self, source_text: &str) -> Vec<ValidationIssue>
Validate this entity against the source text.
Returns a list of validation issues. Empty list means the entity is valid.
§Checks Performed
- Span bounds: start < end, both within text length
- Text match: text matches the span in source
- Confidence range: confidence in [0.0, 1.0]
- Type consistency: Custom types have non-empty names
- Discontinuous consistency: If present, segments are valid
§Example
use anno_core::{Entity, EntityType};
let text = "John works at Apple";
let entity = Entity::new("John", EntityType::Person, 0, 4, 0.95);
let issues = entity.validate(text);
assert!(issues.is_empty(), "Entity should be valid");
// Invalid entity: span doesn't match text
let bad = Entity::new("Jane", EntityType::Person, 0, 4, 0.95);
let issues = bad.validate(text);
assert!(!issues.is_empty(), "Entity text doesn't match span");
pub fn validate_with_len(
    &self,
    source_text: &str,
    text_char_count: usize,
) -> Vec<ValidationIssue>
Validate entity with pre-computed text length (performance optimization).
Use this when validating multiple entities from the same text to avoid
recalculating text.chars().count() for each entity.
§Arguments
source_text - The original text
text_char_count - Pre-computed character count (from text.chars().count())
§Returns
Vector of validation issues (empty if valid)
pub fn is_valid(&self, source_text: &str) -> bool
Check if this entity is valid against the source text.
Convenience method that returns true if validate() returns empty.
pub fn validate_batch(
    entities: &[Entity],
    source_text: &str,
) -> HashMap<usize, Vec<ValidationIssue>>
Validate a batch of entities efficiently.
Returns a map of entity index -> validation issues. Only entities with issues are included.
§Example
use anno_core::{Entity, EntityType};
let text = "John and Jane work at Apple";
let entities = vec![
    Entity::new("John", EntityType::Person, 0, 4, 0.95),
    Entity::new("Wrong", EntityType::Person, 9, 13, 0.8),
];
let issues = Entity::validate_batch(&entities, text);
assert!(issues.is_empty() || issues.contains_key(&1)); // Second entity might fail
Trait Implementations§
impl<'de> Deserialize<'de> for Entity
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where
    __D: Deserializer<'de>,
impl From<&Entity> for Signal<Location>
Convert an Entity to a Signal<Location>, mapping Entity’s f64 confidence
to Signal’s f32 (clamped to [0,1]).
Uses Location::Text for the span and preserves normalized, provenance,
and hierarchical_confidence fields. Discontinuous and visual spans are not
handled; use GroundedDocument::from_entities for full fidelity.
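The confidence mapping described for this conversion can be sketched in isolation. The function name is hypothetical, and the order of operations (narrow to f32, then clamp) is an assumption; only the f64-to-f32 mapping and the [0,1] clamp come from the description above.

```rust
/// Narrow an f64 score to f32 and clamp it to [0.0, 1.0].
/// (Hypothetical helper; a NaN input stays NaN under `f32::clamp`.)
fn to_f32_confidence(score: f64) -> f32 {
    (score as f32).clamp(0.0, 1.0)
}
```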