pub struct Entity {
pub text: String,
pub entity_type: EntityType,
pub start: usize,
pub end: usize,
pub confidence: f64,
pub normalized: Option<String>,
pub provenance: Option<Provenance>,
pub kb_id: Option<String>,
pub canonical_id: Option<CanonicalId>,
pub hierarchical_confidence: Option<HierarchicalConfidence>,
pub visual_span: Option<Span>,
pub discontinuous_span: Option<DiscontinuousSpan>,
pub valid_from: Option<DateTime<Utc>>,
pub valid_until: Option<DateTime<Utc>>,
pub viewport: Option<EntityViewport>,
}
A recognized named entity or relation trigger.
§Entity Structure
```text
"Contact John at john@example.com on Jan 15"
         ^^^^    ^^^^^^^^^^^^^^^^    ^^^^^^
         PER     EMAIL               DATE
          |             |              |
        Named        Contact       Temporal
        (ML)        (Pattern)      (Pattern)
```
§Core Fields (Stable API)
- text, entity_type, start, end, confidence: always present
- normalized, provenance: commonly used optional fields
- kb_id, canonical_id: knowledge graph and coreference support
§Extended Fields (Research/Experimental)
The following fields support advanced research applications but may evolve:
| Field | Purpose | Status |
|---|---|---|
| visual_span | Multi-modal (ColPali) extraction | Experimental |
| discontinuous_span | W2NER non-contiguous entities | Experimental |
| valid_from, valid_until | Temporal knowledge graphs | Research |
| viewport | Multi-faceted entity representation | Research |
| hierarchical_confidence | Coarse-to-fine NER | Experimental |
These fields are marked #[serde(skip_serializing_if = "Option::is_none")], so they
add no serialization overhead when unused: a None value is omitted from the serialized output.
§Knowledge Graph Support
For GraphRAG and coreference resolution, entities support:
- kb_id: External knowledge base identifier (e.g., Wikidata Q-ID)
- canonical_id: Local coreference cluster ID (links “John” and “he”)
§Normalization
Entities can have a normalized form for downstream processing:
- Dates: “Jan 15” → “2024-01-15” (ISO 8601)
- Money: “$1.5M” → “1500000 USD”
- Locations: “NYC” → “New York City”
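To make the money case concrete, here is a minimal, hypothetical normalizer for dollar amounts with an optional K/M/B suffix. The function name normalize_usd and the supported suffixes are assumptions for this sketch; it is not the crate's normalizer, and real normalization also has to cover other currencies, locales, and dates.

```rust
/// Hypothetical sketch: normalize "$1.5M"-style amounts to "<integer> USD".
/// Returns None for anything that does not look like a dollar amount.
fn normalize_usd(surface: &str) -> Option<String> {
    // Require a leading dollar sign (assumption: USD only).
    let rest = surface.trim().strip_prefix('$')?;
    // Split off an optional magnitude suffix (K, M, B; case-insensitive).
    // The suffixes are ASCII, so dropping the last byte is safe.
    let (number, multiplier) = match rest.chars().last()? {
        'k' | 'K' => (&rest[..rest.len() - 1], 1e3),
        'm' | 'M' => (&rest[..rest.len() - 1], 1e6),
        'b' | 'B' => (&rest[..rest.len() - 1], 1e9),
        _ => (rest, 1.0),
    };
    // Drop thousands separators such as "1,200".
    let value: f64 = number.replace(',', "").parse().ok()?;
    // Reject NaN, infinity and negative amounts.
    if !value.is_finite() || value < 0.0 {
        return None;
    }
    // Round to whole dollars to avoid float artifacts in the output.
    Some(format!("{} USD", (value * multiplier).round() as u64))
}
```

With this sketch, normalize_usd("$1.5M") yields Some("1500000 USD"), matching the example above.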
§Fields
text: String
Entity text (surface form as it appears in source).
entity_type: EntityType
Entity type classification.
start: usize
Start position (character offset, NOT byte offset).
For Unicode text, character offsets differ from byte offsets.
Use anno::offset::bytes_to_chars to convert if needed.
end: usize
End position (character offset, exclusive).
For Unicode text, character offsets differ from byte offsets.
Use anno::offset::bytes_to_chars to convert if needed.
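Because start and end are character offsets, byte offsets coming from a regex or tokenizer must be converted first. The sketch below does this with the standard library only; byte_to_char_offset is a hypothetical helper, not anno::offset::bytes_to_chars, whose exact signature is not documented on this page.

```rust
/// Hypothetical helper: convert a byte offset into a character offset.
/// Returns None if `byte_offset` is past the end or not on a char boundary.
fn byte_to_char_offset(text: &str, byte_offset: usize) -> Option<usize> {
    // is_char_boundary is false for offsets beyond text.len().
    if !text.is_char_boundary(byte_offset) {
        return None;
    }
    // Count the Unicode scalar values that precede the byte offset.
    Some(text[..byte_offset].chars().count())
}
```

For example, in "Hello, 日本!" the byte span 7..13 of "日本" becomes the character span 7..9.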
confidence: f64
Confidence score (0.0-1.0, calibrated).
normalized: Option<String>
Normalized/canonical form (e.g., “Jan 15” → “2024-01-15”).
provenance: Option<Provenance>
Provenance: which backend/method produced this entity.
kb_id: Option<String>
External knowledge base ID (e.g., “Q7186” for Marie Curie in Wikidata). Used for entity linking and GraphRAG applications.
canonical_id: Option<CanonicalId>
Local coreference cluster ID.
Multiple mentions with the same canonical_id refer to the same entity.
Example: “Marie Curie” and “she” might share canonical_id = CanonicalId(42).
hierarchical_confidence: Option<HierarchicalConfidence>
Hierarchical confidence (coarse-to-fine). Provides linkage, type, and boundary scores separately.
visual_span: Option<Span>
Visual span for multi-modal (ColPali) extraction. When set, provides bounding box location in addition to text offsets.
discontinuous_span: Option<DiscontinuousSpan>
Discontinuous span for non-contiguous entity mentions (W2NER support).
When set, overrides start/end for length calculations.
Example: “New York and LA [airports]” where “airports” modifies both.
valid_from: Option<DateTime<Utc>>
Start of the temporal validity interval for this entity assertion.
Entities are facts that may change over time:
- “Satya Nadella is CEO of Microsoft” is valid over [2014, present]
- “Steve Ballmer was CEO of Microsoft” was valid over [2000, 2014]
When None, the entity is either:
- Currently valid (no known end date)
- Atemporal (timeless fact like “Paris is in France”)
§Example
```rust
use anno_core::{Entity, EntityType};
use chrono::{TimeZone, Utc};

let mut entity = Entity::new("CEO of Microsoft", EntityType::Person, 0, 16, 0.9);
entity.valid_from = Some(Utc.with_ymd_and_hms(2008, 10, 1, 0, 0, 0).unwrap());
```
valid_until: Option<DateTime<Utc>>
End of the temporal validity interval for this entity assertion.
When None and valid_from is set, the fact is currently valid.
When both are None, the entity is atemporal.
viewport: Option<EntityViewport>
Viewport context for multi-faceted entity representation.
The same real-world entity can have different “faces” in different contexts:
- “Marie Curie” in an academic context: professor, researcher
- “Marie Curie” in a scientific context: physicist, chemist
- “Marie Curie” in a personal context: mother, educator
This enables “holographic” entity projection at query time: given a query context, project the entity manifold to the relevant viewport.
§Example
```rust
use anno_core::{Entity, EntityType, EntityViewport};

let mut entity = Entity::new("Marie Curie", EntityType::Person, 0, 11, 0.9);
entity.viewport = Some(EntityViewport::Academic);
```
§Implementations
impl Entity
pub fn new(
    text: impl Into<String>,
    entity_type: EntityType,
    start: usize,
    end: usize,
    confidence: f64,
) -> Self
Create a new entity.
pub fn with_provenance(
    text: impl Into<String>,
    entity_type: EntityType,
    start: usize,
    end: usize,
    confidence: f64,
    provenance: Provenance,
) -> Self
Create a new entity with provenance information.
pub fn with_hierarchical_confidence(
    text: impl Into<String>,
    entity_type: EntityType,
    start: usize,
    end: usize,
    confidence: HierarchicalConfidence,
) -> Self
Create an entity with hierarchical confidence scores.
pub fn from_visual(
    text: impl Into<String>,
    entity_type: EntityType,
    bbox: Span,
    confidence: f64,
) -> Self
Create an entity from a visual bounding box (ColPali multi-modal).
pub fn with_type(
    text: impl Into<String>,
    entity_type: EntityType,
    start: usize,
    end: usize,
) -> Self
Create an entity with default confidence (1.0).
pub fn link_to_kb(&mut self, kb_id: impl Into<String>)
Link this entity to an external knowledge base entry by setting kb_id.
pub fn set_canonical(&mut self, canonical_id: impl Into<CanonicalId>)
Assign this entity to a coreference cluster.
Entities with the same canonical_id refer to the same real-world entity.
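To see how canonical_id groups mentions, here is a minimal sketch that buckets mentions by cluster ID. It uses a simplified, hypothetical Mention struct with a plain u64 ID instead of the crate's Entity and CanonicalId types.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for Entity (hypothetical; only the fields needed here).
struct Mention<'a> {
    text: &'a str,
    canonical_id: Option<u64>,
}

/// Group mentions by coreference cluster; mentions without an ID are skipped.
fn clusters<'a>(mentions: &[Mention<'a>]) -> BTreeMap<u64, Vec<&'a str>> {
    let mut out: BTreeMap<u64, Vec<&'a str>> = BTreeMap::new();
    for m in mentions {
        if let Some(id) = m.canonical_id {
            out.entry(id).or_default().push(m.text);
        }
    }
    out
}
```

With “Marie Curie” and “she” both in cluster 42, the result maps 42 to both surface forms.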
pub fn with_canonical_id(self, canonical_id: impl Into<CanonicalId>) -> Self
Builder-style method to set canonical ID.
§Example
```rust
use anno_core::{CanonicalId, Entity, EntityType};

let entity = Entity::new("John", EntityType::Person, 0, 4, 0.9)
    .with_canonical_id(42);
assert_eq!(entity.canonical_id, Some(CanonicalId::new(42)));
```
pub fn has_coreference(&self) -> bool
Check if this entity has coreference information.
pub fn is_discontinuous(&self) -> bool
Check if this entity has a discontinuous span.
Discontinuous entities span non-contiguous text regions. Example: “New York and LA airports” contains “New York airports” as a discontinuous entity.
pub fn discontinuous_segments(&self) -> Option<Vec<Range<usize>>>
Get the discontinuous segments if present.
Returns None if this is a contiguous entity.
pub fn set_discontinuous_span(&mut self, span: DiscontinuousSpan)
Set a discontinuous span for this entity.
This is used by W2NER and similar models that detect non-contiguous mentions.
pub fn total_len(&self) -> usize
Get the total length covered by this entity, in characters.
- Contiguous: end - start
- Discontinuous: sum of segment lengths
This is intentionally consistent: all offsets in anno::core entity spans
are character offsets (Unicode scalar values), not byte offsets.
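The length rule can be sketched with std::ops::Range over character offsets. Representing a discontinuous span as a list of half-open segments is an assumption, suggested by discontinuous_segments() returning Vec<Range<usize>>; the free functions below are illustrative, and joining segments with a single space is a choice made only for this sketch.

```rust
use std::ops::Range;

/// Sketch of total_len: `end - start` for contiguous spans,
/// otherwise the sum of the segment lengths (all in characters).
fn total_len(start: usize, end: usize, segments: Option<&[Range<usize>]>) -> usize {
    match segments {
        Some(segs) => segs.iter().map(|r| r.end.saturating_sub(r.start)).sum(),
        None => end.saturating_sub(start),
    }
}

/// Rebuild the surface text of a discontinuous mention by joining its segments.
/// Segments that are inverted or run past the text are skipped.
fn segments_text(source: &str, segments: &[Range<usize>]) -> String {
    let chars: Vec<char> = source.chars().collect();
    segments
        .iter()
        .filter(|r| r.start <= r.end && r.end <= chars.len())
        .map(|r| chars[r.clone()].iter().collect::<String>())
        .collect::<Vec<_>>()
        .join(" ")
}
```

For “New York and LA airports”, the segments 0..8 and 16..24 give “New York airports” with a total length of 16 characters, while the contiguous span 0..24 would report 24.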
pub fn set_normalized(&mut self, normalized: impl Into<String>)
Set the normalized form of this entity (stored in normalized).
pub fn normalized_or_text(&self) -> &str
Get the normalized form, or the original text if not normalized.
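The fallback behaviour of normalized_or_text fits in one line with Option::as_deref. The two-field struct below is a simplified, hypothetical stand-in for Entity, so treat this as a sketch of the documented semantics rather than the crate's source.

```rust
/// Simplified stand-in for Entity with just the two fields involved.
struct Surface {
    text: String,
    normalized: Option<String>,
}

impl Surface {
    /// Prefer the normalized form; fall back to the surface text.
    fn normalized_or_text(&self) -> &str {
        self.normalized.as_deref().unwrap_or(&self.text)
    }
}
```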
pub fn method(&self) -> ExtractionMethod
Get the extraction method, if known.
pub fn category(&self) -> EntityCategory
Get the entity category.
pub fn is_structured(&self) -> bool
Returns true if this entity was detected via patterns (not ML).
pub fn overlap_ratio(&self, other: &Entity) -> f64
Calculate overlap ratio (IoU) with another entity.
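Assuming overlap_ratio treats each entity as the half-open character range start..end (the source only says "IoU"), intersection over union works out as below. This standalone sketch ignores discontinuous spans, which the real method may handle differently.

```rust
/// IoU of two half-open character ranges [a_start, a_end) and [b_start, b_end).
fn span_iou(a_start: usize, a_end: usize, b_start: usize, b_end: usize) -> f64 {
    // Overlap length; saturating_sub yields 0 when the ranges are disjoint.
    let inter = a_end.min(b_end).saturating_sub(a_start.max(b_start));
    let len_a = a_end.saturating_sub(a_start);
    let len_b = b_end.saturating_sub(b_start);
    // Union = |A| + |B| - |A ∩ B|.
    let union = len_a + len_b - inter;
    if union == 0 {
        return 0.0;
    }
    inter as f64 / union as f64
}
```

Two spans 0..10 and 5..15 share 5 characters out of a union of 15, so their IoU is 1/3.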
pub fn set_hierarchical_confidence(
    &mut self,
    confidence: HierarchicalConfidence,
)
Set hierarchical confidence scores.
pub fn linkage_confidence(&self) -> f32
Get the linkage confidence (coarse filter score).
pub fn type_confidence(&self) -> f32
Get the type classification confidence.
pub fn boundary_confidence(&self) -> f32
Get the boundary confidence.
pub fn set_visual_span(&mut self, span: Span)
Set visual span for multi-modal extraction.
pub fn extract_text(&self, source_text: &str) -> String
Safely extract text from source using character offsets.
Entity stores character offsets, not byte offsets. This method correctly extracts text by iterating over characters.
§Arguments
- source_text: The original text from which this entity was extracted
§Returns
The extracted text, or empty string if offsets are invalid
§Example
```rust
use anno_core::{Entity, EntityType};

let text = "Hello, 日本!";
let entity = Entity::new("日本", EntityType::Location, 7, 9, 0.95);
assert_eq!(entity.extract_text(text), "日本");
```
pub fn extract_text_with_len(
    &self,
    source_text: &str,
    text_char_count: usize,
) -> String
Extract text with pre-computed text length (performance optimization).
Use this when validating/clamping multiple entities from the same text
to avoid recalculating text.chars().count() for each entity.
§Arguments
- source_text: The original text
- text_char_count: Pre-computed character count (from text.chars().count())
§Returns
The extracted text, or empty string if offsets are invalid
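Both extraction methods come down to slicing by character index while guarding against bad offsets. The sketch below is a hypothetical standalone function that mirrors the documented contract (empty string on invalid offsets, pre-computed character count); it is not the crate's implementation.

```rust
/// Extract source[start..end] by character offsets, given a pre-computed
/// character count. Returns an empty string if the span is inverted or
/// runs past the end of the text.
fn extract_chars(source: &str, start: usize, end: usize, char_count: usize) -> String {
    if start > end || end > char_count {
        return String::new();
    }
    // Walk the characters once: skip to `start`, then take `end - start` of them.
    source.chars().skip(start).take(end - start).collect()
}
```

Computing the count once with text.chars().count() and reusing it for every entity is the saving that extract_text_with_len and validate_with_len describe.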
pub fn set_valid_from(&mut self, dt: DateTime<Utc>)
Set the temporal validity start for this entity assertion.
§Example
```rust
use anno_core::{Entity, EntityType};
use chrono::{TimeZone, Utc};

let mut entity = Entity::new("CEO", EntityType::Person, 0, 3, 0.9);
entity.set_valid_from(Utc.with_ymd_and_hms(2008, 10, 1, 0, 0, 0).unwrap());
assert!(entity.is_temporal());
```
pub fn set_valid_until(&mut self, dt: DateTime<Utc>)
Set the temporal validity end for this entity assertion.
pub fn set_temporal_range(&mut self, from: DateTime<Utc>, until: DateTime<Utc>)
Set both temporal bounds at once.
pub fn is_temporal(&self) -> bool
Check if this entity has temporal validity information.
pub fn valid_at(&self, timestamp: &DateTime<Utc>) -> bool
Check if this entity was valid at a specific point in time.
Returns true if:
- No temporal bounds are set (atemporal entity)
- The timestamp falls within [valid_from, valid_until]
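The interval test can be sketched with plain Unix-second timestamps in place of DateTime<Utc>, so it runs without chrono. Treating a single missing bound as open-ended is an assumption; the source specifies only the no-bounds case and the closed interval [valid_from, valid_until].

```rust
/// Sketch of valid_at over Unix-second timestamps (hypothetical stand-in
/// for DateTime<Utc>). A missing bound is treated as unbounded on that side.
fn valid_at(valid_from: Option<i64>, valid_until: Option<i64>, t: i64) -> bool {
    // Both comparisons are inclusive, matching [valid_from, valid_until].
    let after_start = valid_from.map_or(true, |from| t >= from);
    let before_end = valid_until.map_or(true, |until| t <= until);
    after_start && before_end
}
```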
§Example
```rust
use anno_core::{Entity, EntityType};
use chrono::{TimeZone, Utc};

let mut entity = Entity::new("CEO of Microsoft", EntityType::Person, 0, 16, 0.9);
entity.set_valid_from(Utc.with_ymd_and_hms(2008, 1, 1, 0, 0, 0).unwrap());
entity.set_valid_until(Utc.with_ymd_and_hms(2023, 12, 31, 0, 0, 0).unwrap());

let query_2015 = Utc.with_ymd_and_hms(2015, 6, 1, 0, 0, 0).unwrap();
let query_2005 = Utc.with_ymd_and_hms(2005, 6, 1, 0, 0, 0).unwrap();
assert!(entity.valid_at(&query_2015));
assert!(!entity.valid_at(&query_2005));
```
pub fn is_currently_valid(&self) -> bool
Check if this entity is currently valid (at the current time).
pub fn set_viewport(&mut self, viewport: EntityViewport)
Set the viewport context for this entity.
pub fn has_viewport(&self) -> bool
Check if this entity has a viewport context.
pub fn viewport_or_default(&self) -> EntityViewport
Get the viewport, defaulting to General if not set.
pub fn matches_viewport(&self, query_viewport: &EntityViewport) -> bool
Check if this entity matches a viewport context.
Returns true if:
- The entity has no viewport (matches any)
- The entity’s viewport matches the query
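The matching rule, together with the General fallback of viewport_or_default, can be sketched as below. The Viewport enum is a hypothetical stand-in using names that appear on this page (General, Academic, and a Personal variant for the personal context); the real EntityViewport may have different variants and payloads.

```rust
/// Hypothetical stand-in for EntityViewport.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Viewport {
    General,
    Academic,
    Personal,
}

/// An entity with no viewport matches any query; otherwise the viewports must be equal.
fn matches_viewport(entity_viewport: Option<&Viewport>, query: &Viewport) -> bool {
    match entity_viewport {
        None => true,
        Some(v) => v == query,
    }
}

/// Mirror of viewport_or_default: fall back to General when unset.
fn viewport_or_default(entity_viewport: Option<&Viewport>) -> Viewport {
    entity_viewport.cloned().unwrap_or(Viewport::General)
}
```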
pub fn builder(
    text: impl Into<String>,
    entity_type: EntityType,
) -> EntityBuilder
Create a builder for fluent entity construction.
pub fn validate(&self, source_text: &str) -> Vec<ValidationIssue>
Validate this entity against the source text.
Returns a list of validation issues. Empty list means the entity is valid.
§Checks Performed
- Span bounds: start < end, both within text length
- Text match: text matches the span in source
- Confidence range: confidence in [0.0, 1.0]
- Type consistency: custom types have non-empty names
- Discontinuous consistency: if present, segments are valid
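A condensed sketch of the first three checks over a simplified entity follows. The Issue enum and its variants are hypothetical names for illustration, not the crate's ValidationIssue, and the type-consistency and discontinuous checks are omitted.

```rust
/// Hypothetical issue kinds (not the crate's ValidationIssue).
#[derive(Debug, PartialEq)]
enum Issue {
    InvalidSpan { start: usize, end: usize, len: usize },
    TextMismatch { expected: String, found: String },
    ConfidenceOutOfRange(f64),
}

/// Run the span-bounds, text-match and confidence checks on one entity.
fn validate(text: &str, start: usize, end: usize, confidence: f64, source: &str) -> Vec<Issue> {
    let mut issues = Vec::new();
    let len = source.chars().count();
    if start >= end || end > len {
        issues.push(Issue::InvalidSpan { start, end, len });
    } else {
        // Only compare text when the span itself is usable.
        let found: String = source.chars().skip(start).take(end - start).collect();
        if found != text {
            issues.push(Issue::TextMismatch { expected: text.to_string(), found });
        }
    }
    // NaN also fails this range check.
    if !(0.0..=1.0).contains(&confidence) {
        issues.push(Issue::ConfidenceOutOfRange(confidence));
    }
    issues
}
```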
§Example
```rust
use anno_core::{Entity, EntityType};

let text = "John works at Apple";
let entity = Entity::new("John", EntityType::Person, 0, 4, 0.95);
let issues = entity.validate(text);
assert!(issues.is_empty(), "Entity should be valid");

// Invalid entity: span doesn't match text
let bad = Entity::new("Jane", EntityType::Person, 0, 4, 0.95);
let issues = bad.validate(text);
assert!(!issues.is_empty(), "Entity text doesn't match span");
```
pub fn validate_with_len(
    &self,
    source_text: &str,
    text_char_count: usize,
) -> Vec<ValidationIssue>
Validate entity with pre-computed text length (performance optimization).
Use this when validating multiple entities from the same text to avoid
recalculating text.chars().count() for each entity.
§Arguments
- source_text: The original text
- text_char_count: Pre-computed character count (from text.chars().count())
§Returns
Vector of validation issues (empty if valid)
pub fn is_valid(&self, source_text: &str) -> bool
Check if this entity is valid against the source text.
Convenience method that returns true if validate() returns an empty list.
pub fn validate_batch(
    entities: &[Entity],
    source_text: &str,
) -> HashMap<usize, Vec<ValidationIssue>>
Validate a batch of entities efficiently.
Returns a map of entity index -> validation issues. Only entities with issues are included.
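Batch validation amortizes the character count: compute it once, check each entity, and keep only the indices that produced issues. The sketch below uses a hypothetical MentionSpan struct and plain strings as issues, and checks only span bounds and text match; it illustrates the documented return shape, not the crate's code.

```rust
use std::collections::HashMap;

/// Simplified entity: surface text plus character span (hypothetical).
struct MentionSpan<'a> {
    text: &'a str,
    start: usize,
    end: usize,
}

/// Validate all spans against `source`, counting characters only once.
/// Only indices with at least one issue appear in the result.
fn validate_batch(spans: &[MentionSpan<'_>], source: &str) -> HashMap<usize, Vec<String>> {
    let char_count = source.chars().count(); // computed once for the whole batch
    let mut out = HashMap::new();
    for (i, s) in spans.iter().enumerate() {
        let mut issues = Vec::new();
        if s.start >= s.end || s.end > char_count {
            issues.push(format!("span {}..{} out of bounds", s.start, s.end));
        } else {
            let found: String = source.chars().skip(s.start).take(s.end - s.start).collect();
            if found != s.text {
                issues.push(format!("expected {:?}, found {:?}", s.text, found));
            }
        }
        if !issues.is_empty() {
            out.insert(i, issues);
        }
    }
    out
}
```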
§Example
```rust
use anno_core::{Entity, EntityType};

let text = "John and Jane work at Apple";
let entities = vec![
    Entity::new("John", EntityType::Person, 0, 4, 0.95),
    // Span 9..13 covers "Jane", so the text "Wrong" does not match.
    Entity::new("Wrong", EntityType::Person, 9, 13, 0.8),
];
let issues = Entity::validate_batch(&entities, text);
assert!(!issues.contains_key(&0)); // valid entities are omitted
assert!(issues.contains_key(&1)); // the text mismatch is reported
```