pub struct Entity {
    pub text: String,
    pub entity_type: EntityType,
    pub confidence: Confidence,
    pub normalized: Option<String>,
    pub provenance: Option<Provenance>,
    pub kb_id: Option<String>,
    pub canonical_id: Option<CanonicalId>,
    pub hierarchical_confidence: Option<HierarchicalConfidence>,
    pub visual_span: Option<Span>,
    pub discontinuous_span: Option<DiscontinuousSpan>,
    pub mention_type: Option<MentionType>,
    /* private fields */
}
A recognized named entity or relation trigger.
§Entity Structure
"Contact John at john@example.com on Jan 15"
         ^^^^    ^^^^^^^^^^^^^^^^    ^^^^^^
         PER          EMAIL           DATE
          |             |               |
        Named        Contact        Temporal
        (ML)        (Pattern)       (Pattern)
§Core Fields (Stable API)
- text, entity_type, start, end, confidence: always present
- normalized, provenance: commonly used optional fields
- kb_id, canonical_id: knowledge graph and coreference support
§Extended Fields (Research/Experimental)
The following fields support advanced research applications but may evolve:
| Field | Purpose | Status |
|---|---|---|
| visual_span | Multi-modal (ColPali) extraction | Experimental |
| discontinuous_span | W2NER non-contiguous entities | Experimental |
| hierarchical_confidence | Coarse-to-fine NER | Experimental |
These fields are annotated with #[serde(skip_serializing_if = "Option::is_none")], so they
are omitted from serialized output when unset.
§Knowledge Graph Support
For GraphRAG and coreference resolution, entities support:
- kb_id: External knowledge base identifier (e.g., Wikidata Q-ID)
- canonical_id: Local coreference cluster ID (links “John” and “he”)
§Normalization
Entities can have a normalized form for downstream processing:
- Dates: “Jan 15” → “2024-01-15” (ISO 8601)
- Money: “$1.5M” → “1500000 USD”
- Locations: “NYC” → “New York City”
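As a rough illustration of the money case above, the sketch below expands a magnitude suffix into a plain amount. The function name `normalize_money`, the supported suffixes (K, M, B), and the USD-only handling are assumptions made for this example; the crate's own normalizer is not shown in this section and may behave differently.

```rust
/// Hypothetical helper: normalize strings like "$1.5M" to "1500000 USD".
/// Only a leading '$' and the suffixes K, M, B are handled in this sketch.
fn normalize_money(surface: &str) -> Option<String> {
    let rest = surface.trim().strip_prefix('$')?;
    // Split off an optional single-character magnitude suffix.
    let (digits, multiplier) = match rest.chars().last()? {
        'K' | 'k' => (&rest[..rest.len() - 1], 1_000.0),
        'M' | 'm' => (&rest[..rest.len() - 1], 1_000_000.0),
        'B' | 'b' => (&rest[..rest.len() - 1], 1_000_000_000.0),
        _ => (rest, 1.0),
    };
    let amount: f64 = digits.replace(',', "").parse().ok()?;
    // Reject negative, NaN, and infinite amounts rather than guessing.
    if !amount.is_finite() || amount < 0.0 {
        return None;
    }
    Some(format!("{} USD", (amount * multiplier).round() as u64))
}
```

A result like this could then be stored on the entity with `set_normalized`.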
Fields§
text: String
Entity text (surface form as it appears in source)
entity_type: EntityType
Entity type classification
confidence: Confidence
Confidence score (0.0-1.0, calibrated).
Construction via Confidence::new clamps to [0.0, 1.0].
Use .value() or Into<f64> to extract the raw score.
normalized: Option<String>
Normalized/canonical form (e.g., “Jan 15” → “2024-01-15”)
provenance: Option<Provenance>
Provenance: which backend/method produced this entity
kb_id: Option<String>
External knowledge base ID (e.g., “Q7186” for Marie Curie in Wikidata). Used for entity linking and GraphRAG applications.
canonical_id: Option<CanonicalId>
Local coreference cluster ID.
Multiple mentions with the same canonical_id refer to the same entity.
Example: “Marie Curie” and “she” might share canonical_id = CanonicalId(42).
hierarchical_confidence: Option<HierarchicalConfidence>
Hierarchical confidence (coarse-to-fine). Provides linkage, type, and boundary scores separately.
visual_span: Option<Span>
Visual span for multi-modal (ColPali) extraction. When set, provides bounding box location in addition to text offsets.
discontinuous_span: Option<DiscontinuousSpan>
Discontinuous span for non-contiguous entity mentions (W2NER support).
When set, overrides start/end for length calculations.
Example: “New York and LA [airports]” where “airports” modifies both.
mention_type: Option<MentionType>
Mention type classification (Proper, Nominal, Pronominal, Zero).
Classifies the referring expression type for coreference resolution. Follows the Accessibility Hierarchy (Ariel 1990): Proper > Nominal > Pronominal > Zero.
Implementations§
impl Entity
pub fn new(
    text: impl Into<String>,
    entity_type: EntityType,
    start: usize,
    end: usize,
    confidence: impl Into<Confidence>,
) -> Self
Create a new entity.
use anno_core::{Entity, EntityType};
let e = Entity::new("Berlin", EntityType::Location, 10, 16, 0.95);
assert_eq!(e.text, "Berlin");
assert_eq!(e.entity_type, EntityType::Location);
assert_eq!((e.start(), e.end()), (10, 16));
pub fn set_start(&mut self, start: usize)
Set the start offset. For use in post-processing pipelines.
pub fn set_end(&mut self, end: usize)
Set the end offset. For use in post-processing pipelines.
pub fn with_provenance(
    text: impl Into<String>,
    entity_type: EntityType,
    start: usize,
    end: usize,
    confidence: impl Into<Confidence>,
    provenance: Provenance,
) -> Self
Create a new entity with provenance information.
pub fn with_hierarchical_confidence(
    text: impl Into<String>,
    entity_type: EntityType,
    start: usize,
    end: usize,
    confidence: HierarchicalConfidence,
) -> Self
Create an entity with hierarchical confidence scores.
pub fn from_visual(
    text: impl Into<String>,
    entity_type: EntityType,
    bbox: Span,
    confidence: impl Into<Confidence>,
) -> Self
Create an entity from a visual bounding box (ColPali multi-modal).
pub fn with_type(
    text: impl Into<String>,
    entity_type: EntityType,
    start: usize,
    end: usize,
) -> Self
Create an entity with default confidence (1.0).
pub fn link_to_kb(&mut self, kb_id: impl Into<String>)
Link this entity to an external knowledge base.
§Examples
use anno_core::{Entity, EntityType};
let mut e = Entity::new("Marie Curie", EntityType::Person, 0, 11, 0.95);
e.link_to_kb("Q7186");
assert_eq!(e.kb_id.as_deref(), Some("Q7186"));
pub fn set_canonical(&mut self, canonical_id: impl Into<CanonicalId>)
Assign this entity to a coreference cluster.
Entities with the same canonical_id refer to the same real-world entity.
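To make the clustering idea concrete, here is a minimal, dependency-free sketch that groups mentions by cluster ID. The `Mention` struct and `cluster_mentions` function are hypothetical stand-ins (a bare `u64` replaces `CanonicalId`); they are not part of the crate.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for an entity mention: surface text plus an optional
/// cluster ID. A plain `u64` is assumed here in place of `CanonicalId`.
struct Mention {
    text: &'static str,
    canonical_id: Option<u64>,
}

/// Group mention texts by cluster ID, skipping mentions without one.
fn cluster_mentions(mentions: &[Mention]) -> BTreeMap<u64, Vec<&'static str>> {
    let mut clusters: BTreeMap<u64, Vec<&'static str>> = BTreeMap::new();
    for m in mentions {
        if let Some(id) = m.canonical_id {
            clusters.entry(id).or_default().push(m.text);
        }
    }
    clusters
}
```

With mentions “Marie Curie” and “she” both assigned ID 42, the map holds a single cluster containing both surface forms; mentions with no ID are left out.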
pub fn with_canonical_id(self, canonical_id: impl Into<CanonicalId>) -> Self
Builder-style method to set canonical ID.
§Example
use anno_core::{CanonicalId, Entity, EntityType};
let entity = Entity::new("John", EntityType::Person, 0, 4, 0.9)
    .with_canonical_id(42);
assert_eq!(entity.canonical_id, Some(CanonicalId::new(42)));
pub fn has_coreference(&self) -> bool
Check if this entity has coreference information.
pub fn is_discontinuous(&self) -> bool
Check if this entity has a discontinuous span.
Discontinuous entities span non-contiguous text regions. Example: “New York and LA airports” contains “New York airports” as a discontinuous entity.
pub fn discontinuous_segments(&self) -> Option<Vec<Range<usize>>>
Get the discontinuous segments if present.
Returns None if this is a contiguous entity.
pub fn set_discontinuous_span(&mut self, span: DiscontinuousSpan)
Set a discontinuous span for this entity.
This is used by W2NER and similar models that detect non-contiguous mentions.
pub fn total_len(&self) -> usize
Get the total length covered by this entity, in characters.
- Contiguous: end - start
- Discontinuous: sum of segment lengths
This is intentionally consistent: all offsets in anno::core entity spans
are character offsets (Unicode scalar values), not byte offsets.
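The two cases can be sketched as a free function over plain ranges. The function below is an illustration under the assumption that segments are half-open character ranges; it is not the crate's implementation, and the argument layout is invented for the example.

```rust
use std::ops::Range;

/// Total covered length in characters: the sum of segment lengths when
/// discontinuous segments are given, otherwise `end - start`.
fn total_len(start: usize, end: usize, segments: Option<&[Range<usize>]>) -> usize {
    match segments {
        // saturating_sub guards against malformed ranges where end < start.
        Some(segs) => segs.iter().map(|r| r.end.saturating_sub(r.start)).sum(),
        None => end.saturating_sub(start),
    }
}
```

For “New York and LA airports”, the contiguous span 0..24 has length 24, while the discontinuous entity “New York airports” with segments 0..8 and 16..24 has length 16.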
pub fn set_normalized(&mut self, normalized: impl Into<String>)
Set the normalized form for this entity.
§Examples
use anno_core::{Entity, EntityType};
let mut entity = Entity::new("Jan 15", EntityType::Date, 0, 6, 0.95);
entity.set_normalized("2024-01-15");
assert_eq!(entity.normalized.as_deref(), Some("2024-01-15"));
pub fn normalized_or_text(&self) -> &str
Get the normalized form, or the original text if not normalized.
pub fn method(&self) -> ExtractionMethod
Get the extraction method, if known.
pub fn category(&self) -> EntityCategory
Get the entity category.
pub fn is_structured(&self) -> bool
Returns true if this entity was detected via patterns (not ML).
pub fn overlap_ratio(&self, other: &Entity) -> f64
Calculate the overlap ratio (intersection over union, IoU) between this entity's span and another's.
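For reference, IoU over two half-open character ranges can be computed as below. This is a sketch of the general technique, assuming contiguous spans given as `(start, end)` pairs; how the crate treats discontinuous spans or empty ranges is not stated here, and `span_iou` is a name made up for the example.

```rust
/// Intersection over union of two half-open character ranges [start, end).
/// Returns 0.0 when the union is empty.
fn span_iou(a: (usize, usize), b: (usize, usize)) -> f64 {
    let inter_start = a.0.max(b.0);
    let inter_end = a.1.min(b.1);
    // Disjoint ranges yield inter_end <= inter_start, so the overlap is 0.
    let intersection = inter_end.saturating_sub(inter_start);
    let union = a.1.saturating_sub(a.0) + b.1.saturating_sub(b.0) - intersection;
    if union == 0 {
        0.0
    } else {
        intersection as f64 / union as f64
    }
}
```

Identical spans score 1.0, disjoint spans score 0.0, and spans 0..4 and 2..6 share 2 of 6 covered characters, giving one third.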
pub fn set_hierarchical_confidence(
    &mut self,
    confidence: HierarchicalConfidence,
)
Set hierarchical confidence scores.
pub fn linkage_confidence(&self) -> Confidence
Get the linkage confidence (coarse filter score).
pub fn type_confidence(&self) -> Confidence
Get the type classification confidence.
pub fn boundary_confidence(&self) -> Confidence
Get the boundary confidence.
pub fn set_visual_span(&mut self, span: Span)
Set visual span for multi-modal extraction.
§Note
Entity offsets are character offsets. A unified TextSpan with both byte and char offsets is useful when you need to work with both offset systems, but creating one requires the offset conversion utilities from the anno crate, together with the original source text from which the entity was extracted (needed to compute byte offsets). Use anno::offset::char_to_byte_offsets() directly for now.
§Example
use anno_core::{Entity, EntityType};
let text = "Hello, 日本!";
let entity = Entity::new("日本", EntityType::Location, 7, 9, 0.95);
let (byte_start, byte_end) = anno::offset::char_to_byte_offsets(text, entity.start(), entity.end());
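The conversion from character offsets to byte offsets can be illustrated with the standard library alone. The function `char_range_to_bytes` below is a hypothetical stand-in written for this example; it is not the implementation of anno::offset::char_to_byte_offsets, whose behavior on out-of-range input is not documented in this section.

```rust
/// Map a half-open character range to the matching byte range,
/// or return None if the range is inverted or out of bounds.
fn char_range_to_bytes(text: &str, start: usize, end: usize) -> Option<(usize, usize)> {
    if start > end {
        return None;
    }
    // Byte index of every char boundary, plus the end of the string,
    // so that boundaries[i] is the byte offset of character i.
    let boundaries: Vec<usize> = text
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(text.len()))
        .collect();
    Some((*boundaries.get(start)?, *boundaries.get(end)?))
}
```

In "Hello, 日本!" the characters at offsets 7..9 occupy bytes 7..13, because each of the two CJK characters takes three bytes in UTF-8.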
pub fn extract_text(&self, source_text: &str) -> String
Safely extract text from source using character offsets.
Entity stores character offsets, not byte offsets. This method correctly extracts text by iterating over characters.
§Arguments
- source_text: The original text from which this entity was extracted
§Returns
The extracted text, or empty string if offsets are invalid
§Example
use anno_core::{Entity, EntityType};
let text = "Hello, 日本!";
let entity = Entity::new("日本", EntityType::Location, 7, 9, 0.95);
assert_eq!(entity.extract_text(text), "日本");
pub fn extract_text_with_len(
    &self,
    source_text: &str,
    text_char_count: usize,
) -> String
Extract text with pre-computed text length (performance optimization).
Use this when validating/clamping multiple entities from the same text
to avoid recalculating text.chars().count() for each entity.
§Arguments
- source_text: The original text
- text_char_count: Pre-computed character count (from text.chars().count())
§Returns
The extracted text, or empty string if offsets are invalid
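A minimal sketch of character-offset extraction with a pre-computed length might look like the following. The free function `extract_chars` and its exact bounds rules (an empty or inverted span counts as invalid) are assumptions for illustration, not the crate's code.

```rust
/// Extract the characters in [start, end) from `source`, returning an empty
/// string when the offsets are invalid. `char_count` is the pre-computed
/// `source.chars().count()`, so it is not recomputed for every entity.
fn extract_chars(source: &str, start: usize, end: usize, char_count: usize) -> String {
    if start >= end || end > char_count {
        return String::new();
    }
    // Iterate by character rather than slicing by byte index, which could
    // panic on a non-boundary offset in multi-byte text.
    source.chars().skip(start).take(end - start).collect()
}
```

The caller computes `text.chars().count()` once and passes it to every call for the same text.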
pub fn builder(
    text: impl Into<String>,
    entity_type: EntityType,
) -> EntityBuilder
Create a builder for fluent entity construction.
pub fn validate(&self, source_text: &str) -> Vec<ValidationIssue>
Validate this entity against the source text.
Returns a list of validation issues. Empty list means the entity is valid.
§Checks Performed
- Span bounds: start < end, both within text length
- Text match: text matches the span in source
- Confidence range: confidence in [0.0, 1.0]
- Type consistency: Custom types have non-empty names
- Discontinuous consistency: If present, segments are valid
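The first three checks can be sketched without the crate as shown below. The `Issue` enum and `check` function are hypothetical: the actual `ValidationIssue` variants are not listed in this section, and the type and discontinuous-span checks are left out of the sketch.

```rust
/// Hypothetical issue type for this sketch only.
#[derive(Debug, PartialEq)]
enum Issue {
    InvalidSpan,
    TextMismatch,
    ConfidenceOutOfRange,
}

/// Run the span-bounds, text-match, and confidence-range checks.
fn check(text: &str, start: usize, end: usize, confidence: f64, source: &str) -> Vec<Issue> {
    let mut issues = Vec::new();
    let char_count = source.chars().count();
    if start >= end || end > char_count {
        issues.push(Issue::InvalidSpan);
    } else {
        // Only compare text when the span is known to be in bounds.
        let actual: String = source.chars().skip(start).take(end - start).collect();
        if actual != text {
            issues.push(Issue::TextMismatch);
        }
    }
    // NaN fails the range test as well, so it is reported here.
    if !(0.0..=1.0).contains(&confidence) {
        issues.push(Issue::ConfidenceOutOfRange);
    }
    issues
}
```

Collecting every issue, rather than stopping at the first, mirrors the `Vec<ValidationIssue>` return type of `validate`.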
§Example
use anno_core::{Entity, EntityType};
let text = "John works at Apple";
let entity = Entity::new("John", EntityType::Person, 0, 4, 0.95);
let issues = entity.validate(text);
assert!(issues.is_empty(), "Entity should be valid");
// Invalid entity: span doesn't match text
let bad = Entity::new("Jane", EntityType::Person, 0, 4, 0.95);
let issues = bad.validate(text);
assert!(!issues.is_empty(), "Entity text doesn't match span");
pub fn validate_with_len(
    &self,
    source_text: &str,
    text_char_count: usize,
) -> Vec<ValidationIssue>
Validate entity with pre-computed text length (performance optimization).
Use this when validating multiple entities from the same text to avoid
recalculating text.chars().count() for each entity.
§Arguments
- source_text: The original text
- text_char_count: Pre-computed character count (from text.chars().count())
§Returns
Vector of validation issues (empty if valid)
pub fn is_valid(&self, source_text: &str) -> bool
Check if this entity is valid against the source text.
Convenience method that returns true if validate() returns empty.
pub fn validate_batch(
    entities: &[Entity],
    source_text: &str,
) -> HashMap<usize, Vec<ValidationIssue>>
Validate a batch of entities efficiently.
Returns a map of entity index -> validation issues. Only entities with issues are included.
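The shape of such a batch pass, with the character count computed once and only failing indices recorded, can be sketched as follows. Plain `(text, start, end)` tuples and `String` messages stand in for `Entity` and `ValidationIssue`; `validate_batch_sketch` is a made-up name, and the crate's actual checks are broader than the two shown.

```rust
use std::collections::HashMap;

/// Validate (text, start, end) triples against `source`, returning a map of
/// index -> issue messages that contains only the entries with issues.
fn validate_batch_sketch(
    entities: &[(&str, usize, usize)],
    source: &str,
) -> HashMap<usize, Vec<String>> {
    let char_count = source.chars().count(); // computed once for the whole batch
    let mut report = HashMap::new();
    for (index, &(text, start, end)) in entities.iter().enumerate() {
        let mut issues = Vec::new();
        if start >= end || end > char_count {
            issues.push(format!("span {start}..{end} is out of bounds"));
        } else {
            let actual: String = source.chars().skip(start).take(end - start).collect();
            if actual != text {
                issues.push(format!("expected {text:?}, found {actual:?}"));
            }
        }
        // Valid entities are left out of the report entirely.
        if !issues.is_empty() {
            report.insert(index, issues);
        }
    }
    report
}
```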
§Example
use anno_core::{Entity, EntityType};
let text = "John and Jane work at Apple";
let entities = vec![
    Entity::new("John", EntityType::Person, 0, 4, 0.95),
    Entity::new("Wrong", EntityType::Person, 9, 13, 0.8),
];
let issues = Entity::validate_batch(&entities, text);
assert!(issues.is_empty() || issues.contains_key(&1)); // Second entity might fail
Trait Implementations§
impl<'de> Deserialize<'de> for Entity
fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error>
impl From<&Entity> for Signal<Location>
Convert an Entity to a Signal<Location>.
Uses Location::Text for the span and preserves normalized, provenance,
and hierarchical_confidence fields. Discontinuous and visual spans are not
handled; use GroundedDocument::from_entities for full fidelity.