pub struct DiscontinuousSpan { /* private fields */ }
A discontinuous span representing non-contiguous entity mentions.
Some entities span multiple non-adjacent text regions:
- “severe [pain] in the [abdomen]” → “severe abdominal pain”
- “the [president] … [Obama]” → coreference
This is required for:
- Medical NER: Anatomical modifiers separated from findings
- Legal NER: Parties referenced across clauses
- W2NER: Word-word relation grids that detect discontinuous entities
§Offset Unit (CRITICAL)
DiscontinuousSpan uses character offsets (Unicode scalar value indices),
consistent with Entity::start / Entity::end and anno::core::grounded::Location.
These are intentionally not byte offsets. If you have byte offsets (from regex,
str::find, tokenizers, etc.), convert them to character offsets first (see
anno::offset::SpanConverter in the anno crate).
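The byte-to-character conversion described above can be sketched with the standard library alone. This is only an illustration of the idea, not the anno::offset::SpanConverter API; the helper name byte_range_to_char_range is hypothetical.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of anno): convert a byte range into a
/// character (Unicode scalar value) range over the same text.
/// Returns `None` if a bound is out of range or not on a char boundary.
fn byte_range_to_char_range(text: &str, bytes: Range<usize>) -> Option<Range<usize>> {
    if bytes.start > bytes.end
        || bytes.end > text.len()
        || !text.is_char_boundary(bytes.start)
        || !text.is_char_boundary(bytes.end)
    {
        return None;
    }
    // Count the scalar values that precede the start, then those inside the range.
    let start = text[..bytes.start].chars().count();
    let end = start + text[bytes.start..bytes.end].chars().count();
    Some(start..end)
}
```

For ASCII-only text the two ranges coincide; they diverge as soon as a multi-byte character appears before or inside the span, which is why byte offsets from str::find should not be passed to DiscontinuousSpan directly.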
§Example
use anno_core::DiscontinuousSpan;
// "severe pain in the abdomen": the entity combines "pain" and "abdomen",
// which are separated by "in the"
let span = DiscontinuousSpan::new(vec![
    7..11, // "pain"
    19..26, // "abdomen"
]);
assert_eq!(span.num_segments(), 2);
assert!(span.is_discontinuous());
Implementations§
impl DiscontinuousSpan
pub fn new(segments: Vec<Range<usize>>) -> Self
Create a new discontinuous span from segments.
Segments are sorted and validated (no overlaps).
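A minimal sketch of what "sorted and validated" could look like, written over plain ranges. The function name sort_and_validate is hypothetical, and returning an error on overlap is an assumption: the documentation does not say how new reacts to overlapping segments.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the normalization step: sort segments by
/// position and reject any pair that overlaps.
/// Assumption: overlap is reported as `Err`; the real constructor may differ.
fn sort_and_validate(mut segments: Vec<Range<usize>>) -> Result<Vec<Range<usize>>, String> {
    segments.sort_by_key(|s| (s.start, s.end));
    // After sorting, an overlap can only occur between neighbours.
    for pair in segments.windows(2) {
        if pair[1].start < pair[0].end {
            return Err(format!("segments {:?} and {:?} overlap", pair[0], pair[1]));
        }
    }
    Ok(segments)
}
```

In this sketch, segments that merely touch (for example 0..6 followed by 6..9) are accepted, since half-open ranges that share a boundary do not cover a common character.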
pub fn contiguous(start: usize, end: usize) -> Self
Create from a single contiguous span.
pub fn num_segments(&self) -> usize
Number of segments.
pub fn is_discontinuous(&self) -> bool
True if this spans multiple non-adjacent regions.
pub fn is_contiguous(&self) -> bool
True if this is a single contiguous span.
pub fn bounding_range(&self) -> Option<Range<usize>>
Get the overall bounding range (start of first to end of last).
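The "start of first to end of last" rule can be sketched over a sorted slice of segments. The free function below is hypothetical, and the choice to return None for an empty segment list is an assumption inferred from the Option in the signature.

```rust
use std::ops::Range;

/// Hypothetical sketch: bounding range of already-sorted segments.
/// Assumption: an empty slice yields `None`.
fn bounding_range(sorted: &[Range<usize>]) -> Option<Range<usize>> {
    let first = sorted.first()?;
    let last = sorted.last()?;
    // The gap between segments is included in the bounding range.
    Some(first.start..last.end)
}
```

For the segments 7..11 and 19..26, this gives 7..26, which also covers the characters between the two segments.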
pub fn extract_text(&self, text: &str, separator: &str) -> String
Extract text from each segment and join with separator.
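Because the segments are character offsets, extraction has to walk characters instead of slicing bytes. The following sketch shows one way to do that with the standard library; it is a hypothetical free function that takes the segments explicitly, not the crate's actual implementation.

```rust
use std::ops::Range;

/// Hypothetical sketch: pull out each character-offset segment and join
/// the pieces with `separator`.
fn extract_text(text: &str, segments: &[Range<usize>], separator: &str) -> String {
    segments
        .iter()
        .map(|seg| {
            // Offsets index Unicode scalar values, so iterate over chars
            // rather than slicing the underlying bytes.
            text.chars()
                .skip(seg.start)
                .take(seg.end.saturating_sub(seg.start))
                .collect::<String>()
        })
        .collect::<Vec<_>>()
        .join(separator)
}
```

This version rescans the text for every segment, which keeps the sketch short; a single pass over char_indices would be the usual choice for long inputs with many segments.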
Trait Implementations§
impl Clone for DiscontinuousSpan
fn clone(&self) -> DiscontinuousSpan
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.