Module reference

Reference Resolution for Entity Extraction.

§Overview

Documents often contain references to external content:

  • URLs: Links to web pages with additional entity information
  • Citations: Academic references (e.g., "Smith et al., 2020")
  • Cross-references: Internal document references (e.g., "see Section 3")
  • Footnotes/Endnotes: Additional contextual information
  • Entity Links: Wikipedia, Wikidata, or other KB references

This module provides infrastructure for:

  1. Detecting references in text
  2. Resolving them to content
  3. Extracting entities from resolved content
  4. Linking back to the source document
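Steps 1 and 4 can be illustrated with a minimal, standard-library-only sketch. It is not this module's implementation: `SketchKind`, `SketchReference`, and `detect_urls` are hypothetical names introduced here, and the whitespace-token heuristic is an assumption chosen for brevity. Each detected URL keeps its byte span so that anything extracted from it later can be linked back to the source text.

```rust
/// Hypothetical reference kinds for this sketch (not the crate's `ReferenceType`).
#[derive(Debug, Clone, PartialEq, Eq)]
enum SketchKind {
    WikipediaUrl,
    OtherUrl,
}

/// A detected URL plus the byte span it occupies in the source text.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SketchReference {
    kind: SketchKind,
    target: String,
    start: usize,
    end: usize,
}

/// Detect `http://` and `https://` URLs among whitespace-separated tokens.
fn detect_urls(text: &str) -> Vec<SketchReference> {
    let mut found = Vec::new();
    for token in text.split_whitespace() {
        if !(token.starts_with("http://") || token.starts_with("https://")) {
            continue;
        }
        // Drop trailing punctuation that is unlikely to belong to the URL.
        let url = token.trim_end_matches(|c: char| matches!(c, '.' | ',' | ';' | ')' | '"'));
        // `token` is a subslice of `text`, so the pointer difference is its byte offset.
        let start = token.as_ptr() as usize - text.as_ptr() as usize;
        // The host is everything between "://" and the next '/'.
        let host = url
            .split("://")
            .nth(1)
            .and_then(|rest| rest.split('/').next())
            .unwrap_or("");
        let kind = if host.ends_with("wikipedia.org") {
            SketchKind::WikipediaUrl
        } else {
            SketchKind::OtherUrl
        };
        found.push(SketchReference {
            kind,
            target: url.to_string(),
            start,
            end: start + url.len(),
        });
    }
    found
}
```

Slicing the source with `&text[r.start..r.end]` recovers the exact URL string, which is the property step 4 relies on.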

§Integration with Coalesce

Resolved references provide additional evidence for entity coalescing:

  • A URL pointing to a Wikipedia page confirms entity identity
  • Citations can link entities mentioned in different contexts
  • Resolved content may contain canonical names or aliases
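One way to picture this kind of evidence is to group mentions by the URL they resolve to: mentions that share a knowledge-base URL are candidates for the same entity. The sketch below is only an illustration under that assumption. `LinkedMention` and `group_by_resolved_url` are hypothetical names, not part of this module or of the coalesce API.

```rust
use std::collections::BTreeMap;

/// Hypothetical pairing of a surface mention with the URL it was linked to.
struct LinkedMention<'a> {
    surface: &'a str,
    resolved_url: &'a str,
}

/// Group surface forms by resolved URL; each group is a coalescing candidate.
fn group_by_resolved_url<'a>(
    mentions: &[LinkedMention<'a>],
) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut groups: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for mention in mentions {
        // Mentions sharing a URL land in the same bucket, in input order.
        groups
            .entry(mention.resolved_url)
            .or_default()
            .push(mention.surface);
    }
    groups
}
```

A shared URL is strong but not conclusive evidence, so a real pipeline would likely treat each group as a signal to weigh rather than as an automatic merge.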

§Integration with Tier

References create hierarchical relationships:

  • Level 0: Entities in source document
  • Level 1: Entities in directly referenced documents
  • Level 2+: Entities in transitively referenced documents

Together, these levels form a “citation graph” that tier can cluster.
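The level numbering above corresponds to breadth-first distance from the source document. The following sketch assumes the graph is given as a plain adjacency map from a document to the documents it references. `reference_levels` is a hypothetical helper written for illustration and does not reflect how `ReferenceGraph` is actually implemented.

```rust
use std::collections::{HashMap, VecDeque};

/// Assign each reachable document its level: 0 for the source, 1 for
/// directly referenced documents, 2+ for transitively referenced ones.
fn reference_levels<'a>(
    edges: &HashMap<&'a str, Vec<&'a str>>,
    source: &'a str,
) -> HashMap<&'a str, usize> {
    let mut levels: HashMap<&'a str, usize> = HashMap::new();
    let mut queue: VecDeque<&'a str> = VecDeque::new();
    levels.insert(source, 0);
    queue.push_back(source);

    while let Some(doc) = queue.pop_front() {
        let next_level = levels[doc] + 1;
        if let Some(targets) = edges.get(doc) {
            for &target in targets {
                // The first visit is the shortest path; skipping later visits
                // also keeps reference cycles from looping forever.
                if !levels.contains_key(target) {
                    levels.insert(target, next_level);
                    queue.push_back(target);
                }
            }
        }
    }
    levels
}
```

Breadth-first order guarantees that a document reachable by several paths receives the smallest level, so a page cited both directly and transitively counts as Level 1.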

§Example

use anno::preprocess::reference::{ReferenceExtractor, ReferenceType};

// Create an extractor and run it over text containing a single URL.
let extractor = ReferenceExtractor::new();
let text = "See https://en.wikipedia.org/wiki/Albert_Einstein for more info.";
let refs = extractor.extract(text);

// Exactly one reference is found, and it is classified as a Wikipedia URL.
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].reference_type, ReferenceType::WikipediaUrl);

Structs§

ExtractedEntity
An entity extracted from resolved reference content.
Reference
A detected reference in text.
ReferenceExtractor
Extractor for references in text.
ReferenceGraph
Reference graph for tracking relationships between documents.
ResolvedReference
Resolved content from a reference.

Enums§

ReferenceType
Type of reference detected in text.