
ExtractionMethod

Enum ExtractionMethod 

Source
#[non_exhaustive]
pub enum ExtractionMethod {
    Pattern,
    Neural,
    Lexicon,
    SoftLexicon,
    GatedEnsemble,
    Consensus,
    Heuristic,
    Unknown,
    Rule,
    ML,
    Ensemble,
}

Extraction method used to identify an entity.

§Research Context

Different extraction methods have different strengths:

| Method | Precision | Recall | Generalization | Use Case |
|---|---|---|---|---|
| Pattern | Very High | Low | N/A (format-based) | Dates, emails, money |
| Neural | High | High | Good | General NER |
| Lexicon | Very High | Low | None | Closed-domain entities |
| SoftLexicon | Medium | High | Good for rare types | Low-resource NER |
| GatedEnsemble | Highest | Highest | Contextual | Short texts, domain shift |

See docs/ for repo-local notes and entry points.

Variants (Non-exhaustive)§

This enum is marked as non-exhaustive.
Non-exhaustive enums may have additional variants added in the future. Therefore, when matching against variants of a non-exhaustive enum, an extra wildcard arm must be added to account for any future variants.
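
As a minimal sketch of the wildcard-arm requirement, the block below matches on a local stand-in enum rather than the real `anno_core::ExtractionMethod`, so that it compiles on its own. The stand-in lists only a subset of the variants, and `describe` is a hypothetical helper, not part of the crate.

```rust
// Local stand-in for illustration only; the real type lives in anno_core.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExtractionMethod {
    Pattern,
    Neural,
    SoftLexicon,
    GatedEnsemble,
    Consensus,
    Heuristic,
    Unknown,
}

/// Hypothetical helper: a short human-readable label for a method.
pub fn describe(method: ExtractionMethod) -> &'static str {
    match method {
        ExtractionMethod::Pattern => "regex pattern",
        ExtractionMethod::Neural => "neural model",
        ExtractionMethod::Heuristic => "hand-crafted heuristic",
        // Wildcard arm: mandatory when matching from another crate, because
        // new variants may be added later. Inside the defining crate the
        // compiler does not enforce it, but here it also covers the
        // variants not listed above.
        _ => "other",
    }
}
```

With this shape, a variant added in a later release falls into the `_` arm instead of breaking the build.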
§

Pattern

Regex pattern matching (high precision for structured data such as dates and money). Does not generalize: it only detects format-based entities.

§

Neural

Neural model inference (BERT, GLiNER, etc.). The recommended default for general NER. Generalizes to unseen entities.

§

Lexicon

👎Deprecated since 0.2.0: Use Neural or GatedEnsemble instead

Exact lexicon/gazetteer lookup (deprecated approach). High precision on known entities, zero recall on novel entities. Only use for closed domains (stock tickers, medical codes).

§

SoftLexicon

Embedding-based soft lexicon matching. Useful for low-resource languages and rare entity types. See: Rijhwani et al. (2020) “Soft Gazetteers for Low-Resource NER”

§

GatedEnsemble

Gated ensemble: neural + lexicon with learned weighting. Model learns when to trust lexicon vs. context. See: Nie et al. (2021) “GEMNET: Effective Gated Gazetteer Representations”

§

Consensus

Multiple methods agreed on this entity (high confidence).

§

Heuristic

Heuristic-based extraction (capitalization, word shape, context). Used by heuristic backends that don’t use neural models.

§

Unknown

Unknown or unspecified extraction method.

§

Rule

👎Deprecated since 0.2.0: Use Heuristic or Pattern instead

Legacy rule-based extraction (for backward compatibility).

§

ML

👎Deprecated since 0.2.0: Use Neural instead

Legacy alias for Neural (for backward compatibility).

§

Ensemble

👎Deprecated since 0.2.0: Use Consensus instead

Legacy alias for Consensus (for backward compatibility).
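
The deprecation notes above suggest a simple migration for the two legacy aliases. The following is a sketch under assumptions: it uses a local stand-in enum (without the `#[deprecated]` attributes) and a hypothetical `modernize` helper that is not part of the crate. `Lexicon` and `Rule` are left unchanged because each has two suggested replacements and the right one depends on the caller.

```rust
// Local stand-in mirroring the documented variants; not the real anno_core type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExtractionMethod {
    Pattern,
    Neural,
    Lexicon,
    SoftLexicon,
    GatedEnsemble,
    Consensus,
    Heuristic,
    Unknown,
    Rule,
    ML,
    Ensemble,
}

/// Hypothetical helper: map legacy aliases to their documented replacements.
pub fn modernize(method: ExtractionMethod) -> ExtractionMethod {
    match method {
        // Documented as a legacy alias for Neural.
        ExtractionMethod::ML => ExtractionMethod::Neural,
        // Documented as a legacy alias for Consensus.
        ExtractionMethod::Ensemble => ExtractionMethod::Consensus,
        // Lexicon and Rule have two candidate replacements each, so they
        // are passed through unchanged along with every other variant.
        other => other,
    }
}
```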

Implementations§

Source§

impl ExtractionMethod

Source

pub const fn is_calibrated(&self) -> bool

Returns true if this extraction method produces probabilistically calibrated confidence scores suitable for calibration analysis (ECE, Brier score, etc.).

§Calibrated Methods
  • Neural: Softmax outputs are intended to be probabilistic (though they may need temperature scaling for true calibration)
  • GatedEnsemble: Produces learned probability estimates
  • SoftLexicon: Embedding similarity is pseudo-probabilistic
§Uncalibrated Methods
  • Pattern: Binary (match/no-match); confidence is typically hardcoded
  • Heuristic: Arbitrary scores from hand-crafted rules
  • Lexicon: Binary exact match
  • Consensus: Agreement count, not a probability
§Example
use anno_core::ExtractionMethod;

assert!(ExtractionMethod::Neural.is_calibrated());
assert!(!ExtractionMethod::Pattern.is_calibrated());
assert!(!ExtractionMethod::Heuristic.is_calibrated());
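
One way `is_calibrated` might be used is to restrict a calibration metric to predictions whose scores are meant to be probabilities. The sketch below computes a Brier score (mean squared difference between confidence and the 0/1 outcome) over such predictions only. `Method`, `Prediction`, and `brier_score` are hypothetical names; the local `is_calibrated` mirrors the lists above and covers only the variants those lists mention.

```rust
// Local stand-in; covers only the variants classified in the lists above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Method {
    Pattern,
    Neural,
    Lexicon,
    SoftLexicon,
    GatedEnsemble,
    Consensus,
    Heuristic,
}

impl Method {
    /// Mirrors the documented calibrated/uncalibrated split.
    pub const fn is_calibrated(&self) -> bool {
        matches!(self, Method::Neural | Method::GatedEnsemble | Method::SoftLexicon)
    }
}

/// Hypothetical record of one extracted entity and whether it was correct.
pub struct Prediction {
    pub method: Method,
    pub confidence: f64,
    pub correct: bool,
}

/// Brier score over calibrated predictions only; None if there are none.
pub fn brier_score(preds: &[Prediction]) -> Option<f64> {
    let mut sum = 0.0;
    let mut n = 0usize;
    for p in preds.iter().filter(|p| p.method.is_calibrated()) {
        let target = if p.correct { 1.0 } else { 0.0 };
        sum += (p.confidence - target).powi(2);
        n += 1;
    }
    if n == 0 { None } else { Some(sum / n as f64) }
}
```

Skipping uncalibrated methods keeps hardcoded or count-based scores from distorting the metric.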
Source

pub const fn confidence_interpretation(&self) -> &'static str

Returns the confidence interpretation for this extraction method.

This helps users understand what the confidence score means:

  • "probability": Score approximates P(correct)
  • "heuristic_score": Score is a non-probabilistic quality measure
  • "binary": Score is 0 or 1 (or a fixed value for matches)
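
A caller could branch on the returned string to decide how a score should be consumed. The sketch below is illustrative only: `ScoreUse` and `score_use` are hypothetical names, and the handling policy per interpretation is an assumption, not something the crate prescribes. It takes the string as input and makes no assumption about which variant returns which value.

```rust
/// Hypothetical policy for how a confidence score may be used downstream.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScoreUse {
    /// Score approximates P(correct): comparing against a threshold is meaningful.
    Threshold,
    /// Non-probabilistic quality measure: use for ranking only.
    RankOnly,
    /// 0/1 or a fixed value: only the presence of a match is informative.
    PresenceOnly,
    /// A string this sketch does not know about.
    Unrecognized,
}

/// Map a documented interpretation string to a usage policy.
pub fn score_use(interpretation: &str) -> ScoreUse {
    match interpretation {
        "probability" => ScoreUse::Threshold,
        "heuristic_score" => ScoreUse::RankOnly,
        "binary" => ScoreUse::PresenceOnly,
        // Defensive default in case further strings are introduced.
        _ => ScoreUse::Unrecognized,
    }
}
```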

Trait Implementations§

Source§

impl Clone for ExtractionMethod

Source§

fn clone(&self) -> ExtractionMethod

Returns a duplicate of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl Debug for ExtractionMethod

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl Default for ExtractionMethod

Source§

fn default() -> ExtractionMethod

Returns the “default value” for a type. Read more
Source§

impl<'de> Deserialize<'de> for ExtractionMethod

Source§

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer. Read more
Source§

impl Display for ExtractionMethod

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl Hash for ExtractionMethod

Source§

fn hash<__H: Hasher>(&self, state: &mut __H)

Feeds this value into the given Hasher. Read more
1.3.0 · Source§

fn hash_slice<H>(data: &[Self], state: &mut H)
where H: Hasher, Self: Sized,

Feeds a slice of this type into the given Hasher. Read more
Source§

impl PartialEq for ExtractionMethod

Source§

fn eq(&self, other: &ExtractionMethod) -> bool

Tests for self and other values to be equal, and is used by ==.
1.0.0 · Source§

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
Source§

impl Serialize for ExtractionMethod

Source§

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer. Read more
Source§

impl Copy for ExtractionMethod

Source§

impl Eq for ExtractionMethod

Source§

impl StructuralPartialEq for ExtractionMethod
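
Because the enum implements `Copy`, `Eq`, and `Hash`, it can serve directly as a `HashMap` key. The sketch below tallies entities per extraction method; it uses a local stand-in enum with the same derives and a subset of variants, and `count_by_method` is a hypothetical helper.

```rust
use std::collections::HashMap;

// Local stand-in with the same derived traits as the documented enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ExtractionMethod {
    Pattern,
    Neural,
    Consensus,
    Heuristic,
    Unknown,
}

/// Hypothetical helper: count how many entities each method produced.
pub fn count_by_method(methods: &[ExtractionMethod]) -> HashMap<ExtractionMethod, usize> {
    let mut counts = HashMap::new();
    for &method in methods {
        // Copy lets us take the key by value out of the slice.
        *counts.entry(method).or_insert(0) += 1;
    }
    counts
}
```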

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T> ToString for T
where T: Display + ?Sized,

Source§

fn to_string(&self) -> String

Converts the given value to a String. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,