#[non_exhaustive]
pub enum ExtractionMethod {
Pattern,
Neural,
Lexicon,
SoftLexicon,
GatedEnsemble,
Consensus,
Heuristic,
Unknown,
Rule,
ML,
Ensemble,
}
Extraction method used to identify an entity.
§Research Context
Different extraction methods have different strengths:
| Method | Precision | Recall | Generalization | Use Case |
|---|---|---|---|---|
| Pattern | Very High | Low | N/A (format-based) | Dates, emails, money |
| Neural | High | High | Good | General NER |
| Lexicon | Very High | Low | None | Closed-domain entities |
| SoftLexicon | Medium | High | Good for rare types | Low-resource NER |
| GatedEnsemble | Highest | Highest | Contextual | Short texts, domain shift |
See docs/ for repo-local notes and entry points.
Variants (Non-exhaustive)§
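This enum is marked as non-exhaustive: more variants may be added in future releases, so a match on it from another crate must include a wildcard arm.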
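As a minimal sketch of what non-exhaustive matching looks like in practice, the block below defines a local stand-in enum with a handful of the variants (so it compiles without the crate) and a hypothetical helper named label. Neither the helper nor the label strings come from anno_core; they are assumptions for illustration only.

```rust
// Local stand-in for the real enum so the sketch is self-contained.
// Only a few variants are reproduced here.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExtractionMethod {
    Pattern,
    Neural,
    Lexicon,
    Heuristic,
    Unknown,
}

/// Hypothetical helper (not part of the crate): a short label for logging.
pub fn label(method: ExtractionMethod) -> &'static str {
    match method {
        ExtractionMethod::Pattern => "pattern",
        ExtractionMethod::Neural => "neural",
        ExtractionMethod::Lexicon => "lexicon",
        // Downstream crates must keep a wildcard arm like this one so that
        // variants added in later releases still compile and are handled.
        _ => "other",
    }
}
```

Inside the defining crate the compiler does not enforce the wildcard arm; the requirement applies to code in other crates that match on the enum.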
Pattern
Regex pattern matching (high precision for structured data like dates, money). Does not generalize: it detects only format-based entities.
Neural
Neural model inference (BERT, GLiNER, etc.). The recommended default for general NER. Generalizes to unseen entities.
Lexicon
Exact lexicon/gazetteer lookup (deprecated approach). High precision on known entities, zero recall on novel entities. Only use for closed domains (stock tickers, medical codes).
SoftLexicon
Embedding-based soft lexicon matching. Useful for low-resource languages and rare entity types. See: Rijhwani et al. (2020) “Soft Gazetteers for Low-Resource NER”
GatedEnsemble
Gated ensemble: neural + lexicon with learned weighting. Model learns when to trust lexicon vs. context. See: Meng et al. (2021) “GEMNET: Effective Gated Gazetteer Representations”
Consensus
Multiple methods agreed on this entity (high confidence).
Heuristic
Heuristic-based extraction (capitalization, word shape, context). Used by heuristic backends that don’t use neural models.
Unknown
Unknown or unspecified extraction method.
Rule
Legacy rule-based extraction (for backward compatibility).
ML
Legacy alias for Neural (for backward compatibility).
Ensemble
Legacy alias for Consensus (for backward compatibility).
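Since ML and Ensemble are described as legacy aliases, downstream code may want to fold them into their modern counterparts before comparing or aggregating results. The following is a sketch under that assumption; the enum is a local stand-in and canonical is a hypothetical helper, not a function the crate is known to provide. Rule is left unchanged because the documentation does not name a replacement for it.

```rust
// Local stand-in mirroring the documented variant list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExtractionMethod {
    Pattern,
    Neural,
    Lexicon,
    SoftLexicon,
    GatedEnsemble,
    Consensus,
    Heuristic,
    Unknown,
    Rule,
    ML,
    Ensemble,
}

/// Hypothetical helper: map legacy aliases to the variants they alias.
pub fn canonical(method: ExtractionMethod) -> ExtractionMethod {
    match method {
        // Documented as "Legacy alias for Neural".
        ExtractionMethod::ML => ExtractionMethod::Neural,
        // Documented as "Legacy alias for Consensus".
        ExtractionMethod::Ensemble => ExtractionMethod::Consensus,
        // Everything else, including Rule, is returned as-is.
        other => other,
    }
}
```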
Implementations§
impl ExtractionMethod
pub const fn is_calibrated(&self) -> bool
Returns true if this extraction method produces probabilistically calibrated confidence scores suitable for calibration analysis (ECE, Brier score, etc.).
§Calibrated Methods
- Neural: Softmax outputs are intended to be probabilistic (though may need temperature scaling for true calibration)
- GatedEnsemble: Produces learned probability estimates
- SoftLexicon: Embedding similarity is pseudo-probabilistic
§Uncalibrated Methods
- Pattern: Binary (match/no-match); confidence is typically hardcoded
- Heuristic: Arbitrary scores from hand-crafted rules
- Lexicon: Binary exact match
- Consensus: Agreement count, not a probability
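The split above can be sketched as a single match, together with one way a caller might use it: computing a Brier score only over predictions whose method is calibrated. This is an illustrative sketch, not the crate's implementation. Method is a local stand-in restricted to the seven variants the lists classify (Unknown, Rule, ML, and Ensemble are omitted because their classification is not documented here), and Scored and brier_score are hypothetical names.

```rust
// Stand-in enum limited to the variants classified in the lists above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Method {
    Pattern,
    Neural,
    Lexicon,
    SoftLexicon,
    GatedEnsemble,
    Consensus,
    Heuristic,
}

impl Method {
    /// Mirrors the documented calibrated / uncalibrated split.
    pub const fn is_calibrated(&self) -> bool {
        matches!(
            self,
            Method::Neural | Method::GatedEnsemble | Method::SoftLexicon
        )
    }
}

/// Hypothetical record: one prediction with its gold-standard outcome.
pub struct Scored {
    pub method: Method,
    pub confidence: f64,
    pub correct: bool,
}

/// Brier score (mean squared error between confidence and outcome),
/// computed only over predictions from calibrated methods.
/// Returns None when no calibrated predictions are present.
pub fn brier_score(preds: &[Scored]) -> Option<f64> {
    let mut sum = 0.0;
    let mut n = 0usize;
    for p in preds.iter().filter(|p| p.method.is_calibrated()) {
        let target = if p.correct { 1.0 } else { 0.0 };
        sum += (p.confidence - target).powi(2);
        n += 1;
    }
    if n == 0 {
        None
    } else {
        Some(sum / n as f64)
    }
}
```

Filtering first keeps hardcoded or binary scores (for example a Pattern match reported at a fixed confidence) from distorting the calibration metric.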
§Example
use anno_core::ExtractionMethod;
assert!(ExtractionMethod::Neural.is_calibrated());
assert!(!ExtractionMethod::Pattern.is_calibrated());
assert!(!ExtractionMethod::Heuristic.is_calibrated());
pub const fn confidence_interpretation(&self) -> &'static str
Returns the confidence interpretation for this extraction method.
This helps users understand what the confidence score means:
- "probability": Score approximates P(correct)
- "heuristic_score": Score is a non-probabilistic quality measure
- "binary": Score is 0 or 1 (or a fixed value for matches)
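On the consumer side, the three documented strings can be turned into a typed value before branching on them. The sketch below assumes only those three strings exist and treats anything else as unrecognized. ScoreKind, parse_interpretation, and usable_for_calibration are hypothetical names introduced for illustration, not part of the crate.

```rust
/// Hypothetical typed view of the documented interpretation strings.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScoreKind {
    Probability,
    HeuristicScore,
    Binary,
}

/// Parse a string as returned by confidence_interpretation().
/// Unrecognized strings yield None rather than a guess.
pub fn parse_interpretation(s: &str) -> Option<ScoreKind> {
    match s {
        "probability" => Some(ScoreKind::Probability),
        "heuristic_score" => Some(ScoreKind::HeuristicScore),
        "binary" => Some(ScoreKind::Binary),
        _ => None,
    }
}

/// Only scores that approximate P(correct) are meaningful inputs
/// to calibration metrics.
pub fn usable_for_calibration(s: &str) -> bool {
    matches!(parse_interpretation(s), Some(ScoreKind::Probability))
}
```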
Trait Implementations§
impl Clone for ExtractionMethod
fn clone(&self) -> ExtractionMethod
fn clone_from(&mut self, source: &Self)