```rust
#[non_exhaustive]
pub enum ExtractionMethod {
    Pattern,
    Neural,
    Lexicon,
    SoftLexicon,
    GatedEnsemble,
    Consensus,
    Heuristic,
    Unknown,
    Rule,
    ML,
    Ensemble,
}
```
Extraction method used to identify an entity.
§Research Context
Different extraction methods have different strengths:
| Method | Precision | Recall | Generalization | Use Case |
|---|---|---|---|---|
| Pattern | Very High | Low | N/A (format-based) | Dates, emails, money |
| Neural | High | High | Good | General NER |
| Lexicon | Very High | Low | None | Closed-domain entities |
| SoftLexicon | Medium | High | Good for rare types | Low-resource NER |
| GatedEnsemble | Highest | Highest | Contextual | Short texts, domain shift |
See docs/ for repo-local notes and entry points.
§Variants (Non-exhaustive)
This enum is marked as non-exhaustive.
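For a `#[non_exhaustive]` enum, a `match` written outside the defining crate must include a catch-all arm so that it keeps compiling when new variants are added. A minimal sketch, assuming a hypothetical local mirror `Method` with only a few variants and a hypothetical `label` helper (inside a single crate the attribute does not force the wildcard, so here it only illustrates the downstream pattern):

```rust
/// Hypothetical local mirror of part of `ExtractionMethod`, for illustration only.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Method {
    Pattern,
    Neural,
    Heuristic,
    Unknown,
}

/// Hypothetical helper mapping a method to a short display label.
pub fn label(method: Method) -> &'static str {
    match method {
        Method::Pattern => "pattern",
        Method::Neural => "neural",
        // Catch-all arm: covers Heuristic, Unknown, and any variant added later.
        // Downstream crates need this arm for a #[non_exhaustive] enum.
        _ => "other",
    }
}
```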
Pattern
Regex pattern matching (high precision for structured data such as dates and money). Does not generalize: it only detects format-based entities.
Neural
Neural model inference (BERT, GLiNER, etc.). The recommended default for general NER. Generalizes to unseen entities.
Lexicon
Deprecated: use Neural or GatedEnsemble instead.
Exact lexicon/gazetteer lookup (deprecated approach). High precision on known entities, zero recall on novel entities. Only use for closed domains (stock tickers, medical codes).
SoftLexicon
Embedding-based soft lexicon matching. Useful for low-resource languages and rare entity types. See: Rijhwani et al. (2020) “Soft Gazetteers for Low-Resource NER”
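To contrast soft matching with exact lookup, the sketch below scores a mention embedding against lexicon entry embeddings by cosine similarity and keeps the best entry above a threshold, so a near-miss can still match where an exact lookup would return nothing. The toy vectors, the `soft_match` name, and the threshold are assumptions; this is a generic embedding-similarity illustration, not the feature construction from the cited paper.

```rust
/// Cosine similarity between two equal-length vectors (0.0 if either is all zeros).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Hypothetical soft lexicon lookup: returns the best-matching entry label and its
/// similarity, provided the similarity reaches `threshold`.
fn soft_match<'a>(
    mention: &[f32],
    lexicon: &[(&'a str, Vec<f32>)],
    threshold: f32,
) -> Option<(&'a str, f32)> {
    lexicon
        .iter()
        .map(|(label, emb)| (*label, cosine(mention, emb)))
        .filter(|&(_, sim)| sim >= threshold)
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
}
```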
GatedEnsemble
Gated ensemble: neural + lexicon with learned weighting. The model learns when to trust the lexicon versus the context. See: Meng et al. (2021) “GEMNET: Effective Gated Gazetteer Representations”
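The gating idea can be illustrated with a single scalar gate: a sigmoid of a learned, context-dependent logit decides how much weight the lexicon signal gets relative to the neural signal. This is a deliberately simplified sketch; the `gated_score` name and the scalar, score-level formulation are assumptions, not the architecture from the cited paper.

```rust
/// Logistic sigmoid.
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// Hypothetical gated combination of a lexicon score and a neural score.
/// `gate_logit` stands in for a learned function of the context: a large value
/// means "trust the lexicon", a very negative value means "trust the context".
fn gated_score(lexicon_score: f64, neural_score: f64, gate_logit: f64) -> f64 {
    let g = sigmoid(gate_logit);
    g * lexicon_score + (1.0 - g) * neural_score
}
```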
Consensus
Multiple methods agreed on this entity (high confidence).
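One way to read "multiple methods agreed" is as a vote over identical spans. The sketch below counts how many methods produced the same (start, end, label) triple and keeps those that reach a minimum number of votes. The `Span` type, the `consensus` function, and the exact-match agreement rule are assumptions for illustration; real systems may also accept overlapping spans.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical span: byte offsets plus an entity label.
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
struct Span {
    start: usize,
    end: usize,
    label: String,
}

/// Returns the spans proposed by at least `min_votes` methods.
/// Each inner Vec holds one method's predictions.
fn consensus(per_method: &[Vec<Span>], min_votes: usize) -> Vec<Span> {
    let mut votes: HashMap<&Span, usize> = HashMap::new();
    for preds in per_method {
        // Deduplicate within a single method so it cannot vote twice.
        let unique: HashSet<&Span> = preds.iter().collect();
        for span in unique {
            *votes.entry(span).or_insert(0) += 1;
        }
    }
    let mut agreed: Vec<Span> = votes
        .into_iter()
        .filter(|&(_, n)| n >= min_votes)
        .map(|(s, _)| s.clone())
        .collect();
    // HashMap iteration order is unspecified, so sort for deterministic output.
    agreed.sort();
    agreed
}
```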
Heuristic
Heuristic-based extraction (capitalization, word shape, context). Used by heuristic backends that don’t use neural models.
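Word shape is a common heuristic feature: each character is mapped to a class (uppercase, lowercase, digit, other), so "Apple" becomes "Xxxxx" and "2021" becomes "dddd". A minimal sketch of that mapping plus a crude capitalization rule; the function names and the "capitalized and not sentence-initial" rule are assumptions, not this crate's heuristic backend.

```rust
/// Maps each character to a shape class: 'X' upper, 'x' lower, 'd' ASCII digit,
/// and any other character is kept as-is.
fn word_shape(word: &str) -> String {
    word.chars()
        .map(|c| {
            if c.is_uppercase() {
                'X'
            } else if c.is_lowercase() {
                'x'
            } else if c.is_ascii_digit() {
                'd'
            } else {
                c
            }
        })
        .collect()
}

/// Hypothetical rule: a capitalized token that is not the first token of a
/// sentence is flagged as a possible entity.
fn looks_like_entity(word: &str, sentence_initial: bool) -> bool {
    !sentence_initial && word.chars().next().map_or(false, |c| c.is_uppercase())
}
```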
Unknown
Unknown or unspecified extraction method.
Rule
Deprecated: use Heuristic or Pattern instead.
Legacy rule-based extraction (for backward compatibility).
ML
Deprecated: use Neural instead.
Legacy alias for Neural (for backward compatibility).
Ensemble
Deprecated: use Consensus instead.
Legacy alias for Consensus (for backward compatibility).
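When reading data that may still use the legacy variants, it can help to normalize them to their recommended replacements first. The mapping below follows the deprecation notes for ML and Ensemble; mapping Rule to Heuristic is an assumption, since either Heuristic or Pattern is allowed. Lexicon is left unchanged because it is a distinct method rather than an alias. The local enum mirror and the `canonical` function are hypothetical.

```rust
/// Local mirror of `ExtractionMethod`, for illustration only
/// (the real enum's deprecation attributes are omitted).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExtractionMethod {
    Pattern,
    Neural,
    Lexicon,
    SoftLexicon,
    GatedEnsemble,
    Consensus,
    Heuristic,
    Unknown,
    Rule,
    ML,
    Ensemble,
}

/// Hypothetical helper: replaces legacy aliases with their recommended variants.
fn canonical(m: ExtractionMethod) -> ExtractionMethod {
    match m {
        ExtractionMethod::ML => ExtractionMethod::Neural,
        ExtractionMethod::Ensemble => ExtractionMethod::Consensus,
        // Assumption: Heuristic is picked over Pattern for legacy Rule.
        ExtractionMethod::Rule => ExtractionMethod::Heuristic,
        // Every other variant, including Lexicon, is returned unchanged.
        other => other,
    }
}
```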
§Implementations
impl ExtractionMethod
pub const fn is_calibrated(&self) -> bool
Returns true if this extraction method produces probabilistic confidence scores that are suitable for calibration analysis, such as expected calibration error (ECE) or the Brier score.
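For reference, the Brier score is the mean squared difference between the predicted confidence and the 0/1 outcome. Expected calibration error (ECE) bins predictions by confidence and averages, weighted by bin size, the gap between each bin's mean confidence and its accuracy. A minimal sketch; the function names, the `(confidence, correct)` input shape, and the equal-width binning are choices made here, not this crate's API.

```rust
/// Brier score: mean of (confidence - outcome)^2, where outcome is 1.0 if correct else 0.0.
fn brier_score(preds: &[(f64, bool)]) -> f64 {
    if preds.is_empty() {
        return 0.0;
    }
    let total: f64 = preds
        .iter()
        .map(|&(p, correct)| {
            let y = if correct { 1.0 } else { 0.0 };
            (p - y).powi(2)
        })
        .sum();
    total / preds.len() as f64
}

/// Expected calibration error with `n_bins` equal-width confidence bins over [0, 1].
fn ece(preds: &[(f64, bool)], n_bins: usize) -> f64 {
    if preds.is_empty() || n_bins == 0 {
        return 0.0;
    }
    // Per bin: (sum of confidences, number correct, count).
    let mut bins = vec![(0.0_f64, 0usize, 0usize); n_bins];
    for &(p, correct) in preds {
        // Confidence 1.0 is placed in the last bin.
        let idx = ((p * n_bins as f64) as usize).min(n_bins - 1);
        bins[idx].0 += p;
        bins[idx].1 += correct as usize;
        bins[idx].2 += 1;
    }
    let n = preds.len() as f64;
    bins.iter()
        .filter(|b| b.2 > 0)
        .map(|&(conf_sum, n_correct, count)| {
            let c = count as f64;
            // Bin weight times |mean confidence - accuracy|.
            (c / n) * (conf_sum / c - n_correct as f64 / c).abs()
        })
        .sum()
}
```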
§Calibrated Methods
- Neural: Softmax outputs are intended to be probabilistic, though they may need temperature scaling to be well calibrated
- GatedEnsemble: Produces learned probability estimates
- SoftLexicon: Embedding similarity is pseudo-probabilistic
§Uncalibrated Methods
- Pattern: Binary (match/no-match); confidence is typically hardcoded
- Heuristic: Arbitrary scores from hand-crafted rules
- Lexicon: Binary exact match
- Consensus: Agreement count, not a probability
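The two lists above amount to a simple classification over variants. A sketch of how such a `const fn` could be written, on a hypothetical local mirror containing only the non-deprecated variants; treating `Unknown` as uncalibrated is an assumption (the lists do not cover it), and the crate's real implementation may differ.

```rust
/// Local mirror of the non-deprecated `ExtractionMethod` variants (illustration only).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExtractionMethod {
    Pattern,
    Neural,
    Lexicon,
    SoftLexicon,
    GatedEnsemble,
    Consensus,
    Heuristic,
    Unknown,
}

impl ExtractionMethod {
    /// Sketch of the calibrated/uncalibrated split documented above.
    /// `matches!` expands to a `match`, which is allowed in a `const fn`.
    const fn is_calibrated(&self) -> bool {
        matches!(self, Self::Neural | Self::GatedEnsemble | Self::SoftLexicon)
    }
}
```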
§Example
```rust
use anno_core::ExtractionMethod;

assert!(ExtractionMethod::Neural.is_calibrated());
assert!(!ExtractionMethod::Pattern.is_calibrated());
assert!(!ExtractionMethod::Heuristic.is_calibrated());
```
pub const fn confidence_interpretation(&self) -> &'static str
Returns the confidence interpretation for this extraction method.
This helps users understand what the confidence score means:
- "probability": Score approximates P(correct)
- "heuristic_score": Score is a non-probabilistic quality measure
- "binary": Score is 0 or 1 (or a fixed value for matches)
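A typical use of these strings is to keep only predictions whose score approximates a probability before running calibration metrics. The `Prediction` struct and `calibration_eligible` function below are hypothetical; only the interpretation strings come from the list above.

```rust
/// Hypothetical prediction record carrying the interpretation string of its method.
#[derive(Debug, Clone, PartialEq)]
struct Prediction {
    text: String,
    confidence: f64,
    interpretation: &'static str,
}

/// Keeps only predictions whose confidence approximates P(correct),
/// i.e. those with the "probability" interpretation.
fn calibration_eligible(preds: &[Prediction]) -> Vec<&Prediction> {
    preds
        .iter()
        .filter(|p| p.interpretation == "probability")
        .collect()
}
```

Binary and heuristic scores are dropped rather than rescaled, because neither is a probability estimate that ECE or the Brier score can meaningfully evaluate.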
§Trait Implementations
impl Clone for ExtractionMethod
fn clone(&self) -> ExtractionMethod
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.