Named Entity Recognition (NER) engine — CE-4 GLiNER zero-shot NER.
Two-layer extraction pipeline:
- Rule-based pre-pass — regex extraction of dates, URLs, UUIDs, emails, IPs. Always on, zero latency, no model download required.
- GLiNER ONNX engine — zero-shot NER via GLiNER-medium ONNX INT8 (52 MB). Opt-in per namespace, lazy-loaded on first use.
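The control flow of the two layers can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: `Pipeline`, `StubModel`, and `rule_pass` are hypothetical names, the stub model returns nothing, and the toy rule pass parses whitespace-separated tokens as IPv4 addresses instead of running the regex scan described above.

```rust
use std::collections::HashSet;
use std::net::Ipv4Addr;
use std::sync::OnceLock;

/// Hypothetical stand-in for the model-backed layer. A real engine would
/// hold an ONNX session; this stub only marks that a load happened.
struct StubModel;

impl StubModel {
    fn load() -> Self {
        StubModel
    }

    fn extract(&self, _text: &str) -> Vec<(String, String)> {
        Vec::new()
    }
}

/// Hypothetical two-layer pipeline: the rule pass always runs, the model
/// runs only for opted-in namespaces and is created on first use.
struct Pipeline {
    opted_in: HashSet<String>,
    model: OnceLock<StubModel>,
}

impl Pipeline {
    fn new(namespaces: &[&str]) -> Self {
        Pipeline {
            opted_in: namespaces.iter().map(|s| s.to_string()).collect(),
            model: OnceLock::new(),
        }
    }

    fn model_loaded(&self) -> bool {
        self.model.get().is_some()
    }

    fn extract(&self, namespace: &str, text: &str) -> Vec<(String, String)> {
        // Layer 1: always on, no model required.
        let mut out = rule_pass(text);
        // Layer 2: opt-in per namespace, lazily initialized.
        if self.opted_in.contains(namespace) {
            let model = self.model.get_or_init(StubModel::load);
            out.extend(model.extract(text));
        }
        out
    }
}

/// Toy rule pass (assumption): tokens that parse as IPv4 addresses.
fn rule_pass(text: &str) -> Vec<(String, String)> {
    text.split_whitespace()
        .filter(|tok| tok.parse::<Ipv4Addr>().is_ok())
        .map(|tok| ("ip".to_string(), tok.to_string()))
        .collect()
}
```

Under these assumptions, a namespace that never opted in still gets rule-based entities, and the model is never constructed for it.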
Extracted entities are stored as tags: entity:person:alice, entity:org:anthropic.
Tag values are lowercased and whitespace-normalized for consistent deduplication.
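As a rough sketch of the tag format, the helper below builds an `entity:<type>:<value>` string from a type and a raw value. `entity_tag` and `normalize_value` are hypothetical names, and collapsing whitespace runs to a single space is an assumption about what "whitespace-normalized" means here.

```rust
/// Hypothetical helper: lowercase the value and collapse every run of
/// whitespace into a single space (assumed normalization).
fn normalize_value(value: &str) -> String {
    value
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_lowercase()
}

/// Hypothetical helper: format a tag as `entity:<type>:<value>`.
fn entity_tag(entity_type: &str, value: &str) -> String {
    format!("entity:{}:{}", entity_type, normalize_value(value))
}
```

With this sketch, `entity_tag("person", "  Alice ")` and `entity_tag("person", "alice")` produce the same tag, which is what makes deduplication by tag consistent.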
Structs
- ExtractedEntity - A single extracted entity.
- GlinerEngine - GLiNER zero-shot NER engine backed by ONNX Runtime.
- NerEngine - Unified NER engine combining rule-based and GLiNER extraction.
Functions
- deduplicate_entities - Deduplicate entities by (entity_type, normalized_value), keeping the highest score.
- normalize_label - Normalize an entity type label: lowercase, spaces → underscores.
- rule_based_extract - Run the rule-based pre-pass: O(n) regex scan, zero model overhead.
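The label normalization and deduplication behavior described in this list could look roughly like the following. The signatures and the `Entity` struct with its fields are assumptions made for illustration; the real `ExtractedEntity` layout and function signatures are not shown on this page.

```rust
use std::collections::HashMap;

/// Hypothetical entity shape; the real `ExtractedEntity` fields may differ.
#[derive(Debug, Clone, PartialEq)]
struct Entity {
    entity_type: String,
    value: String,
    score: f32,
}

/// Lowercase the label and replace spaces with underscores.
fn normalize_label(label: &str) -> String {
    label.to_lowercase().replace(' ', "_")
}

/// Keep one entity per (entity_type, normalized value), preferring the
/// highest score. Results are sorted by descending score (an assumption).
fn deduplicate(entities: Vec<Entity>) -> Vec<Entity> {
    let mut best: HashMap<(String, String), Entity> = HashMap::new();
    for e in entities {
        // Assumed value normalization: lowercase, collapse whitespace runs.
        let value = e
            .value
            .split_whitespace()
            .collect::<Vec<_>>()
            .join(" ")
            .to_lowercase();
        let key = (normalize_label(&e.entity_type), value);
        let keep_existing = best.get(&key).map_or(false, |old| old.score >= e.score);
        if !keep_existing {
            best.insert(key, e);
        }
    }
    let mut out: Vec<Entity> = best.into_values().collect();
    out.sort_by(|a, b| b.score.total_cmp(&a.score));
    out
}
```

In this sketch the surviving entity keeps its original, un-normalized `value`; only the deduplication key is normalized.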