Hybrid entity extraction: regex pre-filter + GLiNER zero-shot NER (graceful degradation). Entity and URL extraction pipeline (v1.0.76).
v1.0.76: the default build is LLM-only. The legacy GLiNER NER
pipeline moved to extraction_gliner.rs and is gated behind the
ner-legacy feature. The default build extracts:
- URLs via regex (always available, no model needed).
- Entities via the ExtractionBackend trait (LLM headless). The default backend is LlmBackend (claude / codex), which produces structured entities and relationships via tool-use JSON.
The extract_graph_auto function below is the entry point used by
remember, ingest, and enrich. With the default feature set, it
runs the LLM extraction backend and returns whatever entities the LLM
found. Operators who want the legacy GLiNER NER can build with
--features ner-legacy (transition window only; removed in v1.1.0).
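The two-stage flow described above (a cheap URL pre-pass, then a pluggable entity backend) can be sketched roughly as follows. This is only an illustration, not the crate's code: SketchEntity, SketchResult, SketchBackend, FixedBackend, FailingBackend, and extract_auto_sketch are hypothetical names, the URL pre-pass is a whitespace-token filter standing in for the real regex, and keeping the URLs when the backend fails is an assumption about how degradation might work.

```rust
// Hypothetical stand-ins for illustration; the crate's real types and
// signatures (ExtractionBackend, LlmBackend, extract_graph_auto) may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchEntity {
    pub name: String,
    pub kind: String,
}

#[derive(Debug, Default, PartialEq)]
pub struct SketchResult {
    pub urls: Vec<String>,
    pub entities: Vec<SketchEntity>,
}

/// Any entity extractor: an LLM call, a local NER model, or a test stub.
pub trait SketchBackend {
    fn extract(&self, body: &str) -> Result<Vec<SketchEntity>, String>;
}

/// Test stub that always returns the same entities.
pub struct FixedBackend(pub Vec<SketchEntity>);

impl SketchBackend for FixedBackend {
    fn extract(&self, _body: &str) -> Result<Vec<SketchEntity>, String> {
        Ok(self.0.clone())
    }
}

/// Test stub that simulates an unavailable backend.
pub struct FailingBackend;

impl SketchBackend for FailingBackend {
    fn extract(&self, _body: &str) -> Result<Vec<SketchEntity>, String> {
        Err("backend unavailable".to_string())
    }
}

pub fn extract_auto_sketch(body: &str, backend: &dyn SketchBackend) -> SketchResult {
    // Stage 1: model-free URL pre-pass. A whitespace-token filter stands in
    // for the regex used by the real pipeline.
    let urls = body
        .split_whitespace()
        .filter(|t| t.starts_with("http://") || t.starts_with("https://"))
        .map(str::to_string)
        .collect();

    // Stage 2: delegate entities to the backend. Assumption: on failure we
    // keep the URLs and return no entities instead of propagating the error.
    let entities = backend.extract(body).unwrap_or_default();

    SketchResult { urls, entities }
}
```

Passing the backend as a trait object keeps the entry point independent of which extractor is compiled in, which is one way a feature-gated legacy backend could coexist with the default one.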
Structs
- ExtractedEntity
  One named-entity mention. The default build produces these via the LLM extraction backend; the ner-legacy build produces them via GLiNER.
- ExtractedUrl
  One URL extracted from a body. Always produced by the regex path.
- ExtractionResult
  Full extraction result: URLs (regex), entities (LLM), and the relationships between them. The LLM backend also returns typed relationships directly in ExtractionOutput; this struct is the regex-only baseline that remember and ingest consume.
- RegexExtractor
  Regex-only extractor: URLs and nothing else. Used as a fast pre-pass before the (slower) LLM extractor in extract_graph_auto.
Enums
- GlinerVariant
  GLiNER model variant enum. Only meaningful with the ner-legacy feature. In the default build, the variant is ignored and extraction is delegated to the LLM.
Traits
- Extractor
  Trait abstraction for any extractor. The LLM backend and the GLiNER backend (ner-legacy) both implement it.
Functions
- extract_graph_auto
- extract_urls
  Extracts URLs from body using a substring scan. UTF-8 safe; offsets are byte indices into the input.
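As an illustration of what a UTF-8-safe substring scan with byte offsets can look like, here is a minimal sketch using only the standard library. It is not the crate's extract_urls: UrlSpan and scan_urls are hypothetical names, only the http:// and https:// schemes are recognised, and a URL is assumed to end at the next whitespace character (so trailing punctuation is not trimmed).

```rust
/// Hypothetical span type; `start` and `end` are byte indices into the input.
#[derive(Debug, PartialEq)]
pub struct UrlSpan {
    pub url: String,
    pub start: usize,
    pub end: usize,
}

pub fn scan_urls(body: &str) -> Vec<UrlSpan> {
    let mut out = Vec::new();
    let mut pos = 0;

    while pos < body.len() {
        let rest = &body[pos..];

        // Find the nearest scheme prefix. `str::find` returns a byte offset
        // that always falls on a char boundary.
        let next = match (rest.find("http://"), rest.find("https://")) {
            (Some(a), Some(b)) => a.min(b),
            (Some(a), None) => a,
            (None, Some(b)) => b,
            (None, None) => break,
        };

        let start = pos + next;
        let tail = &body[start..];

        // Assumption: the URL runs until the next whitespace (or end of input).
        let len = tail
            .find(|c: char| c.is_whitespace())
            .unwrap_or(tail.len());
        let end = start + len;

        out.push(UrlSpan {
            url: body[start..end].to_string(),
            start,
            end,
        });

        // `end` is a char boundary, so slicing from it on the next pass is safe.
        pos = end;
    }

    out
}
```

Because every offset comes from `str::find`, slicing the original input with `&body[span.start..span.end]` cannot panic on a multi-byte character, and the offsets count bytes, not characters: in "é see https://example.com/a" the URL starts at byte 7, since "é" occupies two bytes.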