Entity and URL extraction pipeline (v1.0.76). The original hybrid design (regex pre-filter plus GLiNER zero-shot NER with graceful degradation) has been superseded, as the version notes that follow describe.
v1.0.76: the default build is LLM-only. v1.0.79: the legacy GLiNER
NER pipeline (extraction_gliner.rs, ner-legacy feature) was REMOVED
entirely. The build extracts:
- URLs via regex (always available, no model needed).
- Entities via the ExtractionBackend trait (LLM headless). The default backend is LlmBackend (claude / codex), which produces structured entities and relationships via tool-use JSON.
The extract_graph_auto function below is the entry point used by
remember, ingest, and enrich. With the default feature set, it
runs the regex URL pass; entity extraction happens through the LLM
extraction backend selected at the command layer.
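The split described above (a URL pass that always runs, plus an entity backend chosen by the caller) can be sketched roughly as follows. This is only an illustration under stated assumptions: `SketchEntity`, `SketchResult`, `SketchBackend`, `url_pass`, and `extract_sketch` are hypothetical names, not this crate's API; a whitespace-token check stands in for the real regex; and the fallback to URL-only output when the backend fails is an assumed policy.

```rust
// Hypothetical sketch: none of these names come from the crate itself.

/// Illustrative entity record (field names are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub struct SketchEntity {
    pub name: String,
    pub kind: String,
}

/// Illustrative combined result: URLs from the cheap pass, entities from the backend.
#[derive(Debug, Default, PartialEq)]
pub struct SketchResult {
    pub urls: Vec<String>,
    pub entities: Vec<SketchEntity>,
}

/// Stand-in for an entity backend selected by the caller (for example an LLM wrapper).
pub trait SketchBackend {
    fn extract_entities(&self, body: &str) -> Result<Vec<SketchEntity>, String>;
}

/// Cheap URL pass that needs no model. A whitespace-token check stands in
/// for the real regex here.
pub fn url_pass(body: &str) -> Vec<String> {
    body.split_whitespace()
        .filter(|tok| tok.starts_with("http://") || tok.starts_with("https://"))
        .map(str::to_string)
        .collect()
}

/// Always runs the URL pass. Entities are requested only when a backend is
/// supplied; if the backend fails, the result degrades to URLs only
/// (an assumed policy, not confirmed by the source).
pub fn extract_sketch(body: &str, backend: Option<&dyn SketchBackend>) -> SketchResult {
    let urls = url_pass(body);
    let entities = match backend {
        Some(b) => b.extract_entities(body).unwrap_or_default(),
        None => Vec::new(),
    };
    SketchResult { urls, entities }
}

/// Test double that always fails, to exercise the fallback path.
pub struct FailingBackend;

impl SketchBackend for FailingBackend {
    fn extract_entities(&self, _body: &str) -> Result<Vec<SketchEntity>, String> {
        Err("backend unavailable".to_string())
    }
}

/// Test double that returns one canned entity.
pub struct FixedBackend;

impl SketchBackend for FixedBackend {
    fn extract_entities(&self, _body: &str) -> Result<Vec<SketchEntity>, String> {
        Ok(vec![SketchEntity { name: "Ada".to_string(), kind: "person".to_string() }])
    }
}
```

Passing the backend as an `Option<&dyn SketchBackend>` keeps the URL pass usable when no model is configured, which mirrors the "always available" property stated for the regex path.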
Structs
- ExtractedEntity - One named-entity mention. The default build produces these via the LLM extraction backend; the ner-legacy build, which produced them via GLiNER, was removed in v1.0.79.
- ExtractedUrl - One URL extracted from a body. Always produced by the regex path.
- ExtractionResult - Full extraction result: URLs (regex), entities (LLM), and the
relationships between them. The LLM backend also returns typed
relationships directly in ExtractionOutput; this struct is the regex-only
baseline that remember and ingest consume.
- RegexExtractor - Regex-only extractor: URLs and nothing else. Used as a fast
pre-pass before the (slower) LLM extractor in extract_graph_auto.
Enums
- GlinerVariant - GLiNER model variant enum. Vestigial since v1.0.79: the
ner-legacy feature was removed, so the variant is parsed for CLI compatibility and then ignored (extraction is URL-regex or LLM-delegated).
Traits
- Extractor - Trait abstraction for any extractor. The LLM backend implements it; the GLiNER backend (ner-legacy) also did until its removal in v1.0.79.
Functions
- extract_graph_auto - Top-level extraction entry point used by remember, ingest, and enrich. Runs the regex URL pass (always available); the legacy GLiNER delegation was removed in v1.0.79 together with the ner-legacy feature.
- extract_urls - Extracts URLs from body using a substring scan. UTF-8 safe; offsets are byte indices into the input.
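To illustrate what a UTF-8-safe substring scan with byte offsets might look like, here is a minimal sketch using only the standard library. It is not the crate's implementation: `UrlSpan` and `scan_urls` are hypothetical names, and the rule that a URL runs to the next whitespace character is a simplifying assumption.

```rust
/// Hypothetical record: a URL plus its byte range in the input.
#[derive(Debug, PartialEq)]
pub struct UrlSpan {
    pub url: String,
    pub start: usize, // byte offset, inclusive
    pub end: usize,   // byte offset, exclusive
}

/// Scans `body` for `http://` and `https://` prefixes and reads each URL up
/// to the next whitespace character (a simplification for this sketch).
pub fn scan_urls(body: &str) -> Vec<UrlSpan> {
    let mut spans = Vec::new();
    let mut pos = 0;
    // `str::find` returns byte offsets that fall on char boundaries, so every
    // slice taken below is valid even when the input contains multi-byte text.
    while let Some(rel) = body[pos..].find("http") {
        let start = pos + rel;
        let rest = &body[start..];
        if rest.starts_with("http://") || rest.starts_with("https://") {
            let len = rest.find(char::is_whitespace).unwrap_or(rest.len());
            spans.push(UrlSpan {
                url: rest[..len].to_string(),
                start,
                end: start + len,
            });
            pos = start + len;
        } else {
            // Not a URL scheme: skip past this "http" (4 ASCII bytes).
            pos = start + 4;
        }
    }
    spans
}
```

Because the offsets are byte indices, a caller can recover the matched text with `&body[span.start..span.end]` without re-scanning, and any multi-byte characters before the URL shift the offsets by their encoded length rather than their character count.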