Module extraction


Entity and URL extraction pipeline (v1.0.76). The earlier hybrid design (regex pre-filter + GLiNER zero-shot NER, with graceful degradation) now applies only to the ner-legacy build.

As of v1.0.76, the default build uses only the LLM for entity extraction. The legacy GLiNER NER pipeline moved to extraction_gliner.rs and is gated behind the ner-legacy feature. The default build extracts:

URLs via regex (always available, no model needed).
Entities via the ExtractionBackend trait, backed by a headless LLM. The default backend is LlmBackend (claude / codex), which produces structured entities and relationships via tool-use JSON.

The extract_graph_auto function below is the entry point used by remember, ingest, and enrich. With the default feature set, it runs the LLM extraction backend and returns whatever entities the LLM found. Operators who want the legacy GLiNER NER can build with --features ner-legacy. That feature exists only for the transition window and is scheduled for removal in v1.1.0.
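
The compile-time switch described above can be pictured with Rust's `cfg` attributes. This is only a sketch of the gating pattern: the feature name `ner-legacy` comes from the text, while `backend_name` and its return values are hypothetical and not part of this module.

```rust
// Hypothetical helper, for illustration only: reports which backend a build
// would use. Exactly one of the two definitions is compiled.

/// Compiled only when the crate is built with `--features ner-legacy`.
#[cfg(feature = "ner-legacy")]
pub fn backend_name() -> &'static str {
    "gliner"
}

/// Compiled in the default build, where the feature is off.
#[cfg(not(feature = "ner-legacy"))]
pub fn backend_name() -> &'static str {
    "llm"
}
```

Because the choice is made at compile time, a default build carries no GLiNER code path at all, which fits the statement that no model is needed unless the feature is enabled.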

Structs

ExtractedEntity: One named-entity mention. The default build produces these via the LLM extraction backend; the ner-legacy build produces them via GLiNER.
ExtractedUrl: One URL extracted from a body. Always produced by the regex path.
ExtractionResult: Full extraction result: URLs (regex), entities (LLM), and the relationships between them. The LLM backend also returns typed relationships directly in ExtractionOutput; this struct is the regex-only baseline that remember and ingest consume.
RegexExtractor: Regex-only extractor: URLs and nothing else. Used as a fast pre-pass before the (slower) LLM extractor in extract_graph_auto.
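
The pre-pass arrangement described for RegexExtractor can be sketched as a cheap URL pass followed by a slower, fallible backend. Everything below is an assumption made for illustration: `GraphSketch`, `url_prepass`, and `extract_graph_sketch` are hypothetical names, a whitespace-token prefix check stands in for the regex, and keeping the URLs when the backend fails is one plausible policy, not a documented behavior of extract_graph_auto.

```rust
/// Hypothetical result type; the real ExtractionResult may differ.
#[derive(Debug, Default, PartialEq)]
pub struct GraphSketch {
    pub urls: Vec<String>,
    pub entities: Vec<String>,
}

/// Cheap pre-pass: whitespace-separated tokens that start with an HTTP(S)
/// scheme. A stand-in for the regex path; no model is involved.
fn url_prepass(body: &str) -> Vec<String> {
    body.split_whitespace()
        .filter(|t| t.starts_with("http://") || t.starts_with("https://"))
        .map(str::to_string)
        .collect()
}

/// Runs the pre-pass first, then the slower backend supplied by the caller.
pub fn extract_graph_sketch<F>(body: &str, slow_backend: F) -> GraphSketch
where
    F: Fn(&str) -> Result<Vec<String>, String>,
{
    let urls = url_prepass(body);
    // Assumed policy: a backend error yields no entities but keeps the URLs.
    let entities = slow_backend(body).unwrap_or_default();
    GraphSketch { urls, entities }
}
```

Passing the backend as a closure keeps the sketch self-contained; a caller could supply `|_| Err("backend unavailable".to_string())` to see that the URL list survives a backend failure.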

Enums

GlinerVariant: GLiNER model variant enum. Only meaningful with the ner-legacy feature. In the default build, the variant is ignored and extraction is delegated to the LLM.

Traits

Extractor: Trait abstraction for any extractor. The LLM backend and the GLiNER backend (ner-legacy) both implement it.
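
To show what a trait abstraction over interchangeable extractors might look like, here is a minimal sketch. The trait's real method set is not given on this page, so `ExtractorSketch`, its `name` and `extract` methods, `EntitySketch`, and both toy implementors are hypothetical; neither implementor calls an LLM or GLiNER.

```rust
/// Hypothetical entity type; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct EntitySketch {
    pub name: String,
    pub kind: String,
}

/// Hypothetical trait shape: any backend that turns text into entities.
pub trait ExtractorSketch {
    fn name(&self) -> &'static str;
    fn extract(&self, body: &str) -> Vec<EntitySketch>;
}

/// Toy backend: treats every capitalized word as an entity.
pub struct CapitalizedWordBackend;

impl ExtractorSketch for CapitalizedWordBackend {
    fn name(&self) -> &'static str {
        "capitalized-words"
    }

    fn extract(&self, body: &str) -> Vec<EntitySketch> {
        body.split_whitespace()
            // Strip surrounding punctuation such as a trailing period.
            .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()))
            .filter(|w| w.chars().next().map_or(false, char::is_uppercase))
            .map(|w| EntitySketch { name: w.to_string(), kind: "unknown".to_string() })
            .collect()
    }
}

/// Toy backend: returns a fixed list regardless of the input.
pub struct FixedBackend(pub Vec<EntitySketch>);

impl ExtractorSketch for FixedBackend {
    fn name(&self) -> &'static str {
        "fixed"
    }

    fn extract(&self, _body: &str) -> Vec<EntitySketch> {
        self.0.clone()
    }
}

/// Callers depend only on the trait, so backends can be swapped freely.
pub fn run_all(backends: &[Box<dyn ExtractorSketch>], body: &str) -> Vec<(&'static str, usize)> {
    backends.iter().map(|b| (b.name(), b.extract(body).len())).collect()
}
```

Code that consumes the trait never names a concrete backend, which is the property that would let a default backend and a feature-gated one coexist behind one interface.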

Functions

extract_graph_auto
extract_urls: Extracts URLs from body using a substring scan. UTF-8 safe; offsets are byte indices into the input.
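
A substring scan that reports byte offsets could look like the sketch below. It is not the crate's implementation: `UrlSpan` and `scan_urls` are hypothetical names, and the choice of schemes, the whitespace terminator, and the trailing-punctuation trimming are all assumptions.

```rust
/// Hypothetical stand-in for ExtractedUrl; field names are assumptions.
#[derive(Debug, PartialEq)]
pub struct UrlSpan {
    pub url: String,
    pub start: usize, // byte offset into the input, inclusive
    pub end: usize,   // byte offset into the input, exclusive
}

pub fn scan_urls(body: &str) -> Vec<UrlSpan> {
    const SCHEMES: [&str; 2] = ["http://", "https://"];
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < body.len() {
        // Nearest scheme occurrence at or after `pos`. `str::find` returns
        // byte offsets that always sit on char boundaries.
        let next = SCHEMES
            .iter()
            .filter_map(|s| body[pos..].find(s).map(|i| (pos + i, s.len())))
            .min_by_key(|&(i, _)| i);
        let (start, scheme_len) = match next {
            Some(hit) => hit,
            None => break,
        };
        // The candidate runs until the next whitespace or the end of input.
        let rest = &body[start..];
        let len = rest.find(char::is_whitespace).unwrap_or(rest.len());
        let mut end = start + len;
        // Trim trailing ASCII punctuation; removing an ASCII byte cannot
        // split a multi-byte character, so `end` stays on a boundary.
        while end > start + scheme_len
            && matches!(body.as_bytes()[end - 1], b'.' | b',' | b';' | b')' | b'!' | b'?')
        {
            end -= 1;
        }
        // Skip a bare scheme with nothing after it.
        if end > start + scheme_len {
            out.push(UrlSpan { url: body[start..end].to_string(), start, end });
        }
        // `len` is at least the scheme length, so the scan always advances.
        pos = start + len;
    }
    out
}
```

Since the offsets are byte indices, `&body[span.start..span.end]` yields the URL text even when multi-byte characters appear earlier in the input.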