Module extraction


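Entity and URL extraction pipeline (v1.0.76): a regex URL pass plus LLM-backed entity extraction. The module was originally a hybrid of a regex pre-filter and GLiNER zero-shot NER (with graceful degradation); the GLiNER half no longer exists.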

v1.0.76: the default build is LLM-only. v1.0.79: the legacy GLiNER NER pipeline (extraction_gliner.rs, ner-legacy feature) was removed entirely. The build now extracts:

URLs via regex (always available, no model needed).
Entities via the ExtractionBackend trait (headless LLM). The default backend is LlmBackend (claude / codex), which produces structured entities and relationships via tool-use JSON.

The extract_graph_auto function below is the entry point used by remember, ingest, and enrich. With the default feature set, it runs the regex URL pass; entity extraction happens through the LLM extraction backend selected at the command layer.
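The split described above (a URL pass that always runs, plus entity extraction delegated to a backend chosen by the caller) can be sketched roughly as follows. This is a minimal illustration, not the crate's code: `EntityBackend`, `BackendOutput`, `Baseline`, `url_pass`, `run_pipeline`, and the two stub backends are hypothetical names, the whitespace-token URL pass is a stand-in for the real regex, and falling back to the URL-only baseline when the backend fails is an assumption.

```rust
// Hypothetical types; the real ExtractedEntity / ExtractionResult may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entity {
    pub name: String,
    pub kind: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Relationship {
    pub source: String,
    pub target: String,
    pub relation: String,
}

#[derive(Debug, Default)]
pub struct BackendOutput {
    pub entities: Vec<Entity>,
    pub relationships: Vec<Relationship>,
}

/// Hypothetical stand-in for the ExtractionBackend trait.
pub trait EntityBackend {
    fn extract(&self, body: &str) -> Result<BackendOutput, String>;
}

#[derive(Debug, Default)]
pub struct Baseline {
    pub urls: Vec<String>,
    pub entities: Vec<Entity>,
    pub relationships: Vec<Relationship>,
}

/// Stand-in for the regex URL pass: keeps whitespace-separated tokens
/// that start with an http(s) scheme. Needs no model.
pub fn url_pass(body: &str) -> Vec<String> {
    body.split_whitespace()
        .filter(|t| t.starts_with("http://") || t.starts_with("https://"))
        .map(str::to_string)
        .collect()
}

/// Command-layer flow: URLs first, then entities from the selected backend.
/// Assumption: a backend error leaves the URL-only baseline intact.
pub fn run_pipeline(body: &str, backend: &dyn EntityBackend) -> Baseline {
    let mut result = Baseline {
        urls: url_pass(body),
        ..Baseline::default()
    };
    if let Ok(out) = backend.extract(body) {
        result.entities = out.entities;
        result.relationships = out.relationships;
    }
    result
}

/// Test double that always returns one fixed entity.
pub struct StubBackend;

impl EntityBackend for StubBackend {
    fn extract(&self, _body: &str) -> Result<BackendOutput, String> {
        Ok(BackendOutput {
            entities: vec![Entity { name: "Ada".into(), kind: "person".into() }],
            relationships: Vec::new(),
        })
    }
}

/// Test double that simulates an unavailable LLM.
pub struct FailingBackend;

impl EntityBackend for FailingBackend {
    fn extract(&self, _body: &str) -> Result<BackendOutput, String> {
        Err("backend unavailable".to_string())
    }
}
```

Keeping the URL pass independent of the backend means a caller still gets a usable result when no LLM is reachable, which matches the "always available, no model needed" note for URLs.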

Structs§

ExtractedEntity: One named-entity mention. The build produces these via the LLM extraction backend; the ner-legacy build, which produced them via GLiNER, was removed in v1.0.79.
ExtractedUrl: One URL extracted from a body. Always produced by the regex path.
ExtractionResult: Full extraction result: URLs (regex), entities (LLM), and the relationships between them. The LLM backend also returns typed relationships directly in ExtractionOutput; this struct is the regex-only baseline that remember and ingest consume.
RegexExtractor: Regex-only extractor: URLs and nothing else. Used as a fast pre-pass before the (slower) LLM extractor in extract_graph_auto.

Enums§

GlinerVariant: GLiNER model variant enum. Vestigial since v1.0.79: the ner-legacy feature was removed, so the variant is parsed for CLI compatibility and then ignored (extraction is URL-regex or LLM-delegated).
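A "parsed for compatibility, then ignored" option can be sketched as below. Everything here is hypothetical: `LegacyVariant` and its `Small` / `Large` variants are placeholder names (the real GlinerVariant variants are not listed on this page), `Mode` and `resolve_mode` are invented for illustration, and rejecting unknown values is an assumption about how the CLI behaves.

```rust
use std::str::FromStr;

/// Placeholder for a vestigial model-variant enum.
/// The variant names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LegacyVariant {
    Small,
    Large,
}

impl FromStr for LegacyVariant {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "small" => Ok(Self::Small),
            "large" => Ok(Self::Large),
            other => Err(format!("unknown variant: {other}")),
        }
    }
}

/// The two extraction paths that remain once the NER model is gone.
#[derive(Debug, PartialEq, Eq)]
pub enum Mode {
    UrlRegexOnly,
    LlmDelegated,
}

/// Validates the legacy flag so old command lines keep working,
/// then picks the mode without consulting the parsed value.
pub fn resolve_mode(variant_flag: Option<&str>, llm_available: bool) -> Result<Mode, String> {
    if let Some(raw) = variant_flag {
        // Parsed only for validation; the value has no effect.
        let _ignored: LegacyVariant = raw.parse()?;
    }
    Ok(if llm_available {
        Mode::LlmDelegated
    } else {
        Mode::UrlRegexOnly
    })
}
```

With this shape, passing the old flag never changes the outcome, so existing scripts keep running while the option itself carries no behavior.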

Traits§

Extractor: Trait abstraction for any extractor. The LLM backend implements it; the GLiNER backend (ner-legacy) also did until its removal in v1.0.79.

Functions§

extract_graph_auto: Top-level extraction entry point used by remember, ingest, and enrich. Runs the regex URL pass (always available); the legacy GLiNER delegation was removed in v1.0.79 together with the ner-legacy feature.
extract_urls: Extracts URLs from body using a substring scan. UTF-8 safe; offsets are byte indices into the input.
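A substring scan that reports byte offsets and stays UTF-8 safe might look like the sketch below. It is not the crate's implementation: `UrlHit` and `scan_urls` are hypothetical names, and the scheme list, the delimiter set, and the trailing-punctuation trimming are assumptions chosen for illustration.

```rust
/// Hypothetical shape of one hit; the real ExtractedUrl may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UrlHit {
    pub url: String,
    /// Byte offset of the first byte of the URL in the input.
    pub start: usize,
    /// Byte offset one past the last byte of the URL.
    pub end: usize,
}

/// Scans `body` for http(s) scheme prefixes and returns each URL with
/// byte offsets. All slicing happens on char boundaries, so multi-byte
/// text before or after a URL cannot cause a panic.
pub fn scan_urls(body: &str) -> Vec<UrlHit> {
    const SCHEMES: [&str; 2] = ["https://", "http://"];
    let mut hits = Vec::new();
    let mut pos = 0;

    while pos < body.len() {
        // Nearest scheme prefix at or after `pos`.
        let next = SCHEMES
            .iter()
            .filter_map(|s| body[pos..].find(s).map(|i| (pos + i, s.len())))
            .min_by_key(|&(i, _)| i);
        let Some((start, scheme_len)) = next else { break };

        // The URL runs until whitespace or an obvious delimiter.
        let rest = &body[start..];
        let len = rest
            .char_indices()
            .find(|&(_, c)| c.is_whitespace() || matches!(c, '<' | '>' | '"'))
            .map_or(rest.len(), |(i, _)| i);
        let mut end = start + len;

        // Drop sentence punctuation that trails the URL.
        while end > start + scheme_len {
            match body[..end].chars().next_back() {
                Some(c) if matches!(c, '.' | ',' | ';' | ':' | '!' | '?' | ')') => {
                    end -= c.len_utf8();
                }
                _ => break,
            }
        }

        // Skip a bare scheme with nothing after it.
        if end > start + scheme_len {
            hits.push(UrlHit {
                url: body[start..end].to_string(),
                start,
                end,
            });
        }
        // Always advance past the scheme so the loop terminates.
        pos = end.max(start + scheme_len);
    }
    hits
}
```

Because the offsets are byte indices, `&body[hit.start..hit.end]` recovers the URL directly, but they should not be treated as character positions when the input contains non-ASCII text.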