
Crate mnem_extract

§mnem-extract

Statistical, embedding-based entity + relation extraction for mnem.

This crate is the default path for experiment E3 of the GraphRAG research track: it replaces LLM-driven NER with a KeyBERT-style candidate-scoring pass over the chunk embedding that mnem-ingest already computes. That keeps extraction deterministic, fully offline, and free of per-call LLM cost at ingest time.
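To make the candidate-scoring idea concrete, here is a minimal sketch. It assumes punctuation/whitespace tokenization, lowercased word n-grams as candidates, and cosine similarity against the chunk embedding as the score. The `embed` closure is a hypothetical stand-in for the caller's embedder, and `candidate_ngrams`, `cosine` and `score_candidates` are illustrative names, not this crate's API. The real `KeyBertExtractor` may generate and filter candidates differently.

```rust
/// Split `text` into lowercase word tokens and return every contiguous
/// n-gram of 1..=max_n words, deduplicated, ordered by length then position.
fn candidate_ngrams(text: &str, max_n: usize) -> Vec<String> {
    let words: Vec<String> = text
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect();
    let mut out: Vec<String> = Vec::new();
    for n in 1..=max_n {
        for window in words.windows(n) {
            let gram = window.join(" ");
            if !out.contains(&gram) {
                out.push(gram);
            }
        }
    }
    out
}

/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Score every candidate against the precomputed chunk embedding.
/// `embed` is a hypothetical stand-in for the caller-configured embedder.
fn score_candidates<F>(text: &str, chunk_embedding: &[f32], max_n: usize, embed: F) -> Vec<(String, f32)>
where
    F: Fn(&str) -> Vec<f32>,
{
    let mut scored: Vec<(String, f32)> = candidate_ngrams(text, max_n)
        .into_iter()
        .map(|c| {
            let s = cosine(&embed(c.as_str()), chunk_embedding);
            (c, s)
        })
        .collect();
    // Highest similarity first; ties broken by the candidate string so the order is stable.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    scored
}
```

The fixed string tiebreak matters because two candidates can easily receive identical scores, and an unstable order there would break run-to-run reproducibility.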

§Scope

  • traits::Extractor - pluggable extractor surface. One default implementation (keybert::KeyBertExtractor) ships with the crate; callers can swap in hand-authored or LLM-backed extractors by implementing the trait themselves.
  • keybert::KeyBertExtractor - KeyBERT-style n-gram ranking against a supplied chunk embedding, with MMR (Maximal Marginal Relevance) diversification and deterministic tiebreaks.
  • cooccurrence::mine_relations - PMI-weighted co-occurrence relation miner that emits one traits::Relation per sentence-local entity pair whose pointwise mutual information exceeds a configurable threshold.
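MMR trades relevance against redundancy: at each step it picks the candidate c that maximizes `lambda * sim(c, chunk) - (1 - lambda) * max over selected s of sim(c, s)`. The sketch below assumes the textbook formula, precomputed similarities, and ties broken by the lowest candidate index. `mmr_select` is a hypothetical name, and the crate's actual lambda default and tiebreak rule are not stated on this page.

```rust
/// Hypothetical MMR selection sketch.
/// `doc_sim[i]`: similarity of candidate i to the chunk embedding.
/// `pair_sim[i][j]`: similarity between candidates i and j.
/// Returns up to `k` candidate indices in selection order.
fn mmr_select(doc_sim: &[f32], pair_sim: &[Vec<f32>], k: usize, lambda: f32) -> Vec<usize> {
    let mut selected: Vec<usize> = Vec::new();
    let mut remaining: Vec<usize> = (0..doc_sim.len()).collect();
    while selected.len() < k && !remaining.is_empty() {
        let mut best_pos = 0;
        let mut best_score = f32::NEG_INFINITY;
        for (pos, &i) in remaining.iter().enumerate() {
            // Redundancy: highest similarity to any already-selected candidate (0 if none yet).
            let redundancy = if selected.is_empty() {
                0.0
            } else {
                selected.iter().map(|&j| pair_sim[i][j]).fold(f32::NEG_INFINITY, f32::max)
            };
            let score = lambda * doc_sim[i] - (1.0 - lambda) * redundancy;
            // Strict `>` keeps the earliest candidate on ties, so the choice is deterministic.
            if score > best_score {
                best_score = score;
                best_pos = pos;
            }
        }
        selected.push(remaining.remove(best_pos));
    }
    selected
}
```

With `lambda = 1.0` this degenerates to plain top-k by relevance; lowering it pushes near-duplicate keyphrases out of the result.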

§Determinism

Every public extractor in this crate is deterministic: same input text + same embedder → byte-identical traits::Entity and traits::Relation streams across runs. The proptest suite under tests/proptest_determinism.rs enforces this as a first-class property.
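The page does not say how the crate achieves byte-identical streams beyond the proptest suite. One common recipe, shown here as an assumption rather than the crate's code, is to aggregate into an ordered map instead of a `HashMap` (whose iteration order is randomized per process) and to sort floats with `f32::total_cmp` plus a textual tiebreak. `stable_entities` is a hypothetical helper.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: merge duplicate surface forms (keeping the max score)
/// and emit them in an order that does not depend on input order or process.
fn stable_entities(raw: &[(&str, f32)]) -> Vec<(String, f32)> {
    // BTreeMap iterates in key order, unlike HashMap's randomized order.
    let mut best: BTreeMap<String, f32> = BTreeMap::new();
    for &(text, score) in raw {
        let slot = best.entry(text.to_string()).or_insert(f32::NEG_INFINITY);
        if score > *slot {
            *slot = score;
        }
    }
    let mut out: Vec<(String, f32)> = best.into_iter().collect();
    // Score descending via total_cmp (a total order even with NaN), then text ascending.
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    out
}
```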

§Non-goals

  • No LLM calls. No network. No tokio.
  • No training, no fine-tuning: the extractor consumes whatever mnem_embed_providers::Embedder the caller already configured.
  • No HTTP / MCP / CLI wiring lives in this crate; mnem-ingest exposes the integration and mnem-cli surfaces the flag.
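Because the crate pulls in no network stack or async runtime, a caller-supplied "authored" extractor can be a plain synchronous function of the chunk text. The sketch below uses hypothetical stand-ins (`ToyExtractor`, `ToyEntity`, `Gazetteer`) because the real `traits::Extractor` and `traits::Entity` signatures are not shown on this page; it only illustrates the shape of a hand-authored, deterministic extractor.

```rust
/// Hypothetical stand-in for traits::Entity (real fields not shown on this page).
#[derive(Debug, Clone, PartialEq)]
struct ToyEntity {
    text: String,
    start: usize, // byte offset of the match in the chunk
    end: usize,   // exclusive byte offset
}

/// Hypothetical stand-in for traits::Extractor: synchronous and I/O-free.
trait ToyExtractor {
    fn extract(&self, chunk: &str) -> Vec<ToyEntity>;
}

/// A hand-authored gazetteer: exact, case-sensitive matches of a fixed term list.
struct Gazetteer {
    terms: Vec<String>,
}

impl ToyExtractor for Gazetteer {
    fn extract(&self, chunk: &str) -> Vec<ToyEntity> {
        let mut found = Vec::new();
        for term in &self.terms {
            if term.is_empty() {
                continue; // an empty pattern would match at every byte offset
            }
            for (start, _) in chunk.match_indices(term.as_str()) {
                found.push(ToyEntity { text: term.clone(), start, end: start + term.len() });
            }
        }
        // Document order, longer match first at the same start, independent of term-list order.
        found.sort_by(|a, b| a.start.cmp(&b.start).then(b.end.cmp(&a.end)));
        found
    }
}
```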

Re-exports§

pub use cooccurrence::CoOccurrenceMiner;
pub use cooccurrence::mine_relations;
pub use keybert::KeyBertExtractor;
pub use traits::Entity;
pub use traits::ExtractionSource;
pub use traits::Extractor;
pub use traits::Relation;
pub use inference::InferenceBudget;
pub use inference::InferenceMethod;
pub use inference::TypedRelation;
pub use trust::AuthorFingerprint;
pub use trust::AuthorRateLimiter;
pub use trust::Candidate;
pub use trust::PPR_AMPLIFICATION_FLOOR;
pub use trust::TrustBoundary;

Modules§

cooccurrence
Co-occurrence relation miner - PMI-weighted edges between entities that share a sentence.
inference
Optional typed-relation inference for mnem-extract (gap 03). Gated behind the typed-relations Cargo feature; default OFF per solution.md R3.
keybert
KeyBERT-style statistical keyword / entity extractor.
traits
Public traits and value types for mnem-extract.
trust
Adversarial trust-boundary gate for opt-in typed-relation inference (gap 03). Gated behind the typed-relations Cargo feature; default OFF.
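For the co-occurrence miner, PMI over sentences can be written as `PMI(a, b) = ln(p(a, b) / (p(a) * p(b))) = ln(N * c(a, b) / (c(a) * c(b)))`, where `N` is the number of sentences and `c(.)` counts sentences containing the entity or pair. The sketch below assumes sentence-frequency probabilities, the natural log, and pairs emitted as `(a, b)` with `a < b`; `mine_pmi` is a hypothetical function, not the signature of `mine_relations` or `CoOccurrenceMiner`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical PMI miner. Each sentence is the set of entity names found in it.
/// Returns (a, b, pmi) for every pair with a < b whose PMI exceeds `threshold`.
fn mine_pmi(sentences: &[BTreeSet<String>], threshold: f64) -> Vec<(String, String, f64)> {
    let n = sentences.len() as f64;
    let mut single: BTreeMap<&str, usize> = BTreeMap::new();
    let mut pair: BTreeMap<(&str, &str), usize> = BTreeMap::new();
    for s in sentences {
        // BTreeSet iteration is sorted and unique, so each pair is counted once as (a, b) with a < b.
        let ents: Vec<&str> = s.iter().map(String::as_str).collect();
        for (i, &a) in ents.iter().enumerate() {
            *single.entry(a).or_insert(0) += 1;
            for &b in &ents[i + 1..] {
                *pair.entry((a, b)).or_insert(0) += 1;
            }
        }
    }
    let mut out = Vec::new();
    // Iterating a BTreeMap keeps the emitted relation order deterministic.
    for (&(a, b), &c_ab) in &pair {
        let pmi = (n * c_ab as f64 / (single[a] as f64 * single[b] as f64)).ln();
        if pmi > threshold {
            out.push((a.to_string(), b.to_string(), pmi));
        }
    }
    out
}
```

Pairs that co-occur less often than independence would predict get a negative PMI, so a threshold of `0.0` already drops them.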