// mnem_extract/lib.rs
//! # mnem-extract
//!
//! Statistical, embedding-based entity and relation extraction for mnem.
//!
//! This crate is the default path for experiment E3 of the GraphRAG
//! research track: it replaces LLM-driven NER with a KeyBERT-style
//! candidate-scoring pass over the chunk embedding that mnem-ingest
//! already computes. That keeps extraction deterministic, fully
//! offline, and cost-free at ingest time.
//!
//! ## Scope
//!
//! - [`traits::Extractor`] - pluggable extractor surface. One default
//!   implementation ([`keybert::KeyBertExtractor`]) ships with the
//!   crate; callers can swap in authored or LLM-backed extractors by
//!   implementing the trait themselves.
//! - [`keybert::KeyBertExtractor`] - KeyBERT-style n-gram ranking
//!   against a supplied chunk embedding, with MMR (Maximal Marginal
//!   Relevance) diversification and deterministic tiebreaks.
//! - [`cooccurrence::mine_relations`] - PMI-weighted co-occurrence
//!   relation miner that emits one [`traits::Relation`] per
//!   sentence-local entity pair whose pointwise mutual information
//!   exceeds a configurable threshold.
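//!
//! As a rough illustration of the PMI filter described above (a
//! self-contained sketch, not this crate's implementation), the
//! hypothetical `pmi_pairs` helper below estimates probabilities from
//! per-sentence entity counts and keeps pairs whose
//! `ln(p(a, b) / (p(a) * p(b)))` exceeds a threshold. The input shape,
//! the natural-log base, and the threshold value are assumptions made
//! for the example.
//!
//! ```rust
//! use std::collections::BTreeMap;
//!
//! // Hypothetical helper for illustration; not part of this crate's API.
//! // Each inner Vec holds the entity strings found in one sentence.
//! fn pmi_pairs(sentences: &[Vec<&str>], threshold: f64) -> Vec<(String, String, f64)> {
//!     let n = sentences.len() as f64;
//!     // BTreeMap keeps iteration order stable, so output order is reproducible.
//!     let mut single: BTreeMap<&str, f64> = BTreeMap::new();
//!     let mut joint: BTreeMap<(&str, &str), f64> = BTreeMap::new();
//!     for sentence in sentences {
//!         // Count each entity at most once per sentence.
//!         let mut ents = sentence.clone();
//!         ents.sort_unstable();
//!         ents.dedup();
//!         for (i, &a) in ents.iter().enumerate() {
//!             *single.entry(a).or_insert(0.0) += 1.0;
//!             for &b in &ents[i + 1..] {
//!                 *joint.entry((a, b)).or_insert(0.0) += 1.0;
//!             }
//!         }
//!     }
//!     joint
//!         .into_iter()
//!         .filter_map(|((a, b), both)| {
//!             // PMI = ln( p(a, b) / (p(a) * p(b)) ), with sentence-level estimates.
//!             let pmi = ((both / n) / ((single[&a] / n) * (single[&b] / n))).ln();
//!             (pmi > threshold).then(|| (a.to_string(), b.to_string(), pmi))
//!         })
//!         .collect()
//! }
//!
//! fn main() {
//!     let sentences = vec![
//!         vec!["rust", "cargo"],
//!         vec!["rust", "cargo"],
//!         vec!["rust", "tokio"],
//!         vec!["python", "pip"],
//!     ];
//!     // Only the pair that co-occurs far more often than chance survives.
//!     let pairs = pmi_pairs(&sentences, 1.0);
//!     assert_eq!(pairs.len(), 1);
//!     assert_eq!((pairs[0].0.as_str(), pairs[0].1.as_str()), ("pip", "python"));
//! }
//! ```
//!
//! Frequent entities such as `rust` in this toy input score low against
//! every partner, which is the reason to threshold on PMI rather than
//! on raw co-occurrence counts.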
//!
//! ## Determinism
//!
//! Every public extractor in this crate is deterministic: the same
//! input text and the same embedder produce byte-identical
//! [`traits::Entity`] and [`traits::Relation`] streams across runs.
//! The proptest suite under `tests/proptest_determinism.rs` enforces
//! this as a first-class property.
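//!
//! One common way to get this kind of reproducible ordering is to sort
//! candidates by score and break ties on the candidate text, so that
//! input order never influences the result. The sketch below shows the
//! idea with a hypothetical `Scored` type and `rank` function; it is an
//! assumption about how such a tiebreak can look, not a copy of the
//! ranking code in this crate.
//!
//! ```rust
//! // Hypothetical candidate type for illustration only.
//! #[derive(Debug, Clone, PartialEq)]
//! struct Scored {
//!     text: String,
//!     score: f32,
//! }
//!
//! // Sort by score descending, then by text ascending. `f32::total_cmp`
//! // gives a total order over floats, so the comparator never panics
//! // and never depends on the incoming order.
//! fn rank(mut candidates: Vec<Scored>) -> Vec<Scored> {
//!     candidates.sort_by(|a, b| {
//!         b.score
//!             .total_cmp(&a.score)
//!             .then_with(|| a.text.cmp(&b.text))
//!     });
//!     candidates
//! }
//!
//! fn main() {
//!     let forward = vec![
//!         Scored { text: "graph".into(), score: 0.9 },
//!         Scored { text: "embedding".into(), score: 0.9 },
//!         Scored { text: "chunk".into(), score: 0.4 },
//!     ];
//!     let mut backward = forward.clone();
//!     backward.reverse();
//!     // Equal scores are ordered by text, whatever order the input arrived in.
//!     assert_eq!(rank(forward), rank(backward));
//! }
//! ```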
//!
//! ## Non-goals
//!
//! - No LLM calls. No network. No tokio.
//! - No training, no fine-tuning: the extractor consumes whatever
//!   [`mnem_embed_providers::Embedder`] the caller already configured.
//! - No HTTP / MCP / CLI wiring lives in this crate; `mnem-ingest`
//!   exposes the integration and `mnem-cli` surfaces the flag.

#![deny(missing_docs)]
#![forbid(unsafe_code)]

pub mod cooccurrence;
pub mod keybert;
pub mod traits;

/// Optional typed-relation inference (gap 03). Gated behind the
/// `typed-relations` Cargo feature. Default OFF per solution.md R3.
#[cfg(feature = "typed-relations")]
pub mod inference;

/// Adversarial trust-boundary gate for opt-in typed-relation
/// inference (gap 03). Gated behind the `typed-relations` Cargo
/// feature. Default OFF.
#[cfg(feature = "typed-relations")]
pub mod trust;

pub use cooccurrence::{CoOccurrenceMiner, mine_relations};
pub use keybert::KeyBertExtractor;
pub use traits::{Entity, ExtractionSource, Extractor, Relation};

#[cfg(feature = "typed-relations")]
pub use inference::{InferenceBudget, InferenceMethod, TypedRelation};
#[cfg(feature = "typed-relations")]
pub use trust::{
    AuthorFingerprint, AuthorRateLimiter, Candidate, PPR_AMPLIFICATION_FLOOR, TrustBoundary,
};