//! Zero-waste ingestion pipeline.
//!
//! `rig-retrieval-evals` traditionally measures retrieval quality after documents
//! land in a store. The ingestion pipeline moves that gate upstream: it emits
//! only deltas (net-new IoCs, graph edges, propositions) for the caller to
//! commit, so the vector store never accumulates redundant chunks.
//!
//! The base `ingestion` feature ships:
//!
//! - [`Document`] / [`Section`] — caller-supplied parsed input.
//! - [`IngestionDelta`] + [`Dropped`] / [`DroppedReason`] — inspectable
//!   pipeline output. Every discarded item carries a structured reason.
//! - Track 1 (IoCs): [`IocExtractor`], [`IocBaseline`], [`RegexIocExtractor`],
//!   [`InMemoryIocBaseline`].
//! - Track 2 (knowledge graph, opt-in): [`TripleExtractor`],
//!   [`GraphBaseline`], [`StubTripleExtractor`], [`InMemoryGraphBaseline`].
//!   A `petgraph`-backed [`PetgraphBaseline`] is available behind the
//!   `ingestion-graph` sub-feature.
//! - Track 3 (propositions, opt-in): [`PropositionExtractor`],
//!   [`RedundancyCheck`], [`StubPropositionExtractor`],
//!   [`VectorStoreRedundancyCheck`].
//! - [`DistillationPipeline`] — orchestrator. Always runs Track 1; add
//!   Track 2 with [`DistillationPipeline::with_graph`] and Track 3 with
//!   [`DistillationPipeline::with_propositions`].
//!
//! ## Design notes
//!
//! - The crate is store-agnostic: the pipeline returns deltas; the caller
//!   owns commits to their IoC store / graph DB / vector store.
//! - The crate is runtime-agnostic. Track concurrency (when more than one
//!   track ships) uses `futures::join!`, not `tokio::join!`.
//! - Stub extractors live in the library (not `#[cfg(test)]`) so hosts can
//!   use them as deterministic CI gates for their own pipelines.
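//!
//! The store-agnostic split described above (the pipeline returns a delta,
//! the caller commits it) can be sketched with the standard library alone.
//! This is a minimal illustration, not this crate's implementation:
//! `SketchDelta` and `diff_against_baseline` are hypothetical names, and the
//! reason strings are invented for the example.
//!
//! ```rust
//! use std::collections::HashSet;
//!
//! // Hypothetical stand-in for a pipeline delta; not the crate's `IngestionDelta`.
//! #[derive(Debug, Default, PartialEq)]
//! struct SketchDelta {
//!     // Items absent from the baseline, in first-seen order.
//!     fresh: Vec<String>,
//!     // Discarded items, each paired with an (assumed) reason string.
//!     dropped: Vec<(String, &'static str)>,
//! }
//!
//! // Splits `extracted` into net-new and dropped items. The baseline is only
//! // read here; committing `fresh` to a store is left to the caller.
//! fn diff_against_baseline(extracted: &[&str], baseline: &HashSet<String>) -> SketchDelta {
//!     let mut seen = HashSet::new();
//!     let mut delta = SketchDelta::default();
//!     for &item in extracted {
//!         if baseline.contains(item) {
//!             delta.dropped.push((item.to_string(), "already in baseline"));
//!         } else if !seen.insert(item) {
//!             // `insert` returns false when the item was already seen.
//!             delta.dropped.push((item.to_string(), "duplicate within document"));
//!         } else {
//!             delta.fresh.push(item.to_string());
//!         }
//!     }
//!     delta
//! }
//! ```
//!
//! With a baseline holding `192.0.2.10`, the input
//! `["192.0.2.10", "CVE-2024-12345", "CVE-2024-12345"]` yields one fresh item
//! and two dropped items, and the baseline stays unchanged until the caller
//! commits.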
//!
//! ## Example
//!
//! ```no_run
//! # use rig_retrieval_evals::{
//! #     DistillationPipeline, Document, InMemoryIocBaseline, RegexIocExtractor,
//! # };
//! # async fn demo() -> Result<(), rig_retrieval_evals::Error> {
//! let pipeline = DistillationPipeline::new(
//!     RegexIocExtractor::new()?,
//!     InMemoryIocBaseline::new(),
//! );
//! let doc = Document::new(
//!     "report-1",
//!     "APT-28 exploited CVE-2024-12345 from 192.0.2.10.",
//! );
//! let delta = pipeline.ingest(&doc).await?;
//! assert!(!delta.iocs.is_empty());
//! # Ok(())
//! # }
//! ```
pub use PetgraphBaseline;
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;