//! # subsume
//!
//! Geometric region embeddings for subsumption, entailment, and logical query answering.
//!
//! `subsume` provides framework-agnostic traits and concrete backends for
//! geometric embeddings -- boxes, cones, octagons, Gaussians, and hyperbolic
//! intervals -- that encode hierarchical relationships through geometric
//! containment. If region A contains region B (B ⊆ A), then A *subsumes* B:
//! the more general concept contains the more specific one.
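//!
//! As an illustration only, the sketch below tests hard-box subsumption in plain Rust.
//! `HardBox` and `subsumes` are hypothetical names and are not part of this crate's API.
//! A box is assumed to be given by its `min` and `max` corners. B ⊆ A holds when every
//! coordinate interval of B lies inside the matching interval of A:
//!
//! ```rust
//! /// Hypothetical hard box for illustration; not subsume's `Box` trait.
//! struct HardBox {
//!     min: Vec<f64>,
//!     max: Vec<f64>,
//! }
//!
//! /// True if `inner` lies entirely inside `outer` (inner ⊆ outer),
//! /// i.e. `outer` subsumes `inner`. Assumes both boxes have the same dimension.
//! fn subsumes(outer: &HardBox, inner: &HardBox) -> bool {
//!     outer.min.iter().zip(&inner.min).all(|(o, i)| o <= i)
//!         && outer.max.iter().zip(&inner.max).all(|(o, i)| o >= i)
//! }
//!
//! fn main() {
//!     let animal = HardBox { min: vec![0.0, 0.0], max: vec![10.0, 10.0] };
//!     let dog = HardBox { min: vec![2.0, 3.0], max: vec![4.0, 5.0] };
//!     assert!(subsumes(&animal, &dog)); // animal subsumes dog
//!     assert!(!subsumes(&dog, &animal)); // but not the reverse
//! }
//! ```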
//!
//! # Getting Started
//!
//! | Goal | Start here |
//! |------|-----------|
//! | Understand the core abstraction | [`Box`] trait, [`BoxError`] |
//! | Use probabilistic (Gumbel) boxes | [`NdarrayGumbelBox`](ndarray_backend::NdarrayGumbelBox) |
//! | Use octagon embeddings (box + diagonal constraints) | [`NdarrayOctagon`](ndarray_backend::ndarray_octagon::NdarrayOctagon), [`octagon`] module |
//! | Fuzzy query answering (t-norms) | [`fuzzy::TNorm`], [`fuzzy::TConorm`], [`fuzzy`] module |
//! | Load a knowledge graph dataset | [`Dataset`], [`Triple`] |
//! | Train box embeddings (CPU) | [`BoxEmbeddingTrainer`], [`TrainingConfig`] |
//! | Train box embeddings (GPU) | [`CandleBoxTrainer`](trainer::candle_trainer::CandleBoxTrainer) (feature = `candle-backend`) |
//! | Evaluate with link prediction | [`evaluate_link_prediction`], [`CandleBoxTrainer::evaluate`](trainer::candle_trainer::CandleBoxTrainer::evaluate) |
//!
//! # Why regions instead of points?
//!
//! Point embeddings (TransE, RotatE) work for link prediction but cannot encode
//! containment, volume, or set operations. Regions become necessary when the task
//! requires:
//!
//! - **Subsumption**: box A inside box B means A is-a B
//! - **Generality**: large volume = broad concept, small volume = specific
//! - **Intersection**: combining two concepts (A ∧ B) yields a valid region
//! - **Negation**: cone complement is another cone (FOL queries with ¬)
//!
//! For standard triple scoring, point embeddings are simpler and often comparably accurate. For
//! ontology completion (EL++), taxonomy expansion, and logical query answering,
//! regions are structurally required.
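//!
//! The negation bullet above depends on cones being closed under complement. The sketch
//! below assumes a ConE-style parameterization, which may differ from the crate's
//! [`cone`] types. Each dimension is a 2-D sector with an axis angle in [-π, π) and an
//! aperture in [0, 2π]. Rotating the axis by π and taking the remaining aperture yields
//! the complement. `SectorCone` is a hypothetical type used only for illustration:
//!
//! ```rust
//! use std::f64::consts::PI;
//!
//! /// Hypothetical per-dimension sector cone (ConE-style assumption):
//! /// axis angle in [-π, π), aperture in [0, 2π].
//! #[derive(Debug, Clone, Copy)]
//! struct SectorCone {
//!     axis: f64,
//!     aperture: f64,
//! }
//!
//! /// Smallest absolute angular difference between two angles, in [0, π].
//! fn angular_distance(a: f64, b: f64) -> f64 {
//!     let d = (a - b).rem_euclid(2.0 * PI);
//!     d.min(2.0 * PI - d)
//! }
//!
//! impl SectorCone {
//!     /// True if the direction `angle` falls inside the sector.
//!     fn contains(&self, angle: f64) -> bool {
//!         angular_distance(angle, self.axis) <= self.aperture / 2.0
//!     }
//!
//!     /// Complement: rotate the axis by π (staying in [-π, π)) and
//!     /// take the remaining aperture 2π - aperture.
//!     fn complement(&self) -> SectorCone {
//!         let axis = if self.axis < 0.0 { self.axis + PI } else { self.axis - PI };
//!         SectorCone { axis, aperture: 2.0 * PI - self.aperture }
//!     }
//! }
//!
//! fn main() {
//!     let c = SectorCone { axis: 0.5, aperture: 1.0 };
//!     let not_c = c.complement();
//!     // An angle inside C is outside ¬C, and vice versa.
//!     assert!(c.contains(0.6) && !not_c.contains(0.6));
//!     assert!(!c.contains(3.0) && not_c.contains(3.0));
//! }
//! ```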
//!
//! # Key Concepts
//!
//! **Box embeddings** represent concepts as hyperrectangles. Unlike point vectors,
//! boxes have volume, which encodes generality: a broad concept ("animal") is a
//! large box containing smaller boxes ("dog", "cat").
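//!
//! Volume is usually handled in log space, because products of many side lengths below 1
//! underflow. The sketch below works under that assumption. `log_volume` is a
//! hypothetical helper and is not necessarily how [`utils`] computes volumes:
//!
//! ```rust
//! /// Hypothetical helper: log-volume of an axis-aligned box as the sum of
//! /// log side lengths. Degenerate sides are clamped to the smallest positive
//! /// normal f64 so the logarithm stays finite.
//! fn log_volume(min: &[f64], max: &[f64]) -> f64 {
//!     min.iter()
//!         .zip(max)
//!         .map(|(lo, hi)| (hi - lo).max(f64::MIN_POSITIVE).ln())
//!         .sum()
//! }
//!
//! fn main() {
//!     let dims = 400;
//!     // A broad concept (side 0.9) and a specific one (side 0.1) in 400 dimensions.
//!     let animal = log_volume(&vec![0.0; dims], &vec![0.9; dims]);
//!     let dog = log_volume(&vec![0.0; dims], &vec![0.1; dims]);
//!     assert!(animal > dog); // larger volume = more general concept
//!
//!     // The raw volume 0.1^400 underflows to 0.0; the log-volume stays finite.
//!     let raw: f64 = (0..dims).map(|_| 0.1_f64).product();
//!     assert_eq!(raw, 0.0);
//!     assert!(dog.is_finite());
//! }
//! ```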
//!
//! **Gumbel boxes** address the *local identifiability problem* of hard boxes.
//! With hard boxes, disjoint or fully nested pairs sit on flat regions of the
//! loss where gradients vanish. Modeling box coordinates as Gumbel random
//! variables smooths these regions, so gradients stay dense throughout training.
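//!
//! The sketch below shows the two Gumbel-box ingredients, following our reading of
//! Dasgupta et al. (2020). Intersection corners combine through a temperature-scaled
//! log-sum-exp: a smooth max for min corners and a smooth min for max corners. The
//! expected side length is approximated by `beta * softplus(side / beta - 2γ)`, with γ
//! the Euler-Mascheroni constant. The function names and the temperature `beta` are
//! illustrative assumptions, not this crate's API. Even for disjoint boxes the resulting
//! log-volume is finite, which is what keeps gradients alive:
//!
//! ```rust
//! /// Euler-Mascheroni constant (γ).
//! const EULER_GAMMA: f64 = 0.577_215_664_901_532_9;
//!
//! /// Numerically stable log(exp(a) + exp(b)).
//! fn logsumexp2(a: f64, b: f64) -> f64 {
//!     let m = a.max(b);
//!     m + ((a - m).exp() + (b - m).exp()).ln()
//! }
//!
//! /// Numerically stable softplus: log(1 + exp(x)).
//! fn softplus(x: f64) -> f64 {
//!     if x > 0.0 { x + (-x).exp().ln_1p() } else { x.exp().ln_1p() }
//! }
//!
//! /// Hypothetical soft intersection of two Gumbel boxes at temperature `beta`:
//! /// min corners use a smooth max, max corners use a smooth min.
//! fn gumbel_intersection(
//!     min_a: &[f64], max_a: &[f64], min_b: &[f64], max_b: &[f64], beta: f64,
//! ) -> (Vec<f64>, Vec<f64>) {
//!     let lo: Vec<f64> = min_a.iter().zip(min_b)
//!         .map(|(a, b)| beta * logsumexp2(a / beta, b / beta))
//!         .collect();
//!     let hi: Vec<f64> = max_a.iter().zip(max_b)
//!         .map(|(a, b)| -beta * logsumexp2(-a / beta, -b / beta))
//!         .collect();
//!     (lo, hi)
//! }
//!
//! /// Approximate expected log-volume: sum over dimensions of
//! /// ln(beta * softplus(side / beta - 2γ)).
//! fn gumbel_log_volume(min: &[f64], max: &[f64], beta: f64) -> f64 {
//!     min.iter().zip(max)
//!         .map(|(lo, hi)| (beta * softplus((hi - lo) / beta - 2.0 * EULER_GAMMA)).ln())
//!         .sum()
//! }
//!
//! fn main() {
//!     let beta = 0.1;
//!     // Disjoint 1-D boxes [0, 1] and [2, 3]: the hard intersection is empty.
//!     let (lo, hi) = gumbel_intersection(&[0.0], &[1.0], &[2.0], &[3.0], beta);
//!     let lv = gumbel_log_volume(&lo, &hi, beta);
//!     // The soft volume is tiny but strictly positive, so its log is finite
//!     // and still varies with the box parameters.
//!     assert!(lv.is_finite());
//! }
//! ```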
//!
//! **Containment probability** measures entailment (P(B ⊆ A)), while **overlap
//! probability** measures relatedness without strict hierarchy. These two scores
//! are the primary outputs of box embedding models.
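//!
//! For hard boxes, a common convention in the box-lattice literature (e.g. Vilnis et al.,
//! 2018) reads P(B ⊆ A) as the fraction of B's volume covered by A, Vol(A ∩ B) / Vol(B).
//! The sketch below computes that ratio dimension by dimension and assumes B has nonzero
//! width in every dimension. `containment_prob` here is a hypothetical free function.
//! The crate's trait method of the same name may use Gumbel volumes and a different
//! argument order:
//!
//! ```rust
//! /// Hypothetical hard-box containment probability P(B ⊆ A) = Vol(A ∩ B) / Vol(B),
//! /// computed as a product of per-dimension coverage fractions.
//! /// Returns 0.0 when the boxes do not intersect.
//! fn containment_prob(min_a: &[f64], max_a: &[f64], min_b: &[f64], max_b: &[f64]) -> f64 {
//!     let mut ratio = 1.0;
//!     for i in 0..min_a.len() {
//!         // Side length of A ∩ B in dimension i, clamped at zero when disjoint.
//!         let inter = (max_a[i].min(max_b[i]) - min_a[i].max(min_b[i])).max(0.0);
//!         let side_b = max_b[i] - min_b[i];
//!         ratio *= inter / side_b;
//!     }
//!     ratio
//! }
//!
//! fn main() {
//!     // A = [0,4]x[0,4], B = [2,6]x[1,3]: half of B's width lies inside A.
//!     let p = containment_prob(&[0.0, 0.0], &[4.0, 4.0], &[2.0, 1.0], &[6.0, 3.0]);
//!     assert!((p - 0.5).abs() < 1e-12);
//!     // B fully inside A gives probability 1.
//!     assert_eq!(containment_prob(&[0.0], &[4.0], &[1.0], &[2.0]), 1.0);
//! }
//! ```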
//!
//! # Module Organization
//!
//! ## Core traits and geometry
//!
//! - [`box_trait`] -- the [`Box`] trait: containment, overlap, volume
//! - [`octagon`] -- octagon error types (implementations in [`ndarray_backend`])
//! - [`cone`] -- cone error types (implementations in [`ndarray_backend`])
//! - `hyperbolic` -- Poincare ball embeddings for tree-like hierarchies (feature-gated)
//! - [`sheaf`] -- sheaf neural networks for transitivity/consistency on graphs
//! - [`gaussian`] -- diagonal Gaussian box embeddings (KL, Bhattacharyya)
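//!
//! The [`gaussian`] module refers to KL and Bhattacharyya measures. The sketch below
//! gives the standard closed forms for diagonal Gaussians (means and variances). The
//! function names are hypothetical, and the sketch makes no assumption about which KL
//! direction the crate uses to score containment:
//!
//! ```rust
//! /// KL(p || q) for diagonal Gaussians given means and variances.
//! fn kl_diag(mu_p: &[f64], var_p: &[f64], mu_q: &[f64], var_q: &[f64]) -> f64 {
//!     (0..mu_p.len())
//!         .map(|i| {
//!             let d = mu_q[i] - mu_p[i];
//!             0.5 * (var_p[i] / var_q[i] + d * d / var_q[i] - 1.0 + (var_q[i] / var_p[i]).ln())
//!         })
//!         .sum()
//! }
//!
//! /// Bhattacharyya distance for diagonal Gaussians (symmetric, unlike KL).
//! fn bhattacharyya_diag(mu_p: &[f64], var_p: &[f64], mu_q: &[f64], var_q: &[f64]) -> f64 {
//!     (0..mu_p.len())
//!         .map(|i| {
//!             let var = 0.5 * (var_p[i] + var_q[i]);
//!             let d = mu_p[i] - mu_q[i];
//!             d * d / (8.0 * var) + 0.5 * (var / (var_p[i] * var_q[i]).sqrt()).ln()
//!         })
//!         .sum()
//! }
//!
//! fn main() {
//!     let (mu, var) = ([0.0, 1.0], [1.0, 2.0]);
//!     // Both measures vanish for identical Gaussians.
//!     assert!(kl_diag(&mu, &var, &mu, &var).abs() < 1e-12);
//!     assert!(bhattacharyya_diag(&mu, &var, &mu, &var).abs() < 1e-12);
//!     // A narrow "specific" Gaussian vs. a wide "general" one: KL is asymmetric.
//!     let (narrow, wide) = ([0.1, 0.1], [4.0, 4.0]);
//!     let specific_to_general = kl_diag(&mu, &narrow, &mu, &wide);
//!     let general_to_specific = kl_diag(&mu, &wide, &mu, &narrow);
//!     assert!(specific_to_general < general_to_specific);
//! }
//! ```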
//!
//! ## Representations and scoring
//!
//! - [`distance`] -- Query2Box distance scoring and RegD depth/boundary dissimilarity metrics
//! - [`fuzzy`] -- t-norms, t-conorms, and negation for fuzzy query answering (FuzzQE)
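//!
//! Both scoring styles above reduce to short formulas. The sketch below assumes two
//! things. The first is the Query2Box distance for a box given by a center and a
//! non-negative offset, computed as an outside term plus an `alpha`-weighted inside
//! term. The second is the product t-norm with its dual probabilistic-sum t-conorm and
//! standard negation `1 - a`, which is one common choice for fuzzy query answering.
//! All names are hypothetical, and [`fuzzy::TNorm`] may expose other variants:
//!
//! ```rust
//! /// Hypothetical Query2Box-style distance from point `v` to the box
//! /// [center - offset, center + offset]: outside distance + alpha * inside distance.
//! fn query2box_distance(v: &[f64], center: &[f64], offset: &[f64], alpha: f64) -> f64 {
//!     let (mut outside, mut inside) = (0.0, 0.0);
//!     for i in 0..v.len() {
//!         let (q_min, q_max) = (center[i] - offset[i], center[i] + offset[i]);
//!         outside += (v[i] - q_max).max(0.0) + (q_min - v[i]).max(0.0);
//!         // Distance from the center to v clamped into the box.
//!         inside += (center[i] - v[i].clamp(q_min, q_max)).abs();
//!     }
//!     outside + alpha * inside
//! }
//!
//! /// Product t-norm (fuzzy AND) on truth values in [0, 1].
//! fn t_norm(a: f64, b: f64) -> f64 { a * b }
//! /// Probabilistic-sum t-conorm (fuzzy OR), dual to the product t-norm.
//! fn t_conorm(a: f64, b: f64) -> f64 { a + b - a * b }
//! /// Standard fuzzy negation.
//! fn negate(a: f64) -> f64 { 1.0 - a }
//!
//! fn main() {
//!     let (center, offset) = ([0.0, 0.0], [1.0, 1.0]);
//!     // Inside the box only the down-weighted inside term contributes.
//!     let d_in = query2box_distance(&[0.5, 0.0], &center, &offset, 0.2);
//!     let d_out = query2box_distance(&[3.0, 0.0], &center, &offset, 0.2);
//!     assert!((d_in - 0.1).abs() < 1e-12 && d_out > d_in);
//!
//!     // De Morgan duality: a OR b == NOT(NOT a AND NOT b).
//!     let (a, b) = (0.3, 0.8);
//!     assert!((t_conorm(a, b) - negate(t_norm(negate(a), negate(b)))).abs() < 1e-12);
//! }
//! ```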
//!
//! ## Ontology and taxonomy
//!
//! - [`el`] -- EL++ ontology embedding primitives (Box2EL / TransBox)
//! - [`taxonomy`] -- TaxoBell-format taxonomy dataset loader
//! - [`taxobell`] -- TaxoBell combined training loss
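//!
//! For EL++ embeddings, the basic training signal for an axiom C ⊑ D is that C's box
//! should lie inside D's. The sketch below is a hinge-style inclusion loss in the spirit
//! of ELBE/Box2EL, with each box given as a center and an offset. The exact norm, the
//! margin handling, and the name `inclusion_loss` are assumptions, not the [`el`]
//! module's API:
//!
//! ```rust
//! /// Hypothetical inclusion loss for C ⊑ D. Per dimension it penalizes
//! /// |c_C - c_D| + o_C - o_D - margin when positive, then takes the L2 norm.
//! /// The loss is zero exactly when C's box lies inside D's box expanded by `margin`.
//! fn inclusion_loss(c_center: &[f64], c_off: &[f64], d_center: &[f64], d_off: &[f64], margin: f64) -> f64 {
//!     (0..c_center.len())
//!         .map(|i| {
//!             let v = ((c_center[i] - d_center[i]).abs() + c_off[i] - d_off[i] - margin).max(0.0);
//!             v * v
//!         })
//!         .sum::<f64>()
//!         .sqrt()
//! }
//!
//! fn main() {
//!     // Dog = [1,2] (center 1.5, offset 0.5); Animal = [0,4] (center 2, offset 2).
//!     assert_eq!(inclusion_loss(&[1.5], &[0.5], &[2.0], &[2.0], 0.0), 0.0);
//!     // The reverse axiom Animal ⊑ Dog is violated and gets a positive loss.
//!     assert!(inclusion_loss(&[2.0], &[2.0], &[1.5], &[0.5], 0.0) > 0.0);
//! }
//! ```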
//!
//! ## Training and evaluation
//!
//! - [`dataset`] -- load WN18RR, FB15k-237, YAGO3-10, and similar KG datasets
//! - [`trainable`] -- [`trainable::TrainableBox`] and [`trainable::TrainableCone`] with learnable parameters
//! - [`trainer`] -- negative sampling, loss computation, link prediction evaluation.
//! Includes [`CandleBoxTrainer`](trainer::candle_trainer::CandleBoxTrainer) for GPU training
//! with AdamW, cosine LR, self-adversarial negative sampling, and filtered evaluation.
//! - [`metrics`] -- rank-based metrics (MRR, Hits@k, Mean Rank)
//! - [`optimizer`] -- AMSGrad state management
//! - [`utils`] -- numerical stability (log-space volume, stable sigmoid, Gumbel operations)
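//!
//! Each rank-based metric is a simple average over the 1-based rank of the correct entity
//! for each test query. The sketch below uses hypothetical function names. It ignores tie
//! handling and filtering, which [`metrics`] and the trainers may treat differently:
//!
//! ```rust
//! /// Mean Rank: average 1-based rank of the correct answer.
//! fn mean_rank(ranks: &[usize]) -> f64 {
//!     ranks.iter().sum::<usize>() as f64 / ranks.len() as f64
//! }
//!
//! /// Mean Reciprocal Rank: average of 1 / rank.
//! fn mrr(ranks: &[usize]) -> f64 {
//!     ranks.iter().map(|&r| 1.0 / r as f64).sum::<f64>() / ranks.len() as f64
//! }
//!
//! /// Hits@k: fraction of queries whose correct answer ranks in the top k.
//! fn hits_at(ranks: &[usize], k: usize) -> f64 {
//!     ranks.iter().filter(|&&r| r <= k).count() as f64 / ranks.len() as f64
//! }
//!
//! fn main() {
//!     let ranks: [usize; 4] = [1, 3, 10, 2];
//!     assert_eq!(mean_rank(&ranks), 4.0);
//!     assert!((mrr(&ranks) - (1.0 + 1.0 / 3.0 + 0.1 + 0.5) / 4.0).abs() < 1e-12);
//!     assert_eq!(hits_at(&ranks, 3), 0.75);
//! }
//! ```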
//!
//! ## Backends (feature-gated)
//!
//! - [`ndarray_backend`] -- `NdarrayBox`, `NdarrayGumbelBox`, distance functions
//! (feature = `ndarray-backend`, **on by default**)
//! - `candle_backend` -- `CandleBox`, `CandleGumbelBox` with GPU support
//! (feature = `candle-backend`)
//!
//! # Feature Flags
//!
//! | Feature | Default | Provides |
//! |---------|---------|----------|
//! | `ndarray-backend` | yes | [`ndarray_backend`] module (also enables `rand`) |
//! | `candle-backend` | no | `candle_backend` module (GPU via candle) |
//! | `rand` | yes (via `ndarray-backend`) | Negative sampling utilities in [`trainer`] |
//! | `hyperbolic` | no | `hyperbolic` module (Poincare ball via `hyperball` + `skel`) |
//! | `petgraph` | no | `petgraph_adapter` module (convert petgraph graphs to datasets) |
//!
//! # Example
//!
//! ```rust,ignore
//! // Rename to avoid shadowing std::boxed::Box
//! use subsume::Box as BoxRegion;
//!
//! // Framework-agnostic: works with NdarrayBox, CandleBox, or your own impl
//! fn compute_entailment<B: BoxRegion>(
//!     premise: &B,
//!     hypothesis: &B,
//! ) -> Result<B::Scalar, subsume::BoxError> {
//!     premise.containment_prob(hypothesis)
//! }
//! ```
//!
//! # References
//!
//! - Vilnis et al. (2018), "Probabilistic Embedding of Knowledge Graphs with Box Lattice Measures"
//! - Abboud et al. (2020), "BoxE: A Box Embedding Model for Knowledge Base Completion"
//! - Li et al. (2019, ICLR), "Smoothing the Geometry of Probabilistic Box Embeddings"
//! - Dasgupta et al. (2020), "Improving Local Identifiability in Probabilistic Box Embeddings"
//! - Chen et al. (2021), "Uncertainty-Aware Knowledge Graph Embeddings" (UKGE)
//! - Lee et al. (2022), "Box Embeddings for Event-Event Relation Extraction" (BERE)
//! - Cao et al. (2024, ACM Computing Surveys), "KG Embedding: A Survey from the
//! Perspective of Representation Spaces" -- positions box/cone/octagon embeddings
//! within the broader KGE taxonomy (Euclidean, hyperbolic, complex, geometric)
//! - Bourgaux et al. (2024, KR), "Knowledge Base Embeddings: Semantics and Theoretical Properties"
//! - Lacerda et al. (2024, TGDK), "Strong Faithfulness for ELH Ontology Embeddings"
//! - Yang & Chen (2025), "RegD: Achieving Hyperbolic-Like Expressiveness with Arbitrary
//! Euclidean Regions" -- source of the depth/boundary dissimilarity metrics in [`distance`]
// ---------------------------------------------------------------------------
// Core traits and geometry
// ---------------------------------------------------------------------------
/// Core [`Box`] trait: containment probability, overlap, volume, and intersection.
/// Cone embeddings: angular containment on the unit sphere, with negation support.
/// Octagon embeddings: axis-aligned polytopes with diagonal constraints (IJCAI 2024).
/// Knowledge graph dataset loading (WN18RR, FB15k-237, YAGO3-10, and similar formats).
/// Distance metrics: Query2Box distance scoring.
/// Poincare ball embeddings for tree-like hierarchical structures.
///
/// Requires the `hyperbolic` feature (uses `ndarray::ArrayView1` for
/// interoperability with the `hyperball` and `skel` crates).
/// AMSGrad optimizer state and learning rate utilities.
/// Sheaf neural networks: algebraic consistency enforcement on graphs.
/// Learnable box and cone representations with gradient-compatible parameters.
/// Training loop utilities: negative sampling, loss kernels, link prediction evaluation.
/// Rank-based evaluation metrics (MRR, Hits@k, Mean Rank).
/// Re-export rankops for rank fusion, IR evaluation (nDCG, MAP), and reranking.
pub use rankops;
/// Numerical stability: log-space volume, stable sigmoid, Gumbel operations.
/// Diagonal Gaussian box embeddings for taxonomy expansion (TaxoBell).
/// EL++ ontology embedding primitives (Box2EL / TransBox).
/// EL++ normalized axiom dataset loader (GALEN, GO, Anatomy formats).
/// EL++ ontology embedding primitives for cones (angular containment).
/// Composable cone query operators for first-order logical query answering.
/// EL++ ontology embedding training: axiom parsing, training loop, evaluation.
/// Fuzzy set-theoretic operators: t-norms, t-conorms, and negation (FuzzQE).
/// Taxonomy dataset loading for the TaxoBell format (`.terms` / `.taxo` / `dic.json`).
/// TaxoBell combined training loss for taxonomy expansion.
/// TaxoBell MLP encoder and training loop with candle autograd.
///
/// Requires the `candle-backend` feature.
// ---------------------------------------------------------------------------
// Re-exports: primary traits and types
// ---------------------------------------------------------------------------
/// The core box embedding trait. Start here.
///
/// This trait shares its name with [`std::boxed::Box`]. To avoid shadowing, use one of:
/// - `use subsume::Box as BoxRegion;` (recommended)
/// - Qualify calls as `subsume::Box` or `<T as subsume::Box>::method()`
pub use ;
/// Convenience alias for the [`Box`] trait that avoids shadowing [`std::boxed::Box`].
///
/// `use subsume::BoxRegion;` is equivalent to `use subsume::Box as BoxRegion;`.
pub use Box as BoxRegion;
// Re-exports: geometry errors
pub use ConeError;
pub use ;
pub use OctagonError;
pub use SheafError;
// Re-exports: data loading
pub use ;
// Re-exports: training
pub use ;
// Re-export: CandleBoxTrainer (GPU training)
pub use CandleBoxTrainer;
// Re-export: ndarray (public dependency -- appears in NdarrayBox/NdarrayGumbelBox/NdarrayCone API)
pub use ndarray;
// Re-exports: evaluation metrics
pub use ;
// Re-exports: Gaussian boxes
pub use GaussianBox;
// Re-exports: EL++ training
pub use ;
// Re-exports: cone EL++ primitives
pub use ;
// Re-exports: cone query operators
pub use ;
// ---------------------------------------------------------------------------
// Feature-gated backends
// ---------------------------------------------------------------------------
/// Ndarray backend: `NdarrayBox`, `NdarrayGumbelBox`, optimizer, and learning rate scheduler.
///
/// This is the default backend (the `ndarray-backend` feature is on by default). With
/// `default-features = false`, re-enable it via `features = ["ndarray-backend"]`.
/// Candle backend: `CandleBox`, `CandleGumbelBox` with GPU acceleration.
///
/// Provides box and Gumbel box operations. Cone, octagon, and Gaussian
/// geometries are available through the ndarray backend only.
///
/// Enable with `features = ["candle-backend"]`.
/// Adapter for constructing datasets from [`petgraph`] graphs.
///
/// Requires the `petgraph` feature.
/// Bridge from [`lattix`] knowledge graphs to subsume datasets.
///
/// Converts lattix KGs (loaded from N-Triples, Turtle, CSV, JSON-LD)
/// into subsume datasets for training. Requires the `lattix` feature.