subsume 0.1.5

Geometric box embeddings: containment, entailment, overlap. Ndarray and Candle backends.

Box embedding concepts

(a) Containment: nested boxes encode taxonomic is-a relationships. (b) Gumbel soft boundary: temperature controls membership sharpness. (c) Octagon: diagonal constraints cut corners, 50% tighter than the bounding box.

What it provides

Component What it does
Box trait Framework-agnostic axis-aligned hyperrectangle: volume, containment, overlap, distance
GumbelBox trait Probabilistic boxes via Gumbel random variables (dense gradients, no flat regions)
Gumbel operations Softplus Bessel volume, LSE intersection (Dasgupta et al., 2020)
BoxE scoring Point-entity BoxE model (Abboud et al., 2020) + box-to-box variant
NdarrayBox / NdarrayGumbelBox CPU backend using ndarray::Array1<f32>
CandleBox / CandleGumbelBox GPU/Metal backend using candle_core::Tensor
Training utilities Negative sampling, volume regularization, temperature scheduling, AMSGrad
Evaluation Mean rank, MRR, Hits@k, NDCG, calibration, reliability diagrams
Sheaf networks Sheaf neural networks for transitivity (Hansen & Ghrist 2019)
Hyperbolic boxes Box embeddings in Poincare ball (via hyperball)
gaussian Diagonal Gaussian box embeddings: KL divergence (asymmetric containment) and Bhattacharyya coefficient (symmetric overlap)
el EL++ ontology embedding primitives: inclusion loss, role translation/composition, existential boxes, disjointness (Box2EL/TransBox)
taxonomy TaxoBell-format taxonomy dataset loader: .terms/.taxo parsing, train/val/test splitting, conversion to Triples
taxobell TaxoBell combined training loss: symmetric (Bhattacharyya triplet), asymmetric (KL containment), volume regularization, sigma clipping
octagon Octagon embeddings: axis-aligned polytopes with diagonal constraints (Charpenay & Schockaert, IJCAI 2024)
fuzzy Fuzzy t-norms/t-conorms for logical query answering (FuzzQE, Chen et al., AAAI 2022)
query2box_distance Alpha-weighted distance scoring for query answering (Ren et al., NeurIPS 2020)
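The softplus volume used by the Gumbel operations can be sketched in a few lines. This is an illustrative rendering of the approximation from Dasgupta et al. (2020) — each side length is replaced by a temperature-scaled softplus shifted by twice the Euler-Mascheroni constant — not the crate's actual API; `gumbel_volume` and `softplus` are names invented for this sketch.

```rust
// Euler-Mascheroni constant, used in the Bessel-approximation volume.
const GAMMA: f64 = 0.577_215_664_901_532_9;

fn softplus(x: f64) -> f64 {
    // ln(1 + e^x), guarded against overflow for large x
    if x > 20.0 { x } else { x.exp().ln_1p() }
}

/// Approximate volume of a Gumbel box with per-dimension (min, max) corners
/// (illustrative formula, not the crate's API).
fn gumbel_volume(mins: &[f64], maxs: &[f64], temp: f64) -> f64 {
    mins.iter()
        .zip(maxs)
        .map(|(m, big_m)| temp * softplus((big_m - m) / temp - 2.0 * GAMMA))
        .product()
}

fn main() {
    // A unit cube at low temperature approaches its hard volume of 1.0.
    let v = gumbel_volume(&[0.0, 0.0, 0.0], &[1.0, 1.0, 1.0], 0.01);
    println!("soft volume of unit cube: {v:.4}");
    assert!((v - 1.0).abs() < 0.05);
    // Even an "inverted" box (max < min) keeps a small positive volume,
    // so the gradient never vanishes entirely.
    assert!(gumbel_volume(&[0.5], &[0.3], 0.1) > 0.0);
}
```

The `- 2.0 * GAMMA` shift is why the soft volume slightly undershoots the hard volume at finite temperature.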

Usage

[dependencies]
subsume = { version = "0.1.5", features = ["ndarray-backend"] }
ndarray = "0.16"

use subsume::ndarray_backend::NdarrayBox;
use subsume::Box as BoxTrait; // renamed to avoid shadowing std's Box
use ndarray::array;

// `?` needs a Result-returning scope; this assumes the crate's error type
// implements std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Box A: [0,0,0] to [1,1,1] (general concept)
    let premise = NdarrayBox::new(array![0., 0., 0.], array![1., 1., 1.], 1.0)?;

    // Box B: [0.2,0.2,0.2] to [0.8,0.8,0.8] (specific, inside A)
    let hypothesis = NdarrayBox::new(array![0.2, 0.2, 0.2], array![0.8, 0.8, 0.8], 1.0)?;

    // Containment probability: P(B inside A)
    let p = premise.containment_prob(&hypothesis, 1.0)?;
    assert!(p > 0.9);
    Ok(())
}
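For hard boxes, the containment probability being computed here is the standard ratio vol(A ∩ B) / vol(B): how much of the hypothesis box lies inside the premise. A dependency-free sketch of that definition (not the crate's implementation; `volume` and `containment` are illustrative names):

```rust
/// Volume of an axis-aligned box; negative side lengths clamp to zero.
fn volume(mins: &[f64], maxs: &[f64]) -> f64 {
    mins.iter().zip(maxs).map(|(m, x)| (x - m).max(0.0)).product()
}

/// P(b inside a) = vol(a ∩ b) / vol(b) for hard boxes given as (mins, maxs).
fn containment(a: (&[f64], &[f64]), b: (&[f64], &[f64])) -> f64 {
    // The intersection takes the max of mins and the min of maxs per axis.
    let inter_min: Vec<f64> = a.0.iter().zip(b.0).map(|(x, y)| x.max(*y)).collect();
    let inter_max: Vec<f64> = a.1.iter().zip(b.1).map(|(x, y)| x.min(*y)).collect();
    volume(&inter_min, &inter_max) / volume(b.0, b.1)
}

fn main() {
    let a = (&[0.0, 0.0, 0.0][..], &[1.0, 1.0, 1.0][..]);
    let b = (&[0.2, 0.2, 0.2][..], &[0.8, 0.8, 0.8][..]);
    assert_eq!(containment(a, b), 1.0); // B fully inside A
    assert!(containment(b, a) < 0.25);  // only 0.6^3 of A overlaps B's frame
}
```

Note the asymmetry: containment(a, b) = 1 while containment(b, a) ≈ 0.216, which is exactly what makes boxes suitable for entailment and is-a relations.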

Examples

cargo run -p subsume --example containment_hierarchy    # taxonomic is-a relationships with nested boxes
cargo run -p subsume --example gumbel_box_exploration   # Gumbel boxes, soft containment, temperature effects
cargo run -p subsume --example cone_training            # training cone embeddings on a taxonomy
cargo run -p subsume --example box_training             # training box embeddings on a 25-entity taxonomy
cargo run -p subsume --example taxobell_demo            # TaxoBell Gaussian box losses on a mini taxonomy
cargo run -p subsume --example query2box                # Query2Box: multi-hop queries, box intersection, distance scoring
cargo run -p subsume --example octagon_demo             # octagon embeddings: diagonal constraints, containment, volume
cargo run -p subsume --example fuzzy_query              # fuzzy query answering: t-norms, De Morgan duality, rankings

See examples/README.md for a guide to choosing the right example.

Tests

cargo test -p subsume

634 tests (unit + property + doc) covering:

  • Box geometry: intersection, union, containment, overlap, distance, volume, truncation
  • Gumbel boxes: membership probability, temperature edge cases, Bessel volume
  • Training: MRR, Hits@k, NDCG, calibration, negative sampling, AMSGrad
  • Octagon: intersection closure, containment, Sutherland-Hodgman volume
  • Fuzzy: t-norm/t-conorm commutativity, associativity, De Morgan duality
  • Gaussian boxes, EL++ ontology losses, sheaf networks, hyperbolic geometry, quasimetrics

Why Gumbel boxes?

Gumbel noise robustness

Gumbel boxes model coordinates as Gumbel random variables, creating soft boundaries that provide dense gradients throughout training. Hard boxes create flat regions where gradients vanish; Gumbel boxes solve this local identifiability problem (Dasgupta et al., 2020). As shown above, this also makes containment robust to coordinate noise -- Gumbel containment loss stays near zero even at high perturbation levels where Gaussian boxes fail completely.
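The flat-gradient claim is easy to check numerically. The sketch below (assumed formulas, not crate internals) compares the derivative of a hard side length max(0, M − m), which is exactly zero once the box is inverted (M < m), with the softplus relaxation t·softplus((M − m)/t), which keeps a usable gradient everywhere:

```rust
/// Hard side length: flat (zero gradient) whenever big_m < m.
fn hard_len(m: f64, big_m: f64) -> f64 {
    (big_m - m).max(0.0)
}

/// Soft side length: t * ln(1 + e^((big_m - m)/t)), positive everywhere.
fn soft_len(m: f64, big_m: f64, t: f64) -> f64 {
    t * ((big_m - m) / t).exp().ln_1p()
}

/// Central finite difference of f at x.
fn grad(f: impl Fn(f64) -> f64, x: f64) -> f64 {
    let h = 1e-5;
    (f(x + h) - f(x - h)) / (2.0 * h)
}

fn main() {
    // Inverted box: min = 0.6, max = 0.4; differentiate w.r.t. max.
    let g_hard = grad(|big_m| hard_len(0.6, big_m), 0.4);
    let g_soft = grad(|big_m| soft_len(0.6, big_m, 0.1), 0.4);
    println!("hard gradient: {g_hard:.4}, soft gradient: {g_soft:.4}");
    assert_eq!(g_hard, 0.0); // flat region: no learning signal
    assert!(g_soft > 0.01);  // soft boundary still pushes max upward
}
```

The soft gradient here is the sigmoid of (M − m)/t, so lowering the temperature sharpens the boundary while raising it spreads gradient signal further from the box.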

Training convergence

25-entity taxonomy learned over 200 epochs. Left: total violation drops 3 orders of magnitude. Right: containment probabilities converge to 1.0 at different rates depending on hierarchy depth. Reproduce: cargo run --example box_training or uv run scripts/plot_training.py.
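The shape of this curve can be reproduced in miniature with a toy 1-D containment loss. Everything below is illustrative only — the crate's trainer, negative sampling, and AMSGrad are not shown — and uses finite-difference gradients so the sketch stays dependency-free: a child interval starting partly outside a fixed parent is nudged inside by descending −ln(vol(child ∩ parent) / vol(child)) with soft (softplus) lengths.

```rust
/// Soft 1-D length: t * ln(1 + e^((hi - lo)/t)), so gradients never vanish.
fn soft_len(lo: f64, hi: f64, t: f64) -> f64 {
    t * ((hi - lo) / t).exp().ln_1p()
}

/// Containment violation: -ln(vol(child ∩ parent) / vol(child)).
fn loss(child: (f64, f64), parent: (f64, f64), t: f64) -> f64 {
    let inter = (child.0.max(parent.0), child.1.min(parent.1));
    -(soft_len(inter.0, inter.1, t) / soft_len(child.0, child.1, t)).ln()
}

fn main() {
    let parent = (0.0, 1.0);
    let mut child = (0.9, 1.5); // starts sticking out of the parent
    let (t, lr, h) = (0.1, 0.05, 1e-5);
    for _ in 0..500 {
        // finite-difference gradients w.r.t. the child's two corners
        let g_lo = (loss((child.0 + h, child.1), parent, t)
            - loss((child.0 - h, child.1), parent, t)) / (2.0 * h);
        let g_hi = (loss((child.0, child.1 + h), parent, t)
            - loss((child.0, child.1 - h), parent, t)) / (2.0 * h);
        child.0 -= lr * g_lo;
        child.1 -= lr * g_hi;
    }
    println!("trained child: ({:.3}, {:.3})", child.0, child.1);
    assert!(loss(child, parent, t) < 0.1); // violation has essentially vanished
}
```

The real training loop does the same thing per triple across the whole taxonomy, with negative samples pushing unrelated boxes apart.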

References

  • Vilnis et al. (2018). "Probabilistic Embedding of Knowledge Graphs with Box Lattice Measures"
  • Abboud et al. (2020). "BoxE: A Box Embedding Model for Knowledge Base Completion"
  • Dasgupta et al. (2020). "Improving Local Identifiability in Probabilistic Box Embeddings"
  • Ren et al. (2020). "Query2Box: Reasoning over Knowledge Graphs using Box Embeddings"
  • Chen et al. (2022). "Fuzzy Logic Based Logical Query Answering on Knowledge Graphs"
  • Charpenay & Schockaert (2024). "Capturing Knowledge Graphs and Rules with Octagon Embeddings"

See also

  • innr -- SIMD-accelerated vector similarity primitives
  • kuji -- stochastic sampling (Gumbel-max uses the same distribution)
  • anno -- information extraction with optional box-embedding coreference

License

MIT OR Apache-2.0