subsume 0.1.3

Geometric box embeddings: containment, entailment, overlap. Ndarray and Candle backends.

[Figure: Box embedding concepts]

What it provides

Component                       What it does
Box trait                       Framework-agnostic axis-aligned hyperrectangle: volume, containment, overlap, distance
GumbelBox trait                 Probabilistic boxes via Gumbel random variables (dense gradients, no flat regions)
NdarrayBox / NdarrayGumbelBox   CPU backend using ndarray::Array1<f32>
CandleBox / CandleGumbelBox     GPU/Metal backend using candle_core::Tensor
Training utilities              Negative sampling, volume regularization, temperature scheduling, AMSGrad
Evaluation                      Mean rank, MRR, Hits@k, NDCG, calibration, reliability diagrams
Sheaf networks                  Sheaf neural networks for transitivity (Hansen & Ghrist 2019)
Hyperbolic boxes                Box embeddings in the Poincaré ball (via hyperball)

Usage

[dependencies]
subsume = { version = "0.1.2", features = ["ndarray-backend"] }
ndarray = "0.16"

use subsume::ndarray_backend::NdarrayBox;
use subsume::Box as BoxTrait;
use ndarray::array;

// Box A: [0,0,0] to [1,1,1] (general concept)
let premise = NdarrayBox::new(array![0., 0., 0.], array![1., 1., 1.], 1.0)?;

// Box B: [0.2,0.2,0.2] to [0.8,0.8,0.8] (specific, inside A)
let hypothesis = NdarrayBox::new(array![0.2, 0.2, 0.2], array![0.8, 0.8, 0.8], 1.0)?;

// Containment probability: P(B inside A)
let p = premise.containment_prob(&hypothesis, 1.0)?;
assert!(p > 0.9);
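
For intuition about what the containment probability in the example above measures, here is a minimal, dependency-free sketch for hard (non-Gumbel) boxes, where P(B inside A) is taken as vol(A ∩ B) / vol(B). HardBox and its methods are hypothetical names used only for illustration; they are not part of the subsume API, and the crate's containment_prob (which also takes a temperature argument) may compute something different.

```rust
/// Hypothetical stand-in for an axis-aligned box; not the crate's NdarrayBox.
#[derive(Debug, Clone)]
struct HardBox {
    min: Vec<f32>,
    max: Vec<f32>,
}

impl HardBox {
    /// Returns None for mismatched dimensions or an inverted interval.
    fn new(min: Vec<f32>, max: Vec<f32>) -> Option<Self> {
        if min.len() != max.len() || min.iter().zip(&max).any(|(lo, hi)| lo > hi) {
            return None;
        }
        Some(Self { min, max })
    }

    /// Product of the side lengths.
    fn volume(&self) -> f32 {
        self.min.iter().zip(&self.max).map(|(&lo, &hi)| hi - lo).product()
    }

    /// Volume of the intersection; zero if the boxes are disjoint in any dimension.
    fn intersection_volume(&self, other: &Self) -> f32 {
        self.min
            .iter()
            .zip(&self.max)
            .zip(other.min.iter().zip(&other.max))
            .map(|((&lo_a, &hi_a), (&lo_b, &hi_b))| (hi_a.min(hi_b) - lo_a.max(lo_b)).max(0.0))
            .product()
    }

    /// P(other inside self) = vol(self ∩ other) / vol(other).
    fn containment_prob(&self, other: &Self) -> f32 {
        let v = other.volume();
        if v == 0.0 {
            0.0
        } else {
            self.intersection_volume(other) / v
        }
    }
}

fn main() {
    let a = HardBox::new(vec![0.0; 3], vec![1.0; 3]).unwrap();
    let b = HardBox::new(vec![0.2; 3], vec![0.8; 3]).unwrap();
    // B lies inside A, so the first value is high and the second is low.
    println!("P(B inside A) = {:.3}", a.containment_prob(&b));
    println!("P(A inside B) = {:.3}", b.containment_prob(&a));
}
```

The ratio is asymmetric, which is what lets containment stand in for a directed relation such as entailment or is-a.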

Examples

cargo run -p subsume --example containment_hierarchy    # taxonomic is-a relationships with nested boxes
cargo run -p subsume --example gumbel_box_exploration   # Gumbel boxes, soft containment, temperature effects
cargo run -p subsume --example cone_training            # training cone embeddings on a taxonomy
cargo run -p subsume --example box_training             # training box embeddings on a 25-entity taxonomy

See examples/README.md for a guide to choosing the right example.

Tests

cargo test -p subsume

More than 380 unit tests, plus doc tests, cover box operations (intersection, union, containment, overlap, distance, truncation), Gumbel box membership and temperature edge cases, serialization round-trips, training metrics (MRR, Hits@k, NDCG), calibration diagnostics, negative sampling, sheaf networks, hyperbolic geometry, quasimetric properties, and more.
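
As a reference for the rank-based metrics named above, the following sketch uses their conventional definitions over 1-based ranks of the correct answer. The function names are illustrative assumptions, not the crate's evaluation API.

```rust
/// Mean of the 1-based ranks; lower is better.
fn mean_rank(ranks: &[usize]) -> f64 {
    if ranks.is_empty() {
        return 0.0;
    }
    ranks.iter().sum::<usize>() as f64 / ranks.len() as f64
}

/// Mean reciprocal rank: average of 1/rank; higher is better.
/// Ranks are assumed to be 1-based (a rank of 0 is not meaningful here).
fn mean_reciprocal_rank(ranks: &[usize]) -> f64 {
    if ranks.is_empty() {
        return 0.0;
    }
    ranks.iter().map(|&r| 1.0 / r as f64).sum::<f64>() / ranks.len() as f64
}

/// Hits@k: fraction of queries whose correct answer is ranked k or better.
fn hits_at_k(ranks: &[usize], k: usize) -> f64 {
    if ranks.is_empty() {
        return 0.0;
    }
    ranks.iter().filter(|&&r| r <= k).count() as f64 / ranks.len() as f64
}

fn main() {
    // Rank of the correct answer for three hypothetical queries.
    let ranks = [1, 2, 4];
    println!("mean rank = {:.3}", mean_rank(&ranks));
    println!("MRR       = {:.3}", mean_reciprocal_rank(&ranks));
    println!("Hits@1    = {:.3}", hits_at_k(&ranks, 1));
    println!("Hits@3    = {:.3}", hits_at_k(&ranks, 3));
}
```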

Why Gumbel boxes?

[Figure: Gumbel noise robustness]

Gumbel boxes model coordinates as Gumbel random variables, creating soft boundaries that provide dense gradients throughout training. Hard boxes create flat regions where gradients vanish; Gumbel boxes solve this local identifiability problem (Dasgupta et al., 2020). As shown above, this also makes containment robust to coordinate noise -- Gumbel containment loss stays near zero even at high perturbation levels where Gaussian boxes fail completely.
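
The flat-region problem can be seen in one dimension. The sketch below compares the hard overlap of two disjoint intervals with a soft overlap built from log-sum-exp intersection bounds and the softplus approximation to expected side length described by Dasgupta et al. (2020). All function names are hypothetical, beta is the Gumbel scale (temperature), and whether subsume uses exactly this form internally is an assumption.

```rust
/// Euler-Mascheroni constant, used as an offset in the softplus approximation.
const EULER_GAMMA: f64 = 0.577_215_664_901_532_9;

/// Numerically stable softplus: ln(1 + e^x).
fn softplus(x: f64) -> f64 {
    x.max(0.0) + (-x.abs()).exp().ln_1p()
}

/// Numerically stable ln(e^a + e^b), a smooth version of max(a, b).
fn log_sum_exp(a: f64, b: f64) -> f64 {
    let m = a.max(b);
    m + ((a - m).exp() + (b - m).exp()).ln()
}

/// Hard overlap length of intervals a = (lo, hi) and b = (lo, hi).
/// Exactly zero whenever the intervals are disjoint.
fn hard_overlap(a: (f64, f64), b: (f64, f64)) -> f64 {
    (a.1.min(b.1) - a.0.max(b.0)).max(0.0)
}

/// Soft overlap length: smooth max for the lower bound, smooth min for the
/// upper bound, then a softplus in place of the hard clamp at zero.
fn soft_overlap(a: (f64, f64), b: (f64, f64), beta: f64) -> f64 {
    let lo = beta * log_sum_exp(a.0 / beta, b.0 / beta);
    let hi = -beta * log_sum_exp(-a.1 / beta, -b.1 / beta);
    beta * softplus((hi - lo) / beta - 2.0 * EULER_GAMMA)
}

fn main() {
    let a = (0.0, 1.0);
    let far = (2.0, 3.0);
    let nearer = (1.9, 2.9);

    // Hard overlap is zero in both cases, so it gives no signal for moving closer.
    println!("hard: far = {}, nearer = {}", hard_overlap(a, far), hard_overlap(a, nearer));

    // Soft overlap is small but positive, and grows as the intervals approach.
    println!(
        "soft: far = {:.5}, nearer = {:.5}",
        soft_overlap(a, far, 1.0),
        soft_overlap(a, nearer, 1.0)
    );
}
```

With a small beta and overlapping intervals, the soft value approaches the hard one, so the temperature controls how much smoothing is applied.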

Training convergence

[Figure: Training convergence]

The plot shows box embeddings learning a 25-entity containment hierarchy over 200 epochs. Run cargo run -p subsume --example box_training to reproduce the run, or uv run scripts/plot_training.py to regenerate the plot.

References

  • Vilnis et al. (2018). "Probabilistic Embedding of Knowledge Graphs with Box Lattice Measures"
  • Dasgupta et al. (2020). "Improving Local Identifiability in Probabilistic Box Embeddings"
  • Ren et al. (2020). "Query2box: Reasoning over Knowledge Graphs in Vector Space Using Box Embeddings"

See also

  • innr -- SIMD-accelerated vector similarity primitives
  • kuji -- stochastic sampling (Gumbel-max uses the same distribution)
  • anno -- information extraction with optional box-embedding coreference

License

MIT OR Apache-2.0