Core types for knowledge graphs.
This crate provides foundational types for working with knowledge graphs:
- [
Triple] - A (subject, predicate, object) triple - [
Entity] - A node in the knowledge graph - [
Relation] - An edge type in the knowledge graph - [
KnowledgeGraph] - A homogeneous graph structure built from triples - [
HeteroGraph] - A heterogeneous graph with typed nodes and edges
Historical Context: From Databases to Graphs
| Era | Representation | Example | Limitation |
|---|---|---|---|
| 1970s | Relational | SQL tables | Fixed schema, join-heavy |
| 2001 | RDF | Semantic Web | XML verbosity, query complexity |
| 2012 | Property Graphs | Neo4j | No standard query language |
| 2019 | GNN-ready graphs | PyG, DGL | Framework-specific formats |
Knowledge graphs became essential when search engines realized that "things, not strings" (Google, 2012) captures real-world semantics that keyword matching misses.
The Triple: Atomic Unit of Knowledge
All knowledge graph formalisms reduce to the triple:
(subject, predicate, object)
(Albert Einstein, born_in, Ulm)
(Ulm, located_in, Germany)
This simple structure enables:
- Inference: If A → B → C, deduce A → C
- Composition: Merge graphs by merging shared entities
- Embedding: Represent triples as vectors for ML
Beyond Triples: N-ary Relations and Hypergraphs
Triples are limited - they can only express binary relations. Real-world facts often involve more than two entities:
(Einstein, won, Nobel Prize, Physics, 1921) -- 4 entities!
(Alice, purchased, Book, $20, Amazon, 2024-01-15) -- 6 entities!
Three approaches handle this:
| Approach | Structure | Example | Trade-off |
|---|---|---|---|
| Reification | Break into multiple triples | Creates artificial nodes | Information loss |
| Qualifiers | Triple + key-value pairs | Wikidata model | Complex querying |
| Hyperedges | N-ary relation directly | Native structure | New embeddings needed |
Reification (Workaround)
Convert n-ary facts to binary by introducing intermediate nodes:
Original: (Einstein, won, Nobel, Physics, 1921)
Reified: (Award_1, recipient, Einstein)
(Award_1, prize, Nobel)
(Award_1, field, Physics)
(Award_1, year, 1921)
Problem: Loses the atomic nature of the fact. Embedding models struggle because Award_1 is artificial - it has no semantic meaning.
Hyper-relational KGs (Wikidata Style)
Attach qualifiers to triples:
(Einstein, won, Nobel Prize)
qualifiers: {field: Physics, year: 1921}
Implemented in [hyper::HyperTriple]. Embeddings: StarE, HINGE.
Knowledge Hypergraphs (Native N-ary)
Represent facts as hyperedges connecting multiple entities with roles:
HyperEdge {
relation: "award_ceremony",
bindings: {
recipient: Einstein,
prize: Nobel,
field: Physics,
year: 1921
}
}
Implemented in [hyper::HyperEdge]. Embeddings: HSimplE, HypE, HyCubE.
Key insight: Position-aware or role-aware encoding is essential. The relation "recipient" carries different semantics than "year".
See [hyper] module for hypergraph types.
Homogeneous vs Heterogeneous Graphs
| Type | Nodes | Edges | Use Case |
|---|---|---|---|
| Homogeneous | One type | One type | Citation networks, social graphs |
| Heterogeneous | Multiple types | Multiple types | Knowledge graphs, biomedical |
[HeteroGraph] supports typed nodes and edges, essential for:
- RGCN: Relation-specific weight matrices
- HGT: Heterogeneous graph transformers
- Link prediction: Typed edge prediction
Serialization Formats
Supports modern RDF 1.2 specifications (2024):
- N-Triples - Line-based, simple (fastest parsing)
- N-Quads - N-Triples with named graphs
- Turtle - Human-readable (best for debugging)
- JSON-LD - Linked data (web integration)
Algorithms
Centrality ([algo::centrality])
| Algorithm | Question | Module |
|---|---|---|
| Degree | How many connections? | [algo::centrality::degree_centrality] |
| Betweenness | Bridge between communities? | [algo::centrality::betweenness_centrality] |
| Closeness | How close to everyone? | [algo::centrality::closeness_centrality] |
| Eigenvector | Connected to important nodes? | [algo::centrality::eigenvector_centrality] |
| Katz | Reachable via damped paths? | [algo::centrality::katz_centrality] |
| PageRank | Random walk equilibrium? | [algo::pagerank::pagerank] |
| HITS | Hub or authority? | [algo::centrality::hits] |
Other Algorithms
- [
algo::random_walk] - Node2Vec style random walks (biased BFS/DFS) - [
algo::components] - Connected components (graph structure) - [
algo::sampling] - Neighbor sampling for mini-batch GNN training
When to Use Which Structure
| Task | Structure | Why |
|---|---|---|
| Node classification | KnowledgeGraph | Homogeneous GCN/GAT |
| Link prediction | HeteroGraph | Relation types matter |
| Knowledge completion | HeteroGraph + embeddings | TransE, RotatE, BoxE |
| Graph classification | KnowledgeGraph | Global pooling over nodes |
Example
use ;
let mut kg = new;
// Add triples
kg.add_triple;
kg.add_triple;
kg.add_triple;
// Query
let apple_relations = kg.relations_from;
assert_eq!;