lattix 0.1.1

Graph/KG substrate: triples + PageRank + random walks
Documentation

Core types for knowledge graphs.

This crate provides foundational types for working with knowledge graphs:

  • [Triple] - A (subject, predicate, object) triple
  • [Entity] - A node in the knowledge graph
  • [Relation] - An edge type in the knowledge graph
  • [KnowledgeGraph] - A homogeneous graph structure built from triples
  • [HeteroGraph] - A heterogeneous graph with typed nodes and edges

Historical Context: From Databases to Graphs

Era Representation Example Limitation
1970s Relational SQL tables Fixed schema, join-heavy
2001 RDF Semantic Web XML verbosity, query complexity
2012 Property Graphs Neo4j No standard query language
2019 GNN-ready graphs PyG, DGL Framework-specific formats

Knowledge graphs became essential when search engines realized that "things, not strings" (Google, 2012) captures real-world semantics that keyword matching misses.

The Triple: Atomic Unit of Knowledge

All knowledge graph formalisms reduce to the triple:

(subject, predicate, object)
(Albert Einstein, born_in, Ulm)
(Ulm, located_in, Germany)

This simple structure enables:

  • Inference: If A → B → C, deduce A → C
  • Composition: Merge graphs by merging shared entities
  • Embedding: Represent triples as vectors for ML

Beyond Triples: N-ary Relations and Hypergraphs

Triples are limited - they can only express binary relations. Real-world facts often involve more than two entities:

(Einstein, won, Nobel Prize, Physics, 1921)  -- 4 entities!
(Alice, purchased, Book, $20, Amazon, 2024-01-15)  -- 6 entities!

Three approaches handle this:

Approach Structure Example Trade-off
Reification Break into multiple triples Creates artificial nodes Information loss
Qualifiers Triple + key-value pairs Wikidata model Complex querying
Hyperedges N-ary relation directly Native structure New embeddings needed

Reification (Workaround)

Convert n-ary facts to binary by introducing intermediate nodes:

Original: (Einstein, won, Nobel, Physics, 1921)
Reified:  (Award_1, recipient, Einstein)
          (Award_1, prize, Nobel)
          (Award_1, field, Physics)
          (Award_1, year, 1921)

Problem: Loses the atomic nature of the fact. Embedding models struggle because Award_1 is artificial - it has no semantic meaning.

Hyper-relational KGs (Wikidata Style)

Attach qualifiers to triples:

(Einstein, won, Nobel Prize)
  qualifiers: {field: Physics, year: 1921}

Implemented in [hyper::HyperTriple]. Embeddings: StarE, HINGE.

Knowledge Hypergraphs (Native N-ary)

Represent facts as hyperedges connecting multiple entities with roles:

HyperEdge {
  relation: "award_ceremony",
  bindings: {
    recipient: Einstein,
    prize: Nobel,
    field: Physics,
    year: 1921
  }
}

Implemented in [hyper::HyperEdge]. Embeddings: HSimplE, HypE, HyCubE.

Key insight: Position-aware or role-aware encoding is essential. The relation "recipient" carries different semantics than "year".

See [hyper] module for hypergraph types.

Homogeneous vs Heterogeneous Graphs

Type Nodes Edges Use Case
Homogeneous One type One type Citation networks, social graphs
Heterogeneous Multiple types Multiple types Knowledge graphs, biomedical

[HeteroGraph] supports typed nodes and edges, essential for:

  • RGCN: Relation-specific weight matrices
  • HGT: Heterogeneous graph transformers
  • Link prediction: Typed edge prediction

Serialization Formats

Supports modern RDF 1.2 specifications (2024):

  • N-Triples - Line-based, simple (fastest parsing)
  • N-Quads - N-Triples with named graphs
  • Turtle - Human-readable (best for debugging)
  • JSON-LD - Linked data (web integration)

Algorithms

Centrality ([algo::centrality])

Algorithm Question Module
Degree How many connections? [algo::centrality::degree_centrality]
Betweenness Bridge between communities? [algo::centrality::betweenness_centrality]
Closeness How close to everyone? [algo::centrality::closeness_centrality]
Eigenvector Connected to important nodes? [algo::centrality::eigenvector_centrality]
Katz Reachable via damped paths? [algo::centrality::katz_centrality]
PageRank Random walk equilibrium? [algo::pagerank::pagerank]
HITS Hub or authority? [algo::centrality::hits]

Other Algorithms

  • [algo::random_walk] - Node2Vec style random walks (biased BFS/DFS)
  • [algo::components] - Connected components (graph structure)
  • [algo::sampling] - Neighbor sampling for mini-batch GNN training

When to Use Which Structure

Task Structure Why
Node classification KnowledgeGraph Homogeneous GCN/GAT
Link prediction HeteroGraph Relation types matter
Knowledge completion HeteroGraph + embeddings TransE, RotatE, BoxE
Graph classification KnowledgeGraph Global pooling over nodes

Example

use lattix::{Triple, KnowledgeGraph};

let mut kg = KnowledgeGraph::new();

// Add triples
kg.add_triple(Triple::new("Apple", "founded_by", "Steve Jobs"));
kg.add_triple(Triple::new("Apple", "headquartered_in", "Cupertino"));
kg.add_triple(Triple::new("Steve Jobs", "born_in", "San Francisco"));

// Query
let apple_relations = kg.relations_from("Apple");
assert_eq!(apple_relations.len(), 2);