pub struct GraphEngine { /* private fields */ }Expand description
In-memory graph engine backed by petgraph, synced to SQLite via codemem-storage.
§Design: intentional in-memory architecture
All graph data (nodes, edges, adjacency) is held entirely in memory using
HashMap-based structures. This is deliberate: graph traversals, centrality
algorithms, and multi-hop expansions benefit enormously from avoiding disk
I/O on every edge follow. The trade-off is higher memory usage, which is
acceptable for the typical code-graph sizes this engine targets.
§Memory characteristics
nodes:HashMap<String, GraphNode>— ~200 bytes per node (ID, kind, label, namespace, metadata, centrality).edges:HashMap<String, Edge>— ~150 bytes per edge (ID, src, dst, relationship, weight, properties, timestamps).edge_adj:HashMap<String, Vec<String>>— adjacency index mapping node IDs to incident edge IDs for O(degree) lookups.id_to_index: maps string IDs to petgraphNodeIndexvalues.cached_pagerank/cached_betweenness: centrality caches populated byrecompute_centrality().
Use CodememEngine::graph_memory_estimate() for a
byte-level sizing estimate based on current node and edge counts.
§Thread safety
GraphEngine is not Sync — it stores mutable graph state without
internal locking. Callers in codemem-engine wrap it in Mutex<GraphEngine>
(via lock_graph()) to ensure exclusive access. All public &mut self
methods (e.g., add_node, recompute_centrality) require the caller to
hold the lock.
Implementations§
Source§impl GraphEngine
impl GraphEngine
Sourcepub fn pagerank(
&self,
damping: f64,
iterations: usize,
tolerance: f64,
) -> HashMap<String, f64>
pub fn pagerank( &self, damping: f64, iterations: usize, tolerance: f64, ) -> HashMap<String, f64>
Compute PageRank scores for all nodes using power iteration.
damping: probability of following an edge (default 0.85)iterations: max number of power iterations (default 100)tolerance: convergence threshold (default 1e-6)
Returns a map from node ID to PageRank score.
Sourcepub fn louvain_communities(&self, resolution: f64) -> Vec<Vec<String>>
pub fn louvain_communities(&self, resolution: f64) -> Vec<Vec<String>>
Detect communities using the Louvain algorithm.
Treats the directed graph as undirected for modularity computation.
resolution controls community granularity (1.0 = standard modularity).
Returns groups of node IDs, one group per community.
Sourcepub fn betweenness_centrality(&self) -> HashMap<String, f64>
pub fn betweenness_centrality(&self) -> HashMap<String, f64>
Compute betweenness centrality for all nodes using Brandes’ algorithm.
For graphs with more than 1000 nodes, samples sqrt(n) source nodes for approximate computation.
Returns a map from node ID to betweenness centrality score (normalized by 1/((n-1)(n-2)) for directed graphs).
Sourcepub fn topological_layers(&self) -> Vec<Vec<String>>
pub fn topological_layers(&self) -> Vec<Vec<String>>
Compute topological layers using Kahn’s algorithm.
Returns layers where all nodes in layer i have no dependencies on nodes in layer i or later. For cyclic graphs, SCCs are condensed into single super-nodes first, then the resulting DAG is topologically sorted.
Each inner Vec contains the node IDs at that layer.
Source§impl GraphEngine
impl GraphEngine
Sourcepub fn from_storage(storage: &dyn StorageBackend) -> Result<Self, CodememError>
pub fn from_storage(storage: &dyn StorageBackend) -> Result<Self, CodememError>
Load graph from storage.
Sourcepub fn node_count(&self) -> usize
pub fn node_count(&self) -> usize
Get the number of nodes.
Sourcepub fn edge_count(&self) -> usize
pub fn edge_count(&self) -> usize
Get the number of edges.
Sourcepub fn expand(
&self,
start_ids: &[String],
max_hops: usize,
) -> Result<Vec<GraphNode>, CodememError>
pub fn expand( &self, start_ids: &[String], max_hops: usize, ) -> Result<Vec<GraphNode>, CodememError>
Multi-hop expansion: given a set of node IDs, expand N hops to find related nodes.
Sourcepub fn neighbors(&self, node_id: &str) -> Result<Vec<GraphNode>, CodememError>
pub fn neighbors(&self, node_id: &str) -> Result<Vec<GraphNode>, CodememError>
Get neighbors of a node (1-hop).
Sourcepub fn connected_components(&self) -> Vec<Vec<String>>
pub fn connected_components(&self) -> Vec<Vec<String>>
Return groups of connected node IDs.
Treats the directed graph as undirected: two nodes are in the same
component if there is a path between them in either direction.
Each inner Vec<String> is one connected component.
Sourcepub fn compute_centrality(&mut self)
pub fn compute_centrality(&mut self)
Compute degree centrality for every node and update their centrality field.
Degree centrality for node v is defined as:
(in_degree(v) + out_degree(v)) / (N - 1)
where N is the total number of nodes. When N <= 1, centrality is 0.
Sourcepub fn get_all_nodes(&self) -> Vec<GraphNode>
pub fn get_all_nodes(&self) -> Vec<GraphNode>
Return all nodes currently in the graph.
Sourcepub fn get_node_ref(&self, id: &str) -> Option<&GraphNode>
pub fn get_node_ref(&self, id: &str) -> Option<&GraphNode>
Return a reference to a node without cloning. Returns None if not found.
Sourcepub fn get_edges_ref(&self, node_id: &str) -> Vec<&Edge>
pub fn get_edges_ref(&self, node_id: &str) -> Vec<&Edge>
Return references to edges incident on a node without cloning.
This is the zero-copy variant of GraphBackend::get_edges() — same
lookup logic via edge_adj, but returns &Edge instead of owned Edge.
Sourcepub fn recompute_centrality(&mut self)
pub fn recompute_centrality(&mut self)
Recompute and cache PageRank and betweenness centrality scores.
This should be called after loading the graph (e.g., on server start) and periodically when the graph changes significantly.
Sourcepub fn recompute_centrality_with_options(&mut self, include_betweenness: bool)
pub fn recompute_centrality_with_options(&mut self, include_betweenness: bool)
Recompute centrality caches with control over which algorithms run.
PageRank is always computed. Betweenness centrality is only computed
when include_betweenness is true, since it is O(V * E) and can be
expensive on large graphs.
Sourcepub fn ensure_betweenness_computed(&mut self)
pub fn ensure_betweenness_computed(&mut self)
Lazily ensure betweenness centrality has been computed.
If cached_betweenness is empty (e.g., after recompute_centrality_with_options(false)),
this method computes and caches betweenness centrality on demand. If the
cache is already populated, this is a no-op.
Sourcepub fn get_pagerank(&self, node_id: &str) -> f64
pub fn get_pagerank(&self, node_id: &str) -> f64
Get the cached PageRank score for a node. Returns 0.0 if not found.
Sourcepub fn get_betweenness(&self, node_id: &str) -> f64
pub fn get_betweenness(&self, node_id: &str) -> f64
Get the cached betweenness centrality score for a node. Returns 0.0 if not found.
Sourcepub fn raw_graph_metrics_for_memory(
&self,
memory_id: &str,
) -> Option<RawGraphMetrics>
pub fn raw_graph_metrics_for_memory( &self, memory_id: &str, ) -> Option<RawGraphMetrics>
Collect raw graph metrics for a memory node from both code-graph and memory-graph neighbors.
Code neighbors (sym:, file:, chunk:, pkg:) contribute centrality
data (PageRank, betweenness). Memory neighbors (UUID-based) contribute
connectivity data (count + edge weights). A function with many linked
memories is more important; a memory with many linked memories has richer
context.
Returns None if the memory node is not in the graph or has no neighbors.