Skip to main content

GraphEngine

Struct GraphEngine 

Source
pub struct GraphEngine { /* private fields */ }
Expand description

In-memory graph engine backed by petgraph, synced to SQLite via codemem-storage.

§Design: intentional in-memory architecture

All graph data (nodes, edges, adjacency) is held entirely in memory using HashMap-based structures. This is deliberate: graph traversals, centrality algorithms, and multi-hop expansions benefit enormously from avoiding disk I/O on every edge follow. The trade-off is higher memory usage, which is acceptable for the typical code-graph sizes this engine targets.

§Memory characteristics

  • nodes: HashMap<String, GraphNode> — ~200 bytes per node (ID, kind, label, namespace, metadata, centrality).
  • edges: HashMap<String, Edge> — ~150 bytes per edge (ID, src, dst, relationship, weight, properties, timestamps).
  • edge_adj: HashMap<String, Vec<String>> — adjacency index mapping node IDs to incident edge IDs for O(degree) lookups.
  • id_to_index: maps string IDs to petgraph NodeIndex values.
  • cached_pagerank / cached_betweenness: centrality caches populated by recompute_centrality().

Use CodememEngine::graph_memory_estimate() for a byte-level sizing estimate based on current node and edge counts.

§Thread safety

GraphEngine is not Sync — it stores mutable graph state without internal locking. Callers in codemem-engine wrap it in Mutex<GraphEngine> (via lock_graph()) to ensure exclusive access. All public &mut self methods (e.g., add_node, recompute_centrality) require the caller to hold the lock.

Implementations§

Source§

impl GraphEngine

Source

pub fn pagerank_for_namespace( &self, namespace: &str, damping: f64, iterations: usize, tolerance: f64, ) -> HashMap<String, f64>

Compute PageRank scores for nodes in a single namespace using power iteration.

Only nodes belonging to namespace participate. Edges that cross into other namespaces are ignored so that unrelated projects cannot inflate or deflate centrality scores within this namespace.

Returns a map from node ID to PageRank score (only for nodes in the namespace).

Source

pub fn pagerank( &self, damping: f64, iterations: usize, tolerance: f64, ) -> HashMap<String, f64>

Compute PageRank scores for all nodes using power iteration.

  • damping: probability of following an edge (default 0.85)
  • iterations: max number of power iterations (default 100)
  • tolerance: convergence threshold (default 1e-6)

Returns a map from node ID to PageRank score.

Source

pub fn louvain_communities(&self, resolution: f64) -> Vec<Vec<String>>

Detect communities using the Louvain algorithm.

Treats the directed graph as undirected for modularity computation. resolution controls community granularity (1.0 = standard modularity). Returns groups of node IDs, one group per community.

Source

pub fn betweenness_centrality(&self) -> HashMap<String, f64>

Compute betweenness centrality for all nodes using Brandes’ algorithm.

For graphs with more than 1000 nodes, samples sqrt(n) source nodes for approximate computation.

Returns a map from node ID to betweenness centrality score (normalized by 1/((n-1)(n-2)) for directed graphs).

Source

pub fn strongly_connected_components(&self) -> Vec<Vec<String>>

Find all strongly connected components using Tarjan’s algorithm.

Returns groups of node IDs. Each group is a strongly connected component where every node can reach every other node via directed edges.

Source

pub fn topological_layers(&self) -> Vec<Vec<String>>

Compute topological layers using Kahn’s algorithm.

Returns layers where all nodes in layer i have no dependencies on nodes in layer i or later. For cyclic graphs, SCCs are condensed into single super-nodes first, then the resulting DAG is topologically sorted.

Each inner Vec contains the node IDs at that layer.

Source

pub fn subgraph_top_n( &self, n: usize, namespace: Option<&str>, kinds: Option<&[NodeKind]>, ) -> (Vec<GraphNode>, Vec<Edge>)

Return top-N nodes by centrality and edges between them. Optionally filter by namespace and/or node kinds.

Non-structural edges (CALLS, IMPORTS, INHERITS, IMPLEMENTS, DEPENDS_ON) from top-N nodes pull their targets into the result so that these relationship types are visible in the UI graph.

Source

pub fn label_community(&self, member_ids: &[&str]) -> String

Label a community by the most common parent directories of its member nodes.

Takes a list of node IDs, looks up their file_path payloads, extracts parent directory names (second-to-last path component), and returns a label. If all members share the same directory, returns that name. If mixed, combines the two most frequent directories with +. Returns "unknown" if no file paths are found.

Source

pub fn louvain_with_assignment(&self, resolution: f64) -> HashMap<String, usize>

Return node-to-community-ID mapping for Louvain.

Source§

impl GraphEngine

Source

pub fn new() -> GraphEngine

Create a new empty graph.

Source

pub fn from_storage( storage: &dyn StorageBackend, ) -> Result<GraphEngine, CodememError>

Load graph from storage.

Source

pub fn node_count(&self) -> usize

Get the number of nodes.

Source

pub fn edge_count(&self) -> usize

Get the number of edges.

Source

pub fn expand( &self, start_ids: &[String], max_hops: usize, ) -> Result<Vec<GraphNode>, CodememError>

Multi-hop expansion: given a set of node IDs, expand N hops to find related nodes.

Source

pub fn neighbors(&self, node_id: &str) -> Result<Vec<GraphNode>, CodememError>

Get neighbors of a node (1-hop).

Source

pub fn connected_components(&self) -> Vec<Vec<String>>

Return groups of connected node IDs.

Treats the directed graph as undirected: two nodes are in the same component if there is a path between them in either direction. Each inner Vec<String> is one connected component.

Source

pub fn compute_centrality(&mut self)

Compute degree centrality for every node and update their centrality field.

Degree centrality for node v is defined as: (in_degree(v) + out_degree(v)) / (N - 1) where N is the total number of nodes. When N <= 1, centrality is 0.

Source

pub fn get_all_nodes(&self) -> Vec<GraphNode>

Return all nodes currently in the graph.

Source

pub fn get_node_ref(&self, id: &str) -> Option<&GraphNode>

Return a reference to a node without cloning. Returns None if not found.

Source

pub fn get_edges_ref(&self, node_id: &str) -> Vec<&Edge>

Return references to edges incident on a node without cloning.

This is the zero-copy variant of GraphBackend::get_edges() — same lookup logic via edge_adj, but returns &Edge instead of owned Edge.

Source

pub fn recompute_centrality(&mut self)

Recompute and cache PageRank and betweenness centrality scores.

This should be called after loading the graph (e.g., on server start) and periodically when the graph changes significantly.

Source

pub fn recompute_centrality_for_namespace(&mut self, namespace: &str)

Recompute PageRank for a single namespace and update the cache.

Only scores for nodes in namespace are written; scores for nodes in other namespaces are left unchanged. This prevents cross-project pollution when the shared database holds multiple indexed projects.

Stale scores for deleted nodes in this namespace are evicted.

Source

pub fn recompute_centrality_with_options(&mut self, include_betweenness: bool)

Recompute centrality caches with control over which algorithms run.

PageRank is always computed. Betweenness centrality is only computed when include_betweenness is true, since it is O(V * E) and can be expensive on large graphs.

Source

pub fn ensure_betweenness_computed(&mut self)

Lazily ensure betweenness centrality has been computed.

If cached_betweenness is empty (e.g., after recompute_centrality_with_options(false)), this method computes and caches betweenness centrality on demand. If the cache is already populated, this is a no-op.

Source

pub fn get_pagerank(&self, node_id: &str) -> f64

Get the cached PageRank score for a node. Returns 0.0 if not found.

Source

pub fn get_betweenness(&self, node_id: &str) -> f64

Get the cached betweenness centrality score for a node. Returns 0.0 if not found.

Source

pub fn raw_graph_metrics_for_memory( &self, memory_id: &str, ) -> Option<RawGraphMetrics>

Collect raw graph metrics for a memory node from both code-graph and memory-graph neighbors.

Code neighbors (sym:, file:, chunk:, pkg:) contribute centrality data (PageRank, betweenness). Memory neighbors (UUID-based) contribute connectivity data (count + edge weights). A function with many linked memories is more important; a memory with many linked memories has richer context.

Returns None if the memory node is not in the graph or has no neighbors.

Trait Implementations§

Source§

impl Default for GraphEngine

Source§

fn default() -> GraphEngine

Returns the “default value” for a type. Read more
Source§

impl GraphBackend for GraphEngine

Source§

fn add_node(&mut self, node: GraphNode) -> Result<(), CodememError>

Add a node to the graph.
Source§

fn get_node(&self, id: &str) -> Result<Option<GraphNode>, CodememError>

Get a node by ID.
Source§

fn remove_node(&mut self, id: &str) -> Result<bool, CodememError>

Remove a node by ID.
Source§

fn add_edge(&mut self, edge: Edge) -> Result<(), CodememError>

Add an edge between two nodes.
Source§

fn get_edges(&self, node_id: &str) -> Result<Vec<Edge>, CodememError>

Get edges from a node.
Source§

fn remove_edge(&mut self, id: &str) -> Result<bool, CodememError>

Remove an edge by ID.
Source§

fn bfs( &self, start_id: &str, max_depth: usize, ) -> Result<Vec<GraphNode>, CodememError>

BFS traversal from a start node up to max_depth.
Source§

fn dfs( &self, start_id: &str, max_depth: usize, ) -> Result<Vec<GraphNode>, CodememError>

DFS traversal from a start node up to max_depth.
Source§

fn bfs_filtered( &self, start_id: &str, max_depth: usize, exclude_kinds: &[NodeKind], include_relationships: Option<&[RelationshipType]>, ) -> Result<Vec<GraphNode>, CodememError>

BFS traversal with filtering: exclude certain node kinds and optionally restrict to specific relationship types.
Source§

fn dfs_filtered( &self, start_id: &str, max_depth: usize, exclude_kinds: &[NodeKind], include_relationships: Option<&[RelationshipType]>, ) -> Result<Vec<GraphNode>, CodememError>

DFS traversal with filtering: exclude certain node kinds and optionally restrict to specific relationship types.
Source§

fn shortest_path( &self, from: &str, to: &str, ) -> Result<Vec<String>, CodememError>

Shortest path between two nodes.
Source§

fn stats(&self) -> GraphStats

Get graph statistics.
Source§

fn get_all_nodes(&self) -> Vec<GraphNode>

Get all nodes in the graph.
Source§

fn get_node_ref(&self, id: &str) -> Option<&GraphNode>

Zero-copy node lookup. Returns a reference into the backend’s internal storage. Only meaningful for in-memory backends. Database backends should leave the default (returns None) and callers should use get_node() instead.
Source§

fn get_edges_ref(&self, node_id: &str) -> Vec<&Edge>

Zero-copy edge lookup. Returns references into the backend’s internal storage. Only meaningful for in-memory backends. Database backends should leave the default (returns empty) and callers should use get_edges() instead.
Source§

fn node_count(&self) -> usize

Number of nodes in the graph.
Source§

fn edge_count(&self) -> usize

Number of edges in the graph.
Source§

fn recompute_centrality(&mut self)

Recompute all centrality metrics (PageRank + betweenness).
Source§

fn recompute_centrality_with_options(&mut self, include_betweenness: bool)

Recompute centrality, optionally including expensive betweenness calculation.
Source§

fn recompute_centrality_for_namespace(&mut self, namespace: &str)

Recompute PageRank scoped to a single namespace, updating only that namespace’s scores in the cache. Nodes from other namespaces are unaffected.
Source§

fn ensure_betweenness_computed(&mut self)

Lazily compute betweenness centrality if not yet computed.
Source§

fn compute_centrality(&mut self)

Compute degree centrality (updates nodes in place).
Source§

fn get_pagerank(&self, node_id: &str) -> f64

Get cached PageRank score for a node.
Source§

fn get_betweenness(&self, node_id: &str) -> f64

Get cached betweenness centrality score for a node.
Source§

fn raw_graph_metrics_for_memory( &self, memory_id: &str, ) -> Option<RawGraphMetrics>

Collect graph metrics for a memory node (used in hybrid scoring).
Source§

fn connected_components(&self) -> Vec<Vec<String>>

Find connected components (treating graph as undirected).
Source§

fn strongly_connected_components(&self) -> Vec<Vec<String>>

Find strongly connected components using Tarjan’s algorithm. Each SCC is a group where every node can reach every other via directed edges.
Source§

fn pagerank( &self, damping: f64, iterations: usize, tolerance: f64, ) -> HashMap<String, f64>

Compute PageRank scores for all nodes. Returns a map from node ID to PageRank score.
Source§

fn pagerank_for_namespace( &self, namespace: &str, damping: f64, iterations: usize, tolerance: f64, ) -> HashMap<String, f64>

Compute PageRank scores for nodes in a single namespace. Only nodes belonging to namespace participate; cross-namespace edges are ignored. Returns a map from node ID to PageRank score (only for nodes in the namespace). Default is abstract; implementers must provide namespace-scoped computation.
Source§

fn louvain_communities(&self, resolution: f64) -> Vec<Vec<String>>

Run Louvain community detection at the given resolution. Returns groups of node IDs, one group per community.
Source§

fn topological_layers(&self) -> Vec<Vec<String>>

Compute topological layers of the graph. Returns layers where all nodes in layer i have no dependencies on nodes in layer i or later.
Source§

fn louvain_with_assignment(&self, resolution: f64) -> HashMap<String, usize>

Return node-to-community-ID mapping for Louvain.
Source§

fn subgraph_top_n( &self, n: usize, namespace: Option<&str>, kinds: Option<&[NodeKind]>, ) -> (Vec<GraphNode>, Vec<Edge>)

Return a top-N subgraph: the highest-centrality nodes plus all edges between them. Non-structural edges from top-N nodes pull their targets into the result.

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper. Read more
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer. Read more
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
Source§

impl<T> PolicyExt for T
where T: ?Sized,

Source§

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow. Read more
Source§

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow. Read more
Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper. Read more
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper. Read more
Source§

impl<T> ErasedDestructor for T
where T: 'static,