Crate srcgraph_metrics

Code-analysis metrics over a petgraph::Graph<N, E> where N: srcgraph_core::ClassNode and E: srcgraph_core::EdgeKind.

See DESIGN.md at the workspace root for the phased rollout plan (Phase 1: scc, lcom4, betweenness).

@yah:relay(R153, “Phase 2 metrics: instability + cyclomatic + halstead + entropy”) @yah:at(2026-05-14T17:19:47Z) @yah:status(open) @yah:assignee(agent:claude) @yah:handoff(“Phase 1 benchmark validated (56.9x vs networkx on real graph). Open Phase 2 per DESIGN.md:58-63 — four per-class metrics that operate on data already in the GraphML schema, no new graph traversal needed.”) @yah:next(“T1 instability (Ce/(Ca+Ce) + SDP violations)”) @yah:next(“T2 cyclomatic per-method CC + bimodality (dip statistic)”) @yah:next(“T3 halstead V/D/E/B from η₁/η₂/N₁/N₂”) @yah:next(“T4 entropy Shannon over identifier tokens”)

@yah:ticket(R153-T1, “metrics::instability — Martin’s I = Ce/(Ca+Ce) + SDP violations”) @yah:at(2026-05-14T17:19:56Z) @yah:status(review) @yah:assignee(agent:claude) @yah:parent(R153) @yah:verify(“cargo test -p srcgraph-metrics --lib instability”) @yah:handoff(“Implemented metrics::instability at namespace granularity: compute_instability returns NamespaceInstability { ca, ce, instability } per package, counting only inter-namespace edges (intra-package edges are package-internal cohesion, not coupling). find_sdp_violations flags inter-namespace edges where I(src) < I(tgt) — stable depending on unstable — sorted by gap descending. Isolated namespaces (Ca+Ce=0) yield instability=None and are skipped in violation detection. 8 unit tests cover the formula edges, intra/inter-namespace split, parallel edges, isolated nodes, SDP detection, and sort order.”)
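The instability formula and the SDP check described in the R153-T1 handoff can be sketched with the standard library alone. This is a minimal illustration, not the crate's code: the real functions walk a petgraph graph, whereas the `(source_namespace, target_namespace)` edge slice, the tuple return type of `find_sdp_violations`, counting every parallel edge toward Ca/Ce, and de-duplicating reported violations are all assumptions made here.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Per-namespace coupling counts and Martin's instability I = Ce / (Ca + Ce).
#[derive(Debug, Clone, PartialEq)]
pub struct NamespaceInstability {
    pub ca: usize,
    pub ce: usize,
    /// `None` for an isolated namespace (Ca + Ce == 0).
    pub instability: Option<f64>,
}

/// Assumed input shape: one `(source_namespace, target_namespace)` pair per edge.
pub fn compute_instability(edges: &[(&str, &str)]) -> BTreeMap<String, NamespaceInstability> {
    // namespace -> (ca, ce)
    let mut counts: BTreeMap<String, (usize, usize)> = BTreeMap::new();
    for &(src, tgt) in edges {
        // Register both endpoints so isolated namespaces still get a row.
        counts.entry(src.to_string()).or_default();
        counts.entry(tgt.to_string()).or_default();
        if src == tgt {
            continue; // intra-namespace edge: cohesion, not coupling
        }
        counts.get_mut(src).unwrap().1 += 1; // efferent for the source
        counts.get_mut(tgt).unwrap().0 += 1; // afferent for the target
    }
    counts
        .into_iter()
        .map(|(ns, (ca, ce))| {
            let instability = if ca + ce == 0 {
                None
            } else {
                Some(ce as f64 / (ca + ce) as f64)
            };
            (ns, NamespaceInstability { ca, ce, instability })
        })
        .collect()
}

/// Edges where I(src) < I(tgt), i.e. a stable namespace depending on a less
/// stable one. Returns `(src, tgt, gap)` sorted by gap descending.
pub fn find_sdp_violations(edges: &[(&str, &str)]) -> Vec<(String, String, f64)> {
    let table = compute_instability(edges);
    let mut seen = BTreeSet::new();
    let mut out = Vec::new();
    for &(src, tgt) in edges {
        // Skip intra-namespace edges and already-reported pairs (assumption).
        if src == tgt || !seen.insert((src, tgt)) {
            continue;
        }
        if let (Some(i_src), Some(i_tgt)) = (table[src].instability, table[tgt].instability) {
            if i_src < i_tgt {
                out.push((src.to_string(), tgt.to_string(), i_tgt - i_src));
            }
        }
    }
    out.sort_by(|a, b| b.2.total_cmp(&a.2));
    out
}
```

For example, with edges `app→core`, `app→util`, `core→util`, and `util→app`, the namespace `util` ends up with I = 1/3 and `app` with I = 2/3, so `util→app` is the only edge this sketch flags.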

@yah:ticket(R153-T2, “metrics::cyclomatic — per-method CC + bimodality (dip statistic)”) @yah:at(2026-05-14T17:19:56Z) @yah:status(review) @yah:assignee(agent:claude) @yah:parent(R153) @yah:verify(“cargo test -p srcgraph-metrics --lib cyclomatic”) @yah:handoff(“Implemented metrics::cyclomatic. Added cyclomatic_complexity() accessor to the ClassNode trait (defaults None; OwnedClassNode returns its parsed JSON). parse_methods reads the {methods:[{name,complexity}]} blob; compute_stats yields mean/median/min/max/std/total; detect_bimodality mirrors the Python heuristic (sort, largest gap, threshold = max(3, 3×median_gap), ≥2 on each side). compute_cyclomatic walks every node and returns CyclomaticReport with None fields for nodes whose blob is absent or malformed. 9 tests cover parse, stats, the short/flat/cluster/unbalanced bimodality cases, and full-graph traversal.”) @yah:assumes(“Field name is dip_score for continuity with the Python panel even though we use a normalized gap-ratio heuristic, not the true Hartigan dip. If a downstream consumer expects the statistical p-value, swap in a dip-test crate later — the type and call site stay the same.”)
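The gap heuristic named in the R153-T2 handoff (sort, largest gap, threshold = max(3, 3×median_gap), at least two values on each side) might look roughly like the sketch below. The signature, the `Option<usize>` return value, the inclusive `>=` comparison, the even-length median rule, and the tie-break toward the earliest gap are assumptions; the handoff does not specify them, and the crate's `dip_score` normalisation is not reproduced here.

```rust
/// Median of an already sorted slice; averages the two middle values when the
/// length is even (assumed convention).
fn median_sorted(sorted: &[u32]) -> f64 {
    let n = sorted.len();
    if n == 0 {
        return 0.0;
    }
    if n % 2 == 1 {
        sorted[n / 2] as f64
    } else {
        (sorted[n / 2 - 1] as f64 + sorted[n / 2] as f64) / 2.0
    }
}

/// Returns the size of the lower cluster when the per-method complexities
/// split into two groups, or `None` when they look unimodal.
pub fn detect_bimodality(values: &[u32]) -> Option<usize> {
    if values.len() < 4 {
        return None; // cannot have two values on each side
    }
    let mut sorted = values.to_vec();
    sorted.sort_unstable();

    // Gaps between consecutive sorted values.
    let gaps: Vec<u32> = sorted.windows(2).map(|w| w[1] - w[0]).collect();

    // Largest gap; the earliest index wins on ties.
    let (split, &largest) = gaps
        .iter()
        .enumerate()
        .max_by_key(|&(i, &g)| (g, std::cmp::Reverse(i)))?;

    let mut sorted_gaps = gaps.clone();
    sorted_gaps.sort_unstable();
    let threshold = 3.0_f64.max(3.0 * median_sorted(&sorted_gaps));

    let left = split + 1;
    let right = sorted.len() - left;
    if largest as f64 >= threshold && left >= 2 && right >= 2 {
        Some(left)
    } else {
        None
    }
}
```

With complexities `[1, 1, 2, 2, 10, 11, 12]` the largest gap is 8 against a threshold of 3, and both sides hold at least two values, so the sketch reports a split after the fourth value. A flat series such as `[1, 2, 3, 4, 5, 6]` never reaches the threshold.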

@yah:ticket(R153-T3, “metrics::halstead — V/D/E/B from η₁/η₂/N₁/N₂ already extracted”) @yah:at(2026-05-14T17:19:56Z) @yah:status(review) @yah:assignee(agent:claude) @yah:parent(R153) @yah:verify(“cargo test -p srcgraph-metrics --lib halstead”) @yah:handoff(“Implemented metrics::halstead. Exposed halstead_eta1/eta2/n1/n2 accessors on the ClassNode trait (default 0); halstead_metrics(η₁,η₂,N₁,N₂) returns (V, D, E, B, η, N) using the Python edge-case rules (η=0 → V=0; η₂=0 → D=0); compute_halstead walks every node into a Halstead struct. 4 tests cover the worked example, both zero edge cases, and full-graph traversal.”)
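As a worked illustration of the Halstead quantities listed in the R153-T3 handoff, the sketch below uses the textbook definitions: η = η₁ + η₂, N = N₁ + N₂, V = N·log₂η, D = (η₁/2)·(N₂/η₂), E = D·V. The bug estimate B = V/3000 is the common convention and is assumed here, since the handoff does not state which variant the crate uses. The struct's field names and the integer types are likewise hypothetical.

```rust
/// Derived Halstead measures for one class (field names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Halstead {
    pub volume: f64,
    pub difficulty: f64,
    pub effort: f64,
    pub bugs: f64,
    pub vocabulary: u64,
    pub length: u64,
}

/// eta1/eta2: distinct operators/operands; n1/n2: total operators/operands.
pub fn halstead_metrics(eta1: u64, eta2: u64, n1: u64, n2: u64) -> Halstead {
    let vocabulary = eta1 + eta2;
    let length = n1 + n2;

    // Edge case from the handoff: an empty vocabulary yields V = 0.
    let volume = if vocabulary == 0 {
        0.0
    } else {
        length as f64 * (vocabulary as f64).log2()
    };

    // Edge case from the handoff: no distinct operands yields D = 0.
    let difficulty = if eta2 == 0 {
        0.0
    } else {
        (eta1 as f64 / 2.0) * (n2 as f64 / eta2 as f64)
    };

    let effort = difficulty * volume;
    let bugs = volume / 3000.0; // assumed variant of the bug estimate

    Halstead { volume, difficulty, effort, bugs, vocabulary, length }
}
```

For η₁ = η₂ = 4 and N₁ = N₂ = 8, this gives η = 8, N = 16, V = 16·log₂8 = 48, D = 2·2 = 4, E = 192, and B = 0.016.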

@yah:ticket(R153-T4, “metrics::entropy — Shannon entropy over identifier tokens”) @yah:at(2026-05-14T17:19:56Z) @yah:status(review) @yah:assignee(agent:claude) @yah:parent(R153) @yah:verify(“cargo test -p srcgraph-metrics --lib entropy”) @yah:handoff(“Implemented metrics::entropy at token-frequency granularity (per the ticket’s --next bullets, not the Python responsibility-distribution variant): added method_tokens() accessor to the ClassNode trait; token_histogram pools every token across every method into a frequency map; entropy_from_counts and shannon_entropy compute H = -Σ p·log₂(p). compute_entropy returns TokenEntropy { entropy, distinct, total } per class; ≤1 distinct token yields 0.0, missing/malformed blob yields None. 7 tests cover the formula, pooling, and trait-level traversal.”) @yah:assumes(“The Python reference (analysis/entropy.py) computes entropy over methodConnectivity component sizes, not identifier tokens. This implementation follows the ticket’s --next bullets (methodTokens histogram). If the visiting tool’s panel needs the responsibility-distribution variant, add it as a second function — don’t replace this one.”)
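The pooling and entropy steps in the R153-T4 handoff reduce to a frequency map followed by H = -Σ p·log₂(p). The sketch below assumes each method's tokens arrive as a `Vec<&str>`; that input shape and the exact signatures are illustrative, and the real `compute_entropy` reads the tokens through the `ClassNode` trait instead.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub struct TokenEntropy {
    pub entropy: f64,
    pub distinct: usize,
    pub total: usize,
}

/// Pools every token across every method into one frequency map.
pub fn token_histogram<'a>(methods: &[Vec<&'a str>]) -> BTreeMap<&'a str, usize> {
    let mut hist = BTreeMap::new();
    for method in methods {
        for &token in method {
            *hist.entry(token).or_insert(0) += 1;
        }
    }
    hist
}

/// Shannon entropy in bits: H = -sum(p * log2(p)) over the non-zero counts.
pub fn entropy_from_counts(counts: &[usize]) -> f64 {
    let total: usize = counts.iter().sum();
    let distinct = counts.iter().filter(|&&c| c > 0).count();
    if total == 0 || distinct <= 1 {
        return 0.0; // at most one distinct token carries no information
    }
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

/// Per-class summary built from the pooled histogram.
pub fn compute_entropy(methods: &[Vec<&str>]) -> TokenEntropy {
    let hist = token_histogram(methods);
    let counts: Vec<usize> = hist.values().copied().collect();
    TokenEntropy {
        entropy: entropy_from_counts(&counts),
        distinct: counts.len(),
        total: counts.iter().sum(),
    }
}
```

Two methods with tokens `[get, user]` and `[get, id]` pool to the counts 2, 1, 1, so the probabilities are 0.5, 0.25, 0.25 and H = 0.5 + 0.5 + 0.5 = 1.5 bits.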

@yah:relay(R155, “Phase 3 metrics: clone_detection + association_rules + process_mining”) @yah:at(2026-05-14T17:35:26Z) @yah:status(open) @yah:assignee(agent:claude) @yah:next(“T1 clone_detection — n-gram fingerprints over methodTokens, similarity pairs, family grouping”) @yah:next(“T2 association_rules — apriori-style itemset growth for class-co-occurrence rules”) @yah:next(“T3 process_mining — Petri net construction from callSequences + conformance scoring”)

@yah:ticket(R155-T1, “metrics::clone_detection — n-gram fingerprints + similarity pairs + family grouping”) @yah:at(2026-05-14T17:35:34Z) @yah:status(review) @yah:assignee(agent:claude) @yah:parent(R155) @yah:verify(“cargo test -p srcgraph-metrics --lib clone_detection”) @yah:handoff(“Implemented metrics::clone_detection. Added method_fingerprints() accessor to the ClassNode trait (defaults None; OwnedClassNode returns the parsed JSON). parse_method_fingerprints reads the {methods:[{name,tokens,line,endLine,params}]} blob; extract_ngrams builds n-gram sets from space-separated token streams (short streams collapse to one tuple, per Python ref); ngram_jaccard / jaccard compute |A∩B|/|A∪B| with the empty/empty=0 convention. detect_clone_pairs is O(n²) with two prunes: 2:1 token-count ratio gate and same-class-same-name skip (partial-class siblings). group_clone_families runs path-compressed union-find over pair indices, computes per-family avg similarity, sorts families by size desc, assigns deterministic ids. compute_clone_analysis walks the graph (accepting both Value::Object and Value::String-encoded blobs for GraphML round-trip safety) and returns CloneAnalysis { pairs, families, total_methods, cloned_methods, clone_ratio }. 12 tests cover n-gram edges, Jaccard, both prune paths, transitive family grouping, family ordering+id assignment, full-graph traversal, and string-encoded blob handling.”) @yah:assumes(“methodFingerprints tokens is a space-separated normalised-token-class string (matches Python ref + extractor schema). The Python reference has a separate extract_semantic_diff() helper that wasn’t ported — the panel uses it for side-by-side display only, not for detection. Add later if the Rust path needs to drive the UI.”)
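The fingerprint comparison at the core of the R155-T1 handoff, n-gram sets over a space-separated token stream scored by Jaccard similarity, can be sketched as follows. The signatures are assumptions, as is returning an empty set for an empty stream; the handoff only says that short streams collapse to a single tuple and that two empty sets score 0. The pruning gates and family grouping are left out.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Builds the set of n-grams from a space-separated token stream.
/// A non-empty stream shorter than `n` collapses to one tuple holding all of
/// its tokens; an empty stream yields an empty set (assumption).
pub fn extract_ngrams(tokens: &str, n: usize) -> HashSet<Vec<&str>> {
    let toks: Vec<&str> = tokens.split_whitespace().collect();
    if toks.is_empty() || n == 0 {
        return HashSet::new();
    }
    if toks.len() < n {
        return HashSet::from([toks]);
    }
    toks.windows(n).map(|w| w.to_vec()).collect()
}

/// Jaccard similarity |A ∩ B| / |A ∪ B|, with empty/empty defined as 0.
pub fn jaccard<T: Eq + Hash>(a: &HashSet<T>, b: &HashSet<T>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f64 / union as f64
}
```

The streams `ID ASSIGN ID PLUS NUM` and `ID ASSIGN ID PLUS ID` each produce three trigrams, share two of them, and have four in their union, so their similarity under this sketch is 0.5.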

@yah:ticket(R155-T2, “metrics::association_rules — apriori itemset growth over class-co-occurrence”) @yah:at(2026-05-14T17:35:34Z) @yah:status(review) @yah:assignee(agent:claude) @yah:parent(R155) @yah:verify(“cargo test -p srcgraph-metrics --lib association_rules”) @yah:handoff(“Implemented metrics::association_rules. Added call_sequences() accessor to the ClassNode trait (defaults None; OwnedClassNode returns the parsed JSON). parse_call_sequences reads the {sequences:[{method,calls,…}]} blob. mine_frequent_itemsets is Apriori bottom-up by k=1..=5 (Python-ref cap): direct count for 1-itemsets, then all k-combos of frequent 1-items gated on the (k-1)-subset property + support; returns FrequentItemset {items, support} sorted by support desc with item-list tiebreaker. generate_rules emits A→C for every non-empty proper subset of each frequent itemset, scoring confidence = sup(A∪C)/sup(A) and lift = confidence / (sup(C)/n); sorted confidence desc → support desc → antecedent asc → consequent asc. classify_rule maps confidence to invariant (≥0.99) / strong (≥0.85) / moderate (≥0.5). compute_association_analysis walks the graph (accepts both Value::Object and string-encoded blobs for GraphML round-trip), filters sequences with calls.len()>=2 into BTreeSet transactions, and returns AssociationAnalysis {transactions, rules, num_rules, invariants, strong, moderate, itemsets}. 20 tests cover parse, combinations helper, mining (singletons/pairs/min_support/empty/sort order), rule generation (high-conf detection/sort/classification/empty), classify thresholds, and full-graph traversal incl. string-encoded blob + short-sequence skip.”) @yah:assumes(“min_support is literal in the API (caller-supplied), not the Python adaptive max(2, n/10) — the analysis result returns transactions count so callers wanting the adaptive pattern can run a two-pass mine. PyO3 binding (R152-T5) is the natural place to add an adaptive wrapper.”)
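The rule-scoring arithmetic in the R155-T2 handoff, confidence = sup(A∪C)/sup(A) and lift = confidence / (sup(C)/n), is easiest to follow on a handful of transactions. The sketch below is a simplified stand-in: `support` and `score_rule` are hypothetical helpers that recount support by scanning every transaction, whereas the crate derives it from mined itemsets. The `classify_rule` thresholds come from the handoff; its `Option<&'static str>` return type is an assumption.

```rust
use std::collections::BTreeSet;

/// Number of transactions that contain every item in `items`.
pub fn support(transactions: &[BTreeSet<&str>], items: &[&str]) -> usize {
    transactions
        .iter()
        .filter(|t| items.iter().all(|i| t.contains(*i)))
        .count()
}

/// Returns `(confidence, lift)` for the rule antecedent -> consequent, or
/// `None` when either side never occurs.
pub fn score_rule(
    transactions: &[BTreeSet<&str>],
    antecedent: &[&str],
    consequent: &[&str],
) -> Option<(f64, f64)> {
    let n = transactions.len();
    let sup_a = support(transactions, antecedent);
    let sup_c = support(transactions, consequent);
    if n == 0 || sup_a == 0 || sup_c == 0 {
        return None;
    }
    let both: Vec<&str> = antecedent.iter().chain(consequent).copied().collect();
    let sup_ac = support(transactions, &both);

    let confidence = sup_ac as f64 / sup_a as f64;
    let lift = confidence / (sup_c as f64 / n as f64);
    Some((confidence, lift))
}

/// Confidence bands from the handoff: invariant, strong, moderate.
pub fn classify_rule(confidence: f64) -> Option<&'static str> {
    if confidence >= 0.99 {
        Some("invariant")
    } else if confidence >= 0.85 {
        Some("strong")
    } else if confidence >= 0.5 {
        Some("moderate")
    } else {
        None
    }
}
```

Take the four transactions {open, read, close}, {open, close}, {open, write, close}, {read}. The rule open→close has sup(A) = sup(C) = sup(A∪C) = 3, so confidence is 1.0 (invariant) and lift is 1.0 / (3/4) ≈ 1.33. The rule read→open has confidence 1/2 (moderate) and lift 0.5 / 0.75 ≈ 0.67.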

@yah:ticket(R155-T3, “metrics::process_mining — Petri net construction from callSequences + conformance”) @yah:at(2026-05-14T17:35:34Z) @yah:status(review) @yah:assignee(agent:claude) @yah:parent(R155) @yah:verify(“cargo test -p srcgraph-metrics --lib process_mining”) @yah:handoff(“Implemented metrics::process_mining. Reuses association_rules::parse_call_sequences (same {sequences:[{method,calls}]} blob, same string-or-object GraphML tolerance). build_petri_net runs an alpha-miner pass: collects activities + direct-succession counts + first/last activities, classifies ordered pairs as causal (one-way only) vs parallel (both directions); emits one t_ transition per activity (sorted), p_start → first activities, last activities → p_end, plus one p_ intermediate per causal pair wired t_a→p_i→t_b. classify_transitions reclassifies each transition by local arc topology: choice (input place feeds >1 transition), loop (output→transition whose output reaches one of our inputs — one-hop back-edge per Python), parallel (output feeds >1 transition), else mandatory. compute_conformance derives valid (t_a→t_b) successions from arcs, then counts sequences whose every consecutive pair is in that set; sequences with <2 calls trivially conform (matches Python). compute_process_mining walks the graph (BTreeMap-staged for determinism), skips nodes with <2 sequences, builds+classifies+scores per node, and returns ProcessMiningAnalysis { nodes: [NodeProcessMining { node_id, class_name, net, conformance, num_places, num_transitions, num_arcs }], total }. 16 tests cover build (empty/linear/single-call/branching/parallel-pair/arc-validity), classify (linear-mandatory/branching), conformance (perfect/empty/violation/short-seq/range), and full-graph walk (skips thin+blobless, accepts string-encoded blob, empty graph).”) @yah:assumes(“Parallel pairs (a,b seen both ways) yield NO intermediate place — they’re noted by the direct-succession counter but the Python ref only emits places for causal pairs, so we match that. If a downstream consumer wants explicit parallel-split nodes in the rendered net, add them as a second pass (kind="parallel-split") rather than retrofitting build_petri_net.”)
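The causal/parallel split and the conformance count from the R155-T3 handoff can be illustrated without building the net itself. In the sketch below the three function names and signatures are hypothetical, sequences are plain `Vec<&str>` call lists, the valid-succession set is taken directly from the causal pairs (standing in for the `t_a→p_i→t_b` arcs), and an empty input is scored as 1.0, which is an assumption the handoff does not confirm.

```rust
use std::collections::BTreeSet;

/// Every ordered pair (a, b) where b directly follows a in some sequence.
pub fn direct_successions<'a>(sequences: &[Vec<&'a str>]) -> BTreeSet<(&'a str, &'a str)> {
    sequences
        .iter()
        .flat_map(|s| s.windows(2).map(|w| (w[0], w[1])))
        .collect()
}

/// Causal pairs: a -> b was observed but b -> a never was. Pairs seen in both
/// directions (including self-loops) count as parallel and are dropped.
pub fn causal_pairs<'a>(succ: &BTreeSet<(&'a str, &'a str)>) -> BTreeSet<(&'a str, &'a str)> {
    succ.iter()
        .copied()
        .filter(|&(a, b)| !succ.contains(&(b, a)))
        .collect()
}

/// Fraction of sequences whose every consecutive pair is a valid succession.
/// Sequences with fewer than two calls conform trivially.
pub fn conformance<'a>(
    sequences: &[Vec<&'a str>],
    valid: &BTreeSet<(&'a str, &'a str)>,
) -> f64 {
    if sequences.is_empty() {
        return 1.0; // assumption: nothing to violate
    }
    let conforming = sequences
        .iter()
        .filter(|s| s.windows(2).all(|w| valid.contains(&(w[0], w[1]))))
        .count();
    conforming as f64 / sequences.len() as f64
}
```

Given the sequences `open read close`, `open write close`, `read write`, `write read`, and `init`, six direct successions are observed. `read`/`write` occur in both orders and are therefore parallel, leaving four causal pairs. Against those, the two three-call sequences and the single-call sequence conform while the two `read`/`write` sequences do not, giving 3/5 = 0.6.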

Modules

association_rules: Apriori-style frequent itemset mining + association-rule generation over per-class callSequences blobs.
betweenness: Betweenness centrality — Brandes’ algorithm. The headline perf target versus networkx’s Python-loop O(VE) implementation.
clone_detection: Clone detection via n-gram Jaccard similarity on per-method fingerprint token streams.
cyclomatic: Per-method cyclomatic complexity + bimodality detection.
entropy: Shannon entropy over per-class identifier-token frequencies.
halstead: Halstead complexity — V (volume), D (difficulty), E (effort), B (bug estimate) computed from the four raw token counts (η₁, η₂, N₁, N₂) already extracted onto each class node.
instability: Martin’s instability metric — per-namespace afferent / efferent coupling from inter-package edges, and Stable Dependencies Principle violations.
lcom4: LCOM4 cohesion — connected components on the per-class method-field bipartite graph. LCOM4 = number of components (1 = cohesive).
process_mining: Petri-net reconstruction from per-class callSequences blobs, with token-replay conformance scoring.
scc: Strongly-connected components — Tarjan via petgraph::algo::tarjan_scc, plus condensation DAG and per-SCC status rating.
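
The lcom4 entry above describes the metric as the number of connected components in the method-field bipartite graph. One way to count those components is a small union-find keyed on the first method seen to touch each field. This is a sketch only: the input shape (one list of accessed field names per method), the function names, and the treatment of a method that touches no fields as its own component are assumptions, not the module's documented behaviour.

```rust
use std::collections::HashMap;

/// Union-find lookup with path halving.
fn find(parent: &mut [usize], mut x: usize) -> usize {
    while parent[x] != x {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    x
}

/// `method_fields[i]` lists the fields accessed by method `i`.
/// Returns the number of connected components (1 = cohesive).
pub fn lcom4(method_fields: &[Vec<&str>]) -> usize {
    let n = method_fields.len();
    let mut parent: Vec<usize> = (0..n).collect();
    // Field name -> first method observed accessing it.
    let mut first_user: HashMap<&str, usize> = HashMap::new();

    for (m, fields) in method_fields.iter().enumerate() {
        for &field in fields {
            // Two methods sharing a field belong to the same component.
            let owner = *first_user.entry(field).or_insert(m);
            let root_m = find(&mut parent, m);
            let root_o = find(&mut parent, owner);
            if root_m != root_o {
                parent[root_m] = root_o;
            }
        }
    }

    // Each remaining root is one component.
    (0..n).filter(|&i| find(&mut parent, i) == i).count()
}
```

For five methods accessing `[a, b]`, `[b]`, `[c]`, `[c, d]`, and nothing at all, the sketch finds three components: the first two methods share `b`, the next two share `c`, and the last stands alone.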