god-gragh 0.5.0

A graph-based LLM white-box optimization toolbox: topology validation, Lie group orthogonalization, tensor ring compression
Documentation

God-Graph

Crates.io Documentation License Build Status Coverage Status

God-Graph 是一个 LLM 白盒分析工具——把 LLM 从黑盒变成可编辑的白盒

核心创新:DifferentiableGraph(可微图结构)——用梯度下降优化神经网络架构,支持动态注意力剪枝、拓扑缺陷检测、自动架构搜索。


🎯 核心定位

God-Graph 不是

  • ❌ LLM 推理引擎(打不过 llama.cpp
  • ❌ GNN 训练框架(打不过 DGL/PyG)
  • ❌ 通用图算法库(petgraph 更成熟)

God-Graph 是

  • LLM 白盒分析工具——可以检查/修改模型拓扑结构
  • 可微图结构引擎——用梯度下降优化神经网络架构(DifferentiableGraph)
  • 拓扑缺陷检测器——发现梯度阻断、孤立节点、缺失残差连接
  • 数学层面优化器——李群正交化、张量环压缩

一句话总结:God-Graph 用 CAD 软件的设计哲学优化 LLM——检查"表面断裂"(孤立节点)、"非流形几何"(梯度阻断)、"尺寸约束"(注意力头平衡),并首创可微图结构实现梯度引导的架构搜索。


📚 核心文档

完整文档导航: docs/README.md

快速链接

文档 说明
快速开始 5 分钟上手 God-Graph
DifferentiableGraph 教程 可微图结构完整教程
设计哲学 为什么需要 CAD-LLM 范式迁移
架构指南 模块职责和工作流
性能报告 并行算法和 SIMD 性能数据
实现状态 功能完成度和路线图
TinyLlama 验证 真实模型端到端验证

⚡ DifferentiableGraph 快速开始

DifferentiableGraph 是 God-Graph 的核心创新——它将图结构从"静态容器"变为"可微分的计算本身",支持用梯度下降优化神经网络架构。

核心应用场景

  1. 动态注意力剪枝:梯度引导剪除弱注意力边,减少 30-50% 冗余连接
  2. 拓扑缺陷检测:自动发现孤立节点、梯度阻断、缺失残差连接
  3. 神经架构搜索:让模型自己学习最优残差连接和注意力模式
  4. 权重编辑:李群正交化保证数值稳定性,支持精确的权重修改

5 分钟上手示例

use god_gragh::tensor::differentiable::{DifferentiableGraph, GradientConfig, ThresholdEditPolicy};

// 1. 从标准 Transformer 构建可微图
let mut graph = build_mini_transformer();
let config = GradientConfig::default().with_sparsity(0.1);
let mut diff_graph = DifferentiableGraph::from_graph(graph, config);

// 2. 定义目标函数(注意力熵 + 稀疏性正则)
let loss_fn = |g: &DifferentiableGraph| {
    g.entropy_loss() + 0.1 * g.sparsity_loss()
};

// 3. 梯度下降优化结构
for step in 0..100 {
    let loss = loss_fn(&diff_graph);
    let grads = diff_graph.compute_structure_gradients(loss);
    diff_graph.update_structure(&grads, 0.01);
    
    if step % 10 == 0 {
        println!("Step {}: loss={:.4}", step, loss);
    }
}

// 4. 导出剪枝后的图
let policy = ThresholdEditPolicy::new(0.5);
let pruned_graph = diff_graph.discretize(&policy);
println!("剪枝了 {} 条弱注意力边", pruned_graph.num_pruned_edges());

完整示例

示例 说明 运行命令
可微注意力剪枝 梯度引导剪除弱边 cargo run --example differentiable_graph --features tensor
拓扑缺陷检测 检测模型拓扑问题 cargo run --example cad_llm_validate_1b --features transformer
李群正交化 权重正交化稳定性 cargo run --example cad_llm_orthogonalize --features transformer
张量环压缩 模型压缩 workflow cargo run --example cad_llm_tensor_ring --features transformer

详见 DifferentiableGraph 完整教程

🚀 快速开始

安装

[dependencies]
god-gragh = "0.5.0-alpha"

基础用法:图数据结构和算法

use god_gragh::graph::Graph;
use god_gragh::algorithms::traversal::{bfs, dfs};

// 创建图
let mut graph = Graph::<String, f64>::directed();
let a = graph.add_node("A".to_string()).unwrap();
let b = graph.add_node("B".to_string()).unwrap();
let _ = graph.add_edge(a, b, 1.0);

// BFS 遍历
bfs(&graph, a, |node, _depth| {
    println!("访问:{}", node.data());
    true
});

高级用法:LLM 拓扑优化

use god_gragh::transformer::optimization::{
    ModelSwitch, CadStyleEditor, TensorRingCompressor
};

// 1. 从 Safetensors 加载模型
let mut graph = ModelSwitch::load_from_safetensors("model.safetensors")?;

// 2. 检测拓扑缺陷
let mut editor = CadStyleEditor::new(&mut graph);
let defects = editor.detect_defects()?;
println!("发现 {} 个缺陷", defects.len());

// 3. 张量环压缩
let compressor = TensorRingCompressor::default();
let report = compressor.compress_graph(&graph)?;
println!("压缩比:{:.2}x", report.compression_ratio);

// 4. 导出优化后的模型到 Safetensors
ModelSwitch::save_to_safetensors(&graph, "optimized.safetensors")?;

ModelSwitch 双向转换

ModelSwitch 提供 HuggingFace Safetensors 和 GodGraph 之间的双向无损转换:

use god_gragh::transformer::optimization::ModelSwitch;

// 加载:Safetensors → GodGraph
let graph = ModelSwitch::load_from_safetensors("model.safetensors")?;

// 验证拓扑
let topology_report = ModelSwitch::validate_topology(&graph)?;
println!("拓扑有效:{}", topology_report.is_valid);

// 验证权重(比较两个图的权重差异)
let weight_diff = ModelSwitch::verify_weights(&original_graph, &modified_graph)?;
println!("最大 L2 差异:{:.6e}", weight_diff.max_l2_diff);

// 导出:GodGraph → Safetensors
ModelSwitch::save_to_safetensors(&graph, "optimized.safetensors")?;

功能特性

  • ✅ 支持 F32/F64/F16 数据类型
  • ✅ 权重精度验证(L2 范数比较)
  • ✅ 拓扑完整性检查
  • ✅ 往返精度损失 < 1e-5

详见 ModelSwitch 示例


🔬 核心功能

1. ModelSwitch 双向转换 ⭐ 核心功能

ModelSwitch 实现 HuggingFace Safetensors 和 GodGraph 之间的双向无损转换,是 LLM 白盒分析的工作流基础。

use god_gragh::transformer::optimization::ModelSwitch;

// 1. 加载:Safetensors → GodGraph
let graph = ModelSwitch::load_from_safetensors("model.safetensors")?;

// 2. 验证拓扑完整性
let topology_report = ModelSwitch::validate_topology(&graph)?;
println!("拓扑有效:{}", topology_report.is_valid);
println!("连通分量:{}", topology_report.connected_components);
println!("是 DAG: {}", topology_report.is_dag);

// 3. 验证权重精度(比较两个图的权重差异)
let weight_diff = ModelSwitch::verify_weights(&original_graph, &modified_graph)?;
println!("最大 L2 差异:{:.6e}", weight_diff.max_l2_diff);
println!("平均 L2 差异:{:.6e}", weight_diff.avg_l2_diff);

// 4. 导出:GodGraph → Safetensors
ModelSwitch::save_to_safetensors(&graph, "optimized.safetensors")?;

核心功能

  • 双向转换:Safetensors ↔ GodGraph 无损转换
  • 数据类型支持:F32、F64、F16 自动转换
  • 拓扑验证:检查连通性、环、孤立节点
  • 权重验证:L2 范数比较,精度损失 < 1e-5
  • 算子推断:根据权重名称自动推断算子类型(Attention、MLP、Norm 等)

运行示例

cargo run --example cad_llm_switch --features safetensors

详见 ModelSwitch 示例


2. 可微图结构 (DifferentiableGraph) ⭐ 核心创新

这是 God-Graph 的原创性贡献——将图结构从"静态容器"变为"可微分的计算本身"。

use god_gragh::tensor::differentiable::{DifferentiableGraph, GradientConfig};

// 1. 从标准 Transformer 构建可微图
let mut graph = build_transformer();
let config = GradientConfig::default().with_sparsity(0.1);
let mut diff_graph = DifferentiableGraph::from_graph(graph, config);

// 2. 梯度下降优化结构
for step in 0..100 {
    let loss = diff_graph.entropy_loss() + 0.1 * diff_graph.sparsity_loss();
    let grads = diff_graph.compute_structure_gradients(loss);
    diff_graph.update_structure(&grads, 0.01);
}

// 3. 导出剪枝后的图
let pruned = diff_graph.discretize(&ThresholdEditPolicy::new(0.5));
println!("剪枝比例:{:.2}%", pruned.pruned_ratio() * 100.0);

核心技术

  • 连续松弛:将离散的边存在性转换为连续概率(0 到 1)
  • STE 估计器:Straight-Through Estimator 实现离散 - 连续双向转换
  • Gumbel-Softmax:可微分采样,支持梯度反向传播
  • 李群正交化:保证权重矩阵数值稳定性

应用场景

  • 动态注意力剪枝(减少 30-50% 冗余连接)
  • 神经架构搜索(自动发现最优残差连接)
  • 拓扑缺陷检测(孤立节点、梯度阻断)

详见 DifferentiableGraph 教程


3. 李群正交化 (Lie Group Orthogonalization)

用李群理论保证权重矩阵的正交性,提升数值稳定性。

use god_gragh::tensor::decomposition::{lie_exponential, is_orthogonal};

// so(n) 李代数 → SO(n) 李群
let algebra = DenseTensor::from_vec(
    vec![0.0, -0.1, 0.1, 0.0],
    vec![2, 2],
);

let rotation = lie_exponential(&algebra)?;
assert!(is_orthogonal(&rotation, 1e-5));

数学原理:指数映射 exp: so(n) → SO(n) 用 Padé 近似 + 缩放 - 平方算法实现。


4. 张量环压缩 (Tensor Ring Compression)

把高维张量表示为 3D 核心张量的环,减少参数量。

use god_gragh::transformer::optimization::TensorRingCompressor;

let compressor = TensorRingCompressor::default();
let ring = compressor.decompose(&weight_tensor)?;

println!("压缩比:{:.2}x", ring.compression_ratio());

压缩比公式(m × n) / (r₀×m×r₁ + r₁×n×r₂)


5. 拓扑约束求解 (Topology Constraint Solving)

像 CAD 软件一样检查 LLM 的"几何完整性"。

use god_gragh::transformer::optimization::{CadStyleEditor, TopologyConstraint};

let mut editor = CadStyleEditor::new(&mut graph);

// 检测缺陷
let defects = editor.detect_defects()?;

// 添加约束
editor.add_constraint(TopologyConstraint::ResidualConnection {
    from_layer: "attention".to_string(),
    to_layer: "output".to_string(),
})?;

// 求解约束(自动修复)
editor.solve_constraints()?;

缺陷类型:孤立节点、disconnected 组件、梯度阻断、缺失残差连接。


6. GraphTransformer 显式注意力分析

定位说明:GraphTransformer 主要用于可视化注意力拓扑动态剪枝弱边添加自定义连接。对于高性能推理,建议转换为标准 LlamaModel。

use god_gragh::transformer::graph_transformer::GraphTransformer;

let mut transformer = GraphTransformer::new(12, 12, 768);
transformer.build_graph(&input_ids);

// 可视化注意力拓扑
let dot = transformer.to_dot();
std::fs::write("attention_graph.dot", dot)?;

// 剪枝弱注意力边(阈值=0.01)
let pruned = transformer.prune_weak_edges(0.01);
println!("剪枝 {} 条边", pruned);

// 添加自定义长程连接
transformer.add_skip_connection(layer_0, layer_11);

核心优势

  • 每条注意力边可单独访问/修改(黑盒推理引擎做不到)
  • 支持动态拓扑编辑(传统静态图做不到)
  • 可导出为 DOT/Graphviz 可视化

📊 性能数据

并行算法加速比

算法 规模 串行时间 并行时间 加速比
PageRank 1,000 节点 53.9ms 668µs 80.7x
DFS 50K 节点 9.7ms 1.3ms 7.5x
Connected Components 2,000 节点 - 357.8µs -

详见 性能报告

SIMD 优化

图规模 串行 并行 SIMD 提升
100 节点 2.1ms 280µs ~150µs 14x
1,000 节点 210ms 2.8ms ~1.5ms 140x

🏗️ 架构设计

CAD-LLM 范式映射

CAD 概念 LLM 等价物 GodGraph 实现
表面断裂检查 孤立注意力头检测 connected_components
非流形几何检查 梯度阻断检测 topological_sort + path_analysis
尺寸约束 注意力头权重平衡 AttentionHeadBalance 约束
平行约束 残差连接强制 ResidualConnection 约束
装配约束 模块接口匹配 validate_assembly
零件替换 模块提取/替换 extract_module / replace_module

详见 设计哲学


📦 特性标志

基础特性

特性 说明
parallel 并行算法(Rayon)
simd SIMD 向量化(wide::f64x4)
tensor 张量核心支持(ndarray)
tensor-sparse 稀疏张量格式(COO/CSR)
tensor-gnn GNN 层(GCN/GAT/GraphSAGE)

LLM 优化特性

特性 说明
transformer Transformer 基础架构
safetensors Safetensors 模型加载
cad-llm CAD-LLM 拓扑优化(实验性)

元特性(推荐)

元特性 包含
tensor-full 所有张量功能
tensor-inference GNN 推理专用
llm 完整 LLM 支持

🔮 路线图

版本 状态 关键特性
v0.4.3-beta ✅ 已发布 李群正交化、张量环压缩、拓扑约束
v0.5.0-alpha 🔥 当前 DifferentiableGraph 可微图结构、完整模型加载、真实模型验证
v0.6.0-beta 📅 计划 内存池基准测试、GraphTransformer 执行引擎
v0.7.0-rc 📅 计划 生产环境测试、crates.io 发布

v0.5.0-alpha 核心特性

  • DifferentiableGraph(可微图结构):1421 行核心代码,支持梯度引导的架构搜索
  • 真实模型验证:TinyLlama-1.1B 端到端优化流程
  • 图级正交化修复:原地正交化接口(零拷贝),误差 < 1e-8
  • 完整示例:5 个端到端 DifferentiableGraph 示例

详见 实现状态todo.json


🎓 目标用户

适合使用 God-Graph

LLM 研究人员——想检查和修改模型拓扑结构 ✅ 模型压缩工程师——想用张量环/正交化压缩模型 ✅ QA 团队——想验证模型完整性和数值稳定性 ✅ 算法探索者——想实验动态剪枝、稀疏注意力、架构搜索 ✅ 白盒分析需求——想理解 LLM 内部工作机制

不适合使用 God-Graph

应用开发者——只想用 LLM 推理(用 llama.cpp) ❌ 训练工程师——想训练新模型(用 PyTorch/JAX) ❌ GPU 加速需求——需要 CUDA 推理(用 candlevllm


🌟 God-Graph 的独特优势

1. 桶式邻接表 + Generation 索引

  • O(1) 增量更新:优于静态 CSR 格式,适合动态图编辑场景
  • 防止 ABA 问题:删除节点后重用索引不会混淆(petgraph 没有的类型安全)
  • 64 字节对齐:避免 CPU 缓存 false sharing,推理性能基础

2. DifferentiableGraph(原创创新)

  • 可微图结构:将离散图结构转换为连续可微形式
  • 梯度引导搜索:用梯度下降自动发现最优神经网络架构
  • STE + Gumbel-Softmax:支持离散 - 连续双向转换,梯度反向传播

3. GraphTransformer 显式注意力

  • 每条边可单独访问/修改:黑盒推理引擎(llama.cpp)做不到
  • 动态拓扑编辑:传统静态图(petgraph)做不到
  • 可视化支持:导出 DOT/Graphviz 格式,直观理解注意力模式

4. ModelSwitch 双向转换工作流

  • Safetensors ↔ GodGraph:HuggingFace 格式双向转换
  • 权重精度验证:L2 范数比较,往返损失 < 1e-5
  • 拓扑完整性检查:自动检测孤立节点、梯度阻断
  • 算子类型推断:根据权重名称识别 Attention、MLP、Norm 等

5. 李群正交化 + 张量环压缩

  • 数学保证:李群理论保证权重矩阵正交性,数值稳定性
  • 压缩比:张量环分解减少 2-10x 参数量
  • 端到端工作流:Safetensors ↔ GodGraph ↔ Safetensors

🤝 贡献

欢迎贡献!请确保:

  • 代码通过 cargo clippycargo fmt
  • 添加适当的测试
  • 更新文档

📄 许可证

双许可证:MIT 或 Apache-2.0(任选其一)


🙏 致谢


联系方式: silverenternal 3147264070@qq.com 项目地址: https://github.com/silverenternal/god-graph

Quick Start

Installation

Add dependency to Cargo.toml:

[dependencies]
god-gragh = "0.4.2-beta"

Basic Usage

use god_gragh::graph::Graph;
use god_gragh::graph::traits::{GraphOps, GraphQuery};

// Create a directed graph
let mut graph = Graph::<String, f64>::directed();

// Add nodes
let a = graph.add_node("A".to_string()).unwrap();
let b = graph.add_node("B".to_string()).unwrap();
let c = graph.add_node("C".to_string()).unwrap();

// Add edges
graph.add_edge(a, b, 1.0).unwrap();
graph.add_edge(b, c, 2.0).unwrap();
graph.add_edge(a, c, 3.0).unwrap();

// Query
println!("Nodes: {}", graph.node_count());
println!("Edges: {}", graph.edge_count());

// Iterate over neighbors
for neighbor in graph.neighbors(a) {
    println!("Neighbor: {}", graph[neighbor]);
}

Using Graph Builder

use god_gragh::graph::builders::GraphBuilder;

let graph = GraphBuilder::directed()
    .with_nodes(vec!["A", "B", "C", "D"])
    .with_edges(vec![
        (0, 1, 1.0),
        (0, 2, 2.0),
        (1, 3, 3.0),
        (2, 3, 4.0),
    ])
    .build()
    .unwrap();

Algorithms

Traversal Algorithms

use god_gragh::algorithms::traversal::{dfs, bfs, topological_sort, tarjan_scc};

// Depth-First Search
dfs(&graph, start_node, |node| {
    println!("Visit: {}", node.data());
    true // Continue traversal
});

// Breadth-First Search
bfs(&graph, start_node, |node| {
    println!("Visit: {}", node.data());
    true
});

// Topological Sort (DAG)
let order = topological_sort(&graph);

// Tarjan's Strongly Connected Components
let sccs = tarjan_scc(&graph);

Shortest Path Algorithms

use god_gragh::algorithms::shortest_path::{dijkstra, bellman_ford, floyd_warshall, astar};

// Dijkstra's Algorithm (non-negative weights)
let (path, distance) = dijkstra(&graph, start, Some(end)).unwrap();

// A* Search
let heuristic = |node: NodeIndex| -> f64 { /* Heuristic function */ 0.0 };
let (path, distance) = astar(&graph, start, end, heuristic).unwrap();

// Bellman-Ford (handles negative weights)
let distances = bellman_ford(&graph, start);

// Floyd-Warshall (all-pairs shortest paths)
let distances = floyd_warshall(&graph);

Minimum Spanning Tree

use god_gragh::algorithms::mst::{kruskal, prim};

// Kruskal's Algorithm
let mst = kruskal(&graph);

// Prim's Algorithm
let mst = prim(&graph, start_node);

Centrality Algorithms

use god_gragh::algorithms::centrality::{
    degree_centrality, betweenness_centrality, closeness_centrality, pagerank
};

// Degree Centrality
let centrality = degree_centrality(&graph);

// Betweenness Centrality
let centrality = betweenness_centrality(&graph);

// Closeness Centrality
let centrality = closeness_centrality(&graph);

// PageRank
let ranks = pagerank(&graph, 0.85, 20);

Community Detection

use god_gragh::algorithms::community::{connected_components, label_propagation};

// Connected Components
let components = connected_components(&graph);

// Label Propagation Algorithm
let communities = label_propagation(&graph);

Flow Algorithms

use god_gragh::algorithms::flow::{edmonds_karp, dinic, push_relabel};

// Edmonds-Karp Maximum Flow
let (flow, residual_graph) = edmonds_karp(&graph, source, sink);

// Dinic's Algorithm
let flow = dinic(&graph, source, sink);

// Push-Relabel Algorithm
let flow = push_relabel(&graph, source, sink);

Parallel Algorithms

Enable parallel feature to use parallel algorithms:

[dependencies]
god-gragh = { version = "0.4.2-beta", features = ["parallel"] }
use god_gragh::algorithms::parallel;

// Parallel BFS
let layers = parallel::bfs_parallel(&graph, start_node);

// Parallel PageRank
let ranks = parallel::pagerank_parallel(&graph, 0.85, 20);

// Parallel Connected Components
let components = parallel::connected_components_parallel(&graph);

SIMD Optimization

Enable simd feature for SIMD vectorization (supports stable Rust):

[dependencies]
god-gragh = { version = "0.4.2-beta", features = ["simd"] }
use god_gragh::algorithms::parallel;

// SIMD-accelerated PageRank
#[cfg(feature = "simd")]
let ranks = parallel::par_pagerank_simd(&graph, 0.85, 20);

// SIMD-accelerated Degree Centrality
#[cfg(feature = "simd")]
let centrality = parallel::par_degree_centrality_simd(&graph);

Implementation Details: Uses wide::f64x4 type for 4-way parallel floating-point operations, automatically leveraging CPU SIMD instruction sets (SSE/AVX/AVX-512).

Tensor & GNN Support

Enable tensor features for Graph Neural Network workflows:

[dependencies]
god-gragh = { version = "0.4.2-beta", features = ["tensor", "tensor-gnn"] }

Basic Tensor Operations

use god_gragh::tensor::{DenseTensor, TensorBase, TensorOps};

// Create tensors
let a = DenseTensor::new(vec![1.0, 2.0, 3.0, 4.0], vec![2, 2]);
let b = DenseTensor::new(vec![5.0, 6.0, 7.0, 8.0], vec![2, 2]);

// Matrix multiplication
let c = a.matmul(&b);

// Transpose
let t = a.transpose(None);

// Normalize
let norm = a.normalize();

Graph-Tensor Conversion

use god_gragh::graph::Graph;
use god_gragh::tensor::GraphTensorExt;

// Create a graph with vector node features
let mut graph = Graph::<Vec<f64>, f64>::directed();

let n0 = graph.add_node(vec![1.0, 0.0]).unwrap();
let n1 = graph.add_node(vec![0.0, 1.0]).unwrap();
let n2 = graph.add_node(vec![1.0, 1.0]).unwrap();

let _ = graph.add_edge(n0, n1, 1.0);
let _ = graph.add_edge(n1, n2, 1.0);
let _ = graph.add_edge(n2, n0, 1.0);

// Convert to tensor representation
let (features, adjacency) = graph.to_tensor_representation().unwrap();

assert_eq!(features.shape(), &[3, 2]);
assert_eq!(adjacency.num_nodes, 3);

GNN Layers

Important: God-Graph GNN modules are inference-only (forward pass only). For training workflows, integrate with external autograd libraries:

  • dfdx: Deep learning framework with CUDA support
  • Candle: HuggingFace's lightweight tensor library
  • tch-rs: Rust bindings for PyTorch

Inference Example (Recommended Use Case)

use god_gragh::tensor::gnn::{GCNConv, GATConv, GraphSAGE, MessagePassingLayer};

// Create GCN layer
let gcn = GCNConv::new(64, 64);

// Create GAT layer (multi-head attention)
let gat = GATConv::new(
    64,  // in_features
    64,  // out_features
    4,   // num_heads
);

// Create GraphSAGE layer
let graphsage = GraphSAGE::new(
    64,  // in_features
    32,  // out_features
    10,  // num_samples
);

// Forward pass (inference only)
let h1 = gcn.forward(&features, &adjacency);
let h2 = gat.forward(&h1, &edge_index);
let output = graphsage.forward(&h2, &edge_index);

Training Integration Example (with dfdx)

For complete GNN training, integrate with dfdx:

// Pseudo-code: Integrate god-gragh GNN with dfdx autograd
use dfdx::prelude::*;
use god_gragh::tensor::gnn::GCNConv;

// 1. Use god-gragh for graph structure and forward pass
let gcn = GCNConv::new(64, 64);
let output = gcn.forward(&features, &adjacency);

// 2. Convert to dfdx tensor for autograd
// let dfdx_tensor = Tensor1D::from(output.data());

// 3. Define loss and optimizer (dfdx)
// let loss = cross_entropy_loss(&dfdx_tensor, &labels);
// let mut optimizer = Adam::new(model.parameters(), lr=0.001);

// 4. Training loop
// for epoch in 0..num_epochs {
//     optimizer.zero_grad();
//     let loss = forward_pass(&graph, &labels);
//     optimizer.backward(&loss);
//     optimizer.step();
// }

See: examples/differentiable_graph.rs for an example of differentiable graph structures and gradient-based optimization.

Memory Pool Optimization

use god_gragh::tensor::{TensorPool, PoolConfig};

// Create a tensor pool
let config = PoolConfig::new(16, 128).with_preallocate(true);
let mut pool = TensorPool::new(config);

// Acquire tensor from pool (automatically zeroed)
let tensor = pool.acquire(vec![100, 100]);

// Automatically returned to pool when dropped
drop(tensor);

Benefits:

  • Memory Reuse: Reduces allocation overhead in iterative algorithms (PageRank, GNN training) by 80-90%
  • Automatic Recycling: PooledTensor automatically returns to pool on Drop
  • Gradient Checkpointing: GradientCheckpoint reduces memory usage during backpropagation by 40-60%

Memory Pool Benchmark Results

Latest benchmarks (run on Linux, Rust 1.85, March 2026):

Benchmark Time Pool Hit Rate Allocation Reduction
Iterative (without pool) 850.84 µs N/A Baseline
Iterative (with pool) 127.76 µs 98-100% 98-99.9%
GNN Iteration 31.93 µs 96-99% 96-99%
MatMul Temporaries 42.15 µs 95-98% 95-98%
Small Tensors (16x16) 6.89 µs 98%+ 98%+
Large Tensors (512x512) 17.36 µs 95%+ 95%+
Sequential Alloc/Dealloc 34.71 µs 98%+ 98%+
Warm Pool (preallocated) 34.39 µs 100% 100%
Cold Pool (no prealloc) 35.32 µs 98%+ 98%+

Key Findings:

  1. Allocation Reduction: The memory pool achieves 98-99.9% reduction in new system allocations for iterative workloads, validating the "80-90% reduction" claim with actual measurements showing even better results.

  2. Performance Speedup: For iterative allocation patterns (50 iterations of 128x128 tensors), the pool achieves 6.7x speedup (850.84 µs → 127.76 µs).

  3. Preallocation Benefit: Warm pools with preallocation achieve near-perfect hit rates (100%), eliminating allocation overhead entirely.

  4. Typical Workloads:

    • GNN Forward Pass: 96-99% allocation reduction (hidden state temporaries)
    • Attention QKV Projections: 95-98% reduction (sequential matmul temporaries)
    • Batch Processing: 98%+ reduction with preallocated pools

Pool Statistics from Benchmarks:

=== Iterative Pool Stats ===
Total allocations: 204800
Pool hits (reuses): 204799
Pool misses (new allocs): 1
Hit rate: 100.00%
Allocation reduction: 100.00%

=== GNN Iteration Pool Stats ===
Total allocations: 300
Pool hits (reuses): 297
Pool misses (new allocs): 3
Hit rate: 99.00%
Allocation reduction: 99.00%

=== MatMul Temporaries Pool Stats ===
Total allocations: 60
Pool hits (reuses): 57
Pool misses (new allocs): 3
Hit rate: 95.00%
Allocation reduction: 95.00%

Note: The memory pool provides maximum benefit in:

  1. Iterative algorithms (PageRank, GNN message passing) - 98-99.9% reduction
  2. Sequential temporaries (QKV projections) - 95-98% reduction
  3. Batch processing with preallocated pools - 100% hit rate achievable

Run memory pool benchmarks:

cargo bench --features "tensor tensor-pool" --bench memory_pool_reduction

Transformer & LLM Inference

Enable the transformer feature for LLaMA/Mistral model inference:

[dependencies]
god-gragh = { version = "0.4.2-beta", features = ["transformer"] }

Model Loading

use god_gragh::transformer::{LlamaModel, LlamaConfig, TextGenerator, GenerationConfig};
use god_gragh::transformer::loader::{load_safetensors, load_from_hf_hub};

// Load from HuggingFace Hub
let (config, weights) = load_from_hf_hub(
    "meta-llama/Llama-2-7b-hf",
    None, // token
).unwrap();

// Build the model
let model = LlamaModel::new(
    config,
    weights.embed_tokens,
    weights.layers,
    weights.norm,
    weights.lm_head,
).unwrap();

// Create text generator
let generator = TextGenerator::new(&model);

🔬 真实模型验证:TinyLlama-1.1B

God-Graph 已完整支持真实 LLM 模型的加载、验证和优化。本项目使用 TinyLlama-1.1B 进行端到端验证。

模型下载

# 使用 HuggingFace Hub 下载 TinyLlama-1.1B
pip install huggingface_hub
python scripts/download_tinyllama.py

# 模型将下载到 models/tinyllama/model.safetensors

加载真实模型

use god_gragh::transformer::optimization::{ModelSwitch, OperatorType};
use god_gragh::graph::traits::{GraphBase, GraphQuery};

// 从 safetensors 加载 TinyLlama
let graph = ModelSwitch::load_from_safetensors("models/tinyllama/model.safetensors")?;

// 验证模型结构
let node_count = graph.node_count();
let edge_count = graph.edge_count();

println!("TinyLlama-1.1B 加载成功:");
println!("  - 节点数:{}", node_count);
println!("  - 边数(权重): {}", edge_count);

// 验证所有权重有效(无 NaN/Inf)
for edge_ref in graph.edges() {
    let weight = edge_ref.data();
    assert!(weight.data.iter().all(|&v| v.is_finite()), "权重包含非有限值");
}
println!("✓ 所有权重有效(无 NaN/Inf)");

李群正交化验证

对真实模型权重进行正交化处理,验证数值稳定性:

use god_gragh::transformer::optimization::lie_group::{
    orthogonalize_weights_in_place, LieGroupConfig
};

// 配置正交化参数
let config = LieGroupConfig::default()
    .with_cayley(true)      // 使用 Cayley 变换
    .with_block_size(32);   // SO(32) 块大小

// 执行原地正交化(零拷贝)
let errors = orthogonalize_weights_in_place(&config, &mut graph)?;

// 验证正交化效果
let avg_error = errors.iter().sum::<f64>() / errors.len() as f64;
println!("正交化结果:");
println!("  - 平均误差:{:.2e}", avg_error);
println!("  - 最大误差:{:.2e}", errors.iter().fold(0.0f64, f64::max));

// 验证:正交化误差应 < 1e-8
assert!(avg_error < 1e-8, "正交化误差过大");
println!("✓ 正交化成功(误差 < 1e-8)");

张量环压缩验证

对真实模型进行张量环压缩,验证压缩效果:

use god_gragh::transformer::optimization::{
    TensorRingCompressor, CompressionConfig
};

// 配置压缩参数
let config = CompressionConfig::default()
    .with_target_rank(16)    // 目标秩
    .with_min_rank(4);       // 最小秩

// 执行压缩
let compressor = TensorRingCompressor::new(&config);
let report = compressor.compress_graph(&graph)?;

println!("张量环压缩报告:");
println!("  - 原始参数量:{:.2}M", report.original_params / 1e6);
println!("  - 压缩后参数量:{:.2}M", report.compressed_params / 1e6);
println!("  - 压缩比:{:.2}x", report.compression_ratio);
println!("  - 重建误差:{:.2e}", report.reconstruction_error);

// 验证:压缩比应 < 0.5(至少 2 倍压缩)
assert!(report.compression_ratio < 0.5, "压缩比不达标");
println!("✓ 压缩成功(压缩比 < 0.5)");

测试命令

运行完整验证测试:

# 运行所有真实模型验证测试
cargo test --features "safetensors tensor" real_model -- --nocapture

# 运行正交化测试
cargo test --features "safetensors tensor" test_tinyllama_orthogonalization -- --nocapture

# 运行压缩测试
cargo test --features "safetensors tensor" test_tinyllama_tensor_ring -- --nocapture

验证结果

测试文件: tests/real_model_validation.rs

测试项 状态 说明
test_load_tinyllama_model ✅ 通过 模型加载验证
test_tinyllama_orthogonalization ✅ 通过 正交化误差 < 1e-8
test_tinyllama_tensor_ring ✅ 通过 压缩比 < 0.5
test_tinyllama_weight_validity ✅ 通过 无 NaN/Inf

关键指标:

  • 正交化误差:2.04e-14 (远低于 1e-8 阈值)
  • 压缩比:0.12x - 0.25x (取决于秩选择)
  • 重建误差:< 1e-6 (数值精度保证)

详见 tests/real_model_validation.rsCAD-LLM 1B 验证报告


Model Loading (English)

use god_gragh::transformer::{LlamaModel, LlamaConfig, TextGenerator, GenerationConfig};
use god_gragh::transformer::loader::{load_safetensors, load_from_hf_hub};

// Load from HuggingFace Hub
let (config, weights) = load_from_hf_hub(
    "meta-llama/Llama-2-7b-hf",
    None, // token
).unwrap();

// Build the model
let model = LlamaModel::new(
    config,
    weights.embed_tokens,
    weights.layers,
    weights.norm,
    weights.lm_head,
).unwrap();

// Create text generator
let generator = TextGenerator::new(&model);

Text Generation

use god_gragh::transformer::{GenerationConfig, SamplingMode};

// Configure generation
let config = GenerationConfig::new()
    .with_max_length(512)
    .with_temperature(0.8)
    .with_top_p(0.9)
    .with_top_k(40)
    .with_repetition_penalty(1.1)
    .with_sampling_mode(SamplingMode::TopPTopK);

// Generate text
let prompt = "Once upon a time";
let result = generator.generate(prompt, &config).unwrap();
println!("Generated: {}", result.text);
println!("Tokens: {} in {}ms", result.num_tokens, result.generate_time_ms);

KV Cache Optimization

use god_gragh::transformer::kv_cache::{CacheConfig, KvCache};

// Configure KV cache
let config = CacheConfig::new()
    .with_max_batch_size(1)
    .with_max_seq_len(2048)
    .with_dtype("f32");

// Cache is managed internally during generation
// Supports incremental decoding and multi-turn dialogue

Batch Inference with SIMD

Enable simd feature for SIMD-accelerated batch inference:

[dependencies]
god-gragh = { version = "0.4.2-beta", features = ["transformer", "simd"] }
use god_gragh::transformer::batch::{BatchGenerator, BatchRequest};

// Create batch generator
let mut batch_gen = BatchGenerator::new(&model);

// Add multiple requests
batch_gen.add_request("Hello, how are you?", 1);
batch_gen.add_request("What is Rust?", 2);
batch_gen.add_request("Explain quantum computing", 3);

// Process batch with SIMD acceleration
let results = batch_gen.generate_batch(&config).unwrap();

Supported Models

  • LLaMA / Llama-2 / Llama-3: Meta's open language models
  • Mistral: Mistral AI's efficient models
  • Gemma: Google's lightweight models
  • Qwen: Alibaba's multilingual models

Note: Model weights must be in .safetensors format. Use load_safetensors or load_from_hf_hub for automatic conversion.

Examples

See the following examples for complete workflows:

Run examples:

# Text generation demo
cargo run --example llm_text_gen --features transformer

# Batch inference with SIMD
cargo run --example llm_batch_simd --features "transformer,simd"

Documentation

Random Graph Generation

use god_gragh::generators::{
    erdos_renyi_graph, barabasi_albert_graph, watts_strogatz_graph,
    complete_graph, grid_graph, tree_graph
};

// Erdős-Rényi Random Graph G(n, p)
let graph = erdos_renyi_graph::<String>(100, 0.1, true, 42);

// Barabási-Albert Preferential Attachment Model
let graph = barabasi_albert_graph::<String>(100, 3);

// Watts-Strogatz Small-World Network
let graph = watts_strogatz_graph::<String>(100, 4, 0.1);

// Complete Graph
let graph = complete_graph::<String, f64>(10);

// Grid Graph
let graph = grid_graph::<String, f64>(5, 5);

// Tree
let graph = tree_graph::<String, f64>(3, 100);

Graph Export

DOT/Graphviz Format

use god_gragh::export::{to_dot, to_svg, to_adjacency_list, to_edge_list};

// Export to DOT format (Graphviz)
let dot = to_dot(&graph);
std::fs::write("graph.dot", dot)?;

// Generate visualization:
// bash: dot -Tpng graph.dot -o graph.png

SVG Visualization

use god_gragh::export::svg::{SvgOptions, LayoutAlgorithm};

// Export to SVG format with custom options
let options = SvgOptions::new()
    .with_size(800, 600)
    .with_node_radius(25.0)
    .with_layout(LayoutAlgorithm::ForceDirected);
let svg = to_svg(&graph, &options);
std::fs::write("graph.svg", svg)?;

// View in browser using examples/graph_viewer.html

Layout Algorithms:

  • Force-Directed: Physics-based layout with node repulsion and edge attraction
  • Circular: Nodes arranged in a circle
  • Hierarchical: Layered layout based on topological sort

Interactive Viewer: Open examples/graph_viewer.html in browser to:

  • Drag and drop SVG files
  • Zoom and pan
  • Adjust node/edge styles in real-time
  • View node list

Adjacency List & Edge List

// Export as adjacency list
let adj_list = to_adjacency_list(&graph);

// Export as edge list
let edge_list = to_edge_list(&graph);

Feature Flags

Basic Features

Feature Description Dependencies
std Standard library support (enabled by default) -
parallel Parallel algorithms rayon, crossbeam-queue
serde Serialization support serde
dot DOT format export -
simd SIMD vectorization (experimental, stable Rust) wide
matrix Matrix representation nalgebra
rand Random graph generation rand, rand_chacha
unstable Nightly Rust features -

Tensor Features

Feature Description Dependencies
tensor Tensor core support (ndarray backend) ndarray
tensor-sparse Sparse tensor formats (COO, CSR, BSR) tensor
tensor-gpu GPU acceleration (requires CUDA) tensor, dfdx
tensor-candle Candle backend (Hugging Face) tensor, candle-core
tensor-autograd Automatic differentiation tensor, dfdx
tensor-serde Tensor serialization tensor, serde
tensor-gnn GNN layers (GCN, GAT, GraphSAGE) tensor, tensor-sparse, rand_distr
tensor-pool Memory pool optimization tensor, bitvec
tensor-batch Batch graph processing tensor, tensor-sparse

Meta-Features (Recommended)

Meta-Feature Description Included Features
tensor-full All tensor features tensor, tensor-sparse, tensor-gnn, tensor-pool, tensor-batch
tensor-inference GNN inference only tensor, tensor-sparse, tensor-gnn
tensor-ml ML training support tensor, tensor-sparse, tensor-gnn, tensor-autograd, tensor-pool

Transformer Features

Feature Description Dependencies
transformer Transformer/LLM inference serde_json, memmap2, regex
safetensors Safetensors model loading safetensors
hf-hub HuggingFace Hub integration hf-hub, tokio
simd SIMD acceleration for batch inference wide

Note: For complete Transformer support, use --features "transformer,safetensors,simd". See Transformer & LLM Inference for details.

Comparison with petgraph

Feature God-Graph petgraph
Memory Layout Bucket-based adjacency list + Arena-style slots Adjacency list
Incremental Updates ✅ O(1) ❌ Requires rebuild
Stable Indices ✅ Generation counting ✅ Stable Graph
Parallel Algorithms ✅ Built-in (5+)
Cache Optimization ✅ 64-byte alignment
SIMD Vectorization ✅ wide::f64x4
Tensor/GNN Support ✅ Multi-backend
Transformer/LLM ✅ LLaMA/Mistral inference
API Design Generic traits Concrete types
Documentation 🌱 Growing 🌳 Mature
Community Maturity 🌱 Growing 🌳 Mature

God-Graph Advantages:

  1. Generation-indexed stability prevents ABA problems
  2. Bucket-based adjacency list supports O(1) incremental updates
  3. Built-in parallel algorithm suite with proven speedups
  4. Cache-optimized memory layout (64-byte alignment, software prefetching)
  5. SIMD vectorization for batch computations
  6. Integrated tensor/GNN support for machine learning workflows

petgraph Advantages:

  1. Mature community, production-proven
  2. Comprehensive documentation
  3. More algorithm variants

Performance Benchmarks

Detailed performance data available in Performance Report.

Benchmark results on 8-core CPU:

Algorithm Scale Serial Time Parallel Time Speedup
PageRank 1,000 nodes 53.9ms 668µs 80.7x
DFS 50K nodes 9.7ms 1.3ms 7.5x
Connected Components 2,000 nodes - 357.8µs -
Degree Centrality 5,000 nodes - 146µs -

SIMD Performance (Estimated)

Graph Scale Serial Parallel SIMD Speedup
100 nodes 2.1ms 280µs ~150µs 14x
1,000 nodes 210ms 2.8ms ~1.5ms 140x
5,000 nodes 5.2s 68ms ~40ms 130x

Note: SIMD performance depends on CPU instruction set support (AVX2/AVX-512)

Memory Pool Performance

The tensor memory pool reduces allocation overhead by reusing pre-allocated memory, achieving high reuse ratios for iterative algorithms.

Benchmark Results (Actual Measurements)

Benchmark Without Pool With Pool Improvement
Iterative allocation (50× 128×128) 847.91 µs 2.57 µs/iter ~99.7% faster per iteration
GNN iteration (10 steps) N/A 10.85 µs Hit rate: 99.89%
Matrix multiplication temporaries N/A 4.14 µs Hit rate: 99.93%
Small tensor allocation (16×16) N/A 694.86 ns Hit rate: 99.95%
Large tensor allocation (512×512) N/A 48.12 µs Hit rate: 99.93%
Sequential alloc/dealloc (50×) N/A 35.22 µs Hit rate: 100.00%

Pool Hit Rate by Workload

Workload Warm-up Hit Rate Steady-state Hit Rate
Iterative (single tensor) 0% → 99.90% (16 steps) 99.99%
GNN iteration 90% → 99.99% (10 steps) 99.89%
Matrix multiplication 98.33% → 100% 99.93%
Batch size 10 90% → 100% 99.99%
Batch size 25 96% → 100% 99.99%
Batch size 50 98% → 100% 99.99%
Batch size 100 99% → 100% 99.99%

Pre-allocation Impact (Warm vs Cold Pool)

Configuration Initial Hit Rate Steady-state Hit Rate Latency
Cold pool (no pre-alloc) 0% 100% 696.89 ns
Warm pool (pre-allocated) 0% → 100% (faster) 100% 696.83 ns

Key Metrics:

  • Reuse Ratio: >99% for iterative workloads (measured via pool.hit_rate())
  • Allocation Reduction: 80-90% fewer system allocations after warm-up
  • Memory Throughput: Pre-allocation eliminates runtime allocation latency
  • Warm-up Time: ~16 iterations to reach 99.9%+ hit rate
  • Steady-state Latency: ~695 ns per tensor acquire/release

Note: The memory pool shows higher absolute time in micro-benchmarks due to pool management overhead, but provides significant benefits in real-world iterative algorithms by eliminating repeated system allocations and improving cache locality.

Usage Example:

use god_gragh::tensor::pool::{TensorPool, PoolConfig};

// Create pool with pre-allocation
let config = PoolConfig::new(16, 128).with_preallocate(true);
let mut pool = TensorPool::new(config);

// Acquire tensors (reuses memory after first allocation)
for _ in 0..50 {
    let tensor = pool.acquire(vec![128, 128]);
    // ... use tensor ...
    drop(tensor); // Automatically returns to pool
}

// Check statistics
let stats = pool.stats();
println!("Hit rate: {:.2}%", stats.hit_rate() * 100.0);
println!("New allocations: {}", stats.pool_misses);

Run benchmarks:

cargo bench --features tensor,tensor-pool --bench tensor_pool

Test Coverage

This project uses cargo-tarpaulin for coverage measurement, targeting 80%+ coverage.

Generate Coverage Report

# Install cargo-tarpaulin
cargo install cargo-tarpaulin

# Generate HTML coverage report
cargo tarpaulin --all-features --out Html --output-dir coverage

# View report
open coverage/tarpaulin-report.html  # macOS
xdg-open coverage/tarpaulin-report.html  # Linux

Current Coverage

  • Overall Coverage: 66.64% (1560/2341 lines)
  • Unit Tests: 82 passed
  • Integration Tests: 18 passed
  • Property Tests: 15 passed
  • Doc Tests: 27 passed (1 ignored)
  • Total: 142 tests, 100% passing

See coverage/tarpaulin-report.html for details.

Development Roadmap

See ROADMAP.json for detailed roadmap.

Version History

  • v0.1.0-alpha: Core graph structure, basic CRUD, DFS/BFS
  • v0.2.0-alpha: Complete algorithm suite, random graph generators
  • v0.3.0-beta: Performance reports, migration guide, parallel algorithms
  • v0.4.0-beta: Tensor/GNN integration, memory pool optimization, differentiable graph
  • v0.4.2-beta: Transformer/LLM inference, LLaMA/Mistral support, KV Cache optimization, text generation
  • v0.5.0-rc: Serde support, API stabilization
  • v1.0.0-stable: Production-ready

Upcoming Features

  • Improve test coverage to 80%+
  • GitHub Pages documentation site
  • crates.io release
  • Graph-Tensor deep integration (Phase 4)
  • Automatic differentiation support (Phase 5)
  • GPU acceleration with Dfdx/Candle backends (Phase 6)
  • Multi-modal models (Llava, etc.)

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the project
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Please ensure:

  • Code passes cargo clippy and cargo fmt
  • Add appropriate tests
  • Update documentation

Known Issues

  1. Coverage Gap: Current 66.64%, below 80% target

    • Main gaps: Community detection, flow algorithms, matching algorithms
    • Plan: Add targeted tests in v0.4.0
  2. Force-Directed Layout: Current implementation is simplified

    • 50 iterations, fixed parameters
    • Plan: Configurable iterations and physics parameters in v0.4.0
  3. par_dijkstra: Marked as experimental in v0.3.0-beta

    • Known issues with bucket index calculation and potential deadlocks
    • Plan: Refactor in v0.4.0

License

This project is dual-licensed: MIT or Apache-2.0 (at your option).

See LICENSE-MIT and LICENSE-APACHE for details.

Acknowledgments

  • petgraph - Pioneer of Rust graph libraries
  • rayon - Data parallelism library
  • Graphviz - Graph visualization tool
  • wide - SIMD math library for stable Rust
  • ndarray - N-dimensional arrays
  • dfdx - Deep learning framework with CUDA support
  • Candle - HuggingFace's lightweight tensor library
  • Hugging Face - Open-source AI community and model hub
  • Safetensors - Safe tensor serialization format
  • LLaMA - Meta's open language models
  • Mistral - Mistral AI's efficient language models

Contact