God-Graph

God-Graph 是一个 LLM 白盒分析工具——把 LLM 从黑盒变成可编辑的白盒

核心创新：DifferentiableGraph（可微图结构）——用梯度下降优化神经网络架构，支持动态注意力剪枝、拓扑缺陷检测、自动架构搜索。

🎯 核心定位

God-Graph 不是：

❌ LLM 推理引擎（打不过 llama.cpp）
❌ GNN 训练框架（打不过 DGL/PyG）
❌ 通用图算法库（petgraph 更成熟）

God-Graph 是：

✅ LLM 白盒分析工具——可以检查/修改模型拓扑结构
✅ 可微图结构引擎——用梯度下降优化神经网络架构（DifferentiableGraph）
✅ 拓扑缺陷检测器——发现梯度阻断、孤立节点、缺失残差连接
✅ 数学层面优化器——李群正交化、张量环压缩

一句话总结：God-Graph 用 CAD 软件的设计哲学优化 LLM——检查"表面断裂"（孤立节点）、"非流形几何"（梯度阻断）、"尺寸约束"（注意力头平衡），并首创可微图结构实现梯度引导的架构搜索。

📚 核心文档

完整文档导航: docs/README.md

快速链接

文档	说明
快速开始	5 分钟上手 God-Graph
DifferentiableGraph 教程	可微图结构完整教程
设计哲学	为什么需要 CAD-LLM 范式迁移
架构指南	模块职责和工作流
性能报告	并行算法和 SIMD 性能数据
实现状态	功能完成度和路线图
TinyLlama 验证	真实模型端到端验证

⚡ DifferentiableGraph 快速开始

DifferentiableGraph 是 God-Graph 的核心创新——它将图结构从"静态容器"变为"可微分的计算本身"，支持用梯度下降优化神经网络架构。

核心应用场景

动态注意力剪枝：梯度引导剪除弱注意力边，减少 30-50% 冗余连接
拓扑缺陷检测：自动发现孤立节点、梯度阻断、缺失残差连接
神经架构搜索：让模型自己学习最优残差连接和注意力模式
权重编辑：李群正交化保证数值稳定性，支持精确的权重修改

5 分钟上手示例

use god_gragh::tensor::differentiable::{DifferentiableGraph, GradientConfig, ThresholdEditPolicy};

// 1. 从标准 Transformer 构建可微图
let mut graph = build_mini_transformer();
let config = GradientConfig::default().with_sparsity(0.1);
let mut diff_graph = DifferentiableGraph::from_graph(graph, config);

// 2. 定义目标函数（注意力熵 + 稀疏性正则）
let loss_fn = |g: &DifferentiableGraph| {
    g.entropy_loss() + 0.1 * g.sparsity_loss()
};

// 3. 梯度下降优化结构
for step in 0..100 {
    let loss = loss_fn(&diff_graph);
    let grads = diff_graph.compute_structure_gradients(loss);
    diff_graph.update_structure(&grads, 0.01);
    
    if step % 10 == 0 {
        println!("Step {}: loss={:.4}", step, loss);
    }
}

// 4. 导出剪枝后的图
let policy = ThresholdEditPolicy::new(0.5);
let pruned_graph = diff_graph.discretize(&policy);
println!("剪枝了 {} 条弱注意力边", pruned_graph.num_pruned_edges());

完整示例

示例	说明	运行命令
可微注意力剪枝	梯度引导剪除弱边	`cargo run --example differentiable_graph --features tensor`
拓扑缺陷检测	检测模型拓扑问题	`cargo run --example cad_llm_validate_1b --features transformer`
李群正交化	权重正交化稳定性	`cargo run --example cad_llm_orthogonalize --features transformer`
张量环压缩	模型压缩 workflow	`cargo run --example cad_llm_tensor_ring --features transformer`

详见 DifferentiableGraph 完整教程。

🚀 快速开始

安装

[dependencies]
god-gragh = "0.5.0-alpha"

基础用法：图数据结构和算法

use god_gragh::graph::Graph;
use god_gragh::algorithms::traversal::{bfs, dfs};

// 创建图
let mut graph = Graph::<String, f64>::directed();
let a = graph.add_node("A".to_string()).unwrap();
let b = graph.add_node("B".to_string()).unwrap();
let _ = graph.add_edge(a, b, 1.0);

// BFS 遍历
bfs(&graph, a, |node, _depth| {
    println!("访问：{}", node.data());
    true
});

高级用法：LLM 拓扑优化

use god_gragh::transformer::optimization::{
    ModelSwitch, CadStyleEditor, TensorRingCompressor
};

// 1. 从 Safetensors 加载模型
let mut graph = ModelSwitch::load_from_safetensors("model.safetensors")?;

// 2. 检测拓扑缺陷
let mut editor = CadStyleEditor::new(&mut graph);
let defects = editor.detect_defects()?;
println!("发现 {} 个缺陷", defects.len());

// 3. 张量环压缩
let compressor = TensorRingCompressor::default();
let report = compressor.compress_graph(&graph)?;
println!("压缩比：{:.2}x", report.compression_ratio);

// 4. 导出优化后的模型到 Safetensors
ModelSwitch::save_to_safetensors(&graph, "optimized.safetensors")?;

ModelSwitch 双向转换

ModelSwitch 提供 HuggingFace Safetensors 和 GodGraph 之间的双向无损转换：

use god_gragh::transformer::optimization::ModelSwitch;

// 加载：Safetensors → GodGraph
let graph = ModelSwitch::load_from_safetensors("model.safetensors")?;

// 验证拓扑
let topology_report = ModelSwitch::validate_topology(&graph)?;
println!("拓扑有效：{}", topology_report.is_valid);

// 验证权重（比较两个图的权重差异）
let weight_diff = ModelSwitch::verify_weights(&original_graph, &modified_graph)?;
println!("最大 L2 差异：{:.6e}", weight_diff.max_l2_diff);

// 导出：GodGraph → Safetensors
ModelSwitch::save_to_safetensors(&graph, "optimized.safetensors")?;

功能特性：

✅ 支持 F32/F64/F16 数据类型
✅ 权重精度验证（L2 范数比较）
✅ 拓扑完整性检查
✅ 往返精度损失 < 1e-5

详见 ModelSwitch 示例。

🔬 核心功能

1. ModelSwitch 双向转换 ⭐ 核心功能

ModelSwitch 实现 HuggingFace Safetensors 和 GodGraph 之间的双向无损转换，是 LLM 白盒分析的工作流基础。

use god_gragh::transformer::optimization::ModelSwitch;

// 1. 加载：Safetensors → GodGraph
let graph = ModelSwitch::load_from_safetensors("model.safetensors")?;

// 2. 验证拓扑完整性
let topology_report = ModelSwitch::validate_topology(&graph)?;
println!("拓扑有效：{}", topology_report.is_valid);
println!("连通分量：{}", topology_report.connected_components);
println!("是 DAG: {}", topology_report.is_dag);

// 3. 验证权重精度（比较两个图的权重差异）
let weight_diff = ModelSwitch::verify_weights(&original_graph, &modified_graph)?;
println!("最大 L2 差异：{:.6e}", weight_diff.max_l2_diff);
println!("平均 L2 差异：{:.6e}", weight_diff.avg_l2_diff);

// 4. 导出：GodGraph → Safetensors
ModelSwitch::save_to_safetensors(&graph, "optimized.safetensors")?;

核心功能：

双向转换：Safetensors ↔ GodGraph 无损转换
数据类型支持：F32、F64、F16 自动转换
拓扑验证：检查连通性、环、孤立节点
权重验证：L2 范数比较，精度损失 < 1e-5
算子推断：根据权重名称自动推断算子类型（Attention、MLP、Norm 等）

运行示例：

cargo run --example cad_llm_switch --features safetensors

详见 ModelSwitch 示例。

2. 可微图结构 (DifferentiableGraph) ⭐ 核心创新

这是 God-Graph 的原创性贡献——将图结构从"静态容器"变为"可微分的计算本身"。

use god_gragh::tensor::differentiable::{DifferentiableGraph, GradientConfig};

// 1. 从标准 Transformer 构建可微图
let mut graph = build_transformer();
let config = GradientConfig::default().with_sparsity(0.1);
let mut diff_graph = DifferentiableGraph::from_graph(graph, config);

// 2. 梯度下降优化结构
for step in 0..100 {
    let loss = diff_graph.entropy_loss() + 0.1 * diff_graph.sparsity_loss();
    let grads = diff_graph.compute_structure_gradients(loss);
    diff_graph.update_structure(&grads, 0.01);
}

// 3. 导出剪枝后的图
let pruned = diff_graph.discretize(&ThresholdEditPolicy::new(0.5));
println!("剪枝比例：{:.2}%", pruned.pruned_ratio() * 100.0);

核心技术：

连续松弛：将离散的边存在性转换为连续概率（0 到 1）
STE 估计器：Straight-Through Estimator 实现离散 - 连续双向转换
Gumbel-Softmax：可微分采样，支持梯度反向传播
李群正交化：保证权重矩阵数值稳定性

应用场景：

动态注意力剪枝（减少 30-50% 冗余连接）
神经架构搜索（自动发现最优残差连接）
拓扑缺陷检测（孤立节点、梯度阻断）

详见 DifferentiableGraph 教程。

3. 李群正交化 (Lie Group Orthogonalization)

用李群理论保证权重矩阵的正交性，提升数值稳定性。

use god_gragh::tensor::decomposition::{lie_exponential, is_orthogonal};

// so(n) 李代数 → SO(n) 李群
let algebra = DenseTensor::from_vec(
    vec![0.0, -0.1, 0.1, 0.0],
    vec![2, 2],
);

let rotation = lie_exponential(&algebra)?;
assert!(is_orthogonal(&rotation, 1e-5));

数学原理：指数映射 exp: so(n) → SO(n) 用 Padé 近似 + 缩放 - 平方算法实现。

4. 张量环压缩 (Tensor Ring Compression)

把高维张量表示为 3D 核心张量的环，减少参数量。

use god_gragh::transformer::optimization::TensorRingCompressor;

let compressor = TensorRingCompressor::default();
let ring = compressor.decompose(&weight_tensor)?;

println!("压缩比：{:.2}x", ring.compression_ratio());

压缩比公式：(m × n) / (r₀×m×r₁ + r₁×n×r₂)

5. 拓扑约束求解 (Topology Constraint Solving)

像 CAD 软件一样检查 LLM 的"几何完整性"。

use god_gragh::transformer::optimization::{CadStyleEditor, TopologyConstraint};

let mut editor = CadStyleEditor::new(&mut graph);

// 检测缺陷
let defects = editor.detect_defects()?;

// 添加约束
editor.add_constraint(TopologyConstraint::ResidualConnection {
    from_layer: "attention".to_string(),
    to_layer: "output".to_string(),
})?;

// 求解约束（自动修复）
editor.solve_constraints()?;

缺陷类型：孤立节点、disconnected 组件、梯度阻断、缺失残差连接。

6. GraphTransformer 显式注意力分析

定位说明：GraphTransformer 主要用于可视化注意力拓扑、动态剪枝弱边、添加自定义连接。对于高性能推理，建议转换为标准 LlamaModel。

use god_gragh::transformer::graph_transformer::GraphTransformer;

let mut transformer = GraphTransformer::new(12, 12, 768);
transformer.build_graph(&input_ids);

// 可视化注意力拓扑
let dot = transformer.to_dot();
std::fs::write("attention_graph.dot", dot)?;

// 剪枝弱注意力边（阈值=0.01）
let pruned = transformer.prune_weak_edges(0.01);
println!("剪枝 {} 条边", pruned);

// 添加自定义长程连接
transformer.add_skip_connection(layer_0, layer_11);

核心优势：

每条注意力边可单独访问/修改（黑盒推理引擎做不到）
支持动态拓扑编辑（传统静态图做不到）
可导出为 DOT/Graphviz 可视化

📊 性能数据

并行算法加速比

算法	规模	串行时间	并行时间	加速比
PageRank	1,000 节点	53.9ms	668µs	80.7x
DFS	50K 节点	9.7ms	1.3ms	7.5x
Connected Components	2,000 节点	-	357.8µs	-

详见性能报告。

SIMD 优化

图规模	串行	并行	SIMD	提升
100 节点	2.1ms	280µs	~150µs	14x
1,000 节点	210ms	2.8ms	~1.5ms	140x

🏗️ 架构设计

CAD-LLM 范式映射

CAD 概念	LLM 等价物	GodGraph 实现
表面断裂检查	孤立注意力头检测	`connected_components`
非流形几何检查	梯度阻断检测	`topological_sort + path_analysis`
尺寸约束	注意力头权重平衡	`AttentionHeadBalance` 约束
平行约束	残差连接强制	`ResidualConnection` 约束
装配约束	模块接口匹配	`validate_assembly`
零件替换	模块提取/替换	`extract_module` / `replace_module`

详见设计哲学。

📦 特性标志

基础特性

特性	说明
`parallel`	并行算法（Rayon）
`simd`	SIMD 向量化（wide::f64x4）
`tensor`	张量核心支持（ndarray）
`tensor-sparse`	稀疏张量格式（COO/CSR）
`tensor-gnn`	GNN 层（GCN/GAT/GraphSAGE）

LLM 优化特性

特性	说明
`transformer`	Transformer 基础架构
`safetensors`	Safetensors 模型加载
`cad-llm`	CAD-LLM 拓扑优化（实验性）

元特性（推荐）

元特性	包含
`tensor-full`	所有张量功能
`tensor-inference`	GNN 推理专用
`llm`	完整 LLM 支持

🔮 路线图

版本	状态	关键特性
v0.4.3-beta	✅ 已发布	李群正交化、张量环压缩、拓扑约束
v0.5.0-alpha	🔥 当前	DifferentiableGraph 可微图结构、完整模型加载、真实模型验证
v0.6.0-beta	📅 计划	内存池基准测试、GraphTransformer 执行引擎
v0.7.0-rc	📅 计划	生产环境测试、crates.io 发布

v0.5.0-alpha 核心特性

DifferentiableGraph（可微图结构）：1421 行核心代码，支持梯度引导的架构搜索
真实模型验证：TinyLlama-1.1B 端到端优化流程
图级正交化修复：原地正交化接口（零拷贝），误差 < 1e-8
完整示例：5 个端到端 DifferentiableGraph 示例

详见实现状态和 todo.json。

🎓 目标用户

适合使用 God-Graph

✅ LLM 研究人员——想检查和修改模型拓扑结构 ✅ 模型压缩工程师——想用张量环/正交化压缩模型 ✅ QA 团队——想验证模型完整性和数值稳定性 ✅ 算法探索者——想实验动态剪枝、稀疏注意力、架构搜索 ✅ 白盒分析需求——想理解 LLM 内部工作机制

不适合使用 God-Graph

❌ 应用开发者——只想用 LLM 推理（用 llama.cpp） ❌ 训练工程师——想训练新模型（用 PyTorch/JAX） ❌ GPU 加速需求——需要 CUDA 推理（用 candle 或 vllm）

🌟 God-Graph 的独特优势

1. 桶式邻接表 + Generation 索引

O(1) 增量更新：优于静态 CSR 格式，适合动态图编辑场景
防止 ABA 问题：删除节点后重用索引不会混淆（petgraph 没有的类型安全）
64 字节对齐：避免 CPU 缓存 false sharing，推理性能基础

2. DifferentiableGraph（原创创新）

可微图结构：将离散图结构转换为连续可微形式
梯度引导搜索：用梯度下降自动发现最优神经网络架构
STE + Gumbel-Softmax：支持离散 - 连续双向转换，梯度反向传播

3. GraphTransformer 显式注意力

每条边可单独访问/修改：黑盒推理引擎（llama.cpp）做不到
动态拓扑编辑：传统静态图（petgraph）做不到
可视化支持：导出 DOT/Graphviz 格式，直观理解注意力模式

4. ModelSwitch 双向转换工作流

Safetensors ↔ GodGraph：HuggingFace 格式双向转换
权重精度验证：L2 范数比较，往返损失 < 1e-5
拓扑完整性检查：自动检测孤立节点、梯度阻断
算子类型推断：根据权重名称识别 Attention、MLP、Norm 等

5. 李群正交化 + 张量环压缩

数学保证：李群理论保证权重矩阵正交性，数值稳定性
压缩比：张量环分解减少 2-10x 参数量
端到端工作流：Safetensors ↔ GodGraph ↔ Safetensors

🤝 贡献

欢迎贡献！请确保：

代码通过 cargo clippy 和 cargo fmt
添加适当的测试
更新文档

📄 许可证

双许可证：MIT 或 Apache-2.0（任选其一）

🙏 致谢

petgraph - Rust 图算法库先驱
ndarray - N 维数组
wide - SIMD 数学库
HuggingFace - Safetensors 格式

联系方式: silverenternal 3147264070@qq.com 项目地址: https://github.com/silverenternal/god-graph

Quick Start

Installation

Add dependency to Cargo.toml:

[dependencies]
god-gragh = "0.4.2-beta"

Basic Usage

use god_gragh::graph::Graph;
use god_gragh::graph::traits::{GraphOps, GraphQuery};

// Create a directed graph
let mut graph = Graph::<String, f64>::directed();

// Add nodes
let a = graph.add_node("A".to_string()).unwrap();
let b = graph.add_node("B".to_string()).unwrap();
let c = graph.add_node("C".to_string()).unwrap();

// Add edges
graph.add_edge(a, b, 1.0).unwrap();
graph.add_edge(b, c, 2.0).unwrap();
graph.add_edge(a, c, 3.0).unwrap();

// Query
println!("Nodes: {}", graph.node_count());
println!("Edges: {}", graph.edge_count());

// Iterate over neighbors
for neighbor in graph.neighbors(a) {
    println!("Neighbor: {}", graph[neighbor]);
}

Using Graph Builder

use god_gragh::graph::builders::GraphBuilder;

let graph = GraphBuilder::directed()
    .with_nodes(vec!["A", "B", "C", "D"])
    .with_edges(vec![
        (0, 1, 1.0),
        (0, 2, 2.0),
        (1, 3, 3.0),
        (2, 3, 4.0),
    ])
    .build()
    .unwrap();

Algorithms

Traversal Algorithms

use god_gragh::algorithms::traversal::{dfs, bfs, topological_sort, tarjan_scc};

// Depth-First Search
dfs(&graph, start_node, |node| {
    println!("Visit: {}", node.data());
    true // Continue traversal
});

// Breadth-First Search
bfs(&graph, start_node, |node| {
    println!("Visit: {}", node.data());
    true
});

// Topological Sort (DAG)
let order = topological_sort(&graph);

// Tarjan's Strongly Connected Components
let sccs = tarjan_scc(&graph);

Shortest Path Algorithms

use god_gragh::algorithms::shortest_path::{dijkstra, bellman_ford, floyd_warshall, astar};

// Dijkstra's Algorithm (non-negative weights)
let (path, distance) = dijkstra(&graph, start, Some(end)).unwrap();

// A* Search
let heuristic = |node: NodeIndex| -> f64 { /* Heuristic function */ 0.0 };
let (path, distance) = astar(&graph, start, end, heuristic).unwrap();

// Bellman-Ford (handles negative weights)
let distances = bellman_ford(&graph, start);

// Floyd-Warshall (all-pairs shortest paths)
let distances = floyd_warshall(&graph);

Minimum Spanning Tree

use god_gragh::algorithms::mst::{kruskal, prim};

// Kruskal's Algorithm
let mst = kruskal(&graph);

// Prim's Algorithm
let mst = prim(&graph, start_node);

Centrality Algorithms

use god_gragh::algorithms::centrality::{
    degree_centrality, betweenness_centrality, closeness_centrality, pagerank
};

// Degree Centrality
let centrality = degree_centrality(&graph);

// Betweenness Centrality
let centrality = betweenness_centrality(&graph);

// Closeness Centrality
let centrality = closeness_centrality(&graph);

// PageRank
let ranks = pagerank(&graph, 0.85, 20);

Community Detection

use god_gragh::algorithms::community::{connected_components, label_propagation};

// Connected Components
let components = connected_components(&graph);

// Label Propagation Algorithm
let communities = label_propagation(&graph);

Flow Algorithms

use god_gragh::algorithms::flow::{edmonds_karp, dinic, push_relabel};

// Edmonds-Karp Maximum Flow
let (flow, residual_graph) = edmonds_karp(&graph, source, sink);

// Dinic's Algorithm
let flow = dinic(&graph, source, sink);

// Push-Relabel Algorithm
let flow = push_relabel(&graph, source, sink);

Parallel Algorithms

Enable parallel feature to use parallel algorithms:

[dependencies]
god-gragh = { version = "0.4.2-beta", features = ["parallel"] }

use god_gragh::algorithms::parallel;

// Parallel BFS
let layers = parallel::bfs_parallel(&graph, start_node);

// Parallel PageRank
let ranks = parallel::pagerank_parallel(&graph, 0.85, 20);

// Parallel Connected Components
let components = parallel::connected_components_parallel(&graph);

SIMD Optimization

Enable simd feature for SIMD vectorization (supports stable Rust):

[dependencies]
god-gragh = { version = "0.4.2-beta", features = ["simd"] }

use god_gragh::algorithms::parallel;

// SIMD-accelerated PageRank
#[cfg(feature = "simd")]
let ranks = parallel::par_pagerank_simd(&graph, 0.85, 20);

// SIMD-accelerated Degree Centrality
#[cfg(feature = "simd")]
let centrality = parallel::par_degree_centrality_simd(&graph);

Implementation Details: Uses wide::f64x4 type for 4-way parallel floating-point operations, automatically leveraging CPU SIMD instruction sets (SSE/AVX/AVX-512).

Tensor & GNN Support

Enable tensor features for Graph Neural Network workflows:

[dependencies]
god-gragh = { version = "0.4.2-beta", features = ["tensor", "tensor-gnn"] }

Basic Tensor Operations

use god_gragh::tensor::{DenseTensor, TensorBase, TensorOps};

// Create tensors
let a = DenseTensor::new(vec![1.0, 2.0, 3.0, 4.0], vec![2, 2]);
let b = DenseTensor::new(vec![5.0, 6.0, 7.0, 8.0], vec![2, 2]);

// Matrix multiplication
let c = a.matmul(&b);

// Transpose
let t = a.transpose(None);

// Normalize
let norm = a.normalize();

Graph-Tensor Conversion

use god_gragh::graph::Graph;
use god_gragh::tensor::GraphTensorExt;

// Create a graph with vector node features
let mut graph = Graph::<Vec<f64>, f64>::directed();

let n0 = graph.add_node(vec![1.0, 0.0]).unwrap();
let n1 = graph.add_node(vec![0.0, 1.0]).unwrap();
let n2 = graph.add_node(vec![1.0, 1.0]).unwrap();

let _ = graph.add_edge(n0, n1, 1.0);
let _ = graph.add_edge(n1, n2, 1.0);
let _ = graph.add_edge(n2, n0, 1.0);

// Convert to tensor representation
let (features, adjacency) = graph.to_tensor_representation().unwrap();

assert_eq!(features.shape(), &[3, 2]);
assert_eq!(adjacency.num_nodes, 3);

GNN Layers

Important: God-Graph GNN modules are inference-only (forward pass only). For training workflows, integrate with external autograd libraries:

dfdx: Deep learning framework with CUDA support

Candle: HuggingFace's lightweight tensor library

tch-rs: Rust bindings for PyTorch

Inference Example (Recommended Use Case)

use god_gragh::tensor::gnn::{GCNConv, GATConv, GraphSAGE, MessagePassingLayer};

// Create GCN layer
let gcn = GCNConv::new(64, 64);

// Create GAT layer (multi-head attention)
let gat = GATConv::new(
    64,  // in_features
    64,  // out_features
    4,   // num_heads
);

// Create GraphSAGE layer
let graphsage = GraphSAGE::new(
    64,  // in_features
    32,  // out_features
    10,  // num_samples
);

// Forward pass (inference only)
let h1 = gcn.forward(&features, &adjacency);
let h2 = gat.forward(&h1, &edge_index);
let output = graphsage.forward(&h2, &edge_index);

Training Integration Example (with dfdx)

For complete GNN training, integrate with dfdx:

// Pseudo-code: Integrate god-gragh GNN with dfdx autograd
use dfdx::prelude::*;
use god_gragh::tensor::gnn::GCNConv;

// 1. Use god-gragh for graph structure and forward pass
let gcn = GCNConv::new(64, 64);
let output = gcn.forward(&features, &adjacency);

// 2. Convert to dfdx tensor for autograd
// let dfdx_tensor = Tensor1D::from(output.data());

// 3. Define loss and optimizer (dfdx)
// let loss = cross_entropy_loss(&dfdx_tensor, &labels);
// let mut optimizer = Adam::new(model.parameters(), lr=0.001);

// 4. Training loop
// for epoch in 0..num_epochs {
//     optimizer.zero_grad();
//     let loss = forward_pass(&graph, &labels);
//     optimizer.backward(&loss);
//     optimizer.step();
// }

See: examples/differentiable_graph.rs for an example of differentiable graph structures and gradient-based optimization.

Memory Pool Optimization

use god_gragh::tensor::{TensorPool, PoolConfig};

// Create a tensor pool
let config = PoolConfig::new(16, 128).with_preallocate(true);
let mut pool = TensorPool::new(config);

// Acquire tensor from pool (automatically zeroed)
let tensor = pool.acquire(vec![100, 100]);

// Automatically returned to pool when dropped
drop(tensor);

Benefits:

Memory Reuse: Reduces allocation overhead in iterative algorithms (PageRank, GNN training) by 80-90%
Automatic Recycling: PooledTensor automatically returns to pool on Drop
Gradient Checkpointing: GradientCheckpoint reduces memory usage during backpropagation by 40-60%

Memory Pool Benchmark Results

Latest benchmarks (run on Linux, Rust 1.85, March 2026):

Benchmark	Time	Pool Hit Rate	Allocation Reduction
Iterative (without pool)	850.84 µs	N/A	Baseline
Iterative (with pool)	127.76 µs	98-100%	98-99.9%
GNN Iteration	31.93 µs	96-99%	96-99%
MatMul Temporaries	42.15 µs	95-98%	95-98%
Small Tensors (16x16)	6.89 µs	98%+	98%+
Large Tensors (512x512)	17.36 µs	95%+	95%+
Sequential Alloc/Dealloc	34.71 µs	98%+	98%+
Warm Pool (preallocated)	34.39 µs	100%	100%
Cold Pool (no prealloc)	35.32 µs	98%+	98%+

Key Findings:

Allocation Reduction: The memory pool achieves 98-99.9% reduction in new system allocations for iterative workloads, validating the "80-90% reduction" claim with actual measurements showing even better results.
Performance Speedup: For iterative allocation patterns (50 iterations of 128x128 tensors), the pool achieves 6.7x speedup (850.84 µs → 127.76 µs).
Preallocation Benefit: Warm pools with preallocation achieve near-perfect hit rates (100%), eliminating allocation overhead entirely.
Typical Workloads:
- GNN Forward Pass: 96-99% allocation reduction (hidden state temporaries)
- Attention QKV Projections: 95-98% reduction (sequential matmul temporaries)
- Batch Processing: 98%+ reduction with preallocated pools

Pool Statistics from Benchmarks:

=== Iterative Pool Stats ===
Total allocations: 204800
Pool hits (reuses): 204799
Pool misses (new allocs): 1
Hit rate: 100.00%
Allocation reduction: 100.00%

=== GNN Iteration Pool Stats ===
Total allocations: 300
Pool hits (reuses): 297
Pool misses (new allocs): 3
Hit rate: 99.00%
Allocation reduction: 99.00%

=== MatMul Temporaries Pool Stats ===
Total allocations: 60
Pool hits (reuses): 57
Pool misses (new allocs): 3
Hit rate: 95.00%
Allocation reduction: 95.00%

Note: The memory pool provides maximum benefit in:

Iterative algorithms (PageRank, GNN message passing) - 98-99.9% reduction
Sequential temporaries (QKV projections) - 95-98% reduction
Batch processing with preallocated pools - 100% hit rate achievable

Run memory pool benchmarks:

cargo bench --features "tensor tensor-pool" --bench memory_pool_reduction

Transformer & LLM Inference

Enable the transformer feature for LLaMA/Mistral model inference:

[dependencies]
god-gragh = { version = "0.4.2-beta", features = ["transformer"] }

Model Loading

use god_gragh::transformer::{LlamaModel, LlamaConfig, TextGenerator, GenerationConfig};
use god_gragh::transformer::loader::{load_safetensors, load_from_hf_hub};

// Load from HuggingFace Hub
let (config, weights) = load_from_hf_hub(
    "meta-llama/Llama-2-7b-hf",
    None, // token
).unwrap();

// Build the model
let model = LlamaModel::new(
    config,
    weights.embed_tokens,
    weights.layers,
    weights.norm,
    weights.lm_head,
).unwrap();

// Create text generator
let generator = TextGenerator::new(&model);

🔬 真实模型验证：TinyLlama-1.1B

God-Graph 已完整支持真实 LLM 模型的加载、验证和优化。本项目使用 TinyLlama-1.1B 进行端到端验证。

模型下载

# 使用 HuggingFace Hub 下载 TinyLlama-1.1B
pip install huggingface_hub
python scripts/download_tinyllama.py

# 模型将下载到 models/tinyllama/model.safetensors

加载真实模型

use god_gragh::transformer::optimization::{ModelSwitch, OperatorType};
use god_gragh::graph::traits::{GraphBase, GraphQuery};

// 从 safetensors 加载 TinyLlama
let graph = ModelSwitch::load_from_safetensors("models/tinyllama/model.safetensors")?;

// 验证模型结构
let node_count = graph.node_count();
let edge_count = graph.edge_count();

println!("TinyLlama-1.1B 加载成功:");
println!("  - 节点数：{}", node_count);
println!("  - 边数（权重）: {}", edge_count);

// 验证所有权重有效（无 NaN/Inf）
for edge_ref in graph.edges() {
    let weight = edge_ref.data();
    assert!(weight.data.iter().all(|&v| v.is_finite()), "权重包含非有限值");
}
println!("✓ 所有权重有效（无 NaN/Inf）");

李群正交化验证

对真实模型权重进行正交化处理，验证数值稳定性：

use god_gragh::transformer::optimization::lie_group::{
    orthogonalize_weights_in_place, LieGroupConfig
};

// 配置正交化参数
let config = LieGroupConfig::default()
    .with_cayley(true)      // 使用 Cayley 变换
    .with_block_size(32);   // SO(32) 块大小

// 执行原地正交化（零拷贝）
let errors = orthogonalize_weights_in_place(&config, &mut graph)?;

// 验证正交化效果
let avg_error = errors.iter().sum::<f64>() / errors.len() as f64;
println!("正交化结果:");
println!("  - 平均误差：{:.2e}", avg_error);
println!("  - 最大误差：{:.2e}", errors.iter().fold(0.0f64, f64::max));

// 验证：正交化误差应 < 1e-8
assert!(avg_error < 1e-8, "正交化误差过大");
println!("✓ 正交化成功（误差 < 1e-8）");

张量环压缩验证

对真实模型进行张量环压缩，验证压缩效果：

use god_gragh::transformer::optimization::{
    TensorRingCompressor, CompressionConfig
};

// 配置压缩参数
let config = CompressionConfig::default()
    .with_target_rank(16)    // 目标秩
    .with_min_rank(4);       // 最小秩

// 执行压缩
let compressor = TensorRingCompressor::new(&config);
let report = compressor.compress_graph(&graph)?;

println!("张量环压缩报告:");
println!("  - 原始参数量：{:.2}M", report.original_params / 1e6);
println!("  - 压缩后参数量：{:.2}M", report.compressed_params / 1e6);
println!("  - 压缩比：{:.2}x", report.compression_ratio);
println!("  - 重建误差：{:.2e}", report.reconstruction_error);

// 验证：压缩比应 < 0.5（至少 2 倍压缩）
assert!(report.compression_ratio < 0.5, "压缩比不达标");
println!("✓ 压缩成功（压缩比 < 0.5）");

测试命令

运行完整验证测试：

# 运行所有真实模型验证测试
cargo test --features "safetensors tensor" real_model -- --nocapture

# 运行正交化测试
cargo test --features "safetensors tensor" test_tinyllama_orthogonalization -- --nocapture

# 运行压缩测试
cargo test --features "safetensors tensor" test_tinyllama_tensor_ring -- --nocapture

验证结果

测试文件: tests/real_model_validation.rs

测试项	状态	说明
`test_load_tinyllama_model`	✅ 通过	模型加载验证
`test_tinyllama_orthogonalization`	✅ 通过	正交化误差 < 1e-8
`test_tinyllama_tensor_ring`	✅ 通过	压缩比 < 0.5
`test_tinyllama_weight_validity`	✅ 通过	无 NaN/Inf

关键指标:

正交化误差：2.04e-14 (远低于 1e-8 阈值)
压缩比：0.12x - 0.25x (取决于秩选择)
重建误差：< 1e-6 (数值精度保证)

详见 tests/real_model_validation.rs 和 CAD-LLM 1B 验证报告。

Model Loading (English)

use god_gragh::transformer::{LlamaModel, LlamaConfig, TextGenerator, GenerationConfig};
use god_gragh::transformer::loader::{load_safetensors, load_from_hf_hub};

// Load from HuggingFace Hub
let (config, weights) = load_from_hf_hub(
    "meta-llama/Llama-2-7b-hf",
    None, // token
).unwrap();

// Build the model
let model = LlamaModel::new(
    config,
    weights.embed_tokens,
    weights.layers,
    weights.norm,
    weights.lm_head,
).unwrap();

// Create text generator
let generator = TextGenerator::new(&model);

Text Generation

use god_gragh::transformer::{GenerationConfig, SamplingMode};

// Configure generation
let config = GenerationConfig::new()
    .with_max_length(512)
    .with_temperature(0.8)
    .with_top_p(0.9)
    .with_top_k(40)
    .with_repetition_penalty(1.1)
    .with_sampling_mode(SamplingMode::TopPTopK);

// Generate text
let prompt = "Once upon a time";
let result = generator.generate(prompt, &config).unwrap();
println!("Generated: {}", result.text);
println!("Tokens: {} in {}ms", result.num_tokens, result.generate_time_ms);

KV Cache Optimization

use god_gragh::transformer::kv_cache::{CacheConfig, KvCache};

// Configure KV cache
let config = CacheConfig::new()
    .with_max_batch_size(1)
    .with_max_seq_len(2048)
    .with_dtype("f32");

// Cache is managed internally during generation
// Supports incremental decoding and multi-turn dialogue

Batch Inference with SIMD

Enable simd feature for SIMD-accelerated batch inference:

[dependencies]
god-gragh = { version = "0.4.2-beta", features = ["transformer", "simd"] }

use god_gragh::transformer::batch::{BatchGenerator, BatchRequest};

// Create batch generator
let mut batch_gen = BatchGenerator::new(&model);

// Add multiple requests
batch_gen.add_request("Hello, how are you?", 1);
batch_gen.add_request("What is Rust?", 2);
batch_gen.add_request("Explain quantum computing", 3);

// Process batch with SIMD acceleration
let results = batch_gen.generate_batch(&config).unwrap();

Supported Models

LLaMA / Llama-2 / Llama-3: Meta's open language models
Mistral: Mistral AI's efficient models
Gemma: Google's lightweight models
Qwen: Alibaba's multilingual models

Note: Model weights must be in .safetensors format. Use load_safetensors or load_from_hf_hub for automatic conversion.

Examples

See the following examples for complete workflows:

examples/llm_model_loader.rs - Loading models from HuggingFace
examples/llm_text_gen.rs - End-to-end text generation
examples/llm_batch_simd.rs - SIMD-accelerated batch inference

Run examples:

# Text generation demo
cargo run --example llm_text_gen --features transformer

# Batch inference with SIMD
cargo run --example llm_batch_simd --features "transformer,simd"

Documentation

Transformer Module Guide - API reference and usage
Transformer Tutorial - Step-by-step tutorial
Enhancements Report - Implementation details

Random Graph Generation

use god_gragh::generators::{
    erdos_renyi_graph, barabasi_albert_graph, watts_strogatz_graph,
    complete_graph, grid_graph, tree_graph
};

// Erdős-Rényi Random Graph G(n, p)
let graph = erdos_renyi_graph::<String>(100, 0.1, true, 42);

// Barabási-Albert Preferential Attachment Model
let graph = barabasi_albert_graph::<String>(100, 3);

// Watts-Strogatz Small-World Network
let graph = watts_strogatz_graph::<String>(100, 4, 0.1);

// Complete Graph
let graph = complete_graph::<String, f64>(10);

// Grid Graph
let graph = grid_graph::<String, f64>(5, 5);

// Tree
let graph = tree_graph::<String, f64>(3, 100);

Graph Export

DOT/Graphviz Format

use god_gragh::export::{to_dot, to_svg, to_adjacency_list, to_edge_list};

// Export to DOT format (Graphviz)
let dot = to_dot(&graph);
std::fs::write("graph.dot", dot)?;

// Generate visualization:
// bash: dot -Tpng graph.dot -o graph.png

SVG Visualization

use god_gragh::export::svg::{SvgOptions, LayoutAlgorithm};

// Export to SVG format with custom options
let options = SvgOptions::new()
    .with_size(800, 600)
    .with_node_radius(25.0)
    .with_layout(LayoutAlgorithm::ForceDirected);
let svg = to_svg(&graph, &options);
std::fs::write("graph.svg", svg)?;

// View in browser using examples/graph_viewer.html

Layout Algorithms:

Force-Directed: Physics-based layout with node repulsion and edge attraction
Circular: Nodes arranged in a circle
Hierarchical: Layered layout based on topological sort

Interactive Viewer: Open examples/graph_viewer.html in browser to:

Drag and drop SVG files
Zoom and pan
Adjust node/edge styles in real-time
View node list

Adjacency List & Edge List

// Export as adjacency list
let adj_list = to_adjacency_list(&graph);

// Export as edge list
let edge_list = to_edge_list(&graph);

Feature Flags

Basic Features

Feature	Description	Dependencies
`std`	Standard library support (enabled by default)	-
`parallel`	Parallel algorithms	rayon, crossbeam-queue
`serde`	Serialization support	serde
`dot`	DOT format export	-
`simd`	SIMD vectorization (experimental, stable Rust)	wide
`matrix`	Matrix representation	nalgebra
`rand`	Random graph generation	rand, rand_chacha
`unstable`	Nightly Rust features	-

Tensor Features

Feature	Description	Dependencies
`tensor`	Tensor core support (ndarray backend)	ndarray
`tensor-sparse`	Sparse tensor formats (COO, CSR, BSR)	tensor
`tensor-gpu`	GPU acceleration (requires CUDA)	tensor, dfdx
`tensor-candle`	Candle backend (Hugging Face)	tensor, candle-core
`tensor-autograd`	Automatic differentiation	tensor, dfdx
`tensor-serde`	Tensor serialization	tensor, serde
`tensor-gnn`	GNN layers (GCN, GAT, GraphSAGE)	tensor, tensor-sparse, rand_distr
`tensor-pool`	Memory pool optimization	tensor, bitvec
`tensor-batch`	Batch graph processing	tensor, tensor-sparse

Meta-Features (Recommended)

Meta-Feature	Description	Included Features
`tensor-full`	All tensor features	tensor, tensor-sparse, tensor-gnn, tensor-pool, tensor-batch
`tensor-inference`	GNN inference only	tensor, tensor-sparse, tensor-gnn
`tensor-ml`	ML training support	tensor, tensor-sparse, tensor-gnn, tensor-autograd, tensor-pool

Transformer Features

Feature	Description	Dependencies
`transformer`	Transformer/LLM inference	serde_json, memmap2, regex
`safetensors`	Safetensors model loading	safetensors
`hf-hub`	HuggingFace Hub integration	hf-hub, tokio
`simd`	SIMD acceleration for batch inference	wide

Note: For complete Transformer support, use --features "transformer,safetensors,simd". See Transformer & LLM Inference for details.

Comparison with petgraph

Feature	God-Graph	petgraph
Memory Layout	Bucket-based adjacency list + Arena-style slots	Adjacency list
Incremental Updates	✅ O(1)	❌ Requires rebuild
Stable Indices	✅ Generation counting	✅ Stable Graph
Parallel Algorithms	✅ Built-in (5+)	❌
Cache Optimization	✅ 64-byte alignment	❌
SIMD Vectorization	✅ wide::f64x4	❌
Tensor/GNN Support	✅ Multi-backend	❌
Transformer/LLM	✅ LLaMA/Mistral inference	❌
API Design	Generic traits	Concrete types
Documentation	🌱 Growing	🌳 Mature
Community Maturity	🌱 Growing	🌳 Mature

God-Graph Advantages:

Generation-indexed stability prevents ABA problems
Bucket-based adjacency list supports O(1) incremental updates
Built-in parallel algorithm suite with proven speedups
Cache-optimized memory layout (64-byte alignment, software prefetching)
SIMD vectorization for batch computations
Integrated tensor/GNN support for machine learning workflows

petgraph Advantages:

Mature community, production-proven
Comprehensive documentation
More algorithm variants

Performance Benchmarks

Detailed performance data available in Performance Report.

Benchmark results on 8-core CPU:

Algorithm	Scale	Serial Time	Parallel Time	Speedup
PageRank	1,000 nodes	53.9ms	668µs	80.7x
DFS	50K nodes	9.7ms	1.3ms	7.5x
Connected Components	2,000 nodes	-	357.8µs	-
Degree Centrality	5,000 nodes	-	146µs	-

SIMD Performance (Estimated)

Graph Scale	Serial	Parallel	SIMD	Speedup
100 nodes	2.1ms	280µs	~150µs	14x
1,000 nodes	210ms	2.8ms	~1.5ms	140x
5,000 nodes	5.2s	68ms	~40ms	130x

Note: SIMD performance depends on CPU instruction set support (AVX2/AVX-512)

Memory Pool Performance

The tensor memory pool reduces allocation overhead by reusing pre-allocated memory, achieving high reuse ratios for iterative algorithms.

Benchmark Results (Actual Measurements)

Benchmark	Without Pool	With Pool	Improvement
Iterative allocation (50× 128×128)	847.91 µs	2.57 µs/iter	~99.7% faster per iteration
GNN iteration (10 steps)	N/A	10.85 µs	Hit rate: 99.89%
Matrix multiplication temporaries	N/A	4.14 µs	Hit rate: 99.93%
Small tensor allocation (16×16)	N/A	694.86 ns	Hit rate: 99.95%
Large tensor allocation (512×512)	N/A	48.12 µs	Hit rate: 99.93%
Sequential alloc/dealloc (50×)	N/A	35.22 µs	Hit rate: 100.00%

Pool Hit Rate by Workload

Workload	Warm-up Hit Rate	Steady-state Hit Rate
Iterative (single tensor)	0% → 99.90% (16 steps)	99.99%
GNN iteration	90% → 99.99% (10 steps)	99.89%
Matrix multiplication	98.33% → 100%	99.93%
Batch size 10	90% → 100%	99.99%
Batch size 25	96% → 100%	99.99%
Batch size 50	98% → 100%	99.99%
Batch size 100	99% → 100%	99.99%

Pre-allocation Impact (Warm vs Cold Pool)

Configuration	Initial Hit Rate	Steady-state Hit Rate	Latency
Cold pool (no pre-alloc)	0%	100%	696.89 ns
Warm pool (pre-allocated)	0% → 100% (faster)	100%	696.83 ns

Key Metrics:

Reuse Ratio: >99% for iterative workloads (measured via pool.hit_rate())
Allocation Reduction: 80-90% fewer system allocations after warm-up
Memory Throughput: Pre-allocation eliminates runtime allocation latency
Warm-up Time: ~16 iterations to reach 99.9%+ hit rate
Steady-state Latency: ~695 ns per tensor acquire/release

Note: The memory pool shows higher absolute time in micro-benchmarks due to pool management overhead, but provides significant benefits in real-world iterative algorithms by eliminating repeated system allocations and improving cache locality.

Usage Example:

use god_gragh::tensor::pool::{TensorPool, PoolConfig};

// Create pool with pre-allocation
let config = PoolConfig::new(16, 128).with_preallocate(true);
let mut pool = TensorPool::new(config);

// Acquire tensors (reuses memory after first allocation)
for _ in 0..50 {
    let tensor = pool.acquire(vec![128, 128]);
    // ... use tensor ...
    drop(tensor); // Automatically returns to pool
}

// Check statistics
let stats = pool.stats();
println!("Hit rate: {:.2}%", stats.hit_rate() * 100.0);
println!("New allocations: {}", stats.pool_misses);

Run benchmarks:

cargo bench --features tensor,tensor-pool --bench tensor_pool

Test Coverage

This project uses cargo-tarpaulin for coverage measurement, targeting 80%+ coverage.

Generate Coverage Report

# Install cargo-tarpaulin
cargo install cargo-tarpaulin

# Generate HTML coverage report
cargo tarpaulin --all-features --out Html --output-dir coverage

# View report
open coverage/tarpaulin-report.html  # macOS
xdg-open coverage/tarpaulin-report.html  # Linux

Current Coverage

Overall Coverage: 66.64% (1560/2341 lines)
Unit Tests: 82 passed
Integration Tests: 18 passed
Property Tests: 15 passed
Doc Tests: 27 passed (1 ignored)
Total: 142 tests, 100% passing

See coverage/tarpaulin-report.html for details.

Development Roadmap

See ROADMAP.json for detailed roadmap.

Version History

v0.1.0-alpha: Core graph structure, basic CRUD, DFS/BFS
v0.2.0-alpha: Complete algorithm suite, random graph generators
v0.3.0-beta: Performance reports, migration guide, parallel algorithms
v0.4.0-beta: Tensor/GNN integration, memory pool optimization, differentiable graph
v0.4.2-beta: Transformer/LLM inference, LLaMA/Mistral support, KV Cache optimization, text generation
v0.5.0-rc: Serde support, API stabilization
v1.0.0-stable: Production-ready

Upcoming Features

Improve test coverage to 80%+
GitHub Pages documentation site
crates.io release
Graph-Tensor deep integration (Phase 4)
Automatic differentiation support (Phase 5)
GPU acceleration with Dfdx/Candle backends (Phase 6)
Multi-modal models (Llava, etc.)

Contributing

Contributions are welcome! Please follow these steps:

Fork the project
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Please ensure:

Code passes cargo clippy and cargo fmt
Add appropriate tests
Update documentation

Known Issues

Coverage Gap: Current 66.64%, below 80% target
- Main gaps: Community detection, flow algorithms, matching algorithms
- Plan: Add targeted tests in v0.4.0
Force-Directed Layout: Current implementation is simplified
- 50 iterations, fixed parameters
- Plan: Configurable iterations and physics parameters in v0.4.0
par_dijkstra: Marked as experimental in v0.3.0-beta
- Known issues with bucket index calculation and potential deadlocks
- Plan: Refactor in v0.4.0

License

This project is dual-licensed: MIT or Apache-2.0 (at your option).

See LICENSE-MIT and LICENSE-APACHE for details.

Acknowledgments

petgraph - Pioneer of Rust graph libraries
rayon - Data parallelism library
Graphviz - Graph visualization tool
wide - SIMD math library for stable Rust
ndarray - N-dimensional arrays
dfdx - Deep learning framework with CUDA support
Candle - HuggingFace's lightweight tensor library
Hugging Face - Open-source AI community and model hub
Safetensors - Safe tensor serialization format
LLaMA - Meta's open language models
Mistral - Mistral AI's efficient language models

Contact

Issue Reports: GitHub Issues
Discussions: GitHub Discussions
Documentation: docs.rs/god-gragh

god-gragh 0.5.0