God-Graph
God-Graph is an LLM white-box analysis tool — it turns an LLM from a black box into an editable white box.
Core innovation: DifferentiableGraph — a differentiable graph structure that optimizes neural network architectures with gradient descent, supporting dynamic attention pruning, topological defect detection, and automatic architecture search.
🎯 Positioning
God-Graph is not:
- ❌ an LLM inference engine (it cannot beat llama.cpp)
- ❌ a GNN training framework (it cannot beat DGL/PyG)
- ❌ a general-purpose graph algorithm library (petgraph is more mature)
God-Graph is:
- ✅ an LLM white-box analysis tool — inspect and modify model topology
- ✅ a differentiable graph engine — optimize neural architectures with gradient descent (DifferentiableGraph)
- ✅ a topological defect detector — find gradient blockage, isolated nodes, and missing residual connections
- ✅ a math-level optimizer — Lie group orthogonalization, tensor ring compression
In one sentence: God-Graph applies the design philosophy of CAD software to LLMs — checking for "surface cracks" (isolated nodes), "non-manifold geometry" (gradient blockage), and "dimensional constraints" (attention head balance) — and pioneers a differentiable graph structure for gradient-guided architecture search.
📚 Documentation
Full documentation index: docs/README.md
Quick links
| Document | Description |
|---|---|
| Quick Start | Get started with God-Graph in 5 minutes |
| DifferentiableGraph Tutorial | Complete guide to the differentiable graph structure |
| Design Philosophy | Why the CAD-to-LLM paradigm shift is needed |
| Architecture Guide | Module responsibilities and workflows |
| Performance Report | Parallel algorithm and SIMD benchmark data |
| Implementation Status | Feature completeness and roadmap |
| TinyLlama Validation | End-to-end validation on a real model |
⚡ DifferentiableGraph Quick Start
DifferentiableGraph is God-Graph's core innovation — it turns the graph structure from a "static container" into "computation that is itself differentiable", so neural network architectures can be optimized with gradient descent.
Core use cases
- Dynamic attention pruning: gradient-guided removal of weak attention edges, cutting 30-50% of redundant connections
- Topological defect detection: automatically find isolated nodes, gradient blockage, and missing residual connections
- Neural architecture search: let the model learn its own optimal residual connections and attention patterns
- Weight editing: Lie group orthogonalization guarantees numerical stability for precise weight modification
5-minute example
The snippet below restores the shape of the original example; call arguments were lost in extraction, so the exact identifiers are illustrative — see the linked examples for the real API:
```rust
use god_graph::differentiable::{DifferentiableGraph, DifferentiableConfig, DiscretizePolicy};

// 1. Build a differentiable graph from a standard Transformer
let graph = build_mini_transformer();
let config = DifferentiableConfig::default().with_sparsity(0.5);
let mut diff_graph = DifferentiableGraph::from_graph(&graph, config);
// 2. Define the objective (attention entropy + sparsity regularizer)
let loss_fn = |g: &DifferentiableGraph| g.attention_entropy() + g.sparsity_penalty();
// 3. Optimize the structure by gradient descent
for _step in 0..100 {
    diff_graph.step(&loss_fn);
}
// 4. Export the pruned graph
let policy = DiscretizePolicy::new(0.5);
let pruned_graph = diff_graph.discretize(&policy);
println!("pruned graph: {} edges", pruned_graph.edge_count());
```
Complete examples
| Example | Description | Command |
|---|---|---|
| Differentiable attention pruning | Gradient-guided removal of weak edges | cargo run --example differentiable_graph --features tensor |
| Topological defect detection | Detect model topology problems | cargo run --example cad_llm_validate_1b --features transformer |
| Lie group orthogonalization | Weight orthogonalization stability | cargo run --example cad_llm_orthogonalize --features transformer |
| Tensor ring compression | Model compression workflow | cargo run --example cad_llm_tensor_ring --features transformer |
🚀 Quick Start
Installation
```toml
[dependencies]
god-graph = "0.5.0-alpha"   # crate name assumed from the repository
```
Basic usage: graph data structures and algorithms
```rust
use god_graph::Graph;
use god_graph::algorithms::bfs;  // import path illustrative

// Create a graph (node payloads and edge weights illustrative)
let mut graph = Graph::directed();
let a = graph.add_node("a").unwrap();
let b = graph.add_node("b").unwrap();
let _ = graph.add_edge(a, b, 1.0);
// BFS traversal
bfs(&graph, a, |node| println!("visited {node:?}"));
```
Advanced usage: LLM topology optimization
```rust
use god_graph::model_switch::{load_from_safetensors, save_to_safetensors};

// 1. Load a model from Safetensors (paths and field names illustrative)
let mut graph = load_from_safetensors("model.safetensors")?;
// 2. Detect topological defects
let mut editor = TopologyEditor::new(&mut graph);
let defects = editor.detect_defects()?;
println!("found {} defects", defects.len());
// 3. Tensor ring compression
let compressor = TensorRingCompressor::default();
let report = compressor.compress_graph(&mut graph)?;
println!("compression ratio: {:.2}", report.ratio);
// 4. Export the optimized model back to Safetensors
save_to_safetensors(&graph, "model_optimized.safetensors")?;
```
ModelSwitch bidirectional conversion
ModelSwitch provides lossless bidirectional conversion between HuggingFace Safetensors and GodGraph:
```rust
use god_graph::model_switch::ModelSwitch;

// Load: Safetensors → GodGraph (method names illustrative)
let graph = ModelSwitch::load_from_safetensors("model.safetensors")?;
// Validate topology
let topology_report = ModelSwitch::validate_topology(&graph)?;
println!("topology: {topology_report:?}");
// Validate weights (compare the weight difference between two graphs)
let weight_diff = ModelSwitch::verify_weights(&graph, &graph)?;
println!("weight diff (L2): {weight_diff}");
// Export: GodGraph → Safetensors
ModelSwitch::save_to_safetensors(&graph, "out.safetensors")?;
```
Features:
- ✅ F32/F64/F16 data types
- ✅ Weight precision verification (L2 norm comparison)
- ✅ Topological integrity checks
- ✅ Round-trip precision loss < 1e-5
See the ModelSwitch example for details.
🔬 Core Features
1. ModelSwitch bidirectional conversion ⭐ core feature
ModelSwitch implements lossless bidirectional conversion between HuggingFace Safetensors and GodGraph, and is the workflow foundation for LLM white-box analysis.
```rust
use god_graph::model_switch::ModelSwitch;

// 1. Load: Safetensors → GodGraph (method and field names illustrative)
let graph = ModelSwitch::load_from_safetensors("model.safetensors")?;
// 2. Validate topological integrity
let topology_report = ModelSwitch::validate_topology(&graph)?;
println!("nodes: {}", topology_report.node_count);
println!("edges: {}", topology_report.edge_count);
println!("isolated nodes: {}", topology_report.isolated_nodes);
// 3. Verify weight precision (compare the weight difference between two graphs)
let weight_diff = ModelSwitch::verify_weights(&graph, &graph)?;
println!("max diff: {}", weight_diff.max);
println!("L2 diff: {}", weight_diff.l2);
// 4. Export: GodGraph → Safetensors
ModelSwitch::save_to_safetensors(&graph, "out.safetensors")?;
```
Capabilities:
- Bidirectional conversion: lossless Safetensors ↔ GodGraph
- Data types: automatic F32/F64/F16 conversion
- Topology validation: connectivity, cycles, isolated nodes
- Weight validation: L2 norm comparison, precision loss < 1e-5
- Operator inference: operator types (Attention, MLP, Norm, etc.) inferred from weight names
See the ModelSwitch example for details.
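The round-trip weight check above amounts to a relative L2 distance between the original and re-exported tensors. A minimal sketch of such a metric (illustrative only; `verify_weights`'s actual formula may differ):

```rust
// Relative L2 distance between two weight buffers, computed in f64 to avoid
// accumulating f32 rounding error in the sums.
fn relative_l2_diff(a: &[f32], b: &[f32]) -> f64 {
    let num: f64 = a.iter().zip(b).map(|(x, y)| ((x - y) as f64).powi(2)).sum();
    let den: f64 = a.iter().map(|x| (*x as f64).powi(2)).sum();
    (num / den.max(f64::MIN_POSITIVE)).sqrt()
}

fn main() {
    let original = vec![1.0_f32, -2.0, 3.0];
    let round_trip = vec![1.0_f32, -2.000_001, 3.0]; // tiny conversion error
    let diff = relative_l2_diff(&original, &round_trip);
    println!("relative L2 diff = {diff:e}");
    assert!(diff < 1e-5, "round-trip loss should stay below 1e-5");
}
```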
2. Differentiable graph structure (DifferentiableGraph) ⭐ core innovation
This is God-Graph's original contribution — it turns the graph structure from a "static container" into "computation that is itself differentiable".
```rust
use god_graph::differentiable::{DifferentiableGraph, DifferentiableConfig};

// 1. Build a differentiable graph from a standard Transformer
// (identifiers and arguments below are illustrative)
let graph = build_transformer();
let config = DifferentiableConfig::default().with_sparsity(0.5);
let mut diff_graph = DifferentiableGraph::from_graph(&graph, config);
// 2. Optimize the structure by gradient descent
for _step in 0..100 {
    diff_graph.step();
}
// 3. Export the pruned graph
let pruned = diff_graph.discretize();
println!("pruned: {} edges", pruned.edge_count());
```
Core techniques:
- Continuous relaxation: discrete edge existence becomes a continuous probability (0 to 1)
- STE: a Straight-Through Estimator bridges the discrete and continuous views
- Gumbel-Softmax: differentiable sampling that supports gradient backpropagation
- Lie group orthogonalization: keeps weight matrices numerically stable
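The relaxation and STE ideas can be sketched for a single edge gate. This is not the actual DifferentiableGraph internals — just a self-contained illustration of a relaxed Bernoulli gate with a straight-through forward pass:

```rust
fn sigmoid(x: f64) -> f64 { 1.0 / (1.0 + (-x).exp()) }

/// Gumbel noise from a uniform sample u in (0, 1): g = -ln(-ln(u)).
fn gumbel(u: f64) -> f64 { -(-u.ln()).ln() }

/// Relaxed Bernoulli gate: a soft edge probability at temperature `tau`.
/// Lower tau pushes the output toward a hard 0/1 decision.
fn soft_gate(logit: f64, u: f64, tau: f64) -> f64 {
    sigmoid((logit + gumbel(u) - gumbel(1.0 - u)) / tau)
}

/// Straight-Through Estimator forward pass: use the hard 0/1 decision
/// forward, while the backward pass (not shown) reuses the soft gradient.
fn ste_forward(soft: f64) -> f64 {
    if soft > 0.5 { 1.0 } else { 0.0 }
}

fn main() {
    let soft = soft_gate(2.0, 0.5, 1.0); // u = 0.5 makes the noise cancel
    let hard = ste_forward(soft);
    println!("soft = {soft:.3}, hard = {hard}");
    assert!(hard == 0.0 || hard == 1.0);
}
```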
Applications:
- Dynamic attention pruning (30-50% fewer redundant connections)
- Neural architecture search (automatically discover optimal residual connections)
- Topological defect detection (isolated nodes, gradient blockage)
3. Lie Group Orthogonalization
Uses Lie group theory to keep weight matrices orthogonal, improving numerical stability.
```rust
use god_graph::lie::{LieAlgebra, lie_exponential};

// so(n) Lie algebra → SO(n) Lie group (arguments illustrative)
let algebra = LieAlgebra::from_vec(vec![0.0, -0.7, 0.7, 0.0], 2);
let rotation = lie_exponential(&algebra)?;
assert!(rotation.is_orthogonal(1e-8));
```
Mathematical principle: the exponential map exp: so(n) → SO(n), implemented with a Padé approximant plus the scaling-and-squaring algorithm.
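The exponential map can be demonstrated concretely on so(2), where exp of a skew-symmetric matrix is exactly a rotation. The sketch below uses scaling-and-squaring with a truncated Taylor series as a stand-in for the crate's Padé approximant:

```rust
type M2 = [[f64; 2]; 2];

fn identity() -> M2 { [[1.0, 0.0], [0.0, 1.0]] }

fn matmul(a: &M2, b: &M2) -> M2 {
    let mut c = [[0.0; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            for k in 0..2 { c[i][j] += a[i][k] * b[k][j]; }
        }
    }
    c
}

fn scale(a: &M2, s: f64) -> M2 {
    [[a[0][0] * s, a[0][1] * s], [a[1][0] * s, a[1][1] * s]]
}

fn add(a: &M2, b: &M2) -> M2 {
    [[a[0][0] + b[0][0], a[0][1] + b[0][1]],
     [a[1][0] + b[1][0], a[1][1] + b[1][1]]]
}

/// exp(A) ≈ (Taylor_8(A / 2^s))^(2^s): scale the matrix down so the series
/// converges fast, then square the result back up s times.
fn expm(a: &M2) -> M2 {
    let s = 10u32;
    let x = scale(a, 1.0 / f64::from(1u32 << s));
    let (mut term, mut sum) = (identity(), identity());
    for k in 1..=8 {
        term = scale(&matmul(&term, &x), 1.0 / k as f64);
        sum = add(&sum, &term);
    }
    let mut r = sum;
    for _ in 0..s { r = matmul(&r, &r); }
    r
}

fn main() {
    // Skew-symmetric generator of a rotation by theta.
    let theta = 0.7_f64;
    let r = expm(&[[0.0, -theta], [theta, 0.0]]);
    // exp lands on SO(2): a pure rotation by theta.
    assert!((r[0][0] - theta.cos()).abs() < 1e-9);
    assert!((r[1][0] - theta.sin()).abs() < 1e-9);
}
```

The same exp map on so(n) is what guarantees the orthogonalized weight matrix stays exactly on SO(n) rather than merely close to it.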
4. Tensor Ring Compression
Represents a high-dimensional tensor as a ring of 3D core tensors, reducing the parameter count.
```rust
use god_graph::compress::TensorRingCompressor;

let compressor = TensorRingCompressor::default();
let ring = compressor.decompose(&tensor)?;  // `tensor` is illustrative
println!("compression ratio: {:.2}", ring.compression_ratio());
```
Compression ratio formula: (m × n) / (r₀×m×r₁ + r₁×n×r₂)
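A worked instance of that formula, counting original versus core-tensor parameters for a single matricized weight (the concrete shape and ranks are illustrative; note the formula above expresses original/compressed, while the validation section elsewhere reports the inverse, compressed/original):

```rust
/// (m * n) / (r0*m*r1 + r1*n*r2): how many times fewer parameters the
/// tensor-ring cores need than the dense m x n weight.
fn compression_ratio(m: usize, n: usize, ranks: (usize, usize, usize)) -> f64 {
    let (r0, r1, r2) = ranks;
    (m * n) as f64 / (r0 * m * r1 + r1 * n * r2) as f64
}

fn main() {
    // A 2048 x 2048 weight with ranks (1, 16, 1):
    // 4_194_304 dense params vs 2048*16 + 16*2048 = 65_536 core params.
    let ratio = compression_ratio(2048, 2048, (1, 16, 1));
    println!("compression ratio = {ratio}"); // 64x
}
```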
5. Topology Constraint Solving
Checks an LLM's "geometric integrity" the way CAD software does.
```rust
use god_graph::topology::{TopologyEditor, Constraint};

let mut editor = TopologyEditor::new(&mut graph);
// Detect defects
let defects = editor.detect_defects()?;
// Add a constraint (constraint variant illustrative)
editor.add_constraint(Constraint::AttentionHeadBalance)?;
// Solve constraints (auto-repair)
editor.solve_constraints()?;
```
Defect types: isolated nodes, disconnected components, gradient blockage, missing residual connections.
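The simplest of these defects — the isolated node — reduces to a degree check over the computation graph. A minimal sketch over a plain adjacency list (not the crate's actual detector):

```rust
/// A node with no incoming or outgoing edges is unreachable by both
/// activations and gradients: the graph equivalent of a "surface crack".
fn isolated_nodes(n: usize, edges: &[(usize, usize)]) -> Vec<usize> {
    let mut degree = vec![0usize; n];
    for &(u, v) in edges {
        degree[u] += 1;
        degree[v] += 1;
    }
    (0..n).filter(|&i| degree[i] == 0).collect()
}

fn main() {
    // Node 3 participates in no edge at all -> defect.
    let edges = [(0, 1), (1, 2)];
    assert_eq!(isolated_nodes(4, &edges), vec![3]);
    println!("defect nodes: {:?}", isolated_nodes(4, &edges));
}
```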
6. GraphTransformer explicit attention analysis
Positioning: GraphTransformer is mainly for visualizing attention topology, dynamically pruning weak edges, and adding custom connections. For high-performance inference, convert to a standard LlamaModel.
```rust
use god_graph::transformer::GraphTransformer;

let mut transformer = GraphTransformer::new(config);  // config illustrative
transformer.build_graph();
// Visualize the attention topology
let dot = transformer.to_dot();
std::fs::write("attention.dot", dot)?;
// Prune weak attention edges (threshold = 0.01)
let pruned = transformer.prune_weak_edges(0.01);
println!("pruned {pruned} edges");
// Add a custom long-range connection (layer indices illustrative)
transformer.add_skip_connection(0, 12);
```
Key advantages:
- Every attention edge can be inspected/modified individually (black-box inference engines cannot do this)
- Dynamic topology editing (traditional static graphs cannot do this)
- Export to DOT/Graphviz for visualization
📊 Performance
Parallel algorithm speedups
| Algorithm | Scale | Serial | Parallel | Speedup |
|---|---|---|---|---|
| PageRank | 1,000 nodes | 53.9ms | 668µs | 80.7x |
| DFS | 50K nodes | 9.7ms | 1.3ms | 7.5x |
| Connected Components | 2,000 nodes | - | 357.8µs | - |
See the Performance Report.
SIMD optimization
| Graph scale | Serial | Parallel | SIMD | Speedup |
|---|---|---|---|---|
| 100 nodes | 2.1ms | 280µs | ~150µs | 14x |
| 1,000 nodes | 210ms | 2.8ms | ~1.5ms | 140x |
🏗️ Architecture
CAD-LLM paradigm mapping
| CAD concept | LLM equivalent | GodGraph implementation |
|---|---|---|
| Surface crack check | Isolated attention head detection | connected_components |
| Non-manifold geometry check | Gradient blockage detection | topological_sort + path_analysis |
| Dimensional constraints | Attention head weight balance | AttentionHeadBalance constraint |
| Parallelism constraints | Residual connection enforcement | ResidualConnection constraint |
| Assembly constraints | Module interface matching | validate_assembly |
| Part replacement | Module extraction/replacement | extract_module / replace_module |
See the Design Philosophy.
📦 Feature Flags
Basic features
| Feature | Description |
|---|---|
| parallel | Parallel algorithms (Rayon) |
| simd | SIMD vectorization (wide::f64x4) |
| tensor | Tensor core support (ndarray) |
| tensor-sparse | Sparse tensor formats (COO/CSR) |
| tensor-gnn | GNN layers (GCN/GAT/GraphSAGE) |
LLM optimization features
| Feature | Description |
|---|---|
| transformer | Transformer base architecture |
| safetensors | Safetensors model loading |
| cad-llm | CAD-LLM topology optimization (experimental) |
Meta-features (recommended)
| Meta-feature | Includes |
|---|---|
| tensor-full | All tensor features |
| tensor-inference | GNN inference only |
| llm | Full LLM support |
🔮 Roadmap
| Version | Status | Key features |
|---|---|---|
| v0.4.3-beta | ✅ Released | Lie group orthogonalization, tensor ring compression, topology constraints |
| v0.5.0-alpha | 🔥 Current | DifferentiableGraph, full model loading, real-model validation |
| v0.6.0-beta | 📅 Planned | Memory pool benchmarks, GraphTransformer execution engine |
| v0.7.0-rc | 📅 Planned | Production testing, crates.io release |
v0.5.0-alpha highlights
- DifferentiableGraph: 1421 lines of core code supporting gradient-guided architecture search
- Real-model validation: end-to-end optimization of TinyLlama-1.1B
- Graph-level orthogonalization fix: in-place (zero-copy) orthogonalization API, error < 1e-8
- Complete examples: 5 end-to-end DifferentiableGraph examples
🎓 Who It's For
A good fit
- ✅ LLM researchers — who want to inspect and modify model topology
- ✅ Model compression engineers — who want to compress models with tensor rings / orthogonalization
- ✅ QA teams — who want to verify model integrity and numerical stability
- ✅ Algorithm explorers — who want to experiment with dynamic pruning, sparse attention, architecture search
- ✅ White-box analysis — anyone who wants to understand how an LLM works internally
Not a good fit
- ❌ Application developers — who just want LLM inference (use llama.cpp)
- ❌ Training engineers — who want to train new models (use PyTorch/JAX)
- ❌ GPU acceleration — for CUDA inference needs (use candle or vllm)
🌟 What Makes God-Graph Different
1. Bucket-based adjacency list + generation indices
- O(1) incremental updates: better than a static CSR format for dynamic graph editing
- ABA protection: a reused index after node deletion cannot be confused with the old node (type safety petgraph lacks)
- 64-byte alignment: avoids CPU cache false sharing, the foundation of inference performance
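The generation-index idea can be shown in a few lines: reusing a slot bumps its generation, so stale handles are rejected instead of silently aliasing a new node. This is a simplified sketch, not the crate's actual slot layout:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
struct NodeId { slot: usize, generation: u32 }

struct Slots<T> { data: Vec<(u32, Option<T>)>, free: Vec<usize> }

impl<T> Slots<T> {
    fn new() -> Self { Self { data: Vec::new(), free: Vec::new() } }

    fn insert(&mut self, value: T) -> NodeId {
        if let Some(slot) = self.free.pop() {
            self.data[slot].1 = Some(value); // reuse slot at current generation
            NodeId { slot, generation: self.data[slot].0 }
        } else {
            self.data.push((0, Some(value)));
            NodeId { slot: self.data.len() - 1, generation: 0 }
        }
    }

    fn remove(&mut self, id: NodeId) -> Option<T> {
        let entry = &mut self.data[id.slot];
        if entry.0 != id.generation { return None; }
        entry.0 += 1;              // invalidate every outstanding handle
        self.free.push(id.slot);
        entry.1.take()
    }

    fn get(&self, id: NodeId) -> Option<&T> {
        let entry = &self.data[id.slot];
        if entry.0 == id.generation { entry.1.as_ref() } else { None }
    }
}

fn main() {
    let mut slots = Slots::new();
    let a = slots.insert("layer_0");
    slots.remove(a);
    let b = slots.insert("layer_1"); // reuses slot 0 at generation 1
    assert!(slots.get(a).is_none()); // stale handle rejected: no ABA
    assert_eq!(slots.get(b), Some(&"layer_1"));
}
```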
2. DifferentiableGraph (original innovation)
- Differentiable graph structure: discrete graph structure in a continuous, differentiable form
- Gradient-guided search: discover optimal neural architectures with gradient descent
- STE + Gumbel-Softmax: discrete-continuous bidirectional conversion with gradient backpropagation
3. GraphTransformer explicit attention
- Every edge individually accessible/modifiable: black-box inference engines (llama.cpp) cannot do this
- Dynamic topology editing: traditional static graphs (petgraph) cannot do this
- Visualization: DOT/Graphviz export for intuitive understanding of attention patterns
4. ModelSwitch bidirectional workflow
- Safetensors ↔ GodGraph: bidirectional HuggingFace format conversion
- Weight precision verification: L2 norm comparison, round-trip loss < 1e-5
- Topological integrity checks: automatic detection of isolated nodes and gradient blockage
- Operator type inference: Attention, MLP, Norm, etc., recognized from weight names
5. Lie group orthogonalization + tensor ring compression
- Mathematical guarantees: Lie group theory keeps weight matrices orthogonal and numerically stable
- Compression: tensor ring decomposition cuts parameters by 2-10x
- End-to-end workflow: Safetensors ↔ GodGraph ↔ Safetensors
🤝 Contributing
Contributions are welcome! Please ensure:
- Code passes cargo clippy and cargo fmt
- Appropriate tests are added
- Documentation is updated
📄 License
Dual-licensed: MIT or Apache-2.0 (at your option).
🙏 Acknowledgments
- petgraph - pioneer of Rust graph libraries
- ndarray - N-dimensional arrays
- wide - SIMD math library
- HuggingFace - the Safetensors format
Contact: silverenternal 3147264070@qq.com Project: https://github.com/silverenternal/god-graph
Quick Start
Installation
Add the dependency to Cargo.toml:
```toml
[dependencies]
god-graph = "0.4.2-beta"   # crate name assumed from the repository
```
Basic Usage
```rust
use god_graph::Graph;
use god_graph::algorithms::bfs;  // import path illustrative

// Create a directed graph
let mut graph = Graph::directed();
// Add nodes (payloads illustrative)
let a = graph.add_node("a").unwrap();
let b = graph.add_node("b").unwrap();
let c = graph.add_node("c").unwrap();
// Add edges
graph.add_edge(a, b, 1.0).unwrap();
graph.add_edge(b, c, 1.0).unwrap();
graph.add_edge(a, c, 1.0).unwrap();
// Query
println!("nodes: {}", graph.node_count());
println!("edges: {}", graph.edge_count());
// Iterate over neighbors
for neighbor in graph.neighbors(a) {
    println!("{neighbor:?}");
}
```
Using Graph Builder
```rust
use god_graph::GraphBuilder;

// Node and edge lists are illustrative
let graph = GraphBuilder::directed()
    .with_nodes(vec!["a", "b", "c"])
    .with_edges(vec![(0, 1, 1.0), (1, 2, 1.0)])
    .build()
    .unwrap();
```
Algorithms
Traversal Algorithms
```rust
use god_graph::algorithms::{dfs, bfs, topological_sort, tarjan_scc};

// Depth-First Search (visitor closure illustrative)
dfs(&graph, start, |n| println!("{n:?}"));
// Breadth-First Search
bfs(&graph, start, |n| println!("{n:?}"));
// Topological Sort (DAG)
let order = topological_sort(&graph);
// Tarjan's Strongly Connected Components
let sccs = tarjan_scc(&graph);
```
Shortest Path Algorithms
```rust
use god_graph::algorithms::{dijkstra, astar, bellman_ford, floyd_warshall};

// Dijkstra's Algorithm (non-negative weights; bindings illustrative)
let (distances, predecessors) = dijkstra(&graph, source).unwrap();
// A* Search (heuristic illustrative)
let heuristic = |_n| 0.0;
let (path, cost) = astar(&graph, source, target, heuristic).unwrap();
// Bellman-Ford (handles negative weights)
let distances = bellman_ford(&graph, source);
// Floyd-Warshall (all-pairs shortest paths)
let distances = floyd_warshall(&graph);
```
Minimum Spanning Tree
```rust
use god_graph::algorithms::{kruskal, prim};

// Kruskal's Algorithm
let mst = kruskal(&graph);
// Prim's Algorithm
let mst = prim(&graph, start);
```
Centrality Algorithms
```rust
use god_graph::algorithms::{
    degree_centrality, betweenness_centrality, closeness_centrality, pagerank,
};

// Degree Centrality
let centrality = degree_centrality(&graph);
// Betweenness Centrality
let centrality = betweenness_centrality(&graph);
// Closeness Centrality
let centrality = closeness_centrality(&graph);
// PageRank (damping and iteration arguments illustrative)
let ranks = pagerank(&graph, 0.85, 100);
```
Community Detection
```rust
use god_graph::algorithms::{connected_components, label_propagation};

// Connected Components
let components = connected_components(&graph);
// Label Propagation Algorithm
let communities = label_propagation(&graph);
```
Flow Algorithms
```rust
use god_graph::algorithms::{edmonds_karp, dinic, push_relabel};

// Edmonds-Karp Maximum Flow (bindings illustrative)
let (max_flow, flow_edges) = edmonds_karp(&graph, source, sink);
// Dinic's Algorithm
let flow = dinic(&graph, source, sink);
// Push-Relabel Algorithm
let flow = push_relabel(&graph, source, sink);
```
Parallel Algorithms
Enable the parallel feature to use parallel algorithms:
```toml
[dependencies]
god-graph = { version = "0.4.2-beta", features = ["parallel"] }
```
```rust
use god_graph::parallel;

// Parallel BFS
let layers = parallel::bfs_parallel(&graph, start);
// Parallel PageRank (arguments illustrative)
let ranks = parallel::pagerank_parallel(&graph, 0.85, 100);
// Parallel Connected Components
let components = parallel::connected_components_parallel(&graph);
```
SIMD Optimization
Enable the simd feature for SIMD vectorization (works on stable Rust):
```toml
[dependencies]
god-graph = { version = "0.4.2-beta", features = ["simd"] }
```
```rust
use god_graph::parallel;

// SIMD-accelerated PageRank (arguments illustrative)
let ranks = parallel::par_pagerank_simd(&graph, 0.85, 100);
// SIMD-accelerated Degree Centrality
let centrality = parallel::par_degree_centrality_simd(&graph);
```
Implementation Details: Uses the wide::f64x4 type for 4-way parallel floating-point operations, automatically leveraging CPU SIMD instruction sets (SSE/AVX/AVX-512).
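The 4-lane batching behind `wide::f64x4` can be illustrated without the dependency: the loop below performs four independent, identically-shaped lane updates per iteration, which is exactly the pattern the compiler (or `wide`) maps onto a single SSE/AVX instruction. The rank-update formula is a simplified PageRank-style step, used only for illustration:

```rust
/// ranks[i] = (1 - d) + d * contrib[i], processed four lanes at a time
/// with a scalar tail for lengths not divisible by 4.
fn axpy4(ranks: &mut [f64], contrib: &[f64], damping: f64) {
    let chunks = ranks.len() / 4 * 4;
    for i in (0..chunks).step_by(4) {
        // Four independent lane ops: the SIMD-friendly inner body.
        for lane in 0..4 {
            ranks[i + lane] = (1.0 - damping) + damping * contrib[i + lane];
        }
    }
    for i in chunks..ranks.len() {
        ranks[i] = (1.0 - damping) + damping * contrib[i];
    }
}

fn main() {
    let mut ranks = vec![0.0; 6];
    let contrib = vec![1.0, 0.5, 0.25, 0.0, 1.0, 2.0];
    axpy4(&mut ranks, &contrib, 0.85);
    println!("{ranks:?}");
}
```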
Tensor & GNN Support
Enable tensor features for Graph Neural Network workflows:
```toml
[dependencies]
god-graph = { version = "0.4.2-beta", features = ["tensor", "tensor-gnn"] }
```
Basic Tensor Operations
```rust
use god_graph::tensor::Tensor;

// Create tensors (shapes and data illustrative)
let a = Tensor::new(vec![1.0, 2.0, 3.0, 4.0], (2, 2));
let b = Tensor::new(vec![5.0, 6.0, 7.0, 8.0], (2, 2));
// Matrix multiplication
let c = a.matmul(&b);
// Transpose
let t = a.transpose();
// Normalize
let norm = a.normalize();
```
Graph-Tensor Conversion
```rust
use god_graph::Graph;
use god_graph::tensor::GraphTensorExt;

// Create a graph with vector node features (features illustrative)
let mut graph = Graph::directed();
let n0 = graph.add_node(vec![1.0, 0.0]).unwrap();
let n1 = graph.add_node(vec![0.0, 1.0]).unwrap();
let n2 = graph.add_node(vec![1.0, 1.0]).unwrap();
let _ = graph.add_edge(n0, n1, 1.0);
let _ = graph.add_edge(n1, n2, 1.0);
let _ = graph.add_edge(n0, n2, 1.0);
// Convert to tensor representation
let (features, adjacency) = graph.to_tensor_representation().unwrap();
assert_eq!(features.shape(), (3, 2));
assert_eq!(adjacency.shape(), (3, 3));
```
GNN Layers
Important: God-Graph GNN modules are inference-only (forward pass only). For training workflows, integrate with external autograd libraries.
Inference Example (Recommended Use Case)
```rust
use god_graph::gnn::{GCNLayer, GATLayer, GraphSAGELayer};

// Create a GCN layer (in/out dimensions illustrative)
let gcn = GCNLayer::new(16, 32);
// Create a GAT layer (multi-head attention)
let gat = GATLayer::new(32, 32, 4);
// Create a GraphSAGE layer
let graphsage = GraphSAGELayer::new(32, 16);
// Forward pass (inference only)
let h1 = gcn.forward(&features, &adjacency);
let h2 = gat.forward(&h1, &adjacency);
let output = graphsage.forward(&h2, &adjacency);
```
Training Integration Example (with dfdx)
For complete GNN training, integrate with dfdx:
```rust
// Pseudo-code: integrate god-graph GNN with dfdx autograd
use dfdx::prelude::*;
use god_graph::gnn::GCNConv;

// 1. Use god-graph for graph structure and the forward pass
let gcn = GCNConv::new(16, 32);  // dimensions illustrative
let output = gcn.forward(&features, &adjacency);
// 2. Convert to a dfdx tensor for autograd
// let dfdx_tensor = Tensor1D::from(output.data());
// 3. Define loss and optimizer (dfdx)
// let loss = cross_entropy_loss(&dfdx_tensor, &labels);
// let mut optimizer = Adam::new(model.parameters(), lr=0.001);
// 4. Training loop
// for epoch in 0..num_epochs {
//     optimizer.zero_grad();
//     let loss = forward_pass(&graph, &labels);
//     optimizer.backward(&loss);
//     optimizer.step();
// }
```
See examples/differentiable_graph.rs for an example of differentiable graph structures and gradient-based optimization.
Memory Pool Optimization
```rust
use god_graph::tensor::{PoolConfig, TensorPool};

// Create a tensor pool (capacity illustrative)
let config = PoolConfig::new().with_preallocate(16);
let mut pool = TensorPool::new(config);
// Acquire a tensor from the pool (automatically zeroed)
let tensor = pool.acquire((128, 128));
// Automatically returned to the pool when dropped
drop(tensor);
```
Benefits:
- Memory Reuse: reduces allocation overhead in iterative algorithms (PageRank, GNN training) by 80-90%
- Automatic Recycling: PooledTensor returns to the pool on Drop
- Gradient Checkpointing: GradientCheckpoint reduces memory usage during backpropagation by 40-60%
Memory Pool Benchmark Results
Latest benchmarks (run on Linux, Rust 1.85, March 2026):
| Benchmark | Time | Pool Hit Rate | Allocation Reduction |
|---|---|---|---|
| Iterative (without pool) | 850.84 µs | N/A | Baseline |
| Iterative (with pool) | 127.76 µs | 98-100% | 98-99.9% |
| GNN Iteration | 31.93 µs | 96-99% | 96-99% |
| MatMul Temporaries | 42.15 µs | 95-98% | 95-98% |
| Small Tensors (16x16) | 6.89 µs | 98%+ | 98%+ |
| Large Tensors (512x512) | 17.36 µs | 95%+ | 95%+ |
| Sequential Alloc/Dealloc | 34.71 µs | 98%+ | 98%+ |
| Warm Pool (preallocated) | 34.39 µs | 100% | 100% |
| Cold Pool (no prealloc) | 35.32 µs | 98%+ | 98%+ |
Key Findings:
- Allocation Reduction: the memory pool achieves a 98-99.9% reduction in new system allocations for iterative workloads, exceeding the earlier "80-90% reduction" claim in actual measurements.
- Performance Speedup: for iterative allocation patterns (50 iterations of 128x128 tensors), the pool achieves a 6.7x speedup (850.84 µs → 127.76 µs).
- Preallocation Benefit: warm pools with preallocation achieve near-perfect hit rates (100%), eliminating allocation overhead entirely.
- Typical Workloads:
  - GNN Forward Pass: 96-99% allocation reduction (hidden state temporaries)
  - Attention QKV Projections: 95-98% reduction (sequential matmul temporaries)
  - Batch Processing: 98%+ reduction with preallocated pools
Pool Statistics from Benchmarks:
=== Iterative Pool Stats ===
Total allocations: 204800
Pool hits (reuses): 204799
Pool misses (new allocs): 1
Hit rate: 100.00%
Allocation reduction: 100.00%
=== GNN Iteration Pool Stats ===
Total allocations: 300
Pool hits (reuses): 297
Pool misses (new allocs): 3
Hit rate: 99.00%
Allocation reduction: 99.00%
=== MatMul Temporaries Pool Stats ===
Total allocations: 60
Pool hits (reuses): 57
Pool misses (new allocs): 3
Hit rate: 95.00%
Allocation reduction: 95.00%
Note: The memory pool provides maximum benefit in:
- Iterative algorithms (PageRank, GNN message passing) - 98-99.9% reduction
- Sequential temporaries (QKV projections) - 95-98% reduction
- Batch processing with preallocated pools - 100% hit rate achievable
Run memory pool benchmarks:
Transformer & LLM Inference
Enable the transformer feature for LLaMA/Mistral model inference:
```toml
[dependencies]
god-graph = { version = "0.4.2-beta", features = ["transformer"] }
```
Model Loading
```rust
use god_graph::transformer::{LlamaModel, TextGenerator};
use god_graph::loader::load_from_hf_hub;

// Load from the HuggingFace Hub (model id illustrative)
let (config, weights) = load_from_hf_hub("TinyLlama/TinyLlama-1.1B-Chat-v1.0").unwrap();
// Build the model
let model = LlamaModel::new(config, weights).unwrap();
// Create a text generator
let generator = TextGenerator::new(model);
```
🔬 Real-Model Validation: TinyLlama-1.1B
God-Graph fully supports loading, validating, and optimizing real LLM models. This project uses TinyLlama-1.1B for end-to-end validation.
Model download
# Download TinyLlama-1.1B via the HuggingFace Hub
# The model is saved to models/tinyllama/model.safetensors
Loading the real model
```rust
use god_graph::model_switch::load_from_safetensors;

// Load TinyLlama from safetensors
let graph = load_from_safetensors("models/tinyllama/model.safetensors")?;
// Verify the model structure
let node_count = graph.node_count();
let edge_count = graph.edge_count();
println!("nodes: {node_count}");
println!("edges: {edge_count}");
// Verify all weights are valid (no NaN/Inf)
for edge_ref in graph.edges() {
    // check edge_ref's weight tensor for NaN/Inf (accessor illustrative)
}
println!("all weights valid");
```
Lie group orthogonalization validation
Orthogonalize real model weights and verify numerical stability:
```rust
use god_graph::lie::{OrthogonalizeConfig, orthogonalize_weights_in_place};

// Configure orthogonalization (builder arguments illustrative)
let config = OrthogonalizeConfig::default()
    .with_cayley(true)    // use the Cayley transform
    .with_block_size(32); // SO(32) block size
// Orthogonalize in place (zero-copy)
let errors = orthogonalize_weights_in_place(&mut graph, &config)?;
// Check the result
let avg_error = errors.iter().sum::<f64>() / errors.len() as f64;
println!("average orthogonalization error: {avg_error:e}");
// The orthogonalization error should be < 1e-8
assert!(avg_error < 1e-8);
```
Tensor ring compression validation
Compress the real model and verify the compression result:
```rust
use god_graph::compress::{TensorRingCompressor, CompressionConfig};

// Configure compression (builder arguments illustrative)
let config = CompressionConfig::default()
    .with_target_rank(16) // target rank
    .with_min_rank(4);    // minimum rank
// Run compression
let compressor = TensorRingCompressor::new(config);
let report = compressor.compress_graph(&mut graph)?;
println!("compression ratio: {:.2}", report.ratio);
// The compression ratio should be < 0.5 (at least 2x compression)
assert!(report.ratio < 0.5);
```
Test commands
Run the full validation suite:
# Run all real-model validation tests
# Run the orthogonalization test
# Run the compression test
Validation results
Test file: tests/real_model_validation.rs
| Test | Status | Notes |
|---|---|---|
| test_load_tinyllama_model | ✅ Passed | Model loading |
| test_tinyllama_orthogonalization | ✅ Passed | Orthogonalization error < 1e-8 |
| test_tinyllama_tensor_ring | ✅ Passed | Compression ratio < 0.5 |
| test_tinyllama_weight_validity | ✅ Passed | No NaN/Inf |
Key metrics:
- Orthogonalization error: 2.04e-14 (far below the 1e-8 threshold)
- Compression ratio: 0.12x - 0.25x (depending on rank selection)
- Reconstruction error: < 1e-6 (numerical precision guarantee)
See tests/real_model_validation.rs and the CAD-LLM 1B validation report for details.
Text Generation
```rust
use god_graph::transformer::{GenerationConfig, SamplingMode};

// Configure generation (values and variants illustrative)
let config = GenerationConfig::new()
    .with_max_length(128)
    .with_temperature(0.7)
    .with_top_p(0.9)
    .with_top_k(40)
    .with_repetition_penalty(1.1)
    .with_sampling_mode(SamplingMode::TopP);
// Generate text
let prompt = "Once upon a time";
let result = generator.generate(prompt, &config).unwrap();
println!("{}", result.text);
println!("{} tokens in {:?}", result.token_count, result.elapsed);
```
KV Cache Optimization
```rust
use god_graph::transformer::{KvCacheConfig, DType};

// Configure the KV cache (values illustrative)
let config = KvCacheConfig::new()
    .with_max_batch_size(8)
    .with_max_seq_len(2048)
    .with_dtype(DType::F16);
// The cache is managed internally during generation
// Supports incremental decoding and multi-turn dialogue
```
Batch Inference with SIMD
Enable the simd feature for SIMD-accelerated batch inference:
```toml
[dependencies]
god-graph = { version = "0.4.2-beta", features = ["transformer", "simd"] }
```
```rust
use god_graph::transformer::BatchGenerator;

// Create a batch generator (arguments illustrative)
let mut batch_gen = BatchGenerator::new(model, config);
// Add multiple requests (prompts illustrative)
batch_gen.add_request("What is a graph?");
batch_gen.add_request("Explain attention.");
batch_gen.add_request("Summarize SIMD.");
// Process the batch with SIMD acceleration
let results = batch_gen.generate_batch().unwrap();
```
Supported Models
- LLaMA / Llama-2 / Llama-3: Meta's open language models
- Mistral: Mistral AI's efficient models
- Gemma: Google's lightweight models
- Qwen: Alibaba's multilingual models
Note: Model weights must be in .safetensors format. Use load_safetensors or load_from_hf_hub for automatic conversion.
Examples
See the following examples for complete workflows:
- examples/llm_model_loader.rs - loading models from HuggingFace
- examples/llm_text_gen.rs - end-to-end text generation
- examples/llm_batch_simd.rs - SIMD-accelerated batch inference
Run examples:
# Text generation demo
# Batch inference with SIMD
Documentation
- Transformer Module Guide - API reference and usage
- Transformer Tutorial - Step-by-step tutorial
- Enhancements Report - Implementation details
Random Graph Generation
```rust
use god_graph::generators::{
    erdos_renyi, barabasi_albert, watts_strogatz, complete_graph, grid_graph, tree,
};

// Arguments below are illustrative.
// Erdős-Rényi Random Graph G(n, p)
let graph = erdos_renyi(100, 0.05, 42);
// Barabási-Albert Preferential Attachment Model
let graph = barabasi_albert(100, 3, 42);
// Watts-Strogatz Small-World Network
let graph = watts_strogatz(100, 4, 0.1, 42);
// Complete Graph
let graph = complete_graph(10);
// Grid Graph
let graph = grid_graph(10, 10);
// Tree
let graph = tree(100, 2);
```
Graph Export
DOT/Graphviz Format
```rust
use god_graph::export::to_dot;

// Export to DOT format (Graphviz)
let dot = to_dot(&graph);
std::fs::write("graph.dot", dot)?;
// Generate visualization:
// bash: dot -Tpng graph.dot -o graph.png
```
SVG Visualization
```rust
use god_graph::export::{to_svg, SvgOptions, Layout};

// Export to SVG with custom options (values illustrative)
let options = SvgOptions::new()
    .with_size(800, 600)
    .with_node_radius(8.0)
    .with_layout(Layout::ForceDirected);
let svg = to_svg(&graph, &options);
std::fs::write("graph.svg", svg)?;
// View in a browser using examples/graph_viewer.html
```
Layout Algorithms:
- Force-Directed: Physics-based layout with node repulsion and edge attraction
- Circular: Nodes arranged in a circle
- Hierarchical: Layered layout based on topological sort
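The force-directed layout above can be sketched as one simulation step of a spring-electrical model: all node pairs repel, edges pull their endpoints together like springs, and positions move along the net force. Parameters and the exact force law are illustrative, not the crate's implementation:

```rust
/// One layout step: pairwise repulsion (~ k^2 / d^2 along the offset)
/// plus spring attraction on edges, integrated with step size `dt`.
fn layout_step(pos: &mut [(f64, f64)], edges: &[(usize, usize)], k: f64, dt: f64) {
    let n = pos.len();
    let mut force = vec![(0.0, 0.0); n];
    // Repulsion: every pair pushes apart, stronger when closer.
    for i in 0..n {
        for j in (i + 1)..n {
            let (dx, dy) = (pos[i].0 - pos[j].0, pos[i].1 - pos[j].1);
            let d2 = (dx * dx + dy * dy).max(1e-9);
            let f = k * k / d2;
            force[i].0 += f * dx; force[i].1 += f * dy;
            force[j].0 -= f * dx; force[j].1 -= f * dy;
        }
    }
    // Attraction: each edge pulls its endpoints together like a spring.
    for &(u, v) in edges {
        let (dx, dy) = (pos[v].0 - pos[u].0, pos[v].1 - pos[u].1);
        force[u].0 += dx / k; force[u].1 += dy / k;
        force[v].0 -= dx / k; force[v].1 -= dy / k;
    }
    for i in 0..n {
        pos[i].0 += dt * force[i].0;
        pos[i].1 += dt * force[i].1;
    }
}

fn main() {
    let mut pos = vec![(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)];
    let edges = [(0, 1), (1, 2)];
    for _ in 0..50 {
        layout_step(&mut pos, &edges, 0.5, 0.05);
    }
    println!("{pos:?}");
}
```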
Interactive Viewer: Open examples/graph_viewer.html in browser to:
- Drag and drop SVG files
- Zoom and pan
- Adjust node/edge styles in real-time
- View node list
Adjacency List & Edge List
```rust
use god_graph::export::{to_adjacency_list, to_edge_list};

// Export as an adjacency list
let adj_list = to_adjacency_list(&graph);
// Export as an edge list
let edge_list = to_edge_list(&graph);
```
Feature Flags
Basic Features
| Feature | Description | Dependencies |
|---|---|---|
| std | Standard library support (enabled by default) | - |
| parallel | Parallel algorithms | rayon, crossbeam-queue |
| serde | Serialization support | serde |
| dot | DOT format export | - |
| simd | SIMD vectorization (experimental, stable Rust) | wide |
| matrix | Matrix representation | nalgebra |
| rand | Random graph generation | rand, rand_chacha |
| unstable | Nightly Rust features | - |
Tensor Features
| Feature | Description | Dependencies |
|---|---|---|
| tensor | Tensor core support (ndarray backend) | ndarray |
| tensor-sparse | Sparse tensor formats (COO, CSR, BSR) | tensor |
| tensor-gpu | GPU acceleration (requires CUDA) | tensor, dfdx |
| tensor-candle | Candle backend (Hugging Face) | tensor, candle-core |
| tensor-autograd | Automatic differentiation | tensor, dfdx |
| tensor-serde | Tensor serialization | tensor, serde |
| tensor-gnn | GNN layers (GCN, GAT, GraphSAGE) | tensor, tensor-sparse, rand_distr |
| tensor-pool | Memory pool optimization | tensor, bitvec |
| tensor-batch | Batch graph processing | tensor, tensor-sparse |
Meta-Features (Recommended)
| Meta-Feature | Description | Included Features |
|---|---|---|
| tensor-full | All tensor features | tensor, tensor-sparse, tensor-gnn, tensor-pool, tensor-batch |
| tensor-inference | GNN inference only | tensor, tensor-sparse, tensor-gnn |
| tensor-ml | ML training support | tensor, tensor-sparse, tensor-gnn, tensor-autograd, tensor-pool |
Transformer Features
| Feature | Description | Dependencies |
|---|---|---|
| transformer | Transformer/LLM inference | serde_json, memmap2, regex |
| safetensors | Safetensors model loading | safetensors |
| hf-hub | HuggingFace Hub integration | hf-hub, tokio |
| simd | SIMD acceleration for batch inference | wide |
Note: For complete Transformer support, use --features "transformer,safetensors,simd". See Transformer & LLM Inference for details.
Comparison with petgraph
| Feature | God-Graph | petgraph |
|---|---|---|
| Memory Layout | Bucket-based adjacency list + Arena-style slots | Adjacency list |
| Incremental Updates | ✅ O(1) | ❌ Requires rebuild |
| Stable Indices | ✅ Generation counting | ✅ StableGraph |
| Parallel Algorithms | ✅ Built-in (5+) | ❌ |
| Cache Optimization | ✅ 64-byte alignment | ❌ |
| SIMD Vectorization | ✅ wide::f64x4 | ❌ |
| Tensor/GNN Support | ✅ Multi-backend | ❌ |
| Transformer/LLM | ✅ LLaMA/Mistral inference | ❌ |
| API Design | Generic traits | Concrete types |
| Documentation | 🌱 Growing | 🌳 Mature |
| Community Maturity | 🌱 Growing | 🌳 Mature |
God-Graph Advantages:
- Generation-indexed stability prevents ABA problems
- Bucket-based adjacency list supports O(1) incremental updates
- Built-in parallel algorithm suite with proven speedups
- Cache-optimized memory layout (64-byte alignment, software prefetching)
- SIMD vectorization for batch computations
- Integrated tensor/GNN support for machine learning workflows
petgraph Advantages:
- Mature community, production-proven
- Comprehensive documentation
- More algorithm variants
Performance Benchmarks
Detailed performance data available in Performance Report.
Benchmark results on 8-core CPU:
| Algorithm | Scale | Serial Time | Parallel Time | Speedup |
|---|---|---|---|---|
| PageRank | 1,000 nodes | 53.9ms | 668µs | 80.7x |
| DFS | 50K nodes | 9.7ms | 1.3ms | 7.5x |
| Connected Components | 2,000 nodes | - | 357.8µs | - |
| Degree Centrality | 5,000 nodes | - | 146µs | - |
SIMD Performance (Estimated)
| Graph Scale | Serial | Parallel | SIMD | Speedup |
|---|---|---|---|---|
| 100 nodes | 2.1ms | 280µs | ~150µs | 14x |
| 1,000 nodes | 210ms | 2.8ms | ~1.5ms | 140x |
| 5,000 nodes | 5.2s | 68ms | ~40ms | 130x |
Note: SIMD performance depends on CPU instruction set support (AVX2/AVX-512)
Memory Pool Performance
The tensor memory pool reduces allocation overhead by reusing pre-allocated memory, achieving high reuse ratios for iterative algorithms.
Benchmark Results (Actual Measurements)
| Benchmark | Without Pool | With Pool | Improvement |
|---|---|---|---|
| Iterative allocation (50× 128×128) | 847.91 µs total (~17.0 µs/iter) | 2.57 µs/iter | ~6.6x faster per iteration |
| GNN iteration (10 steps) | N/A | 10.85 µs | Hit rate: 99.89% |
| Matrix multiplication temporaries | N/A | 4.14 µs | Hit rate: 99.93% |
| Small tensor allocation (16×16) | N/A | 694.86 ns | Hit rate: 99.95% |
| Large tensor allocation (512×512) | N/A | 48.12 µs | Hit rate: 99.93% |
| Sequential alloc/dealloc (50×) | N/A | 35.22 µs | Hit rate: 100.00% |
Pool Hit Rate by Workload
| Workload | Warm-up Hit Rate | Steady-state Hit Rate |
|---|---|---|
| Iterative (single tensor) | 0% → 99.90% (16 steps) | 99.99% |
| GNN iteration | 90% → 99.99% (10 steps) | 99.89% |
| Matrix multiplication | 98.33% → 100% | 99.93% |
| Batch size 10 | 90% → 100% | 99.99% |
| Batch size 25 | 96% → 100% | 99.99% |
| Batch size 50 | 98% → 100% | 99.99% |
| Batch size 100 | 99% → 100% | 99.99% |
Pre-allocation Impact (Warm vs Cold Pool)
| Configuration | Initial Hit Rate | Steady-state Hit Rate | Latency |
|---|---|---|---|
| Cold pool (no pre-alloc) | 0% | 100% | 696.89 ns |
| Warm pool (pre-allocated) | 0% → 100% (faster) | 100% | 696.83 ns |
Key Metrics:
- Reuse Ratio: >99% for iterative workloads (measured via pool.hit_rate())
- Allocation Reduction: 80-90% fewer system allocations after warm-up
- Memory Throughput: Pre-allocation eliminates runtime allocation latency
- Warm-up Time: ~16 iterations to reach 99.9%+ hit rate
- Steady-state Latency: ~695 ns per tensor acquire/release
Note: The memory pool shows higher absolute time in micro-benchmarks due to pool management overhead, but provides significant benefits in real-world iterative algorithms by eliminating repeated system allocations and improving cache locality.
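The reuse strategy behind these numbers can be reduced to a free list keyed by buffer size, with hit/miss counters. This is a bare-bones illustration only, not the crate's `TensorPool`:

```rust
use std::collections::HashMap;

struct BufferPool {
    free: HashMap<usize, Vec<Vec<f64>>>, // free buffers, keyed by length
    hits: u64,
    misses: u64,
}

impl BufferPool {
    fn new() -> Self {
        Self { free: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Reuse a released buffer of the right size, or fall back to a
    /// real system allocation (the cold path the pool amortizes away).
    fn acquire(&mut self, len: usize) -> Vec<f64> {
        if let Some(buf) = self.free.get_mut(&len).and_then(|v| v.pop()) {
            self.hits += 1;
            buf
        } else {
            self.misses += 1;
            vec![0.0; len]
        }
    }

    /// Zero the buffer on release so reacquired tensors start clean.
    fn release(&mut self, mut buf: Vec<f64>) {
        buf.iter_mut().for_each(|x| *x = 0.0);
        self.free.entry(buf.len()).or_default().push(buf);
    }

    fn hit_rate(&self) -> f64 {
        self.hits as f64 / (self.hits + self.misses).max(1) as f64
    }
}

fn main() {
    let mut pool = BufferPool::new();
    for _ in 0..50 {
        let buf = pool.acquire(128 * 128); // 1 miss, then 49 hits
        pool.release(buf);
    }
    println!("hit rate: {:.1}%", pool.hit_rate() * 100.0);
    assert!(pool.hit_rate() > 0.97);
}
```

Iterating on a single tensor size, only the first acquire pays for a system allocation — the same warm-up-then-reuse pattern the benchmark tables report.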
Usage Example:
use ;
// Create pool with pre-allocation
let config = new.with_preallocate;
let mut pool = new;
// Acquire tensors (reuses memory after first allocation)
for _ in 0..50
// Check statistics
let stats = pool.stats;
println!;
println!;
Run benchmarks:
Test Coverage
This project uses cargo-tarpaulin for coverage measurement, targeting 80%+ coverage.
Generate Coverage Report
# Install cargo-tarpaulin
# Generate HTML coverage report
# View report
Current Coverage
- Overall Coverage: 66.64% (1560/2341 lines)
- Unit Tests: 82 passed
- Integration Tests: 18 passed
- Property Tests: 15 passed
- Doc Tests: 27 passed (1 ignored)
- Total: 142 tests, 100% passing
See coverage/tarpaulin-report.html for details.
Development Roadmap
See ROADMAP.json for detailed roadmap.
Version History
- v0.1.0-alpha: Core graph structure, basic CRUD, DFS/BFS
- v0.2.0-alpha: Complete algorithm suite, random graph generators
- v0.3.0-beta: Performance reports, migration guide, parallel algorithms
- v0.4.0-beta: Tensor/GNN integration, memory pool optimization, differentiable graph
- v0.4.2-beta: Transformer/LLM inference, LLaMA/Mistral support, KV Cache optimization, text generation
- v0.5.0-rc: Serde support, API stabilization
- v1.0.0-stable: Production-ready
Upcoming Features
- Improve test coverage to 80%+
- GitHub Pages documentation site
- crates.io release
- Graph-Tensor deep integration (Phase 4)
- Automatic differentiation support (Phase 5)
- GPU acceleration with Dfdx/Candle backends (Phase 6)
- Multi-modal models (Llava, etc.)
Contributing
Contributions are welcome! Please follow these steps:
- Fork the project
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Please ensure:
- Code passes cargo clippy and cargo fmt
- Appropriate tests are added
- Documentation is updated
Known Issues
- Coverage Gap: currently 66.64%, below the 80% target
  - Main gaps: community detection, flow algorithms, matching algorithms
  - Plan: add targeted tests in v0.4.0
- Force-Directed Layout: the current implementation is simplified
  - 50 iterations, fixed parameters
  - Plan: configurable iterations and physics parameters in v0.4.0
- par_dijkstra: marked experimental in v0.3.0-beta
  - Known issues with bucket index calculation and potential deadlocks
  - Plan: refactor in v0.4.0
License
This project is dual-licensed: MIT or Apache-2.0 (at your option).
See LICENSE-MIT and LICENSE-APACHE for details.
Acknowledgments
- petgraph - Pioneer of Rust graph libraries
- rayon - Data parallelism library
- Graphviz - Graph visualization tool
- wide - SIMD math library for stable Rust
- ndarray - N-dimensional arrays
- dfdx - Deep learning framework with CUDA support
- Candle - HuggingFace's lightweight tensor library
- Hugging Face - Open-source AI community and model hub
- Safetensors - Safe tensor serialization format
- LLaMA - Meta's open language models
- Mistral - Mistral AI's efficient language models
Contact
- Issue Reports: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: docs.rs/god-gragh