# God-Graph
[crates.io](https://crates.io/crates/god-gragh)
[docs.rs](https://docs.rs/god-gragh)
[License](https://github.com/silverenternal/god-graph?tab=License-1-ov-file#readme)
[CI](https://github.com/silverenternal/god-graph/actions)
[Coverage](https://codecov.io/gh/silverenternal/god-graph)
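> **God-Graph is a white-box analysis tool for LLMs: it turns an LLM from a black box into an editable white box.**
>
> Core innovation: **DifferentiableGraph**, a differentiable graph structure that optimizes neural network architectures with gradient descent and supports dynamic attention pruning, topology defect detection, and automated architecture search.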
---
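## 🎯 Positioning
**God-Graph is not**:
- ❌ An LLM inference engine (it cannot compete with `llama.cpp`)
- ❌ A GNN training framework (it cannot compete with DGL/PyG)
- ❌ A general-purpose graph algorithm library (`petgraph` is more mature)
**God-Graph is**:
- ✅ **A white-box analysis tool for LLMs**: inspect and modify a model's topology
- ✅ **A differentiable graph engine**: optimize neural network architectures with gradient descent (DifferentiableGraph)
- ✅ **A topology defect detector**: find blocked gradients, isolated nodes, and missing residual connections
- ✅ **A mathematical optimizer**: Lie group orthogonalization and tensor ring compression
**In one sentence**: God-Graph applies the design philosophy of CAD software to LLMs. It checks for "surface breaks" (isolated nodes), "non-manifold geometry" (blocked gradients), and "dimensional constraints" (attention head balance), and introduces a **differentiable graph structure** for gradient-guided architecture search.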
---
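## 📚 Core Documentation
**Full documentation index**: [docs/README.md](docs/README.md)
### Quick Links
| Document | Description |
|----------|-------------|
| [**Getting Started**](docs/user-guide/getting-started.md) | Get started with God-Graph in 5 minutes |
| [**DifferentiableGraph Tutorial**](docs/user-guide/differentiable-graph.md) | Complete tutorial on the differentiable graph structure |
| [**Design Philosophy**](docs/internals/cad-design.md) | Why the CAD-LLM paradigm shift is needed |
| [**Architecture Guide**](docs/internals/architecture.md) | Module responsibilities and workflows |
| [**Performance Report**](docs/reports/performance.md) | Parallel algorithm and SIMD performance data |
| [**Implementation Status**](docs/reports/implementation-status.md) | Feature completeness and roadmap |
| [**TinyLlama Validation**](docs/reports/validation.md) | End-to-end validation on a real model |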
---
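## ⚡ DifferentiableGraph Quick Start
**DifferentiableGraph is God-Graph's core innovation.** It turns the graph structure from a "static container" into "the differentiable computation itself", so that a neural network architecture can be optimized with gradient descent.
### Core Use Cases
1. **Dynamic attention pruning**: gradient-guided removal of weak attention edges, cutting 30-50% of redundant connections
2. **Topology defect detection**: automatically finds isolated nodes, blocked gradients, and missing residual connections
3. **Neural architecture search**: lets the model learn its own optimal residual connections and attention patterns
4. **Weight editing**: Lie group orthogonalization keeps weights numerically stable while allowing precise modifications
### 5-Minute Example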
```rust
use god_gragh::tensor::differentiable::{DifferentiableGraph, GradientConfig, ThresholdEditPolicy};
// 1. Build a differentiable graph from a standard Transformer
let graph = build_mini_transformer();
let config = GradientConfig::default().with_sparsity(0.1);
let mut diff_graph = DifferentiableGraph::from_graph(graph, config);
// 2. Define the objective (attention entropy + sparsity regularization)
let loss_fn = |g: &DifferentiableGraph| {
    g.entropy_loss() + 0.1 * g.sparsity_loss()
};
// 3. Optimize the structure with gradient descent
for step in 0..100 {
    let loss = loss_fn(&diff_graph);
    let grads = diff_graph.compute_structure_gradients(loss);
    diff_graph.update_structure(&grads, 0.01);
    if step % 10 == 0 {
        println!("Step {}: loss={:.4}", step, loss);
    }
}
// 4. Export the pruned graph
let policy = ThresholdEditPolicy::new(0.5);
let pruned_graph = diff_graph.discretize(&policy);
println!("Pruned {} weak attention edges", pruned_graph.num_pruned_edges());
```
### Full Examples
| Example | Description | Command |
|---------|-------------|---------|
| [Differentiable attention pruning](examples/differentiable_graph.rs) | Gradient-guided pruning of weak edges | `cargo run --example differentiable_graph --features tensor` |
| [Topology defect detection](examples/cad_llm_validate_1b.rs) | Detects topology problems in a model | `cargo run --example cad_llm_validate_1b --features transformer` |
| [Lie group orthogonalization](examples/cad_llm_orthogonalize.rs) | Stable weight orthogonalization | `cargo run --example cad_llm_orthogonalize --features transformer` |
| [Tensor ring compression](examples/cad_llm_tensor_ring.rs) | Model compression workflow | `cargo run --example cad_llm_tensor_ring --features transformer` |

See the [full DifferentiableGraph tutorial](docs/user-guide/differentiable-graph.md).
## 🚀 Quick Start
### Installation
```toml
[dependencies]
god-gragh = "0.5.0-alpha"
```
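### Basic Usage: Graph Data Structures and Algorithms
```rust
use god_gragh::graph::Graph;
use god_gragh::algorithms::traversal::{bfs, dfs};
// Create a graph
let mut graph = Graph::<String, f64>::directed();
let a = graph.add_node("A".to_string()).unwrap();
let b = graph.add_node("B".to_string()).unwrap();
let _ = graph.add_edge(a, b, 1.0);
// BFS traversal starting from node `a`
bfs(&graph, a, |node| {
    println!("Visit: {}", node.data());
    true // Continue traversal
});
```
**ModelSwitch** provides lossless two-way conversion between HuggingFace Safetensors and GodGraph: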
```rust
use god_gragh::transformer::optimization::ModelSwitch;
// Load: Safetensors → GodGraph
let graph = ModelSwitch::load_from_safetensors("model.safetensors")?;
// Validate the topology
let topology_report = ModelSwitch::validate_topology(&graph)?;
println!("Topology valid: {}", topology_report.is_valid);
// Verify weights (compare the weights of two previously loaded graphs)
let weight_diff = ModelSwitch::verify_weights(&original_graph, &modified_graph)?;
println!("Max L2 difference: {:.6e}", weight_diff.max_l2_diff);
// Export: GodGraph → Safetensors
ModelSwitch::save_to_safetensors(&graph, "optimized.safetensors")?;
```
**Features**:
- ✅ Supports the F32, F64, and F16 data types
- ✅ Weight precision verification (L2 norm comparison)
- ✅ Topology integrity checks
- ✅ Round-trip precision loss < 1e-5
See the [ModelSwitch example](examples/cad_llm_switch.rs).
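The L2 weight check can be illustrated with a minimal, standard-library-only sketch. It assumes each weight tensor is flattened to a slice of `f32`; the helper names `l2_diff` and `weight_diff` are hypothetical and are not part of the God-Graph API.

```rust
/// L2 norm of the element-wise difference between two equally sized tensors.
/// (Hypothetical helper for illustration only.)
fn l2_diff(a: &[f32], b: &[f32]) -> f64 {
    assert_eq!(a.len(), b.len(), "tensors must have the same length");
    a.iter()
        .zip(b)
        .map(|(x, y)| {
            // Accumulate in f64 to limit rounding error.
            let d = f64::from(*x) - f64::from(*y);
            d * d
        })
        .sum::<f64>()
        .sqrt()
}

/// Returns (max, mean) L2 difference over paired tensors.
fn weight_diff(original: &[Vec<f32>], modified: &[Vec<f32>]) -> (f64, f64) {
    assert_eq!(original.len(), modified.len(), "tensor counts must match");
    let diffs: Vec<f64> = original
        .iter()
        .zip(modified)
        .map(|(a, b)| l2_diff(a, b))
        .collect();
    let max = diffs.iter().copied().fold(0.0_f64, f64::max);
    let mean = if diffs.is_empty() {
        0.0
    } else {
        diffs.iter().sum::<f64>() / diffs.len() as f64
    };
    (max, mean)
}
```

A round trip would then count as acceptable when the returned maximum stays below the `1e-5` bound quoted above.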
---
## 🔬 Core Features
### 1. ModelSwitch Two-Way Conversion ⭐ Core Feature
**ModelSwitch** implements lossless two-way conversion between HuggingFace Safetensors and GodGraph. It is the workflow foundation for white-box LLM analysis.
```rust
use god_gragh::transformer::optimization::ModelSwitch;
// 1. Load: Safetensors → GodGraph
let graph = ModelSwitch::load_from_safetensors("model.safetensors")?;
// 2. Validate topology integrity
let topology_report = ModelSwitch::validate_topology(&graph)?;
println!("Topology valid: {}", topology_report.is_valid);
println!("Connected components: {}", topology_report.connected_components);
println!("Is DAG: {}", topology_report.is_dag);
// 3. Verify weight precision (compare the weights of two previously loaded graphs)
let weight_diff = ModelSwitch::verify_weights(&original_graph, &modified_graph)?;
println!("Max L2 difference: {:.6e}", weight_diff.max_l2_diff);
println!("Mean L2 difference: {:.6e}", weight_diff.avg_l2_diff);
// 4. Export: GodGraph → Safetensors
ModelSwitch::save_to_safetensors(&graph, "optimized.safetensors")?;
```
**Capabilities**:
- **Two-way conversion**: lossless Safetensors ↔ GodGraph conversion
- **Data type support**: automatic conversion between F32, F64, and F16
- **Topology validation**: checks connectivity, cycles, and isolated nodes
- **Weight verification**: L2 norm comparison, precision loss < 1e-5
- **Operator inference**: infers the operator type (Attention, MLP, Norm, and so on) from the weight name
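Name-based operator inference can be sketched as follows. This is an illustration under assumptions, not God-Graph's actual rules: the `OpKind` enum and `infer_op_kind` function are hypothetical, and the substrings follow the weight naming used by HuggingFace Llama-style checkpoints (for example `model.layers.0.self_attn.q_proj.weight`).

```rust
/// Hypothetical operator categories, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OpKind {
    Embedding,
    Attention,
    Mlp,
    Norm,
    LmHead,
    Unknown,
}

/// Guess the operator kind from a tensor name by substring matching.
fn infer_op_kind(weight_name: &str) -> OpKind {
    if weight_name.contains("embed_tokens") {
        OpKind::Embedding
    } else if weight_name.contains("self_attn") {
        OpKind::Attention
    } else if weight_name.contains("mlp") {
        OpKind::Mlp
    } else if weight_name.contains("norm") {
        // Covers both per-layer layer norms and the final `model.norm`.
        OpKind::Norm
    } else if weight_name.contains("lm_head") {
        OpKind::LmHead
    } else {
        OpKind::Unknown
    }
}
```

Substring matching keeps the rule table small, at the cost of returning `Unknown` for architectures that name their tensors differently.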
**Run the example**:
```bash
cargo run --example cad_llm_switch --features safetensors
```
See the [ModelSwitch example](examples/cad_llm_switch.rs).
---
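### 2. Differentiable Graph Structure (DifferentiableGraph) ⭐ Core Innovation
**This is God-Graph's original contribution**: it turns the graph structure from a "static container" into "the differentiable computation itself".
```rust
use god_gragh::tensor::differentiable::{DifferentiableGraph, GradientConfig, ThresholdEditPolicy};
// 1. Build a differentiable graph from a standard Transformer
let graph = build_transformer();
let config = GradientConfig::default().with_sparsity(0.1);
let mut diff_graph = DifferentiableGraph::from_graph(graph, config);
// 2. Optimize the structure with gradient descent
for step in 0..100 {
    let loss = diff_graph.entropy_loss() + 0.1 * diff_graph.sparsity_loss();
    let grads = diff_graph.compute_structure_gradients(loss);
    diff_graph.update_structure(&grads, 0.01);
}
// 3. Export the pruned graph
let pruned = diff_graph.discretize(&ThresholdEditPolicy::new(0.5));
println!("Pruned ratio: {:.2}%", pruned.pruned_ratio() * 100.0);
```
**Core techniques**:
- **Continuous relaxation**: converts discrete edge existence into a continuous probability between 0 and 1
- **STE**: the Straight-Through Estimator bridges the discrete and continuous representations in both directions
- **Gumbel-Softmax**: differentiable sampling that supports gradient backpropagation
- **Lie group orthogonalization**: keeps weight matrices numerically stable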
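The interplay of continuous relaxation and a straight-through estimator can be sketched in a few lines of plain Rust. This is a toy model under stated assumptions, not the DifferentiableGraph implementation: each edge carries a logit, its existence probability is `sigmoid(logit)`, the loss is a pure sparsity penalty (the sum of edge probabilities), and the function names are hypothetical.

```rust
/// Logistic function mapping an edge logit to an existence probability.
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// Forward pass: hard 0/1 edge decisions obtained by thresholding the
/// relaxed probabilities.
fn discretize(logits: &[f64], threshold: f64) -> Vec<bool> {
    logits.iter().map(|&l| sigmoid(l) >= threshold).collect()
}

/// Backward pass, straight-through style: the hard threshold has zero
/// gradient almost everywhere, so the gradient of the soft probability is
/// used instead. For the sparsity loss L = sum_i p_i this gives
/// dL/dlogit_i = p_i * (1 - p_i).
fn sparsity_step(logits: &mut [f64], lr: f64) {
    for l in logits.iter_mut() {
        let p = sigmoid(*l);
        *l -= lr * p * (1.0 - p);
    }
}
```

With logits `[2.0, 0.1, -1.0]` and a threshold of `0.5`, a few steps at learning rate `0.5` push the borderline second edge below the threshold while the strong first edge survives. In a real objective the sparsity term would be balanced against a task loss such as the attention entropy used above.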
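**Use cases**:
- Dynamic attention pruning (removes 30-50% of redundant connections)
- Neural architecture search (automatically discovers optimal residual connections)
- Topology defect detection (isolated nodes, blocked gradients)
See the [DifferentiableGraph tutorial](docs/differentiable_graph.md).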
---
### 3. Lie Group Orthogonalization
Uses Lie group theory to keep weight matrices orthogonal, which improves numerical stability.
```rust
use god_gragh::tensor::DenseTensor;
use god_gragh::tensor::decomposition::{lie_exponential, is_orthogonal};
// so(n) Lie algebra → SO(n) Lie group
let algebra = DenseTensor::from_vec(
    vec![0.0, -0.1, 0.1, 0.0],
    vec![2, 2],
);
let rotation = lie_exponential(&algebra)?;
assert!(is_orthogonal(&rotation, 1e-5));
```
**Math**: the exponential map `exp: so(n) → SO(n)` is implemented with a Padé approximation plus the scaling-and-squaring algorithm.
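To make the exponential map concrete, here is a self-contained 2×2 sketch. Note the substitution: it uses a truncated Taylor series in place of the Padé approximation, combined with scaling and squaring, so it illustrates the idea rather than the library's algorithm. The names `mat_exp` and `is_orthogonal_2x2` are hypothetical.

```rust
type Mat2 = [[f64; 2]; 2];

fn matmul(a: &Mat2, b: &Mat2) -> Mat2 {
    let mut c = [[0.0; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            for k in 0..2 {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    c
}

/// exp(A) via scaling and squaring: exp(A) = (exp(A / 2^s))^(2^s),
/// with exp(A / 2^s) approximated by a 12-term Taylor series.
fn mat_exp(a: &Mat2) -> Mat2 {
    const SQUARINGS: i32 = 8;
    let scale = 2.0_f64.powi(-SQUARINGS);
    let scaled: Mat2 = [
        [a[0][0] * scale, a[0][1] * scale],
        [a[1][0] * scale, a[1][1] * scale],
    ];
    let mut result: Mat2 = [[1.0, 0.0], [0.0, 1.0]];
    let mut term: Mat2 = [[1.0, 0.0], [0.0, 1.0]];
    for k in 1..=12 {
        // term = scaled^k / k!
        term = matmul(&term, &scaled);
        for row in term.iter_mut() {
            for v in row.iter_mut() {
                *v /= f64::from(k);
            }
        }
        for i in 0..2 {
            for j in 0..2 {
                result[i][j] += term[i][j];
            }
        }
    }
    for _ in 0..SQUARINGS {
        result = matmul(&result, &result);
    }
    result
}

/// Checks that M^T M is the identity within `tol`.
fn is_orthogonal_2x2(m: &Mat2, tol: f64) -> bool {
    let mt: Mat2 = [[m[0][0], m[1][0]], [m[0][1], m[1][1]]];
    let p = matmul(&mt, m);
    (p[0][0] - 1.0).abs() < tol
        && (p[1][1] - 1.0).abs() < tol
        && p[0][1].abs() < tol
        && p[1][0].abs() < tol
}
```

For the skew-symmetric input `[[0, -θ], [θ, 0]]` the result is the rotation matrix `[[cos θ, -sin θ], [sin θ, cos θ]]`, which is orthogonal by construction.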
---
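### 4. Tensor Ring Compression
Represents a high-dimensional tensor as a ring of 3D core tensors to reduce the parameter count.
```rust
use god_gragh::transformer::optimization::TensorRingCompressor;
let compressor = TensorRingCompressor::default();
let ring = compressor.decompose(&weight_tensor)?;
println!("Compression ratio: {:.2}x", ring.compression_ratio());
```
**Compression ratio formula**: `(m × n) / (r₀×m×r₁ + r₁×n×r₂)`, where `m × n` is the shape of the original weight matrix and `r₀`, `r₁`, `r₂` are the ring ranks.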
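The formula can be evaluated directly. The sketch below is a plain-Rust transcription of it; the function name is hypothetical and the example sizes are chosen only for illustration.

```rust
/// Compression ratio for an m x n matrix stored as two tensor-ring cores
/// of shapes (r0, m, r1) and (r1, n, r2).
fn tensor_ring_compression_ratio(m: usize, n: usize, r0: usize, r1: usize, r2: usize) -> f64 {
    let original = (m * n) as f64;
    let compressed = (r0 * m * r1 + r1 * n * r2) as f64;
    original / compressed
}
```

For a 4096 × 4096 matrix with all ranks equal to 16, the original has 16,777,216 parameters and the two cores have 2 × (16 × 4096 × 16) = 2,097,152, so the ratio is 8.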
---
### 5. Topology Constraint Solving
Checks an LLM's "geometric integrity" the way CAD software does.
```rust
use god_gragh::transformer::optimization::{CadStyleEditor, TopologyConstraint};
let mut editor = CadStyleEditor::new(&mut graph);
// Detect defects
let defects = editor.detect_defects()?;
// Add a constraint
editor.add_constraint(TopologyConstraint::ResidualConnection {
    from_layer: "attention".to_string(),
    to_layer: "output".to_string(),
})?;
// Solve the constraints (automatic repair)
editor.solve_constraints()?;
```
**Defect types**: isolated nodes, disconnected components, blocked gradients, and missing residual connections.
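Two of these defect checks, isolated nodes and disconnected components, reduce to standard graph routines. The following sketch uses a union-find over a plain edge list; it is independent of the `CadStyleEditor` API, and the function names are hypothetical.

```rust
/// Union-find root lookup with path halving.
fn find(parent: &mut [usize], mut x: usize) -> usize {
    while parent[x] != x {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    x
}

/// Returns (ids of isolated nodes, number of weakly connected components)
/// for a graph with nodes 0..num_nodes and the given directed edges.
fn topology_defects(num_nodes: usize, edges: &[(usize, usize)]) -> (Vec<usize>, usize) {
    let mut parent: Vec<usize> = (0..num_nodes).collect();
    let mut degree = vec![0_usize; num_nodes];
    for &(u, v) in edges {
        degree[u] += 1;
        degree[v] += 1;
        let ru = find(&mut parent, u);
        let rv = find(&mut parent, v);
        if ru != rv {
            parent[ru] = rv;
        }
    }
    // A node with no incident edge is isolated.
    let isolated: Vec<usize> = (0..num_nodes).filter(|&i| degree[i] == 0).collect();
    // Each union-find root represents one component.
    let components = (0..num_nodes)
        .filter(|&i| find(&mut parent, i) == i)
        .count();
    (isolated, components)
}
```

A healthy feed-forward model graph would be expected to report no isolated nodes and a single component; anything else points at a missing connection.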
---
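### 6. GraphTransformer: Explicit Attention Analysis
**Scope**: GraphTransformer is mainly intended for **visualizing attention topology**, **dynamically pruning weak edges**, and **adding custom connections**. For high-performance inference, convert the graph to a standard LlamaModel.
```rust
use god_gragh::transformer::graph_transformer::GraphTransformer;
let mut transformer = GraphTransformer::new(12, 12, 768);
transformer.build_graph(&input_ids);
// Visualize the attention topology
let dot = transformer.to_dot();
std::fs::write("attention_graph.dot", dot)?;
// Prune weak attention edges (threshold = 0.01)
let pruned = transformer.prune_weak_edges(0.01);
println!("Pruned {} edges", pruned);
// Add a custom long-range connection
transformer.add_skip_connection(layer_0, layer_11);
```
**Key advantages**:
- Every attention edge can be accessed and modified individually (not possible in a black-box inference engine)
- Supports dynamic topology editing (not possible with a traditional static graph)
- Exports to DOT/Graphviz for visualization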
---
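## 📊 Performance
### Parallel Algorithm Speedups
| Algorithm | Scale | Sequential | Parallel | Speedup |
|-----------|-------|------------|----------|---------|
| PageRank | 1,000 nodes | 53.9ms | 668µs | **80.7x** |
| DFS | 50K nodes | 9.7ms | 1.3ms | **7.5x** |
| Connected Components | 2,000 nodes | - | 357.8µs | - |

See the [performance report](docs/performance.md).
### SIMD Optimization
| Scale | Sequential | Parallel | SIMD | Speedup (sequential → SIMD) |
|-------|------------|----------|------|-----------------------------|
| 100 nodes | 2.1ms | 280µs | ~150µs | 14x |
| 1,000 nodes | 210ms | 2.8ms | ~1.5ms | 140x |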
---
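## 🏗️ Architecture
### CAD-LLM Paradigm Mapping
| CAD concept | LLM equivalent | God-Graph implementation |
|-------------|----------------|--------------------------|
| Surface break check | Isolated attention head detection | `connected_components` |
| Non-manifold geometry check | Blocked gradient detection | `topological_sort + path_analysis` |
| Dimensional constraint | Attention head weight balance | `AttentionHeadBalance` constraint |
| Parallel constraint | Enforced residual connections | `ResidualConnection` constraint |
| Assembly constraint | Module interface matching | `validate_assembly` |
| Part replacement | Module extraction/replacement | `extract_module` / `replace_module` |

See [Design Philosophy](docs/CAD_LLM_DESIGN_PHILOSOPHY.md).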
---
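## 📦 Feature Flags
### Basic Features
| Feature | Description |
|---------|-------------|
| `parallel` | Parallel algorithms (Rayon) |
| `simd` | SIMD vectorization (wide::f64x4) |
| `tensor` | Tensor core support (ndarray) |
| `tensor-sparse` | Sparse tensor formats (COO/CSR) |
| `tensor-gnn` | GNN layers (GCN/GAT/GraphSAGE) |
### LLM Optimization Features
| Feature | Description |
|---------|-------------|
| `transformer` | Transformer base architecture |
| `safetensors` | Safetensors model loading |
| `cad-llm` | CAD-LLM topology optimization (experimental) |
### Meta-Features (Recommended)
| Feature | Description |
|---------|-------------|
| `tensor-full` | All tensor features |
| `tensor-inference` | GNN inference only |
| `llm` | Full LLM support |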
---
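## 🔮 Roadmap
| Version | Status | Highlights |
|---------|--------|------------|
| v0.4.3-beta | ✅ Released | Lie group orthogonalization, tensor ring compression, topology constraints |
| **v0.5.0-alpha** | 🔥 **Current** | **DifferentiableGraph**, full model loading, real-model validation |
| v0.6.0-beta | 📅 Planned | Memory pool benchmarks, GraphTransformer execution engine |
| v0.7.0-rc | 📅 Planned | Production testing, crates.io release |
### v0.5.0-alpha Highlights
- **DifferentiableGraph**: 1,421 lines of core code, supporting gradient-guided architecture search
- **Real-model validation**: end-to-end optimization pipeline on TinyLlama-1.1B
- **Graph-level orthogonalization fix**: in-place (zero-copy) orthogonalization interface, error < 1e-8
- **Complete examples**: 5 end-to-end DifferentiableGraph examples
See [Implementation Status](LLM_PLAN_STATUS.md) and [todo.json](todo.json).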
---
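## 🎓 Target Users
### A Good Fit for God-Graph
- ✅ **LLM researchers** who want to inspect and modify model topology
- ✅ **Model compression engineers** who want to compress models with tensor rings or orthogonalization
- ✅ **QA teams** who want to verify model integrity and numerical stability
- ✅ **Algorithm explorers** who want to experiment with dynamic pruning, sparse attention, and architecture search
- ✅ **White-box analysis users** who want to understand how an LLM works internally
### Not a Good Fit for God-Graph
- ❌ **Application developers** who only need LLM inference (use `llama.cpp`)
- ❌ **Training engineers** who want to train new models (use PyTorch/JAX)
- ❌ **GPU users** who need CUDA inference (use `candle` or `vllm`)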
---
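## 🌟 What Sets God-Graph Apart
### 1. Bucket-Based Adjacency List + Generation Indices
- **O(1) incremental updates**: better than the static CSR format for dynamic graph editing
- **ABA protection**: reusing an index after a node is deleted cannot cause confusion (a type-safety guarantee that petgraph lacks)
- **64-byte alignment**: avoids CPU cache false sharing, a foundation for inference performance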
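The generation-index idea can be shown with a small arena. This is a generic sketch of the technique, not God-Graph's internal layout: `Arena`, `Slot`, and this `NodeIndex` are hypothetical types defined here only for illustration.

```rust
/// A handle that pairs a slot position with the generation it was issued for.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NodeIndex {
    slot: usize,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

struct Arena<T> {
    slots: Vec<Slot<T>>,
    free: Vec<usize>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { slots: Vec::new(), free: Vec::new() }
    }

    /// Inserts a value, reusing a freed slot when one is available.
    fn insert(&mut self, value: T) -> NodeIndex {
        if let Some(slot) = self.free.pop() {
            let s = &mut self.slots[slot];
            s.value = Some(value);
            NodeIndex { slot, generation: s.generation }
        } else {
            self.slots.push(Slot { generation: 0, value: Some(value) });
            NodeIndex { slot: self.slots.len() - 1, generation: 0 }
        }
    }

    /// Removes a value and bumps the slot generation, which invalidates
    /// every handle issued before the removal.
    fn remove(&mut self, idx: NodeIndex) -> Option<T> {
        let s = self.slots.get_mut(idx.slot)?;
        if s.generation != idx.generation {
            return None;
        }
        let value = s.value.take()?;
        s.generation += 1;
        self.free.push(idx.slot);
        Some(value)
    }

    /// Looks up a value; stale handles return `None` instead of aliasing
    /// whatever now occupies the slot.
    fn get(&self, idx: NodeIndex) -> Option<&T> {
        let s = self.slots.get(idx.slot)?;
        if s.generation == idx.generation {
            s.value.as_ref()
        } else {
            None
        }
    }
}
```

After removing node A and inserting node B, both handles point at the same slot, but the old handle carries generation 0 while the slot is now at generation 1, so a lookup through the old handle fails cleanly.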
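### 2. DifferentiableGraph (Original Contribution)
- **Differentiable graph structure**: converts a discrete graph structure into a continuous, differentiable form
- **Gradient-guided search**: uses gradient descent to discover good neural network architectures automatically
- **STE + Gumbel-Softmax**: supports conversion between discrete and continuous forms in both directions, with gradient backpropagation
### 3. GraphTransformer Explicit Attention
- **Every edge can be accessed and modified individually**: not possible in a black-box inference engine (llama.cpp)
- **Dynamic topology editing**: not possible with a traditional static graph (petgraph)
- **Visualization support**: exports DOT/Graphviz for an intuitive view of attention patterns
### 4. ModelSwitch Two-Way Conversion Workflow
- **Safetensors ↔ GodGraph**: two-way conversion with the HuggingFace format
- **Weight precision verification**: L2 norm comparison, round-trip loss < 1e-5
- **Topology integrity checks**: automatically detects isolated nodes and blocked gradients
- **Operator type inference**: identifies Attention, MLP, Norm, and so on from weight names
### 5. Lie Group Orthogonalization + Tensor Ring Compression
- **Mathematical guarantees**: Lie group theory keeps weight matrices orthogonal and numerically stable
- **Compression ratio**: tensor ring decomposition reduces the parameter count by 2-10x
- **End-to-end workflow**: Safetensors ↔ GodGraph ↔ Safetensors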
---
## 🤝 Contributing
Contributions are welcome! Please make sure that:
- The code passes `cargo clippy` and `cargo fmt`
- Appropriate tests are added
- The documentation is updated
---
## 📄 License
Dual-licensed under MIT or Apache-2.0, at your option.
---
## 🙏 Acknowledgements
- [petgraph](https://github.com/petgraph/petgraph) - pioneer of Rust graph algorithm libraries
- [ndarray](https://crates.io/crates/ndarray) - N-dimensional arrays
- [wide](https://crates.io/crates/wide) - SIMD math library
- [HuggingFace](https://huggingface.co/) - the Safetensors format
---
**Contact**: silverenternal <3147264070@qq.com>
**Repository**: https://github.com/silverenternal/god-graph
## Quick Start
### Installation
Add dependency to `Cargo.toml`:
```toml
[dependencies]
god-gragh = "0.4.2-beta"
```
### Basic Usage
```rust
use god_gragh::graph::Graph;
use god_gragh::graph::traits::{GraphOps, GraphQuery};
// Create a directed graph
let mut graph = Graph::<String, f64>::directed();
// Add nodes
let a = graph.add_node("A".to_string()).unwrap();
let b = graph.add_node("B".to_string()).unwrap();
let c = graph.add_node("C".to_string()).unwrap();
// Add edges
graph.add_edge(a, b, 1.0).unwrap();
graph.add_edge(b, c, 2.0).unwrap();
graph.add_edge(a, c, 3.0).unwrap();
// Query
println!("Nodes: {}", graph.node_count());
println!("Edges: {}", graph.edge_count());
// Iterate over neighbors
for neighbor in graph.neighbors(a) {
println!("Neighbor: {}", graph[neighbor]);
}
```
### Using Graph Builder
```rust
use god_gragh::graph::builders::GraphBuilder;
let graph = GraphBuilder::directed()
.with_nodes(vec!["A", "B", "C", "D"])
.with_edges(vec![
(0, 1, 1.0),
(0, 2, 2.0),
(1, 3, 3.0),
(2, 3, 4.0),
])
.build()
.unwrap();
```
## Algorithms
### Traversal Algorithms
```rust
use god_gragh::algorithms::traversal::{dfs, bfs, topological_sort, tarjan_scc};
// Depth-First Search
dfs(&graph, start_node, |node| {
println!("Visit: {}", node.data());
true // Continue traversal
});
// Breadth-First Search
bfs(&graph, start_node, |node| {
println!("Visit: {}", node.data());
true
});
// Topological Sort (DAG)
let order = topological_sort(&graph);
// Tarjan's Strongly Connected Components
let sccs = tarjan_scc(&graph);
```
### Shortest Path Algorithms
```rust
use god_gragh::algorithms::shortest_path::{dijkstra, bellman_ford, floyd_warshall, astar};
// Dijkstra's Algorithm (non-negative weights)
let (path, distance) = dijkstra(&graph, start, Some(end)).unwrap();
// Bellman-Ford (handles negative edge weights)
let distances = bellman_ford(&graph, start);
// Floyd-Warshall (all-pairs shortest paths)
let distances = floyd_warshall(&graph);
```
### Minimum Spanning Tree
```rust
use god_gragh::algorithms::mst::{kruskal, prim};
// Kruskal's Algorithm
let mst = kruskal(&graph);
// Prim's Algorithm
let mst = prim(&graph, start_node);
```
### Centrality Algorithms
```rust
use god_gragh::algorithms::centrality::{
degree_centrality, betweenness_centrality, closeness_centrality, pagerank
};
// Degree Centrality
let centrality = degree_centrality(&graph);
// Betweenness Centrality
let centrality = betweenness_centrality(&graph);
// Closeness Centrality
let centrality = closeness_centrality(&graph);
// PageRank
let ranks = pagerank(&graph, 0.85, 20);
```
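For readers unfamiliar with the two numeric arguments in `pagerank(&graph, 0.85, 20)`, the sketch below shows a textbook power-iteration PageRank, assuming they are the damping factor and the iteration count. It works on a plain adjacency list rather than on God-Graph's `Graph` type, and is not the crate's implementation.

```rust
/// Power-iteration PageRank on an adjacency list of out-edges.
/// `out_edges[u]` lists the nodes that `u` links to.
fn pagerank(out_edges: &[Vec<usize>], damping: f64, iterations: usize) -> Vec<f64> {
    let n = out_edges.len();
    if n == 0 {
        return Vec::new();
    }
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..iterations {
        // Every node receives the teleportation share first.
        let mut next = vec![(1.0 - damping) / n as f64; n];
        for (u, targets) in out_edges.iter().enumerate() {
            if targets.is_empty() {
                // Dangling node: spread its rank uniformly over all nodes.
                let share = damping * rank[u] / n as f64;
                for r in next.iter_mut() {
                    *r += share;
                }
            } else {
                let share = damping * rank[u] / targets.len() as f64;
                for &v in targets {
                    next[v] += share;
                }
            }
        }
        rank = next;
    }
    rank
}
```

The ranks always sum to 1, and on a directed 3-cycle every node keeps rank 1/3, which makes a convenient sanity check.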
### Community Detection
```rust
use god_gragh::algorithms::community::{connected_components, label_propagation};
// Connected Components
let components = connected_components(&graph);
// Label Propagation Algorithm
let communities = label_propagation(&graph);
```
### Flow Algorithms
```rust
use god_gragh::algorithms::flow::{edmonds_karp, dinic, push_relabel};
// Edmonds-Karp Maximum Flow
let (flow, residual_graph) = edmonds_karp(&graph, source, sink);
// Dinic's Algorithm
let flow = dinic(&graph, source, sink);
// Push-Relabel Algorithm
let flow = push_relabel(&graph, source, sink);
```
## Parallel Algorithms
Enable `parallel` feature to use parallel algorithms:
```toml
[dependencies]
god-gragh = { version = "0.4.2-beta", features = ["parallel"] }
```
```rust
use god_gragh::algorithms::parallel;
// Parallel BFS
let layers = parallel::bfs_parallel(&graph, start_node);
// Parallel PageRank
let ranks = parallel::pagerank_parallel(&graph, 0.85, 20);
// Parallel Connected Components
let components = parallel::connected_components_parallel(&graph);
```
### SIMD Optimization
Enable `simd` feature for SIMD vectorization (supports stable Rust):
```toml
[dependencies]
god-gragh = { version = "0.4.2-beta", features = ["simd"] }
```
```rust
use god_gragh::algorithms::parallel;
// SIMD-accelerated PageRank
#[cfg(feature = "simd")]
let ranks = parallel::par_pagerank_simd(&graph, 0.85, 20);
// SIMD-accelerated Degree Centrality
#[cfg(feature = "simd")]
let centrality = parallel::par_degree_centrality_simd(&graph);
```
**Implementation Details**: Uses `wide::f64x4` type for 4-way parallel floating-point operations, automatically leveraging CPU SIMD instruction sets (SSE/AVX/AVX-512).
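The 4-way pattern can be sketched without any SIMD crate. The function below uses a plain `[f64; 4]` as a stand-in for a `wide::f64x4` register; it illustrates the access pattern only, and the name `dot_4_lanes` is hypothetical.

```rust
/// Dot product with four independent accumulator lanes, mirroring how a
/// 4-wide f64 SIMD register processes the data.
fn dot_4_lanes(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "inputs must have the same length");
    let mut lanes = [0.0_f64; 4];
    let chunks_a = a.chunks_exact(4);
    let chunks_b = b.chunks_exact(4);
    // Elements left over when the length is not a multiple of 4.
    let tail_a = chunks_a.remainder();
    let tail_b = chunks_b.remainder();
    for (ca, cb) in chunks_a.zip(chunks_b) {
        for i in 0..4 {
            lanes[i] += ca[i] * cb[i];
        }
    }
    let tail: f64 = tail_a.iter().zip(tail_b).map(|(x, y)| x * y).sum();
    lanes.iter().sum::<f64>() + tail
}
```

Keeping the four accumulators independent is what allows the inner loop to map onto a single vector multiply-add. The scalar tail handles the remaining elements.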
## Tensor & GNN Support
Enable tensor features for Graph Neural Network workflows:
```toml
[dependencies]
god-gragh = { version = "0.4.2-beta", features = ["tensor", "tensor-gnn"] }
```
### Basic Tensor Operations
```rust
use god_gragh::tensor::{DenseTensor, TensorBase, TensorOps};
// Create tensors
let a = DenseTensor::new(vec![1.0, 2.0, 3.0, 4.0], vec![2, 2]);
let b = DenseTensor::new(vec![5.0, 6.0, 7.0, 8.0], vec![2, 2]);
// Matrix multiplication
let c = a.matmul(&b);
// Transpose
let t = a.transpose(None);
// Normalize
let norm = a.normalize();
```
### Graph-Tensor Conversion
```rust
use god_gragh::graph::Graph;
use god_gragh::tensor::GraphTensorExt;
// Create a graph with vector node features
let mut graph = Graph::<Vec<f64>, f64>::directed();
let n0 = graph.add_node(vec![1.0, 0.0]).unwrap();
let n1 = graph.add_node(vec![0.0, 1.0]).unwrap();
let n2 = graph.add_node(vec![1.0, 1.0]).unwrap();
let _ = graph.add_edge(n0, n1, 1.0);
let _ = graph.add_edge(n1, n2, 1.0);
let _ = graph.add_edge(n2, n0, 1.0);
// Convert to tensor representation
let (features, adjacency) = graph.to_tensor_representation().unwrap();
assert_eq!(features.shape(), &[3, 2]);
assert_eq!(adjacency.num_nodes, 3);
```
### GNN Layers
> **Important**: God-Graph GNN modules are **inference-only** (forward pass only).
> For training workflows, integrate with external autograd libraries:
> - **[dfdx](https://crates.io/crates/dfdx)**: Deep learning framework with CUDA support
> - **[Candle](https://github.com/huggingface/candle)**: HuggingFace's lightweight tensor library
> - **[tch-rs](https://crates.io/crates/tch-rs)**: Rust bindings for PyTorch
#### Inference Example (Recommended Use Case)
```rust
use god_gragh::tensor::gnn::{GCNConv, GATConv, GraphSAGE, MessagePassingLayer};
// Create GCN layer
let gcn = GCNConv::new(64, 64);
// Create GAT layer (multi-head attention)
let gat = GATConv::new(
64, // in_features
64, // out_features
4, // num_heads
);
// Create GraphSAGE layer
let graphsage = GraphSAGE::new(
64, // in_features
32, // out_features
10, // num_samples
);
// Forward pass (inference only)
let h1 = gcn.forward(&features, &adjacency);
let h2 = gat.forward(&h1, &edge_index);
let output = graphsage.forward(&h2, &edge_index);
```
#### Training Integration Example (with dfdx)
For complete GNN training, integrate with dfdx:
```rust
// Pseudo-code: Integrate god-gragh GNN with dfdx autograd
use dfdx::prelude::*;
use god_gragh::tensor::gnn::GCNConv;
// 1. Use god-gragh for graph structure and forward pass
let gcn = GCNConv::new(64, 64);
let output = gcn.forward(&features, &adjacency);
// 2. Convert to dfdx tensor for autograd
// let dfdx_tensor = Tensor1D::from(output.data());
// 3. Define loss and optimizer (dfdx)
// let loss = cross_entropy_loss(&dfdx_tensor, &labels);
// let mut optimizer = Adam::new(model.parameters(), lr=0.001);
// 4. Training loop
// for epoch in 0..num_epochs {
// optimizer.zero_grad();
// let loss = forward_pass(&graph, &labels);
// optimizer.backward(&loss);
// optimizer.step();
// }
```
**See**: [examples/differentiable_graph.rs](examples/differentiable_graph.rs) for an example of differentiable graph structures and gradient-based optimization.
### Memory Pool Optimization
```rust
use god_gragh::tensor::{TensorPool, PoolConfig};
// Create a tensor pool
let config = PoolConfig::new(16, 128).with_preallocate(true);
let mut pool = TensorPool::new(config);
// Acquire tensor from pool (automatically zeroed)
let tensor = pool.acquire(vec![100, 100]);
// Automatically returned to pool when dropped
drop(tensor);
```
**Benefits**:
- **Memory Reuse**: Reduces allocation overhead in iterative algorithms (PageRank, GNN training) by **80-90%**
- **Automatic Recycling**: `PooledTensor` automatically returns to pool on Drop
- **Gradient Checkpointing**: `GradientCheckpoint` reduces memory usage during backpropagation by **40-60%**
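The reuse mechanism behind these numbers can be illustrated with a minimal buffer pool. This is a sketch of the general technique, not the `TensorPool` implementation: `BufferPool` is a hypothetical type, buffers are plain `Vec<f64>` keyed by length, and buffers are returned explicitly rather than on `Drop`.

```rust
use std::collections::HashMap;

/// A minimal size-keyed buffer pool with hit/miss accounting.
#[derive(Default)]
struct BufferPool {
    free: HashMap<usize, Vec<Vec<f64>>>,
    hits: u64,
    misses: u64,
}

impl BufferPool {
    /// Returns a zeroed buffer of `len` elements, reusing a released one
    /// of the same length when available.
    fn acquire(&mut self, len: usize) -> Vec<f64> {
        let reused = self.free.get_mut(&len).and_then(|bucket| bucket.pop());
        match reused {
            Some(mut buf) => {
                self.hits += 1;
                buf.fill(0.0);
                buf
            }
            None => {
                self.misses += 1;
                vec![0.0; len]
            }
        }
    }

    /// Hands a buffer back to the pool for later reuse.
    fn release(&mut self, buf: Vec<f64>) {
        self.free.entry(buf.len()).or_default().push(buf);
    }

    /// Fraction of acquisitions served from the pool.
    fn hit_rate(&self) -> f64 {
        let total = self.hits + self.misses;
        if total == 0 {
            0.0
        } else {
            self.hits as f64 / total as f64
        }
    }
}
```

In a loop that acquires and releases one 128 × 128 buffer 50 times, only the first acquisition allocates, which gives 49 hits out of 50 (a 98% hit rate).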
### Memory Pool Benchmark Results
**Latest benchmarks** (run on Linux, Rust 1.85, March 2026):
| Benchmark | Time | Pool Hit Rate | Allocation Reduction |
|-----------|------|---------------|----------------------|
| Iterative (without pool) | 850.84 µs | N/A | Baseline |
| Iterative (with pool) | 127.76 µs | **98-100%** | **98-99.9%** |
| GNN Iteration | 31.93 µs | **96-99%** | **96-99%** |
| MatMul Temporaries | 42.15 µs | **95-98%** | **95-98%** |
| Small Tensors (16x16) | 6.89 µs | **98%+** | **98%+** |
| Large Tensors (512x512) | 17.36 µs | **95%+** | **95%+** |
| Sequential Alloc/Dealloc | 34.71 µs | **98%+** | **98%+** |
| Warm Pool (preallocated) | 34.39 µs | **100%** | **100%** |
| Cold Pool (no prealloc) | 35.32 µs | **98%+** | **98%+** |
**Key Findings**:
1. **Allocation Reduction**: The memory pool achieves **98-99.9% reduction** in new system allocations for iterative workloads, validating the "80-90% reduction" claim with actual measurements showing even better results.
2. **Performance Speedup**: For iterative allocation patterns (50 iterations of 128x128 tensors), the pool achieves **6.7x speedup** (850.84 µs → 127.76 µs).
3. **Preallocation Benefit**: Warm pools with preallocation achieve near-perfect hit rates (100%), eliminating allocation overhead entirely.
4. **Typical Workloads**:
- **GNN Forward Pass**: 96-99% allocation reduction (hidden state temporaries)
- **Attention QKV Projections**: 95-98% reduction (sequential matmul temporaries)
- **Batch Processing**: 98%+ reduction with preallocated pools
**Pool Statistics from Benchmarks**:
```
=== Iterative Pool Stats ===
Total allocations: 204800
Pool hits (reuses): 204799
Pool misses (new allocs): 1
Hit rate: 100.00%
Allocation reduction: 100.00%
=== GNN Iteration Pool Stats ===
Total allocations: 300
Pool hits (reuses): 297
Pool misses (new allocs): 3
Hit rate: 99.00%
Allocation reduction: 99.00%
=== MatMul Temporaries Pool Stats ===
Total allocations: 60
Pool hits (reuses): 57
Pool misses (new allocs): 3
Hit rate: 95.00%
Allocation reduction: 95.00%
```
**Note**: The memory pool provides maximum benefit in:
1. **Iterative algorithms** (PageRank, GNN message passing) - 98-99.9% reduction
2. **Sequential temporaries** (QKV projections) - 95-98% reduction
3. **Batch processing** with preallocated pools - 100% hit rate achievable
Run memory pool benchmarks:
```bash
cargo bench --features "tensor tensor-pool" --bench memory_pool_reduction
```
## Transformer & LLM Inference
Enable the `transformer` feature for LLaMA/Mistral model inference:
```toml
[dependencies]
god-gragh = { version = "0.4.2-beta", features = ["transformer"] }
```
### Model Loading
```rust
use god_gragh::transformer::{LlamaModel, LlamaConfig, TextGenerator, GenerationConfig};
use god_gragh::transformer::loader::{load_safetensors, load_from_hf_hub};
// Load from HuggingFace Hub
let (config, weights) = load_from_hf_hub(
"meta-llama/Llama-2-7b-hf",
None, // token
).unwrap();
// Build the model
let model = LlamaModel::new(
config,
weights.embed_tokens,
weights.layers,
weights.norm,
weights.lm_head,
).unwrap();
// Create text generator
let generator = TextGenerator::new(&model);
```
---
## 🔬 Real-Model Validation: TinyLlama-1.1B
God-Graph fully supports loading, validating, and optimizing real LLMs. This project uses **TinyLlama-1.1B** for end-to-end validation.
### Downloading the Model
```bash
# Download TinyLlama-1.1B from the HuggingFace Hub
pip install huggingface_hub
python scripts/download_tinyllama.py
# The model is saved to models/tinyllama/model.safetensors
```
### Loading a Real Model
```rust
use god_gragh::transformer::optimization::{ModelSwitch, OperatorType};
use god_gragh::graph::traits::{GraphBase, GraphQuery};
// Load TinyLlama from safetensors
let graph = ModelSwitch::load_from_safetensors("models/tinyllama/model.safetensors")?;
// Inspect the model structure
let node_count = graph.node_count();
let edge_count = graph.edge_count();
println!("TinyLlama-1.1B loaded:");
println!("  - Nodes: {}", node_count);
println!("  - Edges (weights): {}", edge_count);
// Check that every weight is finite (no NaN/Inf)
for edge_ref in graph.edges() {
    let weight = edge_ref.data();
    assert!(weight.data.iter().all(|&v| v.is_finite()), "weight contains a non-finite value");
}
println!("✓ All weights are valid (no NaN/Inf)");
```
### Lie Group Orthogonalization Validation
Orthogonalize the weights of a real model and verify numerical stability:
```rust
use god_gragh::transformer::optimization::lie_group::{
    orthogonalize_weights_in_place, LieGroupConfig
};
// Configure orthogonalization
let config = LieGroupConfig::default()
    .with_cayley(true)      // Use the Cayley transform
    .with_block_size(32);   // SO(32) block size
// Orthogonalize in place (zero-copy)
let errors = orthogonalize_weights_in_place(&config, &mut graph)?;
// Check the result
let avg_error = errors.iter().sum::<f64>() / errors.len() as f64;
println!("Orthogonalization result:");
println!("  - Mean error: {:.2e}", avg_error);
println!("  - Max error: {:.2e}", errors.iter().fold(0.0f64, f64::max));
// Check: the orthogonalization error should be < 1e-8
assert!(avg_error < 1e-8, "orthogonalization error too large");
println!("✓ Orthogonalization succeeded (error < 1e-8)");
```
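### Tensor Ring Compression Validation
Apply tensor ring compression to a real model and verify the result:
```rust
use god_gragh::transformer::optimization::{
    TensorRingCompressor, CompressionConfig
};
// Configure compression
let config = CompressionConfig::default()
    .with_target_rank(16)   // Target rank
    .with_min_rank(4);      // Minimum rank
// Run the compression
let compressor = TensorRingCompressor::new(&config);
let report = compressor.compress_graph(&graph)?;
println!("Tensor ring compression report:");
println!("  - Original parameters: {:.2}M", report.original_params / 1e6);
println!("  - Compressed parameters: {:.2}M", report.compressed_params / 1e6);
println!("  - Compression ratio: {:.2}x", report.compression_ratio);
println!("  - Reconstruction error: {:.2e}", report.reconstruction_error);
// Check: the ratio should be < 0.5, that is, at least 2x compression
// (the ratio here is compressed size / original size, so lower is better)
assert!(report.compression_ratio < 0.5, "compression ratio not met");
println!("✓ Compression succeeded (ratio < 0.5)");
```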
### Test Commands
Run the full validation suite:
```bash
# Run all real-model validation tests
cargo test --features "safetensors tensor" real_model -- --nocapture
# Run the orthogonalization test
cargo test --features "safetensors tensor" test_tinyllama_orthogonalization -- --nocapture
# Run the compression test
cargo test --features "safetensors tensor" test_tinyllama_tensor_ring -- --nocapture
```
### Validation Results
**Test file**: `tests/real_model_validation.rs`
| Test | Status | Description |
|------|--------|-------------|
| `test_load_tinyllama_model` | ✅ Pass | Model loading |
| `test_tinyllama_orthogonalization` | ✅ Pass | Orthogonalization error < 1e-8 |
| `test_tinyllama_tensor_ring` | ✅ Pass | Compression ratio < 0.5 |
| `test_tinyllama_weight_validity` | ✅ Pass | No NaN/Inf |

**Key metrics**:
- Orthogonalization error: **2.04e-14** (far below the 1e-8 threshold)
- Compression ratio: **0.12x - 0.25x** (depending on the chosen rank)
- Reconstruction error: **< 1e-6** (within numerical precision)
See [tests/real_model_validation.rs](tests/real_model_validation.rs) and the [CAD-LLM 1B validation report](CAD_LLM_1B_VALIDATION_REPORT.md).
---
### Text Generation
```rust
use god_gragh::transformer::{GenerationConfig, SamplingMode};
// Configure generation
let config = GenerationConfig::new()
.with_max_length(512)
.with_temperature(0.8)
.with_top_p(0.9)
.with_top_k(40)
.with_repetition_penalty(1.1)
.with_sampling_mode(SamplingMode::TopPTopK);
// Generate text
let prompt = "Once upon a time";
let result = generator.generate(prompt, &config).unwrap();
println!("Generated: {}", result.text);
println!("Tokens: {} in {}ms", result.num_tokens, result.generate_time_ms);
```
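The sampling parameters above interact in a fixed order in most implementations. The sketch below shows one common ordering (temperature, then top-k, then top-p) as a deterministic filtering step. Whether `SamplingMode::TopPTopK` applies them in exactly this order is an assumption, and `filter_logits` is a hypothetical helper.

```rust
/// Applies temperature scaling, top-k, then top-p (nucleus) filtering.
/// Returns the surviving (token id, renormalized probability) pairs,
/// sorted by descending probability.
fn filter_logits(logits: &[f64], temperature: f64, top_k: usize, top_p: f64) -> Vec<(usize, f64)> {
    // Softmax with temperature; subtracting the max keeps exp() stable.
    let max = logits.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits
        .iter()
        .map(|&l| ((l - max) / temperature).exp())
        .collect();
    let z: f64 = exps.iter().sum();
    let mut probs: Vec<(usize, f64)> = exps.iter().map(|e| e / z).enumerate().collect();
    // Top-k: keep the k most probable tokens.
    probs.sort_by(|a, b| b.1.total_cmp(&a.1));
    probs.truncate(top_k.max(1));
    // Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    let mut cumulative = 0.0;
    let mut keep = 0;
    for &(_, p) in &probs {
        cumulative += p;
        keep += 1;
        if cumulative >= top_p {
            break;
        }
    }
    probs.truncate(keep);
    // Renormalize so the kept probabilities sum to 1.
    let kept_mass: f64 = probs.iter().map(|&(_, p)| p).sum();
    probs.iter().map(|&(i, p)| (i, p / kept_mass)).collect()
}
```

For token probabilities 0.5, 0.3, 0.15, and 0.05 with `top_k = 3` and `top_p = 0.7`, the first two tokens survive and are renormalized to 0.625 and 0.375. A sampler would then draw from this reduced distribution.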
### KV Cache Optimization
```rust
use god_gragh::transformer::kv_cache::{CacheConfig, KvCache};
// Configure KV cache
let config = CacheConfig::new()
.with_max_batch_size(1)
.with_max_seq_len(2048)
.with_dtype("f32");
// Cache is managed internally during generation
// Supports incremental decoding and multi-turn dialogue
```
### Batch Inference with SIMD
Enable `simd` feature for SIMD-accelerated batch inference:
```toml
[dependencies]
god-gragh = { version = "0.4.2-beta", features = ["transformer", "simd"] }
```
```rust
use god_gragh::transformer::batch::{BatchGenerator, BatchRequest};
// Create batch generator
let mut batch_gen = BatchGenerator::new(&model);
// Add multiple requests
batch_gen.add_request("Hello, how are you?", 1);
batch_gen.add_request("What is Rust?", 2);
batch_gen.add_request("Explain quantum computing", 3);
// Process batch with SIMD acceleration
let results = batch_gen.generate_batch(&config).unwrap();
```
### Supported Models
- **LLaMA / Llama-2 / Llama-3**: Meta's open language models
- **Mistral**: Mistral AI's efficient models
- **Gemma**: Google's lightweight models
- **Qwen**: Alibaba's multilingual models
**Note**: Model weights must be in `.safetensors` format. Use `load_safetensors` or `load_from_hf_hub` for automatic conversion.
### Examples
See the following examples for complete workflows:
- [`examples/llm_model_loader.rs`](examples/llm_model_loader.rs) - Loading models from HuggingFace
- [`examples/llm_text_gen.rs`](examples/llm_text_gen.rs) - End-to-end text generation
- [`examples/llm_batch_simd.rs`](examples/llm_batch_simd.rs) - SIMD-accelerated batch inference
Run examples:
```bash
# Text generation demo
cargo run --example llm_text_gen --features transformer
# Batch inference with SIMD
cargo run --example llm_batch_simd --features "transformer,simd"
```
### Documentation
- [Transformer Module Guide](docs/transformer_guide.md) - API reference and usage
- [Transformer Tutorial](docs/transformer_tutorial.md) - Step-by-step tutorial
- [Enhancements Report](docs/TRANSFORMER_ENHANCEMENTS_REPORT.md) - Implementation details
## Random Graph Generation
```rust
use god_gragh::generators::{
erdos_renyi_graph, barabasi_albert_graph, watts_strogatz_graph,
complete_graph, grid_graph, tree_graph
};
// Erdős-Rényi Random Graph G(n, p)
let graph = erdos_renyi_graph::<String>(100, 0.1, true, 42);
// Barabási-Albert Preferential Attachment Model
let graph = barabasi_albert_graph::<String>(100, 3);
// Watts-Strogatz Small-World Network
let graph = watts_strogatz_graph::<String>(100, 4, 0.1);
// Complete Graph
let graph = complete_graph::<String, f64>(10);
// Grid Graph
let graph = grid_graph::<String, f64>(5, 5);
// Tree
let graph = tree_graph::<String, f64>(3, 100);
```
## Graph Export
### DOT/Graphviz Format
```rust
use god_gragh::export::{to_dot, to_svg, to_adjacency_list, to_edge_list};
// Export to DOT format (Graphviz)
let dot = to_dot(&graph);
std::fs::write("graph.dot", dot)?;
// Generate visualization:
// bash: dot -Tpng graph.dot -o graph.png
```
### SVG Visualization
```rust
use god_gragh::export::to_svg;
use god_gragh::export::svg::{SvgOptions, LayoutAlgorithm};
// Export to SVG format with custom options
let options = SvgOptions::new()
.with_size(800, 600)
.with_node_radius(25.0)
.with_layout(LayoutAlgorithm::ForceDirected);
let svg = to_svg(&graph, &options);
std::fs::write("graph.svg", svg)?;
// View in browser using examples/graph_viewer.html
```
**Layout Algorithms**:
- **Force-Directed**: Physics-based layout with node repulsion and edge attraction
- **Circular**: Nodes arranged in a circle
- **Hierarchical**: Layered layout based on topological sort
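Of the three, the circular layout is the simplest to write down. The sketch below places `n` nodes evenly on a circle; it is a generic illustration and not the crate's `LayoutAlgorithm::Circular` implementation. The function name and signature are hypothetical.

```rust
use std::f64::consts::TAU;

/// Places `n` nodes evenly on a circle and returns their (x, y) positions.
/// Node 0 sits at angle 0; the rest follow in order of increasing angle.
fn circular_layout(n: usize, center: (f64, f64), radius: f64) -> Vec<(f64, f64)> {
    (0..n)
        .map(|i| {
            let angle = TAU * i as f64 / n as f64;
            (
                center.0 + radius * angle.cos(),
                center.1 + radius * angle.sin(),
            )
        })
        .collect()
}
```

For an 800 × 600 canvas, a center of `(400.0, 300.0)` with a radius somewhat below 300 keeps every node inside the drawing area.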
**Interactive Viewer**: Open `examples/graph_viewer.html` in browser to:
- Drag and drop SVG files
- Zoom and pan
- Adjust node/edge styles in real-time
- View node list
### Adjacency List & Edge List
```rust
// Export as adjacency list
let adj_list = to_adjacency_list(&graph);
// Export as edge list
let edge_list = to_edge_list(&graph);
```
## Feature Flags
### Basic Features
| Feature | Description | Dependencies |
|---------|-------------|--------------|
| `std` | Standard library support (enabled by default) | - |
| `parallel` | Parallel algorithms | rayon, crossbeam-queue |
| `serde` | Serialization support | serde |
| `dot` | DOT format export | - |
| `simd` | SIMD vectorization (experimental, stable Rust) | wide |
| `matrix` | Matrix representation | nalgebra |
| `rand` | Random graph generation | rand, rand_chacha |
| `unstable` | Nightly Rust features | - |
### Tensor Features
| Feature | Description | Dependencies |
|---------|-------------|--------------|
| `tensor` | Tensor core support (ndarray backend) | ndarray |
| `tensor-sparse` | Sparse tensor formats (COO, CSR, BSR) | tensor |
| `tensor-gpu` | GPU acceleration (requires CUDA) | tensor, dfdx |
| `tensor-candle` | Candle backend (Hugging Face) | tensor, candle-core |
| `tensor-autograd` | Automatic differentiation | tensor, dfdx |
| `tensor-serde` | Tensor serialization | tensor, serde |
| `tensor-gnn` | GNN layers (GCN, GAT, GraphSAGE) | tensor, tensor-sparse, rand_distr |
| `tensor-pool` | Memory pool optimization | tensor, bitvec |
| `tensor-batch` | Batch graph processing | tensor, tensor-sparse |
### Meta-Features (Recommended)
| Feature | Description | Includes |
|---------|-------------|----------|
| `tensor-full` | All tensor features | tensor, tensor-sparse, tensor-gnn, tensor-pool, tensor-batch |
| `tensor-inference` | GNN inference only | tensor, tensor-sparse, tensor-gnn |
| `tensor-ml` | ML training support | tensor, tensor-sparse, tensor-gnn, tensor-autograd, tensor-pool |
### Transformer Features
| Feature | Description | Dependencies |
|---------|-------------|--------------|
| `transformer` | Transformer/LLM inference | serde_json, memmap2, regex |
| `safetensors` | Safetensors model loading | safetensors |
| `hf-hub` | HuggingFace Hub integration | hf-hub, tokio |
| `simd` | SIMD acceleration for batch inference | wide |
**Note**: For complete Transformer support, use `--features "transformer,safetensors,simd"`. See [Transformer & LLM Inference](#transformer--llm-inference) for details.
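As a rough illustration of how an optional flag such as `parallel` usually takes effect inside a Rust crate, the sketch below selects one of two implementations at compile time. The function `total_degree` is hypothetical and not part of the god-gragh API; the rayon-based variant is only compiled when the build enables the feature (for example with `cargo build --features parallel`).

```rust
/// Parallel variant: compiled only when the `parallel` feature is enabled,
/// which is also when the optional rayon dependency is available.
#[cfg(feature = "parallel")]
fn total_degree(degrees: &[usize]) -> usize {
    use rayon::prelude::*;
    degrees.par_iter().sum()
}

/// Sequential fallback: compiled when the `parallel` feature is disabled.
#[cfg(not(feature = "parallel"))]
fn total_degree(degrees: &[usize]) -> usize {
    degrees.iter().sum()
}
```

Callers use `total_degree` the same way in both configurations, so enabling or disabling the feature does not change calling code.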
## Comparison with petgraph
| Feature | God-Graph | petgraph |
|---------|-----------|----------|
| Memory Layout | Bucket-based adjacency list + Arena-style slots | Adjacency list |
| Incremental Updates | ✅ O(1) | ⚠️ O(1) insertion; `Graph` removal invalidates the last index |
| Stable Indices | ✅ Generation counting | ✅ `StableGraph` |
| Parallel Algorithms | ✅ Built-in (5+) | ❌ |
| Cache Optimization | ✅ 64-byte alignment | ❌ |
| SIMD Vectorization | ✅ wide::f64x4 | ❌ |
| Tensor/GNN Support | ✅ Multi-backend | ❌ |
| **Transformer/LLM** | ✅ LLaMA/Mistral inference | ❌ |
| API Design | Generic traits | Concrete types |
| Documentation | 🌱 Growing | 🌳 Mature |
| Community Maturity | 🌱 Growing | 🌳 Mature |
**God-Graph Advantages**:
1. Generation-indexed stability prevents ABA problems
2. Bucket-based adjacency list supports O(1) incremental updates
3. Built-in parallel algorithm suite with proven speedups
4. Cache-optimized memory layout (64-byte alignment, software prefetching)
5. SIMD vectorization for batch computations
6. Integrated tensor/GNN support for machine learning workflows
**petgraph Advantages**:
1. Mature community, production-proven
2. Comprehensive documentation
3. More algorithm variants
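To make the generation-counting idea from the comparison concrete, here is a minimal sketch of a generational arena. It is an assumption-laden illustration, not God-Graph's actual storage: `NodeIndex`, `Slot`, and `Arena` are hypothetical names. The point is that a stale handle to a reused slot is rejected instead of silently resolving to the new occupant (the ABA problem).

```rust
/// A handle that pairs a slot position with the generation it was issued for.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeIndex {
    slot: usize,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

/// Minimal arena with slot reuse and generation checks.
struct Arena<T> {
    slots: Vec<Slot<T>>,
    free: Vec<usize>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { slots: Vec::new(), free: Vec::new() }
    }

    /// Stores a value, reusing a freed slot when one is available.
    fn insert(&mut self, value: T) -> NodeIndex {
        if let Some(slot) = self.free.pop() {
            let s = &mut self.slots[slot];
            s.value = Some(value);
            NodeIndex { slot, generation: s.generation }
        } else {
            self.slots.push(Slot { generation: 0, value: Some(value) });
            NodeIndex { slot: self.slots.len() - 1, generation: 0 }
        }
    }

    /// Removes a value and bumps the slot's generation,
    /// which invalidates every handle issued for the old occupant.
    fn remove(&mut self, idx: NodeIndex) -> Option<T> {
        let s = self.slots.get_mut(idx.slot)?;
        if s.generation != idx.generation {
            return None;
        }
        let value = s.value.take()?;
        s.generation += 1;
        self.free.push(idx.slot);
        Some(value)
    }

    /// Returns the value only if the handle's generation still matches.
    fn get(&self, idx: NodeIndex) -> Option<&T> {
        let s = self.slots.get(idx.slot)?;
        if s.generation == idx.generation {
            s.value.as_ref()
        } else {
            None
        }
    }
}
```

If a node is inserted, removed, and a second node then reuses the same slot, `get` with the first handle returns `None` while the second handle resolves normally.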
## Performance Benchmarks
Detailed performance data is available in the [**Performance Report**](docs/reports/performance.md).
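Benchmark results on an 8-core CPU:
| Algorithm | Graph Size | Serial | Parallel | Speedup |
|-----------|------------|--------|----------|---------|
| PageRank | 1,000 nodes | 53.9ms | 668µs | **80.7x** |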
| DFS | 50K nodes | 9.7ms | 1.3ms | **7.5x** |
| Connected Components | 2,000 nodes | - | 357.8µs | - |
| Degree Centrality | 5,000 nodes | - | 146µs | - |
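The bolded ratios in the table above are the serial time divided by the parallel time. A small helper makes the arithmetic explicit; `speedup` is a hypothetical function written for this illustration, not part of the crate or its benchmark harness.

```rust
use std::time::Duration;

/// Speedup factor: serial wall-clock time divided by parallel wall-clock time.
fn speedup(serial: Duration, parallel: Duration) -> f64 {
    serial.as_secs_f64() / parallel.as_secs_f64()
}
```

For the PageRank row, `speedup(Duration::from_micros(53_900), Duration::from_micros(668))` evaluates to roughly 80.7, and the DFS row (9.7ms over 1.3ms) gives roughly 7.5.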
### SIMD Performance (Estimated)
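| Graph Size | Serial | Parallel | Parallel + SIMD (est.) | Speedup vs. Serial |
|------------|--------|----------|------------------------|--------------------|
| 100 nodes | 2.1ms | 280µs | ~150µs | 14x |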
| 1,000 nodes | 210ms | 2.8ms | ~1.5ms | 140x |
| 5,000 nodes | 5.2s | 68ms | ~40ms | 130x |
*Note: SIMD performance depends on CPU instruction set support (AVX2/AVX-512)*
### Memory Pool Performance
The tensor memory pool reduces allocation overhead by reusing pre-allocated memory, achieving high reuse ratios for iterative algorithms.
#### Benchmark Results (Actual Measurements)
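| Benchmark | Without Pool | With Pool | Result |
|-----------|--------------|-----------|--------|
| Iterative allocation (50× 128×128) | 847.91 µs | 2.57 µs/iter | **~99.7% faster per iteration** |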
| GNN iteration (10 steps) | N/A | 10.85 µs | **Hit rate: 99.89%** |
| Matrix multiplication temporaries | N/A | 4.14 µs | **Hit rate: 99.93%** |
| Small tensor allocation (16×16) | N/A | 694.86 ns | **Hit rate: 99.95%** |
| Large tensor allocation (512×512) | N/A | 48.12 µs | **Hit rate: 99.93%** |
| Sequential alloc/dealloc (50×) | N/A | 35.22 µs | **Hit rate: 100.00%** |
#### Pool Hit Rate by Workload
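| Workload | Hit Rate Progression | Final Hit Rate |
|----------|----------------------|----------------|
| Iterative (single tensor) | 0% → 99.90% (16 steps) | **99.99%** |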
| GNN iteration | 90% → 99.99% (10 steps) | **99.89%** |
| Matrix multiplication | 98.33% → 100% | **99.93%** |
| Batch size 10 | 90% → 100% | **99.99%** |
| Batch size 25 | 96% → 100% | **99.99%** |
| Batch size 50 | 98% → 100% | **99.99%** |
| Batch size 100 | 99% → 100% | **99.99%** |
#### Pre-allocation Impact (Warm vs Cold Pool)
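| Pool State | Initial Hit Rate | Final Hit Rate | Steady-State Latency |
|------------|------------------|----------------|----------------------|
| Cold pool (no pre-alloc) | 0% | 100% | 696.89 ns |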
| Warm pool (pre-allocated) | 0% → 100% (faster) | 100% | 696.83 ns |
**Key Metrics**:
- **Reuse Ratio**: >99% for iterative workloads (measured via `pool.hit_rate()`)
- **Allocation Reduction**: 80-90% fewer system allocations after warm-up
- **Memory Throughput**: Pre-allocation eliminates runtime allocation latency
- **Warm-up Time**: ~16 iterations to reach 99.9%+ hit rate
- **Steady-state Latency**: ~695 ns per tensor acquire/release
**Note**: The memory pool shows higher absolute time in micro-benchmarks due to pool management overhead, but provides significant benefits in real-world iterative algorithms by eliminating repeated system allocations and improving cache locality.
**Usage Example**:
```rust
use god_gragh::tensor::pool::{TensorPool, PoolConfig};
// Create pool with pre-allocation
let config = PoolConfig::new(16, 128).with_preallocate(true);
let mut pool = TensorPool::new(config);
// Acquire tensors (reuses memory after first allocation)
for _ in 0..50 {
    let tensor = pool.acquire(vec![128, 128]);
    // ... use tensor ...
    drop(tensor); // Automatically returns to pool
}
// Check statistics
let stats = pool.stats();
println!("Hit rate: {:.2}%", stats.hit_rate() * 100.0);
println!("New allocations: {}", stats.pool_misses);
```
Run benchmarks:
```bash
cargo bench --features tensor,tensor-pool --bench tensor_pool
```
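To show where a hit rate comes from, the following sketch implements a tiny buffer pool that counts hits and misses. It is a simplified stand-in under stated assumptions, not the crate's `TensorPool`: `BufferPool` and its methods are hypothetical, buffers are keyed only by element count, and returning a buffer is explicit rather than automatic on drop.

```rust
use std::collections::HashMap;

/// Hypothetical pool of `f64` buffers, keyed by element count.
struct BufferPool {
    free: HashMap<usize, Vec<Vec<f64>>>,
    hits: u64,
    misses: u64,
}

impl BufferPool {
    fn new() -> Self {
        BufferPool { free: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Hands out a zeroed buffer of `len` elements,
    /// reusing a previously released one when available.
    fn acquire(&mut self, len: usize) -> Vec<f64> {
        match self.free.get_mut(&len).and_then(|bufs| bufs.pop()) {
            Some(mut buf) => {
                self.hits += 1;
                buf.fill(0.0); // clear stale contents before reuse
                buf
            }
            None => {
                self.misses += 1;
                vec![0.0; len] // pool miss: fall back to a fresh allocation
            }
        }
    }

    /// Returns a buffer to the pool so a later `acquire` can reuse it.
    fn release(&mut self, buf: Vec<f64>) {
        self.free.entry(buf.len()).or_default().push(buf);
    }

    /// Fraction of `acquire` calls served from the pool.
    fn hit_rate(&self) -> f64 {
        let total = self.hits + self.misses;
        if total == 0 {
            0.0
        } else {
            self.hits as f64 / total as f64
        }
    }
}
```

In this sketch, repeatedly acquiring and releasing a single shape misses only on the first call, so the cumulative hit rate after `n` cycles is `(n - 1) / n`; 50 cycles give 98%.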
## Test Coverage
This project uses `cargo-tarpaulin` for coverage measurement, targeting **80%+** coverage.
### Generate Coverage Report
```bash
# Install cargo-tarpaulin
cargo install cargo-tarpaulin
# Generate HTML coverage report
cargo tarpaulin --all-features --out Html --output-dir coverage
# View report
open coverage/tarpaulin-report.html # macOS
xdg-open coverage/tarpaulin-report.html # Linux
```
### Current Coverage
- **Overall Coverage**: 66.64% (1560/2341 lines)
- **Unit Tests**: 82 passed
- **Integration Tests**: 18 passed
- **Property Tests**: 15 passed
- **Doc Tests**: 27 passed (1 ignored)
- **Total**: 142 tests, 100% passing
See [coverage/tarpaulin-report.html](coverage/tarpaulin-report.html) for details.
## Development Roadmap
See [ROADMAP.json](ROADMAP.json) for detailed roadmap.
### Version History
- [x] v0.1.0-alpha: Core graph structure, basic CRUD, DFS/BFS
- [x] v0.2.0-alpha: Complete algorithm suite, random graph generators
- [x] v0.3.0-beta: Performance reports, migration guide, parallel algorithms
- [x] **v0.4.0-beta**: Tensor/GNN integration, memory pool optimization, differentiable graph
- [x] **v0.4.2-beta**: **Transformer/LLM inference**, LLaMA/Mistral support, KV Cache optimization, text generation
- [ ] v0.5.0-rc: Serde support, API stabilization
- [ ] v1.0.0-stable: Production-ready
### Upcoming Features
- [ ] Improve test coverage to 80%+
- [ ] GitHub Pages documentation site
- [ ] crates.io release
- [ ] Graph-Tensor deep integration (Phase 4)
- [ ] Automatic differentiation support (Phase 5)
- [ ] GPU acceleration with Dfdx/Candle backends (Phase 6)
- [ ] Multi-modal models (LLaVA, etc.)
## Contributing
Contributions are welcome! Please follow these steps:
1. Fork the project
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
Please ensure that:
- Code passes `cargo clippy` and `cargo fmt --check`
- Appropriate tests are added
- Documentation is updated
## Known Issues
1. **Coverage Gap**: Current 66.64%, below 80% target
   - Main gaps: Community detection, flow algorithms, matching algorithms
   - Plan: Add targeted tests in v0.4.0
2. **Force-Directed Layout**: Current implementation is simplified
   - 50 iterations, fixed parameters
   - Plan: Configurable iterations and physics parameters in v0.4.0
3. **par_dijkstra**: Marked as experimental in v0.3.0-beta
   - Known issues with bucket index calculation and potential deadlocks
   - Plan: Refactor in v0.4.0
## License
This project is dual-licensed under MIT or Apache-2.0, at your option.
See [LICENSE-MIT](LICENSE-MIT) and [LICENSE-APACHE](LICENSE-APACHE) for details.
## Acknowledgments
- [petgraph](https://github.com/petgraph/petgraph) - Pioneer of Rust graph libraries
- [rayon](https://github.com/rayon-rs/rayon) - Data parallelism library
- [Graphviz](https://graphviz.org/) - Graph visualization tool
- [wide](https://crates.io/crates/wide) - SIMD math library for stable Rust
- [ndarray](https://crates.io/crates/ndarray) - N-dimensional arrays
- [dfdx](https://crates.io/crates/dfdx) - Deep learning framework with CUDA support
- [Candle](https://github.com/huggingface/candle) - HuggingFace's lightweight tensor library
- [Hugging Face](https://huggingface.co) - Open-source AI community and model hub
- [Safetensors](https://github.com/huggingface/safetensors) - Safe tensor serialization format
- [LLaMA](https://ai.meta.com/llama/) - Meta's open language models
- [Mistral](https://mistral.ai) - Mistral AI's efficient language models
## Contact
- Issue Reports: [GitHub Issues](https://github.com/silverenternal/god-graph/issues)
- Discussions: [GitHub Discussions](https://github.com/silverenternal/god-graph/discussions)
- Documentation: [docs.rs/god-gragh](https://docs.rs/god-gragh)