god-gragh 0.5.0

# God-Graph

[![Crates.io](https://img.shields.io/crates/v/god-gragh.svg)](https://crates.io/crates/god-gragh)
[![Documentation](https://docs.rs/god-gragh/badge.svg)](https://docs.rs/god-gragh)
[![License](https://img.shields.io/crates/l/god-gragh.svg)](https://github.com/silverenternal/god-graph?tab=License-1-ov-file#readme)
[![Build Status](https://github.com/silverenternal/god-graph/workflows/CI/badge.svg)](https://github.com/silverenternal/god-graph/actions)
[![Coverage Status](https://codecov.io/gh/silverenternal/god-graph/branch/main/graph/badge.svg)](https://codecov.io/gh/silverenternal/god-graph)

> **God-Graph 是一个 LLM 白盒分析工具——把 LLM 从黑盒变成可编辑的白盒**
>
> 核心创新：**DifferentiableGraph（可微图结构）**——用梯度下降优化神经网络架构，支持动态注意力剪枝、拓扑缺陷检测、自动架构搜索。

---

## 🎯 核心定位

**God-Graph 不是**：
- ❌ LLM 推理引擎（打不过 `llama.cpp`）
- ❌ GNN 训练框架（打不过 DGL/PyG）
- ❌ 通用图算法库（`petgraph` 更成熟）

**God-Graph 是**：
- ✅ **LLM 白盒分析工具**——可以检查/修改模型拓扑结构
- ✅ **可微图结构引擎**——用梯度下降优化神经网络架构（DifferentiableGraph）
- ✅ **拓扑缺陷检测器**——发现梯度阻断、孤立节点、缺失残差连接
- ✅ **数学层面优化器**——李群正交化、张量环压缩

**一句话总结**：God-Graph 用 CAD 软件的设计哲学优化 LLM——检查"表面断裂"（孤立节点）、"非流形几何"（梯度阻断）、"尺寸约束"（注意力头平衡），并首创**可微图结构**实现梯度引导的架构搜索。

---

## 📚 核心文档

**完整文档导航**: [docs/README.md](docs/README.md)

### 快速链接

| 文档 | 说明 |
|------|------|
| [**快速开始**](docs/user-guide/getting-started.md) | 5 分钟上手 God-Graph |
| [**DifferentiableGraph 教程**](docs/user-guide/differentiable-graph.md) | 可微图结构完整教程 |
| [**设计哲学**](docs/internals/cad-design.md) | 为什么需要 CAD-LLM 范式迁移 |
| [**架构指南**](docs/internals/architecture.md) | 模块职责和工作流 |
| [**性能报告**](docs/reports/performance.md) | 并行算法和 SIMD 性能数据 |
| [**实现状态**](docs/reports/implementation-status.md) | 功能完成度和路线图 |
| [**TinyLlama 验证**](docs/reports/validation.md) | 真实模型端到端验证 |

---

## ⚡ DifferentiableGraph 快速开始

**DifferentiableGraph 是 God-Graph 的核心创新**——它将图结构从"静态容器"变为"可微分的计算本身"，支持用梯度下降优化神经网络架构。

### 核心应用场景

1. **动态注意力剪枝**：梯度引导剪除弱注意力边，减少 30-50% 冗余连接
2. **拓扑缺陷检测**：自动发现孤立节点、梯度阻断、缺失残差连接
3. **神经架构搜索**：让模型自己学习最优残差连接和注意力模式
4. **权重编辑**：李群正交化保证数值稳定性，支持精确的权重修改

### 5 分钟上手示例

```rust
use god_gragh::tensor::differentiable::{DifferentiableGraph, GradientConfig, ThresholdEditPolicy};

// 1. 从标准 Transformer 构建可微图
let mut graph = build_mini_transformer();
let config = GradientConfig::default().with_sparsity(0.1);
let mut diff_graph = DifferentiableGraph::from_graph(graph, config);

// 2. 定义目标函数（注意力熵 + 稀疏性正则）
let loss_fn = |g: &DifferentiableGraph| {
    g.entropy_loss() + 0.1 * g.sparsity_loss()
};

// 3. 梯度下降优化结构
for step in 0..100 {
    let loss = loss_fn(&diff_graph);
    let grads = diff_graph.compute_structure_gradients(loss);
    diff_graph.update_structure(&grads, 0.01);
    
    if step % 10 == 0 {
        println!("Step {}: loss={:.4}", step, loss);
    }
}

// 4. 导出剪枝后的图
let policy = ThresholdEditPolicy::new(0.5);
let pruned_graph = diff_graph.discretize(&policy);
println!("剪枝了 {} 条弱注意力边", pruned_graph.num_pruned_edges());
```

### 完整示例

| 示例 | 说明 | 运行命令 |
|------|------|----------|
| [可微注意力剪枝](examples/differentiable_graph.rs) | 梯度引导剪除弱边 | `cargo run --example differentiable_graph --features tensor` |
| [拓扑缺陷检测](examples/cad_llm_validate_1b.rs) | 检测模型拓扑问题 | `cargo run --example cad_llm_validate_1b --features transformer` |
| [李群正交化](examples/cad_llm_orthogonalize.rs) | 权重正交化稳定性 | `cargo run --example cad_llm_orthogonalize --features transformer` |
| [张量环压缩](examples/cad_llm_tensor_ring.rs) | 模型压缩 workflow | `cargo run --example cad_llm_tensor_ring --features transformer` |

详见 [DifferentiableGraph 完整教程](docs/user-guide/differentiable-graph.md)。

## 🚀 快速开始

### 安装

```toml
[dependencies]
god-gragh = "0.5.0-alpha"
```

### 基础用法：图数据结构和算法

```rust
use god_gragh::graph::Graph;
use god_gragh::algorithms::traversal::{bfs, dfs};

// 创建图
let mut graph = Graph::<String, f64>::directed();
let a = graph.add_node("A".to_string()).unwrap();
let b = graph.add_node("B".to_string()).unwrap();
let _ = graph.add_edge(a, b, 1.0);

// BFS 遍历
bfs(&graph, a, |node, _depth| {
    println!("访问：{}", node.data());
    true
});
```

### 高级用法：LLM 拓扑优化

```rust
use god_gragh::transformer::optimization::{
    ModelSwitch, CadStyleEditor, TensorRingCompressor
};

// 1. 从 Safetensors 加载模型
let mut graph = ModelSwitch::load_from_safetensors("model.safetensors")?;

// 2. 检测拓扑缺陷
let mut editor = CadStyleEditor::new(&mut graph);
let defects = editor.detect_defects()?;
println!("发现 {} 个缺陷", defects.len());

// 3. 张量环压缩
let compressor = TensorRingCompressor::default();
let report = compressor.compress_graph(&graph)?;
println!("压缩比：{:.2}x", report.compression_ratio);

// 4. 导出优化后的模型到 Safetensors
ModelSwitch::save_to_safetensors(&graph, "optimized.safetensors")?;
```

### ModelSwitch 双向转换

**ModelSwitch** 提供 HuggingFace Safetensors 和 GodGraph 之间的双向无损转换：

```rust
use god_gragh::transformer::optimization::ModelSwitch;

// 加载：Safetensors → GodGraph
let graph = ModelSwitch::load_from_safetensors("model.safetensors")?;

// 验证拓扑
let topology_report = ModelSwitch::validate_topology(&graph)?;
println!("拓扑有效：{}", topology_report.is_valid);

// 验证权重（比较两个图的权重差异）
let weight_diff = ModelSwitch::verify_weights(&original_graph, &modified_graph)?;
println!("最大 L2 差异：{:.6e}", weight_diff.max_l2_diff);

// 导出：GodGraph → Safetensors
ModelSwitch::save_to_safetensors(&graph, "optimized.safetensors")?;
```

**功能特性**：
- ✅ 支持 F32/F64/F16 数据类型
- ✅ 权重精度验证（L2 范数比较）
- ✅ 拓扑完整性检查
- ✅ 往返精度损失 < 1e-5

详见 [ModelSwitch 示例](examples/cad_llm_switch.rs)。

---

## 🔬 核心功能

### 1. ModelSwitch 双向转换 ⭐ 核心功能

**ModelSwitch** 实现 HuggingFace Safetensors 和 GodGraph 之间的双向无损转换，是 LLM 白盒分析的工作流基础。

```rust
use god_gragh::transformer::optimization::ModelSwitch;

// 1. 加载：Safetensors → GodGraph
let graph = ModelSwitch::load_from_safetensors("model.safetensors")?;

// 2. 验证拓扑完整性
let topology_report = ModelSwitch::validate_topology(&graph)?;
println!("拓扑有效：{}", topology_report.is_valid);
println!("连通分量：{}", topology_report.connected_components);
println!("是 DAG: {}", topology_report.is_dag);

// 3. 验证权重精度（比较两个图的权重差异）
let weight_diff = ModelSwitch::verify_weights(&original_graph, &modified_graph)?;
println!("最大 L2 差异：{:.6e}", weight_diff.max_l2_diff);
println!("平均 L2 差异：{:.6e}", weight_diff.avg_l2_diff);

// 4. 导出：GodGraph → Safetensors
ModelSwitch::save_to_safetensors(&graph, "optimized.safetensors")?;
```

**核心功能**：
- **双向转换**：Safetensors ↔ GodGraph 无损转换
- **数据类型支持**：F32、F64、F16 自动转换
- **拓扑验证**：检查连通性、环、孤立节点
- **权重验证**：L2 范数比较，精度损失 < 1e-5
- **算子推断**：根据权重名称自动推断算子类型（Attention、MLP、Norm 等）

**运行示例**：
```bash
cargo run --example cad_llm_switch --features safetensors
```

详见 [ModelSwitch 示例](examples/cad_llm_switch.rs)。

---

### 2. 可微图结构 (DifferentiableGraph) ⭐ 核心创新

**这是 God-Graph 的原创性贡献**——将图结构从"静态容器"变为"可微分的计算本身"。

```rust
use god_gragh::tensor::differentiable::{DifferentiableGraph, GradientConfig};

// 1. 从标准 Transformer 构建可微图
let mut graph = build_transformer();
let config = GradientConfig::default().with_sparsity(0.1);
let mut diff_graph = DifferentiableGraph::from_graph(graph, config);

// 2. 梯度下降优化结构
for step in 0..100 {
    let loss = diff_graph.entropy_loss() + 0.1 * diff_graph.sparsity_loss();
    let grads = diff_graph.compute_structure_gradients(loss);
    diff_graph.update_structure(&grads, 0.01);
}

// 3. 导出剪枝后的图
let pruned = diff_graph.discretize(&ThresholdEditPolicy::new(0.5));
println!("剪枝比例：{:.2}%", pruned.pruned_ratio() * 100.0);
```

**核心技术**：
- **连续松弛**：将离散的边存在性转换为连续概率（0 到 1）
- **STE 估计器**：Straight-Through Estimator 实现离散 - 连续双向转换
- **Gumbel-Softmax**：可微分采样，支持梯度反向传播
- **李群正交化**：保证权重矩阵数值稳定性

**应用场景**：
- 动态注意力剪枝（减少 30-50% 冗余连接）
- 神经架构搜索（自动发现最优残差连接）
- 拓扑缺陷检测（孤立节点、梯度阻断）

详见 [DifferentiableGraph 教程](docs/differentiable_graph.md)。

---

### 3. 李群正交化 (Lie Group Orthogonalization)

用李群理论保证权重矩阵的正交性，提升数值稳定性。

```rust
use god_gragh::tensor::decomposition::{lie_exponential, is_orthogonal};

// so(n) 李代数 → SO(n) 李群
let algebra = DenseTensor::from_vec(
    vec![0.0, -0.1, 0.1, 0.0],
    vec![2, 2],
);

let rotation = lie_exponential(&algebra)?;
assert!(is_orthogonal(&rotation, 1e-5));
```

**数学原理**：指数映射 `exp: so(n) → SO(n)` 用 Padé 近似 + 缩放 - 平方算法实现。

---

### 4. 张量环压缩 (Tensor Ring Compression)

把高维张量表示为 3D 核心张量的环，减少参数量。

```rust
use god_gragh::transformer::optimization::TensorRingCompressor;

let compressor = TensorRingCompressor::default();
let ring = compressor.decompose(&weight_tensor)?;

println!("压缩比：{:.2}x", ring.compression_ratio());
```

**压缩比公式**：`(m × n) / (r₀×m×r₁ + r₁×n×r₂)`

---

### 5. 拓扑约束求解 (Topology Constraint Solving)

像 CAD 软件一样检查 LLM 的"几何完整性"。

```rust
use god_gragh::transformer::optimization::{CadStyleEditor, TopologyConstraint};

let mut editor = CadStyleEditor::new(&mut graph);

// 检测缺陷
let defects = editor.detect_defects()?;

// 添加约束
editor.add_constraint(TopologyConstraint::ResidualConnection {
    from_layer: "attention".to_string(),
    to_layer: "output".to_string(),
})?;

// 求解约束（自动修复）
editor.solve_constraints()?;
```

**缺陷类型**：孤立节点、disconnected 组件、梯度阻断、缺失残差连接。

---

### 6. GraphTransformer 显式注意力分析

**定位说明**：GraphTransformer 主要用于**可视化注意力拓扑**、**动态剪枝弱边**、**添加自定义连接**。对于高性能推理，建议转换为标准 LlamaModel。

```rust
use god_gragh::transformer::graph_transformer::GraphTransformer;

let mut transformer = GraphTransformer::new(12, 12, 768);
transformer.build_graph(&input_ids);

// 可视化注意力拓扑
let dot = transformer.to_dot();
std::fs::write("attention_graph.dot", dot)?;

// 剪枝弱注意力边（阈值=0.01）
let pruned = transformer.prune_weak_edges(0.01);
println!("剪枝 {} 条边", pruned);

// 添加自定义长程连接
transformer.add_skip_connection(layer_0, layer_11);
```

**核心优势**：
- 每条注意力边可单独访问/修改（黑盒推理引擎做不到）
- 支持动态拓扑编辑（传统静态图做不到）
- 可导出为 DOT/Graphviz 可视化

---

## 📊 性能数据

### 并行算法加速比

| 算法 | 规模 | 串行时间 | 并行时间 | 加速比 |
|------|------|----------|----------|--------|
| PageRank | 1,000 节点 | 53.9ms | 668µs | **80.7x** |
| DFS | 50K 节点 | 9.7ms | 1.3ms | **7.5x** |
| Connected Components | 2,000 节点 | - | 357.8µs | - |

详见 [性能报告](docs/performance.md)。

### SIMD 优化

| 图规模 | 串行 | 并行 | SIMD | 提升 |
|--------|------|------|------|------|
| 100 节点 | 2.1ms | 280µs | ~150µs | 14x |
| 1,000 节点 | 210ms | 2.8ms | ~1.5ms | 140x |

---

## 🏗️ 架构设计

### CAD-LLM 范式映射

| CAD 概念 | LLM 等价物 | GodGraph 实现 |
|----------|-----------|--------------|
| 表面断裂检查 | 孤立注意力头检测 | `connected_components` |
| 非流形几何检查 | 梯度阻断检测 | `topological_sort + path_analysis` |
| 尺寸约束 | 注意力头权重平衡 | `AttentionHeadBalance` 约束 |
| 平行约束 | 残差连接强制 | `ResidualConnection` 约束 |
| 装配约束 | 模块接口匹配 | `validate_assembly` |
| 零件替换 | 模块提取/替换 | `extract_module` / `replace_module` |

详见 [设计哲学](docs/CAD_LLM_DESIGN_PHILOSOPHY.md)。

---

## 📦 特性标志

### 基础特性

| 特性 | 说明 |
|------|------|
| `parallel` | 并行算法（Rayon） |
| `simd` | SIMD 向量化（wide::f64x4） |
| `tensor` | 张量核心支持（ndarray） |
| `tensor-sparse` | 稀疏张量格式（COO/CSR） |
| `tensor-gnn` | GNN 层（GCN/GAT/GraphSAGE） |

### LLM 优化特性

| 特性 | 说明 |
|------|------|
| `transformer` | Transformer 基础架构 |
| `safetensors` | Safetensors 模型加载 |
| `cad-llm` | CAD-LLM 拓扑优化（实验性） |

### 元特性（推荐）

| 元特性 | 包含 |
|--------|------|
| `tensor-full` | 所有张量功能 |
| `tensor-inference` | GNN 推理专用 |
| `llm` | 完整 LLM 支持 |

---

## 🔮 路线图

| 版本 | 状态 | 关键特性 |
|------|------|----------|
| v0.4.3-beta | ✅ 已发布 | 李群正交化、张量环压缩、拓扑约束 |
| **v0.5.0-alpha** | 🔥 **当前** | **DifferentiableGraph 可微图结构**、完整模型加载、真实模型验证 |
| v0.6.0-beta | 📅 计划 | 内存池基准测试、GraphTransformer 执行引擎 |
| v0.7.0-rc | 📅 计划 | 生产环境测试、crates.io 发布 |

### v0.5.0-alpha 核心特性

- **DifferentiableGraph（可微图结构）**：1421 行核心代码，支持梯度引导的架构搜索
- **真实模型验证**：TinyLlama-1.1B 端到端优化流程
- **图级正交化修复**：原地正交化接口（零拷贝），误差 < 1e-8
- **完整示例**：5 个端到端 DifferentiableGraph 示例

详见 [实现状态](LLM_PLAN_STATUS.md) 和 [todo.json](todo.json)。

---

## 🎓 目标用户

### 适合使用 God-Graph

✅ **LLM 研究人员**——想检查和修改模型拓扑结构
✅ **模型压缩工程师**——想用张量环/正交化压缩模型
✅ **QA 团队**——想验证模型完整性和数值稳定性
✅ **算法探索者**——想实验动态剪枝、稀疏注意力、架构搜索
✅ **白盒分析需求**——想理解 LLM 内部工作机制

### 不适合使用 God-Graph

❌ **应用开发者**——只想用 LLM 推理（用 `llama.cpp`）
❌ **训练工程师**——想训练新模型（用 PyTorch/JAX）
❌ **GPU 加速需求**——需要 CUDA 推理（用 `candle` 或 `vllm`）

---

## 🌟 God-Graph 的独特优势

### 1. 桶式邻接表 + Generation 索引

- **O(1) 增量更新**：优于静态 CSR 格式，适合动态图编辑场景
- **防止 ABA 问题**：删除节点后重用索引不会混淆（petgraph 没有的类型安全）
- **64 字节对齐**：避免 CPU 缓存 false sharing，推理性能基础

### 2. DifferentiableGraph（原创创新）

- **可微图结构**：将离散图结构转换为连续可微形式
- **梯度引导搜索**：用梯度下降自动发现最优神经网络架构
- **STE + Gumbel-Softmax**：支持离散 - 连续双向转换，梯度反向传播

### 3. GraphTransformer 显式注意力

- **每条边可单独访问/修改**：黑盒推理引擎（llama.cpp）做不到
- **动态拓扑编辑**：传统静态图（petgraph）做不到
- **可视化支持**：导出 DOT/Graphviz 格式，直观理解注意力模式

### 4. ModelSwitch 双向转换工作流

- **Safetensors ↔ GodGraph**：HuggingFace 格式双向转换
- **权重精度验证**：L2 范数比较，往返损失 < 1e-5
- **拓扑完整性检查**：自动检测孤立节点、梯度阻断
- **算子类型推断**：根据权重名称识别 Attention、MLP、Norm 等

### 5. 李群正交化 + 张量环压缩

- **数学保证**：李群理论保证权重矩阵正交性，数值稳定性
- **压缩比**：张量环分解减少 2-10x 参数量
- **端到端工作流**：Safetensors ↔ GodGraph ↔ Safetensors

---

## 🤝 贡献

欢迎贡献！请确保：
- 代码通过 `cargo clippy` 和 `cargo fmt`
- 添加适当的测试
- 更新文档

---

## 📄 许可证

双许可证：MIT 或 Apache-2.0（任选其一）

---

## 🙏 致谢

- [petgraph](https://github.com/petgraph/petgraph) - Rust 图算法库先驱
- [ndarray](https://crates.io/crates/ndarray) - N 维数组
- [wide](https://crates.io/crates/wide) - SIMD 数学库
- [HuggingFace](https://huggingface.co/) - Safetensors 格式

---

**联系方式**: silverenternal <3147264070@qq.com>
**项目地址**: https://github.com/silverenternal/god-graph

## Quick Start

### Installation

Add dependency to `Cargo.toml`:

```toml
[dependencies]
god-gragh = "0.4.2-beta"
```

### Basic Usage

```rust
use god_gragh::graph::Graph;
use god_gragh::graph::traits::{GraphOps, GraphQuery};

// Create a directed graph
let mut graph = Graph::<String, f64>::directed();

// Add nodes
let a = graph.add_node("A".to_string()).unwrap();
let b = graph.add_node("B".to_string()).unwrap();
let c = graph.add_node("C".to_string()).unwrap();

// Add edges
graph.add_edge(a, b, 1.0).unwrap();
graph.add_edge(b, c, 2.0).unwrap();
graph.add_edge(a, c, 3.0).unwrap();

// Query
println!("Nodes: {}", graph.node_count());
println!("Edges: {}", graph.edge_count());

// Iterate over neighbors
for neighbor in graph.neighbors(a) {
    println!("Neighbor: {}", graph[neighbor]);
}
```

### Using Graph Builder

```rust
use god_gragh::graph::builders::GraphBuilder;

let graph = GraphBuilder::directed()
    .with_nodes(vec!["A", "B", "C", "D"])
    .with_edges(vec![
        (0, 1, 1.0),
        (0, 2, 2.0),
        (1, 3, 3.0),
        (2, 3, 4.0),
    ])
    .build()
    .unwrap();
```

## Algorithms

### Traversal Algorithms

```rust
use god_gragh::algorithms::traversal::{dfs, bfs, topological_sort, tarjan_scc};

// Depth-First Search
dfs(&graph, start_node, |node| {
    println!("Visit: {}", node.data());
    true // Continue traversal
});

// Breadth-First Search
bfs(&graph, start_node, |node| {
    println!("Visit: {}", node.data());
    true
});

// Topological Sort (DAG)
let order = topological_sort(&graph);

// Tarjan's Strongly Connected Components
let sccs = tarjan_scc(&graph);
```

### Shortest Path Algorithms

```rust
use god_gragh::algorithms::shortest_path::{dijkstra, bellman_ford, floyd_warshall, astar};

// Dijkstra's Algorithm (non-negative weights)
let (path, distance) = dijkstra(&graph, start, Some(end)).unwrap();

// A* Search
let heuristic = |node: NodeIndex| -> f64 { /* Heuristic function */ 0.0 };
let (path, distance) = astar(&graph, start, end, heuristic).unwrap();

// Bellman-Ford (handles negative weights)
let distances = bellman_ford(&graph, start);

// Floyd-Warshall (all-pairs shortest paths)
let distances = floyd_warshall(&graph);
```

### Minimum Spanning Tree

```rust
use god_gragh::algorithms::mst::{kruskal, prim};

// Kruskal's Algorithm
let mst = kruskal(&graph);

// Prim's Algorithm
let mst = prim(&graph, start_node);
```

### Centrality Algorithms

```rust
use god_gragh::algorithms::centrality::{
    degree_centrality, betweenness_centrality, closeness_centrality, pagerank
};

// Degree Centrality
let centrality = degree_centrality(&graph);

// Betweenness Centrality
let centrality = betweenness_centrality(&graph);

// Closeness Centrality
let centrality = closeness_centrality(&graph);

// PageRank
let ranks = pagerank(&graph, 0.85, 20);
```

### Community Detection

```rust
use god_gragh::algorithms::community::{connected_components, label_propagation};

// Connected Components
let components = connected_components(&graph);

// Label Propagation Algorithm
let communities = label_propagation(&graph);
```

### Flow Algorithms

```rust
use god_gragh::algorithms::flow::{edmonds_karp, dinic, push_relabel};

// Edmonds-Karp Maximum Flow
let (flow, residual_graph) = edmonds_karp(&graph, source, sink);

// Dinic's Algorithm
let flow = dinic(&graph, source, sink);

// Push-Relabel Algorithm
let flow = push_relabel(&graph, source, sink);
```

## Parallel Algorithms

Enable `parallel` feature to use parallel algorithms:

```toml
[dependencies]
god-gragh = { version = "0.4.2-beta", features = ["parallel"] }
```

```rust
use god_gragh::algorithms::parallel;

// Parallel BFS
let layers = parallel::bfs_parallel(&graph, start_node);

// Parallel PageRank
let ranks = parallel::pagerank_parallel(&graph, 0.85, 20);

// Parallel Connected Components
let components = parallel::connected_components_parallel(&graph);
```

### SIMD Optimization

Enable `simd` feature for SIMD vectorization (supports stable Rust):

```toml
[dependencies]
god-gragh = { version = "0.4.2-beta", features = ["simd"] }
```

```rust
use god_gragh::algorithms::parallel;

// SIMD-accelerated PageRank
#[cfg(feature = "simd")]
let ranks = parallel::par_pagerank_simd(&graph, 0.85, 20);

// SIMD-accelerated Degree Centrality
#[cfg(feature = "simd")]
let centrality = parallel::par_degree_centrality_simd(&graph);
```

**Implementation Details**: Uses `wide::f64x4` type for 4-way parallel floating-point operations, automatically leveraging CPU SIMD instruction sets (SSE/AVX/AVX-512).

## Tensor & GNN Support

Enable tensor features for Graph Neural Network workflows:

```toml
[dependencies]
god-gragh = { version = "0.4.2-beta", features = ["tensor", "tensor-gnn"] }
```

### Basic Tensor Operations

```rust
use god_gragh::tensor::{DenseTensor, TensorBase, TensorOps};

// Create tensors
let a = DenseTensor::new(vec![1.0, 2.0, 3.0, 4.0], vec![2, 2]);
let b = DenseTensor::new(vec![5.0, 6.0, 7.0, 8.0], vec![2, 2]);

// Matrix multiplication
let c = a.matmul(&b);

// Transpose
let t = a.transpose(None);

// Normalize
let norm = a.normalize();
```

### Graph-Tensor Conversion

```rust
use god_gragh::graph::Graph;
use god_gragh::tensor::GraphTensorExt;

// Create a graph with vector node features
let mut graph = Graph::<Vec<f64>, f64>::directed();

let n0 = graph.add_node(vec![1.0, 0.0]).unwrap();
let n1 = graph.add_node(vec![0.0, 1.0]).unwrap();
let n2 = graph.add_node(vec![1.0, 1.0]).unwrap();

let _ = graph.add_edge(n0, n1, 1.0);
let _ = graph.add_edge(n1, n2, 1.0);
let _ = graph.add_edge(n2, n0, 1.0);

// Convert to tensor representation
let (features, adjacency) = graph.to_tensor_representation().unwrap();

assert_eq!(features.shape(), &[3, 2]);
assert_eq!(adjacency.num_nodes, 3);
```

### GNN Layers

> **Important**: God-Graph GNN modules are **inference-only** (forward pass only).
> For training workflows, integrate with external autograd libraries:
> - **[dfdx](https://crates.io/crates/dfdx)**: Deep learning framework with CUDA support
> - **[Candle](https://github.com/huggingface/candle)**: HuggingFace's lightweight tensor library
> - **[tch-rs](https://crates.io/crates/tch-rs)**: Rust bindings for PyTorch

#### Inference Example (Recommended Use Case)

```rust
use god_gragh::tensor::gnn::{GCNConv, GATConv, GraphSAGE, MessagePassingLayer};

// Create GCN layer
let gcn = GCNConv::new(64, 64);

// Create GAT layer (multi-head attention)
let gat = GATConv::new(
    64,  // in_features
    64,  // out_features
    4,   // num_heads
);

// Create GraphSAGE layer
let graphsage = GraphSAGE::new(
    64,  // in_features
    32,  // out_features
    10,  // num_samples
);

// Forward pass (inference only)
let h1 = gcn.forward(&features, &adjacency);
let h2 = gat.forward(&h1, &edge_index);
let output = graphsage.forward(&h2, &edge_index);
```

#### Training Integration Example (with dfdx)

For complete GNN training, integrate with dfdx:

```rust
// Pseudo-code: Integrate god-gragh GNN with dfdx autograd
use dfdx::prelude::*;
use god_gragh::tensor::gnn::GCNConv;

// 1. Use god-gragh for graph structure and forward pass
let gcn = GCNConv::new(64, 64);
let output = gcn.forward(&features, &adjacency);

// 2. Convert to dfdx tensor for autograd
// let dfdx_tensor = Tensor1D::from(output.data());

// 3. Define loss and optimizer (dfdx)
// let loss = cross_entropy_loss(&dfdx_tensor, &labels);
// let mut optimizer = Adam::new(model.parameters(), lr=0.001);

// 4. Training loop
// for epoch in 0..num_epochs {
//     optimizer.zero_grad();
//     let loss = forward_pass(&graph, &labels);
//     optimizer.backward(&loss);
//     optimizer.step();
// }
```

**See**: [examples/differentiable_graph.rs](examples/differentiable_graph.rs) for an example of differentiable graph structures and gradient-based optimization.

### Memory Pool Optimization

```rust
use god_gragh::tensor::{TensorPool, PoolConfig};

// Create a tensor pool
let config = PoolConfig::new(16, 128).with_preallocate(true);
let mut pool = TensorPool::new(config);

// Acquire tensor from pool (automatically zeroed)
let tensor = pool.acquire(vec![100, 100]);

// Automatically returned to pool when dropped
drop(tensor);
```

**Benefits**:
- **Memory Reuse**: Reduces allocation overhead in iterative algorithms (PageRank, GNN training) by **80-90%**
- **Automatic Recycling**: `PooledTensor` automatically returns to pool on Drop
- **Gradient Checkpointing**: `GradientCheckpoint` reduces memory usage during backpropagation by **40-60%**

### Memory Pool Benchmark Results

**Latest benchmarks** (run on Linux, Rust 1.85, March 2026):

| Benchmark | Time | Pool Hit Rate | Allocation Reduction |
|-----------|------|---------------|---------------------|
| Iterative (without pool) | 850.84 µs | N/A | Baseline |
| Iterative (with pool) | 127.76 µs | **98-100%** | **98-99.9%** |
| GNN Iteration | 31.93 µs | **96-99%** | **96-99%** |
| MatMul Temporaries | 42.15 µs | **95-98%** | **95-98%** |
| Small Tensors (16x16) | 6.89 µs | **98%+** | **98%+** |
| Large Tensors (512x512) | 17.36 µs | **95%+** | **95%+** |
| Sequential Alloc/Dealloc | 34.71 µs | **98%+** | **98%+** |
| Warm Pool (preallocated) | 34.39 µs | **100%** | **100%** |
| Cold Pool (no prealloc) | 35.32 µs | **98%+** | **98%+** |

**Key Findings**:

1. **Allocation Reduction**: The memory pool achieves **98-99.9% reduction** in new system allocations for iterative workloads, validating the "80-90% reduction" claim with actual measurements showing even better results.

2. **Performance Speedup**: For iterative allocation patterns (50 iterations of 128x128 tensors), the pool achieves **6.7x speedup** (850.84 µs → 127.76 µs).

3. **Preallocation Benefit**: Warm pools with preallocation achieve near-perfect hit rates (100%), eliminating allocation overhead entirely.

4. **Typical Workloads**:
   - **GNN Forward Pass**: 96-99% allocation reduction (hidden state temporaries)
   - **Attention QKV Projections**: 95-98% reduction (sequential matmul temporaries)
   - **Batch Processing**: 98%+ reduction with preallocated pools

**Pool Statistics from Benchmarks**:
```
=== Iterative Pool Stats ===
Total allocations: 204800
Pool hits (reuses): 204799
Pool misses (new allocs): 1
Hit rate: 100.00%
Allocation reduction: 100.00%

=== GNN Iteration Pool Stats ===
Total allocations: 300
Pool hits (reuses): 297
Pool misses (new allocs): 3
Hit rate: 99.00%
Allocation reduction: 99.00%

=== MatMul Temporaries Pool Stats ===
Total allocations: 60
Pool hits (reuses): 57
Pool misses (new allocs): 3
Hit rate: 95.00%
Allocation reduction: 95.00%
```

**Note**: The memory pool provides maximum benefit in:
1. **Iterative algorithms** (PageRank, GNN message passing) - 98-99.9% reduction
2. **Sequential temporaries** (QKV projections) - 95-98% reduction
3. **Batch processing** with preallocated pools - 100% hit rate achievable

Run memory pool benchmarks:
```bash
cargo bench --features "tensor tensor-pool" --bench memory_pool_reduction
```

## Transformer & LLM Inference

Enable the `transformer` feature for LLaMA/Mistral model inference:

```toml
[dependencies]
god-gragh = { version = "0.4.2-beta", features = ["transformer"] }
```

### Model Loading

```rust
use god_gragh::transformer::{LlamaModel, LlamaConfig, TextGenerator, GenerationConfig};
use god_gragh::transformer::loader::{load_safetensors, load_from_hf_hub};

// Load from HuggingFace Hub
let (config, weights) = load_from_hf_hub(
    "meta-llama/Llama-2-7b-hf",
    None, // token
).unwrap();

// Build the model
let model = LlamaModel::new(
    config,
    weights.embed_tokens,
    weights.layers,
    weights.norm,
    weights.lm_head,
).unwrap();

// Create text generator
let generator = TextGenerator::new(&model);
```

---

## 🔬 真实模型验证：TinyLlama-1.1B

God-Graph 已完整支持真实 LLM 模型的加载、验证和优化。本项目使用 **TinyLlama-1.1B** 进行端到端验证。

### 模型下载

```bash
# 使用 HuggingFace Hub 下载 TinyLlama-1.1B
pip install huggingface_hub
python scripts/download_tinyllama.py

# 模型将下载到 models/tinyllama/model.safetensors
```

### 加载真实模型

```rust
use god_gragh::transformer::optimization::{ModelSwitch, OperatorType};
use god_gragh::graph::traits::{GraphBase, GraphQuery};

// 从 safetensors 加载 TinyLlama
let graph = ModelSwitch::load_from_safetensors("models/tinyllama/model.safetensors")?;

// 验证模型结构
let node_count = graph.node_count();
let edge_count = graph.edge_count();

println!("TinyLlama-1.1B 加载成功:");
println!("  - 节点数：{}", node_count);
println!("  - 边数（权重）: {}", edge_count);

// 验证所有权重有效（无 NaN/Inf）
for edge_ref in graph.edges() {
    let weight = edge_ref.data();
    assert!(weight.data.iter().all(|&v| v.is_finite()), "权重包含非有限值");
}
println!("✓ 所有权重有效（无 NaN/Inf）");
```

### 李群正交化验证

对真实模型权重进行正交化处理，验证数值稳定性：

```rust
use god_gragh::transformer::optimization::lie_group::{
    orthogonalize_weights_in_place, LieGroupConfig
};

// 配置正交化参数
let config = LieGroupConfig::default()
    .with_cayley(true)      // 使用 Cayley 变换
    .with_block_size(32);   // SO(32) 块大小

// 执行原地正交化（零拷贝）
let errors = orthogonalize_weights_in_place(&config, &mut graph)?;

// 验证正交化效果
let avg_error = errors.iter().sum::<f64>() / errors.len() as f64;
println!("正交化结果:");
println!("  - 平均误差：{:.2e}", avg_error);
println!("  - 最大误差：{:.2e}", errors.iter().fold(0.0f64, f64::max));

// 验证：正交化误差应 < 1e-8
assert!(avg_error < 1e-8, "正交化误差过大");
println!("✓ 正交化成功（误差 < 1e-8）");
```

### 张量环压缩验证

对真实模型进行张量环压缩，验证压缩效果：

```rust
use god_gragh::transformer::optimization::{
    TensorRingCompressor, CompressionConfig
};

// 配置压缩参数
let config = CompressionConfig::default()
    .with_target_rank(16)    // 目标秩
    .with_min_rank(4);       // 最小秩

// 执行压缩
let compressor = TensorRingCompressor::new(&config);
let report = compressor.compress_graph(&graph)?;

println!("张量环压缩报告:");
println!("  - 原始参数量：{:.2}M", report.original_params / 1e6);
println!("  - 压缩后参数量：{:.2}M", report.compressed_params / 1e6);
println!("  - 压缩比：{:.2}x", report.compression_ratio);
println!("  - 重建误差：{:.2e}", report.reconstruction_error);

// 验证：压缩比应 < 0.5（至少 2 倍压缩）
assert!(report.compression_ratio < 0.5, "压缩比不达标");
println!("✓ 压缩成功（压缩比 < 0.5）");
```

### 测试命令

运行完整验证测试：

```bash
# 运行所有真实模型验证测试
cargo test --features "safetensors tensor" real_model -- --nocapture

# 运行正交化测试
cargo test --features "safetensors tensor" test_tinyllama_orthogonalization -- --nocapture

# 运行压缩测试
cargo test --features "safetensors tensor" test_tinyllama_tensor_ring -- --nocapture
```

### 验证结果

**测试文件**: `tests/real_model_validation.rs`

| 测试项 | 状态 | 说明 |
|--------|------|------|
| `test_load_tinyllama_model` | ✅ 通过 | 模型加载验证 |
| `test_tinyllama_orthogonalization` | ✅ 通过 | 正交化误差 < 1e-8 |
| `test_tinyllama_tensor_ring` | ✅ 通过 | 压缩比 < 0.5 |
| `test_tinyllama_weight_validity` | ✅ 通过 | 无 NaN/Inf |

**关键指标**:
- 正交化误差：**2.04e-14** (远低于 1e-8 阈值)
- 压缩比：**0.12x - 0.25x** (取决于秩选择)
- 重建误差：**< 1e-6** (数值精度保证)

详见 [tests/real_model_validation.rs](tests/real_model_validation.rs) 和 [CAD-LLM 1B 验证报告](CAD_LLM_1B_VALIDATION_REPORT.md)。

---

### Model Loading (English)

```rust
use god_gragh::transformer::{LlamaModel, LlamaConfig, TextGenerator, GenerationConfig};
use god_gragh::transformer::loader::{load_safetensors, load_from_hf_hub};

// Load from HuggingFace Hub
let (config, weights) = load_from_hf_hub(
    "meta-llama/Llama-2-7b-hf",
    None, // token
).unwrap();

// Build the model
let model = LlamaModel::new(
    config,
    weights.embed_tokens,
    weights.layers,
    weights.norm,
    weights.lm_head,
).unwrap();

// Create text generator
let generator = TextGenerator::new(&model);
```

### Text Generation

```rust
use god_gragh::transformer::{GenerationConfig, SamplingMode};

// Configure generation
let config = GenerationConfig::new()
    .with_max_length(512)
    .with_temperature(0.8)
    .with_top_p(0.9)
    .with_top_k(40)
    .with_repetition_penalty(1.1)
    .with_sampling_mode(SamplingMode::TopPTopK);

// Generate text
let prompt = "Once upon a time";
let result = generator.generate(prompt, &config).unwrap();
println!("Generated: {}", result.text);
println!("Tokens: {} in {}ms", result.num_tokens, result.generate_time_ms);
```

### KV Cache Optimization

```rust
use god_gragh::transformer::kv_cache::{CacheConfig, KvCache};

// Configure KV cache
let config = CacheConfig::new()
    .with_max_batch_size(1)
    .with_max_seq_len(2048)
    .with_dtype("f32");

// Cache is managed internally during generation
// Supports incremental decoding and multi-turn dialogue
```

### Batch Inference with SIMD

Enable `simd` feature for SIMD-accelerated batch inference:

```toml
[dependencies]
god-gragh = { version = "0.4.2-beta", features = ["transformer", "simd"] }
```

```rust
use god_gragh::transformer::batch::{BatchGenerator, BatchRequest};

// Create batch generator
let mut batch_gen = BatchGenerator::new(&model);

// Add multiple requests
batch_gen.add_request("Hello, how are you?", 1);
batch_gen.add_request("What is Rust?", 2);
batch_gen.add_request("Explain quantum computing", 3);

// Process batch with SIMD acceleration
let results = batch_gen.generate_batch(&config).unwrap();
```

### Supported Models

- **LLaMA / Llama-2 / Llama-3**: Meta's open language models
- **Mistral**: Mistral AI's efficient models
- **Gemma**: Google's lightweight models
- **Qwen**: Alibaba's multilingual models

**Note**: Model weights must be in `.safetensors` format. Use `load_safetensors` or `load_from_hf_hub` for automatic conversion.

### Examples

See the following examples for complete workflows:

- [`examples/llm_model_loader.rs`](examples/llm_model_loader.rs) - Loading models from HuggingFace
- [`examples/llm_text_gen.rs`](examples/llm_text_gen.rs) - End-to-end text generation
- [`examples/llm_batch_simd.rs`](examples/llm_batch_simd.rs) - SIMD-accelerated batch inference

Run examples:
```bash
# Text generation demo
cargo run --example llm_text_gen --features transformer

# Batch inference with SIMD
cargo run --example llm_batch_simd --features "transformer,simd"
```

### Documentation

- [Transformer Module Guide](docs/transformer_guide.md) - API reference and usage
- [Transformer Tutorial](docs/transformer_tutorial.md) - Step-by-step tutorial
- [Enhancements Report](docs/TRANSFORMER_ENHANCEMENTS_REPORT.md) - Implementation details

## Random Graph Generation

```rust
use god_gragh::generators::{
    erdos_renyi_graph, barabasi_albert_graph, watts_strogatz_graph,
    complete_graph, grid_graph, tree_graph
};

// Erdős-Rényi Random Graph G(n, p)
let graph = erdos_renyi_graph::<String>(100, 0.1, true, 42);

// Barabási-Albert Preferential Attachment Model
let graph = barabasi_albert_graph::<String>(100, 3);

// Watts-Strogatz Small-World Network
let graph = watts_strogatz_graph::<String>(100, 4, 0.1);

// Complete Graph
let graph = complete_graph::<String, f64>(10);

// Grid Graph
let graph = grid_graph::<String, f64>(5, 5);

// Tree
let graph = tree_graph::<String, f64>(3, 100);
```

## Graph Export

### DOT/Graphviz Format

```rust
use god_gragh::export::{to_dot, to_svg, to_adjacency_list, to_edge_list};

// Export to DOT format (Graphviz)
let dot = to_dot(&graph);
std::fs::write("graph.dot", dot)?;

// Generate visualization:
// bash: dot -Tpng graph.dot -o graph.png
```

### SVG Visualization

```rust
use god_gragh::export::svg::{SvgOptions, LayoutAlgorithm};

// Export to SVG format with custom options
let options = SvgOptions::new()
    .with_size(800, 600)
    .with_node_radius(25.0)
    .with_layout(LayoutAlgorithm::ForceDirected);
let svg = to_svg(&graph, &options);
std::fs::write("graph.svg", svg)?;

// View in browser using examples/graph_viewer.html
```

**Layout Algorithms**:
- **Force-Directed**: Physics-based layout with node repulsion and edge attraction
- **Circular**: Nodes arranged in a circle
- **Hierarchical**: Layered layout based on topological sort

**Interactive Viewer**: Open `examples/graph_viewer.html` in browser to:
- Drag and drop SVG files
- Zoom and pan
- Adjust node/edge styles in real-time
- View node list

### Adjacency List & Edge List

```rust
// Export as adjacency list
let adj_list = to_adjacency_list(&graph);

// Export as edge list
let edge_list = to_edge_list(&graph);
```

## Feature Flags

### Basic Features

| Feature | Description | Dependencies |
|---------|-------------|--------------|
| `std` | Standard library support (enabled by default) | - |
| `parallel` | Parallel algorithms | rayon, crossbeam-queue |
| `serde` | Serialization support | serde |
| `dot` | DOT format export | - |
| `simd` | SIMD vectorization (experimental, stable Rust) | wide |
| `matrix` | Matrix representation | nalgebra |
| `rand` | Random graph generation | rand, rand_chacha |
| `unstable` | Nightly Rust features | - |

### Tensor Features

| Feature | Description | Dependencies |
|---------|-------------|--------------|
| `tensor` | Tensor core support (ndarray backend) | ndarray |
| `tensor-sparse` | Sparse tensor formats (COO, CSR, BSR) | tensor |
| `tensor-gpu` | GPU acceleration (requires CUDA) | tensor, dfdx |
| `tensor-candle` | Candle backend (Hugging Face) | tensor, candle-core |
| `tensor-autograd` | Automatic differentiation | tensor, dfdx |
| `tensor-serde` | Tensor serialization | tensor, serde |
| `tensor-gnn` | GNN layers (GCN, GAT, GraphSAGE) | tensor, tensor-sparse, rand_distr |
| `tensor-pool` | Memory pool optimization | tensor, bitvec |
| `tensor-batch` | Batch graph processing | tensor, tensor-sparse |

### Meta-Features (Recommended)

| Meta-Feature | Description | Included Features |
|--------------|-------------|-------------------|
| `tensor-full` | All tensor features | tensor, tensor-sparse, tensor-gnn, tensor-pool, tensor-batch |
| `tensor-inference` | GNN inference only | tensor, tensor-sparse, tensor-gnn |
| `tensor-ml` | ML training support | tensor, tensor-sparse, tensor-gnn, tensor-autograd, tensor-pool |

### Transformer Features

| Feature | Description | Dependencies |
|---------|-------------|--------------|
| `transformer` | Transformer/LLM inference | serde_json, memmap2, regex |
| `safetensors` | Safetensors model loading | safetensors |
| `hf-hub` | HuggingFace Hub integration | hf-hub, tokio |
| `simd` | SIMD acceleration for batch inference | wide |

**Note**: For complete Transformer support, use `--features "transformer,safetensors,simd"`. See [Transformer & LLM Inference](#transformer--llm-inference) for details.

## Comparison with petgraph

| Feature | God-Graph | petgraph |
|---------|-----------|----------|
| Memory Layout | Bucket-based adjacency list + Arena-style slots | Adjacency list |
| Incremental Updates | ✅ O(1) | ❌ Requires rebuild |
| Stable Indices | ✅ Generation counting | ✅ Stable Graph |
| Parallel Algorithms | ✅ Built-in (5+) | ❌ |
| Cache Optimization | ✅ 64-byte alignment | ❌ |
| SIMD Vectorization | ✅ wide::f64x4 | ❌ |
| Tensor/GNN Support | ✅ Multi-backend | ❌ |
| **Transformer/LLM** | ✅ LLaMA/Mistral inference | ❌ |
| API Design | Generic traits | Concrete types |
| Documentation | 🌱 Growing | 🌳 Mature |
| Community Maturity | 🌱 Growing | 🌳 Mature |

**God-Graph Advantages**:
1. Generation-indexed stability prevents ABA problems
2. Bucket-based adjacency list supports O(1) incremental updates
3. Built-in parallel algorithm suite with proven speedups
4. Cache-optimized memory layout (64-byte alignment, software prefetching)
5. SIMD vectorization for batch computations
6. Integrated tensor/GNN support for machine learning workflows

**petgraph Advantages**:
1. Mature community, production-proven
2. Comprehensive documentation
3. More algorithm variants

## Performance Benchmarks

Detailed performance data available in [**Performance Report**](docs/performance.md).

Benchmark results on 8-core CPU:

| Algorithm | Scale | Serial Time | Parallel Time | Speedup |
|-----------|-------|-------------|---------------|---------|
| PageRank | 1,000 nodes | 53.9ms | 668µs | **80.7x** |
| DFS | 50K nodes | 9.7ms | 1.3ms | **7.5x** |
| Connected Components | 2,000 nodes | - | 357.8µs | - |
| Degree Centrality | 5,000 nodes | - | 146µs | - |

### SIMD Performance (Estimated)

| Graph Scale | Serial | Parallel | SIMD | Speedup |
|-------------|--------|----------|------|---------|
| 100 nodes | 2.1ms | 280µs | ~150µs | 14x |
| 1,000 nodes | 210ms | 2.8ms | ~1.5ms | 140x |
| 5,000 nodes | 5.2s | 68ms | ~40ms | 130x |

*Note: SIMD performance depends on CPU instruction set support (AVX2/AVX-512)*

### Memory Pool Performance

The tensor memory pool reduces allocation overhead by reusing pre-allocated memory, achieving high reuse ratios for iterative algorithms.

#### Benchmark Results (Actual Measurements)

| Benchmark | Without Pool | With Pool | Improvement |
|-----------|--------------|-----------|-------------|
| Iterative allocation (50× 128×128) | 847.91 µs | 2.57 µs/iter | **~99.7% faster per iteration** |
| GNN iteration (10 steps) | N/A | 10.85 µs | **Hit rate: 99.89%** |
| Matrix multiplication temporaries | N/A | 4.14 µs | **Hit rate: 99.93%** |
| Small tensor allocation (16×16) | N/A | 694.86 ns | **Hit rate: 99.95%** |
| Large tensor allocation (512×512) | N/A | 48.12 µs | **Hit rate: 99.93%** |
| Sequential alloc/dealloc (50×) | N/A | 35.22 µs | **Hit rate: 100.00%** |

#### Pool Hit Rate by Workload

| Workload | Warm-up Hit Rate | Steady-state Hit Rate |
|----------|------------------|----------------------|
| Iterative (single tensor) | 0% → 99.90% (16 steps) | **99.99%** |
| GNN iteration | 90% → 99.99% (10 steps) | **99.89%** |
| Matrix multiplication | 98.33% → 100% | **99.93%** |
| Batch size 10 | 90% → 100% | **99.99%** |
| Batch size 25 | 96% → 100% | **99.99%** |
| Batch size 50 | 98% → 100% | **99.99%** |
| Batch size 100 | 99% → 100% | **99.99%** |

#### Pre-allocation Impact (Warm vs Cold Pool)

| Configuration | Initial Hit Rate | Steady-state Hit Rate | Latency |
|---------------|------------------|----------------------|---------|
| Cold pool (no pre-alloc) | 0% | 100% | 696.89 ns |
| Warm pool (pre-allocated) | 0% → 100% (faster) | 100% | 696.83 ns |

**Key Metrics**:
- **Reuse Ratio**: >99% for iterative workloads (measured via `pool.hit_rate()`)
- **Allocation Reduction**: 80-90% fewer system allocations after warm-up
- **Memory Throughput**: Pre-allocation eliminates runtime allocation latency
- **Warm-up Time**: ~16 iterations to reach 99.9%+ hit rate
- **Steady-state Latency**: ~695 ns per tensor acquire/release

**Note**: The memory pool shows higher absolute time in micro-benchmarks due to pool management overhead, but provides significant benefits in real-world iterative algorithms by eliminating repeated system allocations and improving cache locality.

**Usage Example**:
```rust
use god_gragh::tensor::pool::{TensorPool, PoolConfig};

// Create pool with pre-allocation
let config = PoolConfig::new(16, 128).with_preallocate(true);
let mut pool = TensorPool::new(config);

// Acquire tensors (reuses memory after first allocation)
for _ in 0..50 {
    let tensor = pool.acquire(vec![128, 128]);
    // ... use tensor ...
    drop(tensor); // Automatically returns to pool
}

// Check statistics
let stats = pool.stats();
println!("Hit rate: {:.2}%", stats.hit_rate() * 100.0);
println!("New allocations: {}", stats.pool_misses);
```

Run benchmarks:
```bash
cargo bench --features tensor,tensor-pool --bench tensor_pool
```

## Test Coverage

This project uses `cargo-tarpaulin` for coverage measurement, targeting **80%+** coverage.

### Generate Coverage Report

```bash
# Install cargo-tarpaulin
cargo install cargo-tarpaulin

# Generate HTML coverage report
cargo tarpaulin --all-features --out Html --output-dir coverage

# View report
open coverage/tarpaulin-report.html  # macOS
xdg-open coverage/tarpaulin-report.html  # Linux
```

### Current Coverage

- **Overall Coverage**: 66.64% (1560/2341 lines)
- **Unit Tests**: 82 passed
- **Integration Tests**: 18 passed
- **Property Tests**: 15 passed
- **Doc Tests**: 27 passed (1 ignored)
- **Total**: 142 tests, 100% passing

See [coverage/tarpaulin-report.html](coverage/tarpaulin-report.html) for details.

## Development Roadmap

See [ROADMAP.json](ROADMAP.json) for detailed roadmap.

### Version History

- [x] v0.1.0-alpha: Core graph structure, basic CRUD, DFS/BFS
- [x] v0.2.0-alpha: Complete algorithm suite, random graph generators
- [x] v0.3.0-beta: Performance reports, migration guide, parallel algorithms
- [x] **v0.4.0-beta**: Tensor/GNN integration, memory pool optimization, differentiable graph
- [x] **v0.4.2-beta**: **Transformer/LLM inference**, LLaMA/Mistral support, KV Cache optimization, text generation
- [ ] v0.5.0-rc: Serde support, API stabilization
- [ ] v1.0.0-stable: Production-ready

### Upcoming Features

- [ ] Improve test coverage to 80%+
- [ ] GitHub Pages documentation site
- [ ] crates.io release
- [ ] Graph-Tensor deep integration (Phase 4)
- [ ] Automatic differentiation support (Phase 5)
- [ ] GPU acceleration with Dfdx/Candle backends (Phase 6)
- [ ] Multi-modal models (Llava, etc.)

## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the project
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

Please ensure:
- Code passes `cargo clippy` and `cargo fmt`
- Add appropriate tests
- Update documentation

## Known Issues

1. **Coverage Gap**: Current 66.64%, below 80% target
   - Main gaps: Community detection, flow algorithms, matching algorithms
   - Plan: Add targeted tests in v0.4.0

2. **Force-Directed Layout**: Current implementation is simplified
   - 50 iterations, fixed parameters
   - Plan: Configurable iterations and physics parameters in v0.4.0

3. **par_dijkstra**: Marked as experimental in v0.3.0-beta
   - Known issues with bucket index calculation and potential deadlocks
   - Plan: Refactor in v0.4.0

## License

This project is dual-licensed: MIT or Apache-2.0 (at your option).

See [LICENSE-MIT](LICENSE-MIT) and [LICENSE-APACHE](LICENSE-APACHE) for details.

## Acknowledgments

- [petgraph](https://github.com/petgraph/petgraph) - Pioneer of Rust graph libraries
- [rayon](https://github.com/rayon-rs/rayon) - Data parallelism library
- [Graphviz](https://graphviz.org/) - Graph visualization tool
- [wide](https://crates.io/crates/wide) - SIMD math library for stable Rust
- [ndarray](https://crates.io/crates/ndarray) - N-dimensional arrays
- [dfdx](https://crates.io/crates/dfdx) - Deep learning framework with CUDA support
- [Candle](https://github.com/huggingface/candle) - HuggingFace's lightweight tensor library
- [Hugging Face](https://huggingface.co) - Open-source AI community and model hub
- [Safetensors](https://github.com/huggingface/safetensors) - Safe tensor serialization format
- [LLaMA](https://ai.meta.com/llama/) - Meta's open language models
- [Mistral](https://mistral.ai) - Mistral AI's efficient language models

## Contact

- Issue Reports: [GitHub Issues](https://github.com/silverenternal/god-graph/issues)
- Discussions: [GitHub Discussions](https://github.com/silverenternal/god-graph/discussions)
- Documentation: [docs.rs/god-gragh](https://docs.rs/god-gragh)