# AGENTS.md — AI Agent Technical Context

## Project Overview

**attnres** is the first Rust implementation of Attention Residuals (MoonshotAI/Kimi paper) using the [burn](https://github.com/tracel-ai/burn) deep learning framework. It provides a drop-in replacement for standard residual connections in Transformers.

## Tech Stack

| Component    | Technology                        | Version / Notes                          |
|--------------|-----------------------------------|------------------------------------------|
| Language     | Rust                              | 2021 edition (1.80+)                     |
| ML Framework | burn                              | 0.20                                     |
| Test Backend | NdArray                           | CPU, deterministic                       |
| Testing      | cargo test + proptest + criterion | —                                        |
| Linting      | clippy + rustfmt                  | —                                        |
| CI           | GitHub Actions                    | jobs: test, clippy, fmt, build-examples  |

## Project Structure

```
src/
├── lib.rs              # Public API re-exports + module declarations
├── config.rs           # AttnResConfig — validated builder pattern (JSON save/load)
├── attn_res_op.rs      # Core AttnRes operation (depth-wise softmax attention)
├── block_state.rs      # BlockState — cumulative block representation tracking
├── layer.rs            # AttnResLayer — transformer layer with dual AttnRes
├── model.rs            # AttnResTransformer — full model with standard + two-phase forward
├── rms_norm.rs         # RMSNorm implementation
├── serialization.rs    # Model weight save/load (NamedMpk, binary, compact formats)
├── two_phase.rs        # Two-phase inference primitives (phase1_batched, online_softmax_merge)
├── attention.rs        # Multi-head self-attention
├── feed_forward.rs     # Two-layer MLP with GELU activation
└── utils.rs            # Causal mask generation helpers

tests/
├── unit_tests.rs       # Core algorithm correctness tests
├── differential_tests.rs # PyTorch reference comparison tests
├── property_tests.rs   # proptest property-based tests
└── integration_tests.rs # Full model training loop tests

examples/
├── train_tiny.rs       # Train a small model on synthetic data
├── compare_residuals.rs # Compare AttnRes vs standard residuals
└── visualize_weights.rs # Visualize depth attention patterns

benches/
└── attn_res_benchmark.rs # Criterion benchmarks

fixtures/                # Reference outputs from PyTorch
├── attn_res_forward.json
└── block_state_tracking.json

web-demo/                # Interactive web demo (WASM + Vite)
├── crate/               # Rust WASM crate (pure-Rust AttnRes reimplementation)
│   ├── Cargo.toml
│   └── src/lib.rs       # wasm-bindgen exports: AttnResEngine
├── src/                 # TypeScript frontend
│   ├── main.ts          # App entry point
│   ├── style.css        # Academic-grade styling
│   ├── viz.ts           # Canvas 2D heatmaps, charts
│   └── diagrams.ts      # Static architectural diagrams
├── index.html           # Single-page app
├── package.json         # Vite + TypeScript
└── vite.config.ts       # Build config
```

## Commands

```bash
cargo build                        # Build the project
cargo test --all-features          # Run all 87 tests
cargo test test_name               # Run specific test
cargo clippy -- -D warnings        # Lint (warnings = errors)
cargo fmt                          # Format code
cargo fmt -- --check               # Check formatting without modifying
cargo bench                        # Run Criterion benchmarks
cargo run --example train_tiny     # Train example
cargo run --example compare_residuals  # Comparison example
cargo run --example visualize_weights  # Visualization example

# Web demo
cd web-demo && npm run build:wasm     # Build WASM crate
cd web-demo && npm run dev            # Start Vite dev server
cd web-demo && npm run build          # Production build (WASM + Vite)
```

## Architecture Essentials

### Core Algorithm (AttnRes)

Standard residual: `x_{l+1} = x_l + f_l(x_l)` (fixed unit weights)

AttnRes: `x_{l+1} = Σ_i α_i · v_i` where `α = softmax(w_l · RMSNorm(V))` is computed over the depth dimension, with `V` the stack of previous sublayer outputs `v_i`

Key invariants:
1. **Zero-init pseudo-queries** → starts as uniform averaging (standard residual behavior)
2. **Two AttnRes per transformer layer** — one before self-attention, one before MLP
3. **Softmax over depth** (block/layer dimension), NOT over sequence tokens
4. **RMSNorm on keys** to prevent magnitude domination
5. **Block boundaries** at every `block_size/2` sublayers
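
A minimal sketch of the core operation for a single token, using plain `Vec<f32>` rather than burn tensors (function names and the `eps` value are illustrative, not the actual `src/attn_res_op.rs` API):

```rust
/// RMS-normalize a vector: x / sqrt(mean(x²) + eps).
fn rms_norm(x: &[f32], eps: f32) -> Vec<f32> {
    let ms = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let scale = 1.0 / (ms + eps).sqrt();
    x.iter().map(|v| v * scale).collect()
}

/// Depth-wise AttnRes for one token: `values[i]` is the output of
/// sublayer i, `w` is the learned pseudo-query of the current sublayer.
/// Assumes at least one prior sublayer output.
fn attn_res(values: &[Vec<f32>], w: &[f32]) -> Vec<f32> {
    // Score each depth entry against the pseudo-query; keys are
    // RMS-normalized so magnitudes cannot dominate (invariant 4).
    let scores: Vec<f32> = values
        .iter()
        .map(|v| rms_norm(v, 1e-6).iter().zip(w).map(|(k, wi)| k * wi).sum())
        .collect();

    // Softmax over depth (NOT over sequence tokens, invariant 3).
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();

    // Weighted sum of the raw (un-normalized) values.
    let mut out = vec![0.0; values[0].len()];
    for (v, e) in values.iter().zip(&exps) {
        let a = e / sum;
        for (o, vi) in out.iter_mut().zip(v) {
            *o += a * vi;
        }
    }
    out
}
```

With a zero-initialized `w`, every score is 0, so `α` is uniform and the op reduces to uniform averaging over depth (invariant 1).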

### Data Flow

```
Input IDs → Embedding → [AttnResLayer × N] → RMSNorm → LM Head → Logits

Each AttnResLayer applies two AttnRes ops:
  AttnResOp(pre-attn) → RMSNorm → MultiHeadAttention
  AttnResOp(pre-mlp)  → RMSNorm → FeedForward
```
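
In code, one layer step might look like the following sketch (reusing `attn_res` and `rms_norm` from the sketch above; `BlockState` and every other name here are hypothetical stand-ins for the crate's actual types in `block_state.rs` and `layer.rs`):

```rust
/// Hypothetical depth-history container; the real one lives in block_state.rs.
struct BlockState {
    history: Vec<Vec<f32>>, // outputs of all prior sublayers (the depth dim)
}

/// One AttnResLayer step: two AttnRes ops, each feeding a normalized input
/// to its sublayer and pushing the result onto the depth history.
fn layer_step(
    state: &mut BlockState,
    w_pre_attn: &[f32], // zero-initialized pseudo-query (invariant 1)
    w_pre_mlp: &[f32],
    attention: impl Fn(&[f32]) -> Vec<f32>,
    feed_forward: impl Fn(&[f32]) -> Vec<f32>,
) {
    // Pre-attention AttnRes over the depth history, then self-attention.
    let x = attn_res(&state.history, w_pre_attn);
    state.history.push(attention(&rms_norm(&x, 1e-6)));

    // Pre-MLP AttnRes over the (now longer) history, then the MLP.
    let x = attn_res(&state.history, w_pre_mlp);
    state.history.push(feed_forward(&rms_norm(&x, 1e-6)));
}
```

Each push extends the depth dimension, so later sublayers attend over a longer history (invariant 2).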

### Configuration

`AttnResConfig::new(d_model, num_layers, num_blocks)` where:
- `d_model`: Hidden dimension
- `num_layers`: Number of **sublayers** (transformer layers × 2)
- `num_blocks`: Number of blocks for Block AttnRes (set equal to `num_layers` for Full AttnRes)
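
A hedged usage sketch (the constructor signature is from above; the JSON `save`/`load` method names follow burn's `Config` conventions and are assumptions about this crate's API):

```rust
use attnres::AttnResConfig;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 4 transformer layers => 8 sublayers (num_layers = layers × 2);
    // num_blocks == num_layers selects Full AttnRes.
    let config = AttnResConfig::new(256, 8, 8);

    // JSON round-trip, per config.rs (method names are assumptions).
    config.save("attn_res_config.json")?;
    let _restored = AttnResConfig::load("attn_res_config.json")?;
    Ok(())
}
```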

## Boundaries

### Read-Only (never modify)
- `spec.md`, `paper.md`, `research_report.md`, `implementation_plan.md`, `LICENSE`

### Gated (requires approval)
- `Cargo.toml` (dependency changes)
- `.github/workflows/` (CI changes)
- `cargo publish`

## Source of Truth

`spec.md` is the authoritative specification. All algorithm implementations must match the pseudocode and equations defined there.

## Web Demo

The `web-demo/` directory contains a fully interactive browser-based demo. The WASM crate (`web-demo/crate/`) is a pure-Rust reimplementation of the core AttnRes algorithm (no burn dependency for WASM portability), faithfully mirroring `src/attn_res_op.rs`. It exposes:

- `AttnResEngine` — model creation, forward pass, training simulation
- `compute_attn_res()` — interactive core operation with custom pseudo-queries
- `train_step()` — simulated training showing depth attention pattern emergence

Frontend: Vite + TypeScript with Canvas 2D visualizations (heatmaps, bar charts, loss curves). Academic design with full algorithm explanation.
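
The exports follow the standard wasm-bindgen pattern. A hedged sketch of what one export's shape might look like (the signature is an assumption; the real ones live in `web-demo/crate/src/lib.rs`, and RMSNorm on the keys is omitted here for brevity):

```rust
use wasm_bindgen::prelude::*;

/// Hypothetical shape of the `compute_attn_res` export. `values` is the
/// flattened depth history ([num_sublayers * d_model]); the return value
/// is the softmax weight assigned to each depth entry.
#[wasm_bindgen]
pub fn compute_attn_res(values: &[f32], pseudo_query: &[f32], d_model: usize) -> Vec<f32> {
    let n = values.len() / d_model;

    // Dot each depth entry against the caller-supplied pseudo-query.
    let scores: Vec<f32> = (0..n)
        .map(|i| {
            values[i * d_model..(i + 1) * d_model]
                .iter()
                .zip(pseudo_query)
                .map(|(v, q)| v * q)
                .sum()
        })
        .collect();

    // Numerically stable softmax over the depth dimension.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```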

## Known Gaps

- No PyTorch checkpoint loading (safetensors format)
- GPU backends (wgpu, CUDA, Metal) untested
- No distributed training support
- No pre-trained weight import/export utilities