# deep-delta-learning
[![CI](https://github.com/AbdelStark/ddl-rs/actions/workflows/ci.yml/badge.svg)](https://github.com/AbdelStark/ddl-rs/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Rust](https://img.shields.io/badge/rust-2024_edition-orange.svg)](https://www.rust-lang.org)
A Rust implementation of **Deep Delta Learning** (DDL) on [Burn](https://burn.dev) — rank-deficient residual operators with learnable spectral properties for transformer language models.
```
x' = x + beta * k * (v - k^T x)
         \____________________/
      correction toward target v,
      along projection direction k,
      with learnable strength beta in (0, 2)
```
DDL replaces the standard residual connection `x' = x + f(x)` with a geometrically motivated update that projects the hidden state toward a learned target along a learned direction. The scalar `beta` controls the spectral regime of each layer — from near-identity (beta -> 0) through projection (beta -> 1) to reflection (beta -> 2) — giving the model a continuous knob between "change nothing" and "invert the residual stream."
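The update is easy to see on plain vectors. A minimal sketch (illustration only; the crate's real operator runs on Burn tensors, and `delta_update` is a hypothetical stand-in name):

```rust
/// Rank-1 delta update on plain vectors. `k` is assumed unit-normalized.
fn delta_update(x: &[f64], k: &[f64], v: f64, beta: f64) -> Vec<f64> {
    // proj = k^T x
    let proj: f64 = k.iter().zip(x).map(|(ki, xi)| ki * xi).sum();
    // x' = x + beta * k * (v - proj)
    x.iter().zip(k).map(|(xi, ki)| xi + beta * ki * (v - proj)).collect()
}

fn main() {
    let (x, k, v) = ([3.0, 4.0], [1.0, 0.0], 1.0);
    // beta -> 0: near identity, x barely moves
    assert!((delta_update(&x, &k, v, 0.01)[0] - 3.0).abs() < 0.1);
    // beta = 1: exact projection, so k^T x' == v
    assert_eq!(delta_update(&x, &k, v, 1.0), vec![1.0, 4.0]);
    // beta = 2: reflection through the target hyperplane
    assert_eq!(delta_update(&x, &k, v, 2.0), vec![-1.0, 4.0]);
}
```

Components orthogonal to `k` (here, `x[1]`) pass through untouched regardless of `beta`.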
## Status
As of 2026-03-17, `deep-delta-learning` is **alpha**. It is suitable for CPU-based research
prototypes, CLI experiments, checkpoint roundtrips, and benchmark/report generation. It is **not**
yet suitable for production inference services, large-scale training, or a semver-stable public
API. Known limitations include CPU/ndarray-only execution, in-memory toy dataset workflows, and
expected API churn before 1.0. No specific breaking change is scheduled next, but pre-1.0 API
cleanup should be expected.
See [CHANGELOG.md](CHANGELOG.md) for user-visible changes and
[docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) for the current system/runbook view.
## Installation
```bash
git clone https://github.com/AbdelStark/ddl-rs.git
cd ddl-rs
cargo build --release
```
Requires Rust 2024 edition. CPU/ndarray backend only (no GPU required).
## 30-Second Demo
```bash
# Train all 6 variants, compare, and benchmark — one command
cargo run --release --bin ddl -- train \
--train-tokens "1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16" \
--valid-tokens "5 6 7 8 9 10 11 12 13 14 15 16 1 2 3 4" \
--all-variants --out /tmp/ddl-demo --seq-len 8 --max-steps 32
# Interactive TUI explorer (generates demo data if run with no args)
cargo run --release --features tui --bin ddl-tui
```
## The Delta Operator
The core primitive is a **rank-1 spectral update**. Given hidden state `x in R^d`, direction `k in R^d` (unit-normalized), target scalar `v in R`, and mixing coefficient `beta in (0, 2)`:
```
Vector form (d_value = 1)
─────────────────────────
proj = k^T x                       (scalar)
x'   = x + beta * k * (v - proj)
Matrix form (d_value > 1), state X in R^{d x d_value}
─────────────────────────────────────────────────────
P  = k^T X                         [1, d_value]
X' = X + beta * k * (V - P)        [d, d_value]
```
**Spectral properties** of the induced linear map `A = I - beta * k k^T`:
| Quantity | Value | Interpretation |
| --- | --- | --- |
| k-eigenvalue | `1 - beta` | Scaling along projection direction |
| Orthogonal eigenvalues | `1` | Unchanged in the nullspace of k |
| Determinant | `1 - beta` | Volume scaling (vector state) |
| Lifted determinant | `(1 - beta)^d_value` | Volume scaling (matrix state) |
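These properties can be checked numerically for `d = 2` with the induced map `A x = x - beta * k * (k^T x)` on plain arrays (a standalone sketch; `apply_a` is a name invented here, not a crate API):

```rust
// Apply A = I - beta * k k^T to x, with k assumed unit-normalized.
fn apply_a(x: &[f64; 2], k: &[f64; 2], beta: f64) -> [f64; 2] {
    let proj = k[0] * x[0] + k[1] * x[1];
    [x[0] - beta * k[0] * proj, x[1] - beta * k[1] * proj]
}

fn main() {
    let (beta, k) = (0.7, [0.6, 0.8]); // unit-norm direction
    // Along k: eigenvalue 1 - beta
    let ak = apply_a(&k, &k, beta);
    assert!((ak[0] - (1.0 - beta) * k[0]).abs() < 1e-9);
    assert!((ak[1] - (1.0 - beta) * k[1]).abs() < 1e-9);
    // Orthogonal to k: unchanged (eigenvalue 1)
    let u = [-0.8, 0.6];
    let au = apply_a(&u, &k, beta);
    assert!((au[0] - u[0]).abs() < 1e-9 && (au[1] - u[1]).abs() < 1e-9);
    // det A = (1 - beta) * 1, via the columns A e1 and A e2
    let (c0, c1) = (apply_a(&[1.0, 0.0], &k, beta), apply_a(&[0.0, 1.0], &k, beta));
    let det = c0[0] * c1[1] - c0[1] * c1[0];
    assert!((det - (1.0 - beta)).abs() < 1e-9);
}
```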
The eigenvalue `1 - beta` defines five **spectral regimes**:
```
beta: 0.0       0.3       0.7       1.3       1.7       2.0
       |---------|---------|---------|---------|---------|
          Near     Inter-     Near      Near     Strong
        Identity  polating Projection Reflection Reflection
         "keep x"           "project"            "invert"
```
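A classifier over these regime boundaries might look like the following (boundaries read off the diagram above; the crate's own classifier lives in `spectral.rs` and may differ in names and cutoffs):

```rust
/// Map beta in (0, 2) to one of the five spectral regimes.
fn regime(beta: f64) -> &'static str {
    match beta {
        b if b < 0.3 => "near-identity",      // eigenvalue near 1
        b if b < 0.7 => "interpolating",
        b if b < 1.3 => "near-projection",    // eigenvalue near 0
        b if b < 1.7 => "near-reflection",
        _ => "strong-reflection",             // eigenvalue near -1
    }
}

fn main() {
    assert_eq!(regime(0.1), "near-identity");
    assert_eq!(regime(1.0), "near-projection"); // 1 - beta = 0: pure projection
    assert_eq!(regime(1.9), "strong-reflection");
}
```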
## Architecture
A DDL transformer block replaces the standard `x + Attn(x)` and `x + MLP(x)` with:
```
┌─────────────────────┐
│ Input x │
└──────────┬──────────┘
│
┌──────────▼──────────┐
│ RMSNorm + MHA │──── backbone_out
└──────────┬──────────┘
│
┌──────────────────┼──────────────────┐
│ │ │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│ k-branch │ │ v-branch │ │ beta-branch │
│ k = f(out) │ │ v = g(out) │ │ beta = h(x) │
│ ||k|| = 1 │ │ │ │ 0 < beta < 2│
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
└─────────┬───────┘ │
│ │
┌───────▼──────────────────────────▼───┐
│ x' = x + beta * k * (v - k^T x) │
│ delta residual update │
└──────────────┬───────────────────────┘
│
┌──────────▼──────────┐
│ RMSNorm + SwiGLU │──── repeat for MLP sublayer
└──────────┬──────────┘
│
┌──────────▼──────────┐
│ Output x' │
└─────────────────────┘
```
Each transformer block applies the delta residual update **twice** — once after attention, once after the MLP — giving `2 * num_layers` total beta/k/v branches.
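The two-sublayer flow can be sketched on toy vectors (the fixed `k`/`v`/`beta` values below are stand-ins for the learned branches, and the backbones are omitted entirely; this is not the crate's block implementation):

```rust
/// Rank-1 delta residual, as in the diagram above.
fn delta(x: &[f64], k: &[f64], v: f64, beta: f64) -> Vec<f64> {
    let proj: f64 = k.iter().zip(x).map(|(ki, xi)| ki * xi).sum();
    x.iter().zip(k).map(|(xi, ki)| xi + beta * ki * (v - proj)).collect()
}

/// One DDL block: delta applied once after attention, once after the MLP.
fn ddl_block(x: Vec<f64>) -> Vec<f64> {
    // attention sublayer (stand-in branch outputs: k, v, beta)
    let x = delta(&x, &[1.0, 0.0], 0.5, 1.0);
    // MLP sublayer, with its own independent branch triple
    delta(&x, &[0.0, 1.0], 0.25, 1.0)
}

fn main() {
    // With beta = 1 each sublayer projects exactly onto its target,
    // so a stack of L blocks carries 2 * L beta/k/v branch triples.
    assert_eq!(ddl_block(vec![3.0, 4.0]), vec![0.5, 0.25]);
}
```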
## Model Variants
Six variants explore the design space along three axes: **state rank** (scalar vs matrix), **compression** (token-conv vs channel-conv), and **embedding expansion**.
| Variant | CLI name | `d_value` | Compression | Embed conv | Notes |
| --- | --- | --- | --- | --- | --- |
| Baseline | `baseline` | — | — | — | Standard transformer (no DDL) |
| DDL Vector | `ddl` | 1 | TokenConv | No | Rank-1 delta operator. Cheapest DDL variant. |
| DDL Matrix | `ddl-tokenconv` | d_value | TokenConv | No | Matrix-state delta with token compression. |
| DDL Matrix+EC | `ddl-ec` | d_value | TokenConv | Yes | + embedding expansion convolution. |
| DDL Channel | `ddl-cc` | d_value | ChannelConv | No | Matrix-state with channel compression. |
| DDL Channel+EC | `ddl-cc-ec` | d_value | ChannelConv | Yes | Full variant: channel compression + embed conv. |
**Baseline** uses a standard residual `x + f(x)`. All DDL variants replace it with the delta update. The matrix variants (`d_value > 1`) maintain a `[batch, seq, d_model, d_value]` state tensor, giving each position a richer internal representation at the cost of `O(d_value)` more computation per layer.
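The matrix-state update generalizes the vector case column by column, which is where the lifted determinant `(1 - beta)^d_value` comes from. A standalone sketch for `d = 2`, `d_value = 2` (plain arrays, not the crate's tensor code):

```rust
/// Matrix-state delta: the same rank-1 map acts on every column of X.
fn delta_matrix(x: &[[f64; 2]; 2], k: &[f64; 2], v: &[f64; 2], beta: f64) -> [[f64; 2]; 2] {
    let mut out = *x;
    for j in 0..2 {
        // p_j = k^T X[:, j]
        let p = k[0] * x[0][j] + k[1] * x[1][j];
        for i in 0..2 {
            // X'[i][j] = X[i][j] + beta * k[i] * (V[j] - p_j)
            out[i][j] = x[i][j] + beta * k[i] * (v[j] - p);
        }
    }
    out
}

fn main() {
    let x = [[3.0, 1.0], [4.0, 2.0]]; // rows: d, columns: d_value
    let y = delta_matrix(&x, &[1.0, 0.0], &[0.5, 0.25], 1.0);
    // beta = 1: each column's k-component lands exactly on its target
    assert_eq!(y, [[0.5, 0.25], [4.0, 2.0]]);
}
```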
## CLI Reference
### train
Train one or more variants and save checkpoints + reports.
```bash
ddl train \
--train-file data/train.txt --valid-file data/valid.txt \
--out artifacts/ --all-variants \
--d-model 128 --layers 6 --heads 8 --vocab-size 32000 --d-value 4 \
--max-steps 1000 --eval-interval 100 --learning-rate 3e-4 \
--warmup-steps 50 --weight-decay 0.1 --grad-clip 1.0
```
Outputs per-variant checkpoint directories and `training_comparison.json` with loss curves, spectral snapshots at each eval interval, and variant rankings.
`--resume <dir>` continues training from a saved single-variant checkpoint.
### compare
Evaluate saved checkpoints on a dataset.
```bash
ddl compare \
--artifact artifacts/ddl --artifact artifacts/baseline \
--data-file data/test.txt --out comparison.json \
--diagnostics spectral --best-validation
```
### benchmark
Measure operator, block, and full-model latency/memory.
```bash
ddl benchmark --all-variants --iterations 50 --warmup 10 --out bench.json
# Compare against a saved baseline with regression gates
ddl benchmark --all-variants --out current.json \
--baseline saved_baseline.json --comparison-out regression.json \
--min-throughput-ratio 0.9 --max-p95-latency-ratio 1.2
```
Suites: `operator`, `block`, `compressor`, `normalization`, `model` (default: all).
### generate
Autoregressive token generation from a checkpoint.
```bash
ddl generate \
--checkpoint artifacts/ddl --prompt-tokens "1,2,3,4" \
--max-new-tokens 64 --do-sample --temperature 0.8 --top-k 50
```
## TUI Explorer
An interactive terminal UI for visualizing training and benchmark results.
```bash
# Auto-generates demo data — zero config needed
cargo run --release --features tui --bin ddl-tui
# Or load your own reports
cargo run --release --features tui --bin ddl-tui -- \
artifacts/training_comparison.json bench.json
```
<table>
<tr>
<td align="center"><strong>Dashboard</strong> — Variant Comparison<br><img src="docs/assets/img/tui-dashboard.png" width="420" alt="Dashboard view showing variant comparison table with loss deltas and model/training config panels"></td>
<td align="center"><strong>Training</strong> — Loss Curves & Metrics<br><img src="docs/assets/img/tui-training.png" width="420" alt="Training view showing loss chart with train/validation curves, LR schedule, and step metrics table"></td>
</tr>
<tr>
<td align="center"><strong>Spectral</strong> — Beta & Eigenvalue Diagnostics<br><img src="docs/assets/img/tui-spectral.png" width="420" alt="Spectral view showing beta per layer bars with regime colors, regime distribution, k-eigenvalue and k-coherence"></td>
<td align="center"><strong>Benchmark</strong> — Throughput & Memory<br><img src="docs/assets/img/tui-benchmark.png" width="420" alt="Benchmark view showing model throughput bars, component timing table, and memory usage"></td>
</tr>
</table>
Navigate with `1-4` / `Tab` to switch views, `j/k` to select variants, `?` for help, `q` to quit.
## Architecture & Operations
The short version:
- `ddl` is the shipped automation surface for train/compare/benchmark/generate.
- Checkpoints are explicit directories with a manifest, weights, report, optimizer state, and
  optional `best_validation/` state.
- Invalid evaluation inputs fail with typed library or CLI errors rather than degrading into
  placeholder metrics.
For the full component map, invariants, validation commands, and failure-mode runbook, see
[docs/ARCHITECTURE.md](docs/ARCHITECTURE.md).
## Configuration Reference
### DdlConfig
| Field | Default | Description |
| --- | --- | --- |
| `d_model` | *required* | Hidden dimension. Must be divisible by `num_heads`. |
| `num_layers` | *required* | Number of transformer blocks. |
| `num_heads` | *required* | Number of attention heads. |
| `vocab_size` | *required* | Token vocabulary size. |
| `d_value` | 1 | Delta state rank. 1 = vector (rank-1). >1 = matrix state. |
| `head_dim` | d_model/heads | Per-head dimension. |
| `mlp_hidden` | ceil(8d/3) | SwiGLU intermediate dimension. |
| `mapping` | KMap | Branch mapping: `KMap` or `VMap`. |
| `beta_init` | 0.5 | Initial beta value. Must be in (0, 2). |
| `beta_single_linear` | true | Single-linear beta gate. false = two-layer. |
| `k_eps` | 1e-6 | k-direction normalization epsilon. |
| `compression` | TokenConv | `TokenConv` or `ChannelConv`. |
| `shortconv_kernel_size` | 4 | Causal depthwise conv kernel size. |
| `embed_conv` | false | Enable embedding expansion convolution. |
| `max_seq_len` | 128 | Maximum sequence length (RoPE). |
| `rope_theta` | 10000.0 | RoPE frequency base. |
### TrainingConfig
| Field | Default | Description |
| --- | --- | --- |
| `max_steps` | 32 | Total training steps per phase. |
| `eval_interval` | 4 | Steps between validation + spectral snapshots. |
| `learning_rate` | 1e-3 | Peak learning rate. |
| `min_learning_rate` | 0.0 | Cosine decay floor. |
| `warmup_steps` | 0 | Linear warmup steps. |
| `weight_decay` | 0.1 | AdamW weight decay. |
| `beta1` / `beta2` | 0.9 / 0.95 | AdamW momentum coefficients. |
| `epsilon` | 1e-5 | AdamW epsilon. |
| `grad_clip` | 1.0 | Gradient norm clipping threshold. |
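The warmup and decay fields combine in the usual cosine-with-warmup shape: linear warmup to `learning_rate`, then cosine decay to `min_learning_rate`. A sketch of that common formulation (the exact schedule inside `training.rs` may differ in detail):

```rust
/// Learning rate at a given step: linear warmup, then cosine decay.
fn lr_at(step: usize, max_steps: usize, warmup: usize, peak: f64, floor: f64) -> f64 {
    if step < warmup {
        // linear warmup from 0 toward the peak
        return peak * step as f64 / warmup as f64;
    }
    // cosine decay from the peak down to the floor
    let t = (step - warmup) as f64 / max_steps.saturating_sub(warmup).max(1) as f64;
    floor + 0.5 * (peak - floor) * (1.0 + (std::f64::consts::PI * t).cos())
}

fn main() {
    let (steps, warmup, peak, floor) = (100, 10, 1e-3, 0.0);
    assert!(lr_at(5, steps, warmup, peak, floor) < peak);                      // warming up
    assert!((lr_at(warmup, steps, warmup, peak, floor) - peak).abs() < 1e-12); // at peak
    assert!(lr_at(steps, steps, warmup, peak, floor) < 1e-9);                  // at floor
}
```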
## Library Usage
```rust
use burn::backend::{Autodiff, NdArray};
use deep_delta_learning::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    type B = Autodiff<NdArray>;
    let device = Default::default();

    // Configure and train
    let config = DdlConfig::new(64, 2, 4, 256);
    let training = TrainingConfig::new().with_max_steps(100);

    // Toy token stream (any sequence of ids within vocab_size)
    let tokens: Vec<u32> = (0..512u32).map(|i| i % 256).collect();
    let batcher = TokenBatcher::try_new(TokenBatchingConfig::try_new(4, 16)?)?;
    let dataset = batcher.batch_tokens(&tokens);
    let val_dataset = dataset.clone();

    let outcome = train_variant::<B>(
        &config, ModelVariant::DdlVector, &device,
        &dataset, Some(&val_dataset), &training,
    )?;

    // Inspect results
    println!("Final loss: {:.4}", outcome.report.final_train.loss);
    println!("Parameters: {}", outcome.report.num_params);

    // Spectral diagnostics
    if let Some(spectral) = &outcome.report.final_train_spectral {
        for (i, regime) in spectral.regime_per_layer.iter().enumerate() {
            println!("Layer {i}: beta={:.3} regime={:?}",
                spectral.beta_per_layer[i], regime);
        }
    }

    // Save and reload
    save_training_artifact(&outcome, std::path::PathBuf::from("checkpoint/"))?;
    Ok(())
}
```
## Project Structure
```
src/
delta_operator.rs Core rank-1 spectral operator: x' = x - beta * k(k^T x)
delta_res_block.rs Residual update: x' = x + beta * k * (v - k^T x)
beta_branch.rs Learnable beta in (0, 2) via sigmoid scaling
k_branch.rs Unit-normalized projection direction
v_branch.rs Scalar and matrix target value branches
transformer.rs DDL transformer block and stack
baseline.rs Standard transformer (no DDL) for comparison
variant.rs 6 model variants and config resolution
spectral.rs Regime classification, per-layer diagnostics, collectors
training.rs Training loop with cosine-warmup LR, spectral snapshots
checkpoint.rs Checkpoint save/load with manifest and optimizer state
eval.rs Variant evaluation and comparison ranking
benchmark.rs Timing, memory profiling, and regression gates
generation.rs Autoregressive generation with top-k/top-p sampling
config.rs Model configuration and validation
data.rs Token batching and dataset construction
attention.rs Multi-head attention with RoPE
mlp.rs SwiGLU feed-forward
compressor.rs Token-conv and channel-conv state compression
bin/ddl/ CLI: train, compare, benchmark, generate
bin/ddl-tui/ Interactive terminal UI (ratatui)
tests/ Integration tests for all modules
examples/ End-to-end demos (train_tiny, compare_models, spectral_report, ...)
fixtures/ Benchmark regression baselines and sample data
```
## Stack
| Component | Choice |
| --- | --- |
| Language | Rust (2024 edition) |
| ML framework | [Burn](https://burn.dev) 0.20 |
| Backend | ndarray + autodiff (CPU) |
| Serialization | serde / serde_json |
| TUI | ratatui + crossterm (optional, `--features tui`) |
## Development
Validation commands that are expected to pass today:
```bash
cargo check
cargo test
cargo test --examples
cargo build --features tui
cargo clippy --all-targets --all-features -- -D warnings
```
Coverage is enforced in CI with `cargo-llvm-cov` over the library/test contract, excluding
`src/bin/` entrypoints that currently only have smoke coverage. Benchmark policy is contract-based:
CI verifies benchmark report generation and regression-gate behavior, but does not yet enforce
absolute latency budgets on hosted runners.
See [CONTRIBUTING.md](CONTRIBUTING.md) for the contributor workflow.
## Known Limitations
- CPU/ndarray backend only; no GPU path is enabled in the manifest.
- Training and evaluation workflows assume local, in-memory token datasets rather than streaming
corpora.
- Benchmark memory numbers are estimates intended for regression tracking, not allocator-precise
telemetry.
- The crate is pre-1.0; public APIs may change as typed error handling and docs continue to harden.
## Roadmap
1. Contract hardening: finish typed user-boundary errors and expand rustdoc coverage across the
public API.
2. Research workflow depth: improve dataset ergonomics, artifact metadata, and richer diagnostics.
3. Backend expansion: evaluate GPU-enabled Burn backends only after CPU semantics remain stable.
## Changelog
User-visible changes are tracked in [CHANGELOG.md](CHANGELOG.md).
## Help
- File bugs and contract mismatches in the GitHub issue tracker:
`https://github.com/AbdelStark/ddl-rs/issues`
- Check the CI workflow contract in [.github/workflows/ci.yml](.github/workflows/ci.yml).
## License
[MIT](LICENSE)