# Practical Implementation Guide for Clay Codes
This document provides practical guidance for implementing Clay codes in distributed storage systems, focusing on encoding outputs, storage layouts, repair operations, and metadata design.
---
## Table of Contents
1. [Overview](#1-overview)
2. [Encoding: What It Produces](#2-encoding-what-it-produces)
3. [Understanding Sub-chunk Sizes](#3-understanding-sub-chunk-sizes)
4. [Repair Access Patterns](#4-repair-access-patterns)
5. [Storage Layout Options](#5-storage-layout-options)
6. [Recommended Metadata Schema](#6-recommended-metadata-schema)
7. [API Design Patterns](#7-api-design-patterns)
8. [Performance Considerations](#8-performance-considerations)
---
## 1. Overview
Clay (Coupled-Layer) codes are MSR (Minimum Storage Regenerating) codes that achieve optimal repair bandwidth while maintaining MDS (Maximum Distance Separable) properties. They can reduce network traffic during node repair by up to 75% compared to traditional Reed-Solomon codes; the exact saving depends on the parameters (67.5% for the (n=16, k=10, d=13) example used throughout this guide).
### Key Parameters
```
┌─────────────────────────────────────────────────────────────────┐
│ CLAY CODE PARAMETERS │
├─────────────────────────────────────────────────────────────────┤
│ n = total nodes (data + parity) │
│ k = data nodes │
│ m = n - k = parity nodes │
│ d = helper nodes used during repair (k ≤ d ≤ n-1) │
│ │
│ Derived: │
│ q = d - k + 1 (base for layer indexing) │
│ t = n / q (number of y-sections) │
│ α = qᵗ (sub-chunks per node) │
│ β = qᵗ⁻¹ (sub-chunks fetched per helper) │
│ │
│   Per-helper fetch:   β/α = 1/q of a full chunk                 │
│   Repair bandwidth:   d·β sub-chunks vs. k·α for RS             │
└─────────────────────────────────────────────────────────────────┘
```
### Example Configuration
```
(n=16, k=10, m=6, d=13) Clay Code
─────────────────────────────────
q = d - k + 1 = 4
t = n / q = 4
α = 4⁴ = 256 sub-chunks per node
β = 4³ = 64 sub-chunks per helper during repair
Repair bandwidth: d·β / (k·α) = 832/2560 = 32.5% of RS (67.5% savings)
```
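These derivations are mechanical, so it is worth validating them in code. A minimal sketch (the helper name `clay_params` is ours, not a library API) that computes the derived parameters and the repair-traffic fraction:
```python
def clay_params(n: int, k: int, d: int) -> dict:
    """Derive Clay code parameters; raises if q does not divide n."""
    q = d - k + 1
    if n % q != 0:
        raise ValueError("q = d - k + 1 must divide n")
    t = n // q
    alpha = q ** t
    beta = q ** (t - 1)
    return {"q": q, "t": t, "alpha": alpha, "beta": beta,
            "repair_fraction": (d * beta) / (k * alpha)}

print(clay_params(16, 10, 13))
# {'q': 4, 't': 4, 'alpha': 256, 'beta': 64, 'repair_fraction': 0.325}
```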
---
## 2. Encoding: What It Produces
### The Encoding Pipeline
```
ENCODING FLOW
┌──────────────────────────────────────────────────────────┐
│ INPUT: Raw File │
│ (any size) │
└─────────────────────────┬────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────┐
│ Step 1: Pad to Stripe Boundary │
│ │
│ File padded to multiple of stripe size S │
│ S must be divisible by (k × α) │
└─────────────────────────┬────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────┐
│ Step 2: Divide into Stripes │
│ │
│ ┌─────────┬─────────┬─────────┬─────────┐ │
│ │Stripe 0 │Stripe 1 │Stripe 2 │ . . . │ │
│ │ S bytes│ S bytes│ S bytes│ │ │
│ └─────────┴─────────┴─────────┴─────────┘ │
└─────────────────────────┬────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────┐
│ Step 3: Encode Each Stripe (Clay) │
│ │
│ For each stripe: │
│ 1. Split into k data chunks │
│    2. Apply PRT (Pairwise Reverse Transform) to get      │
│       the uncoupled (U) representation                   │
│    3. Compute m parity chunks using MDS code             │
│    4. Apply PFT (Pairwise Forward Transform) to get      │
│       the coupled (C) representation                     │
└─────────────────────────┬────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────┐
│ OUTPUT: n Shards │
│ │
│ Each shard contains: │
│ - One chunk per stripe │
│ - Each chunk has α sub-chunks │
│ │
│ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │
│ │ Shard │ │ Shard │ │ Shard │ │ Shard │ │
│ │ 0 │ │ 1 │ │ 2 │ ... │ n-1 │ │
│ │ │ │ │ │ │ │ │ │
│ │(Node 0)│ │(Node 1)│ │(Node 2)│ │(Node15)│ │
│ └────────┘ └────────┘ └────────┘ └────────┘ │
└──────────────────────────────────────────────────────────┘
```
### Shard Structure
Each node stores a shard with the following structure:
```
SHARD STRUCTURE (per node, per file)
════════════════════════════════════
┌─────────────────────────────────────────────────────────────┐
│ SHARD │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Stripe 0 Chunk │ │
│ │ ┌──────┬──────┬──────┬─────────────────┬──────┐ │ │
│ │ │ SC 0 │ SC 1 │ SC 2 │ . . . │SC 255│ │ │
│ │ └──────┴──────┴──────┴─────────────────┴──────┘ │ │
│ │ α = 256 sub-chunks │ │
│ └─────────────────────────────────────────────────────┘ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Stripe 1 Chunk │ │
│ │ ┌──────┬──────┬──────┬─────────────────┬──────┐ │ │
│ │ │ SC 0 │ SC 1 │ SC 2 │ . . . │SC 255│ │ │
│ │ └──────┴──────┴──────┴─────────────────┴──────┘ │ │
│ └─────────────────────────────────────────────────────┘ │
│ . . . │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Stripe N-1 Chunk │ │
│ │ ┌──────┬──────┬──────┬─────────────────┬──────┐ │ │
│ │ │ SC 0 │ SC 1 │ SC 2 │ . . . │SC 255│ │ │
│ │ └──────┴──────┴──────┴─────────────────┴──────┘ │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
SC = Sub-chunk
Each sub-chunk size = S / (k × α)
Shard size per node = padded_size / k
Total storage (all n nodes) = padded_size × (n / k)
```
---
## 3. Understanding Sub-chunk Sizes
Sub-chunk size is NOT fixed by Clay parameters alone — it depends on the **stripe size** `S`.
### Formula
```
chunk_size = S / k (bytes per node per stripe)
subchunk_size = S / (k × α) (bytes per sub-chunk per stripe)
codewords     = S / (k × α)   (codewords per stripe; numerically equals
                               subchunk_size with 1-byte, GF(2⁸) symbols)
```
### Constraints
```
S must be divisible by (k × α)
For (n=16, k=10, d=13) with α=256:
Minimum unit = k × α = 10 × 256 = 2560 bytes
Valid stripe sizes: any multiple of 2560 bytes (2560, 5120, 7680, ...).
Note that round binary sizes such as 1 MiB or 64 MiB are NOT exact
multiples of 2560 and must be rounded (10 MiB = 4096 × 2560 is exact).
```
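A small helper can round a requested stripe size up to the nearest valid multiple; a minimal sketch (the function name is illustrative):
```python
def valid_stripe_size(requested: int, k: int, alpha: int) -> int:
    """Round a requested stripe size up to a multiple of k * alpha."""
    unit = k * alpha
    return ((requested + unit - 1) // unit) * unit

# (n=16, k=10, d=13) with alpha=256: unit = 2560 bytes
print(valid_stripe_size(1_000_000, 10, 256))   # 1000960 (~1 MB)
print(valid_stripe_size(2**20, 10, 256))       # 1049600 (1 MiB rounded up)
```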
### Example Configurations
| Stripe Size | Chunk Size (S/k) | Sub-chunk Size (S/(k×α)) | Use Case |
|-------------|------------------|--------------------------|----------|
| 2.5 KB | 256 B | 1 B | Minimum (testing only) |
| 1 MB | 100 KB | 400 B | Small files, low latency |
| 10 MB | 1 MB | 4 KB | General purpose |
| 64 MB | 6.4 MB | 25.6 KB | Large files, throughput |
| 256 MB | 25.6 MB | 102.4 KB | Bulk storage |
### Stripe Size Selection Guidelines
```
┌─────────────────────────────────────────────────────────────────┐
│ STRIPE SIZE TRADE-OFFS │
├──────────────────┬──────────────────────────────────────────────┤
│ Smaller Stripe │ + Less padding waste for small files │
│ (1-10 MB) │ + Lower memory during encode/decode │
│ │ - More stripes = more metadata overhead │
│ │ - Higher per-stripe fixed costs │
├──────────────────┼──────────────────────────────────────────────┤
│ Larger Stripe │ + Fewer stripes = less metadata │
│ (64-256 MB) │ + Better throughput for large files │
│ │ - Wasteful padding for small files │
│ │ - Higher memory requirements │
└──────────────────┴──────────────────────────────────────────────┘
RECOMMENDATION:
- Variable workloads (mixed file sizes): 1-10 MB stripe
- Large file workloads (video, backups): 64+ MB stripe
- Match stripe size to typical file size when possible
```
---
## 4. Repair Access Patterns
### The Problem: Non-Contiguous Access
During repair, each helper node contributes β sub-chunks out of α total. These β sub-chunks are **scattered** across the α indices in a predictable but non-contiguous pattern.
### How Repair Layers are Determined
Each layer `z` is indexed using base-q representation:
```
z = z₀ + q·z₁ + q²·z₂ + ... + qᵗ⁻¹·zₜ₋₁
For q=4, t=4, α=256:
z = z₀ + 4·z₁ + 16·z₂ + 64·z₃
where each zᵢ ∈ {0, 1, 2, 3}
```
A node at position `(x, y)` has "red dots" (repair layers) where `z_y = x`:
```python
def get_repair_layers(node_x: int, node_y: int, q: int, t: int) -> list[int]:
"""
Returns the β layer indices needed when this node fails.
"""
alpha = q ** t
repair_layers = []
for z in range(alpha):
# Extract z_y: the y-th digit in base-q representation
z_y = (z // (q ** node_y)) % q
if z_y == node_x:
repair_layers.append(z)
return repair_layers # len = β = q^(t-1)
```
### Example: Repair Pattern for Node (x=2, y=1)
```
Parameters: q=4, t=4, α=256, β=64
Condition: z₁ = 2 → (z // 4) % 4 == 2
Layer indices where z₁ = 2:
┌────────────────────────────────────────────────────────────────┐
│ z₃=0: 8,9,10,11 24,25,26,27 40,41,42,43 56,57,58,59 │
│ z₃=1: 72,73,74,75 88,89,90,91 104,105,106,107 120,121... │
│ z₃=2: 136,137,138,139 152,153... 168,169... 184,185... │
│ z₃=3: 200,201,202,203 216,217... 232,233... 248,249,250,251│
└────────────────────────────────────────────────────────────────┘
Total: 64 indices scattered across 0-255
```
### Visualizing the Access Pattern
```
Sub-chunk indices 0-255 for a helper node (repairing node with y=1, x=2):
0 8 16 24 32 40 48
▼ ▼ ▼ ▼ ▼ ▼ ▼
┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
│ │ │ │ │ │ │ │ │ ■ │ ■ │ ■ │ ■ │ │ │ │ │ ...
└───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘
└─────────┘
              4 contiguous, then a gap of 12
Pattern: [4 on, 12 off], repeating uniformly every q² = 16 indices for y=1
■ = sub-chunk needed for repair
```
### Access Pattern Properties
```
For any failed node:
- β = 64 sub-chunks needed (25% of total)
  - Contiguity of the needed sub-chunks depends on the failed node's y-section:
      y = 0: z₀ = x → 64 runs of length 1, every q  = 4 indices (most scattered)
      y = 1: z₁ = x → 16 runs of length 4, every q² = 16 indices
      y = 2: z₂ = x → 4 runs of length 16, every q³ = 64 indices
      y = 3: z₃ = x → 1 run of length 64 (fully contiguous)
For y = t-1 the repair sub-chunks form a single contiguous block;
for y = 0 they are maximally scattered
```
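These run lengths can be checked directly against `get_repair_layers` from above; a small verification sketch:
```python
def contiguous_runs(layers: list[int]) -> list[tuple[int, int]]:
    """Collapse a sorted list of layer indices into (start, length) runs."""
    runs: list[tuple[int, int]] = []
    for z in layers:
        if runs and z == runs[-1][0] + runs[-1][1]:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((z, 1))
    return runs

for y in range(4):
    runs = contiguous_runs(get_repair_layers(2, y, q=4, t=4))
    print(f"y={y}: {len(runs)} runs of length {runs[0][1]}")
# y=0: 64 runs of length 1
# y=1: 16 runs of length 4
# y=2: 4 runs of length 16
# y=3: 1 run of length 64
```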
---
## 5. Storage Layout Options
### Option A: Single Contiguous Array
Store all α sub-chunks as one contiguous blob per chunk.
```
STORAGE LAYOUT A
════════════════
Database Schema:
┌─────────────────────────────────────────────────────────────┐
│ chunks ( │
│ file_id TEXT, │
│ stripe_idx INT, │
│ node_id INT, │
│ data BLOB, -- size = α × subchunk_size │
│ PRIMARY KEY (file_id, stripe_idx, node_id) │
│ ) │
└─────────────────────────────────────────────────────────────┘
On-Disk Layout (per chunk):
┌──────┬──────┬──────┬──────┬──────┬──────┬─────────┬──────┐
│ SC 0 │ SC 1 │ SC 2 │ SC 3 │ SC 4 │ SC 5 │ . . . │SC 255│
└──────┴──────┴──────┴──────┴──────┴──────┴─────────┴──────┘
│◄──────────────── α × subchunk_size ─────────────────────►│
```
**Repair Read Operation:**
```python
def read_for_repair(file_id: str, stripe_idx: int, node_id: int,
repair_layers: list[int], subchunk_size: int) -> bytes:
# Fetch entire chunk
chunk = db.get(file_id, stripe_idx, node_id).data
    # Extract the needed sub-chunks from the already-fetched, in-memory blob
result = bytearray()
for layer in repair_layers: # 64 scattered indices
offset = layer * subchunk_size
result.extend(chunk[offset:offset + subchunk_size])
return bytes(result)
```
**Pros:**
- Simple implementation
- Good for full chunk reads (normal data access)
- Single database key per chunk
**Cons:**
- Repair reads the full chunk even though only β/α = 25% of it is needed
  (or issues 64 scattered ranged reads, which is seek-heavy on HDD)
- Must buffer the entire blob in memory to extract the sub-chunks
---
### Option B: Individually Indexed Sub-chunks
Store each sub-chunk as a separate database entry.
```
STORAGE LAYOUT B
════════════════
Database Schema:
┌─────────────────────────────────────────────────────────────┐
│ subchunks ( │
│ file_id TEXT, │
│ stripe_idx INT, │
│ node_id INT, │
│ layer_idx INT, -- 0 to α-1 │
│ data BLOB, -- size = subchunk_size │
│ PRIMARY KEY (file_id, stripe_idx, node_id, layer_idx) │
│ ) │
└─────────────────────────────────────────────────────────────┘
Storage: 256 entries per chunk
```
**Repair Read Operation:**
```python
def read_for_repair(file_id: str, stripe_idx: int, node_id: int,
repair_layers: list[int]) -> dict[int, bytes]:
# Direct lookup of exactly the sub-chunks needed
result = {}
for layer in repair_layers:
result[layer] = db.get(file_id, stripe_idx, node_id, layer).data
return result
# Can parallelize:
async def read_for_repair_parallel(...):
tasks = [db.get_async(..., layer) for layer in repair_layers]
return await asyncio.gather(*tasks)
```
**Pros:**
- Direct access to any sub-chunk
- Parallel fetches possible
- Pay only for what you read during repair
**Cons:**
- 256× more database keys per chunk
- Higher metadata overhead
- More complex data path for normal reads (must reassemble)
- Potential small-write amplification
---
### Option C: Y-Section Grouped Layout (Recommended)
Group sub-chunks by their repair affinity — sub-chunks needed together during repair are stored together.
```
STORAGE LAYOUT C
════════════════
Key Insight:
When repairing a node in y-section y, every helper reads the layers where
z_y = x. Keep, for each y, a copy of the chunk reordered by its z_y digit,
so those layers form one contiguous block per x value.
Database Schema:
┌─────────────────────────────────────────────────────────────┐
│ chunk_groups ( │
│ file_id TEXT, │
│ stripe_idx INT, │
│ node_id INT, │
│ y_group INT, -- 0 to t-1 (4 groups for t=4) │
│    data        BLOB,      -- size = α × subchunk_size       │
│ PRIMARY KEY (file_id, stripe_idx, node_id, y_group) │
│ ) │
└─────────────────────────────────────────────────────────────┘
Storage: t entries per chunk (4 for our example)
Each entry holds all α = 256 sub-chunks, reordered so the β = 64
sub-chunks for each x value form one contiguous block
(the price: each chunk is physically stored t times)
```
#### Grouping Logic
```
REORDERING 256 SUB-CHUNKS INTO 4 GROUPS (one copy per y)
════════════════════════════════════════════════════════
Original layer index z has base-4 representation (z₀, z₁, z₂, z₃)
Each y_group stores the chunk ordered by its z_y digit:
┌─────────┬─────────────────────────────────────────────────────┐
│ y_group │ Ordering and purpose                                │
├─────────┼─────────────────────────────────────────────────────┤
│ 0       │ Ordered by z₀: used when the failed node has y=0    │
│         │ β = 64 contiguous sub-chunks per x ∈ {0,1,2,3}      │
├─────────┼─────────────────────────────────────────────────────┤
│ 1       │ Ordered by z₁: used when the failed node has y=1    │
├─────────┼─────────────────────────────────────────────────────┤
│ 2       │ Ordered by z₂: used when the failed node has y=2    │
├─────────┼─────────────────────────────────────────────────────┤
│ 3       │ Ordered by z₃: used when the failed node has y=3    │
│         │ (this ordering coincides with the natural layout)   │
└─────────┴─────────────────────────────────────────────────────┘
```
#### Group Construction
```python
def construct_groups(chunk_data: bytes, q: int, t: int,
subchunk_size: int) -> dict[int, bytes]:
"""
    Build t reordered copies of the chunk, one per y, for contiguous
    repair reads. Note that each group contains all α sub-chunks, so
    the chunk is stored t times.
Args:
chunk_data: Original chunk with α contiguous sub-chunks
q: Base (d - k + 1)
t: Number of y-sections (n / q)
subchunk_size: Size of each sub-chunk in bytes
Returns:
Dictionary mapping y_group -> grouped sub-chunk data
"""
alpha = q ** t
beta = q ** (t - 1)
groups = {y: bytearray() for y in range(t)}
# For each y-section, collect sub-chunks in order of (x, remaining coords)
for y in range(t):
for x in range(q): # For each possible x value
# Collect layers where z_y = x
for z in range(alpha):
z_y = (z // (q ** y)) % q
if z_y == x:
offset = z * subchunk_size
groups[y].extend(chunk_data[offset:offset + subchunk_size])
return {y: bytes(data) for y, data in groups.items()}
```
#### Internal Layout of Each Group
```
GROUP INTERNAL STRUCTURE (y_group = 1 example)
══════════════════════════════════════════════
Within y_group=1, all 256 sub-chunks are ordered by z₁, then by layer index:
┌─────────────────────────────────────────────────────────────────┐
│ y_group = 1 (α × subchunk_size = 256 × subchunk_size bytes)     │
├─────────────────────────────────────────────────────────────────┤
│ x=0 block: β = 64 sub-chunks (all layers with z₁=0)             │
│   Layers: 0,1,2,3, 16,17,18,19, 32,33,34,35, ..., 240-243       │
├─────────────────────────────────────────────────────────────────┤
│ x=1 block: β = 64 sub-chunks (all layers with z₁=1)             │
│   Layers: 4,5,6,7, 20,21,22,23, 36,37,38,39, ..., 244-247       │
├─────────────────────────────────────────────────────────────────┤
│ x=2 block: β = 64 sub-chunks (all layers with z₁=2)             │
│   Layers: 8,9,10,11, 24,25,26,27, 40,41,42,43, ..., 248-251     │
├─────────────────────────────────────────────────────────────────┤
│ x=3 block: β = 64 sub-chunks (all layers with z₁=3)             │
│   Layers: 12,13,14,15, 28,29,30,31, 44,45,46,47, ..., 252-255   │
└─────────────────────────────────────────────────────────────────┘
For repair of node (x=2, y=1), each helper:
  - Reads y_group=1 at offset = 2 × β × subchunk_size,
    length = β × subchunk_size
  - This is ONE contiguous read (the x=2 block), and it is exactly
    the layer set computed in Section 4
```
All helpers contribute sub-chunks from the same set of layer indices, determined by the failed node's (x, y) coordinates. Storing a per-y reordered copy means each helper's repair data is one contiguous x-block read, rather than scattered sub-chunk lookups.
So storing by y_group means:
- 1 contiguous read per helper during repair
- t = 4 blobs per chunk instead of 256 individually keyed sub-chunks
- the trade-off: each chunk is stored t times
**Repair Read Operation (Option C):**
```python
def read_for_repair(file_id: str, stripe_idx: int, node_id: int,
                    failed_node_x: int, failed_node_y: int,
                    beta: int, subchunk_size: int) -> bytes:
    # The failed node's x-block sits at a fixed offset inside the y-group;
    # a store with ranged reads can fetch exactly this slice.
    group = db.get(file_id, stripe_idx, node_id, failed_node_y).data
    start = failed_node_x * beta * subchunk_size
    return group[start:start + beta * subchunk_size]
```
**Normal Read Operation (Option C):**
```python
def read_full_chunk(file_id: str, stripe_idx: int, node_id: int,
                    q: int, t: int, subchunk_size: int) -> bytes:
    """Reassemble the chunk in natural layer order for normal access."""
    alpha = q ** t
    # Shortcut: group t-1 is ordered by the top digit z_{t-1}, which
    # coincides with the natural layout, so it can be returned as-is:
    #   return db.get(file_id, stripe_idx, node_id, t - 1).data
    # General form: read any one group and inverse-permute it.
    y = 0
    group_data = db.get(file_id, stripe_idx, node_id, y).data
    chunk = bytearray(alpha * subchunk_size)
    for pos, z in enumerate(get_layers_for_y_group(y, q, t)):
        chunk[z * subchunk_size:(z + 1) * subchunk_size] = \
            group_data[pos * subchunk_size:(pos + 1) * subchunk_size]
    return bytes(chunk)
```
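The `get_layers_for_y_group` helper referenced above is not defined elsewhere in this guide; here is a minimal sketch that mirrors the ordering produced by `construct_groups`:
```python
def get_layers_for_y_group(y_group: int, q: int, t: int) -> list[int]:
    """Layer indices in the order they are stored inside a y-group blob.

    Mirrors construct_groups: sub-chunks are sorted by their z_y digit
    (the x-block they belong to), then by layer index within each block.
    """
    alpha = q ** t
    return sorted(range(alpha), key=lambda z: ((z // (q ** y_group)) % q, z))
```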
**Pros:**
- Optimal repair I/O: single contiguous read per helper
- Only t = 4 database entries per chunk (vs 256 for Option B)
- Reasonable metadata overhead
**Cons:**
- Stores each chunk t times (raw-capacity overhead for the reordered copies)
- Full reads must inverse-permute a group (group t-1 is already natural order)
- More complex encoding (must write t reordered copies)
---
### Storage Layout Comparison
|                       | Option A     | Option B               | Option C       |
|-----------------------|--------------|------------------------|----------------|
| DB entries per chunk  | 1            | 256 (α)                | 4 (t)          |
| Raw storage per chunk | 1×           | 1×                     | t× (4 copies)  |
| Repair read ops       | 64 scattered | 64 direct              | 1 contiguous   |
| Full read ops         | 1 contiguous | 256 lookups            | 1 (group t-1)  |
| Metadata overhead     | Low          | High                   | Medium         |
| HDD performance       | Poor repair  | Very poor (random I/O) | Good           |
| SSD performance       | Good         | Good                   | Good           |
| Implementation        | Simple       | Complex                | Moderate       |
**Recommendation:**
- **Option C** for production systems where repair efficiency is the priority and the extra raw capacity is acceptable
- **Option A** if repair is rare, capacity is tight, or full-read performance is critical
- **Option B** for pure SSD deployments with a high IOPS budget
---
## 6. Recommended Metadata Schema
### File Metadata
```sql
CREATE TABLE files (
file_id TEXT PRIMARY KEY,
original_name TEXT,
original_size BIGINT, -- Original file size in bytes
padded_size BIGINT, -- Size after padding
-- Clay code parameters
n INT, -- Total nodes
k INT, -- Data nodes
d INT, -- Helper nodes for repair
q INT, -- d - k + 1
t INT, -- n / q
alpha INT, -- q^t (sub-chunks per node)
beta INT, -- q^(t-1) (repair sub-chunks)
-- Stripe configuration
stripe_size BIGINT, -- Bytes per stripe
stripe_count INT, -- Number of stripes
subchunk_size INT, -- Bytes per sub-chunk
-- Integrity
checksum TEXT, -- SHA-256 of original file
created_at TIMESTAMP,
-- Status
status TEXT -- 'complete', 'partial', 'failed'
);
```
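For in-process use (for example by the repair client in Section 7, which takes a `FileMetadata` object), the same fields can be mirrored in a small dataclass; a sketch containing only the fields the examples in this guide actually touch:
```python
from dataclasses import dataclass

@dataclass
class FileMetadata:
    """In-memory mirror of the files row used by the code examples."""
    file_id: str
    n: int
    k: int
    d: int
    q: int
    t: int
    alpha: int
    beta: int
    stripe_size: int
    stripe_count: int
    subchunk_size: int
```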
### Node Layout Metadata
```sql
CREATE TABLE node_layout (
file_id TEXT,
node_index INT, -- 0 to n-1
node_x INT, -- x coordinate (0 to q-1)
node_y INT, -- y coordinate (0 to t-1)
is_data BOOLEAN, -- true for data nodes, false for parity
-- Node location
storage_node_id TEXT, -- Physical storage node identifier
endpoint_url TEXT, -- API endpoint for this node
-- Status
status TEXT, -- 'healthy', 'degraded', 'failed'
last_verified TIMESTAMP,
PRIMARY KEY (file_id, node_index)
);
```
### Stripe Metadata
```sql
CREATE TABLE stripes (
file_id TEXT,
stripe_index INT,
-- Integrity per stripe
data_checksum TEXT, -- Checksum of original stripe data
-- Status tracking
complete_nodes INT, -- Count of nodes with complete data
PRIMARY KEY (file_id, stripe_index)
);
```
### Chunk Groups (for Option C layout)
```sql
CREATE TABLE chunk_groups (
file_id TEXT,
stripe_index INT,
node_index INT,
y_group INT, -- 0 to t-1
    data BLOB,             -- all α sub-chunks, reordered (q x-blocks of β)
checksum TEXT, -- Checksum of this group
PRIMARY KEY (file_id, stripe_index, node_index, y_group)
);
```
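Fetching one helper's repair data against this table is a single-row lookup plus a slice; a sketch assuming a SQLite-style DB-API connection (the helper name is illustrative):
```python
import sqlite3

def fetch_repair_block(conn: sqlite3.Connection, file_id: str,
                       stripe_idx: int, helper_idx: int,
                       failed_x: int, failed_y: int,
                       beta: int, subchunk_size: int) -> bytes:
    """Fetch one helper's y-group row and slice out the needed x-block."""
    row = conn.execute(
        "SELECT data FROM chunk_groups WHERE file_id = ? AND stripe_index = ?"
        " AND node_index = ? AND y_group = ?",
        (file_id, stripe_idx, helper_idx, failed_y),
    ).fetchone()
    if row is None:
        raise KeyError("missing chunk group")
    start = failed_x * beta * subchunk_size
    return row[0][start:start + beta * subchunk_size]
```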
### Repair Tracking
```sql
CREATE TABLE repair_operations (
repair_id TEXT PRIMARY KEY,
file_id TEXT,
failed_node INT, -- Node index being repaired
failed_node_x INT,
failed_node_y INT,
-- Helper selection
helper_nodes INT[], -- Array of d helper node indices
-- Progress
status TEXT, -- 'pending', 'in_progress', 'complete', 'failed'
started_at TIMESTAMP,
completed_at TIMESTAMP,
-- Metrics
bytes_fetched BIGINT,
bytes_repaired BIGINT,
-- Error tracking
error_message TEXT
);
```
---
## 7. API Design Patterns
### Core API Endpoints
```
STORAGE NODE API
════════════════
POST /encode
Upload and encode a file, distribute to n nodes
Body: multipart file upload
Response: { file_id, stripe_count, node_assignments[] }
GET /decode/{file_id}
Retrieve and decode a file
Response: File stream
GET /chunk/{file_id}/{stripe_idx}/{node_idx}
Retrieve a full chunk (for normal reads)
Response: Binary chunk data
GET /repair-data/{file_id}/{stripe_idx}/{node_idx}
Retrieve sub-chunks needed for repair
    Query params: failed_node_x, failed_node_y (select the contiguous
                  x-block within the helper's y-group)
Response: Binary data (β sub-chunks)
POST /repair/{file_id}/{node_idx}
Initiate repair of a failed node
Response: { repair_id, status }
GET /repair/{repair_id}
Check repair status
Response: { status, progress, bytes_repaired }
```
### Repair Flow Implementation
```python
import asyncio

import aiohttp


class ClayRepairClient:
def __init__(self, file_metadata: FileMetadata, node_endpoints: list[str]):
self.meta = file_metadata
self.endpoints = node_endpoints
async def repair_node(self, failed_node: int) -> bytes:
"""
Repair a failed node using optimal Clay repair algorithm.
Returns the reconstructed shard data for the failed node.
"""
# Step 1: Determine helper nodes
helpers = self.minimum_to_repair(failed_node)
failed_x, failed_y = self.node_to_coords(failed_node)
# Step 2: Fetch repair data from all helpers (parallel)
repair_data = {}
tasks = []
for helper in helpers:
endpoint = self.endpoints[helper]
            task = self.fetch_repair_data(
                endpoint=endpoint,
                file_id=self.meta.file_id,
                node_id=helper,
                x_block=failed_x,   # selects the contiguous x-block
                y_group=failed_y    # this is the key for Option C!
            )
tasks.append((helper, task))
results = await asyncio.gather(*[t[1] for t in tasks])
repair_data = {tasks[i][0]: results[i] for i in range(len(tasks))}
# Step 3: Reconstruct failed node's data
return self.clay_repair_decode(
failed_node=failed_node,
failed_x=failed_x,
failed_y=failed_y,
repair_data=repair_data
)
    async def fetch_repair_data(self, endpoint: str, file_id: str,
                                node_id: int, x_block: int,
                                y_group: int) -> bytes:
        """Fetch repair sub-chunks (one x-block) from a helper node."""
        async with aiohttp.ClientSession() as session:
            # One request fetches all needed sub-chunks (contiguous in Option C)
            url = f"{endpoint}/repair-data/{file_id}"
            params = {"node_id": node_id, "x_block": x_block, "y_group": y_group}
            async with session.get(url, params=params) as resp:
                return await resp.read()
def minimum_to_repair(self, failed_node: int) -> list[int]:
"""Select d optimal helper nodes for repair."""
failed_x, failed_y = self.node_to_coords(failed_node)
# Must include all surviving nodes in same y-section
y_section_nodes = self.get_y_section_nodes(failed_y)
mandatory = [n for n in y_section_nodes if n != failed_node]
# Fill remaining from other y-sections
other_nodes = [n for n in range(self.meta.n)
if n != failed_node and n not in y_section_nodes]
helpers = mandatory + other_nodes[:self.meta.d - len(mandatory)]
return helpers[:self.meta.d]
def node_to_coords(self, node_idx: int) -> tuple[int, int]:
"""Convert linear node index to (x, y) coordinates."""
# Layout: nodes arranged in t y-sections of q nodes each
y = node_idx // self.meta.q
x = node_idx % self.meta.q
return x, y
def get_y_section_nodes(self, y: int) -> list[int]:
"""Get all node indices in y-section y."""
return [y * self.meta.q + x for x in range(self.meta.q)]
```
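A hypothetical invocation, assuming the `FileMetadata` sketch from Section 6 and one endpoint per node (hostnames and values below are placeholders):
```python
import asyncio

meta = FileMetadata(file_id="f-001", n=16, k=10, d=13, q=4, t=4,
                    alpha=256, beta=64, stripe_size=10_485_760,
                    stripe_count=4, subchunk_size=4096)
endpoints = [f"http://node-{i}.storage.local:8080" for i in range(meta.n)]
client = ClayRepairClient(meta, endpoints)

# Reconstruct the shard of failed node 5 (a full system repeats per stripe)
shard = asyncio.run(client.repair_node(failed_node=5))
```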
---
## 8. Performance Considerations
### Bandwidth Comparison
```
REPAIR BANDWIDTH: RS vs CLAY
════════════════════════════
For (n=16, k=10, d=13) code:
RS Code:
Download k full chunks = k × α × subchunk_size
= 10 × 256 × ss
= 2560 × ss
Clay Code:
Download β sub-chunks from d helpers = d × β × subchunk_size
= 13 × 64 × ss
= 832 × ss
Savings: 1 - (832/2560) = 67.5%
┌─────────────────────────────────────────────────────────────┐
│        Repair Bandwidth Reduction: ≈3× (2560 / 832 ≈ 3.1)   │
└─────────────────────────────────────────────────────────────┘
```
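The same arithmetic, parameterized for other configurations; a minimal sketch (`ss` abbreviates subchunk_size):
```python
def repair_traffic(k: int, d: int, alpha: int, beta: int,
                   ss: int) -> tuple[int, int]:
    """Network bytes to repair one chunk: (Reed-Solomon, Clay)."""
    rs = k * alpha * ss      # k full chunks
    clay = d * beta * ss     # beta sub-chunks from each of d helpers
    return rs, clay

rs, clay = repair_traffic(k=10, d=13, alpha=256, beta=64, ss=4096)
print(rs, clay, 1 - clay / rs)   # 10485760 3407872 0.675
```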
### I/O Pattern Comparison
```
DISK I/O PATTERNS BY STORAGE LAYOUT
═══════════════════════════════════
Scenario: Repair single node, 64 MB stripe, 25.6 KB sub-chunks
Option A (contiguous array):
Per helper: 1 read of 6.4 MB, extract 64 scattered sub-chunks
Pattern: Read full blob, seek 64 times within buffer
Actual disk: 1 sequential read + memory operations
Option B (indexed sub-chunks):
Per helper: 64 random reads of 25.6 KB each
Pattern: 64 seeks across different locations
Actual disk: 64 random I/Os (terrible on HDD)
Option C (y-grouped):
Per helper: 1 read of 1.6 MB (β × subchunk_size)
Pattern: Single sequential read of exactly what's needed
Actual disk: 1 sequential read, no waste
┌─────────────────────────────────────────────────────────────┐
│ HDD: Option C >> Option A >> Option B │
│ SSD: Option C ≈ Option B > Option A (for repair) │
│ NVMe: Option C ≈ Option B ≈ Option A │
└─────────────────────────────────────────────────────────────┘
```
### Memory Requirements
```
MEMORY DURING REPAIR
════════════════════
Repairing one node for a single stripe:
Helper data buffered:
d × β × subchunk_size = 13 × 64 × 25.6 KB = 21.3 MB
Reconstruction buffer:
α × subchunk_size = 256 × 25.6 KB = 6.4 MB
U-buffer (uncoupled intermediates):
    implementation-dependent; typically ~6-12 MB for this configuration
Total per stripe: ~35-40 MB
For parallel stripe repair (8 stripes):
~280-320 MB
```
### Encoding Overhead
```
ENCODING TIME COMPARISON
════════════════════════
Clay vs RS for same (n, k):
MDS encode (shared): O(m × k × L)
PRT/PFT transforms: O(α × L) per parity node
Clay overhead: ~10-30% additional CPU time for transforms
Encoding is write-once; repair is ongoing.
Trade-off favors Clay in read-heavy / repair-frequent workloads.
```
---
## Summary
| Decision | Recommendation |
|----------|----------------|
| **Storage Layout** | Option C (y-grouped) for optimal repair |
| **Stripe Size** | 10-64 MB for typical workloads |
| **Metadata** | Store full Clay params + node coordinates |
| **Repair API** | Single endpoint returning the x-block of a y-group |
| **Normal Read** | Group t-1 in natural order (or inverse-permute any group) |
The key insight is that Clay codes' repair efficiency can only be realized if the storage layout supports contiguous access to repair sub-chunks. Option C achieves this by grouping sub-chunks by their y-section affinity, ensuring that repair operations require only a single sequential read per helper node, at the cost of extra raw storage for the reordered copies.