clay-codes 0.1.0

Clay (Coupled-Layer) erasure codes - MSR codes with optimal repair bandwidth
Documentation

clay-codes

Crates.io Documentation License

A Rust implementation of Clay (Coupled-Layer) erasure codes, based on the FAST'18 paper "Clay Codes: Moulding MDS Codes to Yield an MSR Code" (PDF). The construction was originally implemented in Ceph and is now part of its master codebase.

Why Clay codes?

In distributed storage, node failures are routine. When a node goes down, the system must repair it by fetching data from surviving nodes. With Reed-Solomon codes, repairing a single node means downloading k full chunks across the network -- even though you only need to reconstruct one.

Clay codes are MSR (Minimum Storage Regenerating) codes: they provide the same fault tolerance and storage overhead as Reed-Solomon, but repair a failed node by downloading only a fraction of each helper's data. In practice, this reduces repair network traffic by up to 2.9x and repair time by up to 3x.

Quickstart

Add to your Cargo.toml:

[dependencies]
clay-codes = "0.1"

Encode and Decode

use clay_codes::ClayCode;
use std::collections::HashMap;

// 4 data chunks, 2 parity chunks, 5 helpers for repair
let clay   = ClayCode::new(4, 2, 5).unwrap();
let data   = b"Hello, Clay codes!";
let chunks = clay.encode(data);

// Simulate losing node 0 -- collect the surviving chunks
let mut available: HashMap<usize, Vec<u8>> = HashMap::new();
for (node, chunk) in chunks.iter().enumerate() {
    if node != 0 {
        available.insert(node, chunk.clone());
    }
}

// Recover the original data from the remaining 5 chunks
let recovered = clay.decode(&available, &[0]).unwrap();
assert_eq!(&recovered[..data.len()], &data[..]);

Bandwidth-Optimal Repair

This is the core advantage over Reed-Solomon. Rather than downloading full chunks from k nodes, minimum_to_repair tells you exactly which sub-chunks to fetch from each helper, and repair reconstructs the lost node from that partial data.

use clay_codes::ClayCode;
use std::collections::HashMap;

let clay           = ClayCode::new(4, 2, 5).unwrap();
let data           = b"Hello, Clay codes!";
let chunks         = clay.encode(data);
let chunk_size     = chunks[0].len();
let sub_chunk_size = chunk_size / clay.sub_chunk_no;

// Which sub-chunks do we need from each helper to repair node 0?
let helpers    = vec![1, 2, 3, 4, 5];
let repair_map = clay.minimum_to_repair(0, &helpers).unwrap();

// Fetch only those sub-chunks from each helper node
let mut partial: HashMap<usize, Vec<u8>> = HashMap::new();

for (helper, sub_chunks) in &repair_map {
    let mut buf = Vec::new();
    for &sc in sub_chunks {
        let start = sc * sub_chunk_size;
        let end   = start + sub_chunk_size;
        buf.extend_from_slice(&chunks[*helper][start..end]);
    }
    partial.insert(*helper, buf);
}

// Reconstruct node 0 from the partial data
let recovered = clay.repair(0, &partial, chunk_size).unwrap();
assert_eq!(recovered, chunks[0]);

Parameters

Clay codes are configured with three values:

Parameter Description
k Number of data chunks
m Number of parity chunks (tolerates up to m simultaneous failures)
d Number of helper nodes for repair (k+1 <= d <= k+m-1)

The remaining parameters are derived automatically:

Derived Formula Meaning
q d - k + 1 Coupling factor
alpha q^t Sub-chunks per chunk (sub-packetization level)
beta alpha / q Sub-chunks downloaded per helper during repair

Example Configurations

Config (k, m, d) alpha beta Repair BW vs RS
(4, 2, 5) 8 4 37.5% savings
(9, 3, 11) 81 27 59.3% savings
(10, 4, 13) 256 64 67.5% savings

Higher d - k means more savings, at the cost of larger sub-packetization.

Documentation

License

Licensed under the Apache License, Version 2.0. See LICENSE for details.