clay-codes

A Rust implementation of Clay (Coupled-Layer) erasure codes, based on the FAST'18 paper "Clay Codes: Moulding MDS Codes to Yield an MSR Code" (PDF). The construction was originally implemented in Ceph and is now part of its master codebase.

Why Clay codes?

In distributed storage, node failures are routine. When a node goes down, the system must repair it by fetching data from surviving nodes. With Reed-Solomon codes, repairing a single node means downloading k full chunks across the network -- even though you only need to reconstruct one.

Clay codes are MSR (Minimum Storage Regenerating) codes: they provide the same fault tolerance and storage overhead as Reed-Solomon, but repair a failed node by downloading only a fraction of each helper's data. In practice, this reduces repair network traffic by up to 2.9x and repair time by up to 3x.

Quickstart

Add to your Cargo.toml:

[dependencies]
clay-codes = "0.1"

Encode and Decode

use clay_codes::ClayCode;
use std::collections::HashMap;

// 4 data chunks, 2 parity chunks, 5 helpers for repair
let clay   = ClayCode::new(4, 2, 5).unwrap();
let data   = b"Hello, Clay codes!";
let chunks = clay.encode(data);

// Simulate losing node 0 -- collect the surviving chunks
let mut available: HashMap<usize, Vec<u8>> = HashMap::new();
for (node, chunk) in chunks.iter().enumerate() {
    if node != 0 {
        available.insert(node, chunk.clone());
    }
}

// Recover the original data from the remaining 5 chunks
let recovered = clay.decode(&available, &[0]).unwrap();
assert_eq!(&recovered[..data.len()], &data[..]);

Bandwidth-Optimal Repair

This is the core advantage over Reed-Solomon. Rather than downloading full chunks from k nodes, minimum_to_repair tells you exactly which sub-chunks to fetch from each helper, and repair reconstructs the lost node from that partial data.

use clay_codes::ClayCode;
use std::collections::HashMap;

let clay           = ClayCode::new(4, 2, 5).unwrap();
let data           = b"Hello, Clay codes!";
let chunks         = clay.encode(data);
let chunk_size     = chunks[0].len();
let sub_chunk_size = chunk_size / clay.sub_chunk_no;

// Which sub-chunks do we need from each helper to repair node 0?
let helpers    = vec![1, 2, 3, 4, 5];
let repair_map = clay.minimum_to_repair(0, &helpers).unwrap();

// Fetch only those sub-chunks from each helper node
let mut partial: HashMap<usize, Vec<u8>> = HashMap::new();

for (helper, sub_chunks) in &repair_map {
    let mut buf = Vec::new();
    for &sc in sub_chunks {
        let start = sc * sub_chunk_size;
        let end   = start + sub_chunk_size;
        buf.extend_from_slice(&chunks[*helper][start..end]);
    }
    partial.insert(*helper, buf);
}

// Reconstruct node 0 from the partial data
let recovered = clay.repair(0, &partial, chunk_size).unwrap();
assert_eq!(recovered, chunks[0]);

Parameters

Clay codes are configured with three values:

Parameter	Description
`k`	Number of data chunks
`m`	Number of parity chunks (tolerates up to `m` simultaneous failures)
`d`	Number of helper nodes for repair (`k+1 <= d <= k+m-1`)

The remaining parameters are derived automatically:

Derived	Formula	Meaning
`q`	`d - k + 1`	Coupling factor
`alpha`	`q^t`	Sub-chunks per chunk (sub-packetization level)
`beta`	`alpha / q`	Sub-chunks downloaded per helper during repair

Example Configurations

Config (k, m, d)	alpha	beta	Repair BW vs RS
(4, 2, 5)	8	4	37.5% savings
(9, 3, 11)	81	27	59.3% savings
(10, 4, 13)	256	64	67.5% savings

Higher d - k means more savings, at the cost of larger sub-packetization.

Documentation

API reference: docs.rs/clay-codes
Implementation guide: docs/clay-practical-implementation.md -- storage layouts, repair access patterns, metadata schemas, and performance considerations for integrating Clay codes into a distributed storage system.
Paper notes: docs/clay-codes-fast18.md -- detailed notes on the FAST'18 paper including construction, algorithms, and experimental results.

License

Licensed under the Apache License, Version 2.0. See LICENSE for details.

clay-codes 0.1.1