clay-codes
A Rust implementation of Clay (Coupled-Layer) erasure codes, based on the FAST'18 paper "Clay Codes: Moulding MDS Codes to Yield an MSR Code" (PDF). The construction was originally implemented in Ceph and is now part of its master codebase.
Why Clay codes?
In distributed storage, node failures are routine. When a node goes down, the system must repair it by fetching data from surviving nodes. With Reed-Solomon codes, repairing a single node means downloading k full chunks across the network -- even though you only need to reconstruct one.
Clay codes are MSR (Minimum Storage Regenerating) codes: they provide the same fault tolerance and storage overhead as Reed-Solomon, but repair a failed node by downloading only a fraction of each helper's data. In practice, this reduces repair network traffic by up to 2.9x and repair time by up to 3x.
Quickstart
Add to your Cargo.toml:
[]
= "0.1"
Encode and Decode
use ClayCode;
use HashMap;
// 4 data chunks, 2 parity chunks, 5 helpers for repair
let clay = new.unwrap;
let data = b"Hello, Clay codes!";
let chunks = clay.encode;
// Simulate losing node 0 -- collect the surviving chunks
let mut available: = new;
for in chunks.iter.enumerate
// Recover the original data from the remaining 5 chunks
let recovered = clay.decode.unwrap;
assert_eq!;
Bandwidth-Optimal Repair
This is the core advantage over Reed-Solomon. Rather than downloading full chunks from k nodes, minimum_to_repair tells you exactly which sub-chunks to fetch from each helper, and repair reconstructs the lost node from that partial data.
use ClayCode;
use HashMap;
let clay = new.unwrap;
let data = b"Hello, Clay codes!";
let chunks = clay.encode;
let chunk_size = chunks.len;
let sub_chunk_size = chunk_size / clay.sub_chunk_no;
// Which sub-chunks do we need from each helper to repair node 0?
let helpers = vec!;
let repair_map = clay.minimum_to_repair.unwrap;
// Fetch only those sub-chunks from each helper node
let mut partial: = new;
for in &repair_map
// Reconstruct node 0 from the partial data
let recovered = clay.repair.unwrap;
assert_eq!;
Parameters
Clay codes are configured with three values:
| Parameter | Description |
|---|---|
k |
Number of data chunks |
m |
Number of parity chunks (tolerates up to m simultaneous failures) |
d |
Number of helper nodes for repair (k+1 <= d <= k+m-1) |
The remaining parameters are derived automatically:
| Derived | Formula | Meaning |
|---|---|---|
q |
d - k + 1 |
Coupling factor |
alpha |
q^t |
Sub-chunks per chunk (sub-packetization level) |
beta |
alpha / q |
Sub-chunks downloaded per helper during repair |
Example Configurations
| Config (k, m, d) | alpha | beta | Repair BW vs RS |
|---|---|---|---|
| (4, 2, 5) | 8 | 4 | 37.5% savings |
| (9, 3, 11) | 81 | 27 | 59.3% savings |
| (10, 4, 13) | 256 | 64 | 67.5% savings |
Higher d - k means more savings, at the cost of larger sub-packetization.
Documentation
- API reference: docs.rs/clay-codes
- Implementation guide:
docs/clay-practical-implementation.md-- storage layouts, repair access patterns, metadata schemas, and performance considerations for integrating Clay codes into a distributed storage system. - Paper notes:
docs/clay-codes-fast18.md-- detailed notes on the FAST'18 paper including construction, algorithms, and experimental results.
License
Licensed under the Apache License, Version 2.0. See LICENSE for details.