Kontor Proof-of-Retrievability (PoR)
⚠️ WARNING: This code is unaudited and experimental. Use at your own risk.
This project implements a Proof-of-Retrievability (PoR) system designed to provide economically enforceable guarantees that Storage Nodes are actually storing the data they have committed to. It is a core component for decentralized storage metaprotocols where a network of Indexers (Verifiers) must continuously audit Storage Nodes (Provers).
The system uses Nova recursive SNARKs via the arecibo library to generate constant-size (~10 kB) cryptographic proofs that a prover possesses a specific set of data. These proofs are efficient to verify (~30ms), making it feasible to enforce storage guarantees at scale.
Core Capabilities
- Partition files into fixed 31-byte symbols for direct field element encoding.
- Apply multi-codeword Reed-Solomon (GF(2^8)) for fault tolerance.
- Generate Poseidon Merkle trees over all symbols (data + parity).
- Create recursive SNARKs proving possession of randomly sampled symbols.
- Compress proofs to constant ~10 kB size regardless of file count or challenge count.
- Support dynamic circuit parameters with in-memory caching.
- Reconstruct original files from partial symbol availability (≥90% per codeword).
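For concreteness, the fixed parameters above imply straightforward size arithmetic. The sketch below uses the constants stated in this document (31-byte symbols, 231 data + 24 parity symbols per codeword); the helper function names are hypothetical, not part of the crate's API:

```rust
/// Fixed protocol constants taken from this document.
const SYMBOL_SIZE: usize = 31; // bytes per symbol (fits a Pallas field element)
const DATA_PER_CODEWORD: usize = 231; // data symbols per codeword
const PARITY_PER_CODEWORD: usize = 24; // parity symbols per codeword

/// Number of 31-byte data symbols needed to hold `file_size` bytes
/// (the last symbol is padded).
fn num_data_symbols(file_size: usize) -> usize {
    (file_size + SYMBOL_SIZE - 1) / SYMBOL_SIZE
}

/// Number of independent Reed-Solomon codewords covering those symbols.
fn num_codewords(data_symbols: usize) -> usize {
    (data_symbols + DATA_PER_CODEWORD - 1) / DATA_PER_CODEWORD
}

/// Total symbols (data + parity) committed in the Merkle tree.
fn total_symbols(file_size: usize) -> usize {
    let data = num_data_symbols(file_size);
    data + num_codewords(data) * PARITY_PER_CODEWORD
}

fn main() {
    // A 1 MiB file: 33,826 data symbols across 147 codewords,
    // plus 147 * 24 = 3,528 parity symbols.
    let data = num_data_symbols(1 << 20);
    assert_eq!(data, 33_826);
    assert_eq!(num_codewords(data), 147);
    assert_eq!(total_symbols(1 << 20), 37_354);
}
```

Each full codeword tolerates the loss of up to 24 of its 255 symbols, which is where the "≥90% per codeword" reconstruction threshold (231/255 ≈ 90.6%) comes from.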
Glossary
- Storage Node (Prover): A network participant that commits to storing data and must generate proofs to demonstrate ongoing possession.
- Indexer (Verifier): A network participant that validates proofs submitted by Storage Nodes and tracks the state of challenges.
- Symbol: A 31-byte unit of data or parity. The 31-byte size is the maximum that fits in a Pallas field element (255 bits), enabling symbols to encode directly as Merkle leaves. Verifying a Merkle proof proves possession of the field element, and because the encoding is reversible, this proves possession of the symbol's 31 bytes of file data.
- Codeword: A Reed-Solomon encoding unit over GF(2^8) containing 231 data symbols + 24 parity symbols (255 total). Files larger than 231 symbols use multiple independent codewords.
- Leaf: A Merkle tree leaf is a symbol encoded as a Pallas field element via little-endian byte order.
- FileLedger: An aggregated Merkle tree over root commitments of all files being tracked by the system.
- Root Commitment (rc): A Poseidon hash binding a file's Merkle `root` and `depth`, preventing depth-spoofing attacks.
- State: A value in a recursive hash chain (`state_out = H(state_in, leaf_value)`) that links sequential proof steps, preventing replay attacks.
System Protocol
This section describes the network-level protocol for decentralized storage auditing.
Oracle and Challenge Generation
- The network uses a shared oracle (e.g., a Bitcoin block hash) to derive challenges deterministically at regular intervals.
- For each interval and for each relevant file–prover pair, a `Challenge` is deterministically constructed from the `block_height`, `file_metadata`, a per-challenge `seed` derived from the oracle, and a `prover_id`.
- Each `Challenge` has a deterministic `challenge_id` that uniquely identifies it on the network.
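A sketch of deterministic derivation from those inputs. `DefaultHasher` here is a non-cryptographic stand-in for the system's real hash (the actual `challenge_id` is a 32-byte cryptographic digest), and `file_id` stands in for the file metadata:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derive a challenge identifier from the oracle-driven inputs named in
/// the protocol description. NOT cryptographic; illustration only.
fn derive_challenge_id(block_height: u64, file_id: &str, seed: [u8; 32], prover_id: &str) -> u64 {
    let mut h = DefaultHasher::new();
    block_height.hash(&mut h);
    file_id.hash(&mut h);
    seed.hash(&mut h);
    prover_id.hash(&mut h);
    h.finish()
}

fn main() {
    let a = derive_challenge_id(840_000, "file-1", [7u8; 32], "prover-A");
    let b = derive_challenge_id(840_000, "file-1", [7u8; 32], "prover-A");
    let c = derive_challenge_id(840_000, "file-1", [7u8; 32], "prover-B");
    assert_eq!(a, b); // same inputs -> every verifier derives the same id
    assert_ne!(a, c); // different prover -> different challenge
}
```

Because every verifier hashes the same oracle-derived inputs, no coordination is needed to agree on the set of open challenges.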
Challenge Lifecycle
Challenges are created deterministically by all verifiers and transition through a simple lifecycle:
```mermaid
stateDiagram-v2
    [*] --> Open: New block interval creates challenge
    Open --> Resolved: Valid proof is accepted
    Open --> Expired: TTL (e.g., K blocks) elapses
    Resolved --> [*]
    Expired --> [*]
```
- When a valid proof is broadcast, all `Open` challenges it covers are marked `Resolved`.
- If a challenge is not resolved within its TTL, it becomes `Expired`, and the associated Storage Node is penalized according to metaprotocol rules.
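The lifecycle can be modeled as a small state machine. The sketch below is illustrative: the type names and the concrete `TTL_BLOCKS` value are hypothetical, since the real K is metaprotocol policy:

```rust
/// Challenge lifecycle from the state diagram: Open -> Resolved | Expired.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ChallengeState {
    Open { created_at: u64 },
    Resolved,
    Expired,
}

const TTL_BLOCKS: u64 = 10; // "K blocks"; the concrete K is policy-defined

impl ChallengeState {
    /// A valid broadcast proof resolves a covered Open challenge.
    fn on_valid_proof(self) -> Self {
        match self {
            ChallengeState::Open { .. } => ChallengeState::Resolved,
            other => other, // Resolved/Expired are terminal
        }
    }

    /// Advance to block `height`; an Open challenge past its TTL expires.
    fn on_new_block(self, height: u64) -> Self {
        match self {
            ChallengeState::Open { created_at } if height >= created_at + TTL_BLOCKS => {
                ChallengeState::Expired
            }
            other => other,
        }
    }
}

fn main() {
    let c = ChallengeState::Open { created_at: 100 };
    assert_eq!(c.on_valid_proof(), ChallengeState::Resolved);
    assert_eq!(c.on_new_block(105), c); // still within TTL
    assert_eq!(c.on_new_block(110), ChallengeState::Expired);
}
```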
Protocol Flow
```mermaid
sequenceDiagram
    participant Oracle as Oracle (e.g., Bitcoin)
    participant Verifier as Verifiers (Indexers)
    participant Prover as Storage Node
    Oracle-->>Verifier: New block (height, hash)
    Verifier->>Verifier: Deterministically derive Challenges for (file, prover_id) pairs
    Note over Verifier: Record new challenges as Open with TTL K blocks
    Prover->>Prover: Select a set of Open Challenges to answer
    Prover->>Prover: Fetch corresponding PreparedFiles
    Prover->>Prover: PorSystem.prove(files, challenges) → Proof
    Note over Prover: Proof object internally contains challenge_ids
    Prover-->>Verifier: Broadcast Proof to the network
    Verifier->>Verifier: On receiving Proof, extract challenge_ids
    Verifier->>Verifier: Fetch full Challenge objects from local database by ID
    Verifier->>Verifier: PorSystem.verify(proof, challenges) → bool
    alt is valid
        Verifier->>Verifier: Mark covered Open challenges as Resolved
    else is not valid
        Verifier->>Verifier: Reject proof, challenges remain Open
    end
```
Ledger & Aggregation
- A `FileLedger` binds the set of files via an aggregated Merkle tree built over root commitments `rc = H(TAG_RC, root, depth)`.
- Files are ordered canonically by `file_id` (lexicographic, e.g., `BTreeMap` order). Public ledger indices refer to this canonical ordering.
- Multi-file proofs pin the ledger root as the aggregated root; single-file proofs pin the file root.
- The verifier provides public ledger indices; the circuit trusts these (no range checks in circuit).
Determinism & Canonical Ordering
- Any set of files is treated deterministically by sorting by `file_id` whenever an order is required.
- The map from `file_id` → `PreparedFile` is derived internally from `Vec<PreparedFile>` to avoid user ordering mistakes.
- Seeds differ per file; domain separation prevents cross-file bias (multi-batch aggregation supported).
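A minimal sketch of the ordering guarantee (the `PreparedFile` stand-in is illustrative): building a `BTreeMap` from the user's `Vec` makes iteration order depend only on `file_id`, never on insertion order:

```rust
use std::collections::BTreeMap;

/// Illustrative stand-in for the crate's PreparedFile type.
#[derive(Clone)]
struct PreparedFile {
    file_id: String,
}

/// Build the canonical file_id -> PreparedFile map internally, so callers
/// cannot introduce ordering mistakes: BTreeMap iterates in lexicographic
/// key order regardless of how the Vec was ordered.
fn canonicalize(files: Vec<PreparedFile>) -> BTreeMap<String, PreparedFile> {
    files.into_iter().map(|f| (f.file_id.clone(), f)).collect()
}

fn main() {
    let files = vec![
        PreparedFile { file_id: "b".into() },
        PreparedFile { file_id: "a".into() },
        PreparedFile { file_id: "c".into() },
    ];
    let order: Vec<String> = canonicalize(files).keys().cloned().collect();
    assert_eq!(order, ["a", "b", "c"]); // canonical, input-order independent
}
```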
API Reference
The high-level API centers on a PorSystem object, which consolidates setup, proving, and verification concerns.
Core Interface

Construction and use center on `PorSystem`: construct it once, then call `prove()` and `verify()` on it. `prepare_file()` and `reconstruct_file()` handle file encoding and recovery, and `FileLedger` tracks the file set.
Data Types

- File metadata: the public commitment to a file. Derived values are computed from `original_size` and protocol constants.
- `PreparedFile`: the prover's representation of a file, containing the full Merkle tree.
- `Challenge`: a verifier's challenge for a single file–prover pair at a specific block height.
- `ChallengeID`: the deterministic identity of a `Challenge`, derived by hashing stable, cryptographic fields only (per the protocol description: the `block_height`, file metadata, per-challenge `seed`, and `prover_id`).
- `Proof`: the final succinct proof, which includes its coverage set of `challenge_ids`.
Usage Example
A complete example demonstrating the API workflow:
```rust
// NOTE: import paths, variable bindings, and constructor arguments were
// elided from this document; the identifiers below follow the surviving
// comments and are illustrative rather than exact signatures.

// 1. Prepare the file with fixed Reed-Solomon encoding
//    (231 data + 24 parity symbols per codeword)
let my_data = b"This is a test file for the PoR system.";
let prepared = prepare_file(/* data, file_id, ... */).unwrap();

// 2. Create ledger and add the file
let mut ledger = FileLedger::new();
ledger.add_file(/* &prepared */).unwrap();

// 3. Create PorSystem and challenge
let system = PorSystem::new(/* params */);
let num_challenges = 5;
let seed = Seed::from(/* deterministic seed bytes */);
let challenge = Challenge::new(/* metadata, seed, num_challenges, ... */);

// 4. Generate proof using the unified API
let files = vec![prepared];
let proof = system.prove(/* &files, &[challenge] */).unwrap();

// 5. Verify the proof
let is_valid = system.verify(/* &proof, &[challenge] */).unwrap();
assert!(is_valid);
println!("proof valid: {is_valid}");
```
Proof Serialization
The system provides stable helpers to serialize and deserialize proofs for network transport.
```rust
let bytes = proof.to_bytes()?;
let parsed = Proof::from_bytes(&bytes)?;
```
The format is versioned, includes the challenge_ids vector, and rejects any trailing data.
Encoding Notes
- Use a network-canonical encoding with explicit versioning and magic bytes.
- Fixed-width, little-endian encodings for integers (e.g., `block_height: u64`).
- Field elements are encoded in a canonical 32-byte form.
- `challenge_ids` are serialized as a length-prefixed vector of 32-byte IDs.
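Following these notes, a length-prefixed little-endian encoding of `challenge_ids` might look as follows. This is a sketch: the `u32` prefix width and the function names are assumptions, and the real format additionally carries version and magic bytes:

```rust
/// Serialize challenge_ids as a little-endian u32 length prefix followed
/// by the 32-byte IDs. (Prefix width is an assumption; the document only
/// specifies "length-prefixed".)
fn encode_challenge_ids(ids: &[[u8; 32]]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + ids.len() * 32);
    out.extend_from_slice(&(ids.len() as u32).to_le_bytes());
    for id in ids {
        out.extend_from_slice(id);
    }
    out
}

/// Decode, rejecting truncated input and any trailing data.
fn decode_challenge_ids(bytes: &[u8]) -> Option<Vec<[u8; 32]>> {
    if bytes.len() < 4 {
        return None;
    }
    let n = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
    let body = &bytes[4..];
    if body.len() != n * 32 {
        return None; // truncated, or trailing bytes after the last ID
    }
    let mut ids = Vec::with_capacity(n);
    for chunk in body.chunks_exact(32) {
        let mut id = [0u8; 32];
        id.copy_from_slice(chunk);
        ids.push(id);
    }
    Some(ids)
}

fn main() {
    let ids = vec![[1u8; 32], [2u8; 32]];
    let bytes = encode_challenge_ids(&ids);
    assert_eq!(bytes.len(), 4 + 2 * 32);

    let mut trailing = bytes.clone();
    trailing.push(0);
    assert_eq!(decode_challenge_ids(&trailing), None); // trailing data rejected

    assert_eq!(decode_challenge_ids(&bytes), Some(ids)); // round-trips
}
```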
Technical Implementation
This section provides a technical overview of the cryptographic implementation for contributors.
Core Architecture
The system processes data through a pipeline of erasure coding, chunking, and Merkle tree construction before entering the recursive proving loop.
```mermaid
graph TD
    A[Original Data] --> B{1. Partition into 31-byte symbols};
    B --> C[Data Symbols];
    C --> D{2. Multi-Codeword RS Encode};
    D --> E[All Symbols: Data + Parity];
    E --> F{3. Build Merkle Tree};
    F --> G[Merkle Root];
    subgraph ProverLoop["Prover Loop (per symbol challenge)"]
        direction LR
        H(state_in) --> I{4. Derive Symbol Index};
        J(random_seed) --> I;
        I -- symbol_index --> K{5. Prove Inclusion};
        L(symbol & Merkle path) --> K;
        G --> K;
        K --> M{6. Update State};
        L -- symbol_value --> M;
        H --> M;
        M --> N(state_out);
    end
    K -- SNARK per step --> O[7. Compress via Spartan];
    O --> P[Constant-size Proof ~10kB];
    subgraph Verifier
        P --> Q{8. Verify};
        G --> Q;
        J --> Q;
    end
    subgraph Reconstruction
        R[Available Symbols ≥90% per codeword] --> S{9. RS Decode};
        S --> A;
    end
```
Core Components:
- `src/main.rs`: CLI benchmark demonstrating the full PoR workflow.
- `src/api/`: High-level API providing `prepare_file()`, `prove()`, `verify()`, and `reconstruct_file()`.
- `src/erasure.rs`: Reed-Solomon erasure coding implementation.
- `src/merkle.rs`: Merkle tree implementation using Poseidon hash with domain separation.
- `src/circuit/`: The unified Nova `StepCircuit` for the PoR computation.
- `src/config.rs`: Centralized configuration and public I/O layout.
- `src/ledger.rs`: Ledger management, including versioned save/load and root validation on load.
Data Encoding and Merkle Tree Construction
- Symbol Partitioning: Raw data is partitioned into fixed 31-byte symbols. The 31-byte size is the maximum that fits in a Pallas field element (255 bits), enabling symbols to encode directly as Merkle leaves.
- Multi-Codeword Reed-Solomon: Symbols are grouped into codewords of 231 data symbols. Reed-Solomon encoding over GF(2^8) generates 24 parity symbols per codeword (255 total). Files larger than 231 symbols use multiple independent codewords.
- Merkle Tree Construction: Each symbol encodes directly as a Pallas field element (little-endian byte order) to become a leaf. Internal nodes use Poseidon: `H(TAG_NODE, left, right)`. The tree is padded to the next power of two.
- Proof-of-Retrievability: Verifying a Merkle proof proves possession of the field element. Because the encoding is reversible, this proves possession of the symbol's 31 bytes of file data.
- Domain Separation: All Poseidon operations use distinct tags to prevent cross-context collisions.
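A minimal sketch of the partitioning and leaf encoding described above, using plain 32-byte arrays in place of actual Pallas field elements (function names are illustrative, not the crate's API):

```rust
const SYMBOL_SIZE: usize = 31;

/// Partition raw data into fixed 31-byte symbols, zero-padding the last.
fn partition(data: &[u8]) -> Vec<[u8; SYMBOL_SIZE]> {
    data.chunks(SYMBOL_SIZE)
        .map(|chunk| {
            let mut sym = [0u8; SYMBOL_SIZE];
            sym[..chunk.len()].copy_from_slice(chunk);
            sym
        })
        .collect()
}

/// Encode a symbol as a 32-byte little-endian field-element representation.
/// The top byte is always zero, so the value is below 2^248 and therefore
/// always a canonical 255-bit Pallas element; the map is trivially reversible.
fn symbol_to_leaf(sym: &[u8; SYMBOL_SIZE]) -> [u8; 32] {
    let mut le = [0u8; 32];
    le[..SYMBOL_SIZE].copy_from_slice(sym);
    le
}

fn leaf_to_symbol(leaf: &[u8; 32]) -> [u8; SYMBOL_SIZE] {
    let mut sym = [0u8; SYMBOL_SIZE];
    sym.copy_from_slice(&leaf[..SYMBOL_SIZE]);
    sym
}

fn main() {
    let data = b"hello, proof-of-retrievability!"; // exactly 31 bytes
    let syms = partition(data);
    assert_eq!(syms.len(), 1);
    let leaf = symbol_to_leaf(&syms[0]);
    assert_eq!(leaf[31], 0); // top byte always zero -> fits in 255 bits
    assert_eq!(leaf_to_symbol(&leaf), syms[0]); // encoding is reversible
}
```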
Circuit Design
The `PorCircuit` implements Nova's `StepCircuit` trait. For each step, it proves:
- Correct calculation of the challenged leaf index.
- Knowledge of a valid Merkle path from the leaf to the public root.
- Correct evolution of the state via a hash chain: `state_out = H(state_in, leaf_value)`.
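The hash-chain constraint can be illustrated with a toy fold. `DefaultHasher` stands in for Poseidon and is NOT cryptographic; the point is only the linking structure:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Non-cryptographic stand-in for Poseidon (illustration only).
fn h(state_in: u64, leaf_value: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    state_in.hash(&mut hasher);
    leaf_value.hash(&mut hasher);
    hasher.finish()
}

/// Fold challenged leaves into the chain: state_out = H(state_in, leaf_value)
/// at every step, so each step is bound to its position in the sequence.
fn run_chain(state0: u64, leaves: &[u64]) -> u64 {
    leaves.iter().fold(state0, |state, &leaf| h(state, leaf))
}

fn main() {
    let leaves = [10, 20, 30];
    let final_state = run_chain(0, &leaves);
    // Re-running the honest sequence reproduces the state deterministically...
    assert_eq!(final_state, run_chain(0, &leaves));
    // ...while the same leaves in a different order yield a different state,
    // which is what prevents replaying one transcript against another.
    assert_ne!(final_state, run_chain(0, &[30, 20, 10]));
}
```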
Public I/O Vector Layout (Primary):
The vector length is `2 + 4 * files_per_step` with the following sections:
- Fixed (2): `aggregated_root`, `state_in`.
- Ledger indices (F): `ledger_index_0 ... ledger_index_{F-1}`.
- Depths (F): `actual_depth_0 ... actual_depth_{F-1}`.
- Seeds (F): `seed_0 ... seed_{F-1}` (enables multi-batch aggregation).
- Leaves (F): `leaf_0 ... leaf_{F-1}`.
Notes:
- The leaves section is initialized to zero in `z0_primary` and filled by the circuit; it is carried forward step-to-step by Nova.
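The slot arithmetic implied by this layout can be written down directly; the helper names below are hypothetical, not the crate's API:

```rust
/// Offsets into the primary public I/O vector of length 2 + 4*F,
/// mirroring the section order above (F = files_per_step).
fn io_len(f: usize) -> usize { 2 + 4 * f }
fn ledger_index_slot(_f: usize, i: usize) -> usize { 2 + i } // after fixed section
fn depth_slot(f: usize, i: usize) -> usize { 2 + f + i }
fn seed_slot(f: usize, i: usize) -> usize { 2 + 2 * f + i }
fn leaf_slot(f: usize, i: usize) -> usize { 2 + 3 * f + i }

fn main() {
    let f = 4; // four file slots per step
    assert_eq!(io_len(f), 18);
    assert_eq!(ledger_index_slot(f, 0), 2);
    assert_eq!(depth_slot(f, 0), 6);
    assert_eq!(seed_slot(f, 3), 13);
    assert_eq!(leaf_slot(f, 3), 17); // last slot == io_len(f) - 1
}
```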
Security Properties:
- Public Depth Binding: Each slot's computed depth is enforced to equal its public depth input.
- Ledger Binding: Public ledger indices and the aggregated root cryptographically prove that each file's `(root, depth)` commitment exists in the canonical `FileLedger`.
- Gating Logic: Circuit slots are only processed if their public depth is greater than zero, allowing padding slots to be skipped securely.
Important SNARK API Semantics
This project uses arecibo/Nova, which has a deliberate API quirk:
- `RecursiveSNARK::new(...)` synthesizes step 0.
- You must call `prove_step(...)` exactly `N` times for a batch with `N` iterations; the first call is a no-op that advances the internal counter, and only calls 2..N synthesize.
- Verification must pass `num_steps = N`.
Our API abstracts this complexity away from the user.
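The counting rule is easy to get wrong, so here is a toy mock (not the arecibo API) that encodes the same step-counter bookkeeping the wrapper performs:

```rust
/// Toy stand-in for arecibo's RecursiveSNARK bookkeeping (not the real API):
/// `new` synthesizes step 0, the first `prove_step` only advances the
/// counter, and verification must be handed num_steps = N.
struct MockRecursiveSnark {
    steps_synthesized: usize,
    prove_step_calls: usize,
}

impl MockRecursiveSnark {
    fn new() -> Self {
        // The constructor itself synthesizes step 0.
        Self { steps_synthesized: 1, prove_step_calls: 0 }
    }

    fn prove_step(&mut self) {
        self.prove_step_calls += 1;
        if self.prove_step_calls > 1 {
            // Only calls 2..N synthesize additional steps.
            self.steps_synthesized += 1;
        }
    }

    fn verify(&self, num_steps: usize) -> bool {
        self.steps_synthesized == num_steps
    }
}

fn main() {
    let n = 5;
    let mut snark = MockRecursiveSnark::new();
    for _ in 0..n {
        snark.prove_step(); // called exactly N times
    }
    assert!(snark.verify(n)); // verification passes num_steps = N
    assert!(!snark.verify(n - 1)); // any other count fails
}
```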
Developer Guide
CLI Usage
A CLI is provided for large-scale proving and verification runs, as well as API demonstrations.
Examples:
```sh
# Run with small, approachable defaults (3 files, challenge 2, 3 steps each)
cargo run --release

# Run a larger benchmark (invocation illustrative; flags documented below)
cargo run --release -- --total-files 100 --challenged-files 10 --num-challenges 10
```
Flags:
- `--total-files <N>`: Total files to prepare and add to the ledger (default: 3).
- `--challenged-files <K>`: Subset of files to prove (default: 2).
- `--num-challenges <C>`: Recursive steps to prove for each challenged file (default: 3).
- `--file-size <bytes>`: Size of synthetic data for each file.
- `--no-verify`: Skip the verification phase.
- `-v` / `-vv`: Increase logging verbosity to DEBUG/TRACE.
Test Suite
The project includes extensive unit and integration tests; run them all with the standard `cargo test`.
Running Benchmarks
Microbenchmarks and a small end-to-end flow are available via the `bench` binary (e.g., `cargo run --release --bin bench`).
Performance Characteristics
- Proof Size: ~10 kB (constant across challenge count and file set within a shape).
- Verification Time: ~30 ms for compressed SNARK verification.
- Proving Time: Approximately linear in the number of recursive steps.
Errors and Failure Modes
Key error variants surfaced at API boundaries (see NovaPoRError):
- `InvalidInput`, `InvalidChallengeCount`, `ChallengeMismatch` (e.g., non-uniform `num_challenges` across the batch).
- `FileNotFound`, `FileNotInLedger`, `MetadataMismatch`.
- `MerkleTree`, `Circuit`, `Snark`.
- `Serialization`, `IO`.
Enabling Git Hooks
This project includes a pre-push hook to run tests and a security audit. Enable it by setting Git's `core.hooksPath` to the repository's hooks directory.