kontor-crypto 0.1.6

Kontor Proof-of-Retrievability system for decentralized storage
Documentation
# Technical Implementation

This section provides a technical overview of the cryptographic implementation for contributors.

## Core Architecture

The system processes data through a pipeline of erasure coding, chunking, and Merkle tree construction before entering the recursive proving loop.

```mermaid
graph TD
    A[Original Data] --> B{1. Partition into 31-byte symbols};
    B --> C[Data Symbols];
    C --> D{2. Multi-Codeword RS Encode};
    D --> E[All Symbols: Data + Parity];
    E --> F{3. Build Merkle Tree};
    F --> G[Merkle Root];

    subgraph Prover Loop per symbol challenge
        direction LR
        H(state_in) --> I{4. Derive Symbol Index};
        J(random_seed) --> I;
        I -- symbol_index --> K{5. Prove Inclusion};
        L(symbol & Merkle path) --> K;
        G --> K;
        K --> M{6. Update State};
        L -- symbol_value --> M;
        H --> M;
        M --> N(state_out);
    end

    K -- SNARK per step --> O[7. Compress via Spartan];
    O --> P[Constant-size Proof ~10kB];

    subgraph Verifier
        P --> Q{8. Verify};
        G --> Q;
        J --> Q;
    end

    subgraph Reconstruction
        R[Available Symbols ≥90% per codeword] --> S{9. RS Decode};
        S --> A;
    end
```

**Core Components:**

-   **`src/main.rs`**: Production storage node simulator with realistic multi-file proof aggregation, heterogeneous file sizes, and economic analysis.
-   **`src/api/`**: High-level unified API via `PorSystem` struct providing `prepare_file()`, `prove()`, and `verify()` methods.
  -   **`system.rs`**: `PorSystem` - unified API entry point
  -   **`plan.rs`**: Preprocessing logic shared between prove and verify
  -   **`prove.rs`**: Proof generation with automatic shape derivation
  -   **`verify.rs`**: Verification with secure ledger root pinning
  -   **`witness.rs`**: Circuit witness generation with guaranteed padding
  -   **`types.rs`**: Core data types (Challenge, Proof, FileMetadata, etc.)
-   **`src/circuit/`**: The unified Nova `StepCircuit` for PoR verification.
  -   **`synth.rs`**: Main circuit synthesis logic
  -   **`witness.rs`**: Witness data structures
  -   **`gadgets/`**: Low-level circuit components (Merkle, Poseidon, selection)
  -   **`debug.rs`**: Circuit uniformity fingerprinting (debug builds)
-   **`src/erasure.rs`**: Multi-codeword Reed-Solomon (GF(2^8), 231+24 symbols per codeword).
-   **`src/merkle.rs`**: Poseidon Merkle trees with domain separation.
-   **`src/ledger.rs`**: File ledger with aggregated Merkle tree of root commitments (rc values).
-   **`src/params.rs`**: Dynamic parameter generation with in-memory caching for different circuit shapes.
-   **`src/metrics.rs`**: Performance metrics and structured output for benchmarking.
-   **`src/config.rs`**: Centralized configuration, economic constants, and public I/O layout.
-   **`src/poseidon.rs`**: Domain-separated Poseidon hashing with cached constants.
-   **`benches/`**: Divan benchmark suite (`bench_main.rs`) for regression tracking (primitives, file prep, proving, verification, e2e).

## Data Encoding and Merkle Tree Construction

1.  **Symbol Partitioning**: Raw data is partitioned into fixed 31-byte symbols. The 31-byte size is the maximum that fits in a Pallas field element (255 bits), enabling symbols to encode directly as Merkle leaves.
2.  **Multi-Codeword Reed-Solomon**: Symbols are grouped into codewords of 231 data symbols. Reed-Solomon encoding over GF(2^8) generates 24 parity symbols per codeword (255 total). Files larger than 231 symbols use multiple independent codewords.
3.  **Merkle Tree Construction**: Each symbol encodes directly as a Pallas field element (little-endian byte order) to become a leaf. Internal nodes use Poseidon: `H(TAG_NODE, left, right)`. Tree is padded to next power of two.
4.  **Proof-of-Retrievability**: Verifying a Merkle proof proves possession of the field element. Because the encoding is reversible, this proves possession of the symbol's 31 bytes of file data.
5.  **Domain Separation**: All Poseidon operations use distinct tags to prevent cross-context collisions.

## Circuit Design

The `PorCircuit` implements Nova's `StepCircuit` trait. For each step, it proves:
1.  Correct calculation of the challenged leaf index via domain-separated hashing.
2.  Knowledge of a valid Merkle path from the leaf to the file root.
3.  File root commitment membership in the aggregated ledger (multi-file only).
4.  Correct evolution of the state via a hash chain: `state_out = H(TAG_STATE, state_in, leaf_value)`.

**Public I/O Vector Layout (Primary):**

The vector length is `2 + 4 * files_per_step` with the following sections:

1.  **Fixed (2):** `aggregated_root`, `state_in`.
2.  **Ledger indices (F):** `ledger_index_0 ... ledger_index_{F-1}` (canonical positions in ledger).
3.  **Depths (F):** `actual_depth_0 ... actual_depth_{F-1}` (for depth binding).
4.  **Seeds (F):** `seed_0 ... seed_{F-1}` (enables multi-batch aggregation with different block hashes).
5.  **Leaves (F):** `leaf_0 ... leaf_{F-1}` (challenged symbols, initially zero).

Notes:
-   The leaves section is initialized to zero in `z0_primary` and filled by the circuit; it is carried forward step-to-step by Nova.
-   Different seeds per file enable aggregating challenges from different block heights (different Bitcoin block hashes).

**Security Properties:**

-   **Public Depth Binding:** Each slot's computed depth is enforced to equal its public depth input via constraint.
-   **Ledger Binding:** Public ledger indices and the aggregated root cryptographically prove that each file's `rc = H(TAG_RC, root, depth)` commitment exists in the canonical `FileLedger`.
-   **Gating Logic:** Circuit slots are only processed if their public depth is greater than zero, allowing padding slots to be skipped without changing circuit structure.
-   **Root Commitment (rc):** Binds file root and depth to prevent depth-spoofing attacks.

## Parameter Caching and Shape Derivation

**Dynamic Circuit Shapes:**

The system supports dynamic circuit parameters based on the actual files being proven:
-   **Shape key**: `(files_per_step, file_tree_depth, aggregated_tree_depth)`
-   **files_per_step**: `next_power_of_two(num_files)` - ensures uniform structure
-   **file_tree_depth**: `max(file_depths)` - handles heterogeneous file sizes
-   **aggregated_tree_depth**: `ledger.depth()` for multi-file, 0 for single-file

**In-Memory Caching** (`src/params.rs`):
-   Parameters are expensive to generate (2-5 seconds)
-   Cached by shape key with LRU eviction (max 50 entries)
-   Subsequent proofs with the same shape use cached parameters instantly
-   Simulator displays "Parameter Load (cached)" when cache is hit

## Benchmarking and Testing

**Benchmark Suite** (`benches/bench_main.rs`):
-   Uses **Divan** for statistical benchmarking
-   Regression tracking for critical paths
-   Run with: `cargo bench`

**Production Simulator** (`src/main.rs`):
-   Realistic storage node operation
-   Heterogeneous file sizes and staggered challenges
-   Run with: `cargo run --release`

**Test Suite** (`tests/`):
-   Circuit uniformity regression tests
-   Security tests (ledger binding, depth spoofing, replay attacks)
-   E2E tests with variable file sizes and depths
-   Run with: `cargo test`