ubt 0.3.0

Unified Binary Tree implementation based on EIP-7864
Documentation
# EIP-7864: Ethereum state using a unified binary tree

| Field | Value |
|-------|-------|
| EIP | 7864 |
| Title | Ethereum state using a unified binary tree |
| Status | Draft |
| Type | Standards Track |
| Category | Core |
| Created | 2025-01-20 |
| Discussion | https://ethereum-magicians.org/t/eip-7864-ethereum-state-using-a-unified-binary-tree/22611 |

## Authors

- Vitalik Buterin (@vbuterin)
- Guillaume Ballet (@gballet)
- Dankrad Feist (@dankrad)
- Ignacio Hagopian (@jsign)
- Kevaundray Wedderburn (@kevaundray)
- Tanishq Jasoria (@tanishqjasoria)
- Gajinder Singh (@g11tech)
- Danno Ferrin (@shemnon)
- Piper Merriam (@pipermerriam)
- Gottfried Herold (@GottfriedHerold)

## Abstract

Introduce a new binary state tree, intended to replace the hexary patricia trees. Account and storage tries are merged into a single tree with 32-byte keys, which also contains contracts code. Account data is broken into independent leaves which are grouped by 256 in order to provide some locality.

**Note:** The hash function used in the current draft is not final. The current implementation uses BLAKE3 to reduce friction for EL clients experimenting with this EIP, but the final decision remains TBD. Other potential candidates include Keccak and Poseidon2.

## Motivation

Ethereum's long-term goal is to allow blocks to be proved with validity proof so that chain verification is as simple and fast as possible. One of the most challenging parts of achieving that goal is proving the state of the tree, which is required for EVM execution.

The current Merkle-Patricia Trie (MPT) design isn't friendly to validity proofs for many reasons:
- Uses RLP for node encoding
- Uses Keccak as a hashing function
- Is a "tree of trees"
- Does not include accounts code as part of the state

Regarding regular Merkle proofs in an MPT, since it's a hexary tree, their size is quite big on average and in worst-case scenarios. Given a 2^32 size tree, the expected size for proving a single branch is `15 * 32 * log_16(2^32) = 3840`. From a worst-case block perspective, if 30M gas is used to access only a single byte of different account codes since this code isn't chunkified, we'd need `30M/2400*(5*480+24k)=330MB`.

A binary tree has a good balance between out-of-circuit and in-circuit proving. Since the tree arity is two, this reduces the size of regular Merkle proofs.

## Specification

### Notable changes from the hexary structure

- The account and storage tries are merged into a single trie
- RLP is no longer used
- The account's code is chunked and included in the tree
- Account data (e.g., balance, nonce, first storage slots, first code-chunks) is co-located in the tree to reduce branch openings

### Tree structure

The proposed Binary Tree stores key-value entries where both the key and value are 32 bytes. The first 31-bytes of the key define the entry stem, and the last byte is the subindex in that stem. If two keys have the same stem, they live in the same "big branch" which co-locates groups of 256 keys.

Four node types:

| Node Type | Fields | Description |
|-----------|--------|-------------|
| InternalNode | left_hash, right_hash | Internal branching node |
| StemNode | stem, left_hash, right_hash | Node at end of stem path with 256 leaf values |
| LeafNode | value | 32-byte value or empty |
| EmptyNode | - | Empty node/sub-tree |

### Node merkelization

Hash function rules:
- `hash([0x00] * 64) = [0x00] * 32`
- `hash(value) = H(value)` where H is the selected cryptographic hash function

Node hash formulas:
- `internal_node_hash = hash(left_hash || right_hash)`
- `stem_node_hash = hash(stem || 0x00 || hash(left_hash || right_hash))`
- `leaf_node_hash = hash(value)`
- `empty_node_hash = [0x00] * 32`

### Tree embedding parameters

| Parameter | Value |
|-----------|-------|
| BASIC_DATA_LEAF_KEY | 0 |
| CODE_HASH_LEAF_KEY | 1 |
| HEADER_STORAGE_OFFSET | 64 |
| CODE_OFFSET | 128 |
| STEM_SUBTREE_WIDTH | 256 |
| MAIN_STORAGE_OFFSET | 256^31 |

Invariants:
- `STEM_SUBTREE_WIDTH > CODE_OFFSET > HEADER_STORAGE_OFFSET`
- `HEADER_STORAGE_OFFSET` is greater than leaf keys
- `MAIN_STORAGE_OFFSET` must be a power of `STEM_SUBTREE_WIDTH`

### Header values

Basic data layout (big-endian encoding):

| Name | Offset | Size |
|------|--------|------|
| version | 0 | 1 |
| reserved | 1 | 4 |
| code_size | 5 | 3 |
| nonce | 8 | 8 |
| balance | 16 | 16 |

### Code chunking

Chunk `i` stores a 32 byte value:
- Byte 0: number of leading bytes that are part of PUSHDATA
- Bytes 1-31: the i'th 31-byte slice of code

Constants:
- PUSH_OFFSET = 95
- PUSH1 = 96
- PUSH32 = 127

### Storage

- First 64 storage slots located in account stem at HEADER_STORAGE_OFFSET
- Main storage at MAIN_STORAGE_OFFSET + storage_key
- Slots in same STEM_SUBTREE_WIDTH range share single stem

### Related EIPs

- Fork: EIP-7612
- Access events: EIP-4762

## Rationale

### Single tree design

Reasons for single-layer tree structure:
- **Simplicity**: easier to work with key/value store abstraction
- **Uniformity**: state uniformly spread throughout tree, useful for state-syncing
- **Extensibility**: account headers and code in same structure as storage

### SNARK friendliness and Post-Quantum security

Hash function candidates:

| Candidate | Pros | Cons |
|-----------|------|------|
| BLAKE3 | Good out-of-circuit performance, reasonable in-circuit, well-established security | Current reference only |
| Keccak | Already used in Ethereum, well-studied | Less efficient for circuit proving |
| Poseidon2 | Excellent in-circuit performance | Security analysis ongoing |

Post-quantum considerations:
- Quantum computers potentially real in 2030s
- NIST recommends stopping ECC use by 2030
- This proposal only depends on hash functions (post-quantum safe)
- Advantage over Verkle Trees which rely on elliptic curves

### Arity-2

Binary tries minimize witness size. Branch size formula: `32 * (k-1) * log(N) / log(k)`

Comparison for N = 2^24:

| Arity | Chunks | Bytes |
|-------|--------|-------|
| 2 | 24 | 768 |
| 4 | 36 | 1152 |
| 8 | 56 | 1792 |
| 16 | 90 | 2880 |

### Tree depth

Avoids full 248-bit depth of Sparse Merkle Tree to reduce hashing load in proving systems.

### State-expiry

Compatible with strategies like EIP-7736. Potential solutions:
- Add epoch field to StemNode
- Use 247-bits for stem with two subtrees (StemValuesNode and StemMetaNode)

## Backwards Compatibility

Breaking changes:
1. Gas costs for code chunk access may affect applications' economic viability (mitigation: increase gas limit)
2. Tree structure change makes in-EVM proofs of historical state no longer work

## Security Considerations

Needs discussion.

## Related EIPs

- EIP-7612 (Fork)
- EIP-4762 (Access events)
- EIP-7748 (MPT data migration)
- EIP-7736 (State expiry)

## Copyright

Copyright and related rights waived via CC0.