Client for Polkadot and Substrate-compatible chains.
A blockchain is, in its essence, a distributed and decentralized key-value database. The principle of a blockchain is to make it possible for any participant to perform modifications to this database, and for all participants to eventually agree on the current state of said database.
In Polkadot and Substrate-compatible chains, the state of this database is referred to as
“the storage”. The storage can be seen more or less as a very large
A blockchain therefore consists in three main things:
- The initial state of the storage at the moment when the blockchain starts.
- A list of blocks, where each block represents a group of modifications performed to the storage.
- A peer-to-peer network of clients connected to each other and exchanging information such as newly-produced blocks.
Blocks are built on top of each other, forming a sequential list of modifications to the storage on top of its initial state.
A block primarily consists in three properties:
- A parent block, referred to by its hash.
- An ordered list of extrinsics, representing a state change about the storage. An extrinsic can be either a transaction or an intrisic.
- A list of digest log items, which include information necessary to verify the authenticity of the block, such as a cryptographic signature of the block made by its author.
In order to make abstractions easier, there also exists what is called the genesis block, or block number 0. It doesn’t have any parent, extrinsic, or digest item.
From these three block properties, the following other properties can be derived:
- The hash of the block. This is a unique 32 bytes identifier obtained by hashing all the block’s information together in a specific way.
- The block number. It is equal to the parent’s block number plus one, or equal to zero for the genesis block
- The state of the storage. It consists in the state of the storage of the parent block, on top of which the block’s extrinsics have been applied. The state of the storage of the genesis block is the initial state.
Note: Not all these properties are held in memory or even on disk. For instance, the state of the storage is quite large and holding a full version of the storage of every single block in the chain would be unrealistic.
The trie is a data structure that plays an important role in the way a blockchain functions.
It consists in a tree of nodes associated with a key, some of these nodes containing a value. Each node is associated with a hash called the Merkle value that encompasses the Merkle values of the node’s children. The Merkle value of the root node of the tree is designated as “the Merkle trie root” or “the trie root”.
trie module for more details.
In practical terms, when a block needs to be stored or transferred between machines, it is split in two parts: a header and a body. The body of a block simply consists in its list of extrinsics.
Note: The body of a block and the list of extrinsics of a block are the exact same thing, and these two designations can be used interchangeably.
A block’s header contains the following:
- The hash of the parent of the block.
- The block number.
- The state trie root, which consists in the trie root of all the keys and values of the storage of this block.
- The extrinsics trie root, which consists in the Merkle root of a trie containing the extrinsics of the block.
- The list of digest log items.
The hash of a block consists in the blake2 hash of its header.
header module for more information.
The state of the storage of each block is mostly opaque from the client’s perspective. There
exists, however, a few hardcoded keys, the most important one being
[0x3a, 0x63, 0x6f, 0x64, 0x65] (which is the ASCII encoding of the string
value associated with this key must always be a WebAssembly binary
code called the runtime.
This WebAssembly binary code must obey a certain ABI and is responsible for the following actions:
- Verifying whether a block has been correctly created by its author.
- Providing the list of modifications to the storage performed by a block, given its header and body.
- Generating the header of newly-created blocks.
- Validating transactions received from third-parties, in order to later be potentially included in a block.
- Providing the tools necessary to create transactions. See the
In other words, the runtime is responsible for all the actual logic behind the chain being run. For instance, when performing a transfer of tokens between one account and another, it is the runtime WebAssembly code that verifies that the balances are correct and updates them.
Since applying an extrinsic modifies the state of the storage, it is possible for an extrinsic to modify this WebAssembly binary code. This is called a runtime upgrade. In other words, any block can potentially modify the logic of the chain.
See also the
executor module for more information.
While blocks are built on top of each other, they don’t necessarily form one single chain. It is possible for multiple different blocks to have the same parent, thereby forming multiple chains. This is called a fork.
When multiple chains fly around in the network, each node select one chain they consider as the “best”. However, because of latency, disconnects, or similar reasons, it is possible for a node to not be immediately aware of the existence of a chain that it would consider as a better chain than its current best. Later, when this node learns about this better chain, it will need to perform a reorg (short for reorganization), where blocks that were part of the best chain no longer are.
In order to avoid troublesome situations arising from reorgs, Substrate/Polkadot provides the concept of finality. Once a block has been finalized, it is guaranteed to always be part of the best chain. By extension, the parent of a finalized block is always finalized as well. The genesis block of a chain is by definition is always finalized.
As such, a chain of blocks is composed of two parts: one finalized part (usually the longest), and one non-finalized part. While the finalized part is a single linear list of blocks, the non-finalized part consists in a directed rooted tree.
In order to finalize a block, Substrate/Polkadot nodes use the GrandPa algorithm. Nodes that are authorized to do so by the runtime emit votes on the peer-to-peer network. When two thirds or more of the authorized nodes have voted for a specific block to be finalized, it effectively becomes finalized.
After these votes have been collected, they are collected in what is called a justification. This justification can later be requested by nodes who might not have received all the votes, or for example if they were offline.
This library intentionally doesn’t provide any ready-to-use blockchain client. Instead, it provides tools that can be combined together in order to create a client.
Components that are most certainly needed are:
- Connectivity to a peer-to-peer network. See the
- A persistent storage for the blocks. See the
- A state machine that holds information about the state of the chain and verifies the
authenticity and/or correctness of blocks received from the network. See the
- The capacity to author new blocks. This isn’t implemented as of the writing of this documentation.
- A JSON-RPC client, in order to put a convenient-to-use UI on top of the client. See the
- TODO: telemetry
Data structures describing a chain of blocks.
Substrate chain configuration.
Persistent data storage.
WebAssembly runtime code execution.
Finality consists is declaring a block as irreversible. It is now forever part of the chain.
Parsing SCALE-encoded header.
Substrate-based chains make frequent use of asynchronous cryptography. The identity of a user, be it an account address, a validator, or else, consists of a public key (also sometimes known as a verification key). A user can prove their identity by generating a signature using the private key (also sometimes known as a signing key) corresponding to that public key.
Information string printed out.
JSON-RPC servers. Trusted access to the blockchain.
Low-level peer-to-peer networking.
Syncing, short for synchronizing, consists in synchronizing the state of a local chain with the state of the chain contained in other machines, called remotes or sources.
Radix-16 Merkle-Patricia trie.
Methods that verify whether a block is correct.
Builds the header of the genesis block, from the values in storage.