holochain_deterministic_integrity/hash.rs
//! Functions to generate standardized hashes of Holochain records and
//! arbitrary bytes.
//!
//! Holochain makes extensive use of hashes to address any content. It utilizes
//! [Blake2b](https://www.blake2.net/) as its hashing algorithm. Holochain hashes
//! have a length of 39 bytes, made up of 3 bytes for identifying the hash, 32
//! bytes of digest and 4 location bytes. The complete scheme of a hash in byte
//! format is:
//!
//! ```text
//! <hash type code as varint><hash size in bytes><hash<location>>
//! ```
//!
//! The complete scheme of encoded hashes is:
//!
//! ```text
//! <encoding scheme><hash type code as varint><hash size in bytes><hash<location>>
//! ```
//!
//!
//! ## Example
//! This is an example of a public agent key hash, displayed as a byte array in
//! decimal notation:
//!
//! ```text
//! 132  32  36  39 218 126  34  87
//! 204 165 227 255  29 236 160  66
//! 221 163 168 112 215 187 143 152
//!  68   4  30 206 173 203 210 111
//! 103 207 124   2 107  67  33
//! ```
//!
//! ### Base64 encoding
//!
//! Since hashes have to be exchanged over the network, their bytes are encoded
//! before sending and decoded after reception, to avoid data corruption during
//! transport. To convert the hashes from a byte format to a transferable
//! string, Holochain encodes them with the
//! [Base64 scheme](https://developer.mozilla.org/en-US/docs/Glossary/Base64).
//! Encoding the example public agent key in Base64 results in:
//!
//! ```text
//! hCAkJ9p+IlfMpeP/HeygQt2jqHDXu4+YRAQezq3L0m9nz3wCa0Mh
//! ```
//!
//! ### Self-identifying hash encoding
//!
//! Following the
//! [Multibase protocol](https://github.com/multiformats/multibase), hashes in
//! Holochain self-identify their encoding in Base64. All hashes in text format
//! are prefixed with a `u`, to identify them as
//! [`base64url`](https://github.com/multiformats/multibase/blob/master/multibase.csv#L23).
//! This encoding guarantees URL and filename safe Base64 strings by
//! replacing the potentially unsafe characters `+` and `/` with `-` and `_`
//! (see [RFC4648](https://datatracker.ietf.org/doc/html/rfc4648#section-5)).
//! The example public agent key becomes:
//!
//! ```text
//! uhCAkJ9p-IlfMpeP_HeygQt2jqHDXu4-YRAQezq3L0m9nz3wCa0Mh
//! ```
//!
//! > This only applies to Base64 encoded strings. Hashes in binary format
//! > **must not** be prefixed with `u`.
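//!
//! As a minimal sketch of the text encoding described above, the following
//! hand-rolled encoder turns the example bytes into the prefixed string. The
//! names `encode_multibase_base64url` and `EXAMPLE_AGENT_KEY` are hypothetical
//! and not part of the HDI. The sketch assumes unpadded output, which is
//! sufficient here because 39 bytes split evenly into 3-byte groups.
//!
//! ```rust
//! /// URL and filename safe Base64 alphabet from RFC 4648, section 5.
//! const ALPHABET: &[u8; 64] =
//!     b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
//!
//! /// The example public agent key from this module documentation.
//! const EXAMPLE_AGENT_KEY: [u8; 39] = [
//!     132, 32, 36, 39, 218, 126, 34, 87, 204, 165, 227, 255, 29, 236, 160, 66,
//!     221, 163, 168, 112, 215, 187, 143, 152, 68, 4, 30, 206, 173, 203, 210,
//!     111, 103, 207, 124, 2, 107, 67, 33,
//! ];
//!
//! /// Hypothetical helper: encode bytes as unpadded base64url and prepend
//! /// the Multibase prefix `u`.
//! fn encode_multibase_base64url(bytes: &[u8]) -> String {
//!     let mut out = String::with_capacity(1 + (bytes.len() * 4 + 2) / 3);
//!     out.push('u');
//!     for chunk in bytes.chunks(3) {
//!         // Pack up to 3 bytes into the top 24 bits of the buffer.
//!         let mut buf = 0u32;
//!         for (i, b) in chunk.iter().enumerate() {
//!             buf |= (*b as u32) << (16 - 8 * i);
//!         }
//!         // A chunk of n bytes yields n + 1 characters, without padding.
//!         for i in 0..=chunk.len() {
//!             let index = ((buf >> (18 - 6 * i)) & 0x3f) as usize;
//!             out.push(ALPHABET[index] as char);
//!         }
//!     }
//!     out
//! }
//! ```
//!
//! Calling `encode_multibase_base64url(&EXAMPLE_AGENT_KEY)` reproduces the
//! `u`-prefixed example string.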
//!
//!
//! ### Self-identifying hash type and size
//!
//! Further self-identification of Holochain hashes is achieved by adhering to
//! the [Multihash protocol](https://github.com/multiformats/multihash). The
//! scheme it defines allows for including information on the semantic type of
//! hash and its length in Base64 encoded strings. Resulting hashes have the
//! following format:
//!
//! ```text
//! <hash type code as varint><hash size in bytes><hash>
//! ```
//!
//! Hashes in Holochain are 39 bytes long and comprise the hash type code, the
//! hash size in bytes and the hash. Coming back to the byte array
//! representation of the example agent pub key, the first 3 bytes are
//! `132 32 36`. In hexadecimal notation, they are written as `0x84 0x20 0x24`.
//!
//! Bytes 1 and 2 are taken up by the hash type code as an
//! [unsigned varint](https://github.com/multiformats/unsigned-varint). Varint
//! is a serial encoding of an integer as a byte array of variable length.
//! When decoded to a regular integer, varint `132 32` equates to `4100`. This
//! and the other Multihash values employed for Holochain hashes meet several
//! criteria:
//!
//! * It encodes as more than one byte, as one-byte entries are reserved in
//!   Multihash.
//! * An encoding consisting of two bytes plus the length byte makes three
//!   bytes, which always translates to 4 characters in Base64 encoding.
//! * The resulting Base64 encoding is supposed to be human-recognizable. `hC`
//!   was chosen in accordance with `holoChain`.
//!
//! Byte 3, which is `0x24` in hexadecimal and `36` in decimal notation,
//! reflects the hash size in bytes, meaning the **hashes are 36 bytes long**,
//! not counting the 3 prefix bytes.
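//!
//! To make the varint arithmetic concrete, here is a minimal sketch of an
//! unsigned varint decoder. The function name `decode_unsigned_varint` is
//! hypothetical and not part of the HDI. The sketch assumes the encoding
//! described by the unsigned varint specification linked above: 7 data bits
//! per byte, least significant group first, with the high bit set on every
//! byte except the last.
//!
//! ```rust
//! /// Hypothetical helper: decode an unsigned varint from the start of
//! /// `bytes`. Returns the decoded value and the number of bytes consumed,
//! /// or `None` if no terminating byte is found within 9 bytes.
//! fn decode_unsigned_varint(bytes: &[u8]) -> Option<(u64, usize)> {
//!     let mut value = 0u64;
//!     for (i, byte) in bytes.iter().enumerate().take(9) {
//!         // The low 7 bits carry data, least significant group first.
//!         value |= u64::from(byte & 0x7f) << (7 * i);
//!         // A clear high bit marks the final byte of the varint.
//!         if (byte & 0x80) == 0 {
//!             return Some((value, i + 1));
//!         }
//!     }
//!     None
//! }
//! ```
//!
//! For the prefix `132 32 36` the first byte contributes `4` and the second
//! contributes `32 << 7 = 4096`, which gives `4100` after consuming 2 bytes.
//! The third byte is left over as the hash size.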
//!
//! ### Digest and DHT location
//!
//! The 36 bytes long hash consists of the actual digest of the hashed content
//! and the computed location of the hash within the distributed hash table
//! (DHT). The Blake2b algorithm used by Holochain produces hashes of 32 bytes
//! length.
//!
//! The final 4 bytes are location bytes. They are interpreted to identify
//! the position of an agent's arc, meaning the portion of the DHT that the
//! agent holds. Location bytes further serve as an integrity check of the hash
//! itself.
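//!
//! The layout described above can be sketched as a simple split of the 39
//! bytes. The names `HashParts` and `split_hash` are hypothetical and not part
//! of the HDI. Reading the location bytes as a little-endian `u32` is an
//! assumption inferred from the example in this documentation, where the bytes
//! `2 107 67 33` correspond to `558066434`. The sketch does not recompute or
//! verify the location.
//!
//! ```rust
//! /// Hypothetical view over the parts of a 39 byte Holochain hash.
//! struct HashParts<'a> {
//!     /// 3 bytes: hash type code as varint plus the hash size byte.
//!     prefix: &'a [u8],
//!     /// 32 bytes of Blake2b digest.
//!     digest: &'a [u8],
//!     /// 4 location bytes, read here as a little-endian `u32` (assumption).
//!     location: u32,
//! }
//!
//! /// Hypothetical helper: split a raw hash into prefix, digest and location.
//! fn split_hash(raw: &[u8; 39]) -> HashParts<'_> {
//!     let (prefix, rest) = raw.split_at(3);
//!     let (digest, loc) = rest.split_at(32);
//!     let location = u32::from_le_bytes([loc[0], loc[1], loc[2], loc[3]]);
//!     HashParts { prefix, digest, location }
//! }
//! ```
//!
//! For the example public agent key this gives the prefix `132 32 36`, a 32
//! byte digest and the location `558066434`.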
//!
//!
//! ## Valid Holochain hash types
//!
//! Here is a list of all valid hash types in Holochain, in hexadecimal,
//! decimal and Base64 notation and what they are used for. The integer column
//! is the hash type code, i.e. the first two bytes decoded as an unsigned
//! varint:
//!
//! | hex      | decimal   | base64 | integer | usage    |
//! | -------- | --------- | ------ | ------- | -------- |
//! | 84 20 24 | 132 32 36 | hCAk   | 4100    | Agent    |
//! | 84 21 24 | 132 33 36 | hCEk   | 4228    | Entry    |
//! | 84 22 24 | 132 34 36 | hCIk   | 4356    | Net ID   |
//! | 84 23 24 | 132 35 36 | hCMk   | 4484    |          |
//! | 84 24 24 | 132 36 36 | hCQk   | 4612    | DHT Op   |
//! | 84 25 24 | 132 37 36 | hCUk   | 4740    |          |
//! | 84 26 24 | 132 38 36 | hCYk   | 4868    |          |
//! | 84 27 24 | 132 39 36 | hCck   | 4996    |          |
//! | 84 28 24 | 132 40 36 | hCgk   | 5124    |          |
//! | 84 29 24 | 132 41 36 | hCkk   | 5252    | Action   |
//! | 84 2a 24 | 132 42 36 | hCok   | 5380    | WASM     |
//! | 84 2b 24 | 132 43 36 | hCsk   | 5508    |          |
//! | 84 2c 24 | 132 44 36 | hCwk   | 5636    |          |
//! | 84 2d 24 | 132 45 36 | hC0k   | 5764    | DNA      |
//! | 84 2e 24 | 132 46 36 | hC4k   | 5892    |          |
//! | 84 2f 24 | 132 47 36 | hC8k   | 6020    | External |
//!
//!
//! ### Breakdowns of example
//!
//! Breakdown of the example agent pub key as byte array in decimal notation:
//!
//! | type                   | length        | hash                                                                                                                   | dht location   |
//! | ---------------------- | ------------- | ---------------------------------------------------------------------------------------------------------------------- | -------------- |
//! | 132 32                 | 36            | 39 218 126 34 87 204 165 227 255 29 236 160 66 221 163 168 112 215 187 143 152 68 4 30 206 173 203 210 111 103 207 124 | 2 107 67 33    |
//! | public agent key       | 36 bytes long | Blake2b hash, 32 bytes long                                                                                            | u32: 558066434 |
//!
//!
//! Breakdown of the example agent pub key encoded as Base64:
//!
//! | Multibase encoding   | type + length                       | hash + dht location                                     |
//! | -------------------- | ----------------------------------- | ------------------------------------------------------- |
//! | u                    | hCAk                                | J9p-IlfMpeP_HeygQt2jqHDXu4-YRAQezq3L0m9nz3wCa0Mh        |
//! | base64url no padding | public agent key of 36 bytes length | Base64 encoding of Blake2b hash + location              |

use crate::prelude::*;

/// Hash anything that implements [`TryInto<Entry>`].
///
/// Hashes are typed in Holochain, e.g. [`ActionHash`] and [`EntryHash`] are different and yield different
/// bytes for a given value. This ensures correctness and allows type based dispatch in various
/// areas of the codebase.
///
/// Usually you want to hash a value that you want to reference on the DHT with [`must_get_entry`] etc. because
/// it represents some domain-specific data sourced externally or generated within the wasm.
/// [`ActionHash`] hashes are _always_ generated by the process of committing something to a local
/// chain. Every host function that commits an entry returns the new [`ActionHash`]. The [`ActionHash`] can
/// also be used with [`must_get_action`] etc. to retrieve a _specific_ record from the DHT rather than the
/// oldest live record.
/// However, it is rarely necessary to _generate_ an action hash directly from an action inside wasm
/// (see [`hash_action`] for that case).
/// [`Record`] values (entry+action pairs returned by [`must_get_action`] etc.) contain prehashed action structs
/// called [`ActionHashed`], which is composed of an [`ActionHash`] alongside the "raw" [`Action`] value. Generally the pre-hashing is
/// more efficient than hashing actions ad-hoc as hashing always needs to be done at the database
/// layer, so we want to re-use that as much as possible.
/// The action hash can be extracted from the Record as `record.action_hashed().as_hash()`.
///
/// @todo is there any use-case that can't be satisfied by the `action_hashed` approach?
///
/// Anything that is annotated with `#[hdk_entry( .. )]` or `entry_def!( .. )` implements this so is
/// compatible automatically.
///
/// [`hash_entry`] is "dumb" in that it doesn't check that the entry is defined, committed, on the DHT or
/// any other validation, it simply generates the hash for the serialized representation of
/// something in the same way that the DHT would.
///
/// It is strongly recommended that you use the [`hash_entry`] function to calculate hashes to avoid
/// inconsistencies between hashes in the wasm guest and the host.
/// For example, a lot of the crypto crates in Rust compile to wasm so in theory could generate the
/// hash in the guest, but there is the potential that the serialization logic could be slightly
/// different, etc.
///
/// ```ignore
/// #[hdk_entry(id="foo")]
/// struct Foo;
///
/// let foo_hash = hash_entry(Foo)?;
/// ```
pub fn hash_entry<I, E>(input: I) -> ExternResult<EntryHash>
where
    Entry: TryFrom<I, Error = E>,
    WasmError: From<E>,
{
    match HDI.with(|h| h.borrow().hash(HashInput::Entry(Entry::try_from(input)?)))? {
        HashOutput::Entry(entry_hash) => Ok(entry_hash),
        _ => unreachable!(),
    }
}

/// Hash an `Action` into an `ActionHash`.
///
/// [`hash_entry`] has more of a discussion around different hash types and how
/// they are used within the HDI.
///
/// It is strongly recommended to use [`hash_action`] to calculate the hash rather than hand rolling an in-wasm solution.
/// Any inconsistencies in serialization or hash handling will result in dangling references to things due to a "corrupt" hash.
///
/// Note that usually relevant HDI functions return an [`ActionHashed`] or [`SignedActionHashed`] which already has associated methods to access the `ActionHash` of the inner `Action`.
/// In normal usage it is unlikely to be required to separately hash an [`Action`] like this.
pub fn hash_action(input: Action) -> ExternResult<ActionHash> {
    match HDI.with(|h| h.borrow().hash(HashInput::Action(input)))? {
        HashOutput::Action(action_hash) => Ok(action_hash),
        _ => unreachable!(),
    }
}

/// Hash arbitrary bytes using BLAKE2b.
/// This is the same algorithm used by Holochain for typed hashes.
/// Notably the output hash length is configurable.
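///
/// A minimal usage sketch, assuming it runs inside a function that returns
/// [`ExternResult`] and that the host returns `output_len` bytes:
///
/// ```ignore
/// // Request a 32 byte digest, the digest length used for typed hashes.
/// let digest: Vec<u8> = hash_blake2b(b"some bytes".to_vec(), 32)?;
/// ```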
pub fn hash_blake2b(input: Vec<u8>, output_len: u8) -> ExternResult<Vec<u8>> {
    match HDI.with(|h| h.borrow().hash(HashInput::Blake2B(input, output_len)))? {
        HashOutput::Blake2B(vec) => Ok(vec),
        _ => unreachable!(),
    }
}

/// Hash arbitrary bytes using SHA-256.
///
/// @todo - not implemented on the host
pub fn hash_sha256(input: Vec<u8>) -> ExternResult<Vec<u8>> {
    match HDI.with(|h| h.borrow().hash(HashInput::Sha256(input)))? {
        HashOutput::Sha256(hash) => Ok(hash.as_ref().to_vec()),
        _ => unreachable!(),
    }
}

/// Hash arbitrary bytes using SHA-512.
///
/// @todo - not implemented on the host
pub fn hash_sha512(input: Vec<u8>) -> ExternResult<Vec<u8>> {
    match HDI.with(|h| h.borrow().hash(HashInput::Sha512(input)))? {
        HashOutput::Sha512(hash) => Ok(hash.as_ref().to_vec()),
        _ => unreachable!(),
    }
}

/// Hash arbitrary bytes using keccak256.
/// This is the same algorithm used by Ethereum and other EVM compatible blockchains.
/// It is essentially the same as SHA3 256 but uses a different padding suffix,
/// which is enough to generate different hash outputs.
pub fn hash_keccak256(input: Vec<u8>) -> ExternResult<Vec<u8>> {
    match HDI.with(|h| h.borrow().hash(HashInput::Keccak256(input)))? {
        HashOutput::Keccak256(hash) => Ok(hash.as_ref().to_vec()),
        _ => unreachable!(),
    }
}

/// Hash arbitrary bytes using SHA3 256.
/// This is the official NIST standard for 256 bit SHA3 hashes.
pub fn hash_sha3(input: Vec<u8>) -> ExternResult<Vec<u8>> {
    match HDI.with(|h| h.borrow().hash(HashInput::Sha3256(input)))? {
        HashOutput::Sha3256(hash) => Ok(hash.as_ref().to_vec()),
        _ => unreachable!(),
    }
}