iroh_bytes/protocol.rs
//! Protocol for transferring content-addressed blobs and collections over QUIC
//! connections. This can be used either with normal QUIC connections when using
//! the [quinn](https://crates.io/crates/quinn) crate or with magicsock connections
//! when using the [iroh-net](https://crates.io/crates/iroh-net) crate.
//!
//! # Participants
//!
//! The protocol is a request/response protocol with two parties, a *provider* that
//! serves blobs and a *getter* that requests blobs.
//!
//! # Goals
//!
//! - Be paranoid about data integrity.
//!
//!   Data integrity is considered more important than performance. Data will be
//!   validated both on the provider and getter side. A well-behaved provider will
//!   never send invalid data. Responses to range requests contain sufficient
//!   information to validate the data.
//!
//!   Note: Validation using blake3 is extremely fast, so in almost all scenarios the
//!   validation will not be the bottleneck even if we validate both on the provider
//!   and getter side.
//!
//! - Do not limit the size of blobs or collections.
//!
//!   Blobs can be of arbitrary size, up to terabytes. Likewise, collections
//!   can contain an arbitrary number of links. A well-behaved implementation will
//!   not require the entire blob or collection to be in memory at once.
//!
//! - Be efficient when transferring large blobs, including range requests.
//!
//!   It is possible to request entire blobs or ranges of blobs, where the
//!   minimum granularity is a chunk group of 16 KiB, that is, 16 blake3 chunks. The
//!   worst case overhead when doing range requests is about two chunk groups per range.
//!
//! - Be efficient when transferring multiple tiny blobs.
//!
//!   For tiny blobs the overhead of sending the blob hashes and the round-trip time
//!   for each blob would be prohibitive.
//!
//!   To avoid round trips, the protocol allows grouping multiple blobs into *collections*.
//!   The semantic meaning of a collection is up to the application. For the purpose
//!   of this protocol, a collection is just a grouping of related blobs.
//!
//! # Non-goals
//!
//! - Do not attempt to be generic in terms of the used hash function.
//!
//!   The protocol makes extensive use of the [blake3](https://crates.io/crates/blake3)
//!   hash function and its special properties such as blake3 verified streaming.
//!
//! - Do not support graph traversal.
//!
//!   The protocol only supports collections that directly contain blobs. If you have
//!   deeply nested graph data, you will need to either do multiple requests or flatten
//!   the graph into a single temporary collection.
//!
//! - Do not support discovery.
//!
//!   The protocol does not yet have a discovery mechanism for asking the provider
//!   what ranges are available for a given blob. Currently you have to have some
//!   out-of-band knowledge about what node has data for a given hash, or you can
//!   just try to retrieve the data and see if it is available.
//!
//!   A discovery protocol is planned, though.
//!
//! # Requests
//!
//! ## Getter defined requests
//!
//! In this case the getter knows the hash of the blob it wants to retrieve and
//! whether it wants to retrieve a single blob or a collection.
//!
//! The getter needs to define exactly what it wants to retrieve and send the
//! request to the provider.
//!
//! The provider will then respond with the bao encoded bytes for the requested
//! data and then close the connection. It will immediately close the connection
//! in case some data is not available or invalid.
//!
//! ## Provider defined requests
//!
//! In this case the getter sends a blob to the provider. This blob can contain
//! some kind of query. The exact details of the query are up to the application.
//!
//! The provider evaluates the query and responds with a serialized request in
//! the same format as the getter defined requests, followed by the bao encoded
//! data. From then on the protocol is the same as for getter defined requests.
//!
//! ## Specifying the required data
//!
//! A [`GetRequest`] contains a hash and a specification of what data related to
//! that hash is required. The specification is using a [`RangeSpecSeq`] which
//! has a compact representation on the wire but is otherwise identical to a
//! sequence of sets of ranges.
//!
//! In the following, we describe how the [`RangeSpecSeq`] is to be created for
//! different common scenarios.
//!
//! Ranges are always given in terms of 1024 byte blake3 chunks, *not* in terms
//! of bytes or chunk groups. The reason for this is that chunks are the fundamental
//! unit of hashing in blake3. Addressing anything smaller than a chunk is not
//! possible, and combining multiple chunks is merely an optimization to reduce
//! metadata overhead.
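//!
//! As an illustration of chunk addressing, the following sketch converts a byte
//! range into the chunk range that covers it. The constant `CHUNK_SIZE` and the
//! helper `chunks_for_bytes` are hypothetical names used only for this example;
//! they are not part of this crate.
//!
//! ```rust
//! use std::ops::Range;
//!
//! /// Size of a blake3 chunk in bytes.
//! const CHUNK_SIZE: u64 = 1024;
//!
//! /// Hypothetical helper: the smallest chunk range that covers `bytes`.
//! fn chunks_for_bytes(bytes: Range<u64>) -> Range<u64> {
//!     if bytes.is_empty() {
//!         return 0..0;
//!     }
//!     // Round the start down and the (exclusive) end up to chunk boundaries.
//!     let start = bytes.start / CHUNK_SIZE;
//!     let end = (bytes.end - 1) / CHUNK_SIZE + 1;
//!     start..end
//! }
//! ```
//!
//! For example, bytes `1000..3000` touch chunks 0, 1 and 2, so
//! `chunks_for_bytes(1000..3000)` gives `0..3`.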
//!
//! ### Individual blobs
//!
//! In the easiest case, the getter just wants to retrieve a single blob. In this
//! case, the getter specifies a [`RangeSpecSeq`] that contains a single element.
//! This element is the set of all chunks to indicate that we
//! want the entire blob, no matter how many chunks it has.
//!
//! Since this is a very common case, there is a convenience method
//! [`GetRequest::single`] that only requires the hash of the blob.
//!
//! ```rust
//! # use iroh_bytes::protocol::GetRequest;
//! # let hash: iroh_bytes::Hash = [0; 32].into();
//! let request = GetRequest::single(hash);
//! ```
//!
//! ### Ranges of blobs
//!
//! In this case, we have a (possibly large) blob and we want to retrieve only
//! some ranges of chunks. This is useful in similar cases as HTTP range requests.
//!
//! We still need just a single element in the [`RangeSpecSeq`], since we are
//! still only interested in a single blob. However, this element contains all
//! the chunk ranges we want to retrieve.
//!
//! For example, if we want to retrieve the first 10 chunks of a blob (chunks
//! `0..10`, end exclusive), we would create a [`RangeSpecSeq`] like this:
//!
//! ```rust
//! # use bao_tree::{ChunkNum, ChunkRanges};
//! # use iroh_bytes::protocol::{GetRequest, RangeSpecSeq};
//! # let hash: iroh_bytes::Hash = [0; 32].into();
//! let spec = RangeSpecSeq::from_ranges([ChunkRanges::from(..ChunkNum(10))]);
//! let request = GetRequest::new(hash, spec);
//! ```
//!
//! Here `ChunkNum` is a newtype wrapper around `u64` that is used to indicate
//! that we are talking about chunk numbers, not bytes.
//!
//! While not that common, it is also possible to request multiple ranges of a
//! single blob. For example, if we want to retrieve chunks `0..10` and `100..110`
//! of a large file, we would create a [`RangeSpecSeq`] like this:
//!
//! ```rust
//! # use bao_tree::{ChunkNum, ChunkRanges};
//! # use iroh_bytes::protocol::{GetRequest, RangeSpecSeq};
//! # let hash: iroh_bytes::Hash = [0; 32].into();
//! let ranges = &ChunkRanges::from(..ChunkNum(10)) | &ChunkRanges::from(ChunkNum(100)..ChunkNum(110));
//! let spec = RangeSpecSeq::from_ranges([ranges]);
//! let request = GetRequest::new(hash, spec);
//! ```
//!
//! To specify chunk ranges, we use the [`ChunkRanges`] type alias.
//! This is actually the [`RangeSet`] type from the
//! [range_collections](https://crates.io/crates/range_collections) crate. This
//! type supports efficient boolean operations on sets of non-overlapping ranges.
//!
//! The [`RangeSet2`] type is a type alias for [`RangeSet`] that can store up to
//! 2 boundaries without allocating. This is sufficient for most use cases.
//!
//! [`RangeSet`]: range_collections::range_set::RangeSet
//! [`RangeSet2`]: range_collections::range_set::RangeSet2
//!
//! ### Collections
//!
//! In this case the provider has a collection that contains multiple blobs.
//! We want to retrieve all blobs in the collection.
//!
//! When used for collections, the first element of a [`RangeSpecSeq`] refers
//! to the collection itself, and all subsequent elements refer to the blobs
//! in the collection. When a [`RangeSpecSeq`] specifies ranges for more than
//! one blob, the provider will interpret this as a request for a collection.
//!
//! One thing to note is that we might not yet know how many blobs are in the
//! collection. Therefore, it is not possible to download an entire collection
//! by just specifying [`ChunkRanges::all()`] for all children.
//!
//! Instead, [`RangeSpecSeq`] allows defining infinite sequences of range sets.
//! The [`RangeSpecSeq::all()`] method returns a [`RangeSpecSeq`] that, when iterated
//! over, will yield [`ChunkRanges::all()`] forever.
//!
//! So specifying a collection would work like this:
//!
//! ```rust
//! # use bao_tree::{ChunkNum, ChunkRanges};
//! # use iroh_bytes::protocol::{GetRequest, RangeSpecSeq};
//! # let hash: iroh_bytes::Hash = [0; 32].into();
//! let spec = RangeSpecSeq::all();
//! let request = GetRequest::new(hash, spec);
//! ```
//!
//! Downloading an entire collection is also a very common case, so there is a
//! convenience method [`GetRequest::all`] that only requires the hash of the
//! collection.
//!
//! ### Parts of collections
//!
//! The most complex common case is when we have retrieved a collection and
//! its children, but were interrupted before we could retrieve all children.
//!
//! In this case we need to specify the collection we want to retrieve, but
//! exclude the children and parts of children that we already have.
//!
//! For example, suppose we have a collection with 3 children, and we already have
//! the first child and the first 1000000 chunks of the second child.
//!
//! We would create a [`GetRequest`] like this:
//!
//! ```rust
//! # use bao_tree::{ChunkNum, ChunkRanges};
//! # use iroh_bytes::protocol::{GetRequest, RangeSpecSeq};
//! # let hash: iroh_bytes::Hash = [0; 32].into();
//! let spec = RangeSpecSeq::from_ranges([
//!     ChunkRanges::empty(), // we don't need the collection itself
//!     ChunkRanges::empty(), // we don't need the first child either
//!     ChunkRanges::from(ChunkNum(1000000)..), // we need the second child from chunk 1000000 onwards
//!     ChunkRanges::all(), // we need the third child completely
//! ]);
//! let request = GetRequest::new(hash, spec);
//! ```
//!
//! ### Requesting chunks for each child
//!
//! The [`RangeSpecSeq`] allows some scenarios that are not covered above. E.g. you
//! might want to request a collection and the first chunk of each child blob to
//! do something like mime type detection.
//!
//! You do not know how many children the collection has, so you need to use
//! an infinite sequence.
//!
//! ```rust
//! # use bao_tree::{ChunkNum, ChunkRanges};
//! # use iroh_bytes::protocol::{GetRequest, RangeSpecSeq};
//! # let hash: iroh_bytes::Hash = [0; 32].into();
//! let spec = RangeSpecSeq::from_ranges_infinite([
//!     ChunkRanges::all(), // the collection itself
//!     ChunkRanges::from(..ChunkNum(1)), // the first chunk of each child
//! ]);
//! let request = GetRequest::new(hash, spec);
//! ```
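//!
//! To illustrate how such an infinite sequence lines up with a collection whose
//! length is not known in advance, here is a minimal sketch using only the
//! standard library. The `Spec` enum and the `assign_specs` function are
//! hypothetical stand-ins for this example, not types of this crate. The sketch
//! assumes that the last given element is repeated for all remaining children.
//!
//! ```rust
//! /// Hypothetical stand-in for a set of chunk ranges.
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! enum Spec {
//!     /// All chunks of the blob.
//!     All,
//!     /// Only the first chunk of the blob.
//!     FirstChunk,
//! }
//!
//! /// Pairs the collection (index 0) and its children (index 1 onwards) with
//! /// their specs, repeating the last element of `prefix` as often as needed.
//! fn assign_specs(prefix: &[Spec], num_children: usize) -> Vec<Spec> {
//!     let Some(&last) = prefix.last() else {
//!         return Vec::new();
//!     };
//!     prefix
//!         .iter()
//!         .copied()
//!         .chain(std::iter::repeat(last))
//!         .take(num_children + 1)
//!         .collect()
//! }
//! ```
//!
//! With `[Spec::All, Spec::FirstChunk]` and three children, this yields `All` for
//! the collection followed by `FirstChunk` for each of the three children.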
//!
//! ### Requesting a single child
//!
//! It is of course possible to request a single child of a collection. E.g.
//! the following would download the second child of a collection:
//!
//! ```rust
//! # use bao_tree::{ChunkNum, ChunkRanges};
//! # use iroh_bytes::protocol::{GetRequest, RangeSpecSeq};
//! # let hash: iroh_bytes::Hash = [0; 32].into();
//! let spec = RangeSpecSeq::from_ranges([
//!     ChunkRanges::empty(), // we don't need the collection itself
//!     ChunkRanges::empty(), // we don't need the first child either
//!     ChunkRanges::all(), // we need the second child completely
//! ]);
//! let request = GetRequest::new(hash, spec);
//! ```
//!
//! However, if you already have the collection, you might as well locally
//! look up the hash of the child and request it directly.
//!
//! ```rust
//! # use bao_tree::{ChunkNum, ChunkRanges};
//! # use iroh_bytes::protocol::{GetRequest, RangeSpecSeq};
//! # let child_hash: iroh_bytes::Hash = [0; 32].into();
//! let request = GetRequest::single(child_hash);
//! ```
//!
//! ### Why RangeSpec and RangeSpecSeq?
//!
//! You might wonder why we have [`RangeSpec`] and [`RangeSpecSeq`], when a simple
//! sequence of [`ChunkRanges`] might also do.
//!
//! The [`RangeSpec`] and [`RangeSpecSeq`] types exist to provide an efficient
//! representation of the request on the wire. In the [`RangeSpec`] type,
//! sequences of ranges are encoded as alternating intervals of selected and
//! non-selected chunks. This yields smaller numbers, which take fewer bytes
//! on the wire when using the [postcard](https://crates.io/crates/postcard) encoding
//! format that uses variable length integers.
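//!
//! The effect of this encoding can be sketched with plain integers. The function
//! names below are hypothetical, and the code only approximates the idea rather
//! than reproducing the actual [`RangeSpec`] implementation: sorted range
//! boundaries are replaced by the differences between consecutive boundaries,
//! and each number is then measured as a LEB128-style variable length integer
//! with 7 payload bits per byte.
//!
//! ```rust
//! /// Converts sorted range boundaries into alternating interval lengths.
//! fn to_deltas(boundaries: &[u64]) -> Vec<u64> {
//!     let mut prev = 0;
//!     boundaries
//!         .iter()
//!         .map(|&boundary| {
//!             let delta = boundary - prev;
//!             prev = boundary;
//!             delta
//!         })
//!         .collect()
//! }
//!
//! /// Number of bytes needed for `value` as a variable length integer.
//! fn varint_len(mut value: u64) -> usize {
//!     let mut len = 1;
//!     while value >= 0x80 {
//!         value >>= 7;
//!         len += 1;
//!     }
//!     len
//! }
//! ```
//!
//! For the chunk ranges `1000000..1000010` and `1000100..1000110`, the boundaries
//! `[1000000, 1000010, 1000100, 1000110]` become `[1000000, 10, 90, 10]`, which
//! shrinks the varint encoding of the numbers from 12 bytes to 6 bytes.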
//!
//! Likewise, the [`RangeSpecSeq`] type is a sequence of [`RangeSpec`]s that
//! does run length encoding to remove repeating elements. It also allows infinite
//! sequences of [`RangeSpec`]s to be encoded, unlike a simple sequence of
//! [`ChunkRanges`]s.
//!
//! [`RangeSpecSeq`] should be efficient even in case of very fragmented availability
//! of chunks, like a download from multiple providers that was frequently interrupted.
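//!
//! The run length encoding can be illustrated with a small sketch. The function
//! `run_length_encode` is a hypothetical helper for this example and the
//! `(run length, item)` layout is an assumption; the actual wire format of
//! [`RangeSpecSeq`] may differ in detail.
//!
//! ```rust
//! /// Collapses runs of equal consecutive items into `(run length, item)` pairs.
//! fn run_length_encode<T: PartialEq + Clone>(items: &[T]) -> Vec<(u64, T)> {
//!     let mut runs: Vec<(u64, T)> = Vec::new();
//!     for item in items {
//!         if let Some((count, last)) = runs.last_mut() {
//!             if *last == *item {
//!                 // Same as the previous item: extend the current run.
//!                 *count += 1;
//!                 continue;
//!             }
//!         }
//!         // First item or a different item: start a new run.
//!         runs.push((1, item.clone()));
//!     }
//!     runs
//! }
//! ```
//!
//! A request that skips the collection and the first child and then asks for five
//! children in full, written as `["none", "none", "all", "all", "all", "all", "all"]`,
//! collapses to `[(2, "none"), (5, "all")]`.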
//!
//! # Responses
//!
//! The response stream contains the bao encoded bytes for the requested data.
//! The data will be sent in the order in which it was requested, so ascending
//! chunks for each blob, and blobs in the order in which they appear in the
//! collection.
//!
//! For details on the bao encoding, see the [bao specification](https://github.com/oconnor663/bao/blob/master/docs/spec.md)
//! and the [bao-tree](https://crates.io/crates/bao-tree) crate. The bao-tree crate
//! is identical to the bao crate, except that it allows combining multiple blake3
//! chunks to chunk groups for efficiency.
//!
//! As a consequence of the chunk group optimization, chunk ranges in the response
//! will be rounded up to chunk group ranges, so e.g. if you ask for chunks `0..10`,
//! you will get chunks `0..16`. This is done to reduce metadata overhead, and might
//! change in the future.
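//!
//! The rounding can be sketched as follows. The constant and function names are
//! hypothetical and only used for illustration, assuming chunk groups of 16
//! chunks as described in the goals section.
//!
//! ```rust
//! use std::ops::Range;
//!
//! /// Number of blake3 chunks per chunk group (16 KiB groups of 1 KiB chunks).
//! const CHUNKS_PER_GROUP: u64 = 16;
//!
//! /// Widens a chunk range to the enclosing chunk group boundaries.
//! fn round_to_chunk_groups(chunks: Range<u64>) -> Range<u64> {
//!     if chunks.is_empty() {
//!         return chunks;
//!     }
//!     // Round the start down and the (exclusive) end up to a multiple of 16.
//!     let start = chunks.start / CHUNKS_PER_GROUP * CHUNKS_PER_GROUP;
//!     let end = ((chunks.end - 1) / CHUNKS_PER_GROUP + 1) * CHUNKS_PER_GROUP;
//!     start..end
//! }
//! ```
//!
//! With these assumptions, `round_to_chunk_groups(0..10)` gives `0..16` and
//! `round_to_chunk_groups(20..40)` gives `16..48`.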
//!
//! For a complete response, the chunks are guaranteed to completely cover the
//! requested ranges.
//!
//! Reasons for not retrieving a complete response are two-fold:
//!
//! - the connection to the provider was interrupted, or the provider encountered
//!   an internal error. In this case the provider will close the entire quinn connection.
//!
//! - the provider does not have the requested data, or discovered on send that the
//!   requested data is not valid.
//!
//!   In this case the provider will close just the stream used to send the response.
//!   The exact location of the missing data can be retrieved from the error.
//!
//! # Requesting multiple unrelated blobs
//!
//! Currently, the protocol does not support requesting multiple unrelated blobs
//! in a single request. As an alternative, you can create a collection
//! on the provider side and use that to efficiently retrieve the blobs.
//!
//! If that is not possible, you can create a custom request handler that
//! accepts a custom request struct that contains the hashes of the blobs.
//!
//! If neither of these options is possible, you have no choice but to do
//! multiple requests. However, note that multiple requests will be multiplexed
//! over a single connection, and the overhead of a new QUIC stream on an existing
//! connection is very low.
//!
//! If nodes exchange data continuously, it is probably valuable to
//! keep a connection open and reuse it for multiple requests.
use bao_tree::{ChunkNum, ChunkRanges};
use derive_more::From;
use quinn::VarInt;
use serde::{Deserialize, Serialize};
mod range_spec;
pub use range_spec::{NonEmptyRequestRangeSpecIter, RangeSpec, RangeSpecSeq};

use crate::Hash;

/// Maximum message size is limited to 100MiB for now.
pub const MAX_MESSAGE_SIZE: usize = 1024 * 1024 * 100;

/// The ALPN used with QUIC for the iroh bytes protocol.
pub const ALPN: &[u8] = b"/iroh-bytes/4";

#[derive(Deserialize, Serialize, Debug, PartialEq, Eq, Clone, From)]
/// A request to the provider
pub enum Request {
    /// A get request for a blob or collection
    Get(GetRequest),
}

/// A get request for a blob or collection, identified by its hash
#[derive(Deserialize, Serialize, Debug, PartialEq, Eq, Clone)]
pub struct GetRequest {
    /// blake3 hash
    pub hash: Hash,
    /// The range of data to request
    ///
    /// The first element is the parent, all subsequent elements are children.
    pub ranges: RangeSpecSeq,
}

impl GetRequest {
    /// Request a blob or collection with specified ranges
    pub fn new(hash: Hash, ranges: RangeSpecSeq) -> Self {
        Self { hash, ranges }
    }

    /// Request a collection and all its children
    pub fn all(hash: Hash) -> Self {
        Self {
            hash,
            ranges: RangeSpecSeq::all(),
        }
    }

    /// Request just a single blob
    pub fn single(hash: Hash) -> Self {
        Self {
            hash,
            ranges: RangeSpecSeq::from_ranges([ChunkRanges::all()]),
        }
    }

    /// Request the last chunk of a single blob
    ///
    /// This can be used to get the verified size of a blob.
    pub fn last_chunk(hash: Hash) -> Self {
        Self {
            hash,
            ranges: RangeSpecSeq::from_ranges([ChunkRanges::from(ChunkNum(u64::MAX)..)]),
        }
    }

    /// Request the last chunk for all children
    ///
    /// This can be used to get the verified size of all children.
    pub fn last_chunks(hash: Hash) -> Self {
        Self {
            hash,
            ranges: RangeSpecSeq::from_ranges_infinite([
                ChunkRanges::all(),
                ChunkRanges::from(ChunkNum(u64::MAX)..),
            ]),
        }
    }
}

/// Reasons to close connections or stop streams.
///
/// A QUIC **connection** can be *closed* and a **stream** can request the other side to
/// *stop* sending data. Both closing and stopping have an associated `error_code`, closing
/// also adds a `reason` as some arbitrary bytes.
///
/// This enum exists so we have a single namespace for `error_code`s used.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
#[repr(u16)]
pub enum Closed {
    /// The [`quinn::RecvStream`] was dropped.
    ///
    /// Used implicitly when a [`quinn::RecvStream`] is dropped without explicit call to
    /// [`quinn::RecvStream::stop`]. We don't use this explicitly but this is here as
    /// documentation as to what happened to `0`.
    StreamDropped = 0,
    /// The provider is terminating.
    ///
    /// When a provider terminates, all connections and associated streams are closed.
    ProviderTerminating = 1,
    /// The provider has received the request.
    ///
    /// Only a single request is allowed on a stream. If more data is received after this, a
    /// provider may send this error code in a STOP_SENDING frame.
    RequestReceived = 2,
}

impl Closed {
    /// The close reason as bytes. This is a valid UTF-8 string describing the reason.
    pub fn reason(&self) -> &'static [u8] {
        match self {
            Closed::StreamDropped => b"stream dropped",
            Closed::ProviderTerminating => b"provider terminating",
            Closed::RequestReceived => b"request received",
        }
    }
}

impl From<Closed> for VarInt {
    fn from(source: Closed) -> Self {
        VarInt::from(source as u16)
    }
}

/// Unknown error_code, cannot be converted into [`Closed`].
#[derive(thiserror::Error, Debug)]
#[error("Unknown error_code: {0}")]
pub struct UnknownErrorCode(u64);

impl TryFrom<VarInt> for Closed {
    type Error = UnknownErrorCode;

    fn try_from(value: VarInt) -> std::result::Result<Self, Self::Error> {
        match value.into_inner() {
            0 => Ok(Self::StreamDropped),
            1 => Ok(Self::ProviderTerminating),
            2 => Ok(Self::RequestReceived),
            val => Err(UnknownErrorCode(val)),
        }
    }
}

#[cfg(test)]
mod tests {
    use iroh_test::{assert_eq_hex, hexdump::parse_hexdump};

    use super::{GetRequest, Request};

    #[test]
    fn request_wire_format() {
        let hash = [0xda; 32].into();
        let cases = [
            (
                Request::from(GetRequest::single(hash)),
                r"
                    00 # enum variant for GetRequest
                    dadadadadadadadadadadadadadadadadadadadadadadadadadadadadadadada # the hash
                    020001000100 # the RangeSpecSeq
            ",
            ),
            (
                Request::from(GetRequest::all(hash)),
                r"
                    00 # enum variant for GetRequest
                    dadadadadadadadadadadadadadadadadadadadadadadadadadadadadadadada # the hash
                    01000100 # the RangeSpecSeq
            ",
            ),
        ];
        for (case, expected_hex) in cases {
            let expected = parse_hexdump(expected_hex).unwrap();
            let bytes = postcard::to_stdvec(&case).unwrap();
            assert_eq_hex!(bytes, expected);
        }
    }
}
515}