//! Protocol for transferring content-addressed blobs and collections over QUIC
//! connections. This can be used either with normal QUIC connections when using
//! the [quinn](https://crates.io/crates/quinn) crate or with magicsock connections
//! when using the [iroh-net](https://crates.io/crates/iroh-net) crate.
//!
//! # Participants
//!
//! The protocol is a request/response protocol with two parties, a *provider* that
//! serves blobs and a *getter* that requests blobs.
//!
//! # Goals
//!
//! - Be paranoid about data integrity.
//!
//!   Data integrity is considered more important than performance. Data will be validated both on
//!   the provider and getter side. A well-behaved provider will never send invalid data. Responses
//!   to range requests contain sufficient information to validate the data.
//!
//!   Note: Validation using blake3 is extremely fast, so in almost all scenarios the validation
//!   will not be the bottleneck even if we validate both on the provider and getter side.
//!
//! - Do not limit the size of blobs or collections.
//!
//!   Blobs can be of arbitrary size, up to terabytes. Likewise, collections can contain an
//!   arbitrary number of links. A well-behaved implementation will not require the entire blob or
//!   collection to be in memory at once.
//!
//! - Be efficient when transferring large blobs, including range requests.
//!
//!   It is possible to request entire blobs or ranges of blobs, where the minimum granularity is a
//!   chunk group of 16 KiB, i.e. 16 blake3 chunks. The worst case overhead when doing range
//!   requests is about two chunk groups per range.
//!
//! - Be efficient when transferring multiple tiny blobs.
//!
//!   For tiny blobs the overhead of sending the blob hashes and the round-trip time for each blob
//!   would be prohibitive.
//!
//!   To avoid roundtrips, the protocol allows grouping multiple blobs into *collections*.
//!   The semantic meaning of a collection is up to the application. For the purpose
//!   of this protocol, a collection is just a grouping of related blobs.
//!
//! # Non-goals
//!
//! - Do not attempt to be generic in terms of the used hash function.
//!
//!   The protocol makes extensive use of the [blake3](https://crates.io/crates/blake3) hash
//!   function and its special properties such as blake3 verified streaming.
//!
//! - Do not support graph traversal.
//!
//!   The protocol only supports collections that directly contain blobs. If you have deeply nested
//!   graph data, you will need to either do multiple requests or flatten the graph into a single
//!   temporary collection.
//!
//! - Do not support discovery.
//!
//!   The protocol does not yet have a discovery mechanism for asking the provider what ranges are
//!   available for a given blob. Currently you have to have some out-of-band knowledge about what
//!   node has data for a given hash, or you can just try to retrieve the data and see if it is
//!   available.
//!
//!   A discovery protocol is planned for the future, though.
//!
//! # Requests
//!
//! ## Getter defined requests
//!
//! In this case the getter knows the hash of the blob it wants to retrieve and
//! whether it wants to retrieve a single blob or a collection.
//!
//! The getter needs to define exactly what it wants to retrieve and send the
//! request to the provider.
//!
//! The provider will then respond with the bao encoded bytes for the requested
//! data and then close the connection. It will immediately close the connection
//! in case some data is not available or invalid.
//!
//! ## Provider defined requests
//!
//! In this case the getter sends a blob to the provider. This blob can contain
//! some kind of query. The exact details of the query are up to the application.
//!
//! The provider evaluates the query and responds with a serialized request in
//! the same format as the getter defined requests, followed by the bao encoded
//! data. From then on the protocol is the same as for getter defined requests.
//!
//! ## Specifying the required data
//!
//! A [`GetRequest`] contains a hash and a specification of what data related to
//! that hash is required. The specification is using a [`RangeSpecSeq`] which
//! has a compact representation on the wire but is otherwise identical to a
//! sequence of sets of ranges.
//!
//! In the following, we describe how the [`RangeSpecSeq`] is to be created for
//! different common scenarios.
//!
//! Ranges are always given in terms of 1024 byte blake3 chunks, *not* in terms
//! of bytes or chunk groups. The reason for this is that chunks are the fundamental
//! unit of hashing in blake3. Addressing anything smaller than a chunk is not
//! possible, and combining multiple chunks is merely an optimization to reduce
//! metadata overhead.
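//!
//! Since callers often start from byte offsets, the following is a minimal sketch of
//! mapping a byte range to the smallest chunk range that covers it. The names
//! `CHUNK_SIZE` and `byte_range_to_chunks` are hypothetical helpers for illustration
//! only and are not part of this crate.
//!
//! ```rust
//! use std::ops::Range;
//!
//! /// Size of a blake3 chunk in bytes.
//! const CHUNK_SIZE: u64 = 1024;
//!
//! /// Smallest half-open chunk range covering the half-open byte range `bytes`.
//! /// (Hypothetical helper, not an iroh-blobs API.)
//! fn byte_range_to_chunks(bytes: Range<u64>) -> Range<u64> {
//!     if bytes.is_empty() {
//!         // an empty byte range needs no chunks
//!         return 0..0;
//!     }
//!     let first = bytes.start / CHUNK_SIZE; // round the start down
//!     let last = bytes.end.div_ceil(CHUNK_SIZE); // round the end up
//!     first..last
//! }
//! ```
//!
//! For example, the bytes `1000..1025` straddle a chunk boundary and therefore map to
//! the chunks `0..2`.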
//!
//! ### Individual blobs
//!
//! In the easiest case, the getter just wants to retrieve a single blob. In this
//! case, the getter specifies a [`RangeSpecSeq`] that contains a single element.
//! This element is the set of all chunks to indicate that we
//! want the entire blob, no matter how many chunks it has.
//!
//! Since this is a very common case, there is a convenience method
//! [`GetRequest::single`] that only requires the hash of the blob.
//!
//! ```rust
//! # use iroh_blobs::protocol::GetRequest;
//! # let hash: iroh_blobs::Hash = [0; 32].into();
//! let request = GetRequest::single(hash);
//! ```
//!
//! ### Ranges of blobs
//!
//! In this case, we have a (possibly large) blob and we want to retrieve only
//! some ranges of chunks. This is useful in the same situations as HTTP range requests.
//!
//! We still need just a single element in the [`RangeSpecSeq`], since we are
//! still only interested in a single blob. However, this element contains all
//! the chunk ranges we want to retrieve.
//!
//! For example, if we want to retrieve the first 10 chunks of a blob (chunks
//! `0..10`, end exclusive), we would create a [`RangeSpecSeq`] like this:
//!
//! ```rust
//! # use bao_tree::{ChunkNum, ChunkRanges};
//! # use iroh_blobs::protocol::{GetRequest, RangeSpecSeq};
//! # let hash: iroh_blobs::Hash = [0; 32].into();
//! let spec = RangeSpecSeq::from_ranges([ChunkRanges::from(..ChunkNum(10))]);
//! let request = GetRequest::new(hash, spec);
//! ```
//!
//! Here `ChunkNum` is a newtype wrapper around `u64` that is used to indicate
//! that we are talking about chunk numbers, not bytes.
//!
//! While not that common, it is also possible to request multiple ranges of a
//! single blob. For example, if we want to retrieve chunks `0..10` and `100..110`
//! of a large file, we would create a [`RangeSpecSeq`] like this:
//!
//! ```rust
//! # use bao_tree::{ChunkNum, ChunkRanges};
//! # use iroh_blobs::protocol::{GetRequest, RangeSpecSeq};
//! # let hash: iroh_blobs::Hash = [0; 32].into();
//! let ranges =
//!     &ChunkRanges::from(..ChunkNum(10)) | &ChunkRanges::from(ChunkNum(100)..ChunkNum(110));
//! let spec = RangeSpecSeq::from_ranges([ranges]);
//! let request = GetRequest::new(hash, spec);
//! ```
//!
//! To specify chunk ranges, we use the [`ChunkRanges`] type alias.
//! This is actually the [`RangeSet`] type from the
//! [range_collections](https://crates.io/crates/range_collections) crate. This
//! type supports efficient boolean operations on sets of non-overlapping ranges.
//!
//! The [`RangeSet2`] type is a type alias for [`RangeSet`] that can store up to
//! 2 boundaries without allocating. This is sufficient for most use cases.
//!
//! [`RangeSet`]: range_collections::range_set::RangeSet
//! [`RangeSet2`]: range_collections::range_set::RangeSet2
//!
//! ### Collections
//!
//! In this case the provider has a collection that contains multiple blobs.
//! We want to retrieve all blobs in the collection.
//!
//! When used for collections, the first element of a [`RangeSpecSeq`] refers
//! to the collection itself, and all subsequent elements refer to the blobs
//! in the collection. When a [`RangeSpecSeq`] specifies ranges for more than
//! one blob, the provider will interpret this as a request for a collection.
//!
//! One thing to note is that we might not yet know how many blobs are in the
//! collection. Therefore, it is not possible to download an entire collection
//! by just specifying [`ChunkRanges::all()`] for all children.
//!
//! Instead, [`RangeSpecSeq`] allows defining infinite sequences of range sets.
//! The [`RangeSpecSeq::all()`] method returns a [`RangeSpecSeq`] that, when iterated
//! over, will yield [`ChunkRanges::all()`] forever.
//!
//! So specifying a collection would work like this:
//!
//! ```rust
//! # use bao_tree::{ChunkNum, ChunkRanges};
//! # use iroh_blobs::protocol::{GetRequest, RangeSpecSeq};
//! # let hash: iroh_blobs::Hash = [0; 32].into();
//! let spec = RangeSpecSeq::all();
//! let request = GetRequest::new(hash, spec);
//! ```
//!
//! Downloading an entire collection is also a very common case, so there is a
//! convenience method [`GetRequest::all`] that only requires the hash of the
//! collection.
//!
//! ### Parts of collections
//!
//! The most complex common case is when we have retrieved a collection and
//! its children, but were interrupted before we could retrieve all children.
//!
//! In this case we need to specify the collection we want to retrieve, but
//! exclude the children and parts of children that we already have.
//!
//! For example, suppose we have a collection with 3 children, and we already have
//! the first child and the first 1000000 chunks of the second child.
//!
//! We would then create a [`GetRequest`] like this:
//!
//! ```rust
//! # use bao_tree::{ChunkNum, ChunkRanges};
//! # use iroh_blobs::protocol::{GetRequest, RangeSpecSeq};
//! # let hash: iroh_blobs::Hash = [0; 32].into();
//! let spec = RangeSpecSeq::from_ranges([
//!     ChunkRanges::empty(), // we don't need the collection itself
//!     ChunkRanges::empty(), // we don't need the first child either
//!     ChunkRanges::from(ChunkNum(1000000)..), // we need the second child from chunk 1000000 onwards
//!     ChunkRanges::all(), // we need the third child completely
//! ]);
//! let request = GetRequest::new(hash, spec);
//! ```
//!
//! ### Requesting chunks for each child
//!
//! The [`RangeSpecSeq`] allows some scenarios that are not covered above. For
//! example, you might want to request a collection and the first chunk of each
//! child blob to do something like mime type detection.
//!
//! You do not know how many children the collection has, so you need to use
//! an infinite sequence.
//!
//! ```rust
//! # use bao_tree::{ChunkNum, ChunkRanges};
//! # use iroh_blobs::protocol::{GetRequest, RangeSpecSeq};
//! # let hash: iroh_blobs::Hash = [0; 32].into();
//! let spec = RangeSpecSeq::from_ranges_infinite([
//!     ChunkRanges::all(),               // the collection itself
//!     ChunkRanges::from(..ChunkNum(1)), // the first chunk of each child
//! ]);
//! let request = GetRequest::new(hash, spec);
//! ```
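//!
//! The example above relies on the last range set applying to every remaining child.
//! As a rough mental model, assuming that an infinite sequence is a finite prefix whose
//! last element repeats forever, the iteration can be sketched with plain iterators.
//! The helper `repeat_last` is hypothetical and only illustrates the idea; it is not
//! how [`RangeSpecSeq`] is implemented.
//!
//! ```rust
//! use std::iter;
//!
//! /// Yields the items of `prefix` in order, then repeats the last item forever.
//! /// An empty prefix yields nothing. (Hypothetical helper for illustration.)
//! fn repeat_last<T: Clone>(prefix: Vec<T>) -> impl Iterator<Item = T> {
//!     let last = prefix.last().cloned();
//!     prefix
//!         .into_iter()
//!         .chain(last.into_iter().flat_map(|item| iter::repeat(item)))
//! }
//! ```
//!
//! With `vec!["all", "first chunk"]` as the prefix, the first element applies to the
//! collection and every later element is `"first chunk"`, no matter how many children
//! the collection turns out to have.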
//!
//! ### Requesting a single child
//!
//! It is of course possible to request a single child of a collection. E.g.
//! the following would download the second child of a collection:
//!
//! ```rust
//! # use bao_tree::{ChunkNum, ChunkRanges};
//! # use iroh_blobs::protocol::{GetRequest, RangeSpecSeq};
//! # let hash: iroh_blobs::Hash = [0; 32].into();
//! let spec = RangeSpecSeq::from_ranges([
//!     ChunkRanges::empty(), // we don't need the collection itself
//!     ChunkRanges::empty(), // we don't need the first child either
//!     ChunkRanges::all(),   // we need the second child completely
//! ]);
//! let request = GetRequest::new(hash, spec);
//! ```
//!
//! However, if you already have the collection, you might as well locally
//! look up the hash of the child and request it directly.
//!
//! ```rust
//! # use bao_tree::{ChunkNum, ChunkRanges};
//! # use iroh_blobs::protocol::{GetRequest, RangeSpecSeq};
//! # let child_hash: iroh_blobs::Hash = [0; 32].into();
//! let request = GetRequest::single(child_hash);
//! ```
//!
//! ### Why RangeSpec and RangeSpecSeq?
//!
//! You might wonder why we have [`RangeSpec`] and [`RangeSpecSeq`], when a simple
//! sequence of [`ChunkRanges`] might also do.
//!
//! The [`RangeSpec`] and [`RangeSpecSeq`] types exist to provide an efficient
//! representation of the request on the wire. In the [`RangeSpec`] type,
//! sequences of ranges are encoded as alternating intervals of selected and
//! non-selected chunks. This results in smaller numbers, which take fewer bytes
//! on the wire when using the [postcard](https://crates.io/crates/postcard) encoding
//! format, which uses variable length integers.
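//!
//! To illustrate why interval lengths are cheaper than absolute positions, here is a
//! minimal sketch, not the actual [`RangeSpec`] implementation. It assumes the range
//! boundaries are given as a sorted list and that integers are written as LEB128-style
//! varints with 7 payload bits per byte. Both function names are hypothetical.
//!
//! ```rust
//! /// Converts sorted range boundaries into the lengths of the alternating intervals.
//! /// E.g. the ranges 0..10 and 100..110 have the boundaries [0, 10, 100, 110].
//! /// (Hypothetical helper; panics in debug builds if the input is not sorted.)
//! fn boundaries_to_deltas(boundaries: &[u64]) -> Vec<u64> {
//!     let mut prev = 0;
//!     boundaries
//!         .iter()
//!         .map(|&boundary| {
//!             let delta = boundary - prev;
//!             prev = boundary;
//!             delta
//!         })
//!         .collect()
//! }
//!
//! /// Number of bytes `value` occupies as a varint with 7 payload bits per byte.
//! fn varint_len(mut value: u64) -> usize {
//!     let mut len = 1;
//!     while value >= 0x80 {
//!         value >>= 7;
//!         len += 1;
//!     }
//!     len
//! }
//! ```
//!
//! For the single range `1000000..1000010`, the boundaries would take 3 + 3 bytes as
//! varints, whereas the interval lengths `[1000000, 10]` take 3 + 1 bytes.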
//!
//! Likewise, the [`RangeSpecSeq`] type is a sequence of [`RangeSpec`]s that
//! does run length encoding to remove repeating elements. It also allows infinite
//! sequences of [`RangeSpec`]s to be encoded, unlike a simple sequence of
//! [`ChunkRanges`]s.
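//!
//! Run length encoding in general can be sketched as follows. This is a generic
//! illustration of the technique with a hypothetical `run_length_encode` helper; it
//! does not claim to reproduce the exact wire layout of [`RangeSpecSeq`].
//!
//! ```rust
//! /// Collapses runs of equal adjacent items into `(run_length, item)` pairs.
//! /// (Hypothetical helper for illustration.)
//! fn run_length_encode<T: PartialEq + Clone>(items: &[T]) -> Vec<(u64, T)> {
//!     let mut runs: Vec<(u64, T)> = Vec::new();
//!     for item in items {
//!         if let Some((count, last)) = runs.last_mut() {
//!             if *last == *item {
//!                 // same as the previous item: extend the current run
//!                 *count += 1;
//!                 continue;
//!             }
//!         }
//!         // first item, or a different item: start a new run
//!         runs.push((1, item.clone()));
//!     }
//!     runs
//! }
//! ```
//!
//! A request that skips the first two elements, needs part of the third, and needs
//! the following three completely collapses from six entries into three runs.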
//!
//! [`RangeSpecSeq`] should be efficient even in case of very fragmented availability
//! of chunks, like a download from multiple providers that was frequently interrupted.
//!
//! # Responses
//!
//! The response stream contains the bao encoded bytes for the requested data.
//! The data will be sent in the order in which it was requested, so ascending
//! chunks for each blob, and blobs in the order in which they appear in the
//! collection.
//!
//! For details on the bao encoding, see the [bao specification](https://github.com/oconnor663/bao/blob/master/docs/spec.md)
//! and the [bao-tree](https://crates.io/crates/bao-tree) crate. The bao-tree crate
//! is identical to the bao crate, except that it allows combining multiple blake3
//! chunks to chunk groups for efficiency.
//!
//! As a consequence of the chunk group optimization, chunk ranges in the response
//! will be rounded up to chunk group ranges, so e.g. if you ask for chunks `0..10`,
//! you will get chunks `0..16`. This is done to reduce metadata overhead, and might
//! change in the future.
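//!
//! The rounding can be sketched as follows, assuming chunk groups of 16 chunks as
//! described in the goals above. The names `CHUNKS_PER_GROUP` and
//! `round_to_chunk_groups` are hypothetical, and the sketch ignores clipping at the
//! end of the blob.
//!
//! ```rust
//! use std::ops::Range;
//!
//! /// Number of blake3 chunks per chunk group (16 chunks of 1 KiB, i.e. 16 KiB).
//! const CHUNKS_PER_GROUP: u64 = 16;
//!
//! /// Expands a half-open chunk range to the enclosing chunk group boundaries.
//! /// (Hypothetical helper for illustration.)
//! fn round_to_chunk_groups(range: Range<u64>) -> Range<u64> {
//!     if range.is_empty() {
//!         // nothing requested, nothing to round
//!         return range;
//!     }
//!     let start = range.start / CHUNKS_PER_GROUP * CHUNKS_PER_GROUP;
//!     let end = range
//!         .end
//!         .div_ceil(CHUNKS_PER_GROUP)
//!         .saturating_mul(CHUNKS_PER_GROUP);
//!     start..end
//! }
//! ```
//!
//! A request for the single chunk `17..18` is therefore answered with the whole
//! second chunk group, chunks `16..32`.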
//!
//! For a complete response, the chunks are guaranteed to completely cover the
//! requested ranges.
//!
//! There are two possible reasons for not receiving a complete response:
//!
//! - The connection to the provider was interrupted, or the provider encountered
//!   an internal error. In this case the provider will close the entire quinn connection.
//!
//! - The provider does not have the requested data, or discovered on send that the
//!   requested data is not valid.
//!
//!   In this case the provider will close just the stream used to send the response.
//!   The exact location of the missing data can be retrieved from the error.
//!
//! # Requesting multiple unrelated blobs
//!
//! Currently, the protocol does not support requesting multiple unrelated blobs
//! in a single request. As an alternative, you can create a collection
//! on the provider side and use that to efficiently retrieve the blobs.
//!
//! If that is not possible, you can create a custom request handler that
//! accepts a custom request struct that contains the hashes of the blobs.
//!
//! If neither of these options is possible, you have no choice but to do
//! multiple requests. However, note that multiple requests will be multiplexed
//! over a single connection, and the overhead of a new QUIC stream on an existing
//! connection is very low.
//!
//! In case nodes are permanently exchanging data, it is probably valuable to
//! keep a connection open and reuse it for multiple requests.
use bao_tree::{ChunkNum, ChunkRanges};
use derive_more::From;
use iroh::endpoint::VarInt;
use serde::{Deserialize, Serialize};
mod range_spec;
pub use range_spec::{NonEmptyRequestRangeSpecIter, RangeSpec, RangeSpecSeq};

use crate::Hash;

/// Maximum message size is limited to 100 MiB for now.
pub const MAX_MESSAGE_SIZE: usize = 1024 * 1024 * 100;

/// The ALPN used with QUIC for the iroh bytes protocol.
pub const ALPN: &[u8] = b"/iroh-bytes/4";

/// A request to the provider
#[derive(Deserialize, Serialize, Debug, PartialEq, Eq, Clone, From)]
pub enum Request {
    /// A get request for a blob or collection
    Get(GetRequest),
}

/// A request
#[derive(Deserialize, Serialize, Debug, PartialEq, Eq, Clone)]
pub struct GetRequest {
    /// blake3 hash
    pub hash: Hash,
    /// The range of data to request
    ///
    /// The first element is the parent, all subsequent elements are children.
    pub ranges: RangeSpecSeq,
}

impl GetRequest {
    /// Request a blob or collection with specified ranges
    pub fn new(hash: Hash, ranges: RangeSpecSeq) -> Self {
        Self { hash, ranges }
    }

    /// Request a collection and all its children
    pub fn all(hash: Hash) -> Self {
        Self {
            hash,
            ranges: RangeSpecSeq::all(),
        }
    }

    /// Request just a single blob
    pub fn single(hash: Hash) -> Self {
        Self {
            hash,
            ranges: RangeSpecSeq::from_ranges([ChunkRanges::all()]),
        }
    }

    /// Request the last chunk of a single blob
    ///
    /// This can be used to get the verified size of a blob.
    pub fn last_chunk(hash: Hash) -> Self {
        Self {
            hash,
            ranges: RangeSpecSeq::from_ranges([ChunkRanges::from(ChunkNum(u64::MAX)..)]),
        }
    }

    /// Request the last chunk for all children
    ///
    /// This can be used to get the verified size of all children.
    pub fn last_chunks(hash: Hash) -> Self {
        Self {
            hash,
            ranges: RangeSpecSeq::from_ranges_infinite([
                ChunkRanges::all(),
                ChunkRanges::from(ChunkNum(u64::MAX)..),
            ]),
        }
    }
}

/// Reasons to close connections or stop streams.
///
/// A QUIC **connection** can be *closed* and a **stream** can request the other side to
/// *stop* sending data. Both closing and stopping have an associated `error_code`; closing
/// also adds a `reason` as some arbitrary bytes.
///
/// This enum exists so we have a single namespace for the `error_code`s used.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
#[repr(u16)]
pub enum Closed {
    /// The [`RecvStream`] was dropped.
    ///
    /// Used implicitly when a [`RecvStream`] is dropped without explicit call to
    /// [`RecvStream::stop`].  We don't use this explicitly but this is here as
    /// documentation as to what happened to `0`.
    ///
    /// [`RecvStream`]: iroh::endpoint::RecvStream
    /// [`RecvStream::stop`]: iroh::endpoint::RecvStream::stop
    StreamDropped = 0,
    /// The provider is terminating.
    ///
    /// When a provider terminates, all connections and associated streams are closed.
    ProviderTerminating = 1,
    /// The provider has received the request.
    ///
    /// Only a single request is allowed on a stream; if more data is received after this, a
    /// provider may send this error code in a STOP_SENDING frame.
    RequestReceived = 2,
}

impl Closed {
    /// The close reason as bytes. This is a valid UTF-8 string describing the reason.
    pub fn reason(&self) -> &'static [u8] {
        match self {
            Closed::StreamDropped => b"stream dropped",
            Closed::ProviderTerminating => b"provider terminating",
            Closed::RequestReceived => b"request received",
        }
    }
}

impl From<Closed> for VarInt {
    fn from(source: Closed) -> Self {
        VarInt::from(source as u16)
    }
}

/// Unknown error_code, cannot be converted into [`Closed`].
#[derive(thiserror::Error, Debug)]
#[error("Unknown error_code: {0}")]
pub struct UnknownErrorCode(u64);

impl TryFrom<VarInt> for Closed {
    type Error = UnknownErrorCode;

    fn try_from(value: VarInt) -> std::result::Result<Self, Self::Error> {
        match value.into_inner() {
            0 => Ok(Self::StreamDropped),
            1 => Ok(Self::ProviderTerminating),
            2 => Ok(Self::RequestReceived),
            val => Err(UnknownErrorCode(val)),
        }
    }
}

#[cfg(test)]
mod tests {
    use super::{GetRequest, Request};
    use crate::{assert_eq_hex, util::hexdump::parse_hexdump};

    #[test]
    fn request_wire_format() {
        let hash = [0xda; 32].into();
        let cases = [
            (
                Request::from(GetRequest::single(hash)),
                r"
                    00 # enum variant for GetRequest
                    dadadadadadadadadadadadadadadadadadadadadadadadadadadadadadadada # the hash
                    020001000100 # the RangeSpecSeq
            ",
            ),
            (
                Request::from(GetRequest::all(hash)),
                r"
                    00 # enum variant for GetRequest
                    dadadadadadadadadadadadadadadadadadadadadadadadadadadadadadadada # the hash
                    01000100 # the RangeSpecSeq
            ",
            ),
        ];
        for (case, expected_hex) in cases {
            let expected = parse_hexdump(expected_hex).unwrap();
            let bytes = postcard::to_stdvec(&case).unwrap();
            assert_eq_hex!(bytes, expected);
        }
    }
}