iroh_blobs/
protocol.rs

1//! Protocol for transferring content-addressed blobs over [`iroh`] p2p QUIC connections.
2//!
3//! # Participants
4//!
5//! The protocol is a request/response protocol with two parties, a *provider* that
6//! serves blobs and a *getter* that requests blobs.
7//!
8//! # Goals
9//!
10//! - Be paranoid about data integrity.
11//!
12//!   Data integrity is considered more important than performance. Data will be validated both on
13//!   the provider and getter side. A well behaved provider will never send invalid data. Responses
14//!   to range requests contain sufficient information to validate the data.
15//!
16//!   Note: Validation using blake3 is extremely fast, so in almost all scenarios the validation
17//!   will not be the bottleneck even if we validate both on the provider and getter side.
18//!
19//! - Do not limit the size of blobs or collections.
20//!
21//!   Blobs can be of arbitrary size, up to terabytes. Likewise, collections can contain an
22//!   arbitrary number of links. A well behaved implementation will not require the entire blob or
23//!   collection to be in memory at once.
24//!
25//! - Be efficient when transferring large blobs, including range requests.
26//!
27//!   It is possible to request entire blobs or ranges of blobs, where the minimum granularity is a
28//!   chunk group of 16KiB or 16 blake3 chunks. The worst case overhead when doing range requests
29//!   is about two chunk groups per range.
30//!
31//! - Be efficient when transferring multiple tiny blobs.
32//!
33//!   For tiny blobs the overhead of sending the blob hashes and the round-trip time for each blob
34//!   would be prohibitive.
35//!
36//! To avoid roundtrips, the protocol allows grouping multiple blobs into *collections*.
37//! The semantic meaning of a collection is up to the application. For the purpose
38//! of this protocol, a collection is just a grouping of related blobs.
39//!
40//! # Non-goals
41//!
42//! - Do not attempt to be generic in terms of the used hash function.
43//!
44//!   The protocol makes extensive use of the [blake3](https://crates.io/crates/blake3) hash
45//!   function and it's special properties such as blake3 verified streaming.
46//!
47//! - Do not support graph traversal.
48//!
49//!   The protocol only supports collections that directly contain blobs. If you have deeply nested
50//!   graph data, you will need to either do multiple requests or flatten the graph into a single
51//!   temporary collection.
52//!
53//! - Do not support discovery.
54//!
55//!   The protocol does not yet have a discovery mechanism for asking the provider what ranges are
56//!   available for a given blob. Currently you have to have some out-of-band knowledge about what
57//!   node has data for a given hash, or you can just try to retrieve the data and see if it is
58//!   available.
59//!
60//! A discovery protocol is planned in the future though.
61//!
62//! # Requests
63//!
64//! ## Getter defined requests
65//!
66//! In this case the getter knows the hash of the blob it wants to retrieve and
67//! whether it wants to retrieve a single blob or a collection.
68//!
69//! The getter needs to define exactly what it wants to retrieve and send the
70//! request to the provider.
71//!
72//! The provider will then respond with the bao encoded bytes for the requested
73//! data and then close the connection. It will immediately close the connection
74//! in case some data is not available or invalid.
75//!
76//! ## Provider defined requests
77//!
78//! In this case the getter sends a blob to the provider. This blob can contain
79//! some kind of query. The exact details of the query are up to the application.
80//!
81//! The provider evaluates the query and responds with a serialized request in
82//! the same format as the getter defined requests, followed by the bao encoded
83//! data. From then on the protocol is the same as for getter defined requests.
84//!
85//! ## Specifying the required data
86//!
87//! A [`GetRequest`] contains a hash and a specification of what data related to
88//! that hash is required. The specification is using a [`ChunkRangesSeq`] which
89//! has a compact representation on the wire but is otherwise identical to a
90//! sequence of sets of ranges.
91//!
92//! In the following, we describe how the [`GetRequest`] is to be created for
93//! different common scenarios.
94//!
95//! Under the hood, this is using the [`ChunkRangesSeq`] type, but the most
96//! convenient way to create a [`GetRequest`] is to use the builder API.
97//!
98//! Ranges are always given in terms of 1024 byte blake3 chunks, *not* in terms
99//! of bytes or chunk groups. The reason for this is that chunks are the fundamental
100//! unit of hashing in BLAKE3. Addressing anything smaller than a chunk is not
101//! possible, and combining multiple chunks is merely an optimization to reduce
102//! metadata overhead.
103//!
104//! ### Individual blobs
105//!
106//! In the easiest case, the getter just wants to retrieve a single blob. In this
107//! case, the getter specifies [`ChunkRangesSeq`] that contains a single element.
108//! This element is the set of all chunks to indicate that we
109//! want the entire blob, no matter how many chunks it has.
110//!
111//! Since this is a very common case, there is a convenience method
112//! [`GetRequest::blob`] that only requires the hash of the blob.
113//!
114//! ```rust
115//! # use iroh_blobs::protocol::GetRequest;
116//! # let hash: iroh_blobs::Hash = [0; 32].into();
117//! let request = GetRequest::blob(hash);
118//! ```
119//!
120//! ### Ranges of blobs
121//!
122//! In this case, we have a (possibly large) blob and we want to retrieve only
123//! some ranges of chunks. This is useful in similar cases as HTTP range requests.
124//!
125//! We still need just a single element in the [`ChunkRangesSeq`], since we are
126//! still only interested in a single blob. However, this element contains all
127//! the chunk ranges we want to retrieve.
128//!
129//! For example, if we want to retrieve chunks 0-10 of a blob, we would
130//! create a [`ChunkRangesSeq`] like this:
131//!
132//! ```rust
133//! # use iroh_blobs::protocol::{GetRequest, ChunkRanges, ChunkRangesExt};
134//! # let hash: iroh_blobs::Hash = [0; 32].into();
135//! let request = GetRequest::builder()
136//!     .root(ChunkRanges::chunks(..10))
137//!     .build(hash);
138//! ```
139//!
140//! While not that common, it is also possible to request multiple ranges of a
141//! single blob. For example, if we want to retrieve chunks `0-10` and `100-110`
142//! of a large file, we would create a [`GetRequest`] like this:
143//!
144//! ```rust
145//! # use iroh_blobs::protocol::{GetRequest, ChunkRanges, ChunkRangesExt, ChunkRangesSeq};
146//! # let hash: iroh_blobs::Hash = [0; 32].into();
147//! let request = GetRequest::builder()
148//!     .root(ChunkRanges::chunks(..10) | ChunkRanges::chunks(100..110))
149//!     .build(hash);
150//! ```
151//!
152//! This is all great, but in most cases we are not interested in chunks but
153//! in bytes. The [`ChunkRanges`] type has a constructor that allows providing
154//! byte ranges instead of chunk ranges. These will be rounded up to the
155//! nearest chunk.
156//!
157//! ```rust
158//! # use iroh_blobs::protocol::{GetRequest, ChunkRanges, ChunkRangesExt, ChunkRangesSeq};
159//! # let hash: iroh_blobs::Hash = [0; 32].into();
160//! let request = GetRequest::builder()
161//!     .root(ChunkRanges::bytes(..1000) | ChunkRanges::bytes(10000..11000))
162//!     .build(hash);
163//! ```
164//!
165//! There are also methods to request a single chunk or a single byte offset,
166//! as well as a special constructor for the last chunk of a blob.
167//!
168//! ```rust
169//! # use iroh_blobs::protocol::{GetRequest, ChunkRanges, ChunkRangesExt, ChunkRangesSeq};
170//! # let hash: iroh_blobs::Hash = [0; 32].into();
171//! let request = GetRequest::builder()
172//!     .root(ChunkRanges::offset(1) | ChunkRanges::last_chunk())
173//!     .build(hash);
174//! ```
175//!
176//! To specify chunk ranges, we use the [`ChunkRanges`] type alias.
177//! This is actually the [`RangeSet`] type from the
178//! [range_collections](https://crates.io/crates/range_collections) crate. This
179//! type supports efficient boolean operations on sets of non-overlapping ranges.
180//!
181//! The [`RangeSet2`] type is a type alias for [`RangeSet`] that can store up to
182//! 2 boundaries without allocating. This is sufficient for most use cases.
183//!
184//! [`RangeSet`]: range_collections::range_set::RangeSet
185//! [`RangeSet2`]: range_collections::range_set::RangeSet2
186//!
187//! ### Hash sequences
188//!
189//! In this case the provider has a hash sequence that refers multiple blobs.
190//! We want to retrieve all blobs in the hash sequence.
191//!
192//! When used for hash sequences, the first element of a [`ChunkRangesSeq`] refers
193//! to the hash seq itself, and all subsequent elements refer to the blobs
194//! in the hash seq. When a [`ChunkRangesSeq`] specifies ranges for more than
195//! one blob, the provider will interpret this as a request for a hash seq.
196//!
197//! One thing to note is that we might not yet know how many blobs are in the
198//! hash sequence. Therefore, it is not possible to download an entire hash seq
199//! by just specifying [`ChunkRanges::all()`] for all children.
200//!
201//! Instead, [`ChunkRangesSeq`] allows defining infinite sequences of range sets.
202//! The [`ChunkRangesSeq::all()`] method returns a [`ChunkRangesSeq`] that, when iterated
203//! over, will yield [`ChunkRanges::all()`] forever.
204//!
205//! So a get request to download a hash sequence blob and all its children
206//! would look like this:
207//!
208//! ```rust
209//! # use iroh_blobs::protocol::{ChunkRanges, ChunkRangesExt, GetRequest};
210//! # let hash: iroh_blobs::Hash = [0; 32].into();
211//! let request = GetRequest::builder()
212//!     .root(ChunkRanges::all())
213//!     .build_open(hash); // repeats the last range forever
214//! ```
215//!
216//! Downloading an entire hash seq is also a very common case, so there is a
217//! convenience method [`GetRequest::all`] that only requires the hash of the
218//! hash sequence blob.
219//!
220//! ```rust
221//! # use iroh_blobs::protocol::{ChunkRanges, ChunkRangesExt, GetRequest};
222//! # let hash: iroh_blobs::Hash = [0; 32].into();
223//! let request = GetRequest::all(hash);
224//! ```
225//!
226//! ### Parts of hash sequences
227//!
228//! The most complex common case is when we have retrieved a hash seq and
229//! it's children, but were interrupted before we could retrieve all children.
230//!
231//! In this case we need to specify the hash seq we want to retrieve, but
232//! exclude the children and parts of children that we already have.
233//!
234//! For example, if we have a hash with 3 children, and we already have
235//! the first child and the first 1000000 chunks of the second child.
236//!
237//! We would create a [`GetRequest`] like this:
238//!
239//! ```rust
240//! # use iroh_blobs::protocol::{GetRequest, ChunkRanges, ChunkRangesExt};
241//! # let hash: iroh_blobs::Hash = [0; 32].into();
242//! let request = GetRequest::builder()
243//!     .child(1, ChunkRanges::chunks(1000000..)) // we don't need the first child;
244//!     .next(ChunkRanges::all()) // we need the second child and all subsequent children completely
245//!     .build_open(hash);
246//! ```
247//!
248//! ### Requesting chunks for each child
249//!
250//! The ChunkRangesSeq allows some scenarios that are not covered above. E.g. you
251//! might want to request a hash seq and the first chunk of each child blob to
252//! do something like mime type detection.
253//!
254//! You do not know how many children the collection has, so you need to use
255//! an infinite sequence.
256//!
257//! ```rust
258//! # use iroh_blobs::protocol::{GetRequest, ChunkRanges, ChunkRangesExt, ChunkRangesSeq};
259//! # let hash: iroh_blobs::Hash = [0; 32].into();
260//! let request = GetRequest::builder()
261//!     .root(ChunkRanges::all())
262//!     .next(ChunkRanges::chunk(1)) // the first chunk of each child)
263//!     .build_open(hash);
264//! ```
265//!
266//! ### Requesting a single child
267//!
268//! It is of course possible to request a single child of a collection. E.g.
269//! the following would download the second child of a collection:
270//!
271//! ```rust
272//! # use iroh_blobs::protocol::{GetRequest, ChunkRanges, ChunkRangesExt};
273//! # let hash: iroh_blobs::Hash = [0; 32].into();
274//! let request = GetRequest::builder()
275//!     .child(1, ChunkRanges::all()) // we need the second child completely
276//!     .build(hash);
277//! ```
278//!
279//! However, if you already have the collection, you might as well locally
280//! look up the hash of the child and request it directly.
281//!
282//! ```rust
283//! # use iroh_blobs::protocol::{GetRequest, ChunkRanges, ChunkRangesSeq};
284//! # let child_hash: iroh_blobs::Hash = [0; 32].into();
285//! let request = GetRequest::blob(child_hash);
286//! ```
287//!
288//! ### Why ChunkRanges and ChunkRangesSeq?
289//!
290//! You might wonder why we have [`ChunkRangesSeq`], when a simple
291//! sequence of [`ChunkRanges`] might also do.
292//!
293//! The [`ChunkRangesSeq`] type exist to provide an efficient
294//! representation of the request on the wire. In the wire encoding of [`ChunkRangesSeq`],
295//! [`ChunkRanges`] are encoded alternating intervals of selected and non-selected chunks.
296//! This results in smaller numbers that will result in fewer bytes on the wire when using
297//! the [postcard](https://crates.io/crates/postcard) encoding format that uses variable
298//! length integers.
299//!
300//! Likewise, the [`ChunkRangesSeq`] type
301//! does run length encoding to remove repeating elements. It also allows infinite
302//! sequences of [`ChunkRanges`] to be encoded, unlike a simple sequence of
303//! [`ChunkRanges`]s.
304//!
305//! [`ChunkRangesSeq`] should be efficient even in case of very fragmented availability
306//! of chunks, like a download from multiple providers that was frequently interrupted.
307//!
308//! # Responses
309//!
310//! The response stream contains the bao encoded bytes for the requested data.
311//! The data will be sent in the order in which it was requested, so ascending
312//! chunks for each blob, and blobs in the order in which they appear in the
313//! hash seq.
314//!
315//! For details on the bao encoding, see the [bao specification](https://github.com/oconnor663/bao/blob/master/docs/spec.md)
316//! and the [bao-tree](https://crates.io/crates/bao-tree) crate. The bao-tree crate
317//! is identical to the bao crate, except that it allows combining multiple BLAKE3
318//! chunks to chunk groups for efficiency.
319//!
320//! For a complete response, the chunks are guaranteed to completely cover the
321//! requested ranges.
322//!
323//! Reasons for not retrieving a complete response are two-fold:
324//!
325//! - the connection to the provider was interrupted, or the provider encountered
326//!   an internal error. In this case the provider will close the entire quinn connection.
327//!
328//! - the provider does not have the requested data, or discovered on send that the
329//!   requested data is not valid.
330//!
331//! In this case the provider will close just the stream used to send the response.
332//! The exact location of the missing data can be retrieved from the error.
333//!
334//! # Requesting multiple unrelated blobs
335//!
336//! Let's say you don't have a hash sequence on the provider side, but you
337//! nevertheless want to request multiple unrelated blobs in a single request.
338//!
339//! For this, there is the [`GetManyRequest`] type, which also comes with a
340//! builder API.
341//!
342//! ```rust
343//! # use iroh_blobs::protocol::{GetManyRequest, ChunkRanges, ChunkRangesExt};
344//! # let hash1: iroh_blobs::Hash = [0; 32].into();
345//! # let hash2: iroh_blobs::Hash = [1; 32].into();
346//! GetManyRequest::builder()
347//!     .hash(hash1, ChunkRanges::all())
348//!     .hash(hash2, ChunkRanges::all())
349//!     .build();
350//! ```
351//! If you accidentally or intentionally request ranges for the same hash
352//! multiple times, they will be merged into a single [`ChunkRanges`].
353//!
354//! ```rust
355//! # use iroh_blobs::protocol::{GetManyRequest, ChunkRanges, ChunkRangesExt};
356//! # let hash1: iroh_blobs::Hash = [0; 32].into();
357//! # let hash2: iroh_blobs::Hash = [1; 32].into();
358//! GetManyRequest::builder()
359//!     .hash(hash1, ChunkRanges::chunk(1))
360//!     .hash(hash2, ChunkRanges::all())
361//!     .hash(hash1, ChunkRanges::last_chunk())
362//!     .build();
363//! ```
364//!
365//! This is mostly useful for requesting multiple tiny blobs in a single request.
366//! For large or even medium sized blobs, multiple requests are not expensive.
367//! Multiple requests just create multiple streams on the same connection,
368//! which is *very* cheap in QUIC.
369//!
370//! In case nodes are permanently exchanging data, it is somewhat valuable to
371//! keep a connection open and reuse it for multiple requests. However, creating
372//! a new connection is also very cheap, so you would only do this to optimize
373//! a large existing system that has demonstrated performance issues.
374//!
375//! If in doubt, just use multiple requests and multiple connections.
376use std::{
377    io,
378    ops::{Bound, RangeBounds},
379};
380
381use bao_tree::{io::round_up_to_chunks, ChunkNum};
382use builder::GetRequestBuilder;
383use derive_more::From;
384use iroh::endpoint::VarInt;
385use irpc::util::AsyncReadVarintExt;
386use postcard::experimental::max_size::MaxSize;
387use range_collections::{range_set::RangeSetEntry, RangeSet2};
388use serde::{Deserialize, Serialize};
389mod range_spec;
390pub use bao_tree::ChunkRanges;
391pub use range_spec::{ChunkRangesSeq, NonEmptyRequestRangeSpecIter, RangeSpec};
392use snafu::{GenerateImplicitData, Snafu};
393use tokio::io::AsyncReadExt;
394
395use crate::{api::blobs::Bitfield, provider::RecvStreamExt, BlobFormat, Hash, HashAndFormat};
396
397/// Maximum message size is limited to 100MiB for now.
398pub const MAX_MESSAGE_SIZE: usize = 1024 * 1024;
399
400/// Error code for a permission error
401pub const ERR_PERMISSION: VarInt = VarInt::from_u32(1u32);
402/// Error code for when a request is aborted due to a rate limit
403pub const ERR_LIMIT: VarInt = VarInt::from_u32(2u32);
404/// Error code for when a request is aborted due to internal error
405pub const ERR_INTERNAL: VarInt = VarInt::from_u32(3u32);
406
407/// The ALPN used with quic for the iroh blobs protocol.
408pub const ALPN: &[u8] = b"/iroh-bytes/4";
409
410#[derive(Deserialize, Serialize, Debug, PartialEq, Eq, Clone, From)]
411/// A request to the provider
412pub enum Request {
413    /// A get request for a blob or collection
414    Get(GetRequest),
415    Observe(ObserveRequest),
416    Slot2,
417    Slot3,
418    Slot4,
419    Slot5,
420    Slot6,
421    Slot7,
422    /// The inverse of a get request - push data to the provider
423    ///
424    /// Note that providers will in many cases reject this request, e.g. if
425    /// they don't have write access to the store or don't want to ingest
426    /// unknown data.
427    Push(PushRequest),
428    /// Get multiple blobs in a single request, from a single provider
429    ///
430    /// This is identical to a [`GetRequest`] for a [`crate::hashseq::HashSeq`], but the provider
431    /// does not need to have the hash seq.
432    GetMany(GetManyRequest),
433}
434
435/// This must contain the request types in the same order as the full requests
436#[derive(Deserialize, Serialize, Debug, PartialEq, Eq, Clone, Copy, MaxSize)]
437pub enum RequestType {
438    Get,
439    Observe,
440    Slot2,
441    Slot3,
442    Slot4,
443    Slot5,
444    Slot6,
445    Slot7,
446    Push,
447    GetMany,
448}
449
450impl Request {
451    pub async fn read_async(reader: &mut iroh::endpoint::RecvStream) -> io::Result<(Self, usize)> {
452        let request_type = reader.read_u8().await?;
453        let request_type: RequestType = postcard::from_bytes(std::slice::from_ref(&request_type))
454            .map_err(|_| {
455            io::Error::new(
456                io::ErrorKind::InvalidData,
457                "failed to deserialize request type",
458            )
459        })?;
460        Ok(match request_type {
461            RequestType::Get => {
462                let (r, size) = reader
463                    .read_to_end_as::<GetRequest>(MAX_MESSAGE_SIZE)
464                    .await?;
465                (r.into(), size)
466            }
467            RequestType::GetMany => {
468                let (r, size) = reader
469                    .read_to_end_as::<GetManyRequest>(MAX_MESSAGE_SIZE)
470                    .await?;
471                (r.into(), size)
472            }
473            RequestType::Observe => {
474                let (r, size) = reader
475                    .read_to_end_as::<ObserveRequest>(MAX_MESSAGE_SIZE)
476                    .await?;
477                (r.into(), size)
478            }
479            RequestType::Push => {
480                let r = reader
481                    .read_length_prefixed::<PushRequest>(MAX_MESSAGE_SIZE)
482                    .await?;
483                let size = postcard::experimental::serialized_size(&r).unwrap();
484                (r.into(), size)
485            }
486            _ => {
487                return Err(io::Error::new(
488                    io::ErrorKind::InvalidData,
489                    "failed to deserialize request type",
490                ));
491            }
492        })
493    }
494}
495
496/// A get request
497#[derive(Deserialize, Serialize, Debug, PartialEq, Eq, Clone, Hash)]
498pub struct GetRequest {
499    /// blake3 hash
500    pub hash: Hash,
501    /// The range of data to request
502    ///
503    /// The first element is the parent, all subsequent elements are children.
504    pub ranges: ChunkRangesSeq,
505}
506
507impl From<HashAndFormat> for GetRequest {
508    fn from(value: HashAndFormat) -> Self {
509        match value.format {
510            BlobFormat::Raw => Self::blob(value.hash),
511            BlobFormat::HashSeq => Self::all(value.hash),
512        }
513    }
514}
515
516impl GetRequest {
517    pub fn builder() -> GetRequestBuilder {
518        GetRequestBuilder::default()
519    }
520
521    pub fn content(&self) -> HashAndFormat {
522        HashAndFormat {
523            hash: self.hash,
524            format: if self.ranges.is_blob() {
525                BlobFormat::Raw
526            } else {
527                BlobFormat::HashSeq
528            },
529        }
530    }
531
532    /// Request a blob or collection with specified ranges
533    pub fn new(hash: Hash, ranges: ChunkRangesSeq) -> Self {
534        Self { hash, ranges }
535    }
536
537    /// Request a collection and all its children
538    pub fn all(hash: impl Into<Hash>) -> Self {
539        Self {
540            hash: hash.into(),
541            ranges: ChunkRangesSeq::all(),
542        }
543    }
544
545    /// Request just a single blob
546    pub fn blob(hash: impl Into<Hash>) -> Self {
547        Self {
548            hash: hash.into(),
549            ranges: ChunkRangesSeq::from_ranges([ChunkRanges::all()]),
550        }
551    }
552
553    /// Request ranges from a single blob
554    pub fn blob_ranges(hash: Hash, ranges: ChunkRanges) -> Self {
555        Self {
556            hash,
557            ranges: ChunkRangesSeq::from_ranges([ranges]),
558        }
559    }
560}
561
562/// A push request contains a description of what to push, but will be followed
563/// by the data to push.
564#[derive(
565    Deserialize, Serialize, Debug, PartialEq, Eq, Clone, derive_more::From, derive_more::Deref,
566)]
567pub struct PushRequest(GetRequest);
568
569impl PushRequest {
570    pub fn new(hash: Hash, ranges: ChunkRangesSeq) -> Self {
571        Self(GetRequest::new(hash, ranges))
572    }
573}
574
575/// A GetMany request is a request to get multiple blobs via a single request.
576///
577/// It is identical to a [`GetRequest`] for a HashSeq, but the HashSeq is provided
578/// by the requester.
579#[derive(Deserialize, Serialize, Debug, PartialEq, Eq, Clone)]
580pub struct GetManyRequest {
581    /// The hashes of the blobs to get
582    pub hashes: Vec<Hash>,
583    /// The ranges of data to request
584    ///
585    /// There is no range request for the parent, since we just sent the hashes
586    /// and therefore have the parent already.
587    pub ranges: ChunkRangesSeq,
588}
589
590impl<I: Into<Hash>> FromIterator<I> for GetManyRequest {
591    fn from_iter<T: IntoIterator<Item = I>>(iter: T) -> Self {
592        let mut res = iter.into_iter().map(Into::into).collect::<Vec<Hash>>();
593        res.sort();
594        res.dedup();
595        let n = res.len() as u64;
596        Self {
597            hashes: res,
598            ranges: ChunkRangesSeq(smallvec::smallvec![
599                (0, ChunkRanges::all()),
600                (n, ChunkRanges::empty())
601            ]),
602        }
603    }
604}
605
606impl GetManyRequest {
607    pub fn new(hashes: Vec<Hash>, ranges: ChunkRangesSeq) -> Self {
608        Self { hashes, ranges }
609    }
610
611    pub fn builder() -> builder::GetManyRequestBuilder {
612        builder::GetManyRequestBuilder::default()
613    }
614}
615
616/// A request to observe a raw blob bitfield.
617#[derive(Deserialize, Serialize, Debug, PartialEq, Eq, Clone, Hash)]
618pub struct ObserveRequest {
619    /// blake3 hash
620    pub hash: Hash,
621    /// ranges to observe.
622    pub ranges: RangeSpec,
623}
624
625impl ObserveRequest {
626    pub fn new(hash: Hash) -> Self {
627        Self {
628            hash,
629            ranges: RangeSpec::all(),
630        }
631    }
632}
633
634#[derive(Deserialize, Serialize, Debug, PartialEq, Eq)]
635pub struct ObserveItem {
636    pub size: u64,
637    pub ranges: ChunkRanges,
638}
639
640impl From<&Bitfield> for ObserveItem {
641    fn from(value: &Bitfield) -> Self {
642        Self {
643            size: value.size,
644            ranges: value.ranges.clone(),
645        }
646    }
647}
648
649impl From<&ObserveItem> for Bitfield {
650    fn from(value: &ObserveItem) -> Self {
651        Self {
652            size: value.size,
653            ranges: value.ranges.clone(),
654        }
655    }
656}
657
658/// Reasons to close connections or stop streams.
659///
660/// A QUIC **connection** can be *closed* and a **stream** can request the other side to
661/// *stop* sending data.  Both closing and stopping have an associated `error_code`, closing
662/// also adds a `reason` as some arbitrary bytes.
663///
664/// This enum exists so we have a single namespace for `error_code`s used.
665#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
666#[repr(u16)]
667pub enum Closed {
668    /// The [`RecvStream`] was dropped.
669    ///
670    /// Used implicitly when a [`RecvStream`] is dropped without explicit call to
671    /// [`RecvStream::stop`].  We don't use this explicitly but this is here as
672    /// documentation as to what happened to `0`.
673    ///
674    /// [`RecvStream`]: iroh::endpoint::RecvStream
675    /// [`RecvStream::stop`]: iroh::endpoint::RecvStream::stop
676    StreamDropped = 0,
677    /// The provider is terminating.
678    ///
679    /// When a provider terminates all connections and associated streams are closed.
680    ProviderTerminating = 1,
681    /// The provider has received the request.
682    ///
683    /// Only a single request is allowed on a stream, if more data is received after this a
684    /// provider may send this error code in a STOP_STREAM frame.
685    RequestReceived = 2,
686}
687
688impl Closed {
689    /// The close reason as bytes. This is a valid utf8 string describing the reason.
690    pub fn reason(&self) -> &'static [u8] {
691        match self {
692            Closed::StreamDropped => b"stream dropped",
693            Closed::ProviderTerminating => b"provider terminating",
694            Closed::RequestReceived => b"request received",
695        }
696    }
697}
698
699impl From<Closed> for VarInt {
700    fn from(source: Closed) -> Self {
701        VarInt::from(source as u16)
702    }
703}
704
705/// Unknown error_code, can not be converted into [`Closed`].
706#[derive(Debug, Snafu)]
707#[snafu(display("Unknown error_code: {code}"))]
708pub struct UnknownErrorCode {
709    code: u64,
710    backtrace: Option<snafu::Backtrace>,
711}
712
713impl UnknownErrorCode {
714    pub(crate) fn new(code: u64) -> Self {
715        Self {
716            code,
717            backtrace: GenerateImplicitData::generate(),
718        }
719    }
720}
721
722impl TryFrom<VarInt> for Closed {
723    type Error = UnknownErrorCode;
724
725    fn try_from(value: VarInt) -> std::result::Result<Self, Self::Error> {
726        match value.into_inner() {
727            0 => Ok(Self::StreamDropped),
728            1 => Ok(Self::ProviderTerminating),
729            2 => Ok(Self::RequestReceived),
730            val => Err(UnknownErrorCode::new(val)),
731        }
732    }
733}
734
735pub trait ChunkRangesExt {
736    fn last_chunk() -> Self;
737    fn chunk(offset: u64) -> Self;
738    fn bytes(ranges: impl RangeBounds<u64>) -> Self;
739    fn chunks(ranges: impl RangeBounds<u64>) -> Self;
740    fn offset(offset: u64) -> Self;
741}
742
743impl ChunkRangesExt for ChunkRanges {
744    fn last_chunk() -> Self {
745        ChunkRanges::from(ChunkNum(u64::MAX)..)
746    }
747
748    /// Create a chunk range that contains a single chunk.
749    fn chunk(offset: u64) -> Self {
750        ChunkRanges::from(ChunkNum(offset)..ChunkNum(offset + 1))
751    }
752
753    /// Create a range of chunks that contains the given byte ranges.
754    /// The byte ranges are rounded up to the nearest chunk size.
755    fn bytes(ranges: impl RangeBounds<u64>) -> Self {
756        round_up_to_chunks(&bounds_from_range(ranges, |v| v))
757    }
758
759    /// Create a range of chunks from u64 chunk bounds.
760    ///
761    /// This is equivalent but more convenient than using the ChunkNum newtype.
762    fn chunks(ranges: impl RangeBounds<u64>) -> Self {
763        bounds_from_range(ranges, ChunkNum)
764    }
765
766    /// Create a chunk range that contains a single byte offset.
767    fn offset(offset: u64) -> Self {
768        Self::bytes(offset..offset + 1)
769    }
770}
771
772// todo: move to range_collections
773pub(crate) fn bounds_from_range<R, T, F>(range: R, f: F) -> RangeSet2<T>
774where
775    R: RangeBounds<u64>,
776    T: RangeSetEntry,
777    F: Fn(u64) -> T,
778{
779    let from = match range.start_bound() {
780        Bound::Included(start) => Some(*start),
781        Bound::Excluded(start) => {
782            let Some(start) = start.checked_add(1) else {
783                return RangeSet2::empty();
784            };
785            Some(start)
786        }
787        Bound::Unbounded => None,
788    };
789    let to = match range.end_bound() {
790        Bound::Included(end) => end.checked_add(1),
791        Bound::Excluded(end) => Some(*end),
792        Bound::Unbounded => None,
793    };
794    match (from, to) {
795        (Some(from), Some(to)) => RangeSet2::from(f(from)..f(to)),
796        (Some(from), None) => RangeSet2::from(f(from)..),
797        (None, Some(to)) => RangeSet2::from(..f(to)),
798        (None, None) => RangeSet2::all(),
799    }
800}
801
802pub mod builder {
803    use std::collections::BTreeMap;
804
805    use bao_tree::ChunkRanges;
806
807    use super::ChunkRangesSeq;
808    use crate::{
809        protocol::{GetManyRequest, GetRequest},
810        Hash,
811    };
812
813    #[derive(Debug, Clone, Default)]
814    pub struct ChunkRangesSeqBuilder {
815        ranges: BTreeMap<u64, ChunkRanges>,
816    }
817
818    #[derive(Debug, Clone, Default)]
819    pub struct GetRequestBuilder {
820        builder: ChunkRangesSeqBuilder,
821    }
822
823    impl GetRequestBuilder {
824        /// Add a range to the request.
825        pub fn offset(mut self, offset: u64, ranges: impl Into<ChunkRanges>) -> Self {
826            self.builder = self.builder.offset(offset, ranges);
827            self
828        }
829
830        /// Add a range to the request.
831        pub fn child(mut self, child: u64, ranges: impl Into<ChunkRanges>) -> Self {
832            self.builder = self.builder.offset(child + 1, ranges);
833            self
834        }
835
836        /// Specify ranges for the root blob (the HashSeq)
837        pub fn root(mut self, ranges: impl Into<ChunkRanges>) -> Self {
838            self.builder = self.builder.offset(0, ranges);
839            self
840        }
841
842        /// Specify ranges for the next offset.
843        pub fn next(mut self, ranges: impl Into<ChunkRanges>) -> Self {
844            self.builder = self.builder.next(ranges);
845            self
846        }
847
848        /// Build a get request for the given hash, with the ranges specified in the builder.
849        pub fn build(self, hash: impl Into<Hash>) -> GetRequest {
850            let ranges = self.builder.build();
851            GetRequest::new(hash.into(), ranges)
852        }
853
854        /// Build a get request for the given hash, with the ranges specified in the builder
855        /// and the last non-empty range repeating indefinitely.
856        pub fn build_open(self, hash: impl Into<Hash>) -> GetRequest {
857            let ranges = self.builder.build_open();
858            GetRequest::new(hash.into(), ranges)
859        }
860    }
861
862    impl ChunkRangesSeqBuilder {
863        /// Add a range to the request.
864        pub fn offset(self, offset: u64, ranges: impl Into<ChunkRanges>) -> Self {
865            self.at_offset(offset, ranges.into())
866        }
867
868        /// Specify ranges for the next offset.
869        pub fn next(self, ranges: impl Into<ChunkRanges>) -> Self {
870            let offset = self.next_offset_value();
871            self.at_offset(offset, ranges.into())
872        }
873
874        /// Build a get request for the given hash, with the ranges specified in the builder.
875        pub fn build(self) -> ChunkRangesSeq {
876            ChunkRangesSeq::from_ranges(self.build0())
877        }
878
879        /// Build a get request for the given hash, with the ranges specified in the builder
880        /// and the last non-empty range repeating indefinitely.
881        pub fn build_open(self) -> ChunkRangesSeq {
882            ChunkRangesSeq::from_ranges_infinite(self.build0())
883        }
884
885        /// Add ranges at the given offset.
886        fn at_offset(mut self, offset: u64, ranges: ChunkRanges) -> Self {
887            self.ranges
888                .entry(offset)
889                .and_modify(|v| *v |= ranges.clone())
890                .or_insert(ranges);
891            self
892        }
893
894        /// Build the request.
895        fn build0(mut self) -> impl Iterator<Item = ChunkRanges> {
896            let mut ranges = Vec::new();
897            self.ranges.retain(|_, v| !v.is_empty());
898            let until_key = self.next_offset_value();
899            for offset in 0..until_key {
900                ranges.push(self.ranges.remove(&offset).unwrap_or_default());
901            }
902            ranges.into_iter()
903        }
904
905        /// Get the next offset value.
906        fn next_offset_value(&self) -> u64 {
907            self.ranges
908                .last_key_value()
909                .map(|(k, _)| *k + 1)
910                .unwrap_or_default()
911        }
912    }
913
914    #[derive(Debug, Clone, Default)]
915    pub struct GetManyRequestBuilder {
916        ranges: BTreeMap<Hash, ChunkRanges>,
917    }
918
919    impl GetManyRequestBuilder {
920        /// Specify ranges for the given hash.
921        ///
922        /// Note that if you specify a hash that is already in the request, the ranges will be
923        /// merged with the existing ranges.
924        pub fn hash(mut self, hash: impl Into<Hash>, ranges: impl Into<ChunkRanges>) -> Self {
925            let ranges = ranges.into();
926            let hash = hash.into();
927            self.ranges
928                .entry(hash)
929                .and_modify(|v| *v |= ranges.clone())
930                .or_insert(ranges);
931            self
932        }
933
934        /// Build a `GetManyRequest`.
935        pub fn build(self) -> GetManyRequest {
936            let (hashes, ranges): (Vec<Hash>, Vec<ChunkRanges>) = self
937                .ranges
938                .into_iter()
939                .filter(|(_, v)| !v.is_empty())
940                .unzip();
941            let ranges = ChunkRangesSeq::from_ranges(ranges);
942            GetManyRequest { hashes, ranges }
943        }
944    }
945
946    #[cfg(test)]
947    mod tests {
948        use bao_tree::ChunkNum;
949
950        use super::*;
951        use crate::protocol::{ChunkRangesExt, GetManyRequest};
952
953        #[test]
954        fn chunk_ranges_ext() {
955            let ranges = ChunkRanges::bytes(1..2)
956                | ChunkRanges::chunks(100..=200)
957                | ChunkRanges::offset(1024 * 10)
958                | ChunkRanges::chunk(1024)
959                | ChunkRanges::last_chunk();
960            assert_eq!(
961                ranges,
962                ChunkRanges::from(ChunkNum(0)..ChunkNum(1)) // byte range 1..2
963                    | ChunkRanges::from(ChunkNum(10)..ChunkNum(11)) // chunk at byte offset 1024*10
964                    | ChunkRanges::from(ChunkNum(100)..ChunkNum(201)) // chunk range 100..=200
965                    | ChunkRanges::from(ChunkNum(1024)..ChunkNum(1025)) // chunk 1024
966                    | ChunkRanges::last_chunk() // last chunk
967            );
968        }
969
970        #[test]
971        fn get_request_builder() {
972            let hash = [0; 32];
973            let request = GetRequest::builder()
974                .root(ChunkRanges::all())
975                .next(ChunkRanges::all())
976                .next(ChunkRanges::bytes(0..100))
977                .build(hash);
978            assert_eq!(request.hash.as_bytes(), &hash);
979            assert_eq!(
980                request.ranges,
981                ChunkRangesSeq::from_ranges([
982                    ChunkRanges::all(),
983                    ChunkRanges::all(),
984                    ChunkRanges::from(..ChunkNum(1)),
985                ])
986            );
987
988            let request = GetRequest::builder()
989                .root(ChunkRanges::all())
990                .child(2, ChunkRanges::bytes(0..100))
991                .build(hash);
992            assert_eq!(request.hash.as_bytes(), &hash);
993            assert_eq!(
994                request.ranges,
995                ChunkRangesSeq::from_ranges([
996                    ChunkRanges::all(),               // root
997                    ChunkRanges::empty(),             // child 0
998                    ChunkRanges::empty(),             // child 1
999                    ChunkRanges::from(..ChunkNum(1))  // child 2,
1000                ])
1001            );
1002
1003            let request = GetRequest::builder()
1004                .root(ChunkRanges::all())
1005                .next(ChunkRanges::bytes(0..1024) | ChunkRanges::last_chunk())
1006                .build_open(hash);
1007            assert_eq!(request.hash.as_bytes(), &[0; 32]);
1008            assert_eq!(
1009                request.ranges,
1010                ChunkRangesSeq::from_ranges_infinite([
1011                    ChunkRanges::all(),
1012                    ChunkRanges::from(..ChunkNum(1)) | ChunkRanges::last_chunk(),
1013                ])
1014            );
1015        }
1016
1017        #[test]
1018        fn get_many_request_builder() {
1019            let hash1 = [0; 32];
1020            let hash2 = [1; 32];
1021            let hash3 = [2; 32];
1022            let request = GetManyRequest::builder()
1023                .hash(hash1, ChunkRanges::all())
1024                .hash(hash2, ChunkRanges::empty()) // will be ignored!
1025                .hash(hash3, ChunkRanges::bytes(0..100))
1026                .build();
1027            assert_eq!(
1028                request.hashes,
1029                vec![Hash::from([0; 32]), Hash::from([2; 32])]
1030            );
1031            assert_eq!(
1032                request.ranges,
1033                ChunkRangesSeq::from_ranges([
1034                    ChunkRanges::all(),               // hash 0
1035                    ChunkRanges::from(..ChunkNum(1)), // hash 2
1036                ])
1037            );
1038        }
1039    }
1040}
1041
1042#[cfg(test)]
1043mod tests {
1044    use iroh_test::{assert_eq_hex, hexdump::parse_hexdump};
1045    use postcard::experimental::max_size::MaxSize;
1046
1047    use super::{GetRequest, Request, RequestType};
1048    use crate::Hash;
1049
1050    #[test]
1051    fn request_wire_format() {
1052        let hash: Hash = [0xda; 32].into();
1053        let cases = [
1054            (
1055                Request::from(GetRequest::blob(hash)),
1056                r"
1057                    00 # enum variant for GetRequest
1058                    dadadadadadadadadadadadadadadadadadadadadadadadadadadadadadadada # the hash
1059                    020001000100 # the ChunkRangesSeq
1060            ",
1061            ),
1062            (
1063                Request::from(GetRequest::all(hash)),
1064                r"
1065                    00 # enum variant for GetRequest
1066                    dadadadadadadadadadadadadadadadadadadadadadadadadadadadadadadada # the hash
1067                    01000100 # the ChunkRangesSeq
1068            ",
1069            ),
1070        ];
1071        for (case, expected_hex) in cases {
1072            let expected = parse_hexdump(expected_hex).unwrap();
1073            let bytes = postcard::to_stdvec(&case).unwrap();
1074            assert_eq_hex!(bytes, expected);
1075        }
1076    }
1077
1078    #[test]
1079    fn request_type_size() {
1080        assert_eq!(RequestType::POSTCARD_MAX_SIZE, 1);
1081    }
1082}