iroh_blobs/protocol.rs
//! Protocol for transferring content-addressed blobs over [`iroh`] p2p QUIC connections.
//!
//! # Participants
//!
//! The protocol is a request/response protocol with two parties, a *provider* that
//! serves blobs and a *getter* that requests blobs.
//!
//! # Goals
//!
//! - Be paranoid about data integrity.
//!
//! Data integrity is considered more important than performance. Data will be validated both on
//! the provider and getter side. A well-behaved provider will never send invalid data. Responses
//! to range requests contain sufficient information to validate the data.
//!
//! Note: Validation using blake3 is extremely fast, so in almost all scenarios the validation
//! will not be the bottleneck even if we validate both on the provider and getter side.
//!
//! - Do not limit the size of blobs or collections.
//!
//! Blobs can be of arbitrary size, up to terabytes. Likewise, collections can contain an
//! arbitrary number of links. A well-behaved implementation will not require the entire blob or
//! collection to be in memory at once.
//!
//! - Be efficient when transferring large blobs, including range requests.
//!
//! It is possible to request entire blobs or ranges of blobs, where the minimum granularity is a
//! chunk group of 16 KiB or 16 blake3 chunks. The worst case overhead when doing range requests
//! is about two chunk groups per range.
//!
//! - Be efficient when transferring multiple tiny blobs.
//!
//! For tiny blobs the overhead of sending the blob hashes and the round-trip time for each blob
//! would be prohibitive.
//!
//! To avoid roundtrips, the protocol allows grouping multiple blobs into *collections*.
//! The semantic meaning of a collection is up to the application. For the purpose
//! of this protocol, a collection is just a grouping of related blobs.
//!
//! # Non-goals
//!
//! - Do not attempt to be generic in terms of the used hash function.
//!
//! The protocol makes extensive use of the [blake3](https://crates.io/crates/blake3) hash
//! function and its special properties such as blake3 verified streaming.
//!
//! - Do not support graph traversal.
//!
//! The protocol only supports collections that directly contain blobs. If you have deeply nested
//! graph data, you will need to either do multiple requests or flatten the graph into a single
//! temporary collection.
//!
//! - Do not support discovery.
//!
//! The protocol does not yet have a discovery mechanism for asking the provider what ranges are
//! available for a given blob. Currently you have to have some out-of-band knowledge about which
//! node has data for a given hash, or you can just try to retrieve the data and see if it is
//! available.
//!
//! A discovery protocol is planned for the future, though.
//!
//! # Requests
//!
//! ## Getter defined requests
//!
//! In this case the getter knows the hash of the blob it wants to retrieve and
//! whether it wants to retrieve a single blob or a collection.
//!
//! The getter needs to define exactly what it wants to retrieve and send the
//! request to the provider.
//!
//! The provider will then respond with the bao encoded bytes for the requested
//! data and then close the stream. If some data is not available or invalid, it
//! will close the stream immediately.
//!
//! ## Provider defined requests
//!
//! In this case the getter sends a blob to the provider. This blob can contain
//! some kind of query. The exact details of the query are up to the application.
//!
//! The provider evaluates the query and responds with a serialized request in
//! the same format as the getter defined requests, followed by the bao encoded
//! data. From then on the protocol is the same as for getter defined requests.
//!
//! ## Specifying the required data
//!
//! A [`GetRequest`] contains a hash and a specification of what data related to
//! that hash is required. The specification uses a [`ChunkRangesSeq`], which
//! has a compact representation on the wire but is otherwise identical to a
//! sequence of sets of ranges.
//!
//! In the following, we describe how the [`GetRequest`] is to be created for
//! different common scenarios.
//!
//! Under the hood, this uses the [`ChunkRangesSeq`] type, but the most
//! convenient way to create a [`GetRequest`] is to use the builder API.
//!
//! Ranges are always given in terms of 1024 byte blake3 chunks, *not* in terms
//! of bytes or chunk groups. The reason for this is that chunks are the fundamental
//! unit of hashing in BLAKE3. Addressing anything smaller than a chunk is not
//! possible, and combining multiple chunks is merely an optimization to reduce
//! metadata overhead.
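//!
//! As a sketch of how bytes map to chunks (using the [`ChunkRangesExt`]
//! helpers described below): a byte range is rounded up to the chunks that
//! contain it.
//!
//! ```rust
//! # use iroh_blobs::protocol::{ChunkRanges, ChunkRangesExt};
//! // bytes 0..1000 lie entirely within the first 1024 byte chunk,
//! // so the byte range rounds up to exactly one chunk
//! assert_eq!(ChunkRanges::bytes(..1000), ChunkRanges::chunks(..1));
//! ```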
//!
//! ### Individual blobs
//!
//! In the easiest case, the getter just wants to retrieve a single blob. In this
//! case, the getter specifies a [`ChunkRangesSeq`] that contains a single element.
//! This element is the set of all chunks, indicating that we
//! want the entire blob, no matter how many chunks it has.
//!
//! Since this is a very common case, there is a convenience method
//! [`GetRequest::blob`] that only requires the hash of the blob.
//!
//! ```rust
//! # use iroh_blobs::protocol::GetRequest;
//! # let hash: iroh_blobs::Hash = [0; 32].into();
//! let request = GetRequest::blob(hash);
//! ```
//!
//! ### Ranges of blobs
//!
//! In this case, we have a (possibly large) blob and we want to retrieve only
//! some ranges of chunks. This is useful in cases similar to HTTP range requests.
//!
//! We still need just a single element in the [`ChunkRangesSeq`], since we are
//! still only interested in a single blob. However, this element contains all
//! the chunk ranges we want to retrieve.
//!
//! For example, if we want to retrieve chunks 0-10 of a blob, we would
//! create a [`ChunkRangesSeq`] like this:
//!
//! ```rust
//! # use iroh_blobs::protocol::{GetRequest, ChunkRanges, ChunkRangesExt};
//! # let hash: iroh_blobs::Hash = [0; 32].into();
//! let request = GetRequest::builder()
//!     .root(ChunkRanges::chunks(..10))
//!     .build(hash);
//! ```
//!
//! While not that common, it is also possible to request multiple ranges of a
//! single blob. For example, if we want to retrieve chunks `0-10` and `100-110`
//! of a large file, we would create a [`GetRequest`] like this:
//!
//! ```rust
//! # use iroh_blobs::protocol::{GetRequest, ChunkRanges, ChunkRangesExt, ChunkRangesSeq};
//! # let hash: iroh_blobs::Hash = [0; 32].into();
//! let request = GetRequest::builder()
//!     .root(ChunkRanges::chunks(..10) | ChunkRanges::chunks(100..110))
//!     .build(hash);
//! ```
//!
//! This is all great, but in most cases we are not interested in chunks but
//! in bytes. The [`ChunkRangesExt`] trait provides constructors that accept
//! byte ranges instead of chunk ranges. These will be rounded up to the
//! nearest chunk.
//!
//! ```rust
//! # use iroh_blobs::protocol::{GetRequest, ChunkRanges, ChunkRangesExt, ChunkRangesSeq};
//! # let hash: iroh_blobs::Hash = [0; 32].into();
//! let request = GetRequest::builder()
//!     .root(ChunkRanges::bytes(..1000) | ChunkRanges::bytes(10000..11000))
//!     .build(hash);
//! ```
//!
//! There are also methods to request a single chunk or a single byte offset,
//! as well as a special constructor for the last chunk of a blob.
//!
//! ```rust
//! # use iroh_blobs::protocol::{GetRequest, ChunkRanges, ChunkRangesExt, ChunkRangesSeq};
//! # let hash: iroh_blobs::Hash = [0; 32].into();
//! let request = GetRequest::builder()
//!     .root(ChunkRanges::offset(1) | ChunkRanges::last_chunk())
//!     .build(hash);
//! ```
//!
//! To specify chunk ranges, we use the [`ChunkRanges`] type alias.
//! This is actually the [`RangeSet`] type from the
//! [range_collections](https://crates.io/crates/range_collections) crate. This
//! type supports efficient boolean operations on sets of non-overlapping ranges.
//!
//! The [`RangeSet2`] type is a type alias for [`RangeSet`] that can store up to
//! 2 boundaries without allocating. This is sufficient for most use cases.
//!
//! [`RangeSet`]: range_collections::range_set::RangeSet
//! [`RangeSet2`]: range_collections::range_set::RangeSet2
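//!
//! The union operator used in the examples above is one of these boolean
//! operations. A small sketch (the full set of operators is documented in the
//! range_collections crate):
//!
//! ```rust
//! # use iroh_blobs::protocol::{ChunkRanges, ChunkRangesExt};
//! // overlapping ranges are normalized into non-overlapping ones
//! let union = ChunkRanges::chunks(..100) | ChunkRanges::chunks(50..150);
//! assert_eq!(union, ChunkRanges::chunks(..150));
//! ```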
//!
//! ### Hash sequences
//!
//! In this case the provider has a hash sequence that refers to multiple blobs.
//! We want to retrieve all blobs in the hash sequence.
//!
//! When used for hash sequences, the first element of a [`ChunkRangesSeq`] refers
//! to the hash seq itself, and all subsequent elements refer to the blobs
//! in the hash seq. When a [`ChunkRangesSeq`] specifies ranges for more than
//! one blob, the provider will interpret this as a request for a hash seq.
//!
//! One thing to note is that we might not yet know how many blobs are in the
//! hash sequence. Therefore, it is not possible to download an entire hash seq
//! by just specifying [`ChunkRanges::all()`] for all children.
//!
//! Instead, [`ChunkRangesSeq`] allows defining infinite sequences of range sets.
//! The [`ChunkRangesSeq::all()`] method returns a [`ChunkRangesSeq`] that, when iterated
//! over, will yield [`ChunkRanges::all()`] forever.
//!
//! So a get request to download a hash sequence blob and all its children
//! would look like this:
//!
//! ```rust
//! # use iroh_blobs::protocol::{ChunkRanges, ChunkRangesExt, GetRequest};
//! # let hash: iroh_blobs::Hash = [0; 32].into();
//! let request = GetRequest::builder()
//!     .root(ChunkRanges::all())
//!     .build_open(hash); // repeats the last range forever
//! ```
//!
//! Downloading an entire hash seq is also a very common case, so there is a
//! convenience method [`GetRequest::all`] that only requires the hash of the
//! hash sequence blob.
//!
//! ```rust
//! # use iroh_blobs::protocol::{ChunkRanges, ChunkRangesExt, GetRequest};
//! # let hash: iroh_blobs::Hash = [0; 32].into();
//! let request = GetRequest::all(hash);
//! ```
//!
//! ### Parts of hash sequences
//!
//! The most complex common case is when we have retrieved a hash seq and
//! its children, but were interrupted before we could retrieve all children.
//!
//! In this case we need to specify the hash seq we want to retrieve, but
//! exclude the children and parts of children that we already have.
//!
//! For example, suppose we have a hash seq with 3 children, and we already have
//! the first child and the first 1000000 chunks of the second child.
//!
//! We would create a [`GetRequest`] like this:
//!
//! ```rust
//! # use iroh_blobs::protocol::{GetRequest, ChunkRanges, ChunkRangesExt};
//! # let hash: iroh_blobs::Hash = [0; 32].into();
//! let request = GetRequest::builder()
//!     .child(1, ChunkRanges::chunks(1000000..)) // skip the first child, get the rest of the second
//!     .next(ChunkRanges::all()) // get the third and any subsequent children completely
//!     .build_open(hash);
//! ```
//!
//! ### Requesting chunks for each child
//!
//! The [`ChunkRangesSeq`] allows some scenarios that are not covered above. E.g. you
//! might want to request a hash seq and the first chunk of each child blob to
//! do something like mime type detection.
//!
//! You do not know how many children the collection has, so you need to use
//! an infinite sequence.
//!
//! ```rust
//! # use iroh_blobs::protocol::{GetRequest, ChunkRanges, ChunkRangesExt, ChunkRangesSeq};
//! # let hash: iroh_blobs::Hash = [0; 32].into();
//! let request = GetRequest::builder()
//!     .root(ChunkRanges::all())
//!     .next(ChunkRanges::chunk(0)) // the first chunk of each child
//!     .build_open(hash);
//! ```
//!
//! ### Requesting a single child
//!
//! It is of course possible to request a single child of a collection. E.g.
//! the following would download the second child of a collection:
//!
//! ```rust
//! # use iroh_blobs::protocol::{GetRequest, ChunkRanges, ChunkRangesExt};
//! # let hash: iroh_blobs::Hash = [0; 32].into();
//! let request = GetRequest::builder()
//!     .child(1, ChunkRanges::all()) // we need the second child completely
//!     .build(hash);
//! ```
//!
//! However, if you already have the collection, you might as well locally
//! look up the hash of the child and request it directly.
//!
//! ```rust
//! # use iroh_blobs::protocol::{GetRequest, ChunkRanges, ChunkRangesSeq};
//! # let child_hash: iroh_blobs::Hash = [0; 32].into();
//! let request = GetRequest::blob(child_hash);
//! ```
//!
//! ### Why ChunkRanges and ChunkRangesSeq?
//!
//! You might wonder why we have [`ChunkRangesSeq`], when a simple
//! sequence of [`ChunkRanges`] might also do.
//!
//! The [`ChunkRangesSeq`] type exists to provide an efficient
//! representation of the request on the wire. In the wire encoding of [`ChunkRangesSeq`],
//! [`ChunkRanges`] are encoded as alternating intervals of selected and non-selected chunks.
//! This yields smaller numbers, which take fewer bytes on the wire with the
//! [postcard](https://crates.io/crates/postcard) encoding format and its variable
//! length integers.
//!
//! Likewise, the [`ChunkRangesSeq`] type
//! uses run-length encoding to remove repeating elements. It also allows infinite
//! sequences of [`ChunkRanges`] to be encoded, unlike a simple sequence of
//! [`ChunkRanges`].
//!
//! [`ChunkRangesSeq`] should be efficient even in the case of very fragmented availability
//! of chunks, like a download from multiple providers that was frequently interrupted.
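//!
//! To illustrate the alternating-interval idea (a simplified sketch, not the
//! actual wire format): a set of selected ranges can be stored as the deltas
//! between successive boundaries, alternating between selected and
//! non-selected chunks, which keeps the encoded integers small.
//!
//! ```rust
//! // the set {0..10, 100..110} has the boundaries [0, 10, 100, 110];
//! // storing the deltas between them yields small numbers
//! let boundaries = [0u64, 10, 100, 110];
//! let deltas: Vec<u64> = boundaries.windows(2).map(|w| w[1] - w[0]).collect();
//! assert_eq!(deltas, [10, 90, 10]);
//! ```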
//!
//! # Responses
//!
//! The response stream contains the bao encoded bytes for the requested data.
//! The data will be sent in the order in which it was requested, so ascending
//! chunks for each blob, and blobs in the order in which they appear in the
//! hash seq.
//!
//! For details on the bao encoding, see the [bao specification](https://github.com/oconnor663/bao/blob/master/docs/spec.md)
//! and the [bao-tree](https://crates.io/crates/bao-tree) crate. The bao-tree crate
//! is identical to the bao crate, except that it allows combining multiple BLAKE3
//! chunks to chunk groups for efficiency.
//!
//! For a complete response, the chunks are guaranteed to completely cover the
//! requested ranges.
//!
//! Reasons for not retrieving a complete response are twofold:
//!
//! - the connection to the provider was interrupted, or the provider encountered
//!   an internal error. In this case the provider will close the entire quinn connection.
//!
//! - the provider does not have the requested data, or discovered on send that the
//!   requested data is not valid. In this case the provider will close just the stream
//!   used to send the response. The exact location of the missing data can be
//!   retrieved from the error.
//!
//! # Requesting multiple unrelated blobs
//!
//! Let's say you don't have a hash sequence on the provider side, but you
//! nevertheless want to request multiple unrelated blobs in a single request.
//!
//! For this, there is the [`GetManyRequest`] type, which also comes with a
//! builder API.
//!
//! ```rust
//! # use iroh_blobs::protocol::{GetManyRequest, ChunkRanges, ChunkRangesExt};
//! # let hash1: iroh_blobs::Hash = [0; 32].into();
//! # let hash2: iroh_blobs::Hash = [1; 32].into();
//! GetManyRequest::builder()
//!     .hash(hash1, ChunkRanges::all())
//!     .hash(hash2, ChunkRanges::all())
//!     .build();
//! ```
//!
//! If you accidentally or intentionally request ranges for the same hash
//! multiple times, they will be merged into a single [`ChunkRanges`].
//!
//! ```rust
//! # use iroh_blobs::protocol::{GetManyRequest, ChunkRanges, ChunkRangesExt};
//! # let hash1: iroh_blobs::Hash = [0; 32].into();
//! # let hash2: iroh_blobs::Hash = [1; 32].into();
//! GetManyRequest::builder()
//!     .hash(hash1, ChunkRanges::chunk(1))
//!     .hash(hash2, ChunkRanges::all())
//!     .hash(hash1, ChunkRanges::last_chunk())
//!     .build();
//! ```
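//!
//! When you want each blob completely, the `FromIterator` impl in this module
//! offers a shortcut that builds the same kind of request from a list of hashes:
//!
//! ```rust
//! # use iroh_blobs::protocol::GetManyRequest;
//! # let hash1: iroh_blobs::Hash = [0; 32].into();
//! # let hash2: iroh_blobs::Hash = [1; 32].into();
//! // duplicate hashes are deduplicated, and all blobs are requested completely
//! let request: GetManyRequest = [hash1, hash2, hash1].into_iter().collect();
//! assert_eq!(request.hashes.len(), 2);
//! ```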
//!
//! This is mostly useful for requesting multiple tiny blobs in a single request.
//! For large or even medium-sized blobs, multiple requests are not expensive.
//! Multiple requests just create multiple streams on the same connection,
//! which is *very* cheap in QUIC.
//!
//! In case nodes are permanently exchanging data, it is somewhat valuable to
//! keep a connection open and reuse it for multiple requests. However, creating
//! a new connection is also very cheap, so you would only do this to optimize
//! a large existing system that has demonstrated performance issues.
//!
//! If in doubt, just use multiple requests and multiple connections.
use std::{
    io,
    ops::{Bound, RangeBounds},
};

use bao_tree::{io::round_up_to_chunks, ChunkNum};
use builder::GetRequestBuilder;
use derive_more::From;
use iroh::endpoint::VarInt;
use irpc::util::AsyncReadVarintExt;
use postcard::experimental::max_size::MaxSize;
use range_collections::{range_set::RangeSetEntry, RangeSet2};
use serde::{Deserialize, Serialize};
mod range_spec;
pub use bao_tree::ChunkRanges;
pub use range_spec::{ChunkRangesSeq, NonEmptyRequestRangeSpecIter, RangeSpec};
use snafu::{GenerateImplicitData, Snafu};
use tokio::io::AsyncReadExt;

use crate::{api::blobs::Bitfield, provider::RecvStreamExt, BlobFormat, Hash, HashAndFormat};

/// Maximum message size is limited to 1 MiB for now.
pub const MAX_MESSAGE_SIZE: usize = 1024 * 1024;

/// Error code for a permission error
pub const ERR_PERMISSION: VarInt = VarInt::from_u32(1u32);
/// Error code for when a request is aborted due to a rate limit
pub const ERR_LIMIT: VarInt = VarInt::from_u32(2u32);
/// Error code for when a request is aborted due to an internal error
pub const ERR_INTERNAL: VarInt = VarInt::from_u32(3u32);

/// The ALPN used with quic for the iroh blobs protocol.
pub const ALPN: &[u8] = b"/iroh-bytes/4";

#[derive(Deserialize, Serialize, Debug, PartialEq, Eq, Clone, From)]
/// A request to the provider
pub enum Request {
    /// A get request for a blob or collection
    Get(GetRequest),
    /// A request to observe the bitfield of a raw blob
    Observe(ObserveRequest),
    Slot2,
    Slot3,
    Slot4,
    Slot5,
    Slot6,
    Slot7,
    /// The inverse of a get request - push data to the provider
    ///
    /// Note that providers will in many cases reject this request, e.g. if
    /// they don't have write access to the store or don't want to ingest
    /// unknown data.
    Push(PushRequest),
    /// Get multiple blobs in a single request, from a single provider
    ///
    /// This is identical to a [`GetRequest`] for a [`crate::hashseq::HashSeq`], but the provider
    /// does not need to have the hash seq.
    GetMany(GetManyRequest),
}

/// This must contain the request types in the same order as the full requests
#[derive(Deserialize, Serialize, Debug, PartialEq, Eq, Clone, Copy, MaxSize)]
pub enum RequestType {
    Get,
    Observe,
    Slot2,
    Slot3,
    Slot4,
    Slot5,
    Slot6,
    Slot7,
    Push,
    GetMany,
}

impl Request {
    pub async fn read_async(reader: &mut iroh::endpoint::RecvStream) -> io::Result<(Self, usize)> {
        let request_type = reader.read_u8().await?;
        let request_type: RequestType = postcard::from_bytes(std::slice::from_ref(&request_type))
            .map_err(|_| {
                io::Error::new(
                    io::ErrorKind::InvalidData,
                    "failed to deserialize request type",
                )
            })?;
        Ok(match request_type {
            RequestType::Get => {
                let (r, size) = reader
                    .read_to_end_as::<GetRequest>(MAX_MESSAGE_SIZE)
                    .await?;
                (r.into(), size)
            }
            RequestType::GetMany => {
                let (r, size) = reader
                    .read_to_end_as::<GetManyRequest>(MAX_MESSAGE_SIZE)
                    .await?;
                (r.into(), size)
            }
            RequestType::Observe => {
                let (r, size) = reader
                    .read_to_end_as::<ObserveRequest>(MAX_MESSAGE_SIZE)
                    .await?;
                (r.into(), size)
            }
            RequestType::Push => {
                let r = reader
                    .read_length_prefixed::<PushRequest>(MAX_MESSAGE_SIZE)
                    .await?;
                let size = postcard::experimental::serialized_size(&r).unwrap();
                (r.into(), size)
            }
            _ => {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    "unsupported request type",
                ));
            }
        })
    }
}

/// A get request
#[derive(Deserialize, Serialize, Debug, PartialEq, Eq, Clone, Hash)]
pub struct GetRequest {
    /// blake3 hash
    pub hash: Hash,
    /// The ranges of data to request
    ///
    /// The first element is the parent, all subsequent elements are children.
    pub ranges: ChunkRangesSeq,
}

impl From<HashAndFormat> for GetRequest {
    fn from(value: HashAndFormat) -> Self {
        match value.format {
            BlobFormat::Raw => Self::blob(value.hash),
            BlobFormat::HashSeq => Self::all(value.hash),
        }
    }
}

impl GetRequest {
    /// Create a [`GetRequestBuilder`].
    pub fn builder() -> GetRequestBuilder {
        GetRequestBuilder::default()
    }

    /// The content of this request as a [`HashAndFormat`].
    pub fn content(&self) -> HashAndFormat {
        HashAndFormat {
            hash: self.hash,
            format: if self.ranges.is_blob() {
                BlobFormat::Raw
            } else {
                BlobFormat::HashSeq
            },
        }
    }

    /// Request a blob or collection with specified ranges
    pub fn new(hash: Hash, ranges: ChunkRangesSeq) -> Self {
        Self { hash, ranges }
    }

    /// Request a collection and all its children
    pub fn all(hash: impl Into<Hash>) -> Self {
        Self {
            hash: hash.into(),
            ranges: ChunkRangesSeq::all(),
        }
    }

    /// Request just a single blob
    pub fn blob(hash: impl Into<Hash>) -> Self {
        Self {
            hash: hash.into(),
            ranges: ChunkRangesSeq::from_ranges([ChunkRanges::all()]),
        }
    }

    /// Request ranges from a single blob
    pub fn blob_ranges(hash: Hash, ranges: ChunkRanges) -> Self {
        Self {
            hash,
            ranges: ChunkRangesSeq::from_ranges([ranges]),
        }
    }
}

/// A push request contains a description of what to push, but will be followed
/// by the data to push.
#[derive(
    Deserialize, Serialize, Debug, PartialEq, Eq, Clone, derive_more::From, derive_more::Deref,
)]
pub struct PushRequest(GetRequest);

impl PushRequest {
    pub fn new(hash: Hash, ranges: ChunkRangesSeq) -> Self {
        Self(GetRequest::new(hash, ranges))
    }
}

/// A GetMany request is a request to get multiple blobs via a single request.
///
/// It is identical to a [`GetRequest`] for a HashSeq, but the HashSeq is provided
/// by the requester.
#[derive(Deserialize, Serialize, Debug, PartialEq, Eq, Clone)]
pub struct GetManyRequest {
    /// The hashes of the blobs to get
    pub hashes: Vec<Hash>,
    /// The ranges of data to request
    ///
    /// There is no range request for the parent, since we just sent the hashes
    /// and therefore have the parent already.
    pub ranges: ChunkRangesSeq,
}

impl<I: Into<Hash>> FromIterator<I> for GetManyRequest {
    fn from_iter<T: IntoIterator<Item = I>>(iter: T) -> Self {
        let mut res = iter.into_iter().map(Into::into).collect::<Vec<Hash>>();
        res.sort();
        res.dedup();
        let n = res.len() as u64;
        Self {
            hashes: res,
            ranges: ChunkRangesSeq(smallvec::smallvec![
                (0, ChunkRanges::all()),
                (n, ChunkRanges::empty())
            ]),
        }
    }
}

impl GetManyRequest {
    pub fn new(hashes: Vec<Hash>, ranges: ChunkRangesSeq) -> Self {
        Self { hashes, ranges }
    }

    pub fn builder() -> builder::GetManyRequestBuilder {
        builder::GetManyRequestBuilder::default()
    }
}

/// A request to observe a raw blob bitfield.
#[derive(Deserialize, Serialize, Debug, PartialEq, Eq, Clone, Hash)]
pub struct ObserveRequest {
    /// blake3 hash
    pub hash: Hash,
    /// ranges to observe.
    pub ranges: RangeSpec,
}

impl ObserveRequest {
    pub fn new(hash: Hash) -> Self {
        Self {
            hash,
            ranges: RangeSpec::all(),
        }
    }
}

#[derive(Deserialize, Serialize, Debug, PartialEq, Eq)]
pub struct ObserveItem {
    pub size: u64,
    pub ranges: ChunkRanges,
}

impl From<&Bitfield> for ObserveItem {
    fn from(value: &Bitfield) -> Self {
        Self {
            size: value.size,
            ranges: value.ranges.clone(),
        }
    }
}

impl From<&ObserveItem> for Bitfield {
    fn from(value: &ObserveItem) -> Self {
        Self {
            size: value.size,
            ranges: value.ranges.clone(),
        }
    }
}

/// Reasons to close connections or stop streams.
///
/// A QUIC **connection** can be *closed* and a **stream** can request the other side to
/// *stop* sending data. Both closing and stopping have an associated `error_code`; closing
/// also adds a `reason` as some arbitrary bytes.
///
/// This enum exists so we have a single namespace for the `error_code`s used.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
#[repr(u16)]
pub enum Closed {
    /// The [`RecvStream`] was dropped.
    ///
    /// Used implicitly when a [`RecvStream`] is dropped without an explicit call to
    /// [`RecvStream::stop`]. We don't use this explicitly, but it is here to
    /// document what `0` means.
    ///
    /// [`RecvStream`]: iroh::endpoint::RecvStream
    /// [`RecvStream::stop`]: iroh::endpoint::RecvStream::stop
    StreamDropped = 0,
    /// The provider is terminating.
    ///
    /// When a provider terminates, all connections and associated streams are closed.
    ProviderTerminating = 1,
    /// The provider has received the request.
    ///
    /// Only a single request is allowed on a stream. If more data is received after this, a
    /// provider may send this error code in a STOP_STREAM frame.
    RequestReceived = 2,
}

impl Closed {
    /// The close reason as bytes. This is a valid UTF-8 string describing the reason.
    pub fn reason(&self) -> &'static [u8] {
        match self {
            Closed::StreamDropped => b"stream dropped",
            Closed::ProviderTerminating => b"provider terminating",
            Closed::RequestReceived => b"request received",
        }
    }
}

impl From<Closed> for VarInt {
    fn from(source: Closed) -> Self {
        VarInt::from(source as u16)
    }
}

/// Unknown error_code that cannot be converted into [`Closed`].
#[derive(Debug, Snafu)]
#[snafu(display("Unknown error_code: {code}"))]
pub struct UnknownErrorCode {
    code: u64,
    backtrace: Option<snafu::Backtrace>,
}

impl UnknownErrorCode {
    pub(crate) fn new(code: u64) -> Self {
        Self {
            code,
            backtrace: GenerateImplicitData::generate(),
        }
    }
}

impl TryFrom<VarInt> for Closed {
    type Error = UnknownErrorCode;

    fn try_from(value: VarInt) -> std::result::Result<Self, Self::Error> {
        match value.into_inner() {
            0 => Ok(Self::StreamDropped),
            1 => Ok(Self::ProviderTerminating),
            2 => Ok(Self::RequestReceived),
            val => Err(UnknownErrorCode::new(val)),
        }
    }
}

/// Extension methods for conveniently constructing [`ChunkRanges`].
pub trait ChunkRangesExt {
    fn last_chunk() -> Self;
    fn chunk(offset: u64) -> Self;
    fn bytes(ranges: impl RangeBounds<u64>) -> Self;
    fn chunks(ranges: impl RangeBounds<u64>) -> Self;
    fn offset(offset: u64) -> Self;
}

impl ChunkRangesExt for ChunkRanges {
    /// Create a chunk range that contains just the last chunk, whatever the size of the blob.
    fn last_chunk() -> Self {
        ChunkRanges::from(ChunkNum(u64::MAX)..)
    }

    /// Create a chunk range that contains a single chunk.
    fn chunk(offset: u64) -> Self {
        ChunkRanges::from(ChunkNum(offset)..ChunkNum(offset + 1))
    }

    /// Create a range of chunks that contains the given byte ranges.
    /// The byte ranges are rounded up to the nearest chunk size.
    fn bytes(ranges: impl RangeBounds<u64>) -> Self {
        round_up_to_chunks(&bounds_from_range(ranges, |v| v))
    }

    /// Create a range of chunks from u64 chunk bounds.
    ///
    /// This is equivalent to, but more convenient than, using the ChunkNum newtype.
    fn chunks(ranges: impl RangeBounds<u64>) -> Self {
        bounds_from_range(ranges, ChunkNum)
    }

    /// Create a chunk range that contains a single byte offset.
    fn offset(offset: u64) -> Self {
        Self::bytes(offset..offset + 1)
    }
}

// todo: move to range_collections
pub(crate) fn bounds_from_range<R, T, F>(range: R, f: F) -> RangeSet2<T>
where
    R: RangeBounds<u64>,
    T: RangeSetEntry,
    F: Fn(u64) -> T,
{
    let from = match range.start_bound() {
        Bound::Included(start) => Some(*start),
        Bound::Excluded(start) => {
            let Some(start) = start.checked_add(1) else {
                return RangeSet2::empty();
            };
            Some(start)
        }
        Bound::Unbounded => None,
    };
    let to = match range.end_bound() {
        Bound::Included(end) => end.checked_add(1),
        Bound::Excluded(end) => Some(*end),
        Bound::Unbounded => None,
    };
    match (from, to) {
        (Some(from), Some(to)) => RangeSet2::from(f(from)..f(to)),
        (Some(from), None) => RangeSet2::from(f(from)..),
        (None, Some(to)) => RangeSet2::from(..f(to)),
        (None, None) => RangeSet2::all(),
    }
}
801
802pub mod builder {
803 use std::collections::BTreeMap;
804
805 use bao_tree::ChunkRanges;
806
807 use super::ChunkRangesSeq;
808 use crate::{
809 protocol::{GetManyRequest, GetRequest},
810 Hash,
811 };
812
813 #[derive(Debug, Clone, Default)]
814 pub struct ChunkRangesSeqBuilder {
815 ranges: BTreeMap<u64, ChunkRanges>,
816 }
817
818 #[derive(Debug, Clone, Default)]
819 pub struct GetRequestBuilder {
820 builder: ChunkRangesSeqBuilder,
821 }
822
823 impl GetRequestBuilder {
824 /// Add a range to the request.
825 pub fn offset(mut self, offset: u64, ranges: impl Into<ChunkRanges>) -> Self {
826 self.builder = self.builder.offset(offset, ranges);
827 self
828 }
829
830 /// Add a range to the request.
831 pub fn child(mut self, child: u64, ranges: impl Into<ChunkRanges>) -> Self {
832 self.builder = self.builder.offset(child + 1, ranges);
833 self
834 }
835
836 /// Specify ranges for the root blob (the HashSeq)
837 pub fn root(mut self, ranges: impl Into<ChunkRanges>) -> Self {
838 self.builder = self.builder.offset(0, ranges);
839 self
840 }
841
842 /// Specify ranges for the next offset.
843 pub fn next(mut self, ranges: impl Into<ChunkRanges>) -> Self {
844 self.builder = self.builder.next(ranges);
845 self
846 }
847
848 /// Build a get request for the given hash, with the ranges specified in the builder.
849 pub fn build(self, hash: impl Into<Hash>) -> GetRequest {
850 let ranges = self.builder.build();
851 GetRequest::new(hash.into(), ranges)
852 }
853
854 /// Build a get request for the given hash, with the ranges specified in the builder
855 /// and the last non-empty range repeating indefinitely.
856 pub fn build_open(self, hash: impl Into<Hash>) -> GetRequest {
857 let ranges = self.builder.build_open();
858 GetRequest::new(hash.into(), ranges)
859 }
860 }

    impl ChunkRangesSeqBuilder {
        /// Add ranges at the given offset.
        pub fn offset(self, offset: u64, ranges: impl Into<ChunkRanges>) -> Self {
            self.at_offset(offset, ranges.into())
        }

        /// Specify ranges for the next offset.
        pub fn next(self, ranges: impl Into<ChunkRanges>) -> Self {
            let offset = self.next_offset_value();
            self.at_offset(offset, ranges.into())
        }

        /// Build a [`ChunkRangesSeq`] with the ranges specified in the builder.
        pub fn build(self) -> ChunkRangesSeq {
            ChunkRangesSeq::from_ranges(self.build0())
        }

        /// Build a [`ChunkRangesSeq`] with the ranges specified in the builder
        /// and the last range repeating indefinitely.
        pub fn build_open(self) -> ChunkRangesSeq {
            ChunkRangesSeq::from_ranges_infinite(self.build0())
        }

        /// Add ranges at the given offset, merging them with any ranges already
        /// present there.
        fn at_offset(mut self, offset: u64, ranges: ChunkRanges) -> Self {
            self.ranges
                .entry(offset)
                .and_modify(|v| *v |= ranges.clone())
                .or_insert(ranges);
            self
        }

        /// Build the dense sequence of ranges, filling gaps between occupied
        /// offsets with empty ranges.
        fn build0(mut self) -> impl Iterator<Item = ChunkRanges> {
            let mut ranges = Vec::new();
            self.ranges.retain(|_, v| !v.is_empty());
            let until_key = self.next_offset_value();
            for offset in 0..until_key {
                ranges.push(self.ranges.remove(&offset).unwrap_or_default());
            }
            ranges.into_iter()
        }

        /// Get the offset just past the highest occupied offset, or 0 if empty.
        fn next_offset_value(&self) -> u64 {
            self.ranges
                .last_key_value()
                .map(|(k, _)| *k + 1)
                .unwrap_or_default()
        }
    }

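    /// Builder for a [`GetManyRequest`].
    ///
    /// Ranges added for the same hash are merged, and hashes whose ranges end
    /// up empty are dropped from the built request.
    ///
    /// A usage sketch (the hash values and the import paths are purely
    /// illustrative, not normative):
    ///
    /// ```
    /// use iroh_blobs::protocol::{ChunkRanges, ChunkRangesExt, GetManyRequest};
    ///
    /// let request = GetManyRequest::builder()
    ///     .hash([0u8; 32], ChunkRanges::all())
    ///     .hash([1u8; 32], ChunkRanges::bytes(0..100))
    ///     .build();
    /// ```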
    #[derive(Debug, Clone, Default)]
    pub struct GetManyRequestBuilder {
        ranges: BTreeMap<Hash, ChunkRanges>,
    }

    impl GetManyRequestBuilder {
        /// Specify ranges for the given hash.
        ///
        /// If the hash is already in the request, the new ranges are merged
        /// with the existing ranges.
        pub fn hash(mut self, hash: impl Into<Hash>, ranges: impl Into<ChunkRanges>) -> Self {
            let ranges = ranges.into();
            let hash = hash.into();
            self.ranges
                .entry(hash)
                .and_modify(|v| *v |= ranges.clone())
                .or_insert(ranges);
            self
        }

        /// Build a [`GetManyRequest`], dropping hashes with empty ranges.
        pub fn build(self) -> GetManyRequest {
            let (hashes, ranges): (Vec<Hash>, Vec<ChunkRanges>) = self
                .ranges
                .into_iter()
                .filter(|(_, v)| !v.is_empty())
                .unzip();
            let ranges = ChunkRangesSeq::from_ranges(ranges);
            GetManyRequest { hashes, ranges }
        }
    }

    #[cfg(test)]
    mod tests {
        use bao_tree::ChunkNum;

        use super::*;
        use crate::protocol::{ChunkRangesExt, GetManyRequest};

        #[test]
        fn chunk_ranges_ext() {
            let ranges = ChunkRanges::bytes(1..2)
                | ChunkRanges::chunks(100..=200)
                | ChunkRanges::offset(1024 * 10)
                | ChunkRanges::chunk(1024)
                | ChunkRanges::last_chunk();
            assert_eq!(
                ranges,
                ChunkRanges::from(ChunkNum(0)..ChunkNum(1)) // byte range 1..2
                    | ChunkRanges::from(ChunkNum(10)..ChunkNum(11)) // chunk at byte offset 1024*10
                    | ChunkRanges::from(ChunkNum(100)..ChunkNum(201)) // chunk range 100..=200
                    | ChunkRanges::from(ChunkNum(1024)..ChunkNum(1025)) // chunk 1024
                    | ChunkRanges::last_chunk() // last chunk
            );
        }

        #[test]
        fn get_request_builder() {
            let hash = [0; 32];
            let request = GetRequest::builder()
                .root(ChunkRanges::all())
                .next(ChunkRanges::all())
                .next(ChunkRanges::bytes(0..100))
                .build(hash);
            assert_eq!(request.hash.as_bytes(), &hash);
            assert_eq!(
                request.ranges,
                ChunkRangesSeq::from_ranges([
                    ChunkRanges::all(),
                    ChunkRanges::all(),
                    ChunkRanges::from(..ChunkNum(1)),
                ])
            );

            let request = GetRequest::builder()
                .root(ChunkRanges::all())
                .child(2, ChunkRanges::bytes(0..100))
                .build(hash);
            assert_eq!(request.hash.as_bytes(), &hash);
            assert_eq!(
                request.ranges,
                ChunkRangesSeq::from_ranges([
                    ChunkRanges::all(),               // root
                    ChunkRanges::empty(),             // child 0
                    ChunkRanges::empty(),             // child 1
                    ChunkRanges::from(..ChunkNum(1)), // child 2
                ])
            );

            let request = GetRequest::builder()
                .root(ChunkRanges::all())
                .next(ChunkRanges::bytes(0..1024) | ChunkRanges::last_chunk())
                .build_open(hash);
            assert_eq!(request.hash.as_bytes(), &[0; 32]);
            assert_eq!(
                request.ranges,
                ChunkRangesSeq::from_ranges_infinite([
                    ChunkRanges::all(),
                    ChunkRanges::from(..ChunkNum(1)) | ChunkRanges::last_chunk(),
                ])
            );
        }

        #[test]
        fn get_many_request_builder() {
            let hash1 = [0; 32];
            let hash2 = [1; 32];
            let hash3 = [2; 32];
            let request = GetManyRequest::builder()
                .hash(hash1, ChunkRanges::all())
                .hash(hash2, ChunkRanges::empty()) // will be ignored!
                .hash(hash3, ChunkRanges::bytes(0..100))
                .build();
            assert_eq!(
                request.hashes,
                vec![Hash::from([0; 32]), Hash::from([2; 32])]
            );
            assert_eq!(
                request.ranges,
                ChunkRangesSeq::from_ranges([
                    ChunkRanges::all(),               // hash1
                    ChunkRanges::from(..ChunkNum(1)), // hash3
                ])
            );
        }
    }
}

#[cfg(test)]
mod tests {
    use iroh_test::{assert_eq_hex, hexdump::parse_hexdump};
    use postcard::experimental::max_size::MaxSize;

    use super::{GetRequest, Request, RequestType};
    use crate::Hash;

    #[test]
    fn request_wire_format() {
        let hash: Hash = [0xda; 32].into();
        let cases = [
            (
                Request::from(GetRequest::blob(hash)),
                r"
                    00 # enum variant for GetRequest
                    dadadadadadadadadadadadadadadadadadadadadadadadadadadadadadadada # the hash
                    020001000100 # the ChunkRangesSeq
                ",
            ),
            (
                Request::from(GetRequest::all(hash)),
                r"
                    00 # enum variant for GetRequest
                    dadadadadadadadadadadadadadadadadadadadadadadadadadadadadadadada # the hash
                    01000100 # the ChunkRangesSeq
                ",
            ),
        ];
        for (case, expected_hex) in cases {
            let expected = parse_hexdump(expected_hex).unwrap();
            let bytes = postcard::to_stdvec(&case).unwrap();
            assert_eq_hex!(bytes, expected);
        }
    }

    #[test]
    fn request_type_size() {
        assert_eq!(RequestType::POSTCARD_MAX_SIZE, 1);
    }
}