bao_tree/lib.rs
//! # Efficient BLAKE3 based verified streaming
//!
//! This crate is similar to the [bao crate](https://crates.io/crates/bao), but
//! takes a slightly different approach.
//!
//! The core struct is [BaoTree], which describes the geometry of the tree and
//! various ways to traverse it. An individual tree node is identified by
//! [TreeNode], which is just a newtype wrapper for a u64.
//!
//! [TreeNode] provides various helpers, e.g. to get the offset of a node in
//! different traversal orders.
//!
//! There are newtypes for the different kinds of integers used in the tree:
//! [ChunkNum] is a u64 number of chunks,
//! [TreeNode] is a u64 tree node identifier,
//! and [BlockSize] is the log base 2 of the chunk group size.
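//!
//! A small sketch of how these newtypes relate (one chunk is 1024 bytes; the
//! constants are just for illustration):
//!
//! ```
//! use bao_tree::{BlockSize, ChunkNum};
//!
//! // a chunk group size of 2^4 = 16 chunks, i.e. 16 KiB
//! let _block_size = BlockSize::from_chunk_log(4);
//! // chunk numbers convert to byte offsets, 1024 bytes per chunk
//! assert_eq!(ChunkNum(4).to_bytes(), 4096);
//! // byte sizes convert to chunk counts, rounding up
//! assert_eq!(ChunkNum::chunks(2049), ChunkNum(3));
//! ```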
//!
//! All this is then used in the [io] module to implement the actual io, both
//! [synchronous](io::sync) and [asynchronous](io::tokio).
//!
//! # Basic usage
//!
//! The basic workflow is like this: you have some existing data for which
//! you want to enable verified streaming. This data can either be in memory,
//! in a file, or even a remote resource such as an HTTP server.
//!
//! ## Outboard creation
//!
//! You create an outboard using the [CreateOutboard](io::sync::CreateOutboard)
//! trait. It has functions to [create](io::sync::CreateOutboard::create) an
//! outboard from scratch or to [initialize](io::sync::CreateOutboard::init_from)
//! an outboard's data and root hash from existing data.
//!
//! ## Serving requests
//!
//! You serve streaming requests by using the
//! [encode_ranges](io::sync::encode_ranges) or
//! [encode_ranges_validated](io::sync::encode_ranges_validated) functions
//! in the sync or async io module. For this you need data and a matching
//! outboard.
//!
//! The difference between the two functions is that the validated version
//! will check the hash of each chunk against the bao tree encoded in the
//! outboard, so you will detect data corruption before sending out data
//! to the requester. When using the unvalidated version, you might send out
//! corrupted data without ever noticing and earn a bad reputation.
//!
//! Due to the speed of the blake3 hash function, validation is not a
//! significant performance overhead compared to network operations and
//! encryption.
//!
//! The requester will send a set of chunk ranges they are interested in.
//! To compute chunk ranges from byte ranges, there is a helper function
//! [round_up_to_chunks](io::round_up_to_chunks) that takes a byte range and
//! rounds up to chunk ranges.
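//!
//! For example (a doctest sketch; one chunk is 1024 bytes):
//!
//! ```
//! use bao_tree::{io::round_up_to_chunks, ByteRanges, ChunkNum, ChunkRanges};
//!
//! // bytes 100..3000 touch chunks 0, 1 and 2
//! let byte_ranges = ByteRanges::from(100..3000);
//! let chunk_ranges = round_up_to_chunks(&byte_ranges);
//! assert_eq!(chunk_ranges, ChunkRanges::from(ChunkNum(0)..ChunkNum(3)));
//! ```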
//!
//! If you just want to stream the entire blob, you can use [ChunkRanges::all]
//! as the range.
//!
//! ## Processing requests
//!
//! You process requests by using the [decode_ranges](io::sync::decode_ranges)
//! function in the sync or async io module. This function requires prior
//! knowledge of the tree geometry (total data size and block size). A common
//! way to get this information is to make the block size a parameter known to
//! both sides, and to send the total data size as a prefix of the encoded
//! data. E.g. the original bao crate uses a little endian u64 as the prefix.
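//!
//! A sketch of how the receiving side might recover the tree geometry,
//! assuming such a little endian u64 size prefix and a block size that both
//! sides agreed on beforehand:
//!
//! ```
//! use std::io::Read;
//!
//! use bao_tree::{BaoTree, BlockSize};
//!
//! fn read_tree(mut from_server: impl Read) -> std::io::Result<BaoTree> {
//!     let mut size = [0u8; 8];
//!     from_server.read_exact(&mut size)?;
//!     let size = u64::from_le_bytes(size);
//!     // the block size is not part of the stream, both sides must agree on it
//!     Ok(BaoTree::new(size, BlockSize::from_chunk_log(4)))
//! }
//! ```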
//!
//! This function will perform validation in any case, there is no variant
//! that skips validation, since that would defeat the purpose of verified
//! streaming.
//!
//! ## Simple end to end example
//!
//! ```no_run
//! use std::io;
//!
//! use bao_tree::{
//!     io::{
//!         outboard::PreOrderOutboard,
//!         round_up_to_chunks,
//!         sync::{decode_ranges, encode_ranges_validated, valid_ranges, CreateOutboard},
//!     },
//!     BlockSize, ByteRanges, ChunkRanges,
//! };
//!
//! /// Use a block size of 16 KiB, a good default for most cases
//! const BLOCK_SIZE: BlockSize = BlockSize::from_chunk_log(4);
//!
//! # fn main() -> io::Result<()> {
//! // The file we want to serve
//! let file = std::fs::File::open("video.mp4")?;
//! // Create an outboard for the file, using the current size
//! let ob = PreOrderOutboard::<Vec<u8>>::create(&file, BLOCK_SIZE)?;
//! // Encode the first 100000 bytes of the file
//! let ranges = ByteRanges::from(0..100000);
//! let ranges = round_up_to_chunks(&ranges);
//! // Stream of data to the client. Needs to implement `io::Write`. We just use a vec here.
//! let mut to_client = vec![];
//! encode_ranges_validated(&file, &ob, &ranges, &mut to_client)?;
//!
//! // Stream of data from the client. Needs to implement `io::Read`. We just wrap the vec in a cursor.
//! let from_server = io::Cursor::new(to_client);
//! let root = ob.root;
//! let tree = ob.tree;
//!
//! // Decode the encoded data into a file
//! let mut decoded = std::fs::File::create("copy.mp4")?;
//! let mut ob = PreOrderOutboard {
//!     tree,
//!     root,
//!     data: vec![],
//! };
//! decode_ranges(from_server, &ranges, &mut decoded, &mut ob)?;
//!
//! // The first 100000 bytes of the file should now be in `decoded`.
//! // In addition, the part of the tree required to validate the data
//! // is in `ob.data`.
//!
//! // Print the valid ranges of the file
//! for range in valid_ranges(&ob, &decoded, &ChunkRanges::all()) {
//!     println!("{:?}", range);
//! }
//! # Ok(())
//! # }
//! ```
//!
//! # Async end to end example
//!
//! The async version is very similar to the sync version, except that it needs
//! an async context. All functions that do IO are async. The file has to be
//! an [iroh_io::File], which is just a wrapper for [std::fs::File] that implements
//! async random access via the [AsyncSliceReader](iroh_io::AsyncSliceReader) trait.
//!
//! We use the [futures_lite] crate, but using the normal futures crate will also work.
//!
//! ```no_run
//! use std::io;
//!
//! use bao_tree::{
//!     io::{
//!         fsm::{decode_ranges, encode_ranges_validated, valid_ranges, CreateOutboard},
//!         outboard::PreOrderOutboard,
//!         round_up_to_chunks,
//!     },
//!     BlockSize, ByteRanges, ChunkRanges,
//! };
//! use bytes::BytesMut;
//! use futures_lite::StreamExt;
//!
//! /// Use a block size of 16 KiB, a good default for most cases
//! const BLOCK_SIZE: BlockSize = BlockSize::from_chunk_log(4);
//!
//! # #[tokio::main]
//! # async fn main() -> io::Result<()> {
//! // The file we want to serve
//! let mut file = iroh_io::File::open("video.mp4".into()).await?;
//! // Create an outboard for the file, using the current size
//! let mut ob = PreOrderOutboard::<BytesMut>::create(&mut file, BLOCK_SIZE).await?;
//! // Encode the first 100000 bytes of the file
//! let ranges = ByteRanges::from(0..100000);
//! let ranges = round_up_to_chunks(&ranges);
//! // Stream of data to the client. Needs to implement `io::Write`. We just use a vec here.
//! let mut to_client = Vec::new();
//! encode_ranges_validated(file, &mut ob, &ranges, &mut to_client).await?;
//!
//! // Stream of data from the client. Needs to implement `io::Read`. We just wrap the vec in a cursor.
//! let from_server = io::Cursor::new(to_client.as_slice());
//! let root = ob.root;
//! let tree = ob.tree;
//!
//! // Decode the encoded data into a file
//! let mut decoded = iroh_io::File::create("copy.mp4".into()).await?;
//! let mut ob = PreOrderOutboard {
//!     tree,
//!     root,
//!     data: BytesMut::new(),
//! };
//! decode_ranges(from_server, ranges, &mut decoded, &mut ob).await?;
//!
//! // The first 100000 bytes of the file should now be in `decoded`.
//! // In addition, the part of the tree required to validate the data
//! // is in `ob.data`.
//!
//! // Print the valid ranges of the file
//! let ranges = ChunkRanges::all();
//! let mut stream = valid_ranges(&mut ob, &mut decoded, &ranges);
//! while let Some(range) = stream.next().await {
//!     println!("{:?}", range);
//! }
//! # Ok(())
//! # }
//! ```
//!
//! # Compatibility with the [bao crate](https://crates.io/crates/bao)
//!
//! This crate is compatible with the bao crate, provided you do the
//! following:
//!
//! - use a block size of 1024, i.e. no chunk groups
//! - use a little endian u64 as the prefix for the encoded data
//! - use only a single range
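//!
//! A sketch of producing a bao compatible encoding under these constraints
//! (`video.mp4` is a stand-in for your own data):
//!
//! ```no_run
//! use std::io;
//!
//! use bao_tree::{
//!     io::{
//!         outboard::PreOrderOutboard,
//!         sync::{encode_ranges_validated, CreateOutboard},
//!     },
//!     BlockSize, ChunkRanges,
//! };
//!
//! # fn main() -> io::Result<()> {
//! let file = std::fs::File::open("video.mp4")?;
//! let size = file.metadata()?.len();
//! // a block size of 1024 bytes, i.e. no chunk groups
//! let ob = PreOrderOutboard::<Vec<u8>>::create(&file, BlockSize::from_chunk_log(0))?;
//! let mut encoded = Vec::new();
//! // the little endian u64 size prefix
//! encoded.extend_from_slice(&size.to_le_bytes());
//! // a single range covering the entire blob
//! encode_ranges_validated(&file, &ob, &ChunkRanges::all(), &mut encoded)?;
//! # Ok(())
//! # }
//! ```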
#![deny(missing_docs)]
use std::{
    fmt::{self, Debug},
    ops::Range,
};

use range_collections::RangeSetRef;
pub mod iter;
mod rec;
mod tree;
use iter::*;
pub use tree::{BlockSize, ChunkNum};
pub mod io;
pub use blake3;

#[cfg(all(test, feature = "tokio_fsm"))]
mod tests;
#[cfg(all(test, feature = "tokio_fsm"))]
mod tests2;

/// A set of chunk ranges
pub type ChunkRanges = range_collections::RangeSet2<ChunkNum>;

/// A set of byte ranges
pub type ByteRanges = range_collections::RangeSet2<u64>;

/// A referenceable set of chunk ranges
///
/// [ChunkRanges] implements [`AsRef<ChunkRangesRef>`].
pub type ChunkRangesRef = range_collections::RangeSetRef<ChunkNum>;

/// Hash a blake3 subtree that starts at `start_chunk` and covers `data`.
///
/// For the root, this is just the regular blake3 hash of the entire input.
fn hash_subtree(start_chunk: u64, data: &[u8], is_root: bool) -> blake3::Hash {
    use blake3::hazmat::{ChainingValue, HasherExt};
    if is_root {
        debug_assert!(start_chunk == 0);
        blake3::hash(data)
    } else {
        let mut hasher = blake3::Hasher::new();
        // the input offset is in bytes, one chunk is 1024 bytes
        hasher.set_input_offset(start_chunk * 1024);
        hasher.update(data);
        let non_root_hash: ChainingValue = hasher.finalize_non_root();
        blake3::Hash::from(non_root_hash)
    }
}

/// Compute the hash of a parent node from the hashes of its two children.
fn parent_cv(left_child: &blake3::Hash, right_child: &blake3::Hash, is_root: bool) -> blake3::Hash {
    use blake3::hazmat::{merge_subtrees_non_root, merge_subtrees_root, ChainingValue, Mode};
    let left_child: ChainingValue = *left_child.as_bytes();
    let right_child: ChainingValue = *right_child.as_bytes();
    if is_root {
        merge_subtrees_root(&left_child, &right_child, Mode::Hash)
    } else {
        blake3::Hash::from(merge_subtrees_non_root(
            &left_child,
            &right_child,
            Mode::Hash,
        ))
    }
}

/// Defines a Bao tree.
///
/// This is just the specification of the tree, it does not contain any actual data.
///
/// Usually trees are self-contained. This means that the tree starts at chunk 0,
/// and the hash of the root node is computed with the is_root flag set to true.
///
/// For some internal use, it is also possible to create trees that are just subtrees
/// of a larger tree. In this case, the start_chunk is the chunk number of the first
/// chunk in the tree, and the is_root flag can be false.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BaoTree {
    /// Total number of bytes in the file
    size: u64,
    /// Log base 2 of the chunk group size
    block_size: BlockSize,
}

/// An offset of a node in a post-order outboard
#[derive(Debug, Clone, Copy)]
pub enum PostOrderOffset {
    /// the node is stable and won't change when appending data
    Stable(u64),
    /// the node is unstable and will change when appending data
    Unstable(u64),
}

impl PostOrderOffset {
    /// Just get the offset value, ignoring whether it's stable or unstable
    pub fn value(self) -> u64 {
        match self {
            Self::Stable(n) => n,
            Self::Unstable(n) => n,
        }
    }
}

impl BaoTree {
    /// Create a new self contained BaoTree
    pub fn new(size: u64, block_size: BlockSize) -> Self {
        Self { size, block_size }
    }

    /// The size of the blob from which this tree was constructed, in bytes
    pub fn size(&self) -> u64 {
        self.size
    }

    /// The block size of the tree
    pub fn block_size(&self) -> BlockSize {
        self.block_size
    }

    /// Compute the root node and the number of nodes for the shifted tree,
    /// i.e. the tree in which one leaf corresponds to one block instead of
    /// one chunk.
    pub(crate) fn shifted(&self) -> (TreeNode, TreeNode) {
        let level = self.block_size.0;
        let size = self.size;
        let shift = 10 + level;
        let mask = (1 << shift) - 1;
        // number of full blocks of size 1024 << level
        let full_blocks = size >> shift;
        // 1 if the last block is non zero, 0 otherwise
        let open_block = ((size & mask) != 0) as u64;
        // total number of blocks, rounding up to 1 if there are no blocks
        let blocks = (full_blocks + open_block).max(1);
        let n = blocks.div_ceil(2);
        // root node
        let root = n.next_power_of_two() - 1;
        // number of nodes in the tree
        let filled_size = n + n.saturating_sub(1);
        (TreeNode(root), TreeNode(filled_size))
    }

    fn byte_range(&self, node: TreeNode) -> Range<u64> {
        let start = node.chunk_range().start.to_bytes();
        let end = node.chunk_range().end.to_bytes();
        start..end.min(self.size)
    }

    /// Compute the byte offsets for a leaf node
    ///
    /// Returns the start, mid and end offsets of the leaf, defining the left
    /// range start..mid and the right range mid..end. If the leaf is only
    /// partially contained in the tree, the right range will be empty.
    fn leaf_byte_ranges3(&self, leaf: TreeNode) -> (u64, u64, u64) {
        let Range { start, end } = leaf.byte_range();
        let mid = leaf.mid().to_bytes();
        debug_assert!(start < self.size || (start == 0 && self.size == 0));
        (start, mid.min(self.size), end.min(self.size))
    }

    /// Traverse the entire tree in post order as [BaoChunk]s
    ///
    /// This iterator is used by both the sync and async io code for computing
    /// an outboard from existing data
    pub fn post_order_chunks_iter(&self) -> PostOrderChunkIter {
        PostOrderChunkIter::new(*self)
    }

    /// Traverse the part of the tree that is relevant for a ranges query
    /// in pre order as [BaoChunk]s
    ///
    /// This iterator is used by both the sync and async io code for encoding
    /// from an outboard and ranges as well as decoding an encoded stream.
    pub fn ranges_pre_order_chunks_iter_ref<'a>(
        &self,
        ranges: &'a RangeSetRef<ChunkNum>,
        min_level: u8,
    ) -> PreOrderPartialChunkIterRef<'a> {
        PreOrderPartialChunkIterRef::new(*self, ranges, min_level)
    }

    /// Traverse the entire tree in post order as [TreeNode]s,
    /// down to the level given by the block size.
    pub fn post_order_nodes_iter(&self) -> impl Iterator<Item = TreeNode> {
        let (root, len) = self.shifted();
        let shift = self.block_size.0;
        PostOrderNodeIter::new(root, len).map(move |x| x.subtract_block_size(shift))
    }

    /// Traverse the entire tree in pre order as [TreeNode]s,
    /// down to the level given by the block size.
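    ///
    /// For example (a doctest sketch): a tree of 4 chunks at block size 0 has
    /// two leaf nodes of two chunks each and the root above them, and pre
    /// order traversal yields the root first:
    ///
    /// ```
    /// use bao_tree::{BaoTree, BlockSize};
    ///
    /// let tree = BaoTree::new(4096, BlockSize::from_chunk_log(0));
    /// let levels: Vec<u32> = tree.pre_order_nodes_iter().map(|n| n.level()).collect();
    /// assert_eq!(levels, [1, 0, 0]);
    /// ```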
    pub fn pre_order_nodes_iter(&self) -> impl Iterator<Item = TreeNode> {
        let (root, len) = self.shifted();
        let shift = self.block_size.0;
        PreOrderNodeIter::new(root, len).map(move |x| x.subtract_block_size(shift))
    }

    /// Traverse the part of the tree that is relevant for a ranges query
    /// in pre order as [NodeInfo]s
    ///
    /// This is mostly used internally.
    ///
    /// When `min_level` is set to a value greater than 0, the iterator will
    /// skip all branch nodes that are at a level < min_level if they are fully
    /// covered by the ranges.
    #[cfg(test)]
    pub fn ranges_pre_order_nodes_iter<'a>(
        &self,
        ranges: &'a RangeSetRef<ChunkNum>,
        min_level: u8,
    ) -> PreOrderPartialIterRef<'a> {
        PreOrderPartialIterRef::new(*self, ranges, min_level)
    }

    /// Root of the tree
    ///
    /// Does not consider block size
    pub fn root(&self) -> TreeNode {
        let shift = 10;
        let mask = (1 << shift) - 1;
        let full_blocks = self.size >> shift;
        let open_block = ((self.size & mask) != 0) as u64;
        let blocks = (full_blocks + open_block).max(1);
        let chunks = ChunkNum(blocks);
        TreeNode::root(chunks)
    }
    /// Number of blocks in the tree
    ///
    /// At chunk group size 1 (block size 0), this is the same as the number
    /// of chunks. Even a tree of size 0 has a single block.
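    ///
    /// For example (a doctest sketch):
    ///
    /// ```
    /// use bao_tree::{BaoTree, BlockSize};
    ///
    /// // ceil(100000 / 16384) = 7 blocks of 16 KiB
    /// let tree = BaoTree::new(100_000, BlockSize::from_chunk_log(4));
    /// assert_eq!(tree.blocks(), 7);
    /// ```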
    pub fn blocks(&self) -> u64 {
        // handle the case of an empty tree having 1 block
        blocks(self.size, self.block_size).max(1)
    }

    /// Number of chunks in the tree
    pub fn chunks(&self) -> ChunkNum {
        ChunkNum::chunks(self.size)
    }

    /// Number of hash pairs in the outboard
    fn outboard_hash_pairs(&self) -> u64 {
        self.blocks() - 1
    }

    /// The outboard size for this tree.
    ///
    /// This is the outboard size *without* the size prefix.
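    ///
    /// For example (a doctest sketch; the numbers follow from the tree geometry):
    ///
    /// ```
    /// use bao_tree::{BaoTree, BlockSize};
    ///
    /// // 1 GiB at a block size of 16 KiB: 65536 blocks, so 65535 hash pairs
    /// let tree = BaoTree::new(1024 * 1024 * 1024, BlockSize::from_chunk_log(4));
    /// assert_eq!(tree.outboard_size(), 65535 * 64);
    /// ```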
    pub fn outboard_size(&self) -> u64 {
        self.outboard_hash_pairs() * 64
    }

    #[allow(dead_code)]
    fn filled_size(&self) -> TreeNode {
        let blocks = self.chunks();
        let n = blocks.0.div_ceil(2);
        TreeNode(n + n.saturating_sub(1))
    }

    /// true if the node is a leaf for this tree
    ///
    /// If a tree has a non-zero block size, this is different from the node
    /// being a leaf (level=0).
    #[cfg(test)]
    const fn is_leaf(&self, node: TreeNode) -> bool {
        node.level() == self.block_size.to_u32()
    }

    /// true if the given node is persisted
    ///
    /// the only node that is not persisted is the last leaf node, if it is
    /// less than half full
    #[inline]
    #[cfg(test)]
    const fn is_persisted(&self, node: TreeNode) -> bool {
        !self.is_leaf(node) || node.mid().to_bytes() < self.size
    }

    /// true if this is a node that is relevant for the outboard
    #[inline]
    const fn is_relevant_for_outboard(&self, node: TreeNode) -> bool {
        let level = node.level();
        if level < self.block_size.to_u32() {
            // too small, this outboard does not track it
            false
        } else if level > self.block_size.to_u32() {
            // a parent node, always relevant
            true
        } else {
            node.mid().to_bytes() < self.size
        }
    }

    /// The offset of the given node in the pre order traversal
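    ///
    /// For example (a doctest sketch), the root always comes first:
    ///
    /// ```
    /// use bao_tree::{BaoTree, BlockSize};
    ///
    /// let tree = BaoTree::new(1024 * 1024, BlockSize::from_chunk_log(4));
    /// assert_eq!(tree.pre_order_offset(tree.root()), Some(0));
    /// ```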
    pub fn pre_order_offset(&self, node: TreeNode) -> Option<u64> {
        // if the node has a level less than block_size, this will return None
        let shifted = node.add_block_size(self.block_size.0)?;
        let is_half_leaf = shifted.is_leaf() && node.mid().to_bytes() >= self.size;
        if !is_half_leaf {
            let (_, tree_filled_size) = self.shifted();
            Some(pre_order_offset_loop(shifted.0, tree_filled_size.0))
        } else {
            None
        }
    }

    /// The offset of the given node in the post order traversal
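    ///
    /// For example (a doctest sketch), in a complete tree the root is the
    /// last node in post order:
    ///
    /// ```
    /// use bao_tree::{BaoTree, BlockSize};
    ///
    /// // 64 blocks -> 63 hash pairs, the root is at the last offset, 62
    /// let tree = BaoTree::new(1024 * 1024, BlockSize::from_chunk_log(4));
    /// assert_eq!(tree.post_order_offset(tree.root()).unwrap().value(), 62);
    /// ```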
    pub fn post_order_offset(&self, node: TreeNode) -> Option<PostOrderOffset> {
        // if the node has a level less than block_size, this will return None
        let shifted = node.add_block_size(self.block_size.0)?;
        if node.byte_range().end <= self.size {
            // stable node, use post_order_offset
            Some(PostOrderOffset::Stable(shifted.post_order_offset()))
        } else {
            // unstable node
            if shifted.is_leaf() && node.mid().to_bytes() >= self.size {
                // half full leaf node, not considered
                None
            } else {
                // compute the offset based on the total size and the height of the node
                self.outboard_hash_pairs()
                    .checked_sub(u64::from(node.right_count()) + 1)
                    .map(PostOrderOffset::Unstable)
            }
        }
    }

    const fn chunk_group_chunks(&self) -> ChunkNum {
        ChunkNum(1 << self.block_size.0)
    }

    fn chunk_group_bytes(&self) -> usize {
        self.chunk_group_chunks().to_bytes().try_into().unwrap()
    }
}

/// number of blocks that this number of bytes covers,
/// given a block size
pub(crate) const fn blocks(size: u64, block_size: BlockSize) -> u64 {
    let chunk_group_log = block_size.0;
    let block_bits = chunk_group_log + 10;
    let block_mask = (1 << block_bits) - 1;
    let full_blocks = size >> block_bits;
    let open_block = ((size & block_mask) != 0) as u64;
    full_blocks + open_block
}

/// A u64 that defines a node in a bao tree.
///
/// You typically don't have to use this, but it can be useful for debugging
/// and error handling. Hash validation errors contain a `TreeNode` that allows
/// you to find the position where validation failed.
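///
/// For example (a doctest sketch), the root of a tree of four chunks is a
/// level 1 branch node:
///
/// ```
/// use bao_tree::{BaoTree, BlockSize};
///
/// let tree = BaoTree::new(4096, BlockSize::from_chunk_log(0));
/// let root = tree.root();
/// assert_eq!(root.level(), 1);
/// assert!(!root.is_leaf());
/// ```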
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct TreeNode(u64);

impl fmt::Display for TreeNode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl fmt::Debug for TreeNode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if !f.alternate() {
            write!(f, "TreeNode({})", self.0)
        } else if self.is_leaf() {
            write!(f, "TreeNode::Leaf({})", self.0)
        } else {
            write!(f, "TreeNode::Branch({}, level={})", self.0, self.level())
        }
    }
}

impl TreeNode {
    /// Create a new tree node from a start chunk and a level
    ///
    /// The start chunk must be the start of a subtree with the given level.
    /// So for level 0, the start chunk must be even. For level 1, the start
    /// chunk must be divisible by 4, etc.
    ///
    /// This is a bridge from the recursive reference implementation to the node
    /// based implementations, and is therefore only used in tests.
    #[cfg(all(test, feature = "tokio_fsm"))]
    fn from_start_chunk_and_level(start_chunk: ChunkNum, level: BlockSize) -> Self {
        let start_chunk = start_chunk.0;
        let level = level.0;
        // check that the start chunk is a start of a subtree with level `level`
        // this ensures that there is a 0 at bit `level`.
        let check_mask = (1 << (level + 1)) - 1;
        debug_assert_eq!(start_chunk & check_mask, 0);
        let level_mask = (1 << level) - 1;
        // set the trailing `level` bits to 1.
        // The level is the number of trailing ones.
        Self(start_chunk | level_mask)
    }

    /// Given a number of chunks, returns the root node
    fn root(chunks: ChunkNum) -> TreeNode {
        Self(chunks.0.div_ceil(2).next_power_of_two() - 1)
    }

    /// the middle of the chunk range covered by this node, in chunks
    pub const fn mid(&self) -> ChunkNum {
        ChunkNum(self.0 + 1)
    }

    #[inline]
    const fn half_span(&self) -> u64 {
        1 << self.level()
    }

    /// The level of the node in the tree, 0 for leafs.
    #[inline]
    pub const fn level(&self) -> u32 {
        self.0.trailing_ones()
    }

    /// True if this is a leaf node.
    #[inline]
    pub const fn is_leaf(&self) -> bool {
        (self.0 & 1) == 0
    }

    /// Convert a node to a node in a tree with a smaller block size
    ///
    /// E.g. a leaf node in a tree with block size 4 will become a node
    /// with level 4 in a tree with block size 0.
    ///
    /// This works by just adding n trailing 1 bits to the node by shifting
    /// to the left.
    #[inline]
    pub const fn subtract_block_size(&self, n: u8) -> Self {
        let shifted = !(!self.0 << n);
        Self(shifted)
    }

    /// Convert a node to a node in a tree with a larger block size
    ///
    /// If the node has n trailing 1 bits, they are removed by shifting
    /// the node to the right by n bits.
    ///
    /// If the node has less than n trailing 1 bits, the node is too small
    /// to be represented in the target tree.
    #[inline]
    pub const fn add_block_size(&self, n: u8) -> Option<Self> {
        let mask = (1 << n) - 1;
        // check if the node has a high enough level
        if self.0 & mask == mask {
            Some(Self(self.0 >> n))
        } else {
            None
        }
    }

    /// Range of bytes that this node covers
    ///
    /// Note that this will give the untruncated range, which may be larger than
    /// the actual tree. To get the exact byte range for a tree, use
    /// [BaoTree::byte_range].
    fn byte_range(&self) -> Range<u64> {
        let range = self.chunk_range();
        range.start.to_bytes()..range.end.to_bytes()
    }

    /// Number of nodes below this node, excluding this node.
    #[inline]
    pub const fn count_below(&self) -> u64 {
        // go to representation where trailing zeros are the level
        let x = self.0 + 1;
        // isolate the lowest bit
        let lowest_bit = x & (-(x as i64) as u64);
        // number of nodes is n * 2 - 1, subtract 1 for the node itself
        lowest_bit * 2 - 2
    }

    /// Get the next left ancestor of this node, or None if there is none.
    pub fn next_left_ancestor(&self) -> Option<Self> {
        self.next_left_ancestor0().map(Self)
    }

    /// Get the left child of this node, or None if it is a leaf node.
    pub fn left_child(&self) -> Option<Self> {
        let offset = 1 << self.level().checked_sub(1)?;
        Some(Self(self.0 - offset))
    }

    /// Get the right child of this node, or None if it is a leaf node.
    pub fn right_child(&self) -> Option<Self> {
        let offset = 1 << self.level().checked_sub(1)?;
        Some(Self(self.0 + offset))
    }

    /// Unrestricted parent, can only be None if we are at the top
    pub fn parent(&self) -> Option<Self> {
        let level = self.level();
        if level == 63 {
            return None;
        }
        let span = 1u64 << level;
        let offset = self.0;
        Some(Self(if (offset & (span * 2)) == 0 {
            offset + span
        } else {
            offset - span
        }))
    }

    /// Restricted parent, will be None if we call parent on the root
    pub fn restricted_parent(&self, len: Self) -> Option<Self> {
        let mut curr = *self;
        while let Some(parent) = curr.parent() {
            if parent.0 < len.0 {
                return Some(parent);
            }
            curr = parent;
        }
        // we hit the top
        None
    }

    /// Get a valid right descendant for an offset
    pub(crate) fn right_descendant(&self, len: Self) -> Option<Self> {
        let mut node = self.right_child()?;
        while node >= len {
            node = node.left_child()?;
        }
        Some(node)
    }

    /// Get the range of nodes this node covers
    pub const fn node_range(&self) -> Range<Self> {
        let half_span = self.half_span();
        let nn = self.0;
        let r = nn + half_span;
        let l = nn + 1 - half_span;
        Self(l)..Self(r)
    }

    /// Get the range of chunks this node covers
    pub fn chunk_range(&self) -> Range<ChunkNum> {
        let level = self.level();
        let span = 1 << level;
        let mid = self.0 + 1;
        // at level 0 (leaf), the range will be nn..nn+2
        // at level >0 (branch), the range will be centered on nn+1
        ChunkNum(mid - span)..ChunkNum(mid + span)
    }

    /// the number of times you have to go right from the root to get to this node
    ///
    /// 0 for a root node
    pub fn right_count(&self) -> u32 {
        (self.0 + 1).count_ones() - 1
    }

    /// Get the post order offset of this node
    #[inline]
    pub const fn post_order_offset(&self) -> u64 {
        // compute number of nodes below me
        let below_me = self.count_below();
        // compute next ancestor that is to the left
        let next_left_ancestor = self.next_left_ancestor0();
        // compute offset
        match next_left_ancestor {
            Some(nla) => below_me + nla + 1 - ((nla + 1).count_ones() as u64),
            None => below_me,
        }
    }

    /// Get the range of post order offsets this node covers
    pub const fn post_order_range(&self) -> Range<u64> {
        let offset = self.post_order_offset();
        let end = offset + 1;
        let start = offset - self.count_below();
        start..end
    }

    /// Get the next left ancestor, or None if we don't have one
    ///
    /// this is a separate fn so it can be const.
    #[inline]
    const fn next_left_ancestor0(&self) -> Option<u64> {
        // add 1 to go to the representation where trailing zeroes = level
        let x = self.0 + 1;
        // clear the lowest bit
        let without_lowest_bit = x & (x - 1);
        // go back to the normal representation,
        // producing None if without_lowest_bit is 0, which means that there is no next left ancestor
        without_lowest_bit.checked_sub(1)
    }
}

/// Iterative way to find the offset of a node in a pre-order traversal.
///
/// I am sure there is a way that does not require a loop, but this will do for now.
/// It is slower than the direct formula, but it is still in the nanosecond range,
/// so at a block size of 16 KiB it should not be the limiting factor for anything.
fn pre_order_offset_loop(node: u64, len: u64) -> u64 {
    // node level, 0 for leaf nodes
    let level = (!node).trailing_zeros();
    // span of the node, 1 for leaf nodes
    let span = 1u64 << level;
    // nodes to the left of the tree of this node
    let left = node + 1 - span;
    // count the parents with a loop
    let mut parent_count = 0;
    let mut offset = node;
    let mut span = span;
    // loop until we reach the root, adding valid parents
    loop {
        let pspan = span * 2;
        // find parent
        offset = if (offset & pspan) == 0 {
            offset + span
        } else {
            offset - span
        };
        // if parent is inside the tree, increase parent count
        if offset < len {
            parent_count += 1;
        }
        if pspan >= len {
            // we are at the root
            break;
        }
        span = pspan;
    }
    left - (left.count_ones() as u64) + parent_count
}

/// Split a range set into range sets for the left and right half of a node
///
/// Requires that the range set is minimal: it must not contain any redundant
/// boundaries outside of the range of the node. The exact values outside of
/// the node range don't matter, so any boundary outside that range must be
/// omitted.
///
/// Produces two range sets that are also minimal. A range set for left or right
/// that covers the entire range of the node will be replaced with the set of
/// all chunks, so an is_all() check can be used to check if further recursion
/// is necessary.
pub(crate) fn split(
    ranges: &RangeSetRef<ChunkNum>,
    node: TreeNode,
) -> (&RangeSetRef<ChunkNum>, &RangeSetRef<ChunkNum>) {
    let mid = node.mid();
    let start = node.chunk_range().start;
    split_inner(ranges, start, mid)
}

/// The actual implementation of split. This is used from split and from the
/// recursive reference implementation.
pub(crate) fn split_inner(
    ranges: &RangeSetRef<ChunkNum>,
    start: ChunkNum,
    mid: ChunkNum,
) -> (&RangeSetRef<ChunkNum>, &RangeSetRef<ChunkNum>) {
    let (mut a, mut b) = ranges.split(mid);
    // check that a does not contain a redundant boundary at or after mid
    debug_assert!(a.boundaries().last() < Some(&mid));
    // Replace a with the canonicalized version if it is a single interval that
    // starts at or before start. This is necessary to be able to check it with
    // RangeSetRef::is_all()
    if a.boundaries().len() == 1 && a.boundaries()[0] <= start {
        a = RangeSetRef::new(&[ChunkNum(0)]).unwrap();
    }
    // Replace b with the canonicalized version if it is a single interval that
    // starts at or before mid. This is necessary to be able to check it with
    // RangeSetRef::is_all()
    if b.boundaries().len() == 1 && b.boundaries()[0] <= mid {
        b = RangeSetRef::new(&[ChunkNum(0)]).unwrap();
    }
    (a, b)
}

// Module that handles io::Error serialization/deserialization
#[cfg(feature = "serde")]
mod io_error_serde {
    use std::{fmt, io};

    use serde::{
        de::{self, Visitor},
        Deserializer, Serializer,
    };

    pub fn serialize<S>(error: &io::Error, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        // Serialize the error kind and message
        serializer.serialize_str(&format!("{:?}:{}", error.kind(), error))
    }

    pub fn deserialize<'de, D>(deserializer: D) -> Result<io::Error, D::Error>
    where
        D: Deserializer<'de>,
    {
        struct IoErrorVisitor;

        impl Visitor<'_> for IoErrorVisitor {
            type Value = io::Error;

            fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
                formatter.write_str("an io::Error string representation")
            }

            fn visit_str<E>(self, value: &str) -> Result<Self::Value, E>
            where
                E: de::Error,
            {
                // For simplicity, create a generic error
                // In a real app, you might want to parse the kind from the string
                Ok(io::Error::other(value))
            }
        }

        deserializer.deserialize_str(IoErrorVisitor)
    }
}