Crate bao_tree

§Efficient BLAKE3-based verified streaming

This crate is similar to the bao crate, but takes a slightly different approach.

The core struct is BaoTree, which describes the geometry of the tree and various ways to traverse it. An individual tree node is identified by TreeNode, which is just a newtype wrapper for a u64.

TreeNode provides various helpers, e.g. to get the offset of a node in different traversal orders.

There are newtypes for the different kinds of integers used in the tree: ChunkNum is a u64 number of chunks, TreeNode is a u64 tree node identifier, and BlockSize is the log base 2 of the chunk group size.
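For illustration, a minimal sketch of these newtypes; it assumes ChunkNum is a public newtype over u64, and uses the same block size as the examples below:

use bao_tree::{BlockSize, ChunkNum};

// ChunkNum counts 1024 byte chunks (assumed to be a public newtype over u64)
let sixteen_chunks = ChunkNum(16);
// BlockSize is the log base 2 of the chunk group size:
// from_chunk_log(4) means 2^4 = 16 chunks, i.e. 16 KiB per block
const BLOCK_SIZE: BlockSize = BlockSize::from_chunk_log(4);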

All of this is then used in the io module to implement the actual IO, both synchronous and asynchronous.

§Basic usage

The basic workflow is as follows: you have some existing data for which you want to enable verified streaming. This data can be in memory, in a file, or even a remote resource such as an HTTP server.

§Outboard creation

You create an outboard using the CreateOutboard trait. It provides functions to create an outboard from scratch, or to initialize an outboard's data and root hash from existing content.
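For example, a minimal sketch of just the creation step (the file path is hypothetical; the full workflow is shown in the end-to-end examples below):

use bao_tree::{
    io::{outboard::PreOrderOutboard, sync::CreateOutboard},
    BlockSize,
};

// Hash an existing file into an in-memory outboard
let file = std::fs::File::open("data.bin")?;
let ob = PreOrderOutboard::<Vec<u8>>::create(&file, BlockSize::from_chunk_log(4))?;
// `ob.root` is the root hash of the blob, `ob.tree` describes its geometry,
// and `ob.data` contains the hash pairs that make up the outboard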

§Serving requests

You serve streaming requests by using the encode_ranges or encode_ranges_validated functions in the sync or async io module. For this you need data and a matching outboard.

The difference between the two functions is that the validated version will check the hash of each chunk against the bao tree encoded in the outboard, so you will detect data corruption before sending out data to the requester. When using the unvalidated version, you might send out corrupted data without ever noticing and earn a bad reputation.

Due to the speed of the BLAKE3 hash function, validation adds no significant overhead compared to network operations and encryption.

The requester will send a set of chunk ranges they are interested in. To compute chunk ranges from byte ranges, there is the helper function round_up_to_chunks, which takes byte ranges and rounds them up to whole chunk ranges.

If you just want to stream the entire blob, you can use ChunkRanges::all as the range.
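As a small illustration (the byte range is hypothetical):

use bao_tree::{io::round_up_to_chunks, ByteRanges, ChunkRanges};

// The requester wants bytes 100..200000; round up to whole 1024 byte chunks
let byte_ranges = ByteRanges::from(100..200_000);
let chunk_ranges = round_up_to_chunks(&byte_ranges);
// Or simply request the entire blob
let everything = ChunkRanges::all();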

§Processing requests

You process requests by using the decode_ranges function in the sync or async io module. This function requires prior knowledge of the tree geometry (total data size and block size). A common way to get this information is to agree on the block size out of band and to send the total data size as a prefix of the encoded data. For example, the original bao crate uses a little endian u64 as the prefix.

This function always performs validation; there is no variant that skips it, since that would defeat the purpose of verified streaming.
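A minimal sketch of the size prefix convention, using only std io (the size and the elided encode/decode calls are placeholders):

use std::io::{Read, Write};

// Sender side: write the total data size as a little endian u64,
// followed by the encoded ranges
let data_size: u64 = 1_000_000; // hypothetical total blob size
let mut to_client = Vec::new();
to_client.write_all(&data_size.to_le_bytes())?;
// ... then encode_ranges_validated(...) writes into `to_client`

// Receiver side: read the prefix first, then hand the rest to decode_ranges
let mut from_server = std::io::Cursor::new(to_client);
let mut prefix = [0u8; 8];
from_server.read_exact(&mut prefix)?;
let size = u64::from_le_bytes(prefix);
// `size` plus the agreed block size determine the tree geometry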

§Simple end-to-end example

use bao_tree::{
    io::{
        outboard::PreOrderOutboard,
        round_up_to_chunks,
        sync::{decode_ranges, encode_ranges_validated, valid_ranges, CreateOutboard},
    },
    BlockSize, ByteRanges, ChunkRanges,
};
use std::io;

/// Use a block size of 16 KiB, a good default for most cases
const BLOCK_SIZE: BlockSize = BlockSize::from_chunk_log(4);

// The file we want to serve
let file = std::fs::File::open("video.mp4")?;
// Create an outboard for the file, using the current size
let ob = PreOrderOutboard::<Vec<u8>>::create(&file, BLOCK_SIZE)?;
// Encode the first 100000 bytes of the file
let ranges = ByteRanges::from(0..100000);
let ranges = round_up_to_chunks(&ranges);
// Stream of data to client. Needs to implement `io::Write`. We just use a vec here.
let mut to_client = vec![];
encode_ranges_validated(&file, &ob, &ranges, &mut to_client)?;

// Stream of data from the server. Needs to implement `io::Read`. We just wrap the vec in a cursor.
let from_server = io::Cursor::new(to_client);
let root = ob.root;
let tree = ob.tree;

// Decode the encoded data into a file
let mut decoded = std::fs::File::create("copy.mp4")?;
let mut ob = PreOrderOutboard {
    tree,
    root,
    data: vec![],
};
decode_ranges(from_server, &ranges, &mut decoded, &mut ob)?;

// the first 100000 bytes of the file should now be in `decoded`
// in addition, the parts of the tree required to validate that the data is
// correct are in `ob.data`

// Print the valid ranges of the file
for range in valid_ranges(&ob, &decoded, &ChunkRanges::all()) {
    println!("{:?}", range);
}

§Async end-to-end example

The async version is very similar to the sync version, except that it needs an async context. All functions that do IO are async. The file has to be an iroh_io::File, which is just a wrapper around std::fs::File that implements async random access via the AsyncSliceReader trait.

We use the futures_lite crate here, but the normal futures crate will also work.

use bao_tree::{
    io::{
        outboard::PreOrderOutboard,
        round_up_to_chunks,
        fsm::{decode_ranges, encode_ranges_validated, valid_ranges, CreateOutboard},
    },
    BlockSize, ByteRanges, ChunkRanges,
};
use bytes::BytesMut;
use futures_lite::StreamExt;
use std::io;

/// Use a block size of 16 KiB, a good default for most cases
const BLOCK_SIZE: BlockSize = BlockSize::from_chunk_log(4);

// The file we want to serve
let mut file = iroh_io::File::open("video.mp4".into()).await?;
// Create an outboard for the file, using the current size
let mut ob = PreOrderOutboard::<BytesMut>::create(&mut file, BLOCK_SIZE).await?;
// Encode the first 100000 bytes of the file
let ranges = ByteRanges::from(0..100000);
let ranges = round_up_to_chunks(&ranges);
// Stream of data to the client. We just use a vec here.
let mut to_client = Vec::new();
encode_ranges_validated(file, &mut ob, &ranges, &mut to_client).await?;

// Stream of data from the server. We just wrap the vec in a cursor.
let from_server = io::Cursor::new(to_client.as_slice());
let root = ob.root;
let tree = ob.tree;

// Decode the encoded data into a file
let mut decoded = iroh_io::File::open("copy.mp4".into()).await?;
let mut ob = PreOrderOutboard {
    tree,
    root,
    data: BytesMut::new(),
};
decode_ranges(from_server, ranges, &mut decoded, &mut ob).await?;

// the first 100000 bytes of the file should now be in `decoded`
// in addition, the parts of the tree required to validate that the data is
// correct are in `ob.data`

// Print the valid ranges of the file
let ranges = ChunkRanges::all();
let mut stream = valid_ranges(&mut ob, &mut decoded, &ranges);
while let Some(range) = stream.next().await {
    println!("{:?}", range);
}

§Compatibility with the bao crate

This crate is compatible with the bao crate, provided you do the following (a small configuration sketch follows the list):

  • use a block size of 1024, so no chunk groups
  • use a little endian u64 as the prefix for the encoded data
  • use only a single range
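For example, a bao compatible setup might look like this (the byte range is hypothetical):

use bao_tree::{io::round_up_to_chunks, BlockSize, ByteRanges};

// A block size of 1024 bytes: chunk groups of 2^0 = 1 chunk
const BAO_BLOCK_SIZE: BlockSize = BlockSize::from_chunk_log(0);
// A single contiguous range, rounded up to whole chunks
let ranges = round_up_to_chunks(&ByteRanges::from(0..65536));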

§Modules

  • io — Implementation of bao streaming for std io and tokio io
  • iter — Iterators over BaoTree nodes
