Bf-Tree
Bf-Tree is a modern read-write-optimized concurrent larger-than-memory range index in Rust from MSR.
Design Details
You can find the Bf-Tree research paper here. You can find more design docs here.
User Guide
Rust
Bf-Tree is written in Rust and is available as a Rust crate. To use it, add bf_tree as a dependency in your Cargo.toml:
[dependencies]
bf_tree = "0.5.3"
An example use of Bf-Tree:
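use bf_tree::BfTree;
use bf_tree::LeafReadResult;
// Argument values below are illustrative: a 4-byte minimum record size and a 1 KiB read buffer.
let mut config = bf_tree::Config::default();
config.cb_min_record_size(4);
let tree = BfTree::with_config(config, None).unwrap();
tree.insert(b"key", b"value");
let mut buffer = [0u8; 1024];
let read_size = tree.read(b"key", &mut buffer);
assert_eq!(read_size, LeafReadResult::Found(5));
assert_eq!(&buffer[..5], b"value");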
Snapshots and Recovery
Bf-Tree supports CPR-style consistent snapshots. A snapshot captures a consistent prefix of all in-flight transactions to a file, and a tree can later be reconstructed from such a snapshot file.
To enable snapshots, set use_snapshot(true) on the Config before creating
the tree. Once enabled, call [BfTree::cpr_snapshot] with the destination path
to take a snapshot at any point. Snapshots can be taken concurrently with
ongoing reads and writes; only one snapshot may be in progress at a time.
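Because only one snapshot may be in progress at a time, a caller that triggers snapshots from several places may want to serialize those requests itself. The following is a minimal sketch of such a caller-side guard using only the standard library. `SnapshotGate` and `try_run` are hypothetical names, not part of the Bf-Tree API, and the sketch makes no claim about how `cpr_snapshot` reacts to an overlapping call.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical caller-side guard; not part of the Bf-Tree API.
pub struct SnapshotGate {
    in_progress: AtomicBool,
}

/// Clears the flag when dropped, even if the snapshot closure panics.
struct Reset<'a>(&'a AtomicBool);

impl Drop for Reset<'_> {
    fn drop(&mut self) {
        self.0.store(false, Ordering::Release);
    }
}

impl SnapshotGate {
    pub const fn new() -> Self {
        Self { in_progress: AtomicBool::new(false) }
    }

    /// Runs `take_snapshot` only if no other snapshot is in flight.
    /// Returns `None` when another caller already holds the gate.
    pub fn try_run<T>(&self, take_snapshot: impl FnOnce() -> T) -> Option<T> {
        // Claim the gate atomically: the false -> true swap succeeds for one caller only.
        if self
            .in_progress
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            return None;
        }
        let _reset = Reset(&self.in_progress);
        Some(take_snapshot())
    }
}
```

A caller would wrap its snapshot call in the closure passed to `try_run` and treat `None` as "a snapshot is already running, try again later".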
To recover a tree from a snapshot file, use
[BfTree::new_from_cpr_snapshot]. The snapshot file embeds most configuration
fields (record sizes, leaf page size, cache-only flag, etc.), so the caller
only needs to specify the recovery-time options:
- recovery_snapshot_file_path: path of the snapshot file to recover from.
- use_snapshot: whether the recovered tree should itself support taking new snapshots.
- buffer_ptr: optional pointer to a pre-allocated cache buffer; pass None to let Bf-Tree allocate one.
- buffer_size: optional override of the cache size stored in the snapshot. If smaller than the original, recovery may fail because cached pages from the snapshot must fit in memory.
- wal: optional write-ahead log configuration for the recovered tree.
use ;
use PathBuf;
let mut config = default;
config.use_snapshot;
let tree = with_config.unwrap;
tree.insert;
// Take a CPR snapshot of the current tree state.
tree.cpr_snapshot;
// Recover a bf-tree from a CPR snapshot
let tree = new_from_cpr_snapshot.unwrap;
remove_file;
You can check whether all active threads have transitioned to the next snapshot version during a snapshot (i.e., all threads are operating in v + 1) with:
let ready = tree.are_all_threads_in_next_snapshot_version();
This is useful for coordinating external systems that need to wait until a snapshot's data is fully consistent, but not for the whole snapshot to finish, before proceeding. Note that this function returns false if no snapshot is in progress.
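To make the v / v + 1 transition concrete, here is a toy model of per-thread version tracking written against the standard library only. It is a sketch of the idea, not Bf-Tree's implementation: `VersionTracker` and all of its methods are hypothetical, and the model assumes each thread explicitly reports when it moves to the next version.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

/// Toy model of CPR version tracking; all names here are hypothetical.
pub struct VersionTracker {
    /// Version `v` that the current (or next) snapshot captures.
    snapshot_version: AtomicU64,
    snapshot_active: AtomicBool,
    /// The version each registered thread is currently operating in.
    thread_versions: Vec<AtomicU64>,
}

impl VersionTracker {
    pub fn new(threads: usize) -> Self {
        Self {
            snapshot_version: AtomicU64::new(0),
            snapshot_active: AtomicBool::new(false),
            thread_versions: (0..threads).map(|_| AtomicU64::new(0)).collect(),
        }
    }

    /// Starts a snapshot of the current version `v`.
    pub fn begin_snapshot(&self) {
        self.snapshot_active.store(true, Ordering::SeqCst);
    }

    /// Called by a thread when it moves on to version `v + 1`.
    pub fn advance_thread(&self, thread: usize) {
        let next = self.snapshot_version.load(Ordering::SeqCst) + 1;
        self.thread_versions[thread].store(next, Ordering::SeqCst);
    }

    /// Finishes the snapshot and makes `v + 1` the current version.
    pub fn end_snapshot(&self) {
        self.snapshot_version.fetch_add(1, Ordering::SeqCst);
        self.snapshot_active.store(false, Ordering::SeqCst);
    }

    /// True only while a snapshot is running and every thread is in `v + 1`.
    pub fn all_threads_in_next_version(&self) -> bool {
        if !self.snapshot_active.load(Ordering::SeqCst) {
            return false; // no ongoing snapshot
        }
        let next = self.snapshot_version.load(Ordering::SeqCst) + 1;
        self.thread_versions
            .iter()
            .all(|v| v.load(Ordering::SeqCst) >= next)
    }
}
```

In this model the check turns true as soon as the last thread has advanced, which can be well before `end_snapshot` runs. That gap is the window an external coordinator would be waiting for.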
Notes:
- The snapshot file path passed to cpr_snapshot must be different from the path used by any concurrent recovery.
- For more on the snapshot/recovery design, see doc/snapshot-recovery.md.
PRs are accepted and preferred over feature requests. Feel free to reach out if you have any design questions.
Developer Guide
Building
Prerequisite
Bf-Tree supports Linux, Windows, and macOS, although only a recent version of Linux is rigorously tested. Bf-Tree is written in Rust, which you can install here.
Please install the pre-commit hooks so that your code is formatted and linted the same way as the rest of the project. The coding style is enforced in CI; these hooks act as a pre-filter.
# If on Ubuntu
&&
Build
Testing
Unit Tests
Shuttle Tests
Concurrent systems are nondeterministic and subject to an exponential number of possible thread interleavings. We use shuttle to deterministically and systematically explore different interleavings and uncover bugs caused by subtle multithreaded interactions.
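To illustrate why systematic exploration matters, the sketch below exhaustively enumerates every interleaving of a toy two-thread program in which each thread performs a non-atomic increment (a load followed by a store). It uses only the standard library and does not use shuttle itself; the function names are made up for this example.

```rust
/// One step of a toy thread: read the shared counter, or write back (read + 1).
#[derive(Clone, Copy)]
pub enum Step {
    Load,
    Store,
}

/// Runs one schedule (a sequence of thread ids) and returns the final counter.
pub fn run_schedule(schedule: &[usize]) -> u32 {
    let program = [Step::Load, Step::Store];
    let mut counter = 0u32;
    let mut pc = [0usize; 2]; // next step to run, per thread
    let mut local = [0u32; 2]; // value each thread last read
    for &t in schedule {
        match program[pc[t]] {
            Step::Load => local[t] = counter,
            Step::Store => counter = local[t] + 1,
        }
        pc[t] += 1;
    }
    counter
}

/// Enumerates every interleaving of two threads with `steps` steps each.
pub fn all_schedules(steps: usize) -> Vec<Vec<usize>> {
    fn go(left: [usize; 2], prefix: &mut Vec<usize>, out: &mut Vec<Vec<usize>>) {
        if left == [0, 0] {
            out.push(prefix.clone());
            return;
        }
        for t in 0..2 {
            if left[t] > 0 {
                let mut next = left;
                next[t] -= 1;
                prefix.push(t);
                go(next, prefix, out);
                prefix.pop();
            }
        }
    }
    let mut out = Vec::new();
    go([steps, steps], &mut Vec::new(), &mut out);
    out
}

/// Returns the schedules under which one of the two increments is lost.
pub fn find_lost_updates() -> Vec<Vec<usize>> {
    all_schedules(2)
        .into_iter()
        .filter(|s| run_schedule(s) != 2)
        .collect()
}
```

Even this four-step program has six interleavings, four of which lose an update. Random stress testing may or may not hit them, whereas a deterministic scheduler can both find a failing schedule and replay it.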
# Core Bf-tree concurrent operations (~5 minutes)
# CPR snapshot with disk-backed storage (fast, < 1s)
# CPR snapshot with cache-only (in-memory) storage
To replay a specific failing schedule (generated automatically on failure into target/schedule000.txt):
Fuzz Tests
Fuzz testing is a bug-finding technique that feeds randomly generated inputs to a system and checks for crashes. Bf-Tree uses fuzzing to generate random operation sequences (e.g., insert, read, scan) and check that no sequence crashes the system or leads to an inconsistent state. Check the fuzz folder for more details.
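The general shape of operation-sequence fuzzing can be sketched as follows: generate a random sequence of operations from a seed, replay it against both the system under test and a simple reference model, and report the first divergence. This is an illustration only, not the harness in the fuzz folder. `SortedVecIndex` is a toy stand-in for the real index, and every name below is hypothetical.

```rust
use std::collections::BTreeMap;

/// Operations the toy fuzzer may generate.
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    Insert(u8, u8),
    Read(u8),
    Scan(u8, u8), // inclusive start key, maximum number of records
}

/// Toy system under test: a sorted Vec standing in for the real index.
#[derive(Default)]
pub struct SortedVecIndex {
    entries: Vec<(u8, u8)>,
}

impl SortedVecIndex {
    pub fn insert(&mut self, k: u8, v: u8) {
        match self.entries.binary_search_by_key(&k, |e| e.0) {
            Ok(i) => self.entries[i].1 = v,
            Err(i) => self.entries.insert(i, (k, v)),
        }
    }

    pub fn read(&self, k: u8) -> Option<u8> {
        let found = self.entries.binary_search_by_key(&k, |e| e.0).ok();
        found.map(|i| self.entries[i].1)
    }

    pub fn scan(&self, start: u8, count: usize) -> Vec<(u8, u8)> {
        let from = self.entries.partition_point(|e| e.0 < start);
        self.entries[from..].iter().take(count).copied().collect()
    }
}

/// Small deterministic xorshift generator so a failure can be replayed by seed.
pub struct Rng(u64);

impl Rng {
    pub fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Generates `len` pseudo-random operations from `seed`.
pub fn random_ops(seed: u64, len: usize) -> Vec<Op> {
    let mut rng = Rng(seed | 1); // xorshift state must be non-zero
    (0..len)
        .map(|_| {
            let (a, b) = (rng.next_u64() as u8, rng.next_u64() as u8);
            match rng.next_u64() % 3 {
                0 => Op::Insert(a, b),
                1 => Op::Read(a),
                _ => Op::Scan(a, b % 8),
            }
        })
        .collect()
}

/// Replays `ops` on both the index and a BTreeMap reference model and
/// returns the position of the first diverging operation, if any.
pub fn check(ops: &[Op]) -> Option<usize> {
    let mut index = SortedVecIndex::default();
    let mut model: BTreeMap<u8, u8> = BTreeMap::new();
    for (i, op) in ops.iter().enumerate() {
        let agree = match *op {
            Op::Insert(k, v) => {
                index.insert(k, v);
                model.insert(k, v);
                true
            }
            Op::Read(k) => index.read(k) == model.get(&k).copied(),
            Op::Scan(k, n) => {
                let expected: Vec<(u8, u8)> = model
                    .range(k..)
                    .take(n as usize)
                    .map(|(k, v)| (*k, *v))
                    .collect();
                index.scan(k, n as usize) == expected
            }
        };
        if !agree {
            return Some(i);
        }
    }
    None
}
```

Keeping the generator deterministic per seed means any diverging sequence can be regenerated and minimized later, which is the property that makes fuzz failures actionable.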
Benchmarking
Check the benchmark folder for more details.
More advanced benchmarking, with metrics collection, NUMA-node binding, huge pages, etc.:
Code of Conduct
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Contributing
Please see CONTRIBUTING.md.
Security
See SECURITY.md for security reporting details.
Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party’s policies.