🪵🪵🪵 Raft is not yet good enough. This project intends to improve raft as the next-generation consensus protocol for distributed data storage systems (SQL, NoSQL, KV, Streaming, Graph … or maybe something more exotic).
Currently, openraft is the consensus engine of the meta-service cluster in databend.
- Get started: The guide is the best place to get started, followed by the docs for more in-depth details.
- Openraft FAQ explains some common questions.
- 🙌 Questions? Join the Discord channel or start a discussion.
- Openraft is derived from async-raft with several bugs fixed: Fixed bugs.
Status
- Openraft API is not stable yet. Before 1.0.0, an upgrade may contain incompatible changes. Check our change-log. A commit message starts with a keyword to indicate the modification type of the commit:
  - Change: if it introduces incompatible changes.
  - Feature: if it introduces compatible non-breaking new features.
  - Fix: if it just fixes a bug.
- Branch main has been under active development. The main branch is for the 0.8 release.
  - The features are almost complete for building an application.
  - The performance isn’t yet fully optimized. Currently, it’s about 48,000 writes per second with a single writer.
  - Unit test coverage is 91%.
  - The chaos test is not yet done.
- Branch release-0.8: Latest published: v0.8.0 | Change log v0.8.0 | ⬆️ 0.7 to 0.8 upgrade guide
- Branch release-0.7: Latest published: v0.7.4 | Change log v0.7.4 | ⬆️ 0.6 to 0.7 upgrade guide | release-0.7 accepts only bug fixes, no new features.
- Branch release-0.6: Latest published: v0.6.8 | Change log v0.6 | release-0.6 accepts only bug fixes, no new features.
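The commit keywords listed under Status can be checked mechanically, for example when scanning a change log before an upgrade. The following is a minimal sketch, not part of openraft: the names `CommitKind` and `classify` and the sample messages are hypothetical, and it assumes the keyword appears at the very start of the subject line.

```rust
/// Modification types signalled by the leading keyword of a commit message.
/// (Hypothetical helper type, not an openraft API.)
#[derive(Debug, PartialEq, Eq)]
enum CommitKind {
    Change,  // incompatible change
    Feature, // compatible, non-breaking new feature
    Fix,     // bug fix only
    Other,   // no recognized keyword
}

/// Classify a commit by the keyword that starts its subject line.
fn classify(message: &str) -> CommitKind {
    // Only the first line (the subject) is inspected.
    let subject = message.lines().next().unwrap_or("").trim_start();
    if subject.starts_with("Change:") {
        CommitKind::Change
    } else if subject.starts_with("Feature:") {
        CommitKind::Feature
    } else if subject.starts_with("Fix:") {
        CommitKind::Fix
    } else {
        CommitKind::Other
    }
}

fn main() {
    // Made-up commit subjects, for illustration only.
    let log = [
        "Change: rename a public field",
        "Feature: add a new metric",
        "Fix: handle an empty log",
    ];
    for msg in log {
        println!("{:?} <- {}", classify(msg), msg);
    }
}
```

With this kind of check, any commit classified as `Change` between two versions marks an upgrade that may need code changes.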
Roadmap
- 2022-10-31 Extended joint membership
- 2023-02-14 Minimize the conflict rate when electing; see: Openraft Vote design; or use standard raft mode with feature flag single-term-leader.
- Reduce the complexity of vote and pre-vote: get rid of pre-vote RPC;
- Support flexible quorum, e.g.: Hierarchical Quorums
- Consider introducing read-quorum and write-quorum to improve efficiency in a cluster with an even number of nodes.
- Goal performance is 1,000,000 put/sec. Bench history:
  - 2022 Jul 01: 41,000 put/sec; 23,255 ns/op;
  - 2022 Jul 07: 43,000 put/sec; 23,218 ns/op; Use Progress to track replication.
  - 2022 Jul 09: 45,000 put/sec; 21,784 ns/op; Batch purge applied log
  - 2023 Feb 28: 48,000 put/sec; 20,558 ns/op;

  Run the benchmark: make bench_cluster_of_3

  Benchmark setting:
  - No network.
  - In memory store.
  - A cluster of 3 nodes on one server.
  - Single client.
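The bench history reports both a throughput (put/sec) and a per-operation latency (ns/op). Assuming a single client that issues operations one after another, the two are related by put/sec ≈ 10^9 / (ns/op). The sketch below applies that conversion to the ns/op values listed above; `ops_per_sec` is a hypothetical helper, not part of openraft, and the published put/sec figures are rounded, so they may not match the computed values exactly.

```rust
/// Convert a per-operation latency in nanoseconds into operations per second,
/// assuming operations run strictly one after another.
fn ops_per_sec(ns_per_op: f64) -> f64 {
    1_000_000_000.0 / ns_per_op
}

fn main() {
    // (date, ns/op) pairs copied from the bench history.
    let history = [
        ("2022 Jul 01", 23_255.0),
        ("2022 Jul 07", 23_218.0),
        ("2022 Jul 09", 21_784.0),
        ("2023 Feb 28", 20_558.0),
    ];
    for (date, ns) in history {
        println!("{}: {:.0} put/sec", date, ops_per_sec(ns));
    }

    // Per-operation budget implied by the 1,000,000 put/sec goal.
    let goal_put_per_sec = 1_000_000.0;
    println!("goal budget: {:.0} ns/op", 1_000_000_000.0 / goal_put_per_sec);
}
```

Under the same assumption, the 1,000,000 put/sec goal corresponds to a budget of about 1,000 ns per operation, roughly a 20x reduction from the latest listed latency.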
 
Features
- It is fully reactive and embraces the async ecosystem. It is driven by actual Raft events taking place in the system as opposed to being driven by a tick operation. Batching of messages during replication is still used whenever possible for maximum throughput.
- Storage and network integration is well defined via two traits, RaftStorage and RaftNetwork. This gives applications maximum flexibility in choosing their storage and networking mediums.
- All interaction with the Raft node is well defined via a single public Raft type, which is used to spawn the Raft async task and to interact with that task. The API for this system is clear and concise.
- Log replication is fully pipelined and batched for optimal performance. Log replication also uses a congestion control mechanism to help keep nodes up-to-date as efficiently as possible.
- It fully supports dynamic cluster membership changes with joint config. The buggy single-step membership change algorithm is not considered. See the dynamic membership chapter in the guide.
- Details on initial cluster formation, and how to effectively do so from an application’s perspective, are discussed in the cluster formation chapter in the guide.
- Automatic log compaction with snapshots, as well as snapshot streaming from the leader node to follower nodes, is fully supported and configurable.
- The entire code base is instrumented with tracing. This can be used for standard logging or for distributed tracing, and the verbosity can be statically configured at compile time to completely remove all instrumentation below the configured level.
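To illustrate what a joint config means for membership changes: in the joint-consensus approach from the Raft paper, a decision made while two configs are in effect needs a majority in each of them, not just in their union. The following is a minimal sketch of that rule under stated assumptions. `NodeId`, `is_majority`, and `is_joint_quorum` are hypothetical names, and this is not openraft's membership API.

```rust
use std::collections::BTreeSet;

// Hypothetical node identifier type for this sketch.
type NodeId = u64;

/// True if `granted` contains a strict majority of the nodes in `config`.
fn is_majority(config: &BTreeSet<NodeId>, granted: &BTreeSet<NodeId>) -> bool {
    let votes = config.intersection(granted).count();
    votes > config.len() / 2
}

/// True if `granted` is a majority in every config of a joint membership.
fn is_joint_quorum(configs: &[BTreeSet<NodeId>], granted: &BTreeSet<NodeId>) -> bool {
    configs.iter().all(|c| is_majority(c, granted))
}

fn main() {
    // Changing membership from {1, 2, 3} to {3, 4, 5}.
    let joint = [BTreeSet::from([1, 2, 3]), BTreeSet::from([3, 4, 5])];

    // All of the old config, but only one node of the new config.
    let old_only = BTreeSet::from([1, 2, 3]);
    // Two nodes from each config.
    let both = BTreeSet::from([2, 3, 4]);

    println!("old_only is a joint quorum: {}", is_joint_quorum(&joint, &old_only));
    println!("both is a joint quorum: {}", is_joint_quorum(&joint, &both));
}
```

Requiring a majority in every config is what prevents the old and the new membership from making conflicting decisions independently during the transition.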
Who uses it
Contributing
Check out the CONTRIBUTING.md guide for more details on getting started with contributing to this project.
License
Openraft is licensed under the terms of the MIT License or the Apache License 2.0, at your choosing.
Feature flags
- bench: Enables benchmarks in unit tests. Benchmarks in openraft depend on the unstable feature test, so they cannot be used with stable Rust. To run the benchmarks with a stable toolchain, the unstable features have to be enabled explicitly with the environment variable RUSTC_BOOTSTRAP=1.
- bt: Enables backtraces: generates a backtrace for errors. This requires the unstable features error_generic_member_access and provide_any, so it cannot be used with stable Rust.
- serde: Adds serde::Serialize and serde::Deserialize bounds to data types. Enable it if you’d like to use serde to serialize messages.
Re-exports
- pub use anyerror;
- pub use async_trait;
- pub use crate::network::RPCTypes;
- pub use crate::network::RaftNetwork;
- pub use crate::network::RaftNetworkFactory;
- pub use crate::raft::Raft;
- pub use crate::raft::RaftTypeConfig;
- pub use crate::storage::LogState;
- pub use crate::storage::RaftLogReader;
- pub use crate::storage::RaftSnapshotBuilder;
- pub use crate::storage::RaftStorage;
- pub use crate::storage::RaftStorageDebug;
- pub use crate::storage::Snapshot;
- pub use crate::storage::SnapshotMeta;
Modules
- Error types exposed by this crate.
- Raft metrics for observability.
- The Raft network interface.
- Public Raft interface and data types.
- The Raft storage interface and data types.
Macros
- Define types for a Raft type configuration.
Structs
- AnyError is a serializable wrapper Error.
- Minimal node information.
- The runtime configuration for a Raft node.
- An error that occurs when the RaftStore impl runs a defensive check of input or output. E.g. re-applying a log entry is a violation that may indicate a bug.
- The currently active membership config.
- Empty Node.
- A Raft log entry.
- LeaderId is the identifier of a leader.
- The identity of a raft log. A term, node_id and an index identify a log globally.
- The membership configuration of the cluster.
- The state of membership configs a raft node needs to know.
- A set of metrics describing the current state of a Raft node.
- A struct used to represent the raft state which a Raft node needs.
- The identity of a segment of a snapshot.
- StorageHelper provides additional methods to access a RaftStorage implementation.
- Error that occurs when operating the store.
- Extended store backed by another impl.
- This struct represents information about a membership config that has already been stored in the raft logs.
- Vote represents the privilege of a node.
Enums
- Error variants related to configuration.
- Log entry payload variants.
- What it is doing when an error occurs.
- All possible states of a Raft node.
- Log compaction and snapshot policy.
- A storage error could be either a defensive check error or an error that occurred when doing the actual io operation.
- Violations a store would return when running defensive check.
Traits
- Defines methods of defensive checks for RaftStorage.
- Defines methods of defensive checks for RaftStorage independent of the storage type.
- Defines operations on an entry payload.
- Converts an error to StorageError::IO().
- A wrapper that extends the APIs of a base RaftStore.
Type Definitions
- The unique identifier of a leader that is already granted by a quorum in phase-1 (voting).