Expand description
🪵🪵🪵 Raft is not yet good enough. This project intends to improve raft as the next-generation consensus protocol for distributed data storage systems (SQL, NoSQL, KV, Streaming, Graph … or maybe something more exotic).
Currently, openraft is the consensus engine of meta-service cluster in databend.
-
Get started: The guide is the best place to get started, followed by the docs for more in-depth details.
-
Openraft FAQ explains some common questions.
-
🙌 Questions? Join the Discord channel or start a discussion.
-
Openraft is derived from async-raft with several bugs fixed: Fixed bugs.
Status
-
Openraft API is not stable yet. Before
1.0.0
, an upgrade may contain incompatible changes. Check our change-log. A commit message starts with a keyword to indicate the modification type of the commit:DataChange:
on-disk data types changes, which may require manual upgrade.Change:
if it introduces incompatible changes.Feature:
if it introduces compatible non-breaking new features.Fix:
if it just fixes a bug.
-
Branch main has been under active development.
The main branch is for the 0.8 release.
- The features are almost complete for building an application.
- The performance isn’t yet fully optimized. Currently, it’s about 48,000 writes per second with a single writer.
- Unit test coverage is 91%.
- The chaos test is not yet done.
-
Branch release-0.8: Latest published: v0.8.3 | Change log v0.8.3 | ⬆️ 0.7 to 0.8 upgrade guide |
-
Branch release-0.7: Latest published: v0.7.6 | Change log v0.7.6 | ⬆️ 0.6 to 0.7 upgrade guide |
release-0.7
Won’t accept new features but only bug fixes. -
Branch release-0.6: Latest published: v0.6.8 | Change log v0.6 |
release-0.6
won’t accept new features but only bug fixes.
Roadmap
-
2022-10-31 Extended joint membership
-
2023-02-14 Minimize confliction rate when electing; See: Openraft Vote design; Or use standard raft mode with feature flag
single-term-leader
. -
2023-04-26 Goal performance is 1,000,000 put/sec.
-
Reduce the complexity of vote and pre-vote: get rid of pre-vote RPC;
-
Support flexible quorum, e.g.:Hierarchical Quorums
-
Consider introducing read-quorum and write-quorum, improve efficiency with a cluster with an even number of nodes.
Performance
The benchmark is focused on the Openraft framework itself and is run on a minimized store and network. This is NOT a real world application benchmark!!!
Benchmark history:
Date | clients | put/s | ns/op | Changes |
---|---|---|---|---|
2023-04-26 | 256 | 1,014,000 | 985 | |
2023-04-25 | 64 | 730,000 | 1,369 | Split channels |
2023-04-24 | 64 | 652,000 | 1,532 | Reduce metrics report rate |
2023-04-23 | 64 | 467,000 | 2,139 | State-machine moved to separate task |
1 | 70,000 | 14,273 | ||
2023-02-28 | 1 | 48,000 | 20,558 | |
2022-07-09 | 1 | 45,000 | 21,784 | Batch purge applied log |
2022-07-07 | 1 | 43,000 | 23,218 | Use Progress to track replication |
2022-07-01 | 1 | 41,000 | 23,255 |
To access the benchmark, go to the ./cluster_benchmark
folder and run make bench_cluster_of_3
.
The benchmark is carried out with varying numbers of clients because:
- The
1 client
benchmark shows the average latency to commit each log. - The
64 client
benchmark shows the maximum throughput.
The benchmark is conducted with the following settings:
- No network.
- In-memory store.
- A cluster of 3 nodes in a single process on a Mac M1-Max laptop.
- Request: empty
- Response: empty
Features
-
It is fully reactive and embraces the async ecosystem. It is driven by actual Raft events taking place in the system as opposed to being driven by a
tick
operation. Batching of messages during replication is still used whenever possible for maximum throughput. -
Storage and network integration is well defined via two traits
RaftStorage
&RaftNetwork
. This provides applications maximum flexibility in being able to choose their storage and networking mediums. -
All interaction with the Raft node is well defined via a single public
Raft
type, which is used to spawn the Raft async task, and to interact with that task. The API for this system is clear and concise. -
Log replication is fully pipelined and batched for optimal performance. Log replication also uses a congestion control mechanism to help keep nodes up-to-date as efficiently as possible.
-
It fully supports dynamic cluster membership changes with joint config. The buggy single-step membership change algo is not considered. See the
dynamic membership
chapter in the guide. -
Details on initial cluster formation, and how to effectively do so from an application’s perspective, are discussed in the cluster formation chapter in the guide.
-
Automatic log compaction with snapshots, as well as snapshot streaming from the leader node to follower nodes is fully supported and configurable.
-
The entire code base is instrumented with tracing. This can be used for standard logging, or for distributed tracing, and the verbosity can be statically configured at compile time to completely remove all instrumentation below the configured level.
Who use it
Contributing
Check out the CONTRIBUTING.md guide for more details on getting started with contributing to this project.
License
Openraft is licensed under the terms of the MIT License or the Apache License 2.0, at your choosing.
Feature flags
-
bench
: Enables benchmarks in unittest. Benchmark in openraft depends on the unstable featuretest
thus it can not be used with stable rust. In order to run the benchmark with stable toolchain, the unstable features have to be enabled explicitly with environment variableRUSTC_BOOTSTRAP=1
. -
bt
: Enable backtrace: generate backtrace for errors. This requires unstable featureerror_generic_member_access
thus it can not be used with stable rust. -
serde
: Add serde::Serialize and serde:Deserialize bound to data types. If you’d like to useserde
to serialize messages.
Modules
- This mod is a upgrade helper that provides functionalities for a newer openraft application to read data written by an older application.
- Error types exposed by this crate.
- This mod defines the identity of a raft log and provides supporting utilities to work with log id related types.
- Raft metrics for observability.
- The Raft network interface.
- Public Raft interface and data types.
- The Raft storage interface and data types.
- Openraft Document
Re-exports
pub use network::RPCTypes;
pub use network::RaftNetwork;
pub use network::RaftNetworkFactory;
Enums
- Defines various actions to change the membership, including adding or removing learners or voters.
Structs
- The runtime configuration for a Raft node.
Enums
- Error variants related to configuration.
- Log compaction and snapshot policy.
- All possible states of a Raft node.
Re-exports
pub use crate::entry::Entry;
pub use crate::entry::EntryPayload;
pub use crate::log_id::LogId;
pub use crate::log_id::LogIdOptionExt;
pub use crate::log_id::LogIndexOptionExt;
pub use crate::log_id::RaftLogId;
Structs
- The currently active membership config.
- The membership configuration of the cluster.
- This struct represents information about a membership config that has already been stored in the raft logs.
Re-exports
pub use crate::metrics::RaftMetrics;
Structs
- An implementation of trait
Node
that contains minimal node information. - EmptyNode is an implementation of trait
Node
that contains nothing.
Traits
- A Raft
Node
, this trait holds all relevant node information. - A Raft node’s ID.
Re-exports
pub use crate::raft::Raft;
pub use crate::raft::RaftTypeConfig;
Structs
- The state of membership configs a raft node needs to know.
- A struct used to represent the raft state which a Raft node needs.
Type Aliases
- Id of a snapshot stream.
Structs
- The identity of a segment of a snapshot.
Re-exports
pub use crate::storage::LogState;
pub use crate::storage::RaftLogReader;
pub use crate::storage::RaftSnapshotBuilder;
pub use crate::storage::RaftStorage;
pub use crate::storage::Snapshot;
pub use crate::storage::SnapshotMeta;
pub use crate::storage::StorageHelper;
Structs
- An error that occurs when the RaftStore impl runs defensive check of input or output. E.g. re-applying an log entry is a violation that may be a potential bug.
Enums
- What it is doing when an error occurs.
- A storage error could be either a defensive check error or an error occurred when doing the actual io operation.
Structs
- Error that occurs when operating the store.
Traits
- Convert error to StorageError::IO();
Enums
- Violations a store would return when running defensive check.
Traits
- Similar to
AsRef<T>
, it does a cheap reference to reference conversion, but with the ability to returnNone
if it is unable to perform the conversion to&T
.
Structs
Vote
represent the privilege of a node.
Traits
- A trait defining application specific data.
- A trait defining application specific response data.
Macros
- Define types for a Raft type configuration.
Re-exports
Structs
- AnyError is a serializable wrapper
Error
.
Re-exports
pub use async_trait;