`redhac` is derived from "Rust Embedded Distributed Highly Available Cache".

The keywords "embedded" and "distributed" may seem a bit contradictory at first. The idea of `redhac` is to provide a caching library which can be embedded into any Rust application, while still providing the ability to build a distributed HA caching layer.

It can be used as a cache for a single instance too, of course, but it will not be the most performant choice in that case, since it needs to clone most values to be able to send them over the network concurrently. If you need a distributed cache, however, you might give `redhac` a try.
The underlying system works with quorum and elects a leader dynamically on startup. Each node starts a server and a client part for bi-directional gRPC streaming, and each client connects to each of the other servers. This is the reason why performance degrades if you scale up replicas without further optimization. You can go higher, but the optimal number of HA nodes is 3.
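To make the scaling cost concrete, here is a back-of-the-envelope sketch (the function is illustrative only and not part of the `redhac` API): since every node's client part connects to every other node's server part, the number of client connections grows quadratically with the cluster size.

```rust
// Illustration only - not part of the redhac API.
// Every node runs one client connection to each of the other `n - 1` servers.
fn client_connections(n: usize) -> usize {
    n * (n - 1)
}

fn main() {
    assert_eq!(client_connections(3), 6);  // the recommended cluster size
    assert_eq!(client_connections(7), 42); // why scaling up degrades performance
}
```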
§Single Instance
```rust
// These 3 are needed for the `cache_get` macro
use redhac::{cache_get, cache_get_from, cache_get_value};
use redhac::{cache_put, CacheConfig, SizedCache};

#[tokio::main]
async fn main() {
    let (_, mut cache_config) = CacheConfig::new();

    // The cache name is used to reference the cache later on
    let cache_name = "my_cache";
    // We need to spawn a global handler for each cache instance.
    // Communication is done over channels.
    cache_config.spawn_cache(cache_name.to_string(), SizedCache::with_size(16), None);

    // Cache keys can only be `String`s at the time of writing.
    let key = "myKey";
    // The value you want to cache must implement `serde::Serialize`.
    // The serialization of the values is done with `bincode`.
    let value = "myCacheValue".to_string();

    // At this point, we need cloned values to make everything work nicely with networked
    // connections. If you only ever need a local cache, you might be better off using the
    // `cached` crate directly and using references whenever possible.
    cache_put(cache_name.to_string(), key.to_string(), &cache_config, &value)
        .await
        .unwrap();

    let res = cache_get!(
        // The type we want to deserialize the value into
        String,
        // The cache name from above. We can start as many caches for our application as we like
        cache_name.to_string(),
        // For retrieving values, the same as above is true - we need real `String`s
        key.to_string(),
        // All our caches have the necessary information added to the cache config. Here a
        // reference is totally fine.
        &cache_config,
        // This does not really apply to this single instance example. If we had started
        // a HA cache layer, we could do remote lookups for a value we cannot find locally
        // if this was set to `true`.
        false
    )
    .await
    .unwrap();

    assert!(res.is_some());
    assert_eq!(res.unwrap(), value);
}
```
§High Availability
High Availability (HA) works in such a way that each cache member connects to every other member. When quorum is eventually reached, a leader is elected, which is then responsible for all cache modifications in order to prevent collisions (unless you decide against that with a direct `cache_put`).

Since each node connects to every other one, you cannot just scale up the cache layer infinitely. The ideal number of nodes is 3. You can scale this number up, for instance to 5 or 7, if you like, but this has not been tested in greater detail so far.

Write performance will degrade the more nodes you add to the cluster, since you simply need to wait for more Acks from the other members. Read performance, however, should stay the same. Each node keeps a local copy of each value inside the cache (unless it has lost the connection or joined the cluster at some later point), which means that in most cases reads do not require any remote network access.
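As a rough illustration of this trade-off, assuming the usual majority quorum (the crate does not spell the formula out, so treat this as a sketch, not its actual internals):

```rust
// Hypothetical helper - not part of the redhac API.
// The smallest majority for a cluster of `n` nodes.
fn quorum_size(n: usize) -> usize {
    n / 2 + 1
}

fn main() {
    // A 3-node cluster tolerates 1 lost node and a quorum-level write waits for 2 acks.
    assert_eq!(quorum_size(3), 2);
    // A 5-node cluster tolerates 2 lost nodes, but each quorum-level write waits for 3 acks.
    assert_eq!(quorum_size(5), 3);
}
```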
§Configuration
The way to configure the `HA_MODE` is optimized for a Kubernetes deployment, but may seem a bit odd if you deploy somewhere else. You can either provide a `.env` file and use it as a config file, or just set these variables in the environment directly. You need to set the following values:
§HA_MODE
The first one is easy: just set `HA_MODE=true`.
§HA_HOSTS
NOTE:
In a few of the examples below, the name used for deployments may be `rauthy`. The reason is that this crate was originally written to complement another project of mine, Rauthy (link will follow), which is an OIDC provider and single sign-on solution written in Rust.

The `HA_HOSTS` variable works in a way that makes it really easy to configure inside Kubernetes, as long as a StatefulSet is used for the deployment. A cache node finds its members via the `HA_HOSTS` variable and its own `HOSTNAME`. In `HA_HOSTS`, add every cache member. For instance, if you want to use 3 replicas in HA mode, running and deployed as a StatefulSet with the name `rauthy` again:
HA_HOSTS="http://rauthy-0:8000, http://rauthy-1:8000 ,http://rauthy-2:8000"
The way it works:

- A node gets its own hostname from the OS.
  This is the reason why you use a StatefulSet for the deployment, even without any volumes attached. For a StatefulSet called `rauthy`, the replicas will always have the names `rauthy-0`, `rauthy-1`, ..., which are at the same time the hostnames inside the pod.
- Find "me" inside the `HA_HOSTS` variable (see the sketch after the note below).
  If the hostname cannot be found in the `HA_HOSTS`, the application will panic and exit because of a misconfiguration.
- Use the port from the "me" entry that was found for the server part.
  This means you do not need to specify the port in another variable, which eliminates the risk of inconsistencies or a bad config in that case.
- Extract "me" from the `HA_HOSTS`, then take the leftover nodes as all cache members and connect to them.
- Once a quorum has been reached, a leader will be elected.
  From that point on, the cache will start accepting requests.
- If the leader is lost, a new one will be elected. No values will be lost.
- If quorum is lost, the cache will be invalidated.
  This happens for security reasons, to prevent cache inconsistencies. It is better to invalidate the cache and fetch the values fresh from the DB or other cache members than to work with possibly invalid values, which is especially true in an authn / authz situation.
NOTE:
If you are in an environment where the described mechanism of extracting the hostname would not work, you can set the `HOSTNAME_OVERWRITE` for each instance to match one of the `HA_HOSTS` entries, or you can overwrite the name when using `redhac::start_cluster`.
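A minimal sketch of the lookup described above, under the assumption of simple substring matching (hypothetical code, not the actual `redhac` implementation):

```rust
// Hypothetical sketch - not the actual redhac implementation.
fn find_me(ha_hosts: &str, hostname: &str) -> Option<(String, Vec<String>)> {
    let hosts: Vec<String> = ha_hosts.split(',').map(|h| h.trim().to_string()).collect();
    // "me" is the entry whose host part matches the OS hostname
    let me = hosts.iter().find(|h| h.contains(hostname))?.clone();
    // the leftover entries are the remote members this node connects to
    let members = hosts.into_iter().filter(|h| *h != me).collect();
    Some((me, members))
}

fn main() {
    // For the pod named `rauthy-1` inside the StatefulSet:
    let (me, members) = find_me(
        "http://rauthy-0:8000, http://rauthy-1:8000, http://rauthy-2:8000",
        "rauthy-1",
    )
    .unwrap();
    assert_eq!(me, "http://rauthy-1:8000");
    assert_eq!(members, vec!["http://rauthy-0:8000", "http://rauthy-2:8000"]);
}
```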
§CACHE_AUTH_TOKEN
You need to set a secret for the `CACHE_AUTH_TOKEN`, which is then used for authenticating cache members.
§TLS
For the sake of this example, we will not dig into TLS and will disable it, which can be done with `CACHE_TLS=false`.

You can add your TLS certificates in PEM format, plus an optional root CA. This is true for the server and the client part separately, which means you can configure the cache layer to use mTLS connections.
§Reference Config
The following variables are the ones you can use to configure `redhac` via env vars. At the time of writing, the configuration can only be done via the env.
```
# If the cache should start in HA mode or standalone
# accepts 'true|false', defaults to 'false'
HA_MODE=true

# The connection strings (with hostnames) of the HA instances as a CSV
# Format: 'scheme://hostname:port'
HA_HOSTS="http://redhac.redhac:8080, http://redhac.redhac:8180 ,http://redhac.redhac:8280"

# This can overwrite the hostname which is used to identify each cache member.
# Useful in scenarios where all members are on the same host or for testing.
# You need to add the port, since `redhac` will do an exact match to find "me".
#HOSTNAME_OVERWRITE="127.0.0.1:8080"

# Enable / disable TLS for the cache communication (default: true)
CACHE_TLS=true

# The path to the server TLS certificate PEM file (default: tls/redhac.cert-chain.pem)
CACHE_TLS_SERVER_CERT=tls/redhac.cert-chain.pem
# The path to the server TLS key PEM file (default: tls/redhac.key.pem)
CACHE_TLS_SERVER_KEY=tls/redhac.key.pem

# The path to the client mTLS certificate PEM file. This is optional.
CACHE_TLS_CLIENT_CERT=tls/redhac.local.cert.pem
# The path to the client mTLS key PEM file. This is optional.
CACHE_TLS_CLIENT_KEY=tls/redhac.local.key.pem

# If not empty, the PEM file from the specified location will be added as the CA
# certificate chain for validating the server's TLS certificate. This is optional.
CACHE_TLS_CA_SERVER=tls/ca-chain.cert.pem
# If not empty, the PEM file from the specified location will be added as the CA
# certificate chain for validating the client's mTLS certificate. This is optional.
CACHE_TLS_CA_CLIENT=tls/ca-chain.cert.pem

# The domain / CN the client should validate the certificate against. This domain MUST
# be inside the 'X509v3 Subject Alternative Name' when you take a look at the server's
# certificate with the openssl tool.
# default: redhac.local
CACHE_TLS_CLIENT_VALIDATE_DOMAIN=redhac.local

# Can be used if you need to overwrite the SNI when the client connects to the server,
# for instance if you are behind a load balancer which combines multiple certificates.
# (default: "")
#CACHE_TLS_SNI_OVERWRITE=

# Define different buffer sizes for the channels between the components
# Buffer for client requests on the incoming stream - server side (default: 128)
# It makes sense to have CACHE_BUF_SERVER set to roughly:
# `(number of total HA cache hosts - 1) * CACHE_BUF_CLIENT`
CACHE_BUF_SERVER=128
# Buffer for client requests to remote servers for all cache operations (default: 64)
CACHE_BUF_CLIENT=64

# Secret token, which is used to authenticate the cache members
CACHE_AUTH_TOKEN=SuperSafeSecretToken1337

# Connection timeouts
# The server sends out keepalive pings with the configured timeouts
# The keepalive ping interval in seconds (default: 5)
CACHE_KEEPALIVE_INTERVAL=5
# The keepalive ping timeout in seconds (default: 5)
CACHE_KEEPALIVE_TIMEOUT=5

# The timeout for the leader election. If a newly saved leader request has not reached
# quorum after the timeout, the leader will be reset and a new request will be sent out.
# CAUTION: This should not be below CACHE_RECONNECT_TIMEOUT_UPPER, since cold starts and
# elections will be problematic in that case.
# value in seconds, default: 2
CACHE_ELECTION_TIMEOUT=2

# These 2 values define the reconnect timeout for the HA cache clients.
# The values are in ms, and a random value between the two will be chosen each time to
# avoid conflicts and race conditions (default: 500)
CACHE_RECONNECT_TIMEOUT_LOWER=500
# (default: 2000)
CACHE_RECONNECT_TIMEOUT_UPPER=2000
```
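If you keep these values in a `.env` file, one way to get them into the process environment before the cache starts is the `dotenvy` crate. This is only a sketch under the assumption that you load the file yourself; whether `redhac` picks up a `.env` file on its own is not covered here.

```rust
use dotenvy::dotenv;
use redhac::CacheConfig;

#[tokio::main]
async fn main() {
    // Load the `.env` file into the process environment before anything reads
    // the config (assumption: we load the file ourselves via `dotenvy`).
    dotenv().ok();

    // From here on, `redhac` reads its values from the environment as usual.
    let (_tx_health, _cache_config) = CacheConfig::new();
    // ... spawn caches and call `start_cluster` as shown in the example below
}
```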
§Example
```rust
use std::env;
use std::time::Duration;

use redhac::quorum::{AckLevel, QuorumState};
use redhac::*;
use tokio::time;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // `redhac` is configured via env variables.
    // For the sake of this example, we set the env vars directly inside the code. Usually
    // you would want to configure them from the outside, of course.

    // Enable the HA_MODE
    env::set_var("HA_MODE", "true");

    // Configure the HA cache members. You need an uneven number for quorum. For this
    // example, we will have all of them on the same host on different ports to make it
    // work.
    env::set_var(
        "HA_HOSTS",
        "http://127.0.0.1:7001, http://127.0.0.1:7002, http://127.0.0.1:7003",
    );

    // Disable TLS for this example
    env::set_var("CACHE_TLS", "false");

    // Configure a cache
    let (tx_health_1, mut cache_config_1) = CacheConfig::new();
    let cache_name = "my_cache";
    let cache = SizedCache::with_size(16);
    cache_config_1.spawn_cache(cache_name.to_string(), cache.clone(), None);

    // start the server
    start_cluster(
        tx_health_1,
        &mut cache_config_1,
        // optional notification channel: `Option<mpsc::Sender<CacheNotify>>`
        None,
        // We need to overwrite the hostname, so we can start all nodes on the same host
        // for this example. Usually, this will be set to `None`.
        Some("127.0.0.1:7001".to_string()),
    )
    .await?;
    time::sleep(Duration::from_millis(100)).await;
    println!("First cache node started");

    // Mimic the other 2 cache members. This should usually not be done in the same code -
    // it is only done here to make the example work.
    let (tx_health_2, mut cache_config_2) = CacheConfig::new();
    cache_config_2.spawn_cache(cache_name.to_string(), cache.clone(), None);
    start_cluster(
        tx_health_2,
        &mut cache_config_2,
        None,
        Some("127.0.0.1:7002".to_string()),
    )
    .await?;
    time::sleep(Duration::from_millis(100)).await;
    println!("2nd cache node started");

    // After the 2nd cache member has been started, we already have quorum and a working
    // cache layer. As long as there is no leader and / or quorum, the cache will not save
    // any values to avoid inconsistencies.
    let (tx_health_3, mut cache_config_3) = CacheConfig::new();
    cache_config_3.spawn_cache(cache_name.to_string(), cache.clone(), None);
    start_cluster(
        tx_health_3,
        &mut cache_config_3,
        None,
        Some("127.0.0.1:7003".to_string()),
    )
    .await?;
    time::sleep(Duration::from_millis(100)).await;
    println!("3rd cache node started");

    // For the sake of this example again, we need to wait until the cache is in a healthy
    // state before we can actually insert a value.
    loop {
        // Scope the borrow so it is dropped again before we `.await` - holding a `watch`
        // borrow across an await point would block the sender.
        {
            let health_borrow = cache_config_1.rx_health_state.borrow();
            // The health state may still be `None` right after startup
            if let Some(health) = health_borrow.as_ref() {
                if health.state == QuorumState::Leader || health.state == QuorumState::Follower {
                    break;
                }
            }
        }
        time::sleep(Duration::from_secs(1)).await;
    }

    println!("Cache 1 State: {:?}", cache_config_1.rx_health_state.borrow());
    println!("Cache 2 State: {:?}", cache_config_2.rx_health_state.borrow());
    println!("Cache 3 State: {:?}", cache_config_3.rx_health_state.borrow());
    println!("We are ready to go - insert and get a value");

    let entry = "myKey".to_string();

    // `cache_insert` is the HA version of `cache_put` from the single instance example.
    //
    // `cache_put` does a direct push in HA situations, ignoring the leader. This is way
    // more performant, but may lead to collisions. However, if you are inserting a
    // completely new value with a unique new key, for instance a newly generated UUID, it
    // should be favored, since the same UUID will most likely not be inserted by anyone
    // else.
    //
    // `cache_insert` can be used if you want to avoid conflicts. With the `AckLevel`, you
    // can specify the level of safety you want. For instance, if we use `AckLevel::Quorum`,
    // the Result will not be `Ok(())` until the leader has the ack for the insert from at
    // least "quorum" amount of nodes.
    // Different levels provide different levels of safety and performance - it is always a
    // tradeoff.
    cache_insert(
        cache_name.to_string(),
        entry.clone(),
        &cache_config_1,
        // This is a reference to the value, which must implement `serde::Serialize`
        &1337i32,
        // The `AckLevel` defines the level of safety we want
        AckLevel::Quorum,
    )
    .await?;

    // check the value
    let one = cache_get!(
        i32,
        cache_name.to_string(),
        entry,
        &cache_config_1,
        // In a HA situation, we could set this to `true` if we wanted to do a lookup on
        // another cache member in case we do not find the value locally, for instance if
        // a node joined at a later time or when there was a network partition.
        false
    )
    .await?;
    assert_eq!(one, Some(1337));

    Ok(())
}
```
Re-exports§
- `pub use crate::quorum::AckLevel;`
- `pub use crate::quorum::QuorumHealth;`
- `pub use crate::quorum::QuorumHealthState;`
- `pub use crate::quorum::QuorumState;`
Modules§
- `quorum`
Macros§
- This is a simple macro to get values from the cache and deserialize them properly at the same time.
Structs§
- The `CacheConfig` needs to be initialized at the very beginning of the application and must be given to the `start_cluster` function when running with `HA_MODE` == true.
- The general `CacheError` which is returned by most of the cache wrapper functions.
- The `CacheNotify` will be sent over the optional `tx_notify` for the server, so clients are able to subscribe to updates inside their cache.
- Least Recently Used / Sized Cache
- Cache store bound by time
- Timed LRU Cache
Enums§
- These `CacheMethod`s should be self-explanatory. However, `Insert` and `Remove` are only executed over the leader and routed internally, whereas `Put` and `Del` are direct methods which ignore the leader and trade conflict safety for faster execution speed.
- The enum which will be sent to the "real" cache handler loop in the end.
Functions§
- Deletes a value from the cache. Will return immediately with `Ok(())` if quorum is Bad in `HA_MODE`. Does ignore the quorum health state if the cache is running in HA mode.
- This is the deserializer function for cached values. If you do not need deserialization, you can skip the `cache_get!` macro and only use `cache_get_value` to not deserialize.
- Gets values out of the cache.
- Puts values into the cache.
- Clears all cache entries of each existing cache. This full reset is local and will not be pushed to possible remote HA cache members.
- The main function to start the whole backend when `HA_MODE` == true. It takes care of starting the server and one client for each remote server configured in `HA_HOSTS`, as well as the internal 'quorum_handler'.