Struct HealthCheckConfig

pub struct HealthCheckConfig {
    pub enabled: bool,
    pub server_bind_address: Option<SocketAddr>,
    pub cache_ttl: Duration,
    pub min_connections: Option<usize>,
    pub max_memory_mb: Option<usize>,
    pub max_time_drift_ms: Option<i64>,
    pub max_pending_events: Option<usize>,
}

Comprehensive health check configuration for NodeConfig.

This configuration enables and configures the health check system for a node. When set in NodeConfig, the node will automatically initialize health checks and optionally start an HTTP server to expose health endpoints.

§Health Check System

The health check system provides:

Built-in checks for connections, memory, time drift, and state convergence
Configurable thresholds for each check
HTTP endpoints for Kubernetes probes and load balancers
Result caching to minimize overhead

§HTTP Endpoints

When server_bind_address is set, the following endpoints are exposed:

GET /health - Overall health status (200 OK if healthy/degraded, 503 if unhealthy)
GET /ready - Readiness probe (200 OK if healthy/degraded, 503 if unhealthy)
GET /live - Liveness probe (200 OK if healthy/degraded, 503 if unhealthy)
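The status-to-response mapping listed above can be sketched as follows. `HealthStatus` and `http_status_code` are hypothetical names used only for illustration; they are not taken from the crate's API.

```rust
/// Hypothetical stand-in for the crate's health status type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HealthStatus {
    Healthy,
    Degraded,
    Unhealthy,
}

/// Maps a status to the HTTP code documented for the endpoints:
/// 200 for healthy or degraded, 503 for unhealthy.
fn http_status_code(status: HealthStatus) -> u16 {
    match status {
        HealthStatus::Healthy | HealthStatus::Degraded => 200,
        HealthStatus::Unhealthy => 503,
    }
}

fn main() {
    let all = [
        HealthStatus::Healthy,
        HealthStatus::Degraded,
        HealthStatus::Unhealthy,
    ];
    for status in all {
        println!("{:?} -> {}", status, http_status_code(status));
    }
}
```

Because a degraded node still answers 200, a probe that only inspects the status code will not distinguish degraded from healthy.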

§Example

use elara_runtime::health::HealthCheckConfig;
use elara_runtime::node::NodeConfig;
use std::time::Duration;

let health_config = HealthCheckConfig {
    enabled: true,
    server_bind_address: Some("0.0.0.0:8080".parse().unwrap()),
    cache_ttl: Duration::from_secs(30),
    min_connections: Some(3),
    max_memory_mb: Some(1800),
    max_time_drift_ms: Some(100),
    max_pending_events: Some(1000),
};

let node_config = NodeConfig {
    health_checks: Some(health_config),
    ..Default::default()
};

§Production Recommendations

§Small Deployment (10 nodes)

use elara_runtime::health::HealthCheckConfig;
use std::time::Duration;

let config = HealthCheckConfig {
    enabled: true,
    server_bind_address: Some("0.0.0.0:8080".parse().unwrap()),
    cache_ttl: Duration::from_secs(30),
    min_connections: Some(2),
    max_memory_mb: Some(1000),
    max_time_drift_ms: Some(100),
    max_pending_events: Some(500),
};

§Medium Deployment (100 nodes)

use elara_runtime::health::HealthCheckConfig;
use std::time::Duration;

let config = HealthCheckConfig {
    enabled: true,
    server_bind_address: Some("0.0.0.0:8080".parse().unwrap()),
    cache_ttl: Duration::from_secs(30),
    min_connections: Some(5),
    max_memory_mb: Some(2000),
    max_time_drift_ms: Some(100),
    max_pending_events: Some(1000),
};

§Large Deployment (1000 nodes)

use elara_runtime::health::HealthCheckConfig;
use std::time::Duration;

let config = HealthCheckConfig {
    enabled: true,
    server_bind_address: Some("0.0.0.0:8080".parse().unwrap()),
    cache_ttl: Duration::from_secs(30),
    min_connections: Some(10),
    max_memory_mb: Some(4000),
    max_time_drift_ms: Some(100),
    max_pending_events: Some(2000),
};

Fields§

§enabled: bool

Enable or disable health checks.

When false, no health checks are performed and no HTTP server is started. This allows health checks to be completely disabled in environments where they are not needed.

Default: true

§server_bind_address: Option<SocketAddr>

Optional bind address for the health check HTTP server.

When Some, an HTTP server is started on this address to expose health check endpoints (/health, /ready, /live). When None, health checks are still performed but no HTTP server is started (useful for programmatic health checking without exposing endpoints).

Format: "host:port" (e.g., "0.0.0.0:8080", "127.0.0.1:8080")

Default: Some("0.0.0.0:8080")
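The examples on this page call `unwrap()` on the parsed address. When the address comes from user configuration, it may be preferable to surface the parse error instead. A minimal sketch using only the standard library; `parse_bind_address` is a hypothetical helper, not part of the crate:

```rust
use std::net::SocketAddr;

/// Parses an optional "host:port" string.
/// Returns `Ok(None)` when no address is given and `Err` when it is malformed.
fn parse_bind_address(raw: Option<&str>) -> Result<Option<SocketAddr>, String> {
    match raw {
        None => Ok(None),
        Some(s) => s
            .parse::<SocketAddr>()
            .map(Some)
            .map_err(|e| format!("invalid bind address {s:?}: {e}")),
    }
}

fn main() {
    println!("{:?}", parse_bind_address(Some("127.0.0.1:8080")));
    println!("{:?}", parse_bind_address(Some("localhost:8080")));
    println!("{:?}", parse_bind_address(None));
}
```

Note that `SocketAddr` parsing accepts only IP literals, so a hostname such as "localhost:8080" is rejected rather than resolved.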

§cache_ttl: Duration

Cache TTL for health check results.

Health check results are cached for this duration to avoid excessive checking overhead. Subsequent health check requests within the TTL return cached results.

Recommended values:

High-frequency checks: 10-15 seconds
Normal checks: 30 seconds
Low-frequency checks: 60 seconds

Default: 30 seconds
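The caching behavior described above can be illustrated with a small time-based cache. This is a sketch under the assumption that a single result is stored and recomputed once it is older than the TTL; `TtlCache` is a hypothetical type and does not reflect the crate's internal implementation.

```rust
use std::time::{Duration, Instant};

/// Hypothetical cache holding one value for at most `ttl`.
struct TtlCache<T> {
    ttl: Duration,
    entry: Option<(Instant, T)>,
}

impl<T: Clone> TtlCache<T> {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entry: None }
    }

    /// Returns the cached value while it is fresh; otherwise runs `compute`,
    /// stores the result with the current time, and returns it.
    fn get_or_compute(&mut self, compute: impl FnOnce() -> T) -> T {
        if let Some((stored_at, value)) = &self.entry {
            if stored_at.elapsed() < self.ttl {
                return value.clone();
            }
        }
        let value = compute();
        self.entry = Some((Instant::now(), value.clone()));
        value
    }
}

fn main() {
    let mut cache = TtlCache::new(Duration::from_secs(30));
    let mut runs = 0;
    let first = cache.get_or_compute(|| { runs += 1; "healthy" });
    let second = cache.get_or_compute(|| { runs += 1; "healthy" });
    println!("{first} {second} (check ran {runs} time(s))");
}
```

A longer TTL lowers checking overhead, but a probe may then see a result that is up to one TTL old.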

§min_connections: Option<usize>

Minimum number of active connections for ConnectionHealthCheck.

When Some, a ConnectionHealthCheck is registered that monitors the number of active connections. The check returns Degraded if the connection count falls below this threshold.

When None, no connection health check is performed.

Recommended values:

Small deployment: 2-3
Medium deployment: 5-10
Large deployment: 10-20

Default: Some(3)
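The `Some`/`None` semantics of this threshold can be sketched as below. `HealthStatus` and `check_connections` are hypothetical names; the real `ConnectionHealthCheck` may be structured differently.

```rust
/// Hypothetical stand-in for the statuses this check can report.
#[derive(Debug, PartialEq, Eq)]
enum HealthStatus {
    Healthy,
    Degraded,
}

/// Returns `None` when no threshold is configured (the check is skipped),
/// `Degraded` when the count is below the threshold, and `Healthy` otherwise.
fn check_connections(active: usize, min_connections: Option<usize>) -> Option<HealthStatus> {
    let min = min_connections?;
    Some(if active < min {
        HealthStatus::Degraded
    } else {
        HealthStatus::Healthy
    })
}

fn main() {
    println!("{:?}", check_connections(2, Some(3)));
    println!("{:?}", check_connections(3, Some(3)));
    println!("{:?}", check_connections(0, None));
}
```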

§max_memory_mb: Option<usize>

Maximum memory usage in megabytes for MemoryHealthCheck.

When Some, a MemoryHealthCheck is registered that monitors process memory usage. The check returns Unhealthy if memory usage exceeds this threshold.

When None, no memory health check is performed.

Recommended values:

Small deployment: 1000 MB (1 GB)
Medium deployment: 2000 MB (2 GB)
Large deployment: 4000 MB (4 GB)

Set this to 80-90% of your container memory limit to allow for graceful degradation before OOM kills.

Default: Some(1800) (1.8 GB)
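The 80-90% guideline above is simple arithmetic on the container limit. A minimal sketch, where `memory_threshold_mb` is a hypothetical helper and not part of the crate:

```rust
/// Derives a `max_memory_mb` value as an integer percentage of the
/// container memory limit (both in megabytes, rounded down).
fn memory_threshold_mb(container_limit_mb: usize, percent: usize) -> usize {
    container_limit_mb * percent / 100
}

fn main() {
    // 90% of a 2000 MB limit matches the documented default of 1800 MB.
    println!("{}", memory_threshold_mb(2000, 90));
    // 85% of a 2048 MB limit, rounded down.
    println!("{}", memory_threshold_mb(2048, 85));
}
```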

§max_time_drift_ms: Option<i64>

Maximum time drift in milliseconds for TimeDriftCheck.

When Some, a TimeDriftCheck is registered that monitors time drift between the local node and network consensus time. The check returns Degraded if drift exceeds this threshold.

When None, no time drift check is performed.

Recommended value: 100 ms

Excessive time drift can cause synchronization issues and state divergence in distributed systems.

Default: Some(100)
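Since the threshold is a signed `i64`, one plausible reading is that drift is measured in either direction and compared by magnitude. The sketch below makes that assumption explicit; `drift_exceeded` and its timestamp parameters are hypothetical and not taken from the crate.

```rust
/// Returns `None` when no threshold is configured, otherwise whether the
/// absolute difference between the two clocks exceeds the threshold.
/// ASSUMPTION: drift is compared by magnitude, regardless of sign.
fn drift_exceeded(local_ms: i64, network_ms: i64, max_drift_ms: Option<i64>) -> Option<bool> {
    let max = max_drift_ms?;
    // Saturating arithmetic avoids overflow for extreme timestamps.
    let drift = local_ms.saturating_sub(network_ms).saturating_abs();
    Some(drift > max)
}

fn main() {
    println!("{:?}", drift_exceeded(1_000, 1_150, Some(100)));
    println!("{:?}", drift_exceeded(1_000, 1_050, Some(100)));
    println!("{:?}", drift_exceeded(1_000, 9_999, None));
}
```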

§max_pending_events: Option<usize>

Maximum pending events for StateDivergenceCheck.

When Some, a StateDivergenceCheck is registered that monitors the state reconciliation engine. The check returns Degraded if the number of pending events exceeds this threshold.

When None, no state divergence check is performed.

Recommended values:

Small deployment: 500
Medium deployment: 1000
Large deployment: 2000

High pending event counts may indicate network partitions or reconciliation issues.

Default: Some(1000)

Implementations§

impl HealthCheckConfig

pub fn disabled() -> Self

pub fn small_deployment() -> Self

pub fn medium_deployment() -> Self

pub fn large_deployment() -> Self

pub fn validate(&self) -> Result<(), String>

Trait Implementations§

impl Clone for HealthCheckConfig

fn clone(&self) -> HealthCheckConfig

fn clone_from(&mut self, source: &Self)

impl Debug for HealthCheckConfig

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl Default for HealthCheckConfig

fn default() -> Self

Auto Trait Implementations§

impl Freeze for HealthCheckConfig

impl RefUnwindSafe for HealthCheckConfig

impl Send for HealthCheckConfig

impl Sync for HealthCheckConfig

impl Unpin for HealthCheckConfig

impl UnsafeUnpin for HealthCheckConfig

impl UnwindSafe for HealthCheckConfig

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T> FromRef<T> for T where T: Clone,

fn from_ref(input: &T) -> T

impl<T> FromRef<T> for T where T: Clone,

fn from_ref(input: &T) -> T

impl<T> FutureExt for T

fn with_context(self, otel_cx: Context) -> WithContext<Self>

fn with_current_context(self) -> WithContext<Self>

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

fn in_current_span(self) -> Instrumented<Self>

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self> where F: FnOnce(&Self) -> bool,

impl<T> IntoRequest<T> for T

fn into_request(self) -> Request<T>

impl<T> Pointable for T

const ALIGN: usize

type Init = T

unsafe fn init(init: <T as Pointable>::Init) -> usize

unsafe fn deref<'a>(ptr: usize) -> &'a T

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

unsafe fn drop(ptr: usize)

impl<T> Same for T

type Output = T

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

impl<A, B, T> HttpServerConnExec<A, B> for T
where B: Body,