Crate es_disk_planner


§Elasticsearch Disk Capacity Planner

This tool estimates the disk capacity requirements for an Elasticsearch cluster, taking into account:

  • Primary and replica shards
  • Lucene merge overhead
  • Headroom for operational safety (disk watermarks and ingestion bursts)
  • Relocation buffer per node
  • Target maximum disk utilization

The output helps plan realistic disk sizes per node and total cluster capacity.


§🔢 Calculation Model

base = primaries * shard_size_gb * (1 + replicas)
with_merge = base * (1 + overhead_merge)
with_headroom = with_merge * (1 + headroom)
buffer_total = buffer_per_node_gb * nodes
total_cluster = with_headroom + buffer_total
per_node = total_cluster / nodes
disk_per_node = per_node / target_utilization
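The model above can be sketched as a standalone Rust function. This is a minimal illustration of the formulas, not the crate's actual `plan_capacity` signature, which may differ:

```rust
/// Hypothetical helper mirroring the calculation model above.
fn disk_plan(
    nodes: u32,
    primaries: u32,
    replicas: u32,
    shard_size_gb: f64,
    overhead_merge: f64,
    headroom: f64,
    buffer_per_node_gb: f64,
    target_utilization: f64,
) -> (f64, f64) {
    // Primaries plus their replica copies.
    let base = primaries as f64 * shard_size_gb * (1.0 + replicas as f64);
    // Temporary space consumed by Lucene segment merges.
    let with_merge = base * (1.0 + overhead_merge);
    // Operational slack to stay below the disk watermarks.
    let with_headroom = with_merge * (1.0 + headroom);
    // Relocation buffer reserved on every node.
    let buffer_total = buffer_per_node_gb * nodes as f64;
    let total_cluster = with_headroom + buffer_total;
    let per_node = total_cluster / nodes as f64;
    // Size disks so per-node data stays under the target ratio.
    let disk_per_node = per_node / target_utilization;
    (total_cluster, disk_per_node)
}

fn main() {
    // Inputs from the example below: 5 nodes, 10 primaries, 1 replica.
    let (total, disk) = disk_plan(5, 10, 1, 50.0, 0.20, 0.30, 50.0, 0.75);
    println!("Cluster total: {:.1} GB, disk per node: {:.1} GB", total, disk);
}
```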

§Parameters

| Parameter | Default | Description |
|---|---|---|
| `--shard_size_gb` | 50 | Average size of a single shard on disk (compressed Lucene data) |
| `--overhead_merge` | 0.20 (20%) | Temporary space required by Lucene segment merges |
| `--headroom` | 0.30 (30%) | Safety margin to stay below disk watermarks (85–90%) |
| `--buffer_per_node_gb` | = `shard_size_gb` | Space reserved per node for shard relocation/rebalancing |
| `--target_utilization` | 0.75 (75%) | Maximum desired disk usage ratio |
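One detail worth noting is that `--buffer_per_node_gb` defaults to the shard size when not set. A hypothetical parameter struct (the real `planner` module may be organized differently) can make that coupling explicit:

```rust
/// Hypothetical parameter struct mirroring the CLI flags above.
struct PlannerParams {
    shard_size_gb: f64,
    overhead_merge: f64,
    headroom: f64,
    /// `None` means "default to `shard_size_gb`".
    buffer_per_node_gb: Option<f64>,
    target_utilization: f64,
}

impl Default for PlannerParams {
    fn default() -> Self {
        Self {
            shard_size_gb: 50.0,
            overhead_merge: 0.20,
            headroom: 0.30,
            buffer_per_node_gb: None, // resolved at plan time
            target_utilization: 0.75,
        }
    }
}

impl PlannerParams {
    /// Resolve the relocation buffer: explicit value or shard size.
    fn effective_buffer_per_node_gb(&self) -> f64 {
        self.buffer_per_node_gb.unwrap_or(self.shard_size_gb)
    }
}

fn main() {
    let params = PlannerParams::default();
    println!("buffer per node: {} GB", params.effective_buffer_per_node_gb());
}
```

Tying the default buffer to the shard size reflects the operational rationale: each node must be able to receive at least one full shard during relocation.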

§📊 Example Output

=== Elasticsearch Disk Capacity Planner ===
Nodes: 5
Primary shards: 10
Replicas per shard: 1
Shard size: 50.0 GB | Overhead merge: 20% | Headroom: 30%
Relocation buffer per node: 50.0 GB
Target disk utilization: 75%

Base (primaries+replicas): 1000.0 GB (1.00 TB)
+ Overhead merge:        1200.0 GB (1.20 TB)
+ Headroom:              1560.0 GB (1.56 TB)
+ Total buffer:          250.0 GB  (0.25 TB)
= Cluster total:         1810.0 GB (1.81 TB)

Per node (recommended):  362.0 GB (0.36 TB)
Disk per node @ <75%:    482.7 GB (0.48 TB)

§💡 Interpretation

  • Base (primaries + replicas) — total indexed data size on disk.
  • Merge overhead — extra space required during Lucene segment merges.
  • Headroom — operational slack to avoid hitting high/flood-stage watermarks.
  • Buffer per node — space required to receive the largest shard during relocation.
  • Target utilization — desired maximum disk usage (usually 70–80%).

This model provides an approximate but operationally safe estimate for capacity planning in Elasticsearch clusters.


§🧮 Example Scenario

  • 5 data nodes
  • 10 primary shards
  • 1 replica per shard
  • 50 GB average shard size
  • 20% merge overhead
  • 30% headroom
  • 50 GB relocation buffer per node
  • 75% target disk utilization

Result: ~1.8 TB total cluster capacity, or ~480–500 GB per node to stay below 75% usage.


§⚙️ Operational Notes

  • The results refer to disk usage, not JVM heap or RAM.

  • Typical Elasticsearch node sizing guidelines:

    • JVM heap ≤ 30 GB
    • Node memory ≥ 64 GB (≈ 50% heap, 50% OS file cache)
    • Shard size 20–50 GB
  • The model aligns with Elastic’s published best practices.
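The heap guideline above can be expressed as a small rule-of-thumb helper. This is an illustrative assumption (half of RAM for heap, capped at 30 GB), not part of this crate:

```rust
/// Rule-of-thumb JVM heap sizing: ~50% of node RAM, capped at 30 GB
/// so the heap stays below the guideline limit, leaving the rest
/// for the OS file cache. Illustrative only; not part of this crate.
fn recommended_heap_gb(node_ram_gb: f64) -> f64 {
    (node_ram_gb / 2.0).min(30.0)
}

fn main() {
    // A 64 GB node hits the 30 GB heap cap; a 32 GB node gets 16 GB.
    println!("64 GB RAM -> {} GB heap", recommended_heap_gb(64.0));
    println!("32 GB RAM -> {} GB heap", recommended_heap_gb(32.0));
}
```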


§⚠️ Limitations

  • Assumes uniform shard sizes and compression ratios.
  • Does not include local snapshots or external repository overhead.
  • Does not model “cold” or “frozen” tiers.
  • Merge and headroom factors are static (simplified estimation).

§🧰 Usage Examples

# Default example
cargo run -- \
  --nodes 5 \
  --primaries 10 \
  --replicas 1 \
  --shard_size_gb 50 \
  --overhead_merge 0.20 \
  --headroom 0.30 \
  --target_utilization 0.75

# Two replicas, larger shards
cargo run -- --nodes 5 --primaries 10 --replicas 2 --shard_size_gb 80

# Conservative watermarks
cargo run -- --nodes 6 --headroom 0.4 --target_utilization 0.65

§🧩 Future Improvements

  • Add --units iec to switch between GB/TB (1000-based) and GiB/TiB (1024-based)
  • Support JSON/CSV output for pipeline integration
  • Optional retrieval of real shard stats from _cat/shards via REST API
  • Integrate into a monitoring workflow for continuous capacity validation

Re-exports§

pub use planner::plan_capacity;
pub use planner::Plan;

Modules§

planner