blvm-node 0.1.16

Bitcoin Commons BLVM: Minimal Bitcoin node implementation using blvm-protocol and blvm-consensus
# High Availability Features

## Overview

blvm-node implements the Phase 2 and Phase 3 high availability features intended for production deployments: Prometheus metrics export, health check endpoints, disk space monitoring, peer reconnection, enhanced rate limiting, and structured logging.

## Metrics Endpoint

### Prometheus Metrics Export

**Endpoint**: `GET /metrics`

**Purpose**: Exports Prometheus-formatted metrics for monitoring.

**Metrics Exported**:
- Block processing metrics (blocks processed, validation time)
- Network metrics (peers connected, bytes sent/received)
- Storage metrics (database size, UTXO count)
- RPC metrics (requests processed, errors)
- Mempool metrics (transaction count, size)

**Example**:
```bash
curl http://localhost:18332/metrics
```

**Response Format**: Prometheus text exposition format

**Usage**: Configure Prometheus to scrape this endpoint for monitoring dashboards.
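For a quick spot check without a full Prometheus server, a short Python script can fetch `/metrics` and parse the text exposition format. This is a minimal sketch, not a complete parser: it skips `# HELP`/`# TYPE` comments, ignores the optional trailing timestamp, and assumes label values contain no commas, braces, or escaped quotes. The URL mirrors the example above; any metric names you look up are your own and are not confirmed blvm-node names.

```python
import urllib.request


def parse_prometheus_text(text):
    """Parse Prometheus text format into {(name, sorted_labels): value}.

    Simplified: label values must not contain commas, braces, or escaped quotes.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        if "{" in line:
            name, rest = line.split("{", 1)
            label_str, rest = rest.split("}", 1)
            labels = []
            for pair in label_str.split(","):
                if pair.strip():
                    key, val = pair.split("=", 1)
                    labels.append((key.strip(), val.strip().strip('"')))
            labels = tuple(sorted(labels))
        else:
            name, rest = line.split(None, 1)
            labels = ()
        value = rest.split()[0]  # first token is the value; a timestamp may follow
        samples[(name.strip(), labels)] = float(value)  # float() accepts NaN/+Inf
    return samples


def fetch_metrics(url="http://localhost:18332/metrics", timeout=5.0):
    """GET the metrics endpoint and parse the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

Calling `fetch_metrics()` against a running node returns a dict keyed by `(name, labels)`; for real monitoring, rely on Prometheus itself, which handles the full format.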

---

## Health Check Endpoints

### Basic Health Check

**Endpoint**: `GET /health`

**Purpose**: Simple health check for load balancers.

**Response**:
```json
{
  "status": "healthy",
  "timestamp": 1234567890
}
```

**Status Codes**:
- `200 OK`: Node is healthy
- `503 Service Unavailable`: Node is unhealthy

---

### Liveness Probe

**Endpoint**: `GET /health/live`

**Purpose**: Kubernetes liveness probe; indicates whether the node process is running and responsive.

**Response**:
```json
{
  "status": "alive"
}
```

**Status Codes**:
- `200 OK`: Process is alive
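- `503 Service Unavailable`: Process reports itself unhealthy. A process that is dead or hung returns no response at all; Kubernetes counts connection failures and timeouts as probe failures as well.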

---

### Readiness Probe

**Endpoint**: `GET /health/ready`

**Purpose**: Kubernetes readiness probe; indicates whether the node is ready to serve requests.

**Response**:
```json
{
  "status": "ready",
  "chain_initialized": true,
  "storage_available": true
}
```

**Status Codes**:
- `200 OK`: Node is ready
- `503 Service Unavailable`: Node is not ready (e.g., initializing chain)
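Deployment scripts often need to block until the readiness probe passes before routing traffic. The polling loop below is a sketch: `fetch` is a hypothetical callable that performs the GET and returns `(status_code, payload)`, and the shape of a not-ready `503` body (for example a `status` other than `"ready"`) is an assumption, since only the ready response is documented. Injecting `sleep` and `clock` keeps the loop testable without real waiting.

```python
import time


def wait_until_ready(fetch, deadline_secs=300.0, interval_secs=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll fetch() until the node reports ready or the deadline passes.

    fetch: hypothetical callable returning (status_code, payload_dict).
    Returns the ready payload, or raises TimeoutError.
    """
    start = clock()
    while True:
        status, payload = fetch()
        payload = payload or {}
        if status == 200 and payload.get("status") == "ready":
            return payload
        if clock() - start >= deadline_secs:
            # Report which documented readiness flags are false or missing.
            missing = [k for k in ("chain_initialized", "storage_available")
                       if not payload.get(k)]
            raise TimeoutError(f"node not ready; false/missing: {missing}")
        sleep(interval_secs)
```

In practice `fetch` would wrap an HTTP GET of `/health/ready`, mapping a `503` response to `(503, body)` rather than raising.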

---

### Detailed Health Check

**Endpoint**: `GET /health/detailed`

**Purpose**: Comprehensive health status for debugging.

**Response**:
```json
{
  "status": "healthy",
  "chain": {
    "initialized": true,
    "height": 123456,
    "tip_hash": "0000..."
  },
  "storage": {
    "available": true,
    "size_bytes": 1234567890
  },
  "network": {
    "peers_connected": 8,
    "peers_max": 100
  },
  "rpc": {
    "enabled": true,
    "requests_processed": 12345
  }
}
```
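The detailed response is meant for humans, but it can also feed a simple alerting check. The sketch below reads the field names from the example response above and returns a list of warnings; the `min_peers` threshold is an illustrative choice, not a blvm-node setting, and real responses may carry additional fields.

```python
def summarize_detailed_health(doc, min_peers=1):
    """Collect warnings from a parsed /health/detailed payload (dict).

    Field names follow the example response; min_peers is illustrative.
    """
    warnings = []
    if doc.get("status") != "healthy":
        warnings.append(f"status={doc.get('status')!r}")
    if not doc.get("chain", {}).get("initialized", False):
        warnings.append("chain not initialized")
    if not doc.get("storage", {}).get("available", False):
        warnings.append("storage unavailable")
    peers = doc.get("network", {}).get("peers_connected", 0)
    if peers < min_peers:
        warnings.append(f"only {peers} peer(s) connected")
    return warnings


def summary_line(doc):
    """One-line status suitable for a log or chat notification."""
    chain = doc.get("chain", {})
    net = doc.get("network", {})
    return (f"height={chain.get('height')} "
            f"peers={net.get('peers_connected')}/{net.get('peers_max')}")
```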

---

## Disk Space Monitoring

Blockchain storage growth is handled primarily via **`[storage.pruning]`** (see [`PruningConfig`](https://github.com/BTCDecoded/blvm-node/blob/main/src/config/storage.rs)). There is **no** `pruning_threshold_gb` / `pruning_target_gb` on `StorageConfig` in this tree.

**Example (normal pruning)**:

```toml
[storage]
data_dir = "/var/lib/blvm"
database_backend = "auto"

[storage.pruning]
mode = { type = "normal", keep_from_height = 0, min_recent_blocks = 288 }
auto_prune = true
min_blocks_to_keep = 144
```

**Behavior**: Depends on the pruning mode (Normal, Aggressive, Disabled, or Custom). See the pruning section of blvm-docs; aggressive pruning paths also depend on feature gates such as **`utxo-commitments`**.
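Pruning bounds chain growth, but an external free-space check on the data directory still catches other consumers of the disk (logs, other services). The sketch below uses Python's `shutil.disk_usage`; the 10% warning threshold and the helper names are illustrative assumptions, and blvm-node is not known to expose this check itself. It also converts the block counts in the example into wall-clock time using Bitcoin's roughly 10-minute target block interval; how blvm-node interprets `min_recent_blocks` and `min_blocks_to_keep` beyond that is not specified here.

```python
import shutil

BLOCK_INTERVAL_MIN = 10  # Bitcoin's target block spacing, in minutes


def disk_report(data_dir, warn_free_fraction=0.10):
    """Report free space on the filesystem holding data_dir.

    External check, not part of blvm-node; the threshold is an example.
    """
    usage = shutil.disk_usage(data_dir)  # named tuple: total, used, free (bytes)
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return {
        "total_gib": usage.total / 2**30,
        "free_gib": usage.free / 2**30,
        "free_fraction": free_fraction,
        "warn": free_fraction < warn_free_fraction,
    }


def blocks_to_hours(blocks):
    """Approximate wall-clock span of `blocks` at the target block interval."""
    return blocks * BLOCK_INTERVAL_MIN / 60
```

With the example config, `blocks_to_hours(288)` is 48 hours and `blocks_to_hours(144)` is 24 hours of blocks at the target rate; `disk_report("/var/lib/blvm")` would check the example `data_dir`.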

---

## Peer Reconnection

The node runs periodic background work including **peer reconnection** intervals (see [`BackgroundTaskConfig`](https://github.com/BTCDecoded/blvm-node/blob/main/src/config/ibd.rs) on `NodeConfig`).

**Configurable interval** (optional):

```toml
[background_tasks]
peer_reconnection_interval_secs = 10
```

There is **no** `[network] reconnect_*` block in `NodeConfig`; apart from the interval above, reconnection policy is implemented inside the network stack rather than exposed as `reconnect_*` keys.
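To illustrate what a periodic reconnection interval means, the sketch below runs reconnection passes `interval_secs` apart and retries only peers that are not currently connected. It is a hypothetical model, not blvm-node's network stack: `wanted`, `connect`, and the retry-every-pass policy are assumptions, and the real implementation may add backoff, peer scoring, or address discovery.

```python
import time


def reconnection_pass(wanted, connected, connect):
    """Try every wanted peer that is not connected; return newly connected peers.

    connect: hypothetical callable that raises OSError on failure.
    """
    newly = set()
    for addr in sorted(wanted - connected):
        try:
            connect(addr)
        except OSError:
            continue  # leave it for the next pass
        connected.add(addr)
        newly.add(addr)
    return newly


def run_reconnection(wanted, connected, connect, interval_secs=10,
                     passes=3, sleep=time.sleep):
    """Run `passes` reconnection passes, sleeping interval_secs between them."""
    for i in range(passes):
        reconnection_pass(wanted, connected, connect)
        if i + 1 < passes:
            sleep(interval_secs)
```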

---

## Rate Limiting

RPC rate limits use **`[rpc]`** (IP / connection limits when auth is off) and **`[rpc_auth]`** (token-bucket burst/rate). There is **no** `[rpc.auth]` or `per_method_limits` table in `NodeConfig`.

```toml
[rpc]
rate_limit_when_auth_disabled = true
ip_rate_limit_burst = 50
ip_rate_limit_rate = 5
max_connections_per_ip_per_minute = 10

[rpc_auth]
rate_limit_burst = 100
rate_limit_rate = 10
```

See [`RpcConfig`](https://github.com/BTCDecoded/blvm-node/blob/main/src/config/rpc.rs) and [`RpcAuthConfig`](https://github.com/BTCDecoded/blvm-node/blob/main/src/config/rpc.rs).
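The `[rpc_auth]` keys describe a token bucket. The sketch below shows the technique under two assumptions not confirmed by the config docs: `rate_limit_burst` is the bucket capacity, and `rate_limit_rate` is measured in tokens per second. `TokenBucket` is an illustrative class, not blvm-node's type.

```python
import time


class TokenBucket:
    """Holds up to `burst` tokens, refilled continuously at `rate` tokens/sec.

    Per-second units are an assumption about blvm-node's rate settings.
    """

    def __init__(self, burst, rate, clock=time.monotonic):
        self.capacity = float(burst)
        self.rate = float(rate)
        self.clock = clock
        self.tokens = self.capacity  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self):
        """Consume one token if available; return whether the request may proceed."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `burst = 100` and `rate = 10`, a client can send 100 requests at once, after which it is limited to about 10 requests per second until the bucket refills.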

---

## Structured Logging

### Request IDs and Tracing

blvm-node uses structured logging with request IDs and tracing spans.

**Features**:
- Request IDs: Unique ID per RPC request
- Tracing spans: Hierarchical tracing context
- Request/response metrics: Logged with each request
- Client address tracking: Logged for each request

**Log Format**:
```
[2025-01-01T00:00:00Z INFO rpc_request] request_id=abc12345 method=getblockhash client_addr=127.0.0.1:12345 request_size=123
```
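When logs are shipped as plain text, the key=value layout above is easy to split back into fields. The parser below is a sketch that matches only the header shape shown (`[TIMESTAMP LEVEL TARGET]`) and assumes values contain no spaces. With `json_format = true`, each line is presumably a JSON object that `json.loads` can read directly, though its field names are not documented here.

```python
import re

# Header of the text format shown above: [TIMESTAMP LEVEL TARGET] key=value ...
_HEADER = re.compile(
    r"^\[(?P<ts>\S+) (?P<level>[A-Z]+) (?P<target>[^\]]+)\]\s*(?P<rest>.*)$"
)


def parse_log_line(line):
    """Split a text-format log line into header fields and key=value pairs.

    Returns None for lines that do not match the expected header.
    """
    m = _HEADER.match(line.strip())
    if m is None:
        return None
    fields = {}
    for token in m.group("rest").split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens without '='
            fields[key] = value
    return {
        "timestamp": m.group("ts"),
        "level": m.group("level"),
        "target": m.group("target"),
        "fields": fields,
    }
```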

**Configuration** ([`LoggingConfig`](https://github.com/BTCDecoded/blvm-node/blob/main/src/config/mod.rs)):

```toml
[logging]
filter = "info"   # or use key alias: level = "info"
json_format = true
```

---

## Configuration

### Example: knobs that exist on `NodeConfig`

```toml
listen_addr = "0.0.0.0:8333"
max_peers = 100
transport_preference = "tcponly"

[background_tasks]
peer_reconnection_interval_secs = 10

[storage]
data_dir = "/var/lib/blvm"
database_backend = "auto"

[storage.pruning]
mode = { type = "normal", keep_from_height = 0, min_recent_blocks = 288 }
 
[rpc]
ip_rate_limit_burst = 50
ip_rate_limit_rate = 5

[rpc_auth]
rate_limit_burst = 100
rate_limit_rate = 10

[logging]
filter = "info"
json_format = true
```

Whether the metrics and `/health*` endpoints are served depends on the feature set the running **`blvm`** binary and its RPC stack were built with. Treat endpoint availability as **implementation-defined** and verify it against your binary.
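One way to verify endpoint availability is to request each documented path and record the status code. In the sketch below, a `404` most likely means the path is not served by this build, a `503` means the endpoint exists and reports a problem, and `None` means the node was not reachable at all. The base URL reuses port 18332 from the examples in this document and may differ in your deployment.

```python
import urllib.error
import urllib.request

ENDPOINTS = ["/metrics", "/health", "/health/live", "/health/ready", "/health/detailed"]


def probe_endpoints(base="http://localhost:18332", timeout=3.0):
    """Map each documented path to its HTTP status code, or None if unreachable."""
    results = {}
    for path in ENDPOINTS:
        try:
            with urllib.request.urlopen(base + path, timeout=timeout) as resp:
                results[path] = resp.status
        except urllib.error.HTTPError as err:
            results[path] = err.code  # e.g. 404 (not served) or 503 (unhealthy)
        except OSError:
            results[path] = None  # refused connection, timeout, DNS failure
    return results
```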

---

## Monitoring Setup

### Prometheus Configuration

```yaml
scrape_configs:
  - job_name: 'blvm-node'
    static_configs:
      - targets: ['localhost:18332']
    metrics_path: '/metrics'
    scrape_interval: 15s
```

### Health Check Configuration

**Kubernetes**:
```yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 18332
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /health/ready
    port: 18332
  initialDelaySeconds: 10
  periodSeconds: 5
```

**Load Balancer**:
- Health check endpoint: `/health`
- Health check interval: 10 seconds
- Unhealthy threshold: 3 failures
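The load balancer settings above amount to a small state machine: a backend is marked unhealthy after three consecutive failed checks. The sketch below models it with one simplifying assumption: a single passing check restores the backend, whereas many load balancers use a separate healthy threshold. Because a failure can begin just after a passing check, the first failing check can come up to one interval later, so detection takes at most `threshold * interval` (30 s with these settings), ignoring per-check timeouts.

```python
class HealthTracker:
    """Mark a backend unhealthy after `threshold` consecutive failed checks.

    Recovery on one success is a simplifying assumption.
    """

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_failures = 0
        self.healthy = True

    def record(self, check_passed):
        """Record one check result and return the current health state."""
        if check_passed:
            self.consecutive_failures = 0
            self.healthy = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold:
                self.healthy = False
        return self.healthy


def detection_window_secs(interval_secs=10, threshold=3):
    """Worst-case time from failure onset to removal, ignoring check timeouts."""
    return interval_secs * threshold
```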

---

## Related Documentation

- [RPC Reference](RPC_REFERENCE.md) - Complete RPC API
- [Configuration Guide](CONFIGURATION_GUIDE.md) - Node configuration (this repo)
- [Production mainnet node](https://docs.thebitcoincommons.org/getting-started/first-node.html#production-mainnet-node) ([BLVM Documentation](https://docs.thebitcoincommons.org/))