# udp-relay-core

[![Crates.io](https://img.shields.io/crates/v/udp-relay-core.svg)](https://crates.io/crates/udp-relay-core)
[![Documentation](https://img.shields.io/docsrs/udp-relay-core)](https://docs.rs/udp-relay-core)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

Lock-free, cache-line-aligned building blocks for high-performance UDP relay and tunnel servers.

## Why this crate?

If you're building a UDP relay, game server proxy, VPN tunnel, or any service that forwards packets between many clients at high throughput, you need per-client state that is:

- **Fast** — no mutexes on the hot path; bare atomics only.
- **Concurrent** — safely shared across threads without contention.
- **Observable** — bandwidth, latency, packet loss, and priority scores available at a glance.

`udp-relay-core` gives you exactly that in a small, dependency-light package. You bring your own I/O layer (tokio, mio, io-uring, etc.) and plug these types in.

## Use cases

- **Game server relays** — track per-player bandwidth and timeout idle clients.
- **VPN / tunnel endpoints** — monitor connection quality and prioritize traffic.
- **Media streaming proxies** — detect slow or lossy paths and react in real time.
- **Load balancers** — score backends by measured latency and loss, not just round-robin.

## What's inside

| Type / Function | What it does |
|---|---|
| `TunnelClient` | Per-client state: timeout tracking, bandwidth estimation (EWMA), latency, packet loss, lazy priority scoring. Cache-line-aligned with hot/cold field separation. |
| `QualityAnalyzer` | Lightweight packet loss tracker with automatic counter halving to prevent overflow. |
| `validate_address()` | Rejects loopback, unspecified, broadcast, multicast, and port-0 addresses. |
| `create_dashmap_with_capacity()` | Creates a `DashMap` with a shard count tuned to your CPU count. |
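To get a feel for the rules `validate_address()` applies, here is a rough equivalent using only `std::net`. This is a sketch of the checks listed in the table, not the crate's actual implementation, and the function name `is_valid_relay_target` is ours:

```rust
use std::net::{IpAddr, SocketAddr};

// Illustrative re-implementation of the validation rules above.
fn is_valid_relay_target(addr: &SocketAddr) -> bool {
    if addr.port() == 0 {
        return false; // port 0 means "any port" and cannot be dialed
    }
    let ip = addr.ip();
    if ip.is_loopback() || ip.is_unspecified() || ip.is_multicast() {
        return false;
    }
    // Broadcast is an IPv4-only concept.
    if let IpAddr::V4(v4) = ip {
        if v4.is_broadcast() {
            return false;
        }
    }
    true
}

fn main() {
    assert!(is_valid_relay_target(&"1.2.3.4:5678".parse().unwrap()));
    assert!(!is_valid_relay_target(&"127.0.0.1:5678".parse().unwrap())); // loopback
    assert!(!is_valid_relay_target(&"0.0.0.0:5678".parse().unwrap()));   // unspecified
    assert!(!is_valid_relay_target(&"224.0.0.1:5678".parse().unwrap())); // multicast
    assert!(!is_valid_relay_target(&"255.255.255.255:5678".parse().unwrap())); // broadcast
    assert!(!is_valid_relay_target(&"1.2.3.4:0".parse().unwrap()));      // port 0
}
```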

## Getting started

Add to your `Cargo.toml`:

```toml
[dependencies]
udp-relay-core = "0.1"
```

### Track clients

```rust
use std::sync::Arc;
use udp_relay_core::{TunnelClient, create_dashmap_with_capacity};

let clients = create_dashmap_with_capacity::<u32, Arc<TunnelClient>>(200);

let addr = "1.2.3.4:5678".parse().unwrap();
let client = Arc::new(TunnelClient::new_with_endpoint(addr, 30));
clients.insert(42, client.clone());

// On every received packet
let now = TunnelClient::current_timestamp();
client.update_stats(512, 0, now);
client.set_last_receive_tick_at(now);

// Periodic cleanup
if client.is_timed_out() {
    clients.remove(&42);
}
```

### Monitor bandwidth and priority

```rust
use udp_relay_core::TunnelClient;

let client = TunnelClient::new(60);

// Feed traffic over time
let t0 = TunnelClient::current_timestamp();
client.update_stats(10_000, 5_000, t0);
client.update_stats(10_000, 5_000, t0 + 1);

// Report quality metrics from your ping/probe logic
client.set_latency(25);         // 25 ms
client.set_packet_loss_rate(50); // 5.0%

// Priority is recomputed lazily — only when metrics change
let _score = client.get_priority(); // lower = better
```

### Track packet loss

```rust
use udp_relay_core::QualityAnalyzer;

let qa = QualityAnalyzer::new();

qa.record_packet(false); // received OK
qa.record_packet(true);  // lost

// Returns 0–1000 (i.e. 0.0%–100.0%)
let _rate = qa.get_packet_loss_rate();
```
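The "automatic counter halving" mentioned in the table keeps the lost/total ratio intact while bounding both counters far below overflow. A minimal sketch of the idea, with illustrative names and an assumed threshold (the crate's internals may differ):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

struct LossTracker {
    lost: AtomicU64,
    total: AtomicU64,
}

impl LossTracker {
    // Assumed threshold for illustration.
    const HALVING_THRESHOLD: u64 = 1 << 20;

    fn record_packet(&self, lost: bool) {
        if lost {
            self.lost.fetch_add(1, Ordering::Relaxed);
        }
        let total = self.total.fetch_add(1, Ordering::Relaxed) + 1;
        // Halving both counters preserves the ratio (and thus the loss
        // rate) while keeping the magnitudes bounded.
        if total >= Self::HALVING_THRESHOLD {
            self.total.store(total / 2, Ordering::Relaxed);
            let lost_now = self.lost.load(Ordering::Relaxed);
            self.lost.store(lost_now / 2, Ordering::Relaxed);
        }
    }

    /// Loss rate in permille (0..=1000), matching the scale above.
    fn loss_rate(&self) -> u64 {
        let total = self.total.load(Ordering::Relaxed);
        if total == 0 {
            return 0;
        }
        self.lost.load(Ordering::Relaxed) * 1000 / total
    }
}

fn main() {
    let t = LossTracker { lost: AtomicU64::new(0), total: AtomicU64::new(0) };
    for i in 0..100 {
        t.record_packet(i % 20 == 0); // 5% of packets lost
    }
    assert_eq!(t.loss_rate(), 50); // 5.0% on the 0..=1000 scale
}
```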

### Fast timestamps for event loops

For the lowest-overhead timestamp access, call `update_clock()` once per event-loop tick and use `recent_timestamp()` everywhere else:

```rust
use udp_relay_core::TunnelClient;

// Once per tick
TunnelClient::update_clock();

// Cheap read — a single relaxed atomic load
let _now = TunnelClient::recent_timestamp();
```
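The pattern behind this API is a single shared atomic: one writer refreshes it each event-loop tick, and every reader does one load. A self-contained sketch of that idea (the crate itself uses `coarsetime` rather than `SystemTime`, and these free functions are ours):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

// Cached wall-clock seconds, refreshed once per tick.
static CACHED_NOW: AtomicU64 = AtomicU64::new(0);

fn update_clock() {
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before UNIX epoch")
        .as_secs();
    CACHED_NOW.store(secs, Ordering::Relaxed);
}

fn recent_timestamp() -> u64 {
    // Readers never touch the clock syscall — just one atomic load.
    CACHED_NOW.load(Ordering::Relaxed)
}

fn main() {
    update_clock();
    let now = recent_timestamp();
    assert!(now > 1_600_000_000); // some time after 2020
}
```

The trade-off is resolution: every read within a tick sees the same value, which is exactly what timeout bookkeeping needs.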

## Performance notes

- All shared state uses bare atomics — no `Mutex`, no `RwLock`.
- `TunnelClient` is `Send + Sync` and designed to sit behind `Arc` in a `DashMap`.
- Hot-path fields (packet receive, bandwidth) are grouped on the first cache line; cold fields (priority, connection age) on the second — minimizing false sharing.
- Timestamps use [`coarsetime`](https://crates.io/crates/coarsetime) (`CLOCK_MONOTONIC_COARSE` on Linux), which is 10–25x faster than `SystemTime::now()`.
- Bandwidth estimation uses an exponentially weighted moving average (EWMA) so the value adapts quickly without storing a history buffer.
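The EWMA update in the last point amounts to `new = old + (sample - old) / N`, where `1/N` is the smoothing factor. A sketch with an assumed factor of 1/8 (the crate's actual constant may differ), written in the same lock-free style:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Illustrative EWMA bandwidth estimator: avg += (sample - avg) / 8.
struct BandwidthEstimator {
    bytes_per_sec: AtomicU64,
}

impl BandwidthEstimator {
    fn update(&self, sample_bps: u64) {
        // Single-writer update; multiple concurrent writers would need a
        // CAS loop instead of load-then-store.
        let old = self.bytes_per_sec.load(Ordering::Relaxed) as i64;
        let new = old + (sample_bps as i64 - old) / 8;
        self.bytes_per_sec.store(new as u64, Ordering::Relaxed);
    }
}

fn main() {
    let est = BandwidthEstimator { bytes_per_sec: AtomicU64::new(0) };
    for _ in 0..50 {
        est.update(8_000); // steady 8 kB/s samples
    }
    let bps = est.bytes_per_sec.load(Ordering::Relaxed);
    // Converges toward the sample rate without ever overshooting it,
    // and without storing any history buffer.
    assert!(bps > 7_000 && bps <= 8_000);
}
```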

## License

[MIT](LICENSE)