# OMNI-MESH
**Zero-allocation mesh networking middleware for autonomous robot fleets, edge-AI swarms, and multi-agent systems.**
Written in Rust. Cryptographically signed. Production-ready.
---
## What is OMNI-MESH?
OMNI-MESH is a decentralized peer-to-peer messaging layer designed for robotics and edge-AI deployments where latency, reliability, and security matter. Every node has a cryptographic identity (DID), every message is signed, and delivery is exactly-once with ordering guarantees.
**Use cases:**
- Multi-robot fleet coordination (warehouse, logistics, agriculture)
- Edge-AI inference mesh (distributed LLM, sensor fusion)
- Autonomous vehicle V2V communication
- Industrial IoT with real-time constraints
- Federated learning model distribution
## Key Features
| Feature | Description |
|---|---|
| Ed25519 Identity | Every node has a DID derived from its Ed25519 public key |
| Signed Envelopes | All messages are cryptographically signed and verified |
| Exactly-Once Delivery | Deduplication + ordered delivery with gap buffering |
| Pluggable Transport | Mock (testing), TCP (lightweight), QUIC (production) |
| Zero-Allocation Hot Path | Fixed-size buffers, no heap allocation in message pipeline |
| Gossip Routing | Decentralized peer discovery via UDP gossip protocol |
| DTN Support | Delay-Tolerant Networking for intermittent connectivity |
| WCET Enforcement | Worst-Case Execution Time guards with CPU pinning |
| Prometheus Metrics | Full observability with latency histograms |
| Python Bindings | PyO3-based SDK for ML/robotics teams |
| Graceful Shutdown | Signal handling, drain queues, clean exit |
| Health Checks | Built-in liveness probes for orchestration |
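
To illustrate the "deduplication + ordered delivery with gap buffering" row, here is a minimal sketch of a per-sender reorder buffer. It is not OMNI-MESH's implementation: `OrderedInbox`, `accept`, and the use of a plain `u64` sequence number are assumptions made for illustration, and it uses heap-backed `std` collections for brevity where the project describes fixed-size buffers.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-sender reorder buffer (illustrative, not the OMNI-MESH type).
pub struct OrderedInbox {
    /// Next sequence number that may be handed to the application.
    next_seq: u64,
    /// Messages that arrived ahead of a gap, keyed by sequence number.
    pending: BTreeMap<u64, Vec<u8>>,
}

impl OrderedInbox {
    pub fn new() -> Self {
        Self { next_seq: 0, pending: BTreeMap::new() }
    }

    /// Accepts one message and returns every payload that is now
    /// deliverable in order. Duplicates yield an empty vector.
    pub fn accept(&mut self, seq: u64, payload: Vec<u8>) -> Vec<Vec<u8>> {
        // Deduplication: already delivered, or already waiting in the gap buffer.
        if seq < self.next_seq || self.pending.contains_key(&seq) {
            return Vec::new();
        }
        self.pending.insert(seq, payload);

        // Drain the contiguous run starting at `next_seq`.
        let mut ready = Vec::new();
        while let Some(p) = self.pending.remove(&self.next_seq) {
            ready.push(p);
            self.next_seq += 1;
        }
        ready
    }
}
```

A message that arrives early is held until the gap before it is filled. A message that was already delivered or is already buffered is dropped, so the application sees each sequence number at most once and in order.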
## Quick Start
```rust
use std::time::Duration;

use omnimesh::client::{ClientConfig, OmnimeshClient};
use omnimesh::payload;

fn main() {
    // Create two nodes
    let robot = OmnimeshClient::builder()
        .with_config(ClientConfig::development())
        .build()
        .expect("Failed to build robot");
    let controller = OmnimeshClient::builder()
        .with_config(ClientConfig::development())
        .build()
        .expect("Failed to build controller");

    // Send a motion command
    let cmd = payload::motion_command(1.0, 0.0, 0.0, 0.0, 0.0, 0.5, 100_000);
    controller.send(robot.did, cmd).unwrap();

    // Receive and process (waits up to one second)
    if let Some(msg) = robot.receive_timeout(Duration::from_secs(1)) {
        println!("Received: {:?}", msg.payload);
    }

    // Health monitoring
    let health = robot.health();
    assert!(health.is_healthy());
}
```
## Architecture
```
┌─────────────────────────────────────────────────────┐
│  Developer SDK                                      │
│  OmnimeshClient (send/receive/health/shutdown)      │
├─────────────────────────────────────────────────────┤
│  Security Layer                                     │
│  Ed25519 signing + verification (mode-dependent)    │
├─────────────────────────────────────────────────────┤
│  Delivery Layer                                     │
│  Exactly-once dedup + ordered delivery + DTN        │
├─────────────────────────────────────────────────────┤
│  Transport Layer                                    │
│  Pluggable: Mock, TCP, QUIC                         │
├─────────────────────────────────────────────────────┤
│  Routing Layer                                      │
│  DID→SocketAddr table + UDP gossip discovery        │
├─────────────────────────────────────────────────────┤
│  Buffer Layer                                       │
│  Zero-alloc: RingBuffer, FixedMap, PayloadStorage   │
└─────────────────────────────────────────────────────┘
```
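
The buffer layer's zero-allocation idea can be sketched with a fixed-capacity ring whose storage lives inline in the struct. This is an illustration under stated assumptions, not the project's `RingBuffer`: the name `FixedRing`, the `Copy + Default` bound, and the `Result`-based `push` are all hypothetical.

```rust
/// Hypothetical fixed-capacity FIFO ring (illustrative, not the OMNI-MESH type).
/// The `[T; N]` array is stored inline, so `push` and `pop` never touch the heap.
pub struct FixedRing<T: Copy + Default, const N: usize> {
    slots: [T; N],
    /// Index of the oldest element.
    head: usize,
    /// Number of occupied slots.
    len: usize,
}

impl<T: Copy + Default, const N: usize> FixedRing<T, N> {
    pub fn new() -> Self {
        Self { slots: [T::default(); N], head: 0, len: 0 }
    }

    /// Appends an item, or hands it back as `Err` when the ring is full
    /// so the caller can decide how to apply back-pressure.
    pub fn push(&mut self, item: T) -> Result<(), T> {
        if self.len == N {
            return Err(item);
        }
        let tail = (self.head + self.len) % N;
        self.slots[tail] = item;
        self.len += 1;
        Ok(())
    }

    /// Removes and returns the oldest item, if any.
    pub fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        let item = self.slots[self.head];
        self.head = (self.head + 1) % N;
        self.len -= 1;
        Some(item)
    }
}
```

Because capacity is a const generic, memory use is known at compile time, which is the property a fixed worst-case execution budget depends on.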
## Operational Modes
| Mode | Transport | Signing | Delivery | Use Case |
|---|---|---|---|---|
| Development | Mock (in-process) | Optional | Best-effort | Testing, CI |
| Lightweight | TCP | Minimal | Lightweight | Embedded, constrained |
| Production | TCP/QUIC | Required | Reliable + DTN | Fleet deployment |
## Running
```bash
# Build
cargo build --release
# Run daemon
cargo run --release -- --config omni-mesh.toml
# Run examples
cargo run --example ping_pong
cargo run --example warehouse_fleet
# Run tests (130 tests)
cargo test
# Run benchmarks
cargo bench
```
## Python SDK
```python
import omnimesh
# Create a client
client = omnimesh.Client(mode="development")
print(f"My DID: {client.did}")
# Send a command
client.send_agent_command(
    target_did_hex="<64-char-hex>",
    command_type="pick",
    target_id=b"robot-1",
    payload=b"shelf-A12",
)
# Receive messages
msg = client.receive(timeout_ms=5000)
if msg:
    print(f"Got {msg['type']} from {msg['sender_did']}")
```
## Configuration
```toml
# omni-mesh.toml
[core]
mode = "production"
[node]
node_id = "node-1"
[node.transport]
type = "tcp"
tcp_listen_addr = "0.0.0.0:9000"
tcp_connect_addr = "127.0.0.1:9001"
quic_listen_addr = "0.0.0.0:9443"
[routing]
max_routes = 1024
gossip_interval_ms = 1000
gossip_bind_addr = "0.0.0.0:9999"
```
## Production Deployment Checklist
- [x] Graceful shutdown via Ctrl+C / SIGTERM
- [x] Health check API (`client.health()`)
- [x] Back-pressure with configurable inbox capacity
- [x] Metrics: messages sent/received/dropped
- [x] Condvar-based efficient waiting (no busy-polling)
- [x] Named threads for debugging
- [x] Structured JSON logging
- [x] Ed25519 signature enforcement in production mode
- [x] Exactly-once delivery with persistent deduplication
- [x] 130 tests including concurrency, crash recovery, and edge cases
- [x] Multi-OS CI (Linux, Windows, macOS)
- [x] Security audit in CI pipeline
- [x] Code coverage reporting
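
The back-pressure and Condvar items above can be sketched together as a bounded inbox. This is a minimal illustration assuming a `Mutex<VecDeque>` guarded by a `Condvar`, not the project's internals. `BoundedInbox`, `try_push`, and `pop_timeout` are hypothetical names.

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical bounded inbox (illustrative, not the OMNI-MESH type).
pub struct BoundedInbox<T> {
    queue: Mutex<VecDeque<T>>,
    not_empty: Condvar,
    capacity: usize,
}

impl<T> BoundedInbox<T> {
    pub fn new(capacity: usize) -> Self {
        Self {
            queue: Mutex::new(VecDeque::with_capacity(capacity)),
            not_empty: Condvar::new(),
            capacity,
        }
    }

    /// Back-pressure: rejects the item instead of growing past `capacity`.
    pub fn try_push(&self, item: T) -> Result<(), T> {
        let mut q = self.queue.lock().unwrap();
        if q.len() >= self.capacity {
            return Err(item);
        }
        q.push_back(item);
        // Wake one receiver blocked in `pop_timeout`.
        self.not_empty.notify_one();
        Ok(())
    }

    /// Sleeps on the condvar until an item arrives or `timeout` elapses,
    /// so the receiver never busy-polls.
    pub fn pop_timeout(&self, timeout: Duration) -> Option<T> {
        let guard = self.queue.lock().unwrap();
        let (mut q, _timed_out) = self
            .not_empty
            .wait_timeout_while(guard, timeout, |q| q.is_empty())
            .unwrap();
        q.pop_front()
    }
}
```

A rejected `try_push` returns the item to the sender, which can then drop it, retry, or count it in a dropped-messages metric.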
## Test Coverage
| Suite | Tests | Scope |
|---|---|---|
| Unit tests (lib) | 58 | Core logic, buffer, envelope, payload |
| Buffer tests | 2 | Fixed-size data structures |
| Crash recovery | 7 | Restart resilience, memory pressure |
| Delivery | 1 | Ordered delivery pipeline |
| Envelope | 2 | Serialization roundtrip |
| Flow control | 6 | Back-pressure, rate limiting |
| Integration | 3 | Multi-layer pipeline |
| Live network | 3 | TCP/QUIC transport |
| Multi-node | 6 | Fleet coordination |
| Persistent dedup | 3 | DTN store deduplication |
| Production edge cases | 28 | Shutdown, concurrency, crypto, overflow |
| SDK | 7 | Client API surface |
| Security | 2 | Signature verification |
| Storage | 2 | Persistent storage layer |
## Contributing
PRs welcome. Run `cargo test` and `cargo clippy` before submitting.