agent-kernel 0.3.2

Agent lifecycle kernel for MXP: registration, discovery, heartbeat, and message handling
## MXP Agents Runtime SDK

[![Crates.io](https://img.shields.io/crates/v/mxp-agents.svg)](https://crates.io/crates/mxp-agents)
[![Docs.rs](https://docs.rs/mxp-agents/badge.svg)](https://docs.rs/mxp-agents)
[![License](https://img.shields.io/crates/l/mxp-agents.svg)](https://github.com/yafatek/mxpnexus/blob/main/LICENSE-MIT)
[![Rust 1.85+](https://img.shields.io/badge/rust-1.85%2B-orange.svg)](https://www.rust-lang.org)

**Production-grade Rust SDK for building autonomous AI agents that communicate over the [MXP protocol](https://github.com/yafatek/mxpnexus).**

Part of the MXP (Mesh eXchange Protocol) ecosystem, this SDK provides the runtime infrastructure for building, deploying, and operating AI agents that speak MXP natively. While the [`mxp`](https://crates.io/crates/mxp) crate handles wire protocol encoding/decoding and secure UDP transport, this SDK provides:

- **Agent lifecycle management** with deterministic state machines
- **MXP message handling** for Call, Response, Event, and Stream messages
- **Registry integration** for mesh discovery and heartbeats
- **LLM adapters** for OpenAI, Anthropic, Gemini, and Ollama
- **Enterprise features** including resilience, observability, and security

## Table of Contents

- [Quick Start](#quick-start)
- [Enterprise-Grade Capabilities](#enterprise-grade-capabilities)
- [Why MXP Agents Runtime](#why-it-exists)
- [Scope](#scope)
- [Production Readiness](#production-readiness)
- [Documentation](#documentation-map)
- [Examples](#examples)
- [Getting Started](#getting-started)
- [Requirements](#requirements)
- [Contributing](#contributing)
- [License](#license)

## Quick Start

Install via the bundled facade crate:

```sh
cargo add mxp-agents
```

**Basic LLM Usage**

```rust
use mxp_agents::adapters::ollama::{OllamaAdapter, OllamaConfig};
use mxp_agents::adapters::traits::{InferenceRequest, MessageRole, ModelAdapter, PromptMessage};
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create an adapter (works with OpenAI, Anthropic, Gemini, or Ollama)
    // Use .with_stream(true) for incremental token streaming
    let adapter = OllamaAdapter::new(
        OllamaConfig::new("gemma2:2b")
            .with_stream(true)  // Enable streaming responses
    )?;

    // Build a request with system prompt
    let request = InferenceRequest::new(vec![
        PromptMessage::new(MessageRole::User, "What is MXP?"),
    ])?
    .with_system_prompt("You are an expert on MXP protocol")
    .with_temperature(0.7);

    // Get streaming response
    let mut stream = adapter.infer(request).await?;
    
    // Process chunks as they arrive
    while let Some(chunk) = stream.next().await {
        let chunk = chunk?;
        print!("{}", chunk.delta);
    }
    
    Ok(())
}
```

**MXP Agent Setup**

Agents communicate over the MXP protocol. Here's how to create an agent that handles MXP messages:

```rust
use mxp_agents::kernel::{
    AgentKernel, AgentMessageHandler, HandlerContext, HandlerResult,
    TaskScheduler, LifecycleEvent,
};
use mxp_agents::primitives::AgentId;
use async_trait::async_trait;
use std::sync::Arc;

// Define your agent's message handler
struct MyAgentHandler;

#[async_trait]
impl AgentMessageHandler for MyAgentHandler {
    async fn handle_call(&self, ctx: HandlerContext) -> HandlerResult {
        // Process incoming MXP Call messages
        let message = ctx.message();
        println!("Received MXP call with {} bytes", message.payload().len());
        Ok(())
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let agent_id = AgentId::random();
    let handler = Arc::new(MyAgentHandler);
    let scheduler = TaskScheduler::default();

    // Create the agent kernel
    let mut kernel = AgentKernel::new(agent_id, handler, scheduler);

    // Boot and activate the agent
    kernel.transition(LifecycleEvent::Boot)?;
    kernel.transition(LifecycleEvent::Activate)?;

    println!("Agent {} is active and ready for MXP messages", agent_id);
    Ok(())
}
```
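
The `Boot` and `Activate` transitions above suggest a deterministic state machine. The following is a minimal sketch of that idea using only the standard library. The state names (`Init`, `Ready`, `Active`) and the `InvalidTransition` error are assumptions made for illustration and are not the kernel's actual types.

```rust
/// Hypothetical lifecycle states; the real kernel's state names may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Init,
    Ready,
    Active,
}

/// Events named after the `LifecycleEvent` variants used in the example above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Boot,
    Activate,
}

/// Returned when an event is not valid in the current state.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidTransition {
    pub from: State,
    pub event: Event,
}

/// Pure transition function: the same (state, event) pair always
/// produces the same result, which is what makes the machine deterministic.
pub fn transition(state: State, event: Event) -> Result<State, InvalidTransition> {
    match (state, event) {
        (State::Init, Event::Boot) => Ok(State::Ready),
        (State::Ready, Event::Activate) => Ok(State::Active),
        (from, event) => Err(InvalidTransition { from, event }),
    }
}
```

Keeping the transition table in a single `match` makes every legal path visible in one place and turns an out-of-order event into an error rather than a silent state change.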

**Production Setup with Resilience & Observability**

```rust
use mxp_agents::adapters::ollama::{OllamaAdapter, OllamaConfig};
use mxp_agents::adapters::resilience::{
    CircuitBreakerConfig, RetryConfig, BackoffStrategy, ResilientAdapter,
};
use mxp_agents::telemetry::PrometheusExporter;
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create resilient adapter with circuit breaker and retry
    let base_adapter = OllamaAdapter::new(OllamaConfig::new("gemma2:2b"))?;
    let resilient = ResilientAdapter::builder(base_adapter)
        .with_circuit_breaker(CircuitBreakerConfig {
            failure_threshold: 5,
            cooldown: Duration::from_secs(30),
            success_threshold: 2,
        })
        .with_retry(RetryConfig {
            max_attempts: 3,
            backoff: BackoffStrategy::Exponential {
                base: Duration::from_millis(100),
                max: Duration::from_secs(10),
                jitter: true,
            },
            ..Default::default()
        })
        .with_timeout_duration(Duration::from_secs(30))
        .build();

    // Set up metrics collection
    let exporter = PrometheusExporter::new();
    let _ = exporter.register_runtime();
    let _ = exporter.register_adapter("ollama");

    // Export Prometheus metrics
    println!("{}", exporter.export());

    Ok(())
}
```
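
To make the `BackoffStrategy::Exponential` settings above concrete, here is a rough sketch of how a capped exponential delay with jitter can be computed. The function names and the "full jitter" scheme are assumptions for illustration; the SDK's own calculation may differ.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`, capped at `max`.
pub fn backoff_delay(base: Duration, max: Duration, attempt: u32) -> Duration {
    // checked_pow / checked_mul guard against overflow for large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// "Full jitter": scale the delay by a caller-supplied fraction in [0, 1].
/// In real code the fraction would come from a random number generator.
pub fn with_jitter(delay: Duration, fraction: f64) -> Duration {
    delay.mul_f64(fraction.clamp(0.0, 1.0))
}
```

With `base` at 100 ms and `max` at 10 s, the un-jittered delays grow as 100 ms, 200 ms, 400 ms, and so on until they reach the cap. Jitter spreads retries from many clients so they do not all hit the provider at the same instant.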

See [examples/](examples/) for more complete examples including policy enforcement, memory integration, and graceful shutdown.

### Enterprise-Grade Capabilities

The SDK is production-hardened with features required for mission-critical deployments:

**Resilience & Reliability**
- Circuit breaker pattern prevents cascading failures
- Exponential backoff retry policies with jitter
- Request timeout enforcement with per-request overrides
- Automatic recovery from transient failures
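
As an illustration of the circuit breaker pattern listed above, the sketch below models the usual Closed, Open, and Half-Open states with the same three knobs that `CircuitBreakerConfig` exposes (`failure_threshold`, `cooldown`, `success_threshold`). The type and method names are hypothetical, and the exact semantics of the SDK's breaker are not confirmed here.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BreakerState {
    Closed,
    Open,
    HalfOpen,
}

/// Hypothetical breaker; time is passed in explicitly so behavior is testable.
pub struct CircuitBreaker {
    failure_threshold: u32,
    success_threshold: u32,
    cooldown: Duration,
    state: BreakerState,
    failures: u32,
    successes: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, success_threshold: u32, cooldown: Duration) -> Self {
        Self {
            failure_threshold,
            success_threshold,
            cooldown,
            state: BreakerState::Closed,
            failures: 0,
            successes: 0,
            opened_at: None,
        }
    }

    pub fn state(&self) -> BreakerState {
        self.state
    }

    /// Returns whether a request may proceed at time `now`.
    /// An open breaker moves to half-open once the cooldown has elapsed.
    pub fn allow(&mut self, now: Instant) -> bool {
        if self.state == BreakerState::Open {
            let cooled = self
                .opened_at
                .map_or(false, |t| now.saturating_duration_since(t) >= self.cooldown);
            if cooled {
                self.state = BreakerState::HalfOpen;
                self.successes = 0;
            }
        }
        self.state != BreakerState::Open
    }

    /// Enough successes while half-open close the breaker again.
    pub fn record_success(&mut self) {
        match self.state {
            BreakerState::HalfOpen => {
                self.successes += 1;
                if self.successes >= self.success_threshold {
                    self.state = BreakerState::Closed;
                    self.failures = 0;
                }
            }
            BreakerState::Closed => self.failures = 0,
            BreakerState::Open => {}
        }
    }

    /// Any failure while half-open, or too many while closed, opens the breaker.
    pub fn record_failure(&mut self, now: Instant) {
        self.failures += 1;
        if self.state == BreakerState::HalfOpen || self.failures >= self.failure_threshold {
            self.state = BreakerState::Open;
            self.opened_at = Some(now);
            self.failures = 0;
        }
    }
}
```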

**Observability & Monitoring**
- Prometheus-compatible metrics (latency, throughput, error rates, queue depth)
- Health checks for Kubernetes readiness/liveness probes
- OpenTelemetry trace propagation for distributed tracing
- Structured logging with correlation IDs

**Security & Compliance**
- Secrets management with redacted Debug output
- Per-agent rate limiting with token bucket algorithm
- Input validation with configurable size limits
- Audit events for policy enforcement and compliance
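
The per-agent token bucket mentioned above can be sketched roughly as follows. This is a standalone illustration, not the SDK's rate limiter: `TokenBucket`, `PerAgentLimiter`, and their parameters are hypothetical names.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// A bucket holds up to `capacity` tokens and refills continuously.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    pub fn new(capacity: u32, refill_per_sec: f64, now: Instant) -> Self {
        let capacity = f64::from(capacity);
        Self { capacity, tokens: capacity, refill_per_sec, last_refill: now }
    }

    /// Refill based on elapsed time, then try to spend one token.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = self.last_refill.max(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

/// One bucket per agent identifier, created lazily on first use.
pub struct PerAgentLimiter {
    capacity: u32,
    refill_per_sec: f64,
    buckets: HashMap<String, TokenBucket>,
}

impl PerAgentLimiter {
    pub fn new(capacity: u32, refill_per_sec: f64) -> Self {
        Self { capacity, refill_per_sec, buckets: HashMap::new() }
    }

    pub fn check(&mut self, agent: &str, now: Instant) -> bool {
        let (capacity, rate) = (self.capacity, self.refill_per_sec);
        self.buckets
            .entry(agent.to_string())
            .or_insert_with(|| TokenBucket::new(capacity, rate, now))
            .try_acquire(now)
    }
}
```

The bucket's capacity bounds the burst size and the refill rate bounds the sustained request rate, so one noisy agent cannot exhaust the budget of another.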

**Operations & Configuration**
- Layered configuration from defaults, files, environment, and runtime
- Hot reload for non-disruptive configuration updates
- Configuration digest for drift detection
- Graceful shutdown with in-flight work draining
- State recovery and checkpoint persistence
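
Layered configuration usually means that each source may set only some values and later layers override earlier ones. The sketch below shows that merge under assumed field names (`request_timeout_secs`, `max_retries`, `model`); it does not reflect the SDK's actual configuration types.

```rust
/// One configuration layer; `None` means "not set at this layer".
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Layer {
    pub request_timeout_secs: Option<u64>,
    pub max_retries: Option<u32>,
    pub model: Option<String>,
}

impl Layer {
    /// Overlay `higher` on top of `self`: values set in `higher` win.
    pub fn overlay(self, higher: Layer) -> Layer {
        Layer {
            request_timeout_secs: higher.request_timeout_secs.or(self.request_timeout_secs),
            max_retries: higher.max_retries.or(self.max_retries),
            model: higher.model.or(self.model),
        }
    }
}

/// Fold layers in priority order, lowest first
/// (for example: defaults, file, environment, runtime).
pub fn resolve(layers: Vec<Layer>) -> Layer {
    layers.into_iter().fold(Layer::default(), Layer::overlay)
}
```

Because the merge is a pure fold over an ordered list, the resolved value of any setting can be traced back to the highest-priority layer that set it.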

### Why it exists

- Provide a unified runtime that wraps LLMs, tools, memory, and governance without depending on QUIC or third-party transports.
- Ensure every agent built for MXP Nexus speaks MXP natively and adheres to platform security, observability, and performance rules.
- Offer a developer-friendly path to compose agents locally, then promote them into the MXP Nexus platform when ready.
- Enable production deployments with enterprise-grade resilience, observability, and security out of the box.

### Scope

**Core Runtime**
- Agent lifecycle management with deterministic state machine
- LLM connectors (OpenAI, Anthropic, Gemini, Ollama, MXP-hosted)
- Tool registration with capability-based access control
- Policy hooks for governance and compliance
- MXP message handling and protocol integration
- Memory integration (volatile cache, file journal, vector store interfaces)

**Enterprise Features**
- Resilience patterns (circuit breaker, retry, timeout)
- Observability (Prometheus metrics, health checks, distributed tracing)
- Security (secrets management, rate limiting, input validation)
- Configuration management (layered config, hot reload, validation)
- Graceful lifecycle (shutdown coordination, state recovery)

**Out of scope**: MXP Nexus deployment tooling, mesh scheduling, and any "deep agents" research-oriented SDK. These are handled by separate projects.

### Supported LLM stacks

- OpenAI, Anthropic, Gemini, Ollama, and future MXP-hosted models via a shared `ModelAdapter` trait.

### Production Readiness

The SDK is designed for production deployments with:

- **Zero-allocation hot paths** in call execution and scheduler loops
- **Comprehensive error handling** with exhaustive error types
- **Property-based testing** for correctness verification
- **Kubernetes integration** with health checks and graceful shutdown
- **Observability** with structured logging, metrics, and distributed tracing
- **Security** with secrets management, rate limiting, and input validation

All code passes `cargo fmt`, `cargo clippy --all-targets --all-features`, and `cargo test --all-features` gates.

### MXP integration

This SDK is part of the [MXP protocol](https://github.com/yafatek/mxpnexus) ecosystem. The `mxp` crate provides the transport primitives, while this SDK provides the agent runtime that speaks MXP natively.

**Protocol Relationship**
- `mxp` crate: Wire protocol, message encoding/decoding, UDP transport with ChaCha20-Poly1305 encryption
- `mxp-agents` crate: Agent runtime, lifecycle management, LLM adapters, tools, policy enforcement

**MXP Message Types**
Agents handle these MXP message types through the `AgentMessageHandler` trait:
- `AgentRegister` / `AgentHeartbeat` — Mesh registration and health
- `Call` / `Response` — Request-response communication
- `Event` — Fire-and-forget notifications
- `StreamOpen` / `StreamChunk` / `StreamClose` — Streaming data

**Registry Integration Example**

```rust
use mxp_agents::kernel::{
    AgentKernel, LifecycleEvent, MxpRegistryClient, RegistrationConfig, TaskScheduler,
};
use mxp_agents::primitives::{AgentId, AgentManifest, Capability, CapabilityId};
use std::net::SocketAddr;
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let agent_id = AgentId::random();
    
    // Define agent capabilities
    let capability = Capability::builder(CapabilityId::new("chat.respond")?)
        .name("Chat Response")?
        .version("1.0.0")?
        .add_scope("chat:write")?
        .build()?;

    // Create agent manifest
    let manifest = AgentManifest::builder(agent_id)
        .name("my-chat-agent")?
        .version("0.1.0")?
        .capabilities(vec![capability])
        .build()?;

    // Connect to MXP registry for mesh discovery
    let agent_endpoint: SocketAddr = "127.0.0.1:50052".parse()?;
    let registry = Arc::new(MxpRegistryClient::connect(
        "127.0.0.1:50051",  // Registry endpoint
        agent_endpoint,
        None,
    )?);

    // Create kernel with registry integration
    // (`MyAgentHandler` is the handler defined in the MXP Agent Setup example)
    let handler = Arc::new(MyAgentHandler);
    let mut kernel = AgentKernel::new(agent_id, handler, TaskScheduler::default());
    kernel.set_registry(registry, manifest, RegistrationConfig::default());

    // Agent will auto-register and send heartbeats
    kernel.transition(LifecycleEvent::Boot)?;
    kernel.transition(LifecycleEvent::Activate)?;

    Ok(())
}
```

### Key concepts

- Tools are pure Rust functions annotated with `#[tool]`; the SDK converts them into schemas consumable by LLMs and enforces capability scopes at runtime.
- Agents can share external state (memory bus, MXP Vector Store) or remain fully isolated.
- Governance and policy enforcement are first-class: hooks exist for allow/deny decisions and human-in-the-loop steps.
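
The combination of capability scopes and allow/deny hooks described above can be pictured with a small standalone sketch. `Decision`, `ToolCall`, and `decide` are hypothetical names chosen for this example; the SDK's policy hooks have their own types, which are not shown here.

```rust
use std::collections::HashSet;

/// Possible outcomes of a policy check.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny { missing_scope: String },
    /// Hand the call to a human reviewer before it runs.
    Escalate,
}

/// A tool invocation as seen by the policy layer (illustrative shape).
pub struct ToolCall {
    pub tool: String,
    pub required_scopes: Vec<String>,
    pub needs_human_approval: bool,
}

/// Deny if any required scope is missing; otherwise escalate calls that
/// are flagged for human approval and allow the rest.
pub fn decide(call: &ToolCall, granted: &HashSet<String>) -> Decision {
    for scope in &call.required_scopes {
        if !granted.contains(scope) {
            return Decision::Deny { missing_scope: scope.clone() };
        }
    }
    if call.needs_human_approval {
        Decision::Escalate
    } else {
        Decision::Allow
    }
}
```

Checking scopes before the approval flag means a call that lacks permission is rejected outright instead of being queued for a reviewer.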

### System Prompts

All adapters support system prompts with provider-native optimizations:

```rust
use mxp_agents::adapters::openai::{OpenAiAdapter, OpenAiConfig};
use mxp_agents::adapters::anthropic::{AnthropicAdapter, AnthropicConfig};
use mxp_agents::adapters::gemini::{GeminiAdapter, GeminiConfig};
use mxp_agents::adapters::traits::InferenceRequest;

// OpenAI/Ollama: Prepends as first message
let openai = OpenAiAdapter::new(OpenAiConfig::from_env("gpt-4"))?;

// Anthropic: Uses dedicated 'system' parameter
let anthropic = AnthropicAdapter::new(AnthropicConfig::from_env("claude-3-5-sonnet-20241022"))?;

// Gemini: Uses 'systemInstruction' field
let gemini = GeminiAdapter::new(GeminiConfig::from_env("gemini-1.5-pro"))?;

// Same API works across all providers.
// `messages` is a `Vec<PromptMessage>` built as in the Quick Start example.
let request = InferenceRequest::new(messages)?
    .with_system_prompt("You are a helpful assistant");
```

### Context Window Management (Optional)

For long conversations, enable automatic context management:

```rust
use mxp_agents::prompts::ContextWindowConfig;
use mxp_agents::adapters::ollama::{OllamaAdapter, OllamaConfig};

let adapter = OllamaAdapter::new(OllamaConfig::new("gemma2:2b"))?
    .with_context_config(ContextWindowConfig {
        max_tokens: 4096,
        recent_window_size: 10,
        ..Default::default()
    });

// SDK automatically manages conversation history within token budget
```
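
One plausible reading of `max_tokens` and `recent_window_size` is "keep the newest messages, up to a count limit and a token budget". The sketch below implements that reading with a crude word-count token estimate. Both the interpretation and the helper names are assumptions, and the SDK's actual strategy may be more sophisticated.

```rust
/// Rough token estimate: whitespace-separated words.
/// This is an assumption for illustration; real tokenizers count differently.
pub fn estimate_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Keep the newest messages, at most `recent_window_size` of them and
/// at most `max_tokens` estimated tokens in total, preserving order.
pub fn trim_history(
    history: &[String],
    max_tokens: usize,
    recent_window_size: usize,
) -> Vec<String> {
    let mut kept = Vec::new();
    let mut used = 0;
    // Walk from newest to oldest so recent context is preferred.
    for message in history.iter().rev().take(recent_window_size) {
        let cost = estimate_tokens(message);
        if used + cost > max_tokens {
            break;
        }
        used += cost;
        kept.push(message.clone());
    }
    // Restore chronological order before returning.
    kept.reverse();
    kept
}
```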

### Documentation Map

- `docs/overview.md` — architectural overview and design principles
- `docs/architecture.md` — crate layout, component contracts, roadmap
- `docs/features.md` — complete feature set and facade feature flags
- `docs/usage.md` — end-to-end setup guide for building agents
- `docs/enterprise.md` — production hardening guide with resilience, observability, and security
- `docs/errors.md` — error surfaces and troubleshooting tips

### Examples

- `examples/basic-agent` — simple agent with Ollama adapter and policy enforcement
- `examples/enterprise-agent` — production-grade agent demonstrating resilience, metrics, health checks, and graceful shutdown

### Getting Started

1. **Development**: Start with `examples/basic-agent` to understand core concepts
2. **Production**: Review `docs/enterprise.md` and `examples/enterprise-agent` for hardening patterns
3. **Integration**: Wire MXP endpoints for discovery and message handling
4. **Deployment**: Use health checks and metrics for Kubernetes integration


### Performance & Reliability

- **Sub-microsecond message encoding/decoding** via MXP protocol
- **Lock-free data structures** for high-concurrency scenarios
- **Bounded memory usage** with configurable limits
- **Automatic recovery** from transient failures
- **Graceful degradation** under load with rate limiting
- **Comprehensive testing** with property-based tests for correctness

### Security

- **Secrets management** with redacted Debug output
- **Rate limiting** to prevent resource exhaustion
- **Input validation** with configurable constraints
- **Audit events** for compliance and governance
- **Capability-based access control** for tools
- **Policy enforcement** with allow/deny/escalate decisions
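
Redacted `Debug` output is typically achieved by wrapping the sensitive value in a newtype with a hand-written `Debug` impl. The following is a minimal sketch of that technique; `Secret` and `ProviderConfig` are hypothetical names, not the SDK's types.

```rust
use std::fmt;

/// Wrapper whose `Debug` output never includes the inner value.
pub struct Secret(String);

impl Secret {
    pub fn new(value: impl Into<String>) -> Self {
        Self(value.into())
    }

    /// Explicit accessor, so every read of the raw value is easy to find.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for Secret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("Secret([REDACTED])")
    }
}

/// Deriving `Debug` on a containing struct is now safe to log.
#[derive(Debug)]
pub struct ProviderConfig {
    pub model: String,
    pub api_key: Secret,
}
```

Because the redaction lives in the type, a stray `{:?}` in a log statement cannot leak the key, while code that genuinely needs the value must call `expose()` explicitly.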

### Observability

- **Prometheus metrics** for monitoring and alerting
- **Health checks** for Kubernetes integration
- **Distributed tracing** with OpenTelemetry support
- **Structured logging** with correlation IDs
- **Circuit breaker state tracking** for failure visibility
- **Request latency histograms** for performance analysis

## Requirements

- **Rust**: 1.85 or later (MSRV)
- **Tokio**: Async runtime (included via dependencies)
- **Optional**: A running Ollama instance, or API keys for OpenAI, Anthropic, or Gemini, depending on which LLM adapters you use

## Troubleshooting

### Circuit Breaker Opens Frequently

If the circuit breaker is opening too often:
- Increase `failure_threshold` in `CircuitBreakerConfig`
- Check provider status and connectivity
- Review timeout settings

### High Memory Usage

If memory usage is growing:
- Enable metrics cardinality limiting
- Check that configuration hot reload is working
- Review rate limiter cleanup

### Slow Inference

If inference is slower than expected:
- Check `request_latency_seconds` metrics
- Verify provider API status
- Review retry and timeout configuration

See [docs/enterprise.md](docs/enterprise.md) for a comprehensive troubleshooting guide.

## Contributing

We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

### Development

```bash
# Build all crates
cargo build --all-features

# Run tests
cargo test --all-features

# Run linting
cargo clippy --all-targets --all-features -- -D warnings

# Check formatting (run `cargo fmt` without `--check` to apply changes)
cargo fmt --check
```

## Community

- **GitHub Issues**: [Report bugs or request features](https://github.com/yafatek/mxpnexus/issues)
- **GitHub Discussions**: [Ask questions and discuss ideas](https://github.com/yafatek/mxpnexus/discussions)
- **Documentation**: [Full API docs](https://docs.rs/mxp-agents)

## License

Licensed under either of:

- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)

at your option.

## Acknowledgments

Built with [Rust](https://www.rust-lang.org), [Tokio](https://tokio.rs), and the MXP protocol specification.