agent-kernel 0.3.2

## MXP Agents Runtime SDK

[![Crates.io](https://img.shields.io/crates/v/mxp-agents.svg)](https://crates.io/crates/mxp-agents)
[![Docs.rs](https://docs.rs/mxp-agents/badge.svg)](https://docs.rs/mxp-agents)
[![License](https://img.shields.io/crates/l/mxp-agents.svg)](https://github.com/yafatek/mxpnexus/blob/main/LICENSE-MIT)
[![Rust 1.85+](https://img.shields.io/badge/rust-1.85%2B-orange.svg)](https://www.rust-lang.org)

**Production-grade Rust SDK for building autonomous AI agents that communicate over the [MXP protocol](https://github.com/yafatek/mxpnexus).**

Part of the MXP (Mesh eXchange Protocol) ecosystem, this SDK provides the runtime infrastructure for building, deploying, and operating AI agents that speak MXP natively. While the [`mxp`](https://crates.io/crates/mxp) crate handles wire protocol encoding/decoding and secure UDP transport, this SDK provides:

- **Agent lifecycle management** with deterministic state machines
- **MXP message handling** for Call, Response, Event, and Stream messages
- **Registry integration** for mesh discovery and heartbeats
- **LLM adapters** for OpenAI, Anthropic, Gemini, and Ollama
- **Enterprise features** including resilience, observability, and security

## Table of Contents

- [Quick Start](#quick-start)
- [Enterprise-Grade Capabilities](#enterprise-grade-capabilities)
- [Why MXP Agents Runtime](#why-it-exists)
- [Scope](#scope)
- [Production Readiness](#production-readiness)
- [Documentation](#documentation-map)
- [Examples](#examples)
- [Getting Started](#getting-started)
- [Requirements](#requirements)
- [Contributing](#contributing)
- [License](#license)

## Quick Start

Install via the bundled facade crate:

```sh
cargo add mxp-agents
```

**Basic LLM Usage**

```rust
use mxp_agents::adapters::ollama::{OllamaAdapter, OllamaConfig};
use mxp_agents::adapters::traits::{InferenceRequest, MessageRole, ModelAdapter, PromptMessage};
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create an adapter (works with OpenAI, Anthropic, Gemini, or Ollama)
    // Use .with_stream(true) for incremental token streaming
    let adapter = OllamaAdapter::new(
        OllamaConfig::new("gemma2:2b")
            .with_stream(true)  // Enable streaming responses
    )?;

    // Build a request with system prompt
    let request = InferenceRequest::new(vec![
        PromptMessage::new(MessageRole::User, "What is MXP?"),
    ])?
    .with_system_prompt("You are an expert on MXP protocol")
    .with_temperature(0.7);

    // Get streaming response
    let mut stream = adapter.infer(request).await?;
    
    // Process chunks as they arrive
    while let Some(chunk) = stream.next().await {
        let chunk = chunk?;
        print!("{}", chunk.delta);
    }
    
    Ok(())
}
```

**MXP Agent Setup**

Agents communicate over the MXP protocol. Here's how to create an agent that handles MXP messages:

```rust
use mxp_agents::kernel::{
    AgentKernel, AgentMessageHandler, HandlerContext, HandlerResult,
    TaskScheduler, LifecycleEvent,
};
use mxp_agents::primitives::AgentId;
use async_trait::async_trait;
use std::sync::Arc;

// Define your agent's message handler
struct MyAgentHandler;

#[async_trait]
impl AgentMessageHandler for MyAgentHandler {
    async fn handle_call(&self, ctx: HandlerContext) -> HandlerResult {
        // Process incoming MXP Call messages
        let message = ctx.message();
        println!("Received MXP call with {} bytes", message.payload().len());
        Ok(())
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let agent_id = AgentId::random();
    let handler = Arc::new(MyAgentHandler);
    let scheduler = TaskScheduler::default();

    // Create the agent kernel
    let mut kernel = AgentKernel::new(agent_id, handler, scheduler);

    // Boot and activate the agent
    kernel.transition(LifecycleEvent::Boot)?;
    kernel.transition(LifecycleEvent::Activate)?;

    println!("Agent {} is active and ready for MXP messages", agent_id);
    Ok(())
}
```
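
The `Boot` and `Activate` transitions above imply a lifecycle state machine. The sketch below illustrates what "deterministic" means here, using only the standard library. The state names (`Init`, `Ready`, `Active`, `Stopped`) and the `Stop` event are assumptions made for illustration; the SDK's actual states and events may differ.

```rust
/// Hypothetical lifecycle states; the SDK's real state set may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Init,
    Ready,
    Active,
    Stopped,
}

/// Hypothetical lifecycle events (`Boot` and `Activate` mirror the example above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Boot,
    Activate,
    Stop,
}

/// Pure transition function: the same (state, event) pair always yields the
/// same result, and every pair not listed is rejected.
pub fn transition(state: State, event: Event) -> Result<State, String> {
    match (state, event) {
        (State::Init, Event::Boot) => Ok(State::Ready),
        (State::Ready, Event::Activate) => Ok(State::Active),
        (State::Ready | State::Active, Event::Stop) => Ok(State::Stopped),
        (s, e) => Err(format!("invalid transition: {s:?} on {e:?}")),
    }
}
```

Because the function has no hidden state, an invalid sequence such as `Activate` before `Boot` fails the same way every time, which is why the calls in the example return a `Result`.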

**Production Setup with Resilience & Observability**

```rust
use mxp_agents::adapters::ollama::{OllamaAdapter, OllamaConfig};
use mxp_agents::adapters::resilience::{
    CircuitBreakerConfig, RetryConfig, BackoffStrategy, ResilientAdapter,
};
use mxp_agents::telemetry::PrometheusExporter;
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create resilient adapter with circuit breaker and retry
    let base_adapter = OllamaAdapter::new(OllamaConfig::new("gemma2:2b"))?;
    let resilient = ResilientAdapter::builder(base_adapter)
        .with_circuit_breaker(CircuitBreakerConfig {
            failure_threshold: 5,
            cooldown: Duration::from_secs(30),
            success_threshold: 2,
        })
        .with_retry(RetryConfig {
            max_attempts: 3,
            backoff: BackoffStrategy::Exponential {
                base: Duration::from_millis(100),
                max: Duration::from_secs(10),
                jitter: true,
            },
            ..Default::default()
        })
        .with_timeout_duration(Duration::from_secs(30))
        .build();

    // Set up metrics collection
    let exporter = PrometheusExporter::new();
    let _ = exporter.register_runtime();
    let _ = exporter.register_adapter("ollama");

    // Export Prometheus metrics
    println!("{}", exporter.export());

    Ok(())
}
```
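
To make the `BackoffStrategy::Exponential` settings above concrete, here is a minimal standard-library sketch of how an exponential delay with a cap is commonly computed. The function names are hypothetical and the SDK's internal formula is an assumption; in particular, the jitter fraction is passed in by the caller so the example needs no random-number crate.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
pub fn backoff_delay(base: Duration, max: Duration, attempt: u32) -> Duration {
    // Saturate instead of overflowing for very large attempt numbers.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// "Full jitter": scale the delay by a fraction in [0.0, 1.0].
/// In real use the fraction would come from a random source.
pub fn with_jitter(delay: Duration, fraction: f64) -> Duration {
    delay.mul_f64(fraction.clamp(0.0, 1.0))
}
```

With `base` = 100 ms and `max` = 10 s as configured above, the un-jittered delays under this formula would be 100 ms, 200 ms, 400 ms, and so on, until they reach the 10 s cap.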

See [examples/](examples/) for more complete examples including policy enforcement, memory integration, and graceful shutdown.

### Enterprise-Grade Capabilities

The SDK is production-hardened with features required for mission-critical deployments:

**Resilience & Reliability**
- Circuit breaker pattern prevents cascading failures
- Exponential backoff retry policies with jitter
- Request timeout enforcement with per-request overrides
- Automatic recovery from transient failures
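
As a rough illustration of the circuit breaker pattern, the sketch below models the usual Closed, Open, and Half-Open states with the standard library only. The `Breaker` type and its methods are hypothetical; the field names echo `CircuitBreakerConfig` from the Quick Start, but the exact semantics the SDK gives them are an assumption.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BreakerState {
    Closed,
    Open,
    HalfOpen,
}

/// Hypothetical breaker; not the SDK's implementation.
pub struct Breaker {
    failure_threshold: u32,
    success_threshold: u32,
    cooldown: Duration,
    state: BreakerState,
    failures: u32,
    successes: u32,
    opened_at: Option<Instant>,
}

impl Breaker {
    pub fn new(failure_threshold: u32, success_threshold: u32, cooldown: Duration) -> Self {
        Self {
            failure_threshold,
            success_threshold,
            cooldown,
            state: BreakerState::Closed,
            failures: 0,
            successes: 0,
            opened_at: None,
        }
    }

    /// Returns true if a request may proceed at time `now`.
    /// An open breaker moves to half-open once the cooldown has elapsed.
    pub fn allow(&mut self, now: Instant) -> bool {
        if self.state == BreakerState::Open {
            let cooled = self
                .opened_at
                .is_some_and(|t| now.duration_since(t) >= self.cooldown);
            if cooled {
                self.state = BreakerState::HalfOpen;
                self.successes = 0;
            }
        }
        self.state != BreakerState::Open
    }

    /// Enough consecutive successes in half-open close the breaker again.
    pub fn record_success(&mut self) {
        if self.state == BreakerState::HalfOpen {
            self.successes += 1;
            if self.successes >= self.success_threshold {
                self.state = BreakerState::Closed;
                self.failures = 0;
            }
        } else {
            self.failures = 0;
        }
    }

    /// A failure in half-open, or too many failures while closed, opens the breaker.
    pub fn record_failure(&mut self, now: Instant) {
        self.failures += 1;
        if self.state == BreakerState::HalfOpen || self.failures >= self.failure_threshold {
            self.state = BreakerState::Open;
            self.opened_at = Some(now);
        }
    }

    pub fn state(&self) -> BreakerState {
        self.state
    }
}
```

Passing `now` explicitly keeps the sketch testable without sleeping; a production breaker would typically read the clock itself.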

**Observability & Monitoring**
- Prometheus-compatible metrics (latency, throughput, error rates, queue depth)
- Health checks for Kubernetes readiness/liveness probes
- OpenTelemetry trace propagation for distributed tracing
- Structured logging with correlation IDs

**Security & Compliance**
- Secrets management with redacted Debug output
- Per-agent rate limiting with token bucket algorithm
- Input validation with configurable size limits
- Audit events for policy enforcement and compliance
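
For readers unfamiliar with the token bucket algorithm mentioned above, here is a small standard-library sketch of per-agent limiting. `TokenBucket` and `RateLimiter` are hypothetical names, and keying buckets by an agent ID string is an assumption; the SDK's own limiter may be structured differently.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// A bucket holds up to `capacity` tokens and refills at a steady rate.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last: Instant,
}

impl TokenBucket {
    pub fn new(capacity: u32, refill_per_sec: f64, now: Instant) -> Self {
        let capacity = f64::from(capacity);
        Self { capacity, tokens: capacity, refill_per_sec, last: now }
    }

    /// Refills based on elapsed time, then spends one token if available.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = self.last.max(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

/// One bucket per agent, created lazily on first use.
pub struct RateLimiter {
    capacity: u32,
    refill_per_sec: f64,
    buckets: HashMap<String, TokenBucket>,
}

impl RateLimiter {
    pub fn new(capacity: u32, refill_per_sec: f64) -> Self {
        Self { capacity, refill_per_sec, buckets: HashMap::new() }
    }

    /// Returns true if `agent` may perform one more request at `now`.
    pub fn check(&mut self, agent: &str, now: Instant) -> bool {
        let (capacity, rate) = (self.capacity, self.refill_per_sec);
        self.buckets
            .entry(agent.to_string())
            .or_insert_with(|| TokenBucket::new(capacity, rate, now))
            .try_acquire(now)
    }
}
```

The bucket allows short bursts up to `capacity` while holding the long-run rate to `refill_per_sec`, and one agent exhausting its bucket does not affect another.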

**Operations & Configuration**
- Layered configuration from defaults, files, environment, and runtime
- Hot reload for non-disruptive configuration updates
- Configuration digest for drift detection
- Graceful shutdown with in-flight work draining
- State recovery and checkpoint persistence
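
One way to picture layered configuration and a configuration digest is the sketch below. The `Layer` struct, its fields, and the `resolve` and `digest` helpers are all hypothetical, and the hash used is the standard library's non-cryptographic `DefaultHasher` as a stand-in; a real drift-detection digest would more likely use a stable cryptographic hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One configuration layer; `None` means "not set at this layer".
#[derive(Debug, Clone, Default, PartialEq, Eq, Hash)]
pub struct Layer {
    pub max_attempts: Option<u32>,
    pub timeout_secs: Option<u64>,
    pub model: Option<String>,
}

impl Layer {
    /// Values in `over` win; unset fields fall through to `self`.
    pub fn merge(self, over: Layer) -> Layer {
        Layer {
            max_attempts: over.max_attempts.or(self.max_attempts),
            timeout_secs: over.timeout_secs.or(self.timeout_secs),
            model: over.model.or(self.model),
        }
    }
}

/// Applies layers in order, e.g. defaults, file, environment, runtime.
pub fn resolve(layers: Vec<Layer>) -> Layer {
    layers.into_iter().fold(Layer::default(), Layer::merge)
}

/// Fingerprint of the resolved configuration. Two processes with the same
/// effective settings produce the same value within one Rust release.
pub fn digest(cfg: &Layer) -> u64 {
    let mut hasher = DefaultHasher::new();
    cfg.hash(&mut hasher);
    hasher.finish()
}
```

Comparing digests across instances is a cheap way to spot a node whose effective configuration has drifted from the rest.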

### Why it exists

- Provide a unified runtime that wraps LLMs, tools, memory, and governance without depending on QUIC or third-party transports.
- Ensure every agent built for MXP Nexus speaks MXP natively and adheres to platform security, observability, and performance rules.
- Offer a developer-friendly path to compose agents locally, then promote them into the MXP Nexus platform when ready.
- Enable production deployments with enterprise-grade resilience, observability, and security out of the box.

### Scope

**Core Runtime**
- Agent lifecycle management with deterministic state machine
- LLM connectors (OpenAI, Anthropic, Gemini, Ollama, MXP-hosted)
- Tool registration with capability-based access control
- Policy hooks for governance and compliance
- MXP message handling and protocol integration
- Memory integration (volatile cache, file journal, vector store interfaces)

**Enterprise Features**
- Resilience patterns (circuit breaker, retry, timeout)
- Observability (Prometheus metrics, health checks, distributed tracing)
- Security (secrets management, rate limiting, input validation)
- Configuration management (layered config, hot reload, validation)
- Graceful lifecycle (shutdown coordination, state recovery)

**Out of scope**: MXP Nexus deployment tooling, mesh scheduling, or any "deep agents" research-oriented SDK—handled by separate projects.

### Supported LLM stacks

- OpenAI, Anthropic, Gemini, Ollama, and future MXP-hosted models via a shared `ModelAdapter` trait.

### Production Readiness

The SDK is designed for production deployments with:

- **Zero-allocation hot paths** in call execution and scheduler loops
- **Comprehensive error handling** with exhaustive error types
- **Property-based testing** for correctness verification
- **Kubernetes integration** with health checks and graceful shutdown
- **Observability** with structured logging, metrics, and distributed tracing
- **Security** with secrets management, rate limiting, and input validation

All code passes `cargo fmt`, `cargo clippy --all-targets --all-features`, and `cargo test --all-features` gates.

### MXP integration

This SDK is part of the [MXP protocol](https://github.com/yafatek/mxpnexus) ecosystem. The `mxp` crate provides the transport primitives, while this SDK provides the agent runtime that speaks MXP natively.

**Protocol Relationship**
- `mxp` crate: Wire protocol, message encoding/decoding, UDP transport with ChaCha20-Poly1305 encryption
- `mxp-agents` crate: Agent runtime, lifecycle management, LLM adapters, tools, policy enforcement

**MXP Message Types**
Agents handle these MXP message types through the `AgentMessageHandler` trait:
- `AgentRegister` / `AgentHeartbeat` — Mesh registration and health
- `Call` / `Response` — Request-response communication
- `Event` — Fire-and-forget notifications
- `StreamOpen` / `StreamChunk` / `StreamClose` — Streaming data
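
The differences between these message families can be sketched with a plain enum. The `Message` type, its fields (including `stream_id`), and `StreamAssembler` are hypothetical and exist only for illustration; the real wire types live in the `mxp` crate and will differ.

```rust
use std::collections::HashMap;

/// Simplified stand-in for MXP message types (registration messages omitted).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Message {
    Call { payload: Vec<u8> },
    Response { payload: Vec<u8> },
    Event { payload: Vec<u8> },
    StreamOpen { stream_id: u32 },
    StreamChunk { stream_id: u32, data: Vec<u8> },
    StreamClose { stream_id: u32 },
}

/// Only `Call` expects a `Response`; `Event` is fire-and-forget.
pub fn expects_reply(msg: &Message) -> bool {
    matches!(msg, Message::Call { .. })
}

/// Collects chunks per stream and yields the full body when the stream closes.
#[derive(Default)]
pub struct StreamAssembler {
    open: HashMap<u32, Vec<u8>>,
}

impl StreamAssembler {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn on_message(&mut self, msg: Message) -> Option<Vec<u8>> {
        match msg {
            Message::StreamOpen { stream_id } => {
                self.open.insert(stream_id, Vec::new());
                None
            }
            Message::StreamChunk { stream_id, data } => {
                // Chunks for unknown streams are ignored in this sketch.
                if let Some(buf) = self.open.get_mut(&stream_id) {
                    buf.extend_from_slice(&data);
                }
                None
            }
            Message::StreamClose { stream_id } => self.open.remove(&stream_id),
            _ => None,
        }
    }
}
```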

**Registry Integration Example**

```rust
use mxp_agents::kernel::{
    AgentKernel, LifecycleEvent, MxpRegistryClient, RegistrationConfig, TaskScheduler,
};
use mxp_agents::primitives::{AgentId, AgentManifest, Capability, CapabilityId};
use std::net::SocketAddr;
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let agent_id = AgentId::random();
    
    // Define agent capabilities
    let capability = Capability::builder(CapabilityId::new("chat.respond")?)
        .name("Chat Response")?
        .version("1.0.0")?
        .add_scope("chat:write")?
        .build()?;

    // Create agent manifest
    let manifest = AgentManifest::builder(agent_id)
        .name("my-chat-agent")?
        .version("0.1.0")?
        .capabilities(vec![capability])
        .build()?;

    // Connect to MXP registry for mesh discovery
    let agent_endpoint: SocketAddr = "127.0.0.1:50052".parse()?;
    let registry = Arc::new(MxpRegistryClient::connect(
        "127.0.0.1:50051",  // Registry endpoint
        agent_endpoint,
        None,
    )?);

    // Create kernel with registry integration
    // (`MyAgentHandler` is the handler defined in the MXP Agent Setup example)
    let handler = Arc::new(MyAgentHandler);
    let mut kernel = AgentKernel::new(agent_id, handler, TaskScheduler::default());
    kernel.set_registry(registry, manifest, RegistrationConfig::default());

    // Agent will auto-register and send heartbeats
    kernel.transition(LifecycleEvent::Boot)?;
    kernel.transition(LifecycleEvent::Activate)?;

    Ok(())
}
```

### Key concepts

- Tools are pure Rust functions annotated with `#[tool]`; the SDK converts them into schemas consumable by LLMs and enforces capability scopes at runtime.
- Agents can share external state (memory bus, MXP Vector Store) or remain fully isolated.
- Governance and policy enforcement are first-class: hooks exist for allow/deny decisions and human-in-the-loop steps.
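
A minimal sketch of how capability scopes and policy decisions might fit together is shown below. It does not use the `#[tool]` macro or any SDK type: `ToolSpec`, `Decision`, and `authorize` are hypothetical, and treating "human-in-the-loop" as an `Escalate` outcome is an assumption.

```rust
use std::collections::HashSet;

/// Outcome of a policy check before a tool runs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny(String),
    Escalate(String),
}

/// Hypothetical description of a registered tool.
pub struct ToolSpec {
    pub name: &'static str,
    pub required_scopes: &'static [&'static str],
    /// When true, a human must approve each invocation.
    pub needs_review: bool,
}

/// Denies if any required scope is missing, escalates if review is required,
/// and allows otherwise.
pub fn authorize(tool: &ToolSpec, granted: &HashSet<String>) -> Decision {
    let missing = tool
        .required_scopes
        .iter()
        .copied()
        .find(|scope| !granted.contains(*scope));
    if let Some(scope) = missing {
        return Decision::Deny(format!("{}: missing scope {scope}", tool.name));
    }
    if tool.needs_review {
        Decision::Escalate(format!("{}: awaiting human approval", tool.name))
    } else {
        Decision::Allow
    }
}
```

Checking scopes before the review flag means a caller without the right capability is rejected outright and never reaches a human reviewer.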

### System Prompts

All adapters support system prompts with provider-native optimizations:

```rust
use mxp_agents::adapters::openai::{OpenAiAdapter, OpenAiConfig};
use mxp_agents::adapters::anthropic::{AnthropicAdapter, AnthropicConfig};
use mxp_agents::adapters::gemini::{GeminiAdapter, GeminiConfig};
use mxp_agents::adapters::traits::InferenceRequest;

// OpenAI/Ollama: Prepends the system prompt as the first message
let openai = OpenAiAdapter::new(OpenAiConfig::from_env("gpt-4"))?;

// Anthropic: Uses dedicated 'system' parameter
let anthropic = AnthropicAdapter::new(AnthropicConfig::from_env("claude-3-5-sonnet-20241022"))?;

// Gemini: Uses 'systemInstruction' field
let gemini = GeminiAdapter::new(GeminiConfig::from_env("gemini-1.5-pro"))?;

// Same API works across all providers
// (`messages` is a `Vec<PromptMessage>` built as in the Quick Start example)
let request = InferenceRequest::new(messages)?
    .with_system_prompt("You are a helpful assistant");
```

### Context Window Management (Optional)

For long conversations, enable automatic context management:

```rust
use mxp_agents::prompts::ContextWindowConfig;
use mxp_agents::adapters::ollama::{OllamaAdapter, OllamaConfig};

let adapter = OllamaAdapter::new(OllamaConfig::new("gemma2:2b"))?
    .with_context_config(ContextWindowConfig {
        max_tokens: 4096,
        recent_window_size: 10,
        ..Default::default()
    });

// SDK automatically manages conversation history within token budget
```
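
To give a feel for what managing history within a token budget can involve, the sketch below keeps the newest messages that fit. It is not the SDK's algorithm: `estimate_tokens` uses a rough four-characters-per-token heuristic, and the interpretation of `max_tokens` and `recent_window_size` here is an assumption.

```rust
/// Rough token estimate: about four characters per token, rounded up.
pub fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Returns the newest suffix of `history` that contains at most
/// `recent_window_size` messages and at most `max_tokens` estimated tokens.
pub fn trim_history(
    history: &[String],
    max_tokens: usize,
    recent_window_size: usize,
) -> &[String] {
    let mut used = 0;
    let mut start = history.len();
    // Walk from newest to oldest and stop at the first message that does not fit.
    for (i, msg) in history.iter().enumerate().rev() {
        let cost = estimate_tokens(msg);
        let kept_if_added = history.len() - i;
        if kept_if_added > recent_window_size || used + cost > max_tokens {
            break;
        }
        used += cost;
        start = i;
    }
    &history[start..]
}
```

A fuller implementation would usually pin the system prompt and summarize older turns instead of dropping them; this sketch only shows the budget check.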

### Documentation Map

- `docs/overview.md` — architectural overview and design principles
- `docs/architecture.md` — crate layout, component contracts, roadmap
- `docs/features.md` — complete feature set and facade feature flags
- `docs/usage.md` — end-to-end setup guide for building agents
- `docs/enterprise.md` — production hardening guide with resilience, observability, and security
- `docs/errors.md` — error surfaces and troubleshooting tips

### Examples

- `examples/basic-agent` — simple agent with Ollama adapter and policy enforcement
- `examples/enterprise-agent` — production-grade agent demonstrating resilience, metrics, health checks, and graceful shutdown

### Getting Started

1. **Development**: Start with `examples/basic-agent` to understand core concepts
2. **Production**: Review `docs/enterprise.md` and `examples/enterprise-agent` for hardening patterns
3. **Integration**: Wire MXP endpoints for discovery and message handling
4. **Deployment**: Use health checks and metrics for Kubernetes integration


### Performance & Reliability

- **Sub-microsecond message encoding/decoding** via MXP protocol
- **Lock-free data structures** for high-concurrency scenarios
- **Bounded memory usage** with configurable limits
- **Automatic recovery** from transient failures
- **Graceful degradation** under load with rate limiting
- **Comprehensive testing** with property-based tests for correctness

### Security

- **Secrets management** with redacted Debug output
- **Rate limiting** to prevent resource exhaustion
- **Input validation** with configurable constraints
- **Audit events** for compliance and governance
- **Capability-based access control** for tools
- **Policy enforcement** with allow/deny/escalate decisions
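
Redacted `Debug` output is typically achieved with a wrapper type that implements `Debug` by hand. The `Secret` and `ProviderConfig` types below are hypothetical illustrations of the pattern using only the standard library, not the SDK's own types.

```rust
use std::fmt;

/// Wrapper whose `Debug` output never contains the inner value.
pub struct Secret(String);

impl Secret {
    pub fn new(value: impl Into<String>) -> Self {
        Secret(value.into())
    }

    /// Explicit accessor, so every read of the raw value is easy to audit.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for Secret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("Secret(<redacted>)")
    }
}

/// Deriving `Debug` on a containing struct picks up the redacted form.
#[derive(Debug)]
pub struct ProviderConfig {
    pub model: String,
    pub api_key: Secret,
}
```

With this pattern, logging a whole configuration struct via `{:?}` cannot leak the key by accident.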

### Observability

- **Prometheus metrics** for monitoring and alerting
- **Health checks** for Kubernetes integration
- **Distributed tracing** with OpenTelemetry support
- **Structured logging** with correlation IDs
- **Circuit breaker state tracking** for failure visibility
- **Request latency histograms** for performance analysis
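
For context on what a latency histogram looks like to a Prometheus scraper, the sketch below records observations into cumulative buckets and renders them in the Prometheus text exposition format. The `Histogram` type is hypothetical and independent of `PrometheusExporter`; the bucket bounds are arbitrary example values.

```rust
/// Cumulative histogram: each bucket counts observations <= its upper bound.
pub struct Histogram {
    bounds: Vec<f64>,
    counts: Vec<u64>,
    sum: f64,
    count: u64,
}

impl Histogram {
    pub fn new(bounds: Vec<f64>) -> Self {
        let counts = vec![0; bounds.len()];
        Self { bounds, counts, sum: 0.0, count: 0 }
    }

    pub fn observe(&mut self, value: f64) {
        for (bound, slot) in self.bounds.iter().zip(self.counts.iter_mut()) {
            if value <= *bound {
                *slot += 1;
            }
        }
        self.sum += value;
        self.count += 1;
    }

    /// Renders `_bucket`, `_sum`, and `_count` series for metric `name`.
    pub fn render(&self, name: &str) -> String {
        let mut out = String::new();
        for (bound, n) in self.bounds.iter().zip(self.counts.iter()) {
            out.push_str(&format!("{name}_bucket{{le=\"{bound}\"}} {n}\n"));
        }
        // The implicit +Inf bucket always equals the total observation count.
        out.push_str(&format!("{name}_bucket{{le=\"+Inf\"}} {}\n", self.count));
        out.push_str(&format!("{name}_sum {}\n", self.sum));
        out.push_str(&format!("{name}_count {}\n", self.count));
        out
    }
}
```

Because buckets are cumulative, queries such as a 95th-percentile latency can be estimated on the server side from the bucket counts alone.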

## Requirements

- **Rust**: 1.85 or later (MSRV)
- **Tokio**: Async runtime (included via dependencies)
- **Optional**: A running Ollama instance, or API keys for OpenAI, Anthropic, or Gemini, depending on which LLM adapter you use

## Troubleshooting

### Circuit Breaker Opens Frequently

If the circuit breaker is opening too often:
- Increase `failure_threshold` in `CircuitBreakerConfig`
- Check provider status and connectivity
- Review timeout settings

### High Memory Usage

If memory usage is growing:
- Enable metrics cardinality limiting
- Check that configuration hot reload is working
- Review rate limiter cleanup

### Slow Inference

If inference is slower than expected:
- Check `request_latency_seconds` metrics
- Verify provider API status
- Review retry and timeout configuration

See [docs/enterprise.md](docs/enterprise.md) for a comprehensive troubleshooting guide.

## Contributing

We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

### Development

```bash
# Build all crates
cargo build --all-features

# Run tests
cargo test --all-features

# Run linting
cargo clippy --all-targets --all-features -- -D warnings

# Check formatting (run `cargo fmt` without `--check` to apply changes)
cargo fmt --check
```

## Community

- **GitHub Issues**: [Report bugs or request features](https://github.com/yafatek/mxpnexus/issues)
- **GitHub Discussions**: [Ask questions and discuss ideas](https://github.com/yafatek/mxpnexus/discussions)
- **Documentation**: [Full API docs](https://docs.rs/mxp-agents)

## License

Licensed under either of:

- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)

at your option.

## Acknowledgments

Built with [Rust](https://www.rust-lang.org), [Tokio](https://tokio.rs), and the MXP protocol specification.