zeph 0.18.4

Lightweight AI agent with hybrid inference, skills-first architecture, and multi-channel I/O
# Spec: RuntimeLayer Middleware

> **Status**: Implemented; updated v0.18.1
> **Crate**: `zeph-core`

## 1. Overview

### Problem Statement

Cross-cutting behavior (logging, metrics, rate limiting, audit, token tracking)
was duplicated across the agent loop, tool executor, and channel adapters.
Each subsystem implemented its own version of the same pre/post hooks with no
shared contract, making it hard to add new behaviors consistently.

### Goal

Introduce a `RuntimeLayer` trait that defines `before_chat`, `after_chat`,
`before_tool`, and `after_tool` hooks. The agent loop calls these hooks
unconditionally at the appropriate points. New cross-cutting concerns are added
as `RuntimeLayer` implementations without touching the agent loop.

### Out of Scope

- Channel-level middleware (handled by `AnyChannel` dispatch)
- LLM provider interceptors (handled at `AnyProvider` level)
- Async streaming interception (hooks fire on full message, not per-chunk)

---

## 2. Trait Definition

```rust
// crates/zeph-core/src/runtime_layer.rs

/// Cross-cutting hooks invoked by the agent loop around LLM calls and tool execution.
///
/// All methods have default no-op implementations so that only relevant hooks
/// need to be overridden. The trait is object-safe; layers are stored as
/// `Arc<dyn RuntimeLayer>` in the agent.
pub trait RuntimeLayer: Send + Sync + 'static {
    /// Called immediately before the LLM `chat` / `chat_with_tools` call.
    ///
    /// `ctx.conversation_id`: the current session ID.
    /// `ctx.turn_number`: monotonically increasing turn counter, starting at 1.
    fn before_chat(&self, _ctx: &LayerContext) {}

    /// Called immediately after the LLM response is received (success or error).
    ///
    /// `result`: `Ok(())` on success, `Err(description)` on LLM error.
    fn after_chat(&self, _ctx: &LayerContext, _result: Result<(), &str>) {}

    /// Called immediately before a tool is executed.
    ///
    /// `tool_name`: the tool being invoked.
    fn before_tool(&self, _ctx: &LayerContext, _tool_name: &str) {}

    /// Called immediately after a tool execution completes.
    ///
    /// `tool_name`: the tool that was invoked.
    /// `result`: `Ok(())` on success, `Err(description)` on tool error.
    fn after_tool(&self, _ctx: &LayerContext, _tool_name: &str, _result: Result<(), &str>) {}
}
```

### NoopLayer

`NoopLayer` is the zero-cost default implementation:

```rust
pub struct NoopLayer;

impl RuntimeLayer for NoopLayer {}
```

Used when no layers are configured. The agent holds `Arc<dyn RuntimeLayer>`; with
`NoopLayer`, all calls are inlined and optimized away by the compiler.

### LayerContext

```rust
/// Contextual data passed to every `RuntimeLayer` hook.
#[derive(Debug, Clone)]
pub struct LayerContext {
    /// The current conversation/session identifier.
    pub conversation_id: String,

    /// Monotonically increasing turn counter for this session. Starts at 1.
    pub turn_number: u64,
}
```

`LayerContext` is constructed once per agent session and updated in-place at
the start of each turn (incrementing `turn_number`). The agent does not clone
it per-call; hooks receive a shared reference.

---

## 3. Agent Integration

The agent holds `layer: Arc<dyn RuntimeLayer>`. At the appropriate points:

```rust
// Before LLM call:
self.layer.before_chat(&self.layer_ctx);

// After LLM call:
self.layer.after_chat(&self.layer_ctx, result_as_str_ref);

// Before tool execution:
self.layer.before_tool(&self.layer_ctx, tool_name);

// After tool execution:
self.layer.after_tool(&self.layer_ctx, tool_name, result_as_str_ref);
```

`turn_number` is incremented at the start of `process_user_message()`, before
any hooks fire.

---

## 4. Config

No dedicated config section. The `RuntimeLayer` implementation is selected at
bootstrap based on enabled features. Multiple layers can be composed via a
`CompositeLayer` that delegates to a `Vec<Arc<dyn RuntimeLayer>>`.

```toml
# No user-facing config.
# Future: [agent.layers] enabled = ["audit", "metrics"]
```

---

## 5. Key Invariants

- All hook methods have default no-op implementations — `RuntimeLayer` is opt-in behavior
- `LayerContext.turn_number` is incremented exactly once per user turn — never skip or double-increment
- Hook failures are non-fatal — panics in hook implementations are caught via `catch_unwind` and logged; the agent turn continues
- `before_chat` fires before the LLM call, `after_chat` fires after (not before result is injected into context)
- `before_tool` fires before executor dispatch, `after_tool` fires after the `Option<ToolOutput>` is resolved
- NEVER perform blocking I/O in layer hooks — all hooks are called synchronously in the async agent loop
- `Arc<dyn RuntimeLayer>` is the only storage form — never `Box<dyn RuntimeLayer>` in the agent struct

---

## 6. Agent Boundaries

### Always (without asking)
- Implement `Send + Sync + 'static` bounds on all `RuntimeLayer` implementations
- Default no-op for all unimplemented hooks
- Increment `turn_number` before any hook fires in a turn

### Ask First
- Adding new hook methods to `RuntimeLayer` (breaks all existing implementations unless defaulted)
- Adding fields to `LayerContext` (affects all hook signatures)

### Never
- Block the agent loop in a hook implementation
- Store mutable state without interior mutability (`Arc<Mutex<T>>` is acceptable)
- Call `before_chat` more than once per turn

---

## 7. References

- `crates/zeph-core/src/runtime_layer.rs` — trait definition
- `crates/zeph-core/src/agent/mod.rs` — hook call sites
- `001-system-invariants/spec.md` — Agent Loop Contract (§2)