phi-core 0.10.0

Simple, effective agent loop with tool execution and event streaming
Documentation
<!-- Last verified: 2026-04-05 by Claude Code -->
# Retry with Backoff

When an LLM provider returns a transient error — rate limit (HTTP 429) or network failure — phi-core automatically retries with exponential backoff and jitter. No configuration required; it works out of the box.

## How it works

```
Request → Error? → Retryable? → Wait (backoff + jitter) → Retry → ...
                       ↓ No
                  Fail immediately
```

1. The agent loop calls the provider
2. If the provider returns a retryable error:
   - If a `retry-after` delay was provided (rate limits), use that
   - Otherwise, calculate delay: `initial_delay × multiplier^(attempt-1)` with ±20% jitter
   - Wait, then retry
3. After `max_retries` attempts, the error propagates normally

## What gets retried

| Error Type | Retried? | Why |
|------------|----------|-----|
| `RateLimited` (429) | ✅ Yes | Temporary — provider will accept requests again soon |
| `Network` | ✅ Yes | Transient — connection resets, timeouts, DNS failures |
| `Auth` (401/403) | ❌ No | Permanent — wrong API key won't fix itself |
| `Api` (400, etc.) | ❌ No | Permanent — bad request won't change on retry |
| `Cancelled` | ❌ No | User-initiated — respect the cancellation |

## Default configuration

```rust
RetryConfig {
    max_retries: 3,          // Up to 3 retry attempts
    initial_delay_ms: 1000,  // 1 second before first retry
    backoff_multiplier: 2.0, // Double the delay each attempt
    max_delay_ms: 30_000,    // Cap at 30 seconds
}
```

With defaults, the retry delays are approximately:
- Attempt 1: ~1s
- Attempt 2: ~2s
- Attempt 3: ~4s

(±20% jitter to avoid thundering herd when multiple agents hit the same provider)

## Configuration

### Using the Agent builder

```rust
use phi_core::{BasicAgent, RetryConfig};
use phi_core::provider::ModelConfig;

let api_key = std::env::var("ANTHROPIC_API_KEY").unwrap();

// Default — 3 retries, exponential backoff (recommended)
let agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
));

// Custom — more retries, longer initial delay
let agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
))
.with_retry_config(RetryConfig {
    max_retries: 5,
    initial_delay_ms: 2000,
    backoff_multiplier: 2.0,
    max_delay_ms: 60_000,
});

// Disable retries entirely
let agent = BasicAgent::new(ModelConfig::anthropic(
    "claude-sonnet-4-20250514",
    "Claude Sonnet 4",
    &api_key,
))
.with_retry_config(RetryConfig::none());
```

### Using AgentLoopConfig directly

```rust
use phi_core::agent_loop::AgentLoopConfig;
use phi_core::RetryConfig;

let config = AgentLoopConfig {
    // ...other fields...
    retry_config: RetryConfig {
        max_retries: 3,
        initial_delay_ms: 1000,
        backoff_multiplier: 2.0,
        max_delay_ms: 30_000,
    },
    ..Default::default()
};
```

## Rate limit headers

When a provider returns `ProviderError::RateLimited { retry_after_ms: Some(5000) }`, phi-core uses that exact delay instead of the calculated backoff. This respects the provider's guidance — if Anthropic says "retry after 5 seconds", we wait 5 seconds, not our own estimate.

If no `retry_after_ms` is provided, the exponential backoff kicks in.

## Observability

Retry attempts are logged via `tracing` at the `WARN` level:

```
WARN Provider error (attempt 1/3), retrying in 1.1s: Rate limited, retry after 1000ms
WARN Provider error (attempt 2/3), retrying in 2.3s: Rate limited, retry after 2000ms
```

Subscribe to tracing events in your application to surface these in your UI:

```rust
use tracing_subscriber;

// Simple stderr logging
tracing_subscriber::fmt::init();

// Or filter to just retries
tracing_subscriber::fmt()
    .with_env_filter("phi_core::provider::retry=warn")
    .init();
```

## Design notes

- **Retry lives in the agent loop**, not inside individual providers. One config controls all retry behavior.
- **Jitter** prevents thundering herd: when many agents hit a rate limit simultaneously, jitter spreads their retries so they don't all retry at the same instant.
- **Cancellation is respected**: if the user cancels while waiting for a retry, the loop exits immediately.
- **No retry on API errors**: a malformed request will fail the same way every time. Retrying wastes time and tokens.