# Retry Logic + Error Visibility Plan
## Context
HTTP requests to BetterStack can fail for transient reasons (5xx responses, timeouts, network blips). The library currently hides these failures from callers: `push_log` prints the error and returns `None`, and the `Logger` methods return `()`. Consumers therefore cannot tell whether a log was actually delivered. This PR addresses both problems by adding configurable retry logic and optional error visibility.
## Requirements
- Expose retry configuration options to consumers (not hardcoded defaults)
- Retry only on transient failures (5xx status codes, timeouts, connection errors)
- Do NOT retry on client errors (4xx) or serialization failures
- Support exponential backoff with configurable parameters
- Give consumers the option to receive a `Result` from log methods instead of a silent `()`
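
The retry/no-retry split in the requirements above could be captured in one classification function. This is only a sketch: `SendFailure` and `is_retryable` are hypothetical names, and the real error type (`LogtailError`) may carry different variants.

```rust
/// Hypothetical failure categories; the crate's real error type may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum SendFailure {
    /// The server answered with a non-success HTTP status code.
    Status(u16),
    /// The request did not complete in time.
    Timeout,
    /// The connection could not be established or was dropped.
    Connection,
    /// The payload could not be serialized; retrying cannot help.
    Serialization,
}

/// Returns true only for failures the requirements treat as transient.
pub fn is_retryable(failure: &SendFailure) -> bool {
    match failure {
        // 5xx is retried; 4xx and any other status is not.
        SendFailure::Status(code) => (500..=599).contains(code),
        SendFailure::Timeout | SendFailure::Connection => true,
        SendFailure::Serialization => false,
    }
}
```

Under this sketch a `429 Too Many Requests` response is not retried, because the requirements exclude all 4xx codes; whether that is the desired behavior is worth confirming.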
## Design Questions (to decide before implementation)
### 1. Configuration surface — How should consumers configure retries?
- Option A: `RetryConfig` struct passed to `Logger::new()`
- Option B: Builder pattern on Logger (`Logger::builder().retries(3).backoff(...)`)
- Option C: Fields on `EnvConfig` (env-var driven)
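
To make Option B concrete, here is a minimal builder sketch. `LoggerBuilder`, `build()`, `retry_config()`, and the two-argument form of `backoff` are assumptions for illustration; the plan only names `Logger::builder()`, `.retries(...)`, and `.backoff(...)`. The defaults mirror the values floated elsewhere in this plan.

```rust
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
pub struct RetryConfig {
    pub max_retries: u32,
    pub base_delay: Duration,
    pub max_delay: Duration,
}

/// Stand-in for the real Logger; only the retry settings are modeled here.
pub struct Logger {
    retry: RetryConfig,
}

/// Hypothetical builder type returned by `Logger::builder()`.
pub struct LoggerBuilder {
    retry: RetryConfig,
}

impl Logger {
    pub fn builder() -> LoggerBuilder {
        LoggerBuilder {
            retry: RetryConfig {
                max_retries: 3,
                base_delay: Duration::from_secs(1),
                max_delay: Duration::from_secs(30),
            },
        }
    }

    /// Accessor added for illustration so the configured values can be inspected.
    pub fn retry_config(&self) -> &RetryConfig {
        &self.retry
    }
}

impl LoggerBuilder {
    pub fn retries(mut self, max_retries: u32) -> Self {
        self.retry.max_retries = max_retries;
        self
    }

    pub fn backoff(mut self, base_delay: Duration, max_delay: Duration) -> Self {
        self.retry.base_delay = base_delay;
        self.retry.max_delay = max_delay;
        self
    }

    pub fn build(self) -> Logger {
        Logger { retry: self.retry }
    }
}
```

Compared with Option A, the builder leaves room to add settings later without changing the signature of `Logger::new()`, at the cost of one more public type.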
### 2. Default behavior — Should retries be on or off by default?
- On with sensible defaults (3 retries, 1s base backoff)
- Off, opt-in only
### 3. Backoff parameters — What should be configurable?
- `max_retries: u32`
- `base_delay: Duration`
- `max_delay: Duration` (cap)
- Jitter (on/off)
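
As a rough illustration of how these parameters interact, the delay before each retry could be computed as `base_delay * 2^attempt`, capped at `max_delay`. The function names below are hypothetical. Jitter is shown as "full jitter" (scaling the delay by a random fraction), with the random sample passed in by the caller so the sketch needs no extra dependency; the plan does not say which jitter strategy is intended.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base_delay * 2^attempt,
/// capped at max_delay. Checked arithmetic keeps large attempts from overflowing.
pub fn backoff_delay(attempt: u32, base_delay: Duration, max_delay: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base_delay
        .checked_mul(factor)
        .unwrap_or(max_delay)
        .min(max_delay)
}

/// Full jitter: scales `delay` by `sample`, a caller-supplied random value
/// expected in [0, 1]. Values outside that range are clamped.
pub fn apply_jitter(delay: Duration, sample: f64) -> Duration {
    delay.mul_f64(sample.clamp(0.0, 1.0))
}
```

With a 1s base and a 30s cap, the un-jittered delays are 1s, 2s, 4s, 8s, 16s, and then 30s for every later attempt.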
### 4. Implementation — Manual or library?
- Manual with `tokio::time::sleep` — zero new deps, full control
- `reqwest-middleware` + `reqwest-retry` — battle-tested but adds deps and changes how `reqwest::Client` is constructed
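
For a sense of how small the manual option is, here is a sketch of the retry loop. It is written synchronously with an injected `sleep` closure so it runs without dependencies; in the library the operation would be an async `reqwest` call and the sleep would be `tokio::time::sleep(delay).await`. `retry_with_backoff` and its parameter list are assumptions, not the planned signature.

```rust
use std::time::Duration;

/// Calls `op` until it succeeds, fails with a non-retryable error,
/// or `max_retries` retries have been used (so at most max_retries + 1 calls).
pub fn retry_with_backoff<T, E>(
    max_retries: u32,
    base_delay: Duration,
    max_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
    is_retryable: impl Fn(&E) -> bool,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0u32;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if attempt < max_retries && is_retryable(&e) => {
                // Exponential backoff: base_delay * 2^attempt, capped at max_delay.
                let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
                let delay = base_delay
                    .checked_mul(factor)
                    .unwrap_or(max_delay)
                    .min(max_delay);
                sleep(delay);
                attempt += 1;
            }
            // Retries exhausted or error not transient: surface it to the caller.
            Err(e) => return Err(e),
        }
    }
}
```

Injecting the sleep also makes the loop easy to test, since a test can record the requested delays instead of waiting for them.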
### 5. Error visibility — How should consumers opt into seeing errors?
Currently Logger methods return `()`. Options:
- **Option A: Dual API.** Keep the existing `info/warn/error` returning `()` (backward compatible) and add `try_info/try_warn/try_error` returning `Result<(), LogtailError>`.
- **Option B: Change return type.** All methods return `Result<(), LogtailError>`; consumers who don't care call `.ok()` or bind the result with `let _ =` (dropping a `Result` unhandled triggers the `unused_must_use` warning). **Breaking change.**
- **Option C: Callback/hook.** `Logger::on_error(|e| ...)` registers an error handler at construction time.
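
Options A and C are not mutually exclusive. The sketch below shows one way they could coexist: `try_info` returns the `Result`, and the silent `info` forwards any error to an optional handler. It is synchronous and uses a `fail_sends` flag in place of the HTTP layer purely so it runs standalone; the real methods are async, and the `LogtailError` definition here is a placeholder.

```rust
use std::fmt;

/// Hypothetical error type; the plan names `LogtailError` but does not define it.
#[derive(Debug, Clone, PartialEq)]
pub struct LogtailError(pub String);

impl fmt::Display for LogtailError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

pub struct Logger {
    /// Stand-in for the HTTP layer: when true, every send fails.
    fail_sends: bool,
    /// Option C: handler invoked by the silent methods when a send fails.
    on_error: Option<Box<dyn Fn(&LogtailError)>>,
}

impl Logger {
    pub fn new(fail_sends: bool) -> Self {
        Self { fail_sends, on_error: None }
    }

    /// Registers an error handler at construction time (Option C).
    pub fn on_error(mut self, handler: impl Fn(&LogtailError) + 'static) -> Self {
        self.on_error = Some(Box::new(handler));
        self
    }

    /// Result-returning variant (Option A).
    pub fn try_info(&self, message: &str) -> Result<(), LogtailError> {
        if self.fail_sends {
            Err(LogtailError(format!("failed to deliver log: {}", message)))
        } else {
            Ok(())
        }
    }

    /// Silent variant: never returns an error, but reports it to the handler if one is set.
    pub fn info(&self, message: &str) {
        if let Err(e) = self.try_info(message) {
            if let Some(handler) = &self.on_error {
                handler(&e);
            }
        }
    }
}
```

Implementing `info` on top of `try_info` keeps a single delivery code path, so the two variants cannot drift apart.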
## Proposed API (draft)
```rust
use std::time::Duration;

// --- Retry config ---
pub struct RetryConfig {
    pub max_retries: u32,
    pub base_delay: Duration,
    pub max_delay: Duration,
}

impl Default for RetryConfig {
    fn default() -> Self {
        Self {
            max_retries: 3,
            base_delay: Duration::from_secs(1),
            max_delay: Duration::from_secs(30),
        }
    }
}

// --- Error visibility (Option A: dual API) ---
// Silent (existing behavior, backward compatible):
logger.info(log).await;

// With result:
if let Err(e) = logger.try_info(log).await {
    eprintln!("Failed to send log: {}", e);
}

// --- Construction ---
let logger = Logger::with_retry(RetryConfig::default());

// or, with custom values:
let logger = Logger::with_retry(RetryConfig {
    max_retries: 5,
    base_delay: Duration::from_millis(500),
    max_delay: Duration::from_secs(10),
});
```
## Files likely affected
- `src/http_client/mod.rs` — `RetryConfig` definition
- `src/http_client/base_client.rs` — retry loop wrapping `reqwest` calls
- `src/http_client/service.rs` — `push_log` returns `Result` instead of `Option`
- `src/lib.rs` — new Logger constructor, `try_info/try_warn/try_error` methods
- `src/struct/env_config.rs` — possibly, if env-var driven config is chosen
## Testing approach
- Unit tests with `MockHttpClient` returning errors N times then succeeding
- Verify retry count matches config
- Verify no retry on 4xx / serialization errors
- Verify backoff delays (mock clock or just verify call count)
- Verify `try_info` returns `Err` on failure and `Ok` on success
- Verify `info` still returns `()` and doesn't panic on failure
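
As a sketch of the first few checks, a mock that fails a fixed number of times and counts calls is enough to verify retry counts without a clock. The shape of `MockHttpClient` below (its fields, `failing`, and `post`) is assumed, as is the delay-free `send_with_retries` driver; the crate's actual mock and retry loop may look different.

```rust
/// Hypothetical mock: fails `failures_left` times with `failure_status`, then succeeds.
pub struct MockHttpClient {
    failures_left: u32,
    failure_status: u16,
    pub calls: u32,
}

impl MockHttpClient {
    pub fn failing(times: u32, status: u16) -> Self {
        Self { failures_left: times, failure_status: status, calls: 0 }
    }

    /// Records the call, then returns Err(status) until the configured failures are used up.
    pub fn post(&mut self) -> Result<(), u16> {
        self.calls += 1;
        if self.failures_left > 0 {
            self.failures_left -= 1;
            Err(self.failure_status)
        } else {
            Ok(())
        }
    }
}

/// Delay-free retry driver so tests only need to assert on call counts.
/// Retries 5xx statuses up to `max_retries` times; anything else fails immediately.
pub fn send_with_retries(client: &mut MockHttpClient, max_retries: u32) -> Result<(), u16> {
    let mut attempt = 0;
    loop {
        match client.post() {
            Ok(()) => return Ok(()),
            Err(status) if status >= 500 && attempt < max_retries => attempt += 1,
            Err(status) => return Err(status),
        }
    }
}
```

With `max_retries = 3`, two 503 failures followed by a success produce three calls, persistent 503s produce four calls before giving up, and a single 400 produces exactly one call.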