ethl 0.1.23

Tools for capturing, processing, archiving, and replaying Ethereum events
# ethl

**WIP**

`ethl` is a powerful Ethereum log event ETL (Extract, Transform, Load) tool designed for efficient and reliable log processing. It provides:

- RPC utilities for streaming logs with support for multiple providers, fallback mechanisms, and retries.
- Tools to archive events into an opinionated Arrow + Parquet format for optimized storage and querying.
- Features enabling fast replay of specific events and continuous indexing for real-time use cases.

This tool was built to address the need for high-performance event processing and storage in Ethereum-based applications.

### Getting Started

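To get started with `ethl`, add it as a dependency, either by running `cargo add ethl` or by editing your `Cargo.toml` directly (`cargo install` is for binaries and does not add a library dependency):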

```toml
[dependencies]
ethl = "0.1"
```

Then, include it in your project:

```rust
use ethl;
```

For detailed examples and usage, check the [documentation](https://docs.rs/ethl).

### Development nodes (anvil, hardhat, ganache)

`ethl` treats RPC provider responses as authoritative. Local development nodes
expose two failure modes that can silently corrupt an event cache:

1. **Silent empty responses past tip.** Some nodes return `{"result": []}` for
   `eth_getLogs` on block ranges past their actual tip rather than erroring.
   `ethl` guards against this by calling `eth_blockNumber` whenever a batch
   returns no logs and checking whether `to_block` exceeds the reported tip.
   When the tip is behind, `ethl` waits — re-polling the tip on a bounded
   backoff and re-fetching the range (without advancing the cursor) once the
   tip catches up. `ethl` terminates with `RpcError::ProviderStalled` in two
   cases: a flat tip after `tip_wait_max_stall_cycles` consecutive cycles
   (default 5 × 60 s ≈ 5 min), or a tip that is advancing but fails to reach
   `to_block` within `tip_wait_max_total_secs` (default 1800 s / 30 min) —
   bounding providers that crawl but never converge. A fast catch-up within
   the budget rides through normally. The overhead is one extra RPC per empty
   batch; batches that return logs are unaffected.

2. **Synthetic blocks from local mining.** Dev nodes mine new blocks in
   response to transactions you send, including simulations. Those blocks
   share a number with real chain blocks but contain only your local
   transactions — they are well-formed blocks as far as the API is concerned,
   so `ethl` cannot distinguish them from fork-proxied real blocks. If you
   index against a node that has been used for writes, **treat the cache as
   poisoned**: roll `block_height` back to a value from before the node was
   first used and rebuild from a real RPC.
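
The tip-wait rules from item 1 can be sketched as a small state machine. This is an illustration under stated assumptions, not `ethl`'s actual implementation: `TipWaitConfig`, `TipWait`, `TipWaitOutcome`, `StallReason`, and `empty_batch_past_tip` are hypothetical names, the exact counting of stall cycles is a guess, and the 60 s cycle length is inferred from the "5 × 60 s" default above. The sketch does no I/O; the caller is assumed to poll `eth_blockNumber` and sleep between observations.

```rust
use std::time::Duration;

/// Hypothetical config mirroring the two limits described above.
#[derive(Debug, Clone, Copy)]
pub struct TipWaitConfig {
    /// Stands in for `tip_wait_max_stall_cycles`.
    pub max_stall_cycles: u32,
    /// Stands in for `tip_wait_max_total_secs`.
    pub max_total: Duration,
}

impl Default for TipWaitConfig {
    fn default() -> Self {
        // Defaults as quoted in the text: 5 cycles, 1800 s total.
        Self { max_stall_cycles: 5, max_total: Duration::from_secs(1800) }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum StallReason {
    FlatTip,
    TotalBudgetExceeded,
}

#[derive(Debug, PartialEq, Eq)]
pub enum TipWaitOutcome {
    /// Tip has reached `to_block`: re-fetch the same range, cursor unchanged.
    Refetch,
    /// Tip is still behind and both budgets have room: poll again later.
    KeepWaiting,
    /// Give up (stands in for `RpcError::ProviderStalled`).
    Stalled(StallReason),
}

/// An empty batch is only suspicious when the range ends past the reported tip.
pub fn empty_batch_past_tip(to_block: u64, reported_tip: u64) -> bool {
    to_block > reported_tip
}

/// Tracks one wait episode for a single `to_block`.
pub struct TipWait {
    cfg: TipWaitConfig,
    to_block: u64,
    last_tip: u64,
    flat_cycles: u32,
    waited: Duration,
}

impl TipWait {
    pub fn new(cfg: TipWaitConfig, to_block: u64, initial_tip: u64) -> Self {
        Self { cfg, to_block, last_tip: initial_tip, flat_cycles: 0, waited: Duration::ZERO }
    }

    /// Record one poll of the tip after waiting `cycle` and decide what to do.
    pub fn observe(&mut self, tip: u64, cycle: Duration) -> TipWaitOutcome {
        if tip >= self.to_block {
            return TipWaitOutcome::Refetch;
        }
        self.waited += cycle;
        if tip > self.last_tip {
            // Progress resets the flat-tip counter but not the total budget.
            self.last_tip = tip;
            self.flat_cycles = 0;
        } else {
            self.flat_cycles += 1;
        }
        if self.flat_cycles >= self.cfg.max_stall_cycles {
            return TipWaitOutcome::Stalled(StallReason::FlatTip);
        }
        if self.waited >= self.cfg.max_total {
            return TipWaitOutcome::Stalled(StallReason::TotalBudgetExceeded);
        }
        TipWaitOutcome::KeepWaiting
    }
}
```

In this sketch a caller would check `empty_batch_past_tip` after an empty `eth_getLogs` result, then call `observe` once per polling cycle: `Refetch` means re-issuing the same range, and `Stalled` means surfacing an error. Keeping the two budgets separate is what lets a slowly advancing tip reset the flat-tip counter while still being bounded by the total wait.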