ethl 0.1.23

Tools for capturing, processing, archiving, and replaying Ethereum events

ethl

WIP

ethl is a powerful Ethereum log event ETL (Extract, Transform, Load) tool designed for efficient and reliable log processing. It provides:

  • RPC utilities for streaming logs with support for multiple providers, fallback mechanisms, and retries.
  • Tools to archive events into an opinionated Arrow + Parquet format for optimized storage and querying.
  • Features enabling fast replay of specific events and continuous indexing for real-time use cases.

This tool was built to address the need for high-performance event processing and storage in Ethereum-based applications.

Getting Started

To get started with ethl, add it to your Cargo.toml, either by running cargo add ethl or by adding the dependency by hand:

[dependencies]
ethl = "0.1"

Then, include it in your project:

use ethl;

For detailed examples and usage, check the documentation.

Development nodes (anvil, hardhat, ganache)

ethl treats RPC provider responses as authoritative. Local development nodes expose two failure modes that can silently corrupt an event cache:

  1. Silent empty responses past tip. Some nodes return {"result": []} for eth_getLogs on block ranges past their actual tip rather than erroring. ethl guards against this by calling eth_blockNumber whenever a batch returns no logs and checking whether to_block exceeds the reported tip. When the tip is behind to_block, ethl waits: it re-polls the tip on a bounded backoff and, once the tip catches up, re-fetches the same range without advancing the cursor. ethl terminates with RpcError::ProviderStalled in two cases: the tip stays flat for tip_wait_max_stall_cycles consecutive cycles (default 5 × 60 s ≈ 5 min), or the tip advances but fails to reach to_block within tip_wait_max_total_secs (default 1800 s / 30 min), which bounds providers that crawl but never converge. A catch-up that completes within the budget proceeds normally. The overhead is one extra RPC per empty batch; batches that return logs are unaffected.

  2. Synthetic blocks from local mining. Dev nodes mine new blocks in response to transactions you send, including simulations. Those blocks share a number with real chain blocks but contain only your local transactions — they are well-formed blocks as far as the API is concerned, so ethl cannot distinguish them from fork-proxied real blocks. If you index against a node that has been used for writes, treat the cache as poisoned: roll block_height back to a value from before the node was first used and rebuild from a real RPC.
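
The tip-wait behaviour described in item 1 can be sketched as a small state machine. This is an illustration only, not ethl's implementation or API: TipWaitConfig, TipWait, TipWaitOutcome, observe, and empty_batch_past_tip are hypothetical names. Only the two limits (tip_wait_max_stall_cycles, tip_wait_max_total_secs) and their defaults come from the text above. The sketch assumes the caller polls eth_blockNumber once per cycle and reports how long it waited.

```rust
use std::time::Duration;

/// Hypothetical config; the fields mirror the two documented limits.
#[derive(Debug, Clone, Copy)]
pub struct TipWaitConfig {
    /// Mirrors tip_wait_max_stall_cycles (documented default: 5).
    pub max_stall_cycles: u32,
    /// Mirrors tip_wait_max_total_secs (documented default: 1800 s).
    pub max_total: Duration,
}

impl Default for TipWaitConfig {
    fn default() -> Self {
        Self { max_stall_cycles: 5, max_total: Duration::from_secs(1800) }
    }
}

/// What the caller should do after one tip poll (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum TipWaitOutcome {
    /// The tip reached to_block: re-fetch the same range, cursor unchanged.
    Refetch,
    /// The tip is still behind and both limits still hold: poll again.
    KeepWaiting,
    /// A limit was exceeded: corresponds to RpcError::ProviderStalled.
    ProviderStalled,
}

/// An empty eth_getLogs batch is only suspect when the requested range
/// extends past the tip reported by eth_blockNumber.
pub fn empty_batch_past_tip(to_block: u64, tip: u64) -> bool {
    to_block > tip
}

/// Tracks one wait for the provider's tip to reach `to_block`.
pub struct TipWait {
    cfg: TipWaitConfig,
    to_block: u64,
    last_tip: u64,
    flat_cycles: u32,
    waited: Duration,
}

impl TipWait {
    pub fn new(cfg: TipWaitConfig, to_block: u64, initial_tip: u64) -> Self {
        Self { cfg, to_block, last_tip: initial_tip, flat_cycles: 0, waited: Duration::ZERO }
    }

    /// Record one poll: `tip` is the newly reported tip and `elapsed` is
    /// the time spent waiting in this cycle.
    pub fn observe(&mut self, tip: u64, elapsed: Duration) -> TipWaitOutcome {
        self.waited += elapsed;
        if tip >= self.to_block {
            return TipWaitOutcome::Refetch;
        }
        if tip > self.last_tip {
            // Progress resets the flat-tip counter but not the total budget.
            self.flat_cycles = 0;
            self.last_tip = tip;
        } else {
            self.flat_cycles += 1;
        }
        if self.flat_cycles >= self.cfg.max_stall_cycles || self.waited >= self.cfg.max_total {
            TipWaitOutcome::ProviderStalled
        } else {
            TipWaitOutcome::KeepWaiting
        }
    }
}
```

In this sketch the total-time budget keeps accumulating while the tip advances, so a crawling provider is still bounded, and a flat tip is cut off after the configured number of consecutive cycles. A caller would construct TipWait only after an empty batch for which empty_batch_past_tip returns true, then call observe after each poll until it returns Refetch or ProviderStalled.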