# STORAGE MODULE
Persistent storage backends for TB-scale results, built on columnar formats.
## STRUCTURE
```
storage/
├── mod.rs # StorageBackend trait, record types, errors
├── schema.rs # Arrow schema definition + records_to_batch conversion
├── parquet_backend.rs # ParquetBackend with auto chunk rotation
└── query.rs # DuckDB query executor (feature: storage-query)
```
## WHERE TO LOOK
| Task | Where |
| --- | --- |
| Add new backend | Implement `StorageBackend` trait in `{name}.rs` |
| Modify record structure | `mod.rs` - record structs |
| Add error variant | `mod.rs` - `StorageError` enum |
| Schema changes | `schema.rs` - Arrow schema + field constants |
| Add new column | `schema.rs` - update `result_schema()` + `records_to_batch()` |
| Chunk rotation logic | `parquet_backend.rs` - `should_rotate()`, `rotate_chunk()` |
| Query stored results | `query.rs` - `QueryExecutor` (requires `storage-query` feature) |
## CONVENTIONS
- **Feature-gated**: Entire module behind `#[cfg(feature = "storage")]`
- **Chain-agnostic**: Record types support any blockchain (Bitcoin, Ethereum, Solana)
- **Zero-copy**: Record structs use `&'a str` and `&'a [T]` for efficiency (see the sketch after this list)
- **Flat schema**: Variable-length fields mapped to fixed columns for SQL-friendly querying
- **Auto-chunking**: ParquetBackend rotates files at configurable thresholds
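A hedged sketch of what the zero-copy convention looks like in practice; the real structs live in `mod.rs`, and the exact field set here is an assumption drawn from the schema column names below:

```rust
/// Illustrative only: a record that borrows every variable-length field
/// from the producer's buffers instead of owning copies.
pub struct ResultRecord<'a> {
    pub source: &'a str,           // where the candidate key came from
    pub transform: &'a str,        // e.g. "milksad"
    pub chain: &'a str,            // "bitcoin", "ethereum", "solana"
    pub private_key_raw: &'a [u8], // borrowed, not cloned
    pub timestamp: i64,            // assumed representation
}
```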
## STORAGE BACKEND TRAIT
```rust
pub trait StorageBackend: Send + Sync {
    /// Append a batch of records to the backend's current output.
    fn write_batch(&mut self, records: &[ResultRecord<'_>]) -> Result<()>;
    /// Finalize buffered data and return the paths of completed files.
    fn flush(&mut self) -> Result<Vec<PathBuf>>;
    /// Arrow schema shared by all records this backend accepts.
    fn schema(&self) -> &Schema;
}
```
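Implementing the trait for a new backend is mechanical. A minimal sketch: the `CountingBackend` name and its behavior are invented for illustration, while `Result`, `ResultRecord`, and `Schema` are the types from this module and the `arrow` crate.

```rust
use std::path::PathBuf;

/// Toy backend that counts records instead of persisting them.
pub struct CountingBackend {
    schema: Schema,
    written: usize,
}

impl StorageBackend for CountingBackend {
    fn write_batch(&mut self, records: &[ResultRecord<'_>]) -> Result<()> {
        // A real backend would convert via records_to_batch() and write here.
        self.written += records.len();
        Ok(())
    }

    fn flush(&mut self) -> Result<Vec<PathBuf>> {
        // Nothing is persisted, so there are no finished files to report.
        Ok(Vec::new())
    }

    fn schema(&self) -> &Schema {
        &self.schema
    }
}
```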
## PARQUET BACKEND
```rust
let backend = ParquetBackend::new("results/", "milksad")
    .with_compression(Compression::ZSTD(Default::default()))
    .with_chunk_records(1_000_000)
    .with_chunk_bytes(100 * 1024 * 1024);
```
| Builder method | Default | Effect |
| --- | --- | --- |
| `with_chunk_records(n)` | 1M | Rotate after `n` records |
| `with_chunk_bytes(n)` | 100MB | Rotate after N bytes |
| `without_chunking()` | - | Disable auto-rotation |
**Hive-style partitioning**: `transform={name}/date={YYYY-MM-DD}/chunk_{NNNN}.parquet`
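The rotation check itself is a simple threshold comparison. A sketch under assumed counter and field names; the real `should_rotate()` lives in `parquet_backend.rs` and may differ:

```rust
/// Sketch: rotate when either configured limit is reached. `None` means
/// that limit is disabled (as with `without_chunking()`).
fn should_rotate(
    records_in_chunk: u64,
    bytes_in_chunk: u64,
    max_records: Option<u64>,
    max_bytes: Option<u64>,
) -> bool {
    max_records.is_some_and(|n| records_in_chunk >= n)
        || max_bytes.is_some_and(|n| bytes_in_chunk >= n)
}
```

When the check trips, the open file is finalized and the next `chunk_{NNNN}.parquet` in the partition begins.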
## QUERY EXECUTOR (feature: storage-query)
```rust
let executor = QueryExecutor::new("./results")?;
// Returns Arrow RecordBatches (zero-copy)
let result = executor.query_arrow("SELECT * FROM results WHERE transform = 'milksad'")?;
for batch in result.batches() { ... }
// Returns Vec<Row> (simple iteration)
let rows = executor.query("SELECT transform, COUNT(*) AS count FROM results GROUP BY transform")?;
for row in rows {
    println!("{}: {}", row.get_string("transform")?, row.get_i64("count")?);
}
// Scalar queries
let count: Option<i64> = executor.query_scalar_i64("SELECT COUNT(*) FROM results")?;
```
| Method | Returns | Use for |
| --- | --- | --- |
| `query_arrow(sql)` | `QueryResult` (RecordBatches) | High performance, Arrow ecosystem |
| `query(sql)` | `Vec<Row>` | Simple iteration |
| `query_scalar_i64(sql)` | `Option<i64>` | COUNT, SUM, etc. |
| `query_scalar_string(sql)` | `Option<String>` | Single string value |
| `schema()` | `Option<Schema>` | Introspect result schema |
| `discovered_files()` | `Vec<String>` | List Parquet files |
| `refresh()` | `()` | Recreate view after new writes |
**Auto-discovery**: Creates `results` view from `{path}/**/*.parquet` with Hive partitioning enabled.
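Auto-discovery happens at construction, so files flushed afterwards need a `refresh()` before they are visible. A sketch of the round trip, assuming a prepared `records: Vec<ResultRecord>` and that `refresh()` returns a `Result` like the other methods:

```rust
let mut backend = ParquetBackend::new("results/", "milksad")
    .with_chunk_records(1_000_000);
let mut executor = QueryExecutor::new("./results")?; // discovers existing files

backend.write_batch(&records)?;
let _finished = backend.flush()?; // paths of completed chunk files

executor.refresh()?; // recreate the `results` view over the new files
let total = executor.query_scalar_i64("SELECT COUNT(*) FROM results")?;
```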
## ARROW SCHEMA (19 columns)
- **Core columns**: `source`, `transform`, `chain`, `timestamp`, `matched_target`
- **Key columns**: `private_key_raw`, `private_key_hex`, `private_key_decimal`, `private_key_binary`, `private_key_bit_length`, `private_key_hamming_weight`, `private_key_leading_zeros`
- **Derived columns**: `pubkey_compressed`, `pubkey_uncompressed`, `address_p2pkh_*`, `address_p2wpkh`, `wif_*`
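Adding a column touches two places in `schema.rs`: the field list and the batch conversion. A sketch of the likely shape of `result_schema()`; the field types are assumptions, only the column names come from the list above:

```rust
use std::sync::Arc;
use arrow::datatypes::{DataType, Field, Schema};

fn result_schema() -> Arc<Schema> {
    Arc::new(Schema::new(vec![
        Field::new("source", DataType::Utf8, false),
        Field::new("transform", DataType::Utf8, false),
        Field::new("chain", DataType::Utf8, false),
        Field::new("private_key_hex", DataType::Utf8, false),
        Field::new("private_key_bit_length", DataType::UInt32, false),
        // ...remaining columns; a new column is appended here and must
        // also be emitted as an array in records_to_batch().
    ]))
}
```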
## DEPENDENCIES
```toml
parquet = { version = "54", optional = true, features = ["arrow", "zstd"] }
arrow = { version = "54", optional = true }
duckdb = { version = "1.4", optional = true, features = ["bundled"] }
```
Enable storage: `cargo build --features storage`
Enable query: `cargo build --features storage-query`
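The optional dependencies above suggest the following feature wiring; this is a sketch, so check the crate's `Cargo.toml` for the actual graph:

```toml
[features]
storage = ["dep:parquet", "dep:arrow"]
storage-query = ["storage", "dep:duckdb"]
```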