hash-iter 2.0.1

Iterator producing sequence of hash values for a given input (using double hashing technique)
Documentation
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

`hash-iter` is a Rust crate that implements the **enhanced double hashing** technique based on the [Bloom Filters in Probabilistic Verification](https://www.khoury.northeastern.edu/~pete/pub/bloom-filters-verification.pdf) paper (Dillinger and Manolios, 2004). The crate provides an iterator that generates a sequence of hash values from a single input key, useful for hash tables with open addressing, bloom filters, and consistent hashing implementations.

## Building and Testing

### Build Commands
```bash
cargo build              # Build the project
cargo build --release    # Build optimized release version
```

### Testing Commands
```bash
cargo test               # Run all tests (unit, integration, doctests)
cargo test --lib         # Run unit tests only (in src/lib.rs)
cargo test --test double_hash  # Run integration tests
cargo test --doc         # Run documentation tests
cargo test <test_name>   # Run a specific test
```

### Other Common Commands
```bash
cargo doc --open         # Generate and open documentation
cargo clippy             # Run linter
cargo fmt                # Format code
cargo publish --dry-run  # Test package build for crates.io
```

## Architecture

### Core Components

**Traits:**
- `HashIterHasher<T>`: Main trait for hashers that produce multiple hash values via `hash_iter()` method
- `BuildHashIterHasher<T>`: Builder trait for constructing hash iterator hashers

**Main Types:**
- `DoubleHashBuilder<T>`: Builder for configuring hasher with custom seeds (`seed1`, `seed2`) and modulus `n`
- `DoubleHashHasher<T, H1, H2>`: The main hasher struct that holds two hash builders and implements enhanced double hashing
- `Hashes<T>`: The iterator itself that yields hash values using forward differencing optimization

**Supported Numeric Types:** `u32`, `u64`, `u128` only

**Why only these types?**
The crate intentionally restricts supported types to ensure correctness and platform-independence:
- `u8` and `u16` are too small for any real-world hash table or bloom filter (max 256 or 65K values)
- `usize` creates platform-dependent behavior (32-bit vs 64-bit systems produce different hash sequences)
- `u32` (4.3 billion values) is the practical minimum for real applications
- `u64` (18 quintillion values) is the ecosystem standard (matches `std::hash::Hasher` output)
- `u128` provides future-proofing and cryptographic-quality hash spaces
- This restriction allows the library to use a **macro-based implementation** that eliminates all fallible type conversions

**Error Handling:** None! All methods return values directly, not `Result`. Count overflow is impossible with u32+ types.

**Thread Safety:** All core types (`DoubleHashBuilder<T>`, `DoubleHashHasher<T, H1, H2>`, and `Hashes<T>`) are `Copy`, making them automatically `Send + Sync`. Hash builders must also be `Send + Sync` for multi-threaded use.

### Macro-Based Implementation

The library uses a **declarative macro** (`impl_hash_iter_for_type!`) to generate concrete implementations for each supported numeric type. This approach:

1. **Eliminates all runtime error handling**: No fallible type conversions
2. **Provides explicit type support**: Users specify the output type (e.g., `DoubleHashHasher::<u64>::new()`)
3. **Maintains API flexibility**: Still supports custom hash builders via generic `H1` and `H2` parameters
4. **Zero dependencies beyond xxhash**: Removed `num-traits` dependency

**Key Implementation Details:**
- Hash inputs from `BuildHasher::hash_one()` are always `u64` (mandated by Rust's std library)
- Output type `T` can be u32, u64, or u128
- u64 hash values are cast to target type using `as T` (truncation for u32)
- Default seeds (12345, 67890) are cast with `.wrapping_add(0)` to handle truncation silently
- Count parameter remains `usize` for ergonomics (natural Rust idiom)

### Enhanced Double Hashing Algorithm

The core algorithm implements the mathematical formula:
```
h(i) = h1(k) + i * h2(k) + (i^3-i)/6 (mod n)
```

The `Hashes` iterator optimizes this using **forward differencing** to avoid computing `i^3` on each iteration. Key implementation details:

1. First iteration returns `hash1 % n`
2. Subsequent iterations use: `hash1 = (hash1 + hash2) % n` and `hash2 = (hash2 + cnt) % n`
3. Custom modular addition `add_mod()` prevents overflow when values approach `T::MAX`

**Critical:** The modular arithmetic implementation must handle large hash values correctly. The `add_mod` helper in `Hashes::next()` (src/lib.rs:183-196) ensures correct behavior when operands are close to the numeric type's maximum value. See the `modular_arithmetic_overflow_regression` test (tests/double_hash.rs:123-173) for edge cases.

### Default Configuration

- Default hash function: XXH3 (via `xxhash-rust` crate)
- Default seeds: `seed1 = 12345`, `seed2 = 67890` (truncated for u32)
- Default modulus: `T::MAX` for the chosen numeric type
- Iterator traits: `Hashes` implements `Iterator`, `ExactSizeIterator`, `Clone`, and `Copy`

### Type Truncation Behavior

**Important:** Hash inputs from `BuildHasher::hash_one()` are always `u64` (mandated by Rust's standard library). The output type `T` can be `u32`, `u64`, or `u128`.

When using `u32` as the output type:
- `u64` hash values are truncated via `as u32` cast (intentional design)
- Different numeric types produce different hash sequences due to truncation
- Seeds (12345, 67890) are cast with `.wrapping_add(0)` to handle truncation silently
- This behavior is documented and tested (see `tests/double_hash.rs:338-369`)

When using `u128` as the output type:
- `u64` hash values are zero-extended to `u128` (no truncation)
- Provides a larger hash space for cryptographic or large-scale applications

### Performance Characteristics

The implementation is designed for high-throughput applications:
- **O(1) per iteration** - Forward differencing eliminates the need to compute `i^3` on each iteration
- **Zero allocations** - All state is stack-allocated
- **Small memory footprint** - Iterator state is only 3-4 fields (hash1, hash2, n, cnt/k)
- **Cache-friendly** - Sequential access pattern with minimal state
- **No branches in hot path** - Modular arithmetic uses `add_mod` helper for predictable performance
- **Excellent for batch operations** - Generate thousands of hashes with minimal overhead

### Key Design Patterns

1. **Macro-Generated Implementations**: Single macro generates all type-specific implementations
2. **Type Flexibility**: Generic over hash builders (H1, H2: BuildHasher) but concrete for output types
3. **Builder Pattern**: `DoubleHashBuilder` provides fluent API for configuration
4. **Explicit Type Parameters**: Users must specify type parameters to disambiguate (e.g., `DoubleHashBuilder::<u64>::new()`)
5. **No Error Handling**: All methods return values directly, no `Result` wrappers needed

## Safety and Dependencies

- `#![forbid(unsafe_code)]` - no unsafe code allowed
- Dependencies: `xxhash-rust` (default hash function) only
- Removed dependency: `num-traits` (replaced by macro-based approach)
- No error types: Removed `src/error.rs` entirely

## Edge Cases and Panics

The library is designed to minimize panics, but some configurations can cause natural Rust panics:

**Division by Zero:**
- Setting `n = 0` in the builder will cause a panic during modular arithmetic (`% 0`)
- This is intentional - it's a programming error, not a recoverable condition
- Default `n = T::MAX` is always valid

**Count Parameter:**
- Count is `usize` for ergonomics (natural Rust idiom)
- Internally converted to output type `T` via `as T` cast
- For `u32` output with large `usize` count on 64-bit systems, truncation may occur (rare in practice)
- For practical applications, counts are well within u32 range

**Edge Cases That Work Correctly:**
- `k = 0` produces an empty iterator (correct behavior)
- `n = 1` produces all zeros (mathematically correct: all values mod 1 equal 0)
- Large hash values near `T::MAX` are handled correctly by `add_mod` helper
- Hash values are always in range `[0, n)` (guaranteed by modular arithmetic)

## Common Test Patterns

Tests use a reference implementation `hasn_fn()` to verify correctness. When adding tests:
- Compare iterator output against mathematical formula
- Test edge cases: large hash values near `T::MAX`, small modulus values
- Verify iterator properties: exact size, cloning, checkpointing
- Test all supported numeric types (u32, u64, u128)
- Test truncation behavior (different types produce different sequences)
- No need to test error cases - there are none!

## Important Notes for Developers

- Always specify type parameters explicitly (e.g., `DoubleHashHasher::<u64>::new()`)
- When using `with_hash_builders`, use turbofish syntax: `DoubleHashHasher::<T, _, _>::with_hash_builders(...)`
- Be aware that different numeric types produce different sequences due to truncation
- No `.unwrap()` calls needed anywhere - all methods are infallible
- Count parameter is `usize` for ergonomics, but internally converted to output type

## API Simplicity

All methods return values directly, no error handling needed:

```rust
// Clean API with no Result wrappers
let hasher = DoubleHashHasher::<u64>::new();
let hashes = hasher.hash_iter(&"key", 10).collect::<Vec<_>>();

// Custom configuration
let hasher = DoubleHashBuilder::<u64>::new()
    .with_seed1(12345)
    .with_n(1_000_000)
    .build_hash_iter_hasher();
let hashes: Vec<u64> = hasher.hash_iter(&"key", 10).collect();

// Using non-default types
let hasher_u32 = DoubleHashHasher::<u32>::new();
let hashes: Vec<u32> = hasher_u32.hash_iter(&"key", 10).collect();
```

This is possible because:
- u32/u64/u128 are large enough for any practical count value
- Hash builder construction never fails
- Type conversions use simple `as` casts