ip4sum 0.1.0

Highly optimized IPv4 checksum calculation, no-std compatible
# ip4sum -- Optimized IPv4 Internet Checksum

[![Crates.io](https://img.shields.io/crates/v/ip4sum.svg?style=for-the-badge)](https://crates.io/crates/ip4sum)
[![Documentation](https://img.shields.io/docsrs/ip4sum?style=for-the-badge)](https://docs.rs/ip4sum)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg?style=for-the-badge)](LICENSE)

**ip4sum** is a highly optimized implementation of the Internet checksum defined in [RFC 1071](https://tools.ietf.org/html/rfc1071) and updated in [RFC 1141](https://tools.ietf.org/html/rfc1141) and [RFC 1624](https://tools.ietf.org/html/rfc1624), used in IPv4, TCP, UDP, and ICMP headers.

A portable C99 implementation is also provided in the `c/` directory.

## Key Features

- **Fast**: Between 1.4x and 7.5x faster than `internet-checksum` (Fuchsia/Google) in the one-shot benchmarks below, depending on input size
- **No-std compatible**: Zero dependencies, works in embedded and bare-metal environments
- **Zero-allocation**: All computation is done in-place on the stack
- **Incremental API**: Supports multi-part checksum computation for scattered packet data
- **Portable C version**: C99-compatible implementation with zero dependencies

## Installation

```
cargo add ip4sum
```

## Quick Start

### One-shot Computation

```rust
let data = [0x45, 0x00, 0x00, 0x30, 0x00, 0x00, 0x40, 0x00,
            0x40, 0x01, 0x00, 0x00, 0x0a, 0x00, 0x00, 0x01,
            0x0a, 0x00, 0x00, 0x02];

let csum = ip4sum::checksum(&data);
```

### Incremental Computation

```rust
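use ip4sum::Checksum;

// Same 20-byte IPv4 header as in the one-shot example.
let data: [u8; 20] = [0x45, 0x00, 0x00, 0x30, 0x00, 0x00, 0x40, 0x00,
                      0x40, 0x01, 0x00, 0x00, 0x0a, 0x00, 0x00, 0x01,
                      0x0a, 0x00, 0x00, 0x02];

let mut hasher = Checksum::new();
hasher.update(&data[..10]);   // bytes 0..10: everything before the checksum field
hasher.update(&[0x00, 0x00]); // checksum field placeholder
hasher.update(&data[12..]);   // bytes 12..20: source and destination addresses
let csum = hasher.finalize();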
```

### C Version

```c
#include "checksum.h"

/* one-shot: checksum a contiguous buffer */
uint16_t csum = ip4sum_checksum_oneshot(data, len);

/* incremental: feed the header and payload separately */
ip4sum_checksum c = ip4sum_checksum_new();
ip4sum_checksum_update(&c, header, 20);
ip4sum_checksum_update(&c, payload, 1480);
uint16_t csum_incremental = ip4sum_checksum_finalize(c);
```

## API Reference

### `checksum(data: &[u8]) -> u16`

Computes the Internet checksum of a byte slice in one shot and returns the 16-bit ones'-complement checksum in network byte order. Set the checksum field in the input to zero before calling.
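
For reference, the value being computed can be written as a plain RFC 1071 loop. The sketch below is not the crate's implementation: `reference_checksum` and `fill_checksum` are hypothetical names used only for illustration, and it assumes the result is an integer whose big-endian bytes are written into the checksum field. It also shows why the field is zeroed first: once the computed value is stored, summing the header again yields zero.

```rust
/// Reference RFC 1071 checksum: sum big-endian 16-bit words, fold the
/// carries back in, then take the ones' complement.
/// Illustrative only; this is NOT the crate's optimized code path.
fn reference_checksum(data: &[u8]) -> u16 {
    let mut sum: u64 = 0;
    let mut pairs = data.chunks_exact(2);
    for pair in &mut pairs {
        sum += u64::from(u16::from_be_bytes([pair[0], pair[1]]));
    }
    // An odd trailing byte is treated as the high byte of a zero-padded word.
    if let [last] = pairs.remainder() {
        sum += u64::from(*last) << 8;
    }
    // End-around carry: fold everything above bit 15 back into the low 16 bits.
    while sum >> 16 != 0 {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    !(sum as u16)
}

/// Hypothetical helper: zero the IPv4 checksum field (bytes 10..12),
/// compute the checksum over the header, and store it in the field.
fn fill_checksum(header: &mut [u8; 20]) -> u16 {
    header[10..12].fill(0);
    let csum = reference_checksum(&header[..]);
    header[10..12].copy_from_slice(&csum.to_be_bytes());
    csum
}
```

A receiver can validate a header by running the same sum over all 20 bytes, checksum field included, and checking that the result is zero.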

### `Checksum`

An incremental checksum calculator.

| Method                           | Description                                                  |
| -------------------------------- | ------------------------------------------------------------ |
| `Checksum::new()`                | Create a new calculator with accumulator initialized to zero |
| `update(&mut self, data: &[u8])` | Feed a slice of data into the running checksum               |
| `finalize(self) -> u16`          | Consume the calculator and return the 16-bit checksum        |
| `reset(&mut self)`               | Reset the calculator to its initial state                    |
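
To make the semantics of these four methods concrete, here is a deliberately slow, byte-at-a-time model of an incremental calculator. `SketchChecksum` is a hypothetical type written for this example, not the crate's `Checksum`; it tracks byte parity so that chunks may be split at any offset. This README does not state how the crate itself treats odd-length chunks, so that part is an assumption of the sketch.

```rust
/// Hypothetical model of an incremental Internet checksum calculator.
/// Written for clarity, not speed; it is NOT the crate's `Checksum` type.
struct SketchChecksum {
    sum: u64,  // running sum of big-endian 16-bit words
    odd: bool, // true when the next byte is the low byte of a word
}

impl SketchChecksum {
    /// Create a calculator with the accumulator set to zero.
    fn new() -> Self {
        Self { sum: 0, odd: false }
    }

    /// Feed a slice into the running sum.
    fn update(&mut self, data: &[u8]) {
        for &byte in data {
            // Even stream offsets are the high byte of a word, odd ones the low byte.
            self.sum += if self.odd {
                u64::from(byte)
            } else {
                u64::from(byte) << 8
            };
            self.odd = !self.odd;
        }
    }

    /// Fold the carries, complement, and return the 16-bit checksum.
    fn finalize(self) -> u16 {
        let mut sum = self.sum;
        while sum >> 16 != 0 {
            sum = (sum & 0xffff) + (sum >> 16);
        }
        !(sum as u16)
    }

    /// Return to the initial state so the value can be reused.
    fn reset(&mut self) {
        *self = Self::new();
    }
}
```

Because the sum is associative, feeding the same bytes in any number of `update` calls produces the same result as a single call over the whole buffer.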

## Performance

Benchmarks run on Linux x86_64 with Rust 1.94 (`-C opt-level=3 -C lto=fat -C codegen-units=1`).

### One-shot Checksum (`checksum`)

| Bytes |  ip4sum | internet-checksum |    Ratio |
| ----: | ------: | ----------------: | -------: |
|    20 |  7.2 ns |           33.6 ns | **4.7x** |
|    40 |  6.8 ns |           37.5 ns | **5.5x** |
|    64 |  6.0 ns |            9.7 ns | **1.6x** |
|   128 | 10.3 ns |           14.6 ns | **1.4x** |
|   256 | 13.3 ns |           25.4 ns | **1.9x** |
|   512 | 10.6 ns |           79.0 ns | **7.5x** |
|  1000 | 25.6 ns |           84.8 ns | **3.3x** |
|  1500 | 24.3 ns |          126.2 ns | **5.2x** |

### Incremental Checksum (`Checksum` struct)

| Bytes |  ip4sum | internet-checksum |    Ratio |
| ----: | ------: | ----------------: | -------: |
|    20 |  4.3 ns |           20.0 ns | **4.7x** |
|    64 |  6.8 ns |           10.1 ns | **1.5x** |
|   256 |  6.4 ns |           25.7 ns | **4.0x** |
|  1500 | 23.9 ns |          145.2 ns | **6.1x** |

### Multi-feed Incremental (20B header + 1480B payload)

|  ip4sum | internet-checksum |    Ratio |
| ------: | ----------------: | -------: |
| 25.3 ns |          140.0 ns | **5.5x** |

### Rust vs C Comparison

The Rust and C implementations use the same algorithm: a 64-bit wide accumulator with 32-bit reads in native byte order, deferring the carry fold and endian swap to a single final step.

Any performance difference between the two comes down to how each compiler optimizes the same logical pattern: Rust's LLVM backend and C compilers (GCC, Clang, MSVC) apply different register allocation, loop vectorization, and instruction scheduling strategies to identical source-level logic. In practice, the Rust version compiled with `-C lto=fat -C codegen-units=1` tends to produce tighter inner loops, because whole-program optimization can inline and specialize more aggressively. The C version remains competitive and benefits from decades of loop optimization work in mature C compilers.

Run benchmarks on your machine:

```
# Rust benchmarks
cargo bench

# C benchmarks
cd c && gcc -O2 checksum.c test_checksum.c -o test_checksum && ./test_checksum
```

## Why It's Fast

The key insight is simplicity. The accumulator is a 64-bit integer, and 32-bit words are added to it with plain `wrapping_add`, with no carry tracking. A u64 accumulator cannot overflow until it has absorbed about 4 billion 32-bit additions (roughly 16 GB of data), so no realistic packet comes close to the limit. Carry folding is deferred to a single cheap step at the end.
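
The overflow bound can be checked with a few lines of integer arithmetic. The constant names below are hypothetical and exist only for this worked example; the calculation assumes the worst case, where every 32-bit word added is `0xFFFF_FFFF`.

```rust
/// Number of worst-case (0xFFFF_FFFF) 32-bit words a u64 accumulator can
/// absorb without wrapping. Because 2^64 - 1 = (2^32 - 1) * (2^32 + 1),
/// this division is exact and equals 2^32 + 1.
const MAX_SAFE_WORDS: u64 = u64::MAX / (u32::MAX as u64);

/// The same bound expressed in input bytes (4 bytes per word),
/// which is just over 16 GiB.
const MAX_SAFE_BYTES: u64 = MAX_SAFE_WORDS * 4;
```

Real packet data is rarely all ones, so in practice the accumulator has even more headroom than this worst-case figure suggests.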

In contrast, `internet-checksum` (Fuchsia/Google) introduces per-addition overhead:

- Manual carry tracking via `overflowing_add` + boolean carry propagation
- `Option<u8>` trailing byte field with branching on every `add_bytes` call
- Size-based dispatch to a separate `add_bytes_small` path using `checked_add`
- Macro-based loop unrolling with `try_into().unwrap()` in the expansion
- Multi-function normalize chain (`normalize` -> `normalize_64` -> `adc_u32` -> `adc_u16`)

None of these "optimizations" help for any input under 16 GB. The compiler generates tighter code from a simple `wrapping_add` loop than from manual carry management.
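
The simple loop this section describes can be sketched as follows. This is an illustrative reconstruction from the description in this README (64-bit accumulator, native-endian 32-bit reads, one fold and one endian swap at the end), not the crate's source; `wide_checksum` is a hypothetical name, and the zero-padded tail handling is an assumption of the sketch.

```rust
/// Sketch of a wide-accumulator Internet checksum.
/// Illustrative only; the crate's real implementation may differ.
fn wide_checksum(data: &[u8]) -> u16 {
    let mut acc: u64 = 0;

    // Main loop: add native-endian 32-bit words with no carry tracking.
    let mut words = data.chunks_exact(4);
    for w in &mut words {
        let word = u32::from_ne_bytes([w[0], w[1], w[2], w[3]]);
        acc = acc.wrapping_add(u64::from(word));
    }

    // Tail: copy the remaining 0..=3 bytes into a zero-padded word.
    let rest = words.remainder();
    let mut tail = [0u8; 4];
    tail[..rest.len()].copy_from_slice(rest);
    acc = acc.wrapping_add(u64::from(u32::from_ne_bytes(tail)));

    // Single deferred fold: 2^16 is congruent to 1 modulo 0xFFFF, so adding
    // the high bits onto the low 16 bits preserves the ones'-complement sum.
    while acc >> 16 != 0 {
        acc = (acc & 0xffff) + (acc >> 16);
    }

    // The sum was built in native byte order, so its in-memory bytes are the
    // big-endian bytes of the result. `from_be` swaps on little-endian
    // targets and is a no-op on big-endian ones.
    !u16::from_be(acc as u16)
}
```

The byte-order trick relies on a property noted in RFC 1071: the ones'-complement sum is byte-order independent, so summing byte-swapped words gives the byte-swapped result, and one swap at the end is enough.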

## Testing

```
# Rust tests
cargo test

# C tests
cd c && gcc -Wall -Wextra -Werror -pedantic -std=c99 -O2 checksum.c test_checksum.c -o test_checksum && ./test_checksum
```

## Contributing

Contributions are welcome! Please:

1. Run `cargo +nightly fmt` and `cargo clippy` before submitting
2. Add tests for new functionality
3. Update documentation as needed

## License

Licensed under the [MIT License](LICENSE).

---

**Author**: Khashayar Fereidani
**Repository**: [github.com/fereidani/ip4sum](https://github.com/fereidani/ip4sum)