# ip4sum -- Optimized IPv4 Internet Checksum
**ip4sum** is a highly optimized implementation of the Internet checksum defined in [RFC 1071](https://tools.ietf.org/html/rfc1071) and updated in [RFC 1141](https://tools.ietf.org/html/rfc1141) and [RFC 1624](https://tools.ietf.org/html/rfc1624), used in IPv4, TCP, UDP, and ICMP headers.
A portable C99 implementation is also provided in the `c/` directory.
## Key Features
- **Fast**: 1.4x to 7.5x faster than `internet-checksum` (Fuchsia/Google) in the benchmarks below, and about 5x faster on 20-byte headers and 1500-byte packets
- **No-std compatible**: Zero dependencies, works in embedded and bare-metal environments
- **Zero-allocation**: All computation is done in-place on the stack
- **Incremental API**: Supports multi-part checksum computation for scattered packet data
- **Portable C version**: C99-compatible implementation with zero dependencies
## Installation
```
cargo add ip4sum
```
## Quick Start
### One-shot Computation
```rust
let data = [0x45, 0x00, 0x00, 0x30, 0x00, 0x00, 0x40, 0x00,
0x40, 0x01, 0x00, 0x00, 0x0a, 0x00, 0x00, 0x01,
0x0a, 0x00, 0x00, 0x02];
let csum = ip4sum::checksum(&data);
```
### Incremental Computation
```rust
use ip4sum::Checksum;
// `data` is the 20-byte header from the one-shot example above.
let mut hasher = Checksum::new();
hasher.update(&data[..10]);
hasher.update(&[0x00, 0x00]); // checksum field placeholder
hasher.update(&data[12..]);
let csum = hasher.finalize();
```
### C Version
```c
#include "checksum.h"
/* one-shot */
uint16_t csum = ip4sum_checksum_oneshot(data, len);
/* incremental: feed the header and the payload separately */
ip4sum_checksum c = ip4sum_checksum_new();
ip4sum_checksum_update(&c, header, 20);
ip4sum_checksum_update(&c, payload, 1480);
uint16_t csum_parts = ip4sum_checksum_finalize(c);
```
## API Reference
### `checksum(data: &[u8]) -> u16`
Compute the Internet checksum of a byte slice in one shot. Returns the 16-bit one's-complement checksum in network byte order. The checksum field in the input should be set to zero before calling.
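To illustrate why the checksum field is zeroed first, here is a minimal sketch of the fill-then-verify cycle for an IPv4 header. It does not call ip4sum: `reference_checksum`, `fill_ipv4_checksum`, and `ipv4_checksum_is_valid` are hypothetical helpers written for this example, using the plain RFC 1071 sum over big-endian 16-bit words and returning the result as an ordinary host-order integer.

```rust
/// Plain RFC 1071 checksum over big-endian 16-bit words.
/// Hypothetical reference helper for illustration; not ip4sum's implementation.
/// The u32 accumulator is sufficient for packet-sized input (up to 64 KiB).
fn reference_checksum(data: &[u8]) -> u16 {
    let mut sum: u32 = 0;
    let mut chunks = data.chunks_exact(2);
    for pair in &mut chunks {
        sum += u32::from(u16::from_be_bytes([pair[0], pair[1]]));
    }
    // An odd trailing byte is treated as the high byte of a zero-padded word.
    if let [last] = chunks.remainder() {
        sum += u32::from(*last) << 8;
    }
    // Fold the carries back into the low 16 bits (end-around carry).
    while sum >> 16 != 0 {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    !(sum as u16)
}

/// Zero the IPv4 checksum field (bytes 10..12), compute the checksum,
/// and store it in network byte order.
fn fill_ipv4_checksum(header: &mut [u8]) {
    header[10..12].copy_from_slice(&[0, 0]);
    let csum = reference_checksum(header);
    header[10..12].copy_from_slice(&csum.to_be_bytes());
}

/// With a correct checksum in place, the words sum to 0xFFFF,
/// so the complemented result is zero.
fn ipv4_checksum_is_valid(header: &[u8]) -> bool {
    reference_checksum(header) == 0
}
```

For the 20-byte header from the Quick Start, this reference sum gives `0x26CB`, and the filled header then passes `ipv4_checksum_is_valid`.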
### `Checksum`
An incremental checksum calculator.
| Method | Description |
| -------------------------------- | ------------------------------------------------------------ |
| `Checksum::new()` | Create a new calculator with accumulator initialized to zero |
| `update(&mut self, data: &[u8])` | Feed a slice of data into the running checksum |
| `finalize(self) -> u16` | Consume the calculator and return the 16-bit checksum |
| `reset(&mut self)` | Reset the calculator to its initial state |
## Performance
Benchmarks run on Linux x86_64 with Rust 1.94 (`-C opt-level=3 -C lto=fat -C codegen-units=1`).
### One-shot Checksum (`checksum`)
| Bytes | ip4sum | internet-checksum | Ratio |
| ----: | ------: | ----------------: | -------: |
| 20 | 7.2 ns | 33.6 ns | **4.7x** |
| 40 | 6.8 ns | 37.5 ns | **5.5x** |
| 64 | 6.0 ns | 9.7 ns | **1.6x** |
| 128 | 10.3 ns | 14.6 ns | **1.4x** |
| 256 | 13.3 ns | 25.4 ns | **1.9x** |
| 512 | 10.6 ns | 79.0 ns | **7.5x** |
| 1000 | 25.6 ns | 84.8 ns | **3.3x** |
| 1500 | 24.3 ns | 126.2 ns | **5.2x** |
### Incremental Checksum (`Checksum` struct)
| Bytes | ip4sum | internet-checksum | Ratio |
| ----: | ------: | ----------------: | -------: |
| 20 | 4.3 ns | 20.0 ns | **4.7x** |
| 64 | 6.8 ns | 10.1 ns | **1.5x** |
| 256 | 6.4 ns | 25.7 ns | **4.0x** |
| 1500 | 23.9 ns | 145.2 ns | **6.1x** |
### Multi-feed Incremental (20B header + 1480B payload)
| ip4sum | internet-checksum | Ratio |
| ------: | ----------------: | -------: |
| 25.3 ns | 140.0 ns | **5.5x** |
### Rust vs C Comparison
The Rust and C implementations use the same algorithm: a 64-bit wide accumulator with 32-bit reads in native byte order, deferring the carry fold and endian swap to a single final step.
Any performance difference between the two comes down to how each compiler optimizes the same logical pattern. Rust's LLVM backend and the C compilers (GCC, Clang, MSVC) apply different register allocation, loop vectorization, and instruction scheduling strategies to identical source-level logic. In practice, the Rust version compiled with `-C lto=fat -C codegen-units=1` tends to produce tighter inner loops because whole-program optimization can inline and specialize more aggressively. The C version remains competitive and benefits from decades of loop optimization work in mature C compilers.
Run benchmarks on your machine:
```
# Rust benchmarks
cargo bench
# C benchmarks
cd c && gcc -O2 checksum.c test_checksum.c -o test_checksum && ./test_checksum
```
## Why It's Fast
The key insight is simplicity. The accumulator is a 64-bit integer, and we add 32-bit words to it using plain `wrapping_add` with no carry tracking. A u64 accumulator can only overflow after roughly 4 billion additions of 32-bit words (~16 GB of data), so no realistic packet comes close to the limit. Carry folding is deferred to a single cheap step at the end.
In contrast, `internet-checksum` (Fuchsia/Google) introduces per-addition overhead:
- Manual carry tracking via `overflowing_add` + boolean carry propagation
- `Option<u8>` trailing byte field with branching on every `add_bytes` call
- Size-based dispatch to a separate `add_bytes_small` path using `checked_add`
- Macro-based loop unrolling with `try_into().unwrap()` in the expansion
- Multi-function normalize chain (`normalize` -> `normalize_64` -> `adc_u32` -> `adc_u16`)
None of these "optimizations" help for any input under 16 GB. The compiler generates tighter code from a simple `wrapping_add` loop than from manual carry management.
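The wide-accumulator approach described in this section can be sketched roughly as follows. This is a simplified illustration written for this README, not the crate's actual source: the name `wide_checksum`, the zero-padded tail handling, and the exact folding steps are assumptions made for the example.

```rust
/// Sketch of a deferred-carry Internet checksum: 32-bit native-endian reads
/// into a 64-bit accumulator, with a single fold and byte swap at the end.
/// Hypothetical illustration; ip4sum's real implementation may differ.
fn wide_checksum(data: &[u8]) -> u16 {
    let mut acc: u64 = 0;

    // Hot loop: one 32-bit load and one 64-bit add per 4 bytes, no carry logic.
    let mut chunks = data.chunks_exact(4);
    for c in &mut chunks {
        let word = u32::from_ne_bytes([c[0], c[1], c[2], c[3]]);
        acc = acc.wrapping_add(u64::from(word));
    }

    // Tail of 0..=3 bytes: copy into a zeroed buffer so the padding lands
    // after the data in memory order, as RFC 1071 requires.
    let rem = chunks.remainder();
    let mut tail = [0u8; 4];
    tail[..rem.len()].copy_from_slice(rem);
    acc = acc.wrapping_add(u64::from(u32::from_ne_bytes(tail)));

    // Deferred carry fold: 64 -> 32 bits, then down to 16 bits.
    acc = (acc & 0xffff_ffff) + (acc >> 32);
    acc = (acc & 0xffff) + (acc >> 16);
    acc = (acc & 0xffff) + (acc >> 16);
    acc = (acc & 0xffff) + (acc >> 16);

    // The one's-complement sum is byte-order independent up to a final swap,
    // so convert from "network order held in a native integer" once here.
    !u16::from_be(acc as u16)
}
```

Because the one's-complement sum commutes with byte swapping (RFC 1071, section 2), the loop never has to convert individual words to big-endian; `u16::from_be` performs the single swap on little-endian targets and is a no-op on big-endian ones.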
## Testing
```
# Rust tests
cargo test
# C tests
cd c && gcc -Wall -Wextra -Werror -pedantic -std=c99 -O2 checksum.c test_checksum.c -o test_checksum && ./test_checksum
```
## Contributing
Contributions are welcome! Please:
1. Run `cargo +nightly fmt` and `cargo clippy` before submitting
2. Add tests for new functionality
3. Update documentation as needed
## License
Licensed under the [MIT License](LICENSE).
---
**Author**: Khashayar Fereidani
**Repository**: [github.com/fereidani/ip4sum](https://github.com/fereidani/ip4sum)