ip4sum -- Optimized IPv4 Internet Checksum

ip4sum is a highly optimized Rust implementation of the Internet checksum defined in RFC 1071 and updated in RFC 1141 and RFC 1624. This checksum is used by IPv4 headers and by TCP, UDP, and ICMP.

A portable C99 implementation is also provided in the c/ directory.
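For reference, the plain RFC 1071 computation that ip4sum optimizes can be sketched as follows. Add the data as big-endian 16-bit words, pad an odd final byte with zero, and fold the carries back into the low 16 bits (end-around carry). The checksum is the one's complement of that result. This is an illustrative baseline, not ip4sum's source, and the function name `rfc1071_checksum` is hypothetical.

```rust
/// Textbook RFC 1071 Internet checksum (illustrative baseline, not ip4sum's code).
fn rfc1071_checksum(data: &[u8]) -> u16 {
    let mut sum: u64 = 0;
    let mut pairs = data.chunks_exact(2);
    // Add the data as big-endian 16-bit words.
    for pair in &mut pairs {
        sum += u64::from(u16::from_be_bytes([pair[0], pair[1]]));
    }
    // An odd final byte is treated as the high byte of a zero-padded word.
    if let [last] = pairs.remainder() {
        sum += u64::from(*last) << 8;
    }
    // End-around carry: fold everything above bit 15 back into the low 16 bits.
    while sum > 0xFFFF {
        sum = (sum & 0xFFFF) + (sum >> 16);
    }
    // The checksum is the one's complement of the folded sum.
    !(sum as u16)
}

fn main() {
    // 20-byte IPv4 header with the checksum field (bytes 10-11) zeroed.
    let header: [u8; 20] = [
        0x45, 0x00, 0x00, 0x30, 0x00, 0x00, 0x40, 0x00,
        0x40, 0x01, 0x00, 0x00, 0x0a, 0x00, 0x00, 0x01,
        0x0a, 0x00, 0x00, 0x02,
    ];
    println!("{:#06x}", rfc1071_checksum(&header)); // prints 0x26cb
}
```

The optimizations described later change how the words are read and when carries are folded. They do not change the result.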

Key Features

Fast: 1.4x to 7.5x faster than internet-checksum (Fuchsia/Google) in the benchmarks below, and roughly 5x on 20-40 byte headers and 1500-byte packets
No-std compatible: Zero dependencies, works in embedded and bare-metal environments
Zero-allocation: No heap allocation; all state lives on the stack
Incremental API: Supports multi-part checksum computation for scattered packet data
Portable C version: C99-compatible implementation with zero dependencies

Installation

cargo add ip4sum

Quick Start

One-shot Computation

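// 20-byte IPv4 header (10.0.0.1 -> 10.0.0.2, ICMP) with the checksum field (bytes 10-11) zeroed
let data: [u8; 20] = [0x45, 0x00, 0x00, 0x30, 0x00, 0x00, 0x40, 0x00,
                      0x40, 0x01, 0x00, 0x00, 0x0a, 0x00, 0x00, 0x01,
                      0x0a, 0x00, 0x00, 0x02];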

let csum = ip4sum::checksum(&data);

Incremental Computation

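use ip4sum::Checksum;

// `data` is the header from the one-shot example above
let mut hasher = Checksum::new();
hasher.update(&data[..10]);   // bytes before the checksum field
hasher.update(&[0x00, 0x00]); // checksum field placeholder
hasher.update(&data[12..]);   // bytes after the checksum field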
let csum = hasher.finalize();

C Version

#include "checksum.h"

/* one-shot */
uint16_t csum = ip4sum_checksum_oneshot(data, len);

/* incremental */
ip4sum_checksum c = ip4sum_checksum_new();
ip4sum_checksum_update(&c, header, 20);
ip4sum_checksum_update(&c, payload, 1480);
uint16_t csum_parts = ip4sum_checksum_finalize(c);

API Reference

`checksum(data: &[u8]) -> u16`

Compute the Internet checksum of a byte slice in one shot. Returns the 16-bit one's-complement checksum in network byte order. The checksum field in the input should be set to zero before calling.
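The field must be zero because the checksum is computed over the header with the field cleared. A receiver can then validate the header by summing it with the checksum in place, which yields 0 for an intact header.

The sketch below shows that round trip. It uses a standalone checksum function rather than ip4sum's `checksum`, so it makes no assumption about how ip4sum's returned `u16` maps to wire bytes. `internet_checksum`, `fill_ipv4_checksum`, and `ipv4_header_is_valid` are hypothetical helpers, and the checksum offset (bytes 10-11) applies to IPv4 headers specifically.

```rust
/// Standalone RFC 1071 checksum (hypothetical helper, not ip4sum's API).
fn internet_checksum(data: &[u8]) -> u16 {
    // Sum big-endian 16-bit words; an odd final byte is zero-padded.
    let mut sum: u64 = data
        .chunks(2)
        .map(|c| u64::from(u16::from_be_bytes([c[0], *c.get(1).unwrap_or(&0)])))
        .sum();
    // Fold carries back into the low 16 bits, then complement.
    while sum > 0xFFFF {
        sum = (sum & 0xFFFF) + (sum >> 16);
    }
    !(sum as u16)
}

/// Sender side: zero the IPv4 checksum field (bytes 10-11), compute the
/// checksum over the whole header, and store it big-endian.
/// Panics if `header` is shorter than 12 bytes.
fn fill_ipv4_checksum(header: &mut [u8]) {
    header[10..12].fill(0);
    let csum = internet_checksum(header);
    header[10..12].copy_from_slice(&csum.to_be_bytes());
}

/// Receiver side: with the checksum left in place, an intact header sums to 0.
fn ipv4_header_is_valid(header: &[u8]) -> bool {
    internet_checksum(header) == 0
}

fn main() {
    // Example header with a stale value (0xdead) in the checksum field.
    let mut header: [u8; 20] = [
        0x45, 0x00, 0x00, 0x30, 0x00, 0x00, 0x40, 0x00,
        0x40, 0x01, 0xde, 0xad, 0x0a, 0x00, 0x00, 0x01,
        0x0a, 0x00, 0x00, 0x02,
    ];
    fill_ipv4_checksum(&mut header);
    assert!(ipv4_header_is_valid(&header));

    header[8] -= 1; // decrement the TTL without updating the checksum
    assert!(!ipv4_header_is_valid(&header));
}
```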

`Checksum`

An incremental checksum calculator.

Method	Description
`Checksum::new()`	Create a new calculator with accumulator initialized to zero
`update(&mut self, data: &[u8])`	Feed a slice of data into the running checksum
`finalize(self) -> u16`	Consume the calculator and return the 16-bit checksum
`reset(&mut self)`	Reset the calculator to its initial state
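The sketch below shows one way a calculator with this new/update/finalize/reset shape can accept chunks of any length. It rests on some assumptions. The README does not say whether ip4sum's `update` accepts odd-length chunks or how it would handle them, so `IncrementalChecksum` is a hypothetical type. It also sums plain 16-bit words for clarity rather than using ip4sum's wide accumulator.

Instead of buffering a leftover byte, the sketch tracks whether the next chunk starts at an odd offset. In one's-complement arithmetic a byte swap equals multiplication by 256 modulo 65535, so a chunk that starts at an odd offset contributes exactly the byte-swapped value of its own sum.

```rust
/// Fold a wide sum down to 16 bits with end-around carry.
fn fold16(mut sum: u64) -> u16 {
    while sum > 0xFFFF {
        sum = (sum & 0xFFFF) + (sum >> 16);
    }
    sum as u16
}

/// Hypothetical incremental calculator (not ip4sum's implementation).
#[derive(Clone, Copy, Debug, Default)]
struct IncrementalChecksum {
    sum: u64,  // running sum of per-chunk 16-bit partial sums
    odd: bool, // true if the bytes fed so far have an odd total length
}

impl IncrementalChecksum {
    fn new() -> Self {
        Self::default()
    }

    fn update(&mut self, data: &[u8]) {
        // Sum this chunk as if it started at an even offset (odd tail byte zero-padded).
        let chunk_sum: u64 = data
            .chunks(2)
            .map(|c| u64::from(u16::from_be_bytes([c[0], *c.get(1).unwrap_or(&0)])))
            .sum();
        let mut part = fold16(chunk_sum);
        // At an odd stream offset every byte sits in the other half of its word,
        // which in one's-complement arithmetic is a byte swap of the partial sum.
        if self.odd {
            part = part.swap_bytes();
        }
        self.sum += u64::from(part);
        if data.len() % 2 == 1 {
            self.odd = !self.odd;
        }
    }

    fn finalize(self) -> u16 {
        !fold16(self.sum)
    }

    fn reset(&mut self) {
        *self = Self::new();
    }
}

fn main() {
    let header: [u8; 20] = [
        0x45, 0x00, 0x00, 0x30, 0x00, 0x00, 0x40, 0x00,
        0x40, 0x01, 0x00, 0x00, 0x0a, 0x00, 0x00, 0x01,
        0x0a, 0x00, 0x00, 0x02,
    ];
    // Feed the header in uneven pieces, including odd-length and empty ones.
    let mut c = IncrementalChecksum::new();
    for piece in [&header[..3], &header[3..3], &header[3..11], &header[11..]] {
        c.update(piece);
    }
    println!("{:#06x}", c.finalize()); // prints 0x26cb, same as a one-shot sum
}
```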

Performance

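Benchmarks were run on Linux x86_64 with Rust 1.94 (-C opt-level=3 -C lto=fat -C codegen-units=1). Ratio is the internet-checksum time divided by the ip4sum time, so higher means ip4sum is faster.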

One-shot Checksum (`checksum`)

Bytes	ip4sum	internet-checksum	Ratio
20	7.2 ns	33.6 ns	4.7x
40	6.8 ns	37.5 ns	5.5x
64	6.0 ns	9.7 ns	1.6x
128	10.3 ns	14.6 ns	1.4x
256	13.3 ns	25.4 ns	1.9x
512	10.6 ns	79.0 ns	7.5x
1000	25.6 ns	84.8 ns	3.3x
1500	24.3 ns	126.2 ns	5.2x

Incremental Checksum (`Checksum` struct)

Bytes	ip4sum	internet-checksum	Ratio
20	4.3 ns	20.0 ns	4.7x
64	6.8 ns	10.1 ns	1.5x
256	6.4 ns	25.7 ns	4.0x
1500	23.9 ns	145.2 ns	6.1x

Multi-feed Incremental (20B header + 1480B payload)

ip4sum	internet-checksum	Ratio
25.3 ns	140.0 ns	5.5x

Rust vs C Comparison

The Rust and C implementations use the same algorithm: a 64-bit wide accumulator with 32-bit reads in native byte order, deferring the carry fold and endian swap to a single final step.
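Here is a minimal sketch of that technique. It assumes nothing about ip4sum's actual source beyond the description above, and the name `wide_checksum` is hypothetical. The steps are:

- Read 4-byte chunks with `u32::from_ne_bytes` and add them to a `u64`.
- Zero-pad any 0-3 byte tail into one more word.
- Fold the sum to 16 bits once.
- On little-endian targets, apply a single byte swap to turn the native-order result into the conventional big-endian value.

The 32-bit reads are valid because 2^16 and 2^32 are both congruent to 1 modulo 65535. A 32-bit word therefore contributes the same one's-complement sum as its two 16-bit halves. Reading little-endian words only yields the byte-swapped sum.

```rust
/// Sketch of a wide-accumulator Internet checksum (hypothetical, not ip4sum's source).
fn wide_checksum(data: &[u8]) -> u16 {
    let mut acc: u64 = 0;
    let mut words = data.chunks_exact(4);
    for w in &mut words {
        // Native-endian 32-bit read: no per-word byte swapping, no carry tracking.
        acc = acc.wrapping_add(u64::from(u32::from_ne_bytes([w[0], w[1], w[2], w[3]])));
    }
    // 0..=3 trailing bytes: zero-pad to a full word (RFC 1071 odd-byte padding).
    let rest = words.remainder();
    if !rest.is_empty() {
        let mut buf = [0u8; 4];
        buf[..rest.len()].copy_from_slice(rest);
        acc = acc.wrapping_add(u64::from(u32::from_ne_bytes(buf)));
    }
    // Deferred step 1: fold carries down to 16 bits (2^32 and 2^16 are 1 mod 65535).
    while acc > 0xFFFF {
        acc = (acc & 0xFFFF) + (acc >> 16);
    }
    let folded = acc as u16;
    // Deferred step 2: on little-endian hosts the native sum is the byte-swapped
    // big-endian sum, so swap once here instead of once per word.
    let be_sum = if cfg!(target_endian = "little") {
        folded.swap_bytes()
    } else {
        folded
    };
    !be_sum
}

fn main() {
    let header: [u8; 20] = [
        0x45, 0x00, 0x00, 0x30, 0x00, 0x00, 0x40, 0x00,
        0x40, 0x01, 0x00, 0x00, 0x0a, 0x00, 0x00, 0x01,
        0x0a, 0x00, 0x00, 0x02,
    ];
    println!("{:#06x}", wide_checksum(&header)); // prints 0x26cb on any target
}
```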

Any performance difference between the two comes from how each compiler optimizes the same logic. rustc (via LLVM) and C compilers such as GCC, Clang (also LLVM-based), and MSVC can make different register-allocation, vectorization, and instruction-scheduling decisions for identical source-level code. In practice, the Rust version built with -C lto=fat -C codegen-units=1 tends to produce tighter inner loops, because whole-program optimization can inline and specialize more aggressively. The C version still benefits from the mature loop optimizers of established C compilers. Note that the C benchmark command below builds with plain -O2 and no LTO.

Run benchmarks on your machine:

# Rust benchmarks
cargo bench

# C benchmarks
cd c && gcc -O2 checksum.c test_checksum.c -o test_checksum && ./test_checksum

Why It's Fast

The key insight is simplicity. The accumulator is a 64-bit integer, and we add 32-bit words to it using plain wrapping_add with no carry tracking. Each addend is at most 2^32 - 1, so a u64 accumulator can absorb 2^32 + 1 worst-case additions (about 16 GiB of input) without overflowing. No realistic packet comes close, since an IPv4 packet is at most 65,535 bytes. Carry folding is deferred to a single cheap step at the end.
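The overflow bound can be checked with a few lines of integer arithmetic. In the worst case every addend is `u32::MAX`. Because 2^64 - 1 = (2^32 - 1)(2^32 + 1), exactly 2^32 + 1 such additions fit in a `u64`. At 4 bytes per addition that is 17,179,869,188 bytes, just over 16 GiB. The constant names below are illustrative.

```rust
/// Worst-case number of u32::MAX additions that fit in a u64 without overflow.
const MAX_WORST_CASE_ADDS: u64 = u64::MAX / (u32::MAX as u64);
/// Input size consumed by that many 4-byte reads.
const MAX_SAFE_BYTES: u64 = MAX_WORST_CASE_ADDS * 4;

fn main() {
    // 2^64 - 1 = (2^32 - 1)(2^32 + 1), so the division above is exact.
    assert_eq!(MAX_WORST_CASE_ADDS, (1u64 << 32) + 1);
    assert_eq!(MAX_WORST_CASE_ADDS * (u32::MAX as u64), u64::MAX);
    // Once the accumulator is full, one more worst-case addition would overflow.
    assert!(u64::MAX.checked_add(u32::MAX as u64).is_none());
    // The largest IPv4 packet (65,535 bytes) is far below the limit.
    assert!(65_535 < MAX_SAFE_BYTES);
    println!("{} bytes (about {} GiB)", MAX_SAFE_BYTES, MAX_SAFE_BYTES >> 30);
    // prints "17179869188 bytes (about 16 GiB)"
}
```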

In contrast, internet-checksum (Fuchsia/Google) introduces per-addition overhead:

Manual carry tracking via overflowing_add + boolean carry propagation
Option<u8> trailing byte field with branching on every add_bytes call
Size-based dispatch to a separate add_bytes_small path using checked_add
Macro-based loop unrolling with try_into().unwrap() in the expansion
Multi-function normalize chain (normalize -> normalize_64 -> adc_u32 -> adc_u16)

None of these "optimizations" helps for inputs under about 16 GiB, where a u64 accumulator cannot overflow. The compiler generates tighter code from a simple wrapping_add loop than from manual carry management.
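The difference in style can be illustrated with two ways of summing 32-bit words.

- The first propagates the end-around carry on every addition with `overflowing_add`, in the spirit of the carry tracking listed above. It is a simplified illustration, not internet-checksum's actual code.
- The second is the deferred approach: plain additions into a `u64`, folded once at the end.

Both reduce to the same 16-bit one's-complement sum, because 65535 divides 2^32 - 1. The per-iteration carry dependency is what the deferred version removes from the loop. Which form compiles to faster code depends on the compiler and target, so the benchmarks above remain the evidence. The function names are hypothetical.

```rust
/// Per-addition end-around carry on a 32-bit accumulator (illustrative only).
fn sum_with_carry_tracking(words: &[u32]) -> u32 {
    let mut acc: u32 = 0;
    for &w in words {
        let (s, carry) = acc.overflowing_add(w);
        // Fold the carry back in immediately; s <= 2^32 - 2 when carry is set,
        // so this addition cannot overflow.
        acc = s + carry as u32;
    }
    acc
}

/// Deferred carries: plain additions into a u64, folded once by the caller.
fn sum_deferred(words: &[u32]) -> u64 {
    let mut acc: u64 = 0;
    for &w in words {
        acc = acc.wrapping_add(u64::from(w));
    }
    acc
}

/// Fold a wide sum down to 16 bits with end-around carry.
fn fold16(mut x: u64) -> u16 {
    while x > 0xFFFF {
        x = (x & 0xFFFF) + (x >> 16);
    }
    x as u16
}

fn main() {
    // Pseudo-random 32-bit words (multiplicative hash of the index).
    let words: Vec<u32> = (0u32..10_000).map(|i| i.wrapping_mul(2_654_435_761)).collect();
    let tracked = fold16(u64::from(sum_with_carry_tracking(&words)));
    let deferred = fold16(sum_deferred(&words));
    assert_eq!(tracked, deferred);
    println!("both give {:#06x}", tracked);
}
```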

Testing

# Rust tests
cargo test

# C tests
cd c && gcc -Wall -Wextra -Werror -pedantic -std=c99 -O2 checksum.c test_checksum.c -o test_checksum && ./test_checksum

Contributing

Contributions are welcome! Please:

Run cargo +nightly fmt and cargo clippy before submitting
Add tests for new functionality
Update documentation as needed

License

Licensed under the MIT License.

Author: Khashayar Fereidani
Repository: github.com/fereidani/ip4sum

ip4sum 0.1.0