ip4sum 0.1.0

Highly optimized IPv4 checksum calculation, no-std compatible

ip4sum -- Optimized IPv4 Internet Checksum


ip4sum is a highly optimized implementation of the Internet checksum defined in RFC 1071, with the incremental-update rules of RFC 1141 and RFC 1624. The checksum is used by IPv4, TCP, UDP, and ICMP.
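For reference, the RFC 1071 computation can be written directly. The block below is a minimal, unoptimized sketch for illustration only, not ip4sum's implementation, and `reference_checksum` is a hypothetical name. It does four things in order:

- sums the data as big-endian 16-bit words;
- treats an odd trailing byte as a zero-padded word;
- folds carries back into the low 16 bits;
- returns the one's complement.

```rust
/// Minimal RFC 1071 Internet checksum (illustrative sketch, not ip4sum's code).
/// `reference_checksum` is a hypothetical helper name.
fn reference_checksum(data: &[u8]) -> u16 {
    // A u64 accumulator leaves ample headroom for 16-bit additions.
    let mut sum: u64 = 0;
    let mut pairs = data.chunks_exact(2);
    for pair in &mut pairs {
        // Each 16-bit word is read in network (big-endian) order.
        sum += u64::from(u16::from_be_bytes([pair[0], pair[1]]));
    }
    // An odd trailing byte is the high byte of a zero-padded word.
    if let [last] = pairs.remainder() {
        sum += u64::from(u16::from_be_bytes([*last, 0]));
    }
    // End-around carry: fold bits above 16 back into the low 16 bits.
    while sum > 0xFFFF {
        sum = (sum & 0xFFFF) + (sum >> 16);
    }
    // The checksum is the one's complement of the folded sum.
    !(sum as u16)
}

fn main() {
    // The example IPv4 header from Quick Start, checksum field zeroed.
    let header: [u8; 20] = [
        0x45, 0x00, 0x00, 0x30, 0x00, 0x00, 0x40, 0x00, 0x40, 0x01,
        0x00, 0x00, 0x0a, 0x00, 0x00, 0x01, 0x0a, 0x00, 0x00, 0x02,
    ];
    println!("{:#06x}", reference_checksum(&header)); // prints 0x26cb
}
```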

A portable C99 implementation is also provided in the c/ directory.

Key Features

  • Fast: Up to 5x faster than internet-checksum (Fuchsia/Google) on typical packet sizes
  • No-std compatible: Zero dependencies, works in embedded and bare-metal environments
  • Zero-allocation: No heap allocations; all computation state lives on the stack
  • Incremental API: Supports multi-part checksum computation for scattered packet data
  • Portable C version: C99-compatible implementation with zero dependencies

Installation

cargo add ip4sum

Quick Start

One-shot Computation

// 20-byte IPv4 header; bytes 10..12 (the checksum field) are zeroed
let data: [u8; 20] = [0x45, 0x00, 0x00, 0x30, 0x00, 0x00, 0x40, 0x00,
                      0x40, 0x01, 0x00, 0x00, 0x0a, 0x00, 0x00, 0x01,
                      0x0a, 0x00, 0x00, 0x02];

let csum = ip4sum::checksum(&data);
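Here is a worked check of the RFC 1071 arithmetic for this header. Summing its ten big-endian 16-bit words gives the result below (all values hexadecimal; the zeroed checksum field contributes nothing):

```latex
\begin{aligned}
S &= \mathtt{4500} + \mathtt{0030} + \mathtt{0000} + \mathtt{4000} + \mathtt{4001} \\
  &\quad + \mathtt{0000} + \mathtt{0A00} + \mathtt{0001} + \mathtt{0A00} + \mathtt{0002} = \mathtt{D934} \\
\text{checksum} &= \lnot S = \mathtt{FFFF} - \mathtt{D934} = \mathtt{26CB}
\end{aligned}
```

S stays below 10000 (hex), so no carry needs folding. On the wire, the checksum field would hold the bytes 0x26, 0xCB.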

Incremental Computation

use ip4sum::Checksum;

// `data` is the 20-byte header from the one-shot example above
let mut hasher = Checksum::new();
hasher.update(&data[..10]);
hasher.update(&[0x00, 0x00]); // checksum field placeholder
hasher.update(&data[12..]);
let csum = hasher.finalize();

C Version

#include "checksum.h"

/* one-shot */
uint16_t csum = ip4sum_checksum_oneshot(data, len);

/* incremental */
ip4sum_checksum c = ip4sum_checksum_new();
ip4sum_checksum_update(&c, header, 20);
ip4sum_checksum_update(&c, payload, 1480);
uint16_t csum_incr = ip4sum_checksum_finalize(c);

API Reference

checksum(data: &[u8]) -> u16

Compute the Internet checksum of a byte slice in one shot. Returns the 16-bit one's-complement checksum in network byte order. The checksum field in the input should be set to zero before calling.
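The zero-then-fill workflow can be sketched without depending on ip4sum's byte-order convention. So can the matching receiver-side check: recomputing over a header that already holds a valid checksum yields zero.

In this sketch, `rfc1071`, `fill_ipv4_checksum`, and `ipv4_header_valid` are hypothetical helpers. The result is written with `to_be_bytes` because this `rfc1071` returns the checksum as a plain numeric value.

```rust
/// Offset of the 16-bit header checksum field in an IPv4 header.
const CSUM_OFFSET: usize = 10;

/// Hypothetical helper: plain RFC 1071 checksum (big-endian words, folded, complemented).
fn rfc1071(data: &[u8]) -> u16 {
    let mut sum: u64 = 0;
    for (i, &b) in data.iter().enumerate() {
        // Even offsets are high bytes, odd offsets are low bytes of 16-bit words.
        sum += if i % 2 == 0 { u64::from(b) << 8 } else { u64::from(b) };
    }
    while sum > 0xFFFF {
        sum = (sum & 0xFFFF) + (sum >> 16);
    }
    !(sum as u16)
}

/// Sender side: zero the checksum field, compute, and store it in network order.
fn fill_ipv4_checksum(header: &mut [u8]) {
    header[CSUM_OFFSET..CSUM_OFFSET + 2].fill(0);
    let csum = rfc1071(header);
    header[CSUM_OFFSET..CSUM_OFFSET + 2].copy_from_slice(&csum.to_be_bytes());
}

/// Receiver side: a header with a correct checksum field sums to 0xFFFF,
/// so recomputing the checksum over it yields zero.
fn ipv4_header_valid(header: &[u8]) -> bool {
    rfc1071(header) == 0
}

fn main() {
    let mut header: [u8; 20] = [
        0x45, 0x00, 0x00, 0x30, 0x00, 0x00, 0x40, 0x00, 0x40, 0x01,
        0x00, 0x00, 0x0a, 0x00, 0x00, 0x01, 0x0a, 0x00, 0x00, 0x02,
    ];
    fill_ipv4_checksum(&mut header);
    let field = u16::from_be_bytes([header[CSUM_OFFSET], header[CSUM_OFFSET + 1]]);
    println!("checksum field: {:#06x}", field); // prints checksum field: 0x26cb
    println!("valid: {}", ipv4_header_valid(&header)); // prints valid: true
}
```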

Checksum

An incremental checksum calculator.

Method                            Description
Checksum::new()                   Create a new calculator with the accumulator initialized to zero
update(&mut self, data: &[u8])    Feed a slice of data into the running checksum
finalize(self) -> u16             Consume the calculator and return the 16-bit checksum
reset(&mut self)                  Reset the calculator to its initial state
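The sketch below shows how an incremental calculator with this shape can accept chunks of any length. It mirrors the method names in the table but is not ip4sum's implementation.

- `SketchChecksum` is a hypothetical type.
- It sums big-endian 16-bit words instead of using a wide native-endian accumulator.
- It carries a pending byte across `update` calls, so splitting the input at an odd offset still gives the same result as a one-shot computation.

```rust
/// Hypothetical incremental calculator mirroring the new/update/finalize/reset
/// shape in the table above (illustrative sketch, not ip4sum's implementation).
#[derive(Debug, Default)]
struct SketchChecksum {
    /// Running sum of big-endian 16-bit words; carries are folded in `finalize`.
    sum: u64,
    /// High byte of a 16-bit word split across two `update` calls.
    pending: Option<u8>,
}

impl SketchChecksum {
    fn new() -> Self {
        Self::default()
    }

    fn update(&mut self, mut data: &[u8]) {
        // Complete a word left half-finished by the previous call.
        if let Some(hi) = self.pending.take() {
            match data.split_first() {
                Some((&lo, rest)) => {
                    self.sum += u64::from(u16::from_be_bytes([hi, lo]));
                    data = rest;
                }
                None => {
                    // Empty input: keep waiting for the low byte.
                    self.pending = Some(hi);
                    return;
                }
            }
        }
        let mut pairs = data.chunks_exact(2);
        for pair in &mut pairs {
            self.sum += u64::from(u16::from_be_bytes([pair[0], pair[1]]));
        }
        // Remember an odd trailing byte for the next call (or for finalize).
        if let [last] = pairs.remainder() {
            self.pending = Some(*last);
        }
    }

    fn finalize(self) -> u16 {
        let mut sum = self.sum;
        // A byte still pending at the end is zero-padded, as RFC 1071 specifies.
        if let Some(hi) = self.pending {
            sum += u64::from(u16::from_be_bytes([hi, 0]));
        }
        while sum > 0xFFFF {
            sum = (sum & 0xFFFF) + (sum >> 16);
        }
        !(sum as u16)
    }

    fn reset(&mut self) {
        *self = Self::new();
    }
}

fn main() {
    let header: [u8; 20] = [
        0x45, 0x00, 0x00, 0x30, 0x00, 0x00, 0x40, 0x00, 0x40, 0x01,
        0x00, 0x00, 0x0a, 0x00, 0x00, 0x01, 0x0a, 0x00, 0x00, 0x02,
    ];
    let mut hasher = SketchChecksum::new();
    hasher.update(&header[..10]);
    hasher.update(&[0x00, 0x00]); // checksum field placeholder
    hasher.update(&header[12..]);
    println!("{:#06x}", hasher.finalize()); // prints 0x26cb
}
```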

Performance

Benchmarks were run on Linux x86_64 with Rust 1.94 (-C opt-level=3 -C lto=fat -C codegen-units=1).

One-shot Checksum (checksum)

Bytes   ip4sum    internet-checksum   Speedup
20      7.2 ns    33.6 ns             4.7x
40      6.8 ns    37.5 ns             5.5x
64      6.0 ns    9.7 ns              1.6x
128     10.3 ns   14.6 ns             1.4x
256     13.3 ns   25.4 ns             1.9x
512     10.6 ns   79.0 ns             7.5x
1000    25.6 ns   84.8 ns             3.3x
1500    24.3 ns   126.2 ns            5.2x

Incremental Checksum (Checksum struct)

Bytes   ip4sum    internet-checksum   Speedup
20      4.3 ns    20.0 ns             4.7x
64      6.8 ns    10.1 ns             1.5x
256     6.4 ns    25.7 ns             4.0x
1500    23.9 ns   145.2 ns            6.1x

Multi-feed Incremental (20B header + 1480B payload)

ip4sum    internet-checksum   Speedup
25.3 ns   140.0 ns            5.5x

Rust vs C Comparison

The Rust and C implementations use the same algorithm: a 64-bit wide accumulator with 32-bit reads in native byte order, deferring the carry fold and endian swap to a single final step.
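The pattern described above can be sketched as follows. This is a minimal illustration, not ip4sum's source, and `wide_checksum` is a hypothetical name.

The final reinterpretation relies on the byte-order independence of the one's-complement sum noted in RFC 1071. Because of that property, every word can be read in native order with a single fix-up at the end.

```rust
/// Sketch of a wide-accumulator Internet checksum (not ip4sum's actual code):
/// 32-bit native-endian reads into a u64, with the carry fold and the
/// byte-order fix-up deferred to one final step.
fn wide_checksum(data: &[u8]) -> u16 {
    let mut acc: u64 = 0;
    let mut words = data.chunks_exact(4);
    for w in &mut words {
        // No carry tracking: a u64 cannot wrap before ~2^32 such additions.
        acc = acc.wrapping_add(u64::from(u32::from_ne_bytes([w[0], w[1], w[2], w[3]])));
    }
    // Tail of 0..=3 bytes: one more 16-bit word and/or an odd final byte.
    let mut tail = words.remainder();
    if tail.len() >= 2 {
        acc = acc.wrapping_add(u64::from(u16::from_ne_bytes([tail[0], tail[1]])));
        tail = &tail[2..];
    }
    if let [last] = tail {
        // Odd trailing byte, zero-padded, still in native order.
        acc = acc.wrapping_add(u64::from(u16::from_ne_bytes([*last, 0])));
    }
    // Deferred carry fold: 64 -> 32 bits, then down to 16 bits.
    acc = (acc & 0xFFFF_FFFF) + (acc >> 32);
    while acc > 0xFFFF {
        acc = (acc & 0xFFFF) + (acc >> 16);
    }
    // The folded value's native-endian bytes are the network-order sum,
    // so one reinterpretation replaces a byte swap on every word.
    let sum = u16::from_be_bytes((acc as u16).to_ne_bytes());
    !sum
}

fn main() {
    let header: [u8; 20] = [
        0x45, 0x00, 0x00, 0x30, 0x00, 0x00, 0x40, 0x00, 0x40, 0x01,
        0x00, 0x00, 0x0a, 0x00, 0x00, 0x01, 0x0a, 0x00, 0x00, 0x02,
    ];
    println!("{:#06x}", wide_checksum(&header)); // prints 0x26cb
}
```

The fold and the byte-order fix-up run once per call rather than once per word. As a result, the inner loop of this sketch is just one load and one add per 4 bytes.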

The performance difference between the two comes down to how each compiler optimizes the same logical pattern. rustc (through LLVM) and C compilers such as GCC, Clang, and MSVC make different register-allocation, loop-vectorization, and instruction-scheduling decisions for identical source-level logic.

In practice, the Rust version built with -C lto=fat -C codegen-units=1 tends to produce tighter inner loops, because whole-program optimization lets the compiler inline and specialize more aggressively. The C version still benefits from the mature loop optimizations of established C compilers.

Run benchmarks on your machine:

# Rust benchmarks
cargo bench

# C benchmarks
cd c && gcc -O2 checksum.c test_checksum.c -o test_checksum && ./test_checksum

Why It's Fast

The key insight is simplicity. The accumulator is a 64-bit integer, and 32-bit words are added to it with a plain wrapping_add and no carry tracking.

A u64 accumulator cannot overflow until roughly 4 billion 32-bit words (~16 GB of data) have been added, far beyond any realistic packet. Carry folding is therefore deferred to a single cheap step at the end.
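The ~16 GB figure follows from worst-case arithmetic. Every 32-bit word is at most 2^32 − 1, so a u64 accumulator starting at zero can absorb N_max maximal words before it can wrap:

```latex
N_{\max} = \left\lfloor \frac{2^{64}-1}{2^{32}-1} \right\rfloor = 2^{32}+1
\quad\text{words}, \qquad
N_{\max} \times 4\ \text{bytes} \approx 2^{34}\ \text{bytes} = 16\ \text{GiB}
```

The division is exact because 2^64 − 1 = (2^32 − 1)(2^32 + 1).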

In contrast, internet-checksum (Fuchsia/Google) introduces per-addition overhead:

  • Manual carry tracking via overflowing_add + boolean carry propagation
  • Option<u8> trailing byte field with branching on every add_bytes call
  • Size-based dispatch to a separate add_bytes_small path using checked_add
  • Macro-based loop unrolling with try_into().unwrap() in the expansion
  • Multi-function normalize chain (normalize -> normalize_64 -> adc_u32 -> adc_u16)

None of these "optimizations" help for any input under 16 GB. The compiler generates tighter code from a simple wrapping_add loop than from manual carry management.
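The equivalence behind this claim can be checked directly. Folding the carry back in after every addition gives the same 16-bit one's-complement sum as folding once at the end.

The sketch is illustrative only. `eager_sum` and `deferred_sum` are hypothetical names, and `eager_sum` shows the general per-addition end-around-carry technique, not internet-checksum's actual code. It demonstrates equal results, not relative speed.

```rust
/// Eager style: end-around carry after every 16-bit addition, via `overflowing_add`.
/// Illustrative only (hypothetical helper, not internet-checksum's code).
fn eager_sum(data: &[u8]) -> u16 {
    let mut sum: u16 = 0;
    for pair in data.chunks(2) {
        // Big-endian 16-bit word; an odd trailing byte is zero-padded.
        let word = u16::from_be_bytes([pair[0], pair.get(1).copied().unwrap_or(0)]);
        let (partial, carry) = sum.overflowing_add(word);
        // Adding the carry back cannot overflow: with a carry, partial <= 0xFFFE.
        sum = partial + u16::from(carry);
    }
    sum
}

/// Deferred style: plain additions into a u64, one fold at the end.
fn deferred_sum(data: &[u8]) -> u16 {
    let mut acc: u64 = 0;
    for pair in data.chunks(2) {
        let word = u16::from_be_bytes([pair[0], pair.get(1).copied().unwrap_or(0)]);
        acc = acc.wrapping_add(u64::from(word));
    }
    while acc > 0xFFFF {
        acc = (acc & 0xFFFF) + (acc >> 16);
    }
    acc as u16
}

fn main() {
    let data: Vec<u8> = (0u32..1500).map(|i| (i * 7 + 3) as u8).collect();
    assert_eq!(eager_sum(&data), deferred_sum(&data));
    println!("eager = {:#06x}, deferred = {:#06x}", eager_sum(&data), deferred_sum(&data));
}
```

Both functions agree on every input. End-around-carry addition is addition modulo 2^16 − 1 and is associative, so the point at which carries are folded does not change the result; RFC 1071 describes this as deferring carries.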

Testing

# Rust tests
cargo test

# C tests
cd c && gcc -Wall -Wextra -Werror -pedantic -std=c99 -O2 checksum.c test_checksum.c -o test_checksum && ./test_checksum

Contributing

Contributions are welcome! Please:

  1. Run cargo +nightly fmt and cargo clippy before submitting
  2. Add tests for new functionality
  3. Update documentation as needed

License

Licensed under the MIT License.


Author: Khashayar Fereidani
Repository: github.com/fereidani/ip4sum