cnfy-uint

Pure Rust 256-bit modular arithmetic. Zero dependencies. Zero unsafe code.

For cryptographic applications. No heap allocations.

Part of the cnfy suite -- cryptographic primitives built from scratch to eliminate supply chain attacks.

Performance

Full benchmarks and methodology: cnfy-bench. Security analysis: SECURITY.md.

We benchmark against crypto-bigint, ruint, ethnum, num-bigint, and uint -- all excellent libraries that we've learned from.

Why It's Fast

Sparse prime optimization. secp256k1's field prime P = 2^256 - 0x1000003D1 has a tiny complement. Reduction folds excess bits instead of running full division -- every modular operation benefits.

Cross-crate inlining. All 47 hot-path methods are #[inline]. cnfy-secp256k1 gets fully inlined arithmetic without LTO.

Algorithm selection:

Modular inverse: Lehmer's extended GCD with two-phase inner loop (u64/u128)
Division: Knuth's Algorithm D with MG10 reciprocal-based quotient estimation
Batch inverse: Montgomery's trick (one inverse + 3N multiplications for N elements)

Quick Start

use cnfy_uint::u256::U256;

// secp256k1 field prime
let p = U256::from_be_limbs([
    0xFFFFFFFF_FFFFFFFF, 0xFFFFFFFF_FFFFFFFF,
    0xFFFFFFFF_FFFFFFFF, 0xFFFFFFFE_FFFFFC2F,
]);

let a = U256::from_be_limbs([0, 0, 0, 42]);
let b = U256::from_be_limbs([0, 0, 0, 17]);

let sum = a.add_mod(&b, &p);        // (42 + 17) mod P
let product = a.mul_mod(&b, &p);    // (42 * 17) mod P
let inverse = a.mod_inv(&p);        // 42^-1 mod P

Types

Type	Limbs	Bits	Description
`U256`	4	256	Primary type -- modular arithmetic, crypto operations
`U320`	5	320	Intermediate: U256 * u64 product
`U384`	6	384	Intermediate: U256 * u128 product
`U512`	8	512	Widening multiply output, U256 * U256 product
`U256Batch<N>`	--	--	Batch modular inverse via Montgomery's trick

All types use [u64; N] in little-endian order: [w0_lsb, ..., wN_msb].

Internal Algorithm Structs

Type	Description
`GcdState`	Lehmer's extended GCD state machine
`KnuthD`	Knuth's Algorithm D long division
`LehmerStep`	GCD inner loop (u64/u128 phases)

API Reference

Modular Arithmetic

U256-only. These are the core cryptographic operations.

Method	Description	CT feature
`add_mod(&b, &m)`	`(a + b) mod m`	`ct-field`
`sub_mod(&b, &m)`	`(a - b) mod m`	`ct-field`
`neg_mod(&m)`	`(-a) mod m` -- modular negation	`ct-field`
`mul_mod(&b, &m)`	`(a * b) mod m` -- auto-selects sparse/generic	`ct-field`
`square_mod(&m)`	`a^2 mod m` -- auto-selects sparse/generic	`ct-field`
`pow_mod(&exp, &m)`	`a^exp mod m` -- square-and-multiply	`ct-pow`
`mod_inv(&m)`	`a^-1 mod m` -- Lehmer GCD or SafeGCD (`ct-inv`)	`ct-inv`
`mod_inv_fermat(&m)`	`a^(m-2) mod m` -- Fermat's little theorem	--
`div_mod(&b, &m)`	`a * b^-1 mod m`	--
`sqrt_mod(&p)`	Tonelli-Shanks modular square root	--
`modulo(&m)`	`a mod m` -- reduction to `[0, m)`	`ct-field`

Wrapping Arithmetic

Method	Description	U256	U320	U384	U512
`overflowing_add(&b)`	Add with carry flag	x	x	x	x
`overflowing_sub(&b)`	Subtract with borrow flag	x	x	x	x
`checked_add(&b)`	Add, `None` on overflow	x	x	x	x
`checked_sub(&b)`	Subtract, `None` on underflow	x	x	x	x
`negate()`	Two's complement negation	x	x	x	x

Multiplication

Method	Description	U256
`widening_mul(&b) -> U512`	Full 512-bit product	x
`square() -> U512`	Full 512-bit square	x
`mul_u64(b) -> U320`	U256 * u64 -> 320-bit	x
`mul_u128(b) -> U384`	U256 * u128 -> 384-bit	x

Division

Method	Description	U256	U320	U384	U512
`div_rem(&b)`	Quotient and remainder	x
`div_u64(b)`	Divide by u64		x	x	x
`div_u256(b)`	Divide by U256 (Knuth D)				x

Reduction (U512 only)

Method	Description	CT feature
`sparse_reduce(&compl)`	Fold using sparse prime structure	`ct-field`

Bit Operations

Method	Description	U256	U320	U384	U512
`bit(i)`	Test bit at position `i`	x
`shl(n)` / `shr(n)`	Shift left/right	x	x	x	x
`leading_zeros()`	Count leading zero bits	x	x	x	x
`trailing_zeros()`	Count trailing zero bits	x	x	x	x
`bit_len()`	Number of significant bits	x	x	x	x
`sig_limbs()`	Number of significant limbs	x	x	x	x
`is_zero()`	Test for zero	x	x	x	x
`is_even()` / `is_odd()`	Parity test	x	x	x	x
`is_power_of_two()`	Test if exactly one bit is set	x
`count_ones()`	Population count (Hamming weight)	x

Morton / Z-Order

Bit dilation, undilation, interleaving, and deinterleaving for spatial hashing and Z-order curves.

Method	Description	Level
`U256::from_u128_dilated_even(v)`	Dilate u128 → even positions (0,2,4,...,254)	u128→U256
`U256::from_u128_dilated_odd(v)`	Dilate u128 → odd positions (1,3,5,...,255)	u128→U256
`U256::interleave(x, y)`	Interleave two u128 → U256 Morton code	u128→U256
`.undilate_even()`	Extract even positions → u128	U256→u128
`.undilate_odd()`	Extract odd positions → u128	U256→u128
`.deinterleave()`	Split into (u128, u128)	U256→u128
`.dilate_even()`	Dilate U256 → even positions of U512	U256→U512
`.dilate_odd()`	Dilate U256 → odd positions of U512	U256→U512
`U512::interleave(&x, &y)`	Interleave two U256 → U512 Morton code	U256→U512
`.undilate_even()`	Extract even positions → U256	U512→U256
`.undilate_odd()`	Extract odd positions → U256	U512→U256
`.deinterleave()`	Split into (U256, U256)	U512→U256

Constant-Time Primitives

U256-only.

Method	Description
`ct_select(&other, flag)`	Branchless conditional select -- returns `self` when `flag` is true
`has_u64_complement(&m)`	Test if `2^256 - m` fits in u64 (sparse prime detection)

Conversion

Method	Description	U256	U320	U384	U512
`from_be_limbs(arr)`	From big-endian limbs	x	x	x	x
`from_le_limbs(arr)`	From little-endian limbs	x	x	x	x
`from_be_bytes(arr)`	From big-endian bytes	x	x	x	x
`from_le_bytes(arr)`	From little-endian bytes	x	x	x	x
`to_be_limbs()`	To big-endian limbs	x	x	x	x
`to_le_limbs()`	To little-endian limbs	x	x	x	x
`from_be_bits(arr)`	From big-endian `[bool; 256]` (MSB first)	x
`from_le_bits(arr)`	From little-endian `[bool; 256]` (LSB first)	x
`to_be_bits()`	To big-endian `[bool; N]` (MSB first)	x
`to_le_bits()`	To little-endian `[bool; N]` (LSB first)	x
`to_be_bytes()`	To big-endian bytes	x	x	x	x
`to_le_bytes()`	To little-endian bytes	x	x	x	x
`Into<[u8; N]>`	To big-endian bytes (trait)	x	x	x	x
`from(u64)`	From u64	x	x	x	x
`from(u128)`	From u128	x	x	x	x
`from(U256)`	Widen from U256		x	x	x
`from(U320)`	Widen from U320			x	x
`from(U384)`	Widen from U384				x
`low_u256()`	Lower 256 bits		x	x	x
`high_u256()`	Upper 256 bits				x
`low_u128()` / `high_u128()`	Lower/upper 128 bits	x
`non_zero()`	`Option<Self>` for early returns	x	x	x	x
`const_eq(&other)`	Const-compatible equality	x	x	x	x

Standard Traits

Trait	U256	U320	U384	U512
`Add` / `Sub`	x	x	x	x
`Div` / `Rem`	x
`AddAssign`	x
`BitAnd` / `BitOr` / `BitXor` / `Not`	x	x	x	x
`Shl` / `Shr`	x	x	x	x
`Display`	x	x	x	x
`LowerHex` / `UpperHex`	x	x	x	x
`FromStr`	x	x	x	x
`Default`	x	x	x	x
`Ord` / `PartialOrd`	x	x	x	x
`Hash`	x

U256Batch<N>

Const-generic batch container for N values of [U256]. Wraps a [U256; N] array with batch operations that amortize expensive computations across all elements.

The primary use case is Montgomery batch inversion: computing N modular inverses using a single mod_inv call plus 3(N-1) modular multiplications. At large N, this reduces per-element inversion cost from ~802 ns to ~46 ns (17x speedup). Used by cnfy-secp256k1 for batch Jacobian-to-affine conversion and batch public key generation.

Method	Description
`U256Batch(arr)`	Construct from a `[U256; N]` array
`zeroed()`	Create a batch with all elements set to `U256::ZERO`
`mod_inv(&m)`	Montgomery batch inversion -- returns `Option<[U256; N]>` (`None` if any element is zero)
`mod_inv_vt(&m)`	Variable-time batch inversion (always Lehmer GCD, ignores `ct-inv`)
`.0[i]`	Direct element access (public inner array)

Constant-Time Feature Flags

cnfy-uint provides granular compile-time feature flags that switch methods to constant-time implementations:

Feature	What it covers
`ct-field`	`add_mod`, `sub_mod`, `mul_mod_sparse`, `square_mod_sparse`, `sparse_reduce`, `modulo`
`ct-inv`	`mod_inv` (SafeGCD / Bernstein-Yang divsteps); implies `ct-field`
`ct-pow`	`pow_mod` (always 256 iterations); implies `ct-field`
`ct`	Umbrella enabling all of the above

Constant-time overhead: 30-49% on field ops, 10.8x on mod_inv, 8% on pow_mod. See SECURITY.md for details.

What This Powers

cnfy-secp256k1 -- scalar multiplication 2.3x faster than the C FFI, public key addition 3.7x faster.

Testing

cargo test -p cnfy-uint            # All tests
cargo test -p cnfy-uint u256       # U256 module only
cargo test -p cnfy-uint mul_mod    # Single test by name

100% test coverage. Validated against known test vectors.

Ethics

See ETHICS.md for the ethical principles that guide this project.

License

Licensed under the Shariah-Informed Open License (SIOL-1.0).

cnfy-uint 0.2.4