cnfy-uint 0.2.2

Zero-dependency 256-bit unsigned integer arithmetic for cryptographic applications
Documentation
# cnfy-uint

**Pure Rust 256-bit modular arithmetic. Zero dependencies. Zero unsafe code.**

For cryptographic applications. No heap allocations.

Part of the [cnfy](https://www.cnfy.org) suite -- cryptographic primitives built from scratch to eliminate supply chain attacks.

## Performance

<pre>
Modular Multiplication (a * b mod P)
<b>cnfy-uint:                        11.89 ns</b>  ██
<b>cnfy-uint (--features ct):        15.42 ns</b>  ███
ruint:                            73.50 ns  █████████████
num-bigint:                      160.58 ns  █████████████████████████████

Modular Squaring (a^2 mod P)
<b>cnfy-uint:                         8.21 ns</b>  ██
<b>cnfy-uint (--features ct):        12.25 ns</b>  ███
ruint:                            70.62 ns  ████████████████
num-bigint:                      163.83 ns  █████████████████████████████████████████

Modular Addition (a + b mod P)
<b>cnfy-uint:                         1.93 ns</b>  ██
crypto-bigint:                     3.50 ns  ████
<b>cnfy-uint (--features ct):         3.75 ns</b>  ████
ruint:                             4.57 ns  █████
num-bigint:                       28.66 ns  █████████████████████████████████████

U512 Reduction (sparse_reduce)
<b>cnfy-uint:                         3.66 ns</b><b>cnfy-uint (--features ct):         7.39 ns</b>  ██
ruint:                            54.70 ns  ████████████████
crypto-bigint:                    2,128 ns  ████████████████████████████████████████████████████████████████

Modular Inverse (a^-1 mod P)
ruint:                              734 ns  ███
<b>cnfy-uint:                          802 ns</b>  ████
crypto-bigint:                    4,534 ns  █████████████████████████
<b>cnfy-uint (--features ct):        8,691 ns</b>  ███████████████████████████████████████████████

Batch Modular Inverse (Montgomery's trick)
Single:   802 ns/element
Batch(N): converges to 46 ns/element  (17.4x faster)
</pre>

Full benchmarks and methodology: [cnfy-bench](../cnfy-bench). Security analysis: [SECURITY.md](../SECURITY.md).

We benchmark against [crypto-bigint](https://crates.io/crates/crypto-bigint), [ruint](https://crates.io/crates/ruint), [ethnum](https://crates.io/crates/ethnum), [num-bigint](https://crates.io/crates/num-bigint), and [uint](https://crates.io/crates/uint) -- all excellent libraries that we've learned from.

## Why It's Fast

**Sparse prime optimization.** secp256k1's field prime `P = 2^256 - 0x1000003D1` has a tiny complement. Reduction folds excess bits instead of running full division -- every modular operation benefits.

**Cross-crate inlining.** All 47 hot-path methods are `#[inline]`. cnfy-secp256k1 gets fully inlined arithmetic without LTO.

**Algorithm selection:**
- Modular inverse: Lehmer's extended GCD with two-phase inner loop (u64/u128)
- Division: Knuth's Algorithm D with MG10 reciprocal-based quotient estimation
- Batch inverse: Montgomery's trick (one inverse + 3N multiplications for N elements)

## Quick Start

```rust
use cnfy_uint::u256::U256;

// secp256k1 field prime
let p = U256::from_be_limbs([
    0xFFFFFFFF_FFFFFFFF, 0xFFFFFFFF_FFFFFFFF,
    0xFFFFFFFF_FFFFFFFF, 0xFFFFFFFE_FFFFFC2F,
]);

let a = U256::from_be_limbs([0, 0, 0, 42]);
let b = U256::from_be_limbs([0, 0, 0, 17]);

let sum = a.add_mod(&b, &p);        // (42 + 17) mod P
let product = a.mul_mod(&b, &p);    // (42 * 17) mod P
let inverse = a.mod_inv(&p);        // 42^-1 mod P
```

## Types

| Type | Limbs | Bits | Description |
|------|:-----:|:----:|-------------|
| `U256` | 4 | 256 | Primary type -- modular arithmetic, crypto operations |
| `U320` | 5 | 320 | Intermediate: U256 * u64 product |
| `U384` | 6 | 384 | Intermediate: U256 * u128 product |
| `U512` | 8 | 512 | Widening multiply output, U256 * U256 product |
| `U256Batch<N>` | -- | -- | Batch modular inverse via Montgomery's trick |

All types use `[u64; N]` in little-endian order: `[w0_lsb, ..., wN_msb]`.

### Internal Algorithm Structs

| Type | Description |
|------|-------------|
| `GcdState` | Lehmer's extended GCD state machine |
| `KnuthD` | Knuth's Algorithm D long division |
| `LehmerStep` | GCD inner loop (u64/u128 phases) |

## API Reference

### Modular Arithmetic

U256-only. These are the core cryptographic operations.

| Method | Description | CT feature |
|--------|-------------|:----------:|
| `add_mod(&b, &m)` | `(a + b) mod m` | `ct-field` |
| `sub_mod(&b, &m)` | `(a - b) mod m` | `ct-field` |
| `mul_mod(&b, &m)` | `(a * b) mod m` -- auto-selects sparse/generic | `ct-field` |
| `square_mod(&m)` | `a^2 mod m` -- auto-selects sparse/generic | `ct-field` |
| `pow_mod(&exp, &m)` | `a^exp mod m` -- square-and-multiply | `ct-pow` |
| `mod_inv(&m)` | `a^-1 mod m` -- Lehmer GCD or SafeGCD (`ct-inv`) | `ct-inv` |
| `mod_inv_fermat(&m)` | `a^(m-2) mod m` -- Fermat's little theorem | -- | -- |
| `div_mod(&b, &m)` | `a * b^-1 mod m` | -- | -- |
| `sqrt_mod(&p)` | Tonelli-Shanks modular square root | -- | -- |
| `modulo(&m)` | `a mod m` -- reduction to `[0, m)` | `ct-field` | default |

### Wrapping Arithmetic

| Method | Description | U256 | U320 | U384 | U512 |
|--------|-------------|:----:|:----:|:----:|:----:|
| `overflowing_add(&b)` | Add with carry flag | x | x | x | x |
| `overflowing_sub(&b)` | Subtract with borrow flag | x | x | x | x |
| `checked_add(&b)` | Add, `None` on overflow | x | x | x | x |
| `checked_sub(&b)` | Subtract, `None` on underflow | x | x | x | x |
| `negate()` | Two's complement negation | x | x | x | x |

### Multiplication

| Method | Description | U256 | U320 | U384 | U512 |
|--------|-------------|:----:|:----:|:----:|:----:|
| `widening_mul(&b) -> U512` | Full 512-bit product | x | | | |
| `square() -> U512` | Full 512-bit square | x | | | |
| `mul_u64(b) -> U320` | U256 * u64 -> 320-bit | x | | | |
| `mul_u128(b) -> U384` | U256 * u128 -> 384-bit | x | | | |

### Division

| Method | Description | U256 | U320 | U384 | U512 |
|--------|-------------|:----:|:----:|:----:|:----:|
| `div_rem(&b)` | Quotient and remainder | x | | | |
| `div_u64(b)` | Divide by u64 | | x | x | x |
| `div_u256(b)` | Divide by U256 (Knuth D) | | | | x |

### Reduction (U512 only)

| Method | Description | CT feature |
|--------|-------------|:----------:|
| `sparse_reduce(&compl)` | Fold using sparse prime structure | `ct-field` |

### Bit Operations

| Method | Description | U256 | U320 | U384 | U512 |
|--------|-------------|:----:|:----:|:----:|:----:|
| `bit(i)` | Test bit at position `i` | x | | | |
| `shl(n)` / `shr(n)` | Shift left/right | x | x | x | x |
| `leading_zeros()` | Count leading zero bits | x | x | x | x |
| `trailing_zeros()` | Count trailing zero bits | x | x | x | x |
| `bit_len()` | Number of significant bits | x | x | x | x |
| `sig_limbs()` | Number of significant limbs | x | x | x | x |
| `is_zero()` | Test for zero | x | x | x | x |
| `is_even()` / `is_odd()` | Parity test | x | x | x | x |
| `is_power_of_two()` | Test if exactly one bit is set | x | | | |
| `count_ones()` | Population count (Hamming weight) | x | | | |

### Morton / Z-Order

Bit dilation, undilation, interleaving, and deinterleaving for spatial hashing and Z-order curves.

| Method | Description | Level |
|--------|-------------|:-----:|
| `U256::from_u128_dilated_even(v)` | Dilate u128 → even positions (0,2,4,...,254) | u128→U256 |
| `U256::from_u128_dilated_odd(v)` | Dilate u128 → odd positions (1,3,5,...,255) | u128→U256 |
| `U256::interleave(x, y)` | Interleave two u128 → U256 Morton code | u128→U256 |
| `.undilate_even()` | Extract even positions → u128 | U256→u128 |
| `.undilate_odd()` | Extract odd positions → u128 | U256→u128 |
| `.deinterleave()` | Split into (u128, u128) | U256→u128 |
| `.dilate_even()` | Dilate U256 → even positions of U512 | U256→U512 |
| `.dilate_odd()` | Dilate U256 → odd positions of U512 | U256→U512 |
| `U512::interleave(&x, &y)` | Interleave two U256 → U512 Morton code | U256→U512 |
| `.undilate_even()` | Extract even positions → U256 | U512→U256 |
| `.undilate_odd()` | Extract odd positions → U256 | U512→U256 |
| `.deinterleave()` | Split into (U256, U256) | U512→U256 |

### Constant-Time Primitives

U256-only.

| Method | Description |
|--------|-------------|
| `ct_select(&other, flag)` | Branchless conditional select -- returns `self` when `flag` is true |
| `has_u64_complement(&m)` | Test if `2^256 - m` fits in u64 (sparse prime detection) |

### Conversion

| Method | Description | U256 | U320 | U384 | U512 |
|--------|-------------|:----:|:----:|:----:|:----:|
| `from_be_limbs(arr)` | From big-endian limbs | x | x | x | x |
| `from_le_limbs(arr)` | From little-endian limbs | x | x | x | x |
| `from_be_bytes(arr)` | From big-endian bytes | x | x | x | x |
| `from_le_bytes(arr)` | From little-endian bytes | x | x | x | x |
| `to_be_limbs()` | To big-endian limbs | x | x | x | x |
| `to_le_limbs()` | To little-endian limbs | x | x | x | x |
| `to_be_bits()` | To big-endian `[bool; N]` (MSB first) | x | | | |
| `to_le_bits()` | To little-endian `[bool; N]` (LSB first) | x | | | |
| `to_be_bytes()` | To big-endian bytes | x | x | x | x |
| `to_le_bytes()` | To little-endian bytes | x | x | x | x |
| `Into<[u8; N]>` | To big-endian bytes (trait) | x | x | x | x |
| `from(u64)` | From u64 | x | x | x | x |
| `from(u128)` | From u128 | x | x | x | x |
| `from(U256)` | Widen from U256 | | x | x | x |
| `from(U320)` | Widen from U320 | | | x | x |
| `from(U384)` | Widen from U384 | | | | x |
| `low_u256()` | Lower 256 bits | | x | x | x |
| `high_u256()` | Upper 256 bits | | | | x |
| `low_u128()` / `high_u128()` | Lower/upper 128 bits | x | | | |
| `non_zero()` | `Option<Self>` for early returns | x | x | x | x |
| `const_eq(&other)` | Const-compatible equality | x | x | x | x |

### Standard Traits

| Trait | U256 | U320 | U384 | U512 |
|-------|:----:|:----:|:----:|:----:|
| `Add` / `Sub` | x | x | x | x |
| `Div` / `Rem` | x | | | |
| `AddAssign` | x | | | |
| `BitAnd` / `BitOr` / `BitXor` / `Not` | x | x | x | x |
| `Shl` / `Shr` | x | x | x | x |
| `Display` | x | x | x | x |
| `LowerHex` / `UpperHex` | x | x | x | x |
| `FromStr` | x | x | x | x |
| `Default` | x | x | x | x |
| `Ord` / `PartialOrd` | x | x | x | x |
| `Hash` | x | | | |

### U256Batch\<N\>

Const-generic batch container for `N` values of [`U256`]. Wraps a `[U256; N]` array with batch operations that amortize expensive computations across all elements.

The primary use case is **Montgomery batch inversion**: computing `N` modular inverses using a single `mod_inv` call plus `3(N-1)` modular multiplications. At large `N`, this reduces per-element inversion cost from ~802 ns to ~46 ns (17x speedup). Used by cnfy-secp256k1 for batch Jacobian-to-affine conversion and batch public key generation.

| Method | Description |
|--------|-------------|
| `U256Batch(arr)` | Construct from a `[U256; N]` array |
| `zeroed()` | Create a batch with all elements set to `U256::ZERO` |
| `mod_inv(&m)` | Montgomery batch inversion -- returns `Option<[U256; N]>` (`None` if any element is zero) |
| `mod_inv_vt(&m)` | Variable-time batch inversion (always Lehmer GCD, ignores `ct-inv`) |
| `.0[i]` | Direct element access (public inner array) |

## Constant-Time Feature Flags

cnfy-uint provides granular compile-time feature flags that switch methods to constant-time implementations:

| Feature | What it covers |
|---------|---------------|
| `ct-field` | `add_mod`, `sub_mod`, `mul_mod_sparse`, `square_mod_sparse`, `sparse_reduce`, `modulo` |
| `ct-inv` | `mod_inv` (SafeGCD / Bernstein-Yang divsteps); implies `ct-field` |
| `ct-pow` | `pow_mod` (always 256 iterations); implies `ct-field` |
| `ct` | Umbrella enabling all of the above |

Constant-time overhead: 30-49% on field ops, 10.8x on mod_inv, 8% on pow_mod. See [SECURITY.md](../SECURITY.md) for details.

## What This Powers

[cnfy-secp256k1](../cnfy-secp256k1) -- scalar multiplication 2.3x faster than the C FFI, public key addition 3.7x faster.

## Testing

```bash
cargo test -p cnfy-uint            # All tests
cargo test -p cnfy-uint u256       # U256 module only
cargo test -p cnfy-uint mul_mod    # Single test by name
```

100% test coverage. Validated against known test vectors.

## Built With

This library was built with the accompaniment of [Claude Code](https://claude.ai/code) (Claude Opus 4.6, Max plan).

## Links

- [cnfy.org]https://www.cnfy.org -- project website
- [Source]https://codeberg.org/cnfy/cnfy-lib -- Codeberg repository

## License

Licensed under the [Shariah-Informed Open License (SIOL-1.0)](../LICENSE).