# coren

Compute and resource normalization.

Measures what your machine can do. Built for
[irohds](https://codeberg.org/cab/irohds), to tell you whether a computation's
result is faster to produce locally or to fetch from the network.

Any two machines looking at the same function can independently agree on
how much work it requires (an **_almost_** deterministic op count). Each machine knows
its own capabilities (benchmarked once at startup). The verdict is local
arithmetic: compare estimated compute time against estimated fetch time.

This package was originally written for integration into
[irohds](https://codeberg.org/cab/irohds), a decentralized memoization
system for scientific computing. It is also useful as a standalone tool
for roofline analysis, ETL buffer sizing, and build parallelism decisions.

## Install

```sh
uv add coren
```

Or from source:

```sh
uv run maturin develop --features python
```

## Usage

```python
from coren import FnCost, MachCap

# Describe the work (deterministic, same on every machine)
cost = FnCost.sort(1_000_000, 64, 64_000_000)

# Measure this machine (benchmarks run once, cached)
cap = MachCap.read()

# Get the answer
v = cap.verdict(cost)
print(v)  # "compute (saves 0.712s)" or "fetch (saves 2.3s)"

if v.should_fetch():
    download_result()
else:
    compute_locally()
```

## How it works

Two layers:

**FnCost** describes a function's resource requirements in absolute
physical units. Four integers: `ops` (total arithmetic operations),
`mem_bytes` (memory traffic), `peak_mem` (peak RAM footprint),
`result_bytes` (output size). These are properties of the algorithm
and its inputs. Bitwise identical on every machine.

**MachCap** describes what this machine can do. Measured via
micro-benchmarks (FMA throughput, STREAM triad, disk sequential I/O)
and OS queries (NIC link speed, battery state, core count, RAM).
Produces a roofline model: peak ops/s and memory bandwidth.

The **verdict** compares estimated compute time (from the roofline
model) against estimated fetch time (`result_bytes` / NIC bandwidth).
The score is the difference in seconds: positive means fetching is
faster, negative means computing is faster. An infinite score means one
option is impossible (not enough RAM to compute, or no network to fetch).
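The arithmetic can be sketched in a few lines of plain Python. The
function and parameter names below are illustrative, not coren's API;
the real entry point is `cap.verdict(cost)`:

```python
# Roofline verdict arithmetic, sketched in plain Python.
# Names are illustrative -- the real API is cap.verdict(cost).

def verdict_score(ops, mem_bytes, result_bytes,
                  peak_ops_per_s, mem_bw, nic_bw):
    """Positive score -> fetch is faster; negative -> compute is faster."""
    # Compute time is bounded by whichever resource is the bottleneck:
    # arithmetic throughput or memory bandwidth (the roofline model).
    compute_s = max(ops / peak_ops_per_s, mem_bytes / mem_bw)
    # Fetch time is just the result size over the NIC link speed.
    fetch_s = result_bytes / nic_bw if nic_bw > 0 else float("inf")
    return compute_s - fetch_s

# Example: 2 GFLOP task, 1 GB memory traffic, 10 MB result, on a
# 100 GFLOPS machine with 40 GB/s RAM and a 125 MB/s NIC.
score = verdict_score(2e9, 1e9, 10e6, 100e9, 40e9, 125e6)
# score is negative here: computing locally wins.
```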

## FnCost constructors

```
FnCost.new(ops, mem_bytes, peak_mem, result_bytes)   raw values
FnCost.scan(n_bytes, result_bytes)                   linear scan
FnCost.sort(n, item_bytes, result_bytes)             merge sort
FnCost.hash(n_bytes)                                 crypto hash
FnCost.matmul(m, n, k, result_bytes)                 dense GEMM
FnCost.etl(rows, row_bytes, ops_per_row, result_bytes)  row processing
FnCost.copy(size)                                    file copy (ops=0)
```
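For algorithms not covered by a built-in constructor, `FnCost.new` takes
the four raw values directly. A sketch of the counting for an
elementwise add of two float64 vectors (the derivation below is ours,
not coren's):

```python
# Deriving raw FnCost values for an algorithm without a built-in
# constructor: out[i] = a[i] + b[i] over float64 vectors of length n.
# This counting is ours, not coren's.
n = 10_000_000
ops = n                      # one add per element
mem_bytes = 3 * n * 8        # read a, read b, write out
peak_mem = 3 * n * 8         # all three vectors resident at once
result_bytes = n * 8         # the output vector

# cost = FnCost.new(ops, mem_bytes, peak_mem, result_bytes)
```

Because these values come only from the algorithm's definition and `n`,
they are the same on every machine and safe to use as cache keys.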

## Combinators

```python
a.then(b)    # sequential: ops sum, peak_mem = max, result = b's output
a + b        # same as then
a.par(b)     # parallel: ops = max, peak_mem sums, result sums
a.repeat(k)  # k iterations, peak_mem unchanged
```
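The algebra above can be modeled in a few lines of plain Python. Field
names mirror `FnCost`; the `mem_bytes` rules are our assumption, since
the list above only specifies `ops`, `peak_mem`, and the result:

```python
# Plain-Python model of the combinator algebra described above.
# mem_bytes handling is assumed (summed), not taken from coren.
from dataclasses import dataclass

@dataclass
class Cost:
    ops: int
    mem_bytes: int
    peak_mem: int
    result_bytes: int

    def then(self, b):
        # Sequential: ops sum, footprints overlap in time, only
        # b's output survives.
        return Cost(self.ops + b.ops,
                    self.mem_bytes + b.mem_bytes,
                    max(self.peak_mem, b.peak_mem),
                    b.result_bytes)

    def par(self, b):
        # Parallel: time is bounded by the larger side (ops = max),
        # but both footprints and both results are live at once.
        return Cost(max(self.ops, b.ops),
                    self.mem_bytes + b.mem_bytes,
                    self.peak_mem + b.peak_mem,
                    self.result_bytes + b.result_bytes)

    def repeat(self, k):
        # k sequential iterations reuse the same footprint.
        return Cost(self.ops * k, self.mem_bytes * k,
                    self.peak_mem, self.result_bytes)

a = Cost(ops=100, mem_bytes=1_000, peak_mem=500, result_bytes=10)
b = Cost(ops=200, mem_bytes=2_000, peak_mem=300, result_bytes=20)
seq = a.then(b)
```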

## Normalizing wall-clock measurements

When a function has already been executed and you only know its
wall-clock time (not the algorithmic complexity), `MachCap.normalize()`
converts the measurement into a `FnCost` suitable for local verdict
computation.

**WARNING: `normalize()` output is NOT deterministic across machines.**
Different machines produce different ops/mem_bytes values for the same
function. Do NOT use `normalize()`-produced FnCost as cache keys,
content addresses, or any identifier that must match across peers.
For cache keys, use the static constructors (`sort`, `hash`, `matmul`,
etc.) or `FnCost.new()` with values derived from the function's
definition and parameters.

```python
cost = cap.normalize(compute_ns=15_000_000_000, peak_mem=4_000_000_000, result_bytes=500_000_000)
# cost is a safe overestimate, suitable for verdict() but NOT for cache keys
```
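To see why `normalize()` output cannot serve as a cross-machine
identifier, consider the same 15 s run measured on two machines.
The inversion below is an illustration of the problem, not coren's
exact formula, and the throughput numbers are made up:

```python
# Inverting the roofline from a wall-clock time uses *this machine's*
# peak throughput, so the same run yields different op counts on
# different hardware. (Illustrative arithmetic, not coren's formula.)
compute_s = 15.0

ops_fast = compute_s * 89.2e9   # estimated on an 89.2 GFLOPS desktop
ops_slow = compute_s * 12.0e9   # estimated on a 12 GFLOPS laptop

# Same function, same wall-clock time, ~7x different "ops" -- so two
# peers hashing these values would never agree on a cache key.
ratio = ops_fast / ops_slow
```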

## CLI

```sh
$ coren
coren  [desktop]
  cores                    8p / 16l
  frequency                3600 MHz

  roofline (measured)
    peak (all cores)       89.2 GFLOPS
    mem bandwidth          38.1 GB/s
    disk bandwidth         1.52 GB/s
    ridge point            2.34 ops/byte

  verdicts (what should this machine do?)
    task                       ops         Q         R    score       neck action
    sort 1M x 64B          200.0M     1.2GB    10.0MB   -0.712   memory compute
    matmul 1k^3              2.0G    22.9MB    10.0MB   -0.779  compute compute

$ coren --json   # machine-readable output
```
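The ridge point in the report is just measured peak throughput divided
by measured memory bandwidth. Checking it against the numbers above,
and classifying the two example tasks by their arithmetic intensity:

```python
# Ridge point = peak arithmetic throughput / memory bandwidth.
# Tasks below this intensity (ops per byte of traffic) are
# memory-bound; tasks above it are compute-bound.
peak_gflops = 89.2        # "peak (all cores)" from the report above
mem_bw_gbs = 38.1         # "mem bandwidth"

ridge = peak_gflops / mem_bw_gbs      # ~2.34 ops/byte

# Intensities from the verdict table (ops / Q, the memory traffic):
sort_intensity = 200e6 / 1.2e9        # ~0.17 -> below ridge: memory-bound
matmul_intensity = 2.0e9 / 22.9e6     # ~87   -> above ridge: compute-bound
```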

## Rust

```rust
use coren::{FnCost, MachCap};

let cost = FnCost::sort(1_000_000, 64, 64_000_000);
let cap = MachCap::read(".");
let v = cap.verdict(&cost);

if v.should_fetch() {
    // download from peer
} else {
    // compute locally
}
```

## License

MIT OR Apache-2.0