csa-rhdl 0.1.0

Carry-save adder compressor trees composed via comp-cat-rs, with hdl-cat backend
# csa-rhdl

Carry-save adder compressor trees composed through
[comp-cat-rs](../comp-cat-rs) categorical morphisms, with an
[hdl-cat](../hdl-cat) backend for hardware synthesis and simulation.

Implements the [supranational/hardware `rtl/csa`](https://github.com/supranational/hardware/tree/master/rtl/csa)
architecture as a type-driven, delay-run combinator library:

- `full_adder`: 1-bit full adder cell (hdl-cat circuit arrow)
- `csa_3to2<N>`: `N`-wide three-to-two compressor (tensor power of `full_adder`)
- `tree_level`: groups `M` operands into triples and reduces each triple with a 3-to-2 compressor
- `compressor_tree<M, W>`: recursive `M -> 2` compressor via free-category path
- `RegisteredCompressor<M, W, OUT_W, D>`: single-stage pipeline placeholder
- `lower`: translate abstract circuit AST to concrete hdl-cat IR
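
To make the first two building blocks concrete, here is a minimal bit-level sketch in plain Rust. It does not use the crate or hdl-cat: `full_adder`, `csa_3to2`, `to_bits`, and `from_bits` below are stand-in functions written for illustration (assumptions, not the crate's circuit arrows), and the `N`-wide compressor is modelled as `N` independent full adders, mirroring the "tensor power" description.

```rust
/// 1-bit full adder: returns (sum, carry_out).
/// Illustrative model only, not the crate's `full_adder` circuit arrow.
fn full_adder(a: bool, b: bool, cin: bool) -> (bool, bool) {
    let sum = a ^ b ^ cin;
    // Carry is the majority of the three inputs.
    let carry = (a & b) | (a & cin) | (b & cin);
    (sum, carry)
}

/// N-wide 3-to-2 compressor: one full adder per bit position,
/// with no carry propagation between positions.
fn csa_3to2<const N: usize>(
    a: [bool; N],
    b: [bool; N],
    c: [bool; N],
) -> ([bool; N], [bool; N]) {
    let cells: [(bool, bool); N] =
        std::array::from_fn(|i| full_adder(a[i], b[i], c[i]));
    (cells.map(|(sum, _)| sum), cells.map(|(_, carry)| carry))
}

/// LSB-first bit array from an integer (helper for the sketch).
fn to_bits<const N: usize>(value: u32) -> [bool; N] {
    std::array::from_fn(|i| (value >> i) & 1 == 1)
}

/// Integer from an LSB-first bit array (helper for the sketch).
fn from_bits<const N: usize>(bits: [bool; N]) -> u32 {
    bits.iter()
        .enumerate()
        .fold(0, |acc, (i, &bit)| acc | (u32::from(bit) << i))
}
```

For the 4-bit inputs 5, 3, and 7, this model yields a sum word of 1 and a carry word of 7, so `s + (cout << 1)` is `1 + 14 = 15`, which equals `5 + 3 + 7`.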

## Categorical structure

| Hardware | Category theory |
|---|---|
| signal bundle | object (`Shape`) |
| combinational circuit | morphism (`CircuitArrow`) |
| parallel circuits | tensor product |
| wire permutation | braiding |
| tree reduction | catamorphism over `Stream<CsaError, LevelDescriptor>` |
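
The "tree reduction" row can be illustrated with a small level schedule. The sketch below is plain Rust and independent of the crate: the `Level` struct and `schedule` function are hypothetical names (not the crate's `LevelDescriptor`), and it assumes that operands left over after grouping into triples pass through to the next level unchanged.

```rust
/// One reduction level in the sketch: how many 3-to-2 compressors it
/// uses and how many operands pass through untouched.
/// Hypothetical type, not the crate's `LevelDescriptor`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Level {
    operands_in: usize,
    compressors: usize,
    passthrough: usize,
}

impl Level {
    /// Each compressor turns three operands into two.
    fn operands_out(self) -> usize {
        2 * self.compressors + self.passthrough
    }
}

/// Levels needed to reduce `m` operands to two, grouping greedily
/// into triples (assumed grouping policy).
fn schedule(m: usize) -> Vec<Level> {
    std::iter::successors(Some(m), |&n| (n > 2).then(|| 2 * (n / 3) + n % 3))
        .take_while(|&n| n > 2)
        .map(|n| Level {
            operands_in: n,
            compressors: n / 3,
            passthrough: n % 3,
        })
        .collect()
}

/// Folding over the levels gives the total compressor count.
fn total_compressors(m: usize) -> usize {
    schedule(m).iter().map(|level| level.compressors).sum()
}
```

Under these assumptions, nine operands reduce as 9, 6, 4, 3, 2 (four levels, seven compressors) and five operands as 5, 4, 3, 2 (three levels, three compressors). Each compressor removes exactly one operand, so reducing `m` operands to two always takes `m - 2` compressors regardless of grouping.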

Composition is **delay-run**: the tree is built as a lazy
`Io<CsaError, CircuitArrow>`.  The caller invokes `.run()` exactly
once at the boundary (verilog generation, simulation, or shape
inspection).
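
The delay-run idea can be sketched without any dependencies. The `Lazy` type below is a hypothetical stand-in written for this illustration; it is not the comp-cat-rs `Io` type, and its method names (`suspend`, `map`, `flat_map`, `run`) are assumptions chosen to resemble the calls used elsewhere in this README. The point it shows is that building and composing a description performs no work until `run()` is called.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical delay-run effect: a boxed computation that is only
/// executed by `run()`. Not the comp-cat-rs `Io` type.
struct Lazy<E, A> {
    thunk: Box<dyn FnOnce() -> Result<A, E>>,
}

impl<E: 'static, A: 'static> Lazy<E, A> {
    /// Wrap a computation without running it.
    fn suspend(f: impl FnOnce() -> Result<A, E> + 'static) -> Self {
        Lazy { thunk: Box::new(f) }
    }

    /// Transform the eventual result; still nothing runs.
    fn map<B: 'static>(self, f: impl FnOnce(A) -> B + 'static) -> Lazy<E, B> {
        Lazy::suspend(move || (self.thunk)().map(f))
    }

    /// Sequence a second suspended computation after this one.
    fn flat_map<B: 'static>(
        self,
        f: impl FnOnce(A) -> Lazy<E, B> + 'static,
    ) -> Lazy<E, B> {
        Lazy::suspend(move || (self.thunk)().and_then(|a| (f(a).thunk)()))
    }

    /// Execute the whole chain once, at the boundary.
    fn run(self) -> Result<A, E> {
        (self.thunk)()
    }
}

/// Hypothetical builder: counts how often the body actually executes.
fn build(width: usize, executions: Rc<Cell<u32>>) -> Lazy<String, usize> {
    Lazy::suspend(move || {
        executions.set(executions.get() + 1);
        if width == 0 {
            Err("width must be nonzero".to_string())
        } else {
            Ok(width)
        }
    })
}
```

With a shared counter, `build(8, counter).map(|w| w * 2)` leaves the counter at zero; only `.run()` increments it and returns `Ok(16)`. Because `run` consumes the value, the type system also prevents running the same description twice.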

## Simulation

With the `hdl-cat-gates` feature enabled, circuits can be simulated using
hdl-cat's interpreter.  The example below builds a 4-bit CSA 3-to-2
compressor and checks the arithmetic identity `a + b + cin == s + (cout << 1)`.

```rust
use csa_rhdl::gates::csa_3to2;
use hdl_cat::kind::BitSeq;
use hdl_cat::sim::interp::interpret;

fn main() -> Result<(), hdl_cat::Error> {
    let csa = csa_3to2::<4>()?;

    // Build 4-bit input sequences (LSB-first).
    let a: BitSeq   = (0..4).map(|i| (5u128 >> i) & 1 == 1).collect();
    let b: BitSeq   = (0..4).map(|i| (3u128 >> i) & 1 == 1).collect();
    let cin: BitSeq = (0..4).map(|i| (7u128 >> i) & 1 == 1).collect();

    // Evaluate the combinational circuit.
    let env = interpret(csa.graph(), csa.inputs(), &[a, b, cin])?;

    // Read output wires safely (no array indexing).
    let s_wire = csa.outputs().first().copied()
        .ok_or(hdl_cat::Error::Overflow { width: hdl_cat::Width::new(0) })?;
    let cout_wire = csa.outputs().get(1).copied()
        .ok_or(hdl_cat::Error::Overflow { width: hdl_cat::Width::new(0) })?;

    let s_bits = env.get(s_wire.index())
        .and_then(Clone::clone)
        .unwrap_or_default();
    let cout_bits = env.get(cout_wire.index())
        .and_then(Clone::clone)
        .unwrap_or_default();

    // Convert BitSeq back to u128 for verification.
    let to_u128 = |seq: &BitSeq| -> u128 {
        seq.as_slice().iter().enumerate().fold(0u128, |acc, (i, bit)| {
            acc | (u128::from(*bit) << i)
        })
    };
    let s_val = to_u128(&s_bits);
    let cout_val = to_u128(&cout_bits);
    assert_eq!(5 + 3 + 7, s_val + (cout_val << 1));
    Ok(())
}
```

The same approach works for the abstract compressor tree via the
`lower` module:

```rust
use csa_rhdl::prelude::*;
use hdl_cat::sim::interp::interpret;

fn run() -> Result<(), Box<dyn std::error::Error>> {
    let tree = compressor_tree(5, 8)?;
    let lowered = lower(&tree)?;
    let zeros = hdl_cat::kind::BitSeq::from_iter(vec![false; 8]);
    let inputs: Vec<hdl_cat::kind::BitSeq> =
        lowered.inputs().iter().map(|_| zeros.clone()).collect();
    let env = interpret(lowered.graph(), lowered.inputs(), &inputs)?;
    let s_wire = lowered.outputs().first().copied()
        .ok_or("missing sum wire")?;
    let _s = env.get(s_wire.index())
        .and_then(Clone::clone)
        .unwrap_or_default();
    Ok(())
}
```
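
The example above drives the tree with all-zero inputs, so it exercises the plumbing rather than the arithmetic. To check nonzero inputs, one option is to compare the simulated outputs against a software reference model. The sketch below is such a model in plain Rust; `csa`, `reduce_level`, and `compress` are hypothetical helpers, not crate functions. It assumes leftover operands pass through each level unchanged, works on `u64` words modulo 2^64 instead of tracking the width `W`, and returns the carry word already shifted left by one, so the final identity is `sum + carry` rather than `s + (cout << 1)`.

```rust
/// Word-level 3-to-2 compression.
/// Invariant: a + b + c == sum + carry (mod 2^64), carry pre-shifted.
fn csa(a: u64, b: u64, c: u64) -> (u64, u64) {
    let sum = a ^ b ^ c;
    let carry = ((a & b) | (a & c) | (b & c)) << 1;
    (sum, carry)
}

/// One tree level: compress every full triple, pass leftovers through.
fn reduce_level(operands: &[u64]) -> Vec<u64> {
    let chunks = operands.chunks_exact(3);
    let leftovers = chunks.remainder().to_vec();
    chunks
        .flat_map(|triple| match triple {
            &[a, b, c] => {
                let (sum, carry) = csa(a, b, c);
                vec![sum, carry]
            }
            // `chunks_exact(3)` only yields triples; kept for totality.
            _ => triple.to_vec(),
        })
        .chain(leftovers)
        .collect()
}

/// Reduce until at most two operands remain; returns (sum, carry).
fn compress(operands: &[u64]) -> (u64, u64) {
    match operands {
        [] => (0, 0),
        [a] => (*a, 0),
        [a, b] => (*a, *b),
        _ => compress(&reduce_level(operands)),
    }
}
```

For any operand list, `sum.wrapping_add(carry)` from `compress` equals the wrapping sum of the inputs; for example, the nine operands 5, 3, 7, 11, 13, 2, 9, 1, 6 give a pair that adds up to 57.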

## Verilog emission

With the `verilog` feature enabled, a lowered graph can be rendered to
Verilog.  The emitter returns an `Io` effect; call `.run()` at the
boundary to execute.

```rust
use csa_rhdl::prelude::*;
use hdl_cat::verilog::emitter::emit_graph;

fn run() -> Result<(), Box<dyn std::error::Error>> {
    // Build a 9-operand, 16-bit compressor tree and lower to IR.
    let tree = compressor_tree(9, 16)?;
    let lowered = lower(&tree)?;

    // Emit Verilog (delay-run: stays inside Io until .run()).
    let verilog_text = emit_graph(
        lowered.graph(),
        "csa_9to2_w16",
        lowered.inputs(),
        lowered.outputs(),
    )
    .flat_map(|module| module.render())
    .run()?;

    println!("{verilog_text}");
    Ok(())
}
```

The generated module has one port per input/output wire.  For a
9-operand tree this produces nine input ports and two output ports
(sum and carry), with all internal CSA logic as continuous `assign`
statements.

## Features

- `hdl-cat-gates` (opt-in): hdl-cat circuit definitions, lowering, and
  the registered compressor.  Depends on
  [hdl-cat](https://github.com/MavenRain/hdl-cat).
- `verilog` (opt-in, implies `hdl-cat-gates`): Verilog codegen.

Without any feature flags, the crate provides the full categorical
tree compiler as a pure combinator library.

## License

Dual-licensed under MIT or Apache-2.0, at your option.