# csa-rhdl
Carry-save adder compressor trees composed through
[comp-cat-rs](../comp-cat-rs) categorical morphisms, with an
[hdl-cat](../hdl-cat) backend for hardware synthesis and simulation.
Implements the [supranational/hardware `rtl/csa`](https://github.com/supranational/hardware/tree/master/rtl/csa)
architecture as a type-driven, delay-run combinator library:
- `full_adder`: 1-bit full adder cell (hdl-cat circuit arrow)
- `csa_3to2<N>`: `N`-wide three-to-two compressor (tensor power of `full_adder`)
- `tree_level`: groups `M` operands into triples and reduces each triple to a sum/carry pair
- `compressor_tree<M, W>`: recursive `M -> 2` compressor via free-category path
- `RegisteredCompressor<M, W, OUT_W, D>`: single-stage pipeline placeholder
- `lower`: translates the abstract circuit AST to concrete hdl-cat IR
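The arithmetic behind these cells can be checked without any hardware backend. The sketch below is a plain-Rust, word-level model. The function names mirror the crate's combinators for readability, but they are hypothetical stand-ins and not the crate's API. Applying `^`, `&`, and `|` to whole words runs the same full-adder cell on every bit position in parallel, which is the word-level analogue of taking a tensor power of `full_adder`. As in the simulation example, the carry vector comes out unshifted and has weight two. One assumption is labeled in the code: when the operand count is not a multiple of three, leftover operands simply pass through to the next level.

```rust
/// One full adder per bit position, applied to whole words.
/// Returns (sum, carry); `carry` is unshifted and has weight 2.
fn csa_3to2(a: u128, b: u128, c: u128) -> (u128, u128) {
    let sum = a ^ b ^ c;
    let carry = (a & b) | (a & c) | (b & c);
    (sum, carry)
}

/// Hypothetical model of `tree_level`: each full triple becomes a sum row
/// plus a carry row shifted into position. ASSUMPTION: leftover operands
/// (0 to 2 of them) pass through unchanged.
fn tree_level(operands: &[u128]) -> Vec<u128> {
    operands
        .chunks(3)
        .flat_map(|chunk| match chunk {
            &[a, b, c] => {
                let (sum, carry) = csa_3to2(a, b, c);
                vec![sum, carry << 1]
            }
            _ => chunk.to_vec(),
        })
        .collect()
}

/// Hypothetical model of `compressor_tree`: apply levels until at most
/// two rows remain. Returns the final rows and the number of levels used.
fn compressor_tree(operands: &[u128]) -> (Vec<u128>, usize) {
    let mut rows = operands.to_vec();
    let mut levels = 0;
    while rows.len() > 2 {
        rows = tree_level(&rows);
        levels += 1;
    }
    (rows, levels)
}

fn main() {
    // Nine arbitrary operands, matching the 9-operand Verilog example.
    let operands: Vec<u128> = (1..=9).map(|i| i * 1_000 + 7).collect();
    let (rows, levels) = compressor_tree(&operands);

    // Carry-save invariant: the remaining rows still add up to the original total.
    let expected: u128 = operands.iter().sum();
    assert_eq!(rows.iter().sum::<u128>(), expected);
    println!("{} operands -> {} rows in {} levels", operands.len(), rows.len(), levels);
}
```

Running it prints `9 operands -> 2 rows in 4 levels` (9, 6, 4, 3, then 2 rows). The model uses wide `u128` values and ignores the fixed width `W`, so it says nothing about how the hardware handles truncation or width growth.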
## Categorical structure
| Hardware concept | Categorical construct |
| --- | --- |
| signal bundle | object (`Shape`) |
| combinational circuit | morphism (`CircuitArrow`) |
| parallel circuits | tensor product |
| wire permutation | braiding |
| tree reduction | catamorphism over `Stream<CsaError, LevelDescriptor>` |

Composition is **delay-run**: the tree is built as a lazy
`Io<CsaError, CircuitArrow>`, and the caller invokes `.run()` exactly
once at the boundary (Verilog generation, simulation, or shape
inspection).
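The delay-run discipline can be illustrated with a deliberately tiny effect type. The `Io` below is a hypothetical stand-in written for this sketch, not the crate's `Io`, whose constructors and combinators may differ. It only shows the shape of the idea: `flat_map` chains suspended steps, and nothing executes until the single `.run()` at the boundary. The suspended step computes how many 3:2 levels a tree of `m` operands needs via the recurrence `m -> 2 * (m / 3) + m % 3`, which assumes leftover operands pass through each level unchanged.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a lazy effect: a boxed one-shot computation.
struct Io<E, A> {
    thunk: Box<dyn FnOnce() -> Result<A, E>>,
}

impl<E: 'static, A: 'static> Io<E, A> {
    /// Wraps a computation without running it.
    fn suspend(f: impl FnOnce() -> Result<A, E> + 'static) -> Self {
        Io { thunk: Box::new(f) }
    }

    /// Sequences another suspended step after this one; still runs nothing.
    fn flat_map<B: 'static>(self, f: impl FnOnce(A) -> Io<E, B> + 'static) -> Io<E, B> {
        Io::suspend(move || (self.thunk)().and_then(|a| (f(a).thunk)()))
    }

    /// Executes the whole chain. Consuming `self` means it can run only once.
    fn run(self) -> Result<A, E> {
        (self.thunk)()
    }
}

/// Number of 3:2 levels needed to reduce `m` operands to two rows,
/// ASSUMING leftover operands pass through each level unchanged.
fn level_count(mut m: usize) -> usize {
    let mut levels = 0;
    while m > 2 {
        m = 2 * (m / 3) + m % 3;
        levels += 1;
    }
    levels
}

fn main() {
    let evaluations = Rc::new(Cell::new(0));
    let counter = Rc::clone(&evaluations);

    // Describe the computation; this only builds closures.
    let program: Io<String, usize> = Io::suspend(move || {
        counter.set(counter.get() + 1);
        Ok(9)
    })
    .flat_map(|m| {
        Io::suspend(move || {
            if m >= 3 {
                Ok(level_count(m))
            } else {
                Err(format!("need at least 3 operands, got {m}"))
            }
        })
    });

    assert_eq!(evaluations.get(), 0); // nothing has executed yet
    let levels = program.run(); // the single boundary call
    assert_eq!(evaluations.get(), 1);
    assert_eq!(levels, Ok(4));
}
```

Because `run` takes `self` by value, the type system itself enforces the "exactly once" rule described above.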
## Simulation
With the `hdl-cat-gates` feature enabled, circuits can be simulated with
hdl-cat's interpreter. The example below builds a 4-bit carry-save 3-to-2
compressor and checks the arithmetic identity `a + b + cin == s + (cout << 1)`,
where `s` is the bitwise sum vector and `cout` is the unshifted carry vector.
```rust
use csa_rhdl::gates::csa_3to2;
use hdl_cat::kind::BitSeq;
use hdl_cat::sim::interp::interpret;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let csa = csa_3to2::<4>()?;

    // Build 4-bit input sequences (LSB-first): a = 5, b = 3, cin = 7.
    let a: BitSeq = (0..4).map(|i| (5u128 >> i) & 1 == 1).collect();
    let b: BitSeq = (0..4).map(|i| (3u128 >> i) & 1 == 1).collect();
    let cin: BitSeq = (0..4).map(|i| (7u128 >> i) & 1 == 1).collect();

    // Evaluate the combinational circuit.
    let env = interpret(csa.graph(), csa.inputs(), &[a, b, cin])?;

    // Look up the sum and carry output wires, returning an error
    // instead of panicking if either is missing.
    let s_wire = csa.outputs().first().copied().ok_or("missing sum wire")?;
    let cout_wire = csa.outputs().get(1).copied().ok_or("missing carry wire")?;

    let s_bits = env.get(s_wire.index())
        .and_then(Clone::clone)
        .unwrap_or_default();
    let cout_bits = env.get(cout_wire.index())
        .and_then(Clone::clone)
        .unwrap_or_default();

    // Convert an LSB-first BitSeq back to u128 for verification.
    let to_u128 = |seq: &BitSeq| -> u128 {
        seq.as_slice().iter().enumerate().fold(0u128, |acc, (i, bit)| {
            acc | (u128::from(*bit) << i)
        })
    };
    let s_val = to_u128(&s_bits);
    let cout_val = to_u128(&cout_bits);

    // The carry vector has weight 2, hence the shift.
    assert_eq!(5 + 3 + 7, s_val + (cout_val << 1));
    Ok(())
}
```
The same approach works for the abstract compressor tree: build it with
`compressor_tree`, translate it to hdl-cat IR with `lower`, and interpret
the lowered graph:
```rust
use csa_rhdl::prelude::*;
use hdl_cat::sim::interp::interpret;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // A 5-operand, 8-bit tree, lowered to hdl-cat IR.
    let tree = compressor_tree(5, 8)?;
    let lowered = lower(&tree)?;

    // Drive every input port with an all-zero 8-bit operand.
    let zeros = hdl_cat::kind::BitSeq::from_iter(vec![false; 8]);
    let inputs: Vec<hdl_cat::kind::BitSeq> =
        lowered.inputs().iter().map(|_| zeros.clone()).collect();

    let env = interpret(lowered.graph(), lowered.inputs(), &inputs)?;

    let s_wire = lowered.outputs().first().copied()
        .ok_or("missing sum wire")?;
    let s = env.get(s_wire.index())
        .and_then(Clone::clone)
        .unwrap_or_default();

    // All-zero operands must compress to an all-zero sum vector.
    assert!(s.as_slice().iter().all(|bit| !bit));
    Ok(())
}
```
## Verilog emission
With the `verilog` feature enabled, a lowered graph can be rendered to
Verilog. The emitter returns an `Io` effect; call `.run()` at the
boundary to execute.
```rust
use csa_rhdl::prelude::*;
use hdl_cat::verilog::emitter::emit_graph;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build a 9-operand, 16-bit compressor tree and lower it to IR.
    let tree = compressor_tree(9, 16)?;
    let lowered = lower(&tree)?;

    // Emit Verilog (delay-run: nothing executes until .run()).
    let verilog_text = emit_graph(
        lowered.graph(),
        "csa_9to2_w16",
        lowered.inputs(),
        lowered.outputs(),
    )
    .flat_map(|module| module.render())
    .run()?;

    println!("{verilog_text}");
    Ok(())
}
```
The generated module has one port per input/output wire. For a
9-operand tree this produces nine input ports and two output ports
(sum and carry), and all internal CSA logic is emitted as continuous
`assign` statements.
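To picture what continuous `assign` statements for carry-save logic look like, the sketch below produces a hand-written Verilog module for a single `width`-bit 3:2 stage. It is illustrative only: the function and its port names (`a`, `b`, `c`, `s`, `cy`) are hypothetical, and hdl-cat's `emit_graph` chooses its own naming and layout. The carry output is left unshifted, so a consumer computes `s + (cy << 1)`, mirroring the simulation example.

```rust
/// Hypothetical hand-written Verilog for one `width`-bit 3:2 compressor.
/// Not produced by hdl-cat; for illustration only.
fn csa_3to2_verilog(name: &str, width: usize) -> String {
    assert!(width > 0, "width must be at least 1");
    let msb = width - 1;
    [
        format!("module {name} ("),
        format!("  input  wire [{msb}:0] a,"),
        format!("  input  wire [{msb}:0] b,"),
        format!("  input  wire [{msb}:0] c,"),
        format!("  output wire [{msb}:0] s,"),
        format!("  output wire [{msb}:0] cy"),
        ");".to_string(),
        // Every bit is an independent full adder, so bitwise operators on the
        // whole vector express all `width` cells at once.
        "  assign s  = a ^ b ^ c;".to_string(),
        "  assign cy = (a & b) | (a & c) | (b & c);".to_string(),
        "endmodule".to_string(),
    ]
    .join("\n")
}

fn main() {
    println!("{}", csa_3to2_verilog("csa3_w4", 4));
}
```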
## Features
- `hdl-cat-gates` (opt-in): hdl-cat circuit definitions, lowering, and
the registered compressor. Depends on
[hdl-cat](https://github.com/MavenRain/hdl-cat).
- `verilog` (opt-in, implies `hdl-cat-gates`): Verilog codegen.

Without any feature flags, the crate provides the full categorical
tree compiler as a pure combinator library.
## License
Dual-licensed under MIT or Apache-2.0, at your option.