atria-gpu-rs 0.1.0

# atria-gpu-rs

CUDA-required GPU build of the **Ablatio Triadum (ATria)** centrality
algorithm, packaged as a PluMA plugin.

[![Rust](https://img.shields.io/badge/rust-2024_edition-orange.svg)](https://www.rust-lang.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE.md)
[![Cargo](https://img.shields.io/crates/v/atria-gpu-rs.svg)](https://crates.io/crates/atria-gpu-rs)

## Relationship to `atria-rs`

The upstream [`atria-rs`](https://github.com/quinnjr/ATria-rs) crate ships
three computational backends — CPU, wgpu, and CUDA — selected at runtime
via `ComputeBackend`. This crate is a thin wrapper that:

1. Depends on `atria-rs` with the `cuda` feature compiled in.
2. Forces `ComputeBackend::Cuda` at construction time and never exposes
   a setter that could switch it back to CPU.
3. Exports a different PluMA-FFI prefix (`ATriaGPU_plugin_*`) so it can
   coexist with the canonical `ATria` plugin in the same PluMA tree
   under `plugins/ATriaGPU/libATriaGPUPlugin.so`.
4. Emits a `log::warn!` if the CUDA runtime is unavailable and the
   underlying `atria-rs` silently degrades to CPU, so the fallback is
   visible instead of showing up only as slower-than-expected
   wall-clock time.
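
Points 2 and 4 can be pictured with the small sketch below. Everything in it is a stand-in written for illustration: the local `ComputeBackend` enum only mirrors the name used by `atria-rs`, and `AtriaGpu`, `requested`, and `fallback_warning` are hypothetical names, not this crate's actual API. To stay dependency-free, the sketch returns the warning text instead of calling `log::warn!`.

```rust
// Illustrative stand-in; NOT the real `atria_rs::ComputeBackend`.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ComputeBackend {
    Cpu,
    Wgpu,
    Cuda,
}

// Hypothetical wrapper type: the backend is fixed at construction
// and the field is private, so there is no way to switch it later.
struct AtriaGpu {
    requested: ComputeBackend,
}

impl AtriaGpu {
    // Always requests CUDA; no other constructor or setter exists.
    fn new() -> Self {
        Self { requested: ComputeBackend::Cuda }
    }

    // Read-only access to the requested backend.
    fn requested(&self) -> ComputeBackend {
        self.requested
    }

    // Returns a warning message when the backend that actually ran
    // differs from the requested one, and `None` otherwise.
    fn fallback_warning(&self, effective: ComputeBackend) -> Option<String> {
        if effective == self.requested {
            None
        } else {
            Some(format!(
                "requested {:?} but running on {:?}; expect slower wall-clock time",
                self.requested, effective
            ))
        }
    }
}
```

In this sketch, `AtriaGpu::new().fallback_warning(ComputeBackend::Cpu)` yields a message, while passing `ComputeBackend::Cuda` yields `None`.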

The algorithm and its output are byte-for-byte identical to those of
`atria-rs >= 1.4.0`. Running the same `corrP.never.csv` reference input
through ATriaGPU produces output that is sort-identical to the upstream
C++ ATria plugin's `corrP.never.ATria.noa.expected` reference file,
including the tie-handling at ranks #4 / #4 / #4 / #7.
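
The `#4 / #4 / #4 / #7` pattern matches what is commonly called standard competition ranking, where tied entries share the lowest rank of the group and the next rank skips ahead. The sketch below illustrates that scheme only; it is an assumption that the plugin assigns ranks this way, and `competition_ranks` is a hypothetical helper, not a function from `atria-rs`. It expects scores already sorted in descending order.

```rust
// Hypothetical helper: assigns 1-based competition ranks to scores
// that are already sorted in descending order.
fn competition_ranks(sorted_desc: &[f64]) -> Vec<usize> {
    let mut ranks: Vec<usize> = Vec::with_capacity(sorted_desc.len());
    for (i, &score) in sorted_desc.iter().enumerate() {
        if i > 0 && score == sorted_desc[i - 1] {
            // Tied with the previous entry: reuse its rank.
            let prev = ranks[i - 1];
            ranks.push(prev);
        } else {
            // Not tied: the rank is the 1-based position, which
            // skips past any ranks consumed by an earlier tie.
            ranks.push(i + 1);
        }
    }
    ranks
}
```

With made-up scores `[9.0, 8.0, 7.0, 5.0, 5.0, 5.0, 3.0]`, the three tied entries all receive rank 4 and the last entry receives rank 7.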

## Requirements

- An NVIDIA GPU exposed to the host (visible via `nvidia-smi`).
- The CUDA toolkit installed, with `nvcc` reachable through `$CUDA_PATH` /
  `$CUDA_ROOT`. The cudarc build script auto-detects the toolkit version
  (CUDA 11.4 through 13.x are supported).
- PluMA built with `--with-rust` (see
  [FIUBioRG/PluMA#13](https://github.com/FIUBioRG/PluMA/pull/13)) for the
  loader to be active.

## Build

```bash
git clone https://github.com/quinnjr/atria-gpu-rs.git
cd atria-gpu-rs
CUDA_PATH=/opt/cuda cargo build --release
# → target/release/libATriaGPUPlugin.so
```

## Install into PluMA

```bash
cd /path/to/PluMA/plugins
mkdir -p ATriaGPU
ln -sfn /path/to/atria-gpu-rs/target/release/libATriaGPUPlugin.so \
        ATriaGPU/libATriaGPUPlugin.so
ln -sfn /path/to/atria-gpu-rs/Cargo.toml ATriaGPU/Cargo.toml
```

## Pipeline config

```text
Prefix plugins/ATriaGPU/example/
Plugin ATriaGPU inputfile corrP.never.csv outputfile corrP.never.ATriaGPU.noa
```

## Test against PluMA's harness

```bash
cd /path/to/PluMA
python3 testPluMA.py ATriaGPU
# Testing ATriaGPU...                              [PASS]
# Passing Rate: 100.0%
```

The bundled `example/corrP.never.csv` is the canonical 126-node
oral-microbiome correlation network from the upstream ATria fixture;
`example/corrP.never.ATriaGPU.noa.expected` is the upstream C++ ATria
plugin's actual output. A passing test certifies that the CUDA backend
produces the same ranks (including ties) as the C++ original.

## Performance notes

The Floyd-Warshall step is dominated by `n³` work, where `n = 2 * GSIZE`
(`GSIZE` is the number of nodes; the split-vertex expansion doubles it).
On the reference 126-node network, that is a 252 × 252 matrix
(2 · 126 = 252), small enough that PCIe transfer overhead can dominate.
A CUDA payoff is expected on larger networks, where the `n³` term dwarfs
the fixed transfer cost.
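
To get a feel for the numbers, the following sketch computes the expanded matrix width, the number of inner-loop updates (`n³`), and the size of one dense matrix. `FwCost` and `floyd_warshall_cost` are hypothetical names introduced here, and the bytes-per-entry value is a parameter because the element type used on the GPU is not stated in this README.

```rust
// Hypothetical helper type: rough cost figures for one
// Floyd-Warshall pass over the split-vertex matrix.
#[derive(Debug, PartialEq, Eq)]
struct FwCost {
    n: u64,             // matrix width after split-vertex expansion
    inner_updates: u64, // n^3 inner-loop updates
    matrix_bytes: u64,  // size of one dense n x n matrix
}

// `gsize` is the node count; `bytes_per_entry` is an assumption
// supplied by the caller (for example 4 for 32-bit floats).
fn floyd_warshall_cost(gsize: u64, bytes_per_entry: u64) -> FwCost {
    let n = 2 * gsize;
    FwCost {
        n,
        inner_updates: n * n * n,
        matrix_bytes: n * n * bytes_per_entry,
    }
}
```

For `gsize = 126` this gives `n = 252` and 16,003,008 inner updates; assuming 4-byte entries, the matrix is 254,016 bytes. Doubling the node count multiplies the update count by eight but the matrix size by only four, which is why larger networks favor the GPU.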

## License

MIT — see [`LICENSE.md`](LICENSE.md).

## References

- Cickovski, T. et al. (2015, 2017). The Ablatio Triadum (ATria)
  centrality algorithm.
- Upstream Rust implementation: <https://github.com/quinnjr/ATria-rs>.
- Original C++ implementation:
  <https://github.com/movingpictures83/ATria>.