poulpy-hal 0.6.0

# 🐙 Poulpy-HAL

**Poulpy-HAL** is a Rust crate that provides backend-agnostic layouts and trait-based low-level lattice arithmetic. This allows developers to implement lattice-based schemes generically, with the ability to plug in optimized backends (e.g. CPU, GPU, FPGA) at runtime.

The important design point is that the public API is centered on **backend-native borrows** rather than host byte slices. Shared crates should be written against `*ToBackendRef` / `*ToBackendMut` and the corresponding `...BackendRef` / `...BackendMut` view types. This remains true even for host backends: generic HAL-facing code should still go through `ToBackendRef` / `ToBackendMut`, not `to_ref()` / `to_mut()`. Host-view helpers are only escape hatches for explicitly host-side tasks.

## Crate Organization

### **poulpy-hal/layouts**

This module defines backend-agnostic layouts. There are two main categories: user-facing types and backend types. User-facing types, such as `vec_znx`, serve as both inputs and outputs of computations, while backend types, such as `svp_ppol` (a.k.a. scalar vector product prepared polynomial), are pre-processed, write-only types stored in a backend-specific representation for optimized evaluation. For example, in the FFT64 AVX2 CPU implementation, an `svp_ppol` (the prepared form of `scalar_znx`) is stored in the DFT domain with an AVX-optimized data ordering.

This module also provides helpers over these types, as well as serialization for the front-end types `scalar_znx`, `vec_znx` and `mat_znx`.

#### Backend Model

Each backend defines:

- `OwnedBuf`: the backend-owned storage type
- `BufRef<'a>` / `BufMut<'a>`: backend-native shared and mutable borrows

This means a layout like `VecZnx<BE::OwnedBuf>` is the owned form, while:

- `VecZnxBackendRef<'a, BE>` is the shared backend-native borrow
- `VecZnxBackendMut<'a, BE>` is the mutable backend-native borrow

The generic adapter traits follow the same pattern:

- `VecZnxToBackendRef<BE>`
- `VecZnxToBackendMut<BE>`
- `VecZnxDftToBackendRef<BE>`
- `VecZnxDftToBackendMut<BE>`
- `SvpPPolToBackendRef<BE>`
- `SvpPPolToBackendMut<BE>`
- `VmpPMatToBackendRef<BE>`
- `VmpPMatToBackendMut<BE>`
- etc...

Host-visible code should construct `HostBytesBackend` views directly, either through backend-native `*ToBackendRef/*ToBackendMut` impls or the small `*_host_backend_ref/mut` helpers used by shared host utilities. Generic HAL compute code should still be written against backend views, not raw host slices.

#### Core Layouts

- `Module`: stores backend-specific precomputations such as DFT tables and handles.
- `ScalarZnx`: front-end scalar polynomial layout, mainly used for secrets and small plaintexts. Generic code typically consumes it through `ScalarZnxToBackendRef<BE>` / `ScalarZnxToBackendMut<BE>`.
- `VecZnx`: front-end vector-of-polynomials layout used for LWE/GLWE plaintexts and ciphertexts. Precision is represented by limbs in base `2^k`. Generic execution uses `VecZnxBackendRef` / `VecZnxBackendMut` via `VecZnxToBackendRef<BE>` / `VecZnxToBackendMut<BE>`.
- `MatZnx`: front-end matrix-of-polynomials layout, used for GGLWE and GGSW-style objects. Generic backends consume it through `MatZnxToBackendRef<BE>` / `MatZnxToBackendMut<BE>`.
- `VecZnxDft`: backend-specific prepared-domain representation of `VecZnx`. Its storage layout is backend-defined.
- `VecZnxBig`: backend-specific big-coefficient representation, typically used after multiplication or convolution and later normalized back into `VecZnx`.
- `SvpPPol`: backend-specific prepared form of `ScalarZnx` for scalar-vector products.
- `VmpPMat`: backend-specific prepared form of `MatZnx` for vector-matrix products.
- `ScratchArena`: backend-native scratch view over a `ScratchOwned` buffer, used to carve typed temporary storage during execution.

---------

### **poulpy-hal/api**

This module provides the user-facing traits-based API of the hardware acceleration layer. These are the traits used to implement **`poulpy-core`**, **`poulpy-ckks`**, **`poulpy-bin-fhe`**, and any other crate built on Poulpy. These currently include the `module` instantiation, arithmetic over `vec_znx`, `vec_znx_big`, `vec_znx_dft`, `svp_ppol`, `vmp_pmat` and scratch space management.

At this layer, APIs are expected to be backend-generic. In practice that means:

- inputs and outputs are described via `*ToBackendRef` / `*ToBackendMut`
- prepared-domain objects (`VecZnxDft`, `SvpPPol`, `VmpPMat`, convolution prepared types) are treated as opaque backend-owned storage
- host-visible byte access is only required for explicitly host-side operations such as serialization, encoding, stats, or test/reference paths


---------

### **poulpy-hal/oep**

This module provides open extension points that can be implemented to provide a concrete backend to any crate built on **`poulpy-hal/api`** and **`poulpy-hal/layouts`** — including **`poulpy-core`**, **`poulpy-ckks`**, **`poulpy-bin-fhe`**, or any external project. Poulpy-HAL itself is dispatch-only: portable default implementations live in `poulpy-cpu-ref`, and accelerated backends (e.g. `poulpy-cpu-avx`) selectively override hot paths while inheriting everything else.


---------

### **poulpy-hal/delegates**

This module provides a link between the open extension points and public API, forwarding trait calls on `Module<BE>` to the matching per-family OEP trait implemented by `BE` (for example `HalVecZnxImpl<BE>`, `HalVmpImpl<BE>`, or `HalConvolutionImpl<BE>`).


---------

### Pipeline Example

```mermaid
flowchart TD
    A[VecZnx] -->|DFT|B[VecZnxDft]-->E
    C[ScalarZnx] -->|prepare|D[SvpPPol]-->E
    E{SvpApply}-->VecZnxDft-->|IDFT|VecZnxBig-->|Normalize|VecZnx
```

### E2E Dispatch Example

User-facing backend-native call:

```rust,ignore
use poulpy_hal::{
    api::VecZnxAddIntoBackend,
    layouts::{Module, VecZnxBackendMut, VecZnxBackendRef},
};
use poulpy_cpu_avx::FFT64Avx;

let module = Module::<FFT64Avx>::new(1 << 12);
module.vec_znx_add_into_backend(&mut res, 0, &a, 0, &b, 0);
```

Delegate in `poulpy-hal`:

```rust
impl<BE> VecZnxAddIntoBackend<BE> for Module<BE>
where
    BE: Backend + HalVecZnxImpl<BE>,
{
    fn vec_znx_add_into_backend(
        &self,
        res: &mut VecZnxBackendMut<'_, BE>,
        res_col: usize,
        a: &VecZnxBackendRef<'_, BE>,
        a_col: usize,
        b: &VecZnxBackendRef<'_, BE>,
        b_col: usize,
    ) {
        BE::vec_znx_add_into_backend(self, res, res_col, a, a_col, b, b_col)
    }
}
```

Backend implementation (AVX keeps defaults unless it overrides):

```rust
unsafe impl HalVecZnxImpl<FFT64Avx> for FFT64Avx {
    poulpy_cpu_ref::hal_impl_vec_znx!();
}
```

Default in `poulpy-cpu-ref`:

```rust
pub trait HalVecZnxDefault<BE: Backend>: Backend {
    fn vec_znx_add_into_backend_default(
        module: &Module<BE>,
        res: &mut VecZnxBackendMut<'_, BE>,
        res_col: usize,
        a: &VecZnxBackendRef<'_, BE>,
        a_col: usize,
        b: &VecZnxBackendRef<'_, BE>,
        b_col: usize,
    )
    where
        BE: ZnxAdd + ZnxCopy + ZnxZero,
    {
        vec_znx_add_into::<BE>(res, res_col, a, a_col, b, b_col);
    }
}
```

### Host Views vs Backend Views

As a rule of thumb:

- use `*ToBackendRef` / `*ToBackendMut` in public HAL-facing compute APIs, including when the backend itself is host-resident
- treat `to_ref()` / `to_mut()` as host-view escape hatches, not as the normal API for generic backend code

Examples of legitimate host-side use:

- serialization and deserialization
- encoding / decoding helpers
- reference arithmetic that directly manipulates `&[i64]`
- tests that compare host materialized values

Interfacing a device backend with the host should happen through backend transfer hooks such as `from_host_bytes`, `to_host_bytes`, `copy_from_host`, and `copy_to_host`, or through higher-level `upload_*` / `download_*` APIs built on top of them.

Examples of backend-native use:

- `VecZnx -> VecZnxDft`
- `ScalarZnx -> SvpPPol`
- `MatZnx -> VmpPMat`
- pointwise ops in prepared domains
- backend scratch allocation and subview carving

### Backend Interoperability

Backends are also expected to define how values move between host memory and backend-owned storage.

At the raw buffer level, every backend implements:

- `Backend::from_host_bytes`
- `Backend::to_host_bytes`
- `Backend::copy_from_host`
- `Backend::copy_to_host`

These are the fundamental upload/download hooks used to move layout storage across the host/backend boundary. For example:

```rust
let gpu_buf = CudaBackend::from_host_bytes(host_bytes);
let roundtrip = CudaBackend::to_host_bytes(&gpu_buf);
```

For cross-backend buffer transfer, `poulpy-hal` provides `TransferFrom<From>`. This is destination-owned: the destination backend declares how to import a source backend buffer.

```rust
pub trait TransferFrom<From: Backend>: Backend {
    fn transfer_buf(src: &From::OwnedBuf) -> Self::OwnedBuf;
}
```

The default implementation only covers simple host-resident `Vec<u8>` backends. Device backends are expected to add explicit impls for the source backends they support.

At the structured layout level, the canonical `upload_*` / `download_*` APIs live one layer above, in `poulpy-core::api::ModuleTransfer`. Those methods are built on top of `TransferFrom` and let modules move typed values such as `GLWE`, `LWE`, `GGLWE`, `GGSW`, and prepared keys between backends.

In practice:

- use `from_host_bytes` / `to_host_bytes` when you need a low-level buffer bridge
- use `TransferFrom` when implementing backend-to-backend storage movement
- use `ModuleTransfer::upload_*` / `download_*` in higher-level code that moves full typed objects between backends

## Tests

A fully generic cross-backend test suite is available in [`src/test_suite`](./src/test_suite).