Crate dsfb_gpu_debug_cuda

CUDA dispatch and FFI bridge for dsfb-gpu-debug — the CUDA Evidence Factory for the DSFB-GPU densorial / tekmeric inference stack.

Front-door identity (panel-locked):

DSFB-GPU is a CUDA-accelerated deterministic evidence court: byte-exact witness-family kernels shade residual densors into canonical evidence bytes, then a CPU-side jurisprudence layer admits, challenges, contraindicates, and records them into replayable case files.

This crate is the GPU half of that posture. It is not a neural inference backend. It executes deterministic witness-family kernels over residual densors and returns canonical witness bytes, candidate summaries, and stage digests to the CPU-side court (the dsfb-gpu-debug-core and dsfb-gpu-atlas-corpus crates).

The GPU has no semantic authority. It does not admit episodes or assign final meaning. It produces byte-exact evidence under a declared numeric, boundary, reduction, parameter, and hashing contract.

residual densors
  → CUDA deterministic witness families
  → witness densors / candidate summaries / stage digests
  → CPU court admission
  → replayable case file
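As a minimal sketch of why the reduction order has to be part of a declared contract: `f32` addition is not associative, so two summation orders over the same values can produce different bytes. The function names below (`reduce_in_order`, `witness_bytes`, `byte_exact`) are hypothetical and are not taken from this crate. The little-endian encoding is likewise an assumption made for illustration.

```rust
/// Sum strictly left to right. Because f32 addition is not associative,
/// the traversal order is part of the contract (illustrative only).
fn reduce_in_order(values: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for &v in values {
        acc += v;
    }
    acc
}

/// Hypothetical canonical encoding: the reduced value as little-endian bytes.
fn witness_bytes(values: &[f32]) -> [u8; 4] {
    reduce_in_order(values).to_le_bytes()
}

/// Parity holds only when the two byte strings are identical.
fn byte_exact(cpu: &[u8], gpu: &[u8]) -> bool {
    cpu == gpu
}
```

With these definitions, `witness_bytes(&[1.0e8, -1.0e8, 1.0])` encodes `1.0`, while `witness_bytes(&[1.0e8, 1.0, -1.0e8])` encodes `0.0`, because `1.0` is below the rounding granularity of `f32` near `1.0e8`. A CPU path and a GPU path can only be compared byte for byte if both follow the same declared order.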

Panel-locked non-claims: this crate does NOT claim peak memory-bandwidth saturation, optimal kernel occupancy, optimal multi-GPU scaling, or production CUDA performance. Those are explicit future performance-campaign targets. Current artifacts establish the deterministic-evidence-factory shape, the byte-exact CPU/GPU equivalence, and the Semantic Non-Bypass Axiom (the bank stage stays CPU-side; episodes only enter the case file through the bank-private admission token).
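One way a bank-private admission token could be expressed in safe Rust is a type whose fields are private to the bank module, so no other code can construct it. The sketch below is an assumption about the shape of such a design, not the crate's actual types: `bank`, `Episode`, `AdmissionToken`, `admit`, and `CaseFile` are hypothetical names, and the admission rule is a placeholder.

```rust
mod bank {
    /// An episode the bank has examined (fields are illustrative).
    pub struct Episode {
        pub witness: Vec<u8>,
    }

    /// Hypothetical bank-private token. Its field is private, so code
    /// outside this module cannot construct one.
    pub struct AdmissionToken {
        episode: Episode,
    }

    impl AdmissionToken {
        /// Consume the token and release the episode it vouches for.
        pub fn into_episode(self) -> Episode {
            self.episode
        }
    }

    /// Placeholder admission rule: only non-empty witness bytes are admitted.
    pub fn admit(witness: &[u8]) -> Option<AdmissionToken> {
        if witness.is_empty() {
            None
        } else {
            Some(AdmissionToken {
                episode: Episode { witness: witness.to_vec() },
            })
        }
    }
}

/// Hypothetical case file: the only way to add an episode is with a token.
pub struct CaseFile {
    episodes: Vec<bank::Episode>,
}

impl CaseFile {
    pub fn new() -> Self {
        CaseFile { episodes: Vec::new() }
    }

    /// Requires a token, which only `bank::admit` can produce.
    pub fn record(&mut self, token: bank::AdmissionToken) {
        self.episodes.push(token.into_episode());
    }

    pub fn len(&self) -> usize {
        self.episodes.len()
    }
}
```

In this sketch the token owns the episode, so a token obtained for one episode cannot be paired with a different one. GPU-side code can hand witness bytes to `bank::admit`, but it cannot write to the case file directly.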

Build behavior is split by the cuda feature flag:

- Without cuda: the crate compiles to a thin shim. Every public entry point returns GpuError::CudaUnavailable. This lets the rest of the workspace build and run on hosts without nvcc installed, which is important because the CPU reference path is the v0 reproducibility target; the GPU path is a parity check.
- With cuda: build.rs invokes nvcc to compile the kernels under cuda/kernels.cu into a static archive, links it, and the dispatch module wires Rust callers to the C ABI functions exposed in ffi.rs.
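The shim side of this split can be sketched as follows. This is a hedged illustration, not the crate's source: the signature of `build_gpu`, the single-variant `GpuError`, and the `Parity` and `parity_check` names are assumptions. The real `GpuError` may carry more variants, and the real entry point may take different arguments.

```rust
/// Illustrative error type; only the variant named in the docs is shown.
#[derive(Debug, PartialEq, Eq)]
pub enum GpuError {
    CudaUnavailable,
}

/// Shim used when the `cuda` feature is off: it never touches a GPU.
/// A `cuda` build would supply a real implementation in its place.
#[cfg(not(feature = "cuda"))]
pub fn build_gpu(_residuals: &[f32]) -> Result<Vec<u8>, GpuError> {
    Err(GpuError::CudaUnavailable)
}

/// Hypothetical outcome of comparing the GPU path to the CPU reference.
#[derive(Debug, PartialEq, Eq)]
pub enum Parity {
    Matched,
    Mismatched,
    Skipped,
}

/// Treat the CPU bytes as the reference and the GPU path as a parity check.
/// On hosts without CUDA the check is skipped instead of failing.
pub fn parity_check(cpu_bytes: &[u8], residuals: &[f32]) -> Parity {
    match build_gpu(residuals) {
        Ok(gpu_bytes) if gpu_bytes.as_slice() == cpu_bytes => Parity::Matched,
        Ok(_) => Parity::Mismatched,
        Err(GpuError::CudaUnavailable) => Parity::Skipped,
    }
}
```

A caller written this way builds and runs on a host without nvcc, where `parity_check` reports `Parity::Skipped` and the CPU reference result stands on its own.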

unsafe is restricted to the CUDA FFI boundary and the resource-lifetime wrappers that own raw device handles: the kernel-dispatch calls in dispatch, the workspace allocate / free / readback paths in workspace, the cudaMallocHost / cudaFreeHost owning handle in pinned, and the stream / graph handle plumbing the throughput path needs. All semantic code (case-file construction, hash chain, bank admission) remains safe Rust. The unsafe surface is small, auditable, and never reaches across module boundaries; each unsafe block sits next to a comment naming the invariant it carries.
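The owning-handle pattern described above can be sketched without a CUDA toolchain. The example below swaps in `std::alloc::alloc_zeroed` and `std::alloc::dealloc` as stand-ins for `cudaMallocHost` and `cudaFreeHost`, so it is not pinned memory and not the crate's `pinned` module. `HostBuf` is a hypothetical name. It shows the shape only: one owner, one free in `Drop`, and a comment beside each unsafe block naming its invariant.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::ptr::NonNull;

/// Hypothetical owning handle for a raw host allocation.
pub struct HostBuf {
    ptr: NonNull<u8>,
    layout: Layout,
}

impl HostBuf {
    /// Allocate `len` zeroed bytes; returns None for len == 0 or on failure.
    pub fn new(len: usize) -> Option<Self> {
        if len == 0 {
            return None;
        }
        let layout = Layout::array::<u8>(len).ok()?;
        // SAFETY: `layout` has non-zero size because `len > 0` was checked.
        let raw = unsafe { alloc_zeroed(layout) };
        NonNull::new(raw).map(|ptr| HostBuf { ptr, layout })
    }

    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` is valid for `layout.size()` initialized bytes and
        // stays alive for as long as `self` is borrowed.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.layout.size()) }
    }

    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: same validity as `as_slice`; `&mut self` guarantees
        // exclusive access to the allocation.
        unsafe { std::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl Drop for HostBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from `alloc_zeroed` with exactly this layout
        // and is freed once, here.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
    }
}
```

Callers see only safe methods (`new`, `as_slice`, `as_mut_slice`). The raw pointer never leaves the type, which keeps the unsafe surface confined to one module.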

Enums

GpuError: Errors that the GPU pipeline can surface.

Functions

build_gpu: Stub for non-CUDA builds. Always returns GpuError::CudaUnavailable.
pipeline_available: Placeholder entry point retained for the workspace smoke build.