Raw FFI + dynamic loader skeleton for NVIDIA cuTENSOR.
cuTENSOR is a separately installed NVIDIA library for high-performance tensor contraction, reduction, and element-wise operations. v0.1 ships only the loader and the status enum; concrete contraction, permutation, and reduction wrappers will follow once CI has a cuTENSOR install to test against.
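
As a rough illustration of what "loader + status enum" can look like, the sketch below maps raw status codes to a Rust enum and lists shared-library file names a loader might probe. It is not this crate's actual API. The names `Status`, `StatusError`, `check`, and `candidate_library_names` are hypothetical, the numeric values are assumed to match `cutensorStatus_t` and should be verified against the installed cuTENSOR header, and the library file names are guesses that depend on platform and cuTENSOR version.

```rust
// Hypothetical sketch; none of these names are taken from this crate.

/// Subset of `cutensorStatus_t`. Numeric values are ASSUMED to match the
/// cuTENSOR header; verify them against the header you build against.
#[repr(i32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Success = 0,
    NotInitialized = 1,
    AllocFailed = 3,
    InvalidValue = 7,
    NotSupported = 15,
}

/// Error side of a status check: either a code we know, or the raw integer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StatusError {
    Known(Status),
    Unknown(i32),
}

impl Status {
    /// Convert a raw status code; returns `None` for codes not listed above.
    pub fn from_raw(raw: i32) -> Option<Status> {
        match raw {
            0 => Some(Status::Success),
            1 => Some(Status::NotInitialized),
            3 => Some(Status::AllocFailed),
            7 => Some(Status::InvalidValue),
            15 => Some(Status::NotSupported),
            _ => None,
        }
    }
}

/// Turn a raw status returned by an FFI call into a `Result`.
pub fn check(raw: i32) -> Result<(), StatusError> {
    match Status::from_raw(raw) {
        Some(Status::Success) => Ok(()),
        Some(other) => Err(StatusError::Known(other)),
        None => Err(StatusError::Unknown(raw)),
    }
}

/// File names a dynamic loader could try, in order. These are ASSUMED
/// names; the real soname depends on the installed cuTENSOR version.
pub fn candidate_library_names() -> &'static [&'static str] {
    if cfg!(target_os = "windows") {
        &["cutensor.dll"]
    } else {
        &["libcutensor.so.2", "libcutensor.so.1", "libcutensor.so"]
    }
}
```

Keeping unknown codes as `StatusError::Unknown` rather than panicking lets the bindings stay usable if a newer cuTENSOR release adds status values. A real loader would pass each candidate name to the platform's dynamic-loading facility and stop at the first one that opens.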