cu-embed 0.1.0

Compile CUDA kernels with nvcc, embed cubin/PTX artifacts, and load the best module at runtime.
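
The description does not say how the "best" module is chosen. One plausible reading is sketched below in Rust; it is not taken from cu-embed's source, and every name in it (`Kind`, `Artifact`, `select_best`) is hypothetical. The sketch relies on two general CUDA compatibility rules: a cubin built for compute capability X.y runs on devices of capability X.z with z >= y, and PTX built for a given capability can be JIT-compiled on any device of equal or higher capability. It therefore prefers the newest compatible cubin and falls back to the newest compatible PTX.

```rust
/// Kind of embedded artifact. Hypothetical type, not cu-embed's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Cubin,
    Ptx,
}

/// One embedded build product. Hypothetical type, not cu-embed's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Artifact {
    kind: Kind,
    /// Compute capability encoded as major * 10 + minor (86 means sm_86 / compute_86).
    arch: u32,
    /// Assumption: in a real build these bytes would come from nvcc output,
    /// for example via `include_bytes!`.
    bytes: &'static [u8],
}

/// Pick the artifact to load for a device with compute capability `device_cc`
/// (same major * 10 + minor encoding). Returns `None` if nothing is compatible.
fn select_best(artifacts: &[Artifact], device_cc: u32) -> Option<&Artifact> {
    // A cubin is usable only within the same major version and at a minor
    // version no newer than the device's.
    let best_cubin = artifacts
        .iter()
        .filter(|a| {
            a.kind == Kind::Cubin && a.arch / 10 == device_cc / 10 && a.arch <= device_cc
        })
        .max_by_key(|a| a.arch);

    // Otherwise fall back to the newest PTX the driver can JIT for this device.
    best_cubin.or_else(|| {
        artifacts
            .iter()
            .filter(|a| a.kind == Kind::Ptx && a.arch <= device_cc)
            .max_by_key(|a| a.arch)
    })
}
```

With cubins for sm_80 and sm_86 plus PTX for compute_70 embedded, this policy picks the sm_86 cubin on an 8.9 device, falls back to the PTX on a 9.0 device, and returns `None` on a 6.1 device. Preferring cubin avoids JIT compilation at load time; keeping PTX alongside it gives forward compatibility with GPUs released after the build.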