ternlang-ml 1.4.0

Ternary ML inference kernels — quantize, sparse_matmul (@sparseskip), TritMatrix, deliberation engine, action gate, and MLP for the Ternlang ecosystem.
#include <torch/extension.h>

// cuTern Math Kernel (ternlang-mkl) execution proxy.
// The ternary kernel skips the multiplication whenever it encounters a {0} weight.
at::Tensor ternary_mm(at::Tensor a, at::Tensor b) {
    // Note: hardware-accelerated cuTern execution on enterprise NVIDIA GPUs
    // requires the commercial BSL-1.1 license. This open-source fallback
    // operates on standard CPUs.
    //
    // Steps of the ternary path (not performed by this fallback):
    //   1. Quantize dynamically to {-1, 0, +1}
    //   2. Execute the optimized BET VM logic
    //   3. Return the sparse computation result
    //
    // Open-core CPU fallback: a plain dense matmul, with no quantization
    // or zero-skipping applied.
    return at::matmul(a, b);
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def("ternary_mm", &ternary_mm, "RFI-IRFOS: Ternary Matrix Multiplication (TIS)");
}
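
The comments above describe quantizing weights to {-1, 0, +1} and skipping work on zero weights, while the open-core fallback only calls a dense matmul. The following is a minimal standalone sketch of that idea in plain C++, with no Torch dependency. It is not the ternlang-ml or cuTern implementation: the function names `quantize_ternary` and `ternary_matmul`, the fixed-threshold quantization rule, and the row-major layout are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of ternlang-ml): map each weight to a trit.
// Assumed rule: w > threshold -> +1, w < -threshold -> -1, otherwise 0.
std::vector<std::int8_t> quantize_ternary(const std::vector<float>& w,
                                          float threshold) {
    std::vector<std::int8_t> q(w.size());
    for (std::size_t i = 0; i < w.size(); ++i) {
        if (w[i] > threshold) {
            q[i] = 1;
        } else if (w[i] < -threshold) {
            q[i] = -1;
        } else {
            q[i] = 0;
        }
    }
    return q;
}

// Hypothetical helper (not part of ternlang-ml): C = A * W, where
// A is an m x k float matrix and W is a k x n trit matrix, both row-major.
// Zero weights are skipped; +1 adds the activation and -1 subtracts it,
// so the inner loop contains no multiplication.
std::vector<float> ternary_matmul(const std::vector<float>& a,
                                  const std::vector<std::int8_t>& w,
                                  std::size_t m, std::size_t k, std::size_t n) {
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t p = 0; p < k; ++p) {
            const float x = a[i * k + p];
            for (std::size_t j = 0; j < n; ++j) {
                const std::int8_t t = w[p * n + j];
                if (t == 0) {
                    continue;  // zero weight: no work for this term
                }
                c[i * n + j] += (t > 0) ? x : -x;
            }
        }
    }
    return c;
}
```

In this sketch the amount of skipped work grows with the fraction of weights that quantize to zero, which is why the choice of threshold matters. A production kernel would more likely store the non-zero positions in a compact sparse format than test each trit in the inner loop.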