baracuda-kernels 0.0.1-alpha.68

Unified ML op facade for the baracuda CUDA ecosystem. Exposes every primitive an ML framework would expect (the union of PyTorch torch.* + nn.functional and JAX lax.* / NumPy ops) through a single Plan-based Rust surface, dispatching internally to baracuda-cutlass, the baracuda-* NVIDIA-library wrappers, or bespoke baracuda-kernels-sys kernels.

Very little structured metadata is currently available to build this page from. Check the main library docs, README, or Cargo.toml in case the author documented the features there.

This version has 19 feature flags, 2 of them enabled by default.

default

fa2 (default)

sm80 (default)

awq

bnb_nf4

cudnn

flashinfer

mamba

marlin

megatron_tp

mhc

nvshmem

optim

ozimmu

ring_attention

sm89

sm90a

tensor_engine

xformers_blocksparse

xformers_sparse24
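
Since the page lists feature names without describing them, a short Cargo.toml sketch may help show how they are selected. The feature names and version below come from the list above. The particular combinations are assumptions chosen only for illustration: nothing on this page says which features are meant to be used together, or whether sm89 or sm90a replaces the default sm80.

```toml
[dependencies]
# Keeps the two default features (fa2, sm80) and adds two optional ones.
# The choice of cudnn and optim is illustrative, not a recommendation.
baracuda-kernels = { version = "0.0.1-alpha.68", features = ["cudnn", "optim"] }

# Alternative (use instead of the line above, not alongside it):
# turn off the defaults and name every wanted feature explicitly.
# Whether the crate builds without sm80 or fa2 is an assumption to verify.
# baracuda-kernels = { version = "0.0.1-alpha.68", default-features = false, features = ["sm90a", "fa2"] }
```

The same selection can be made from the command line with cargo add baracuda-kernels@0.0.1-alpha.68 --features cudnn,optim. Because this is a pre-release version, Cargo only resolves it when the version requirement itself names a pre-release, as it does here.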