gllm-kernels 0.1.1

Low-level attention kernels for gllm with CUDA/ROCm support

There is currently very little structured metadata to build this page from. Check the main library docs, README, or Cargo.toml in case the author documented the features there.

This version has 18 feature flags, 2 of which are enabled by default.
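Since two flags are on by default, a downstream crate that wants a leaner or different build has to opt out explicitly. A minimal Cargo.toml sketch using standard Cargo feature syntax (the flag names come from the list below; which combinations are actually sensible is not documented):

```toml
[dependencies]
# Default build: equivalent to features = ["cpu", "burn-ndarray"].
# gllm-kernels = "0.1.1"

# Opt out of the defaults and select features explicitly:
gllm-kernels = { version = "0.1.1", default-features = false, features = ["cpu"] }
```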

default (enables cpu and burn-ndarray)
cpu (default)
burn-ndarray (default)
burn-cuda
burn-fusion
burn-wgpu
cuda
cuda-kernel
cudarc
flash-attention-v3
flash-attention-v3-async (enables no additional features)
flash-attention-v3-block-quant (enables no additional features)
flash-attention-v3-fp8 (enables no additional features)
flash-attention-v3-wgmma (enables no additional features)
fusion
half
nccl
rocm-kernel
wgpu
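Combining the GPU-related flags follows the same pattern. A hedged sketch for a CUDA build with the FlashAttention-3 kernels (illustrative only: the crate does not document which feature combinations are valid, and the role of each flag is inferred from its name):

```toml
[dependencies]
gllm-kernels = { version = "0.1.1", default-features = false, features = [
    "cuda",               # CUDA backend support
    "cuda-kernel",        # presumably the crate's hand-written CUDA kernels
    "flash-attention-v3", # FlashAttention-3 attention kernels
    "half",               # presumably fp16/bf16 types via the half crate
] }
```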