
Crate baracuda_flashinfer_sys


Raw C-ABI FFI surface for the vendored FlashInfer inference kernels.

baracuda-flashinfer wraps this with a safe, typed API. Use this crate directly only if you need a function that the safe layer hasn’t wrapped yet (in which case please file a bug).

FlashInfer (flashinfer-ai/flashinfer, Apache-2.0) is header-only and template-heavy, so there is no shared library to load dynamically. Instead, baracuda compiles thin C-ABI launcher shims around the vendored FlashInfer headers inside baracuda-kernels-sys. This crate re-exports those extern "C" symbols under a dedicated crate name, so downstream code can depend on the FlashInfer FFI surface without pulling the whole kernels-sys symbol table into scope.

Almost all callers should prefer the safe, typed wrappers in the sibling crate baracuda-flashinfer over these raw symbols.

§Feature gating

Every symbol is behind the flashinfer cargo feature (OFF by default), which transitively enables baracuda-kernels-sys/flashinfer and compiles the vendored launcher .cu files. With the feature off, this crate is empty.
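As a rough illustration of what this gating means for a downstream crate, the sketch below selects a code path at compile time. It assumes the downstream crate declares its own cargo feature, also named flashinfer, that forwards to this crate's feature. The function names flashinfer_enabled and backend_name are hypothetical and are not part of this crate.

```rust
/// Hypothetical helper: reports whether this build was compiled with a
/// `flashinfer` cargo feature (assumed to forward to this crate's feature).
pub fn flashinfer_enabled() -> bool {
    cfg!(feature = "flashinfer")
}

/// Compiled only when the feature is on; a real implementation would call
/// into the FlashInfer-backed code path here.
#[cfg(feature = "flashinfer")]
pub fn backend_name() -> &'static str {
    "flashinfer"
}

/// Compiled only when the feature is off, so the crate still builds on
/// machines without a CUDA toolchain.
#[cfg(not(feature = "flashinfer"))]
pub fn backend_name() -> &'static str {
    "fallback"
}
```

Because exactly one backend_name definition survives compilation, callers never need a runtime check to find out whether the raw symbols exist.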

§Symbol families

  • *_paged_decode_* — batched paged-KV decode (BatchDecodeWithPagedKVCacheDispatched). f16 / bf16 / f32.
  • *_paged_kv_append_decode_* — decode-time KV-cache append.
  • *_merge_state_in_place_* / *_merge_states_* — cascade / prefix-cache LSE-aware attention-state merge.
  • *_top_k_sampling_* / *_top_p_sampling_* / *_min_p_sampling_* / *_top_k_top_p_sampling_* — sort-free sampling from a row-normalized probability tensor.
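To make the "LSE-aware attention-state merge" concrete, here is a minimal CPU reference sketch of the underlying math for a single query and head. It is not the crate's API: AttnState, attend, and merge_states are hypothetical names, and the sketch assumes the log-sum-exp (LSE) is stored in natural-log units, which may differ from the convention the kernels use.

```rust
/// Hypothetical type: one partial attention result, i.e. the softmax-weighted
/// value vector plus the log-sum-exp (LSE) of the scores it covered.
#[derive(Debug, Clone, PartialEq)]
pub struct AttnState {
    pub v: Vec<f32>,
    pub lse: f32,
}

/// Reference softmax attention over one chunk of scores and value rows.
pub fn attend(scores: &[f32], values: &[Vec<f32>]) -> AttnState {
    assert_eq!(scores.len(), values.len(), "one value row per score");
    assert!(!scores.is_empty(), "chunk must be non-empty");
    // Subtract the maximum score before exponentiating for stability.
    let m = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let weights: Vec<f32> = scores.iter().map(|s| (s - m).exp()).collect();
    let denom: f32 = weights.iter().sum();
    let mut v = vec![0.0f32; values[0].len()];
    for (w, row) in weights.iter().zip(values) {
        for (acc, x) in v.iter_mut().zip(row) {
            *acc += w * x / denom;
        }
    }
    AttnState { v, lse: m + denom.ln() }
}

/// Merges two partial states computed over disjoint key/value chunks.
/// Each state is re-weighted by exp(lse - max_lse), so the result matches
/// attention over the concatenated chunks. Assumes natural-log LSE and at
/// least one finite LSE (two -inf inputs would produce NaN).
pub fn merge_states(a: &AttnState, b: &AttnState) -> AttnState {
    assert_eq!(a.v.len(), b.v.len(), "value vectors must have equal length");
    let m = a.lse.max(b.lse);
    let wa = (a.lse - m).exp();
    let wb = (b.lse - m).exp();
    let denom = wa + wb;
    let v = a
        .v
        .iter()
        .zip(&b.v)
        .map(|(x, y)| (wa * x + wb * y) / denom)
        .collect();
    AttnState { v, lse: m + denom.ln() }
}
```

Merging the states of two chunks gives the same result, up to floating-point error, as calling attend once over all keys. That property is what lets a shared prefix be attended once and then combined with per-request suffixes in cascade or prefix-cache setups.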