Raw C-ABI FFI surface for the vendored FlashInfer inference kernels.
baracuda-flashinfer wraps this with a safe, typed API. Use this
crate directly only if you need a function that the safe layer
hasn’t wrapped yet (in which case please file a bug).
FlashInfer (flashinfer-ai/flashinfer, Apache-2.0) is header-only /
template-heavy, so there is no shared library to dynamically load.
Instead, baracuda compiles thin C-ABI launcher shims around the
vendored FlashInfer headers inside baracuda-kernels-sys; this
crate re-exports those extern "C" symbols under a dedicated crate
name so downstream code can depend on the FlashInfer FFI surface
without pulling the whole kernels-sys symbol table into scope.
Almost all callers should prefer the safe, typed wrappers in
baracuda-flashinfer (the sibling crate) over these raw symbols.
§Feature gating
Every symbol is behind the flashinfer cargo feature (OFF by
default), which transitively enables baracuda-kernels-sys/flashinfer
and compiles the vendored launcher .cu files. With the feature
off, this crate is empty.
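Because the crate is empty when the feature is off, downstream code that references these symbols has to be gated as well. The following is a minimal sketch of one way to do that, assuming the downstream crate declares its own `flashinfer` feature and forwards it to this crate in its manifest. The function names `flashinfer_enabled` and `backend_name` are hypothetical and are not part of this crate.

```rust
/// Reports whether this build was compiled with the `flashinfer` feature.
/// Hypothetical helper for a downstream crate; not part of the FFI surface.
pub fn flashinfer_enabled() -> bool {
    // `cfg!` evaluates to a compile-time boolean constant.
    cfg!(feature = "flashinfer")
}

/// Compiled only when the feature is on; this is where calls into the
/// raw FFI symbols (or the safe wrappers) would live.
#[cfg(feature = "flashinfer")]
pub fn backend_name() -> &'static str {
    "flashinfer"
}

/// Compiled only when the feature is off, so the crate still builds
/// without referencing any of the gated symbols.
#[cfg(not(feature = "flashinfer"))]
pub fn backend_name() -> &'static str {
    "fallback"
}
```

Using `#[cfg(...)]` on whole items, rather than `cfg!` inside a function body, keeps references to the gated symbols out of the build entirely when the feature is disabled.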
§Symbol families
- `*_paged_decode_*`: batched paged-KV decode (`BatchDecodeWithPagedKVCacheDispatched`). f16 / bf16 / f32.
- `*_paged_kv_append_decode_*`: decode-time KV-cache append.
- `*_merge_state_in_place_*` / `*_merge_states_*`: cascade / prefix-cache LSE-aware attention-state merge.
- `*_top_k_sampling_*` / `*_top_p_sampling_*` / `*_min_p_sampling_*` / `*_top_k_top_p_sampling_*`: sort-free sampling from a row-normalized probability tensor.
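To make "LSE-aware attention-state merge" concrete, here is a CPU reference sketch of the underlying math for one head and two partial states. It is illustrative only: the name `merge_state` and its signature are assumptions, it uses natural-log LSE, and it does not reflect the actual FFI signatures, memory layout, or log base used by the kernels. Each partial state is a pair of an attention output `v` computed over a subset of keys and the log-sum-exp `lse` of the attention logits over that subset.

```rust
/// Merge two partial attention states (hypothetical CPU reference).
///
/// `v_a` / `v_b`: softmax-weighted outputs over two disjoint key subsets.
/// `lse_a` / `lse_b`: natural-log log-sum-exp of the logits in each subset.
/// Returns the output and LSE over the union of the two subsets.
/// Assumes at least one LSE is finite; the all-empty case is not handled.
pub fn merge_state(v_a: &[f32], lse_a: f32, v_b: &[f32], lse_b: f32) -> (Vec<f32>, f32) {
    assert_eq!(v_a.len(), v_b.len(), "head dimensions must match");

    // Subtract the larger LSE before exponentiating for numerical stability.
    let m = lse_a.max(lse_b);
    let w_a = (lse_a - m).exp();
    let w_b = (lse_b - m).exp();
    let denom = w_a + w_b;

    // Each output is weighted by its subset's share of the softmax mass.
    let v = v_a
        .iter()
        .zip(v_b)
        .map(|(a, b)| (a * w_a + b * w_b) / denom)
        .collect();

    // log(exp(lse_a) + exp(lse_b)), computed stably.
    (v, m + denom.ln())
}
```

With equal LSEs the two outputs are averaged and the merged LSE grows by `ln 2`; when one LSE is much larger, the merged output is dominated by that state. The operation is associative, which is what allows shared-prefix and per-request attention results to be combined in a cascade.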