pub struct KernelDispatcher { /* private fields */ }
Dispatches kernel calls to the best available implementation.
Uses scirs2_core::simd::detect::CpuFeatures for CPU feature
detection, ensuring consistent SIMD dispatch across the COOLJAPAN ecosystem.
Implementations
impl KernelDispatcher
pub fn auto_detect() -> Self
Create a dispatcher that auto-detects the best available kernel tier.
Queries SciRS2-Core’s cached CpuFeatures to determine the
optimal tier for the current CPU.
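Conceptually, the auto-detection resembles the following standalone sketch. It uses the standard library's runtime feature-detection macros rather than SciRS2-Core's cached CpuFeatures, and the KernelTier variant names here are assumptions for illustration:

```rust
// Standalone sketch of tier auto-detection. The real dispatcher queries
// scirs2_core::simd::detect::CpuFeatures; this version uses std's
// feature-detection macros directly. Variant names are hypothetical.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum KernelTier {
    Avx512,
    Avx2,
    Neon,
    Scalar,
}

fn detect_tier() -> KernelTier {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx512f") {
            return KernelTier::Avx512;
        }
        if is_x86_feature_detected!("avx2") {
            return KernelTier::Avx2;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return KernelTier::Neon;
        }
    }
    KernelTier::Scalar
}

fn main() {
    println!("selected tier: {:?}", detect_tier());
}
```

Detection runs once at dispatcher construction, so the per-call dispatch is a cheap branch on the stored tier rather than a repeated CPUID query.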
pub fn with_tier(tier: KernelTier) -> Self
Create a dispatcher with a specific tier (for testing/benchmarks).
pub fn tier(&self) -> KernelTier
Get the selected kernel tier.
Trait Implementations
impl Debug for KernelDispatcher
impl Fp8Kernel for KernelDispatcher
fn dequant_fp8_e4m3(
    &self,
    blocks: &[BlockFP8E4M3],
    output: &mut [f32],
) -> KernelResult<()>
Dequantize FP8 E4M3FN blocks — tier-aware SIMD dispatch.
fn dequant_fp8_e5m2(
    &self,
    blocks: &[BlockFP8E5M2],
    output: &mut [f32],
) -> KernelResult<()>
Dequantize FP8 E5M2 blocks — tier-aware SIMD dispatch.
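For reference, the scalar decode that the SIMD paths vectorize can be written out explicitly. This sketch follows the standard FP8 encodings: E4M3FN (1 sign / 4 exponent / 3 mantissa bits, bias 7, no infinities, only the all-ones pattern is NaN) and E5M2 (1/5/2, bias 15, IEEE-style infinities and NaNs). It is a self-contained illustration, not the crate's kernel code:

```rust
// Reference decode of single FP8 values to f32.
// E4M3FN: bias 7, no infinities, exp=15 & mantissa=7 => NaN.
// E5M2:   bias 15, IEEE-like, exp=31 => inf (mantissa 0) or NaN.
fn fp8_e4m3fn_to_f32(b: u8) -> f32 {
    let sign = if b & 0x80 != 0 { -1.0f32 } else { 1.0 };
    let exp = (b >> 3) & 0x0F;
    let man_bits = b & 0x07;
    let man = man_bits as f32;
    if exp == 0x0F && man_bits == 7 {
        return f32::NAN; // only all-ones encodes NaN in E4M3FN
    }
    if exp == 0 {
        sign * man / 8.0 * 2f32.powi(-6) // subnormal
    } else {
        sign * (1.0 + man / 8.0) * 2f32.powi(exp as i32 - 7)
    }
}

fn fp8_e5m2_to_f32(b: u8) -> f32 {
    let sign = if b & 0x80 != 0 { -1.0f32 } else { 1.0 };
    let exp = (b >> 2) & 0x1F;
    let man = (b & 0x03) as f32;
    if exp == 0x1F {
        return if man == 0.0 { sign * f32::INFINITY } else { f32::NAN };
    }
    if exp == 0 {
        sign * man / 4.0 * 2f32.powi(-14) // subnormal
    } else {
        sign * (1.0 + man / 4.0) * 2f32.powi(exp as i32 - 15)
    }
}

fn main() {
    assert_eq!(fp8_e4m3fn_to_f32(0x38), 1.0); // exp=7 (bias), mantissa=0
    assert_eq!(fp8_e5m2_to_f32(0x3C), 1.0);   // exp=15 (bias), mantissa=0
    println!("1.0 decodes correctly in both formats");
}
```

The block-level kernels apply this per-element decode and then multiply by the block's shared scale; E4M3FN trades exponent range for an extra mantissa bit relative to E5M2.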
fn gemv_fp8_e4m3(
    &self,
    blocks: &[BlockFP8E4M3],
    input: &[f32],
    output: &mut [f32],
    n_rows: usize,
    k: usize,
) -> KernelResult<()>
FP8 E4M3FN GEMV — tier-aware SIMD dispatch with optional GPU acceleration.
Dispatch priority on the KernelTier::Gpu path:
- Metal (macOS + metal feature) — metal_gemv_fp8_e4m3.
- CUDA (Linux/Windows + native-cuda feature) — cuda_gemv_fp8_e4m3.
- CPU SIMD fallback (AVX-512 / AVX2 / NEON / scalar).
The raw-byte cast of blocks to *const u8 is sound because
BlockFP8E4M3 is #[repr(C)] with size BLOCK_FP8_BYTES = 34.
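The soundness argument can be demonstrated with a standalone layout check. The field layout below (an f16 scale stored as u16 bits plus 32 FP8 codes, 2 + 32 = 34 bytes) is an assumption chosen to match BLOCK_FP8_BYTES = 34; the crate's actual BlockFP8E4M3 definition is authoritative:

```rust
use std::mem::size_of;

// Hypothetical stand-in for BlockFP8E4M3: #[repr(C)] guarantees field
// order and, with these field types, a padding-free 34-byte layout.
#[repr(C)]
struct BlockFp8 {
    d: u16,       // per-block scale, IEEE half bits (assumed)
    qs: [u8; 32], // 32 FP8-encoded weights (assumed)
}

const BLOCK_FP8_BYTES: usize = 34;

fn blocks_as_bytes(blocks: &[BlockFp8]) -> &[u8] {
    // Sound because the struct is #[repr(C)] with no padding bytes:
    // every byte in the slice is initialized, and the length in bytes
    // is exactly len * BLOCK_FP8_BYTES.
    unsafe {
        std::slice::from_raw_parts(
            blocks.as_ptr() as *const u8,
            blocks.len() * BLOCK_FP8_BYTES,
        )
    }
}

fn main() {
    // If a refactor ever introduced padding, this would catch it.
    assert_eq!(size_of::<BlockFp8>(), BLOCK_FP8_BYTES);
    let blocks = [BlockFp8 { d: 0x3C00, qs: [0u8; 32] }];
    let bytes = blocks_as_bytes(&blocks);
    assert_eq!(bytes.len(), BLOCK_FP8_BYTES);
    println!("layout check passed: {} bytes per block", bytes.len());
}
```

A `const` or debug assertion on `size_of` like the one above is a cheap guard for this kind of cast, since the safety of the GPU upload depends entirely on the padding-free layout.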
fn gemv_fp8_e5m2(
    &self,
    blocks: &[BlockFP8E5M2],
    input: &[f32],
    output: &mut [f32],
    n_rows: usize,
    k: usize,
) -> KernelResult<()>
FP8 E5M2 GEMV — tier-aware SIMD dispatch with optional GPU acceleration.
Mirrors gemv_fp8_e4m3: Metal → CUDA → CPU SIMD.
The raw-byte cast is sound because BlockFP8E5M2 is #[repr(C)] with
size BLOCK_FP8_BYTES = 34.
fn gemm_fp8_e4m3(
    &self,
    blocks: &[BlockFP8E4M3],
    inputs: &[f32],
    outputs: &mut [f32],
    n_rows: usize,
    k: usize,
    batch: usize,
) -> KernelResult<()>
FP8 E4M3FN GEMM — tier-aware SIMD dispatch.
fn gemm_fp8_e5m2(
    &self,
    blocks: &[BlockFP8E5M2],
    inputs: &[f32],
    outputs: &mut [f32],
    n_rows: usize,
    k: usize,
    batch: usize,
) -> KernelResult<()>
FP8 E5M2 GEMM — tier-aware SIMD dispatch.
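The GEMM entry points take the same weights as the GEMV ones plus a batch dimension. A plausible reading of the shapes, sketched with plain f32 weights standing in for the quantized blocks (the [batch, k] input and [batch, n_rows] output layout is an assumption):

```rust
// Reference GEMM as `batch` independent GEMVs.
// weights: [n_rows, k] row-major, inputs: [batch, k], outputs: [batch, n_rows].
fn gemm_ref(
    weights: &[f32],
    inputs: &[f32],
    outputs: &mut [f32],
    n_rows: usize,
    k: usize,
    batch: usize,
) {
    assert_eq!(weights.len(), n_rows * k);
    assert_eq!(inputs.len(), batch * k);
    assert_eq!(outputs.len(), batch * n_rows);
    for b in 0..batch {
        let x = &inputs[b * k..(b + 1) * k];
        for r in 0..n_rows {
            let row = &weights[r * k..(r + 1) * k];
            // dot product of one weight row with this batch's input vector
            outputs[b * n_rows + r] =
                row.iter().zip(x).map(|(w, xi)| w * xi).sum();
        }
    }
}

fn main() {
    // 2x2 identity weights, batch of two input vectors
    let w = [1.0, 0.0, 0.0, 1.0];
    let x = [3.0, 4.0, 5.0, 6.0];
    let mut y = [0.0f32; 4];
    gemm_ref(&w, &x, &mut y, 2, 2, 2);
    assert_eq!(y, [3.0, 4.0, 5.0, 6.0]);
    println!("{:?}", y);
}
```

Batching lets the real kernels decode each quantized weight block once and reuse it across all batch columns, which is the main win over calling GEMV per input.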
impl OneBitKernel for KernelDispatcher
fn dequant(
    &self,
    blocks: &[BlockQ1_0G128],
    output: &mut [f32],
) -> KernelResult<()>
fn gemv(
    &self,
    blocks: &[BlockQ1_0G128],
    input: &[f32],
    output: &mut [f32],
    n_rows: usize,
    k: usize,
) -> KernelResult<()>
fn gemm(
    &self,
    blocks: &[BlockQ1_0G128],
    input: &[f32],
    output: &mut [f32],
    m: usize,
    n_rows: usize,
    k: usize,
) -> KernelResult<()>
fn is_gpu_accelerated(&self) -> bool
fn upload_weights(&self, blocks: &[BlockQ1_0G128]) -> Option<GpuWeightHandle>
fn gemv_cached(
    &self,
    handle: GpuWeightHandle,
    input: &[f32],
    output: &mut [f32],
    n_rows: usize,
    k: usize,
) -> KernelResult<()>
fn batch_attn_phase(
    &self,
    hidden: &[f32],
    norm_weight: &[f32],
    norm_eps: f32,
    qkv_handle: GpuWeightHandle,
    q_rows: usize,
    k_rows: usize,
    h: usize,
) -> KernelResult<Option<(Vec<f32>, Vec<f32>, Vec<f32>)>>
fn batch_ffn_phase(
    &self,
    hidden: &mut [f32],
    attn_out: &[f32],
    norm_weight: &[f32],
    norm_eps: f32,
    attn_proj_handle: GpuWeightHandle,
    gate_up_handle: GpuWeightHandle,
    down_handle: GpuWeightHandle,
    h: usize,
    intermediate: usize,
    attn_proj_k: usize,
) -> KernelResult<bool>
impl TernaryKernel for KernelDispatcher
fn dequant_ternary_g128(
    &self,
    blocks: &[BlockTQ2_0_g128],
    output: &mut [f32],
) -> KernelResult<()>
fn gemv_ternary_g128(
    &self,
    blocks: &[BlockTQ2_0_g128],
    input: &[f32],
    output: &mut [f32],
    n_rows: usize,
    k: usize,
) -> KernelResult<()>
fn gemm_ternary_g128(
    &self,
    blocks: &[BlockTQ2_0_g128],
    input: &[f32],
    output: &mut [f32],
    m: usize,
    n_rows: usize,
    k: usize,
) -> KernelResult<()>
fn upload_weights_ternary(
    &self,
    blocks: &[BlockTQ2_0_g128],
) -> Option<GpuWeightHandle>
fn gemv_ternary_g128_cached(
    &self,
    handle: GpuWeightHandle,
    input: &[f32],
    output: &mut [f32],
    n_rows: usize,
    k: usize,
) -> KernelResult<()>
Auto Trait Implementations
impl Freeze for KernelDispatcher
impl RefUnwindSafe for KernelDispatcher
impl Send for KernelDispatcher
impl Sync for KernelDispatcher
impl Unpin for KernelDispatcher
impl UnsafeUnpin for KernelDispatcher
impl UnwindSafe for KernelDispatcher
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true.
Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self> otherwise.