#[non_exhaustive]
#[repr(u16)]
pub enum ShapeLayoutKind {
    Pad = 0,
    Concat = 1,
    Permute = 2,
    Repeat = 3,
    Flip = 4,
    Roll = 5,
    Meshgrid = 6,
    Fill = 7,
    WriteSlice = 8,
    Contiguize = 9,
    Triu = 10,
    Tril = 11,
}
Shape / layout op discriminant — Category N.
Tags the kernel SKU for telemetry / autotuner-cache keys. Each
variant has its own Plan type today (PadPlan, ConcatPlan, …)
because their descriptor / args shapes differ enough that one
ShapeLayoutPlan<T, N> doesn’t fit. The enum exists so all of
them populate KernelSku::op from a shared discriminant space.
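To illustrate what a shared discriminant space buys, here is a minimal sketch that casts the kind to its `u16` value and uses it in an autotuner-cache key. The `SkuKey` struct, its `category` and `op` fields, and the `tuned_block_size` helper are hypothetical stand-ins; the real `KernelSku` layout is not shown on this page, and the enum is mirrored locally with only a few variants.

```rust
use std::collections::HashMap;

/// Local mirror of a few variants, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
#[repr(u16)]
#[non_exhaustive]
pub enum ShapeLayoutKind {
    Pad = 0,
    Fill = 7,
    WriteSlice = 8,
}

/// Hypothetical cache key; NOT the real `KernelSku`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct SkuKey {
    /// Assumed category tag ("Category N" in the description above).
    pub category: char,
    /// Discriminant taken from the shared `ShapeLayoutKind` space.
    pub op: u16,
}

/// Builds a key from the enum discriminant via an `as u16` cast.
pub fn sku_key(kind: ShapeLayoutKind) -> SkuKey {
    SkuKey { category: 'N', op: kind as u16 }
}

/// Looks up a previously tuned value for this kind, if any.
pub fn tuned_block_size(cache: &HashMap<SkuKey, u32>, kind: ShapeLayoutKind) -> Option<u32> {
    cache.get(&sku_key(kind)).copied()
}
```

Because every Plan type maps to one variant, a single cache or telemetry table can cover all of them without knowing the individual descriptor shapes.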
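Variants (Non-exhaustive)
This enum is marked as non-exhaustive. Non-exhaustive enums could have additional variants added in future. Therefore, when matching against variants of non-exhaustive enums, an extra wildcard arm must be added to account for any future variants.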
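In practice, non-exhaustiveness means downstream `match` expressions need a wildcard arm. The sketch below mirrors a subset of the variants locally; `telemetry_label` is a hypothetical helper, not part of the crate.

```rust
/// Local mirror of a subset of the variants, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
#[repr(u16)]
#[non_exhaustive]
pub enum ShapeLayoutKind {
    Pad = 0,
    Concat = 1,
    Fill = 7,
    Triu = 10,
    Tril = 11,
}

/// Hypothetical helper: maps a kind to a short label.
pub fn telemetry_label(kind: ShapeLayoutKind) -> &'static str {
    match kind {
        ShapeLayoutKind::Pad => "pad",
        ShapeLayoutKind::Fill => "fill",
        ShapeLayoutKind::Triu => "triu",
        ShapeLayoutKind::Tril => "tril",
        // In a downstream crate the compiler requires this arm, because
        // new variants may be added in a later release. Here it also
        // covers `Concat`, which has no dedicated label.
        _ => "other",
    }
}
```

Note that the compiler only enforces the wildcard arm outside the crate that defines the enum; inside the defining crate an exhaustive match is still accepted.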
Pad = 0
F.pad(x, pad, mode='constant', value=v) — Phase 3 trailblazer.
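As a CPU reference for the constant-mode semantics, a one-dimensional sketch might look like the following. `pad_constant_1d` is a hypothetical name, it handles only non-negative padding on a single axis, and it says nothing about how `PadPlan` is actually implemented.

```rust
/// Pads `x` with `left` copies of `value` in front and `right` copies behind.
pub fn pad_constant_1d(x: &[f32], left: usize, right: usize, value: f32) -> Vec<f32> {
    let mut out = Vec::with_capacity(left + x.len() + right);
    // Leading pad region.
    out.extend(std::iter::repeat(value).take(left));
    // Original data, unchanged.
    out.extend_from_slice(x);
    // Trailing pad region.
    out.extend(std::iter::repeat(value).take(right));
    out
}
```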
Concat = 1
torch.cat(tensors, dim) — variable-arity input. Reserved.
Permute = 2
Materialized torch.permute(x, dims) (strided-view materialization
when needed). Reserved.
Repeat = 3
x.repeat(...) / torch.tile(x, ...). Reserved.
Flip = 4
torch.flip(x, dims) — reverse along axes. Reserved.
Roll = 5
torch.roll(x, shifts, dims) — shift along axes. Reserved.
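The Flip and Roll semantics can be stated compactly for a single axis. The following is a one-dimensional reference sketch under that simplification; `flip_1d` and `roll_1d` are hypothetical names, and both variants are still marked Reserved.

```rust
/// Reverses the elements along the (only) axis.
pub fn flip_1d<T: Clone>(x: &[T]) -> Vec<T> {
    x.iter().rev().cloned().collect()
}

/// Cyclically shifts elements by `shift` positions. A positive shift moves
/// elements toward higher indices; elements pushed past the end wrap around
/// to the front, matching the convention of `torch.roll`.
pub fn roll_1d<T: Clone>(x: &[T], shift: isize) -> Vec<T> {
    let n = x.len();
    if n == 0 {
        return Vec::new();
    }
    // Normalize the shift into 0..n so negative shifts work too.
    let s = shift.rem_euclid(n as isize) as usize;
    // out[i] = x[(i - s) mod n]
    (0..n).map(|i| x[(i + n - s) % n].clone()).collect()
}
```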
Meshgrid = 6
torch.meshgrid(*tensors) — N rank-1 → N rank-N. Reserved.
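For N = 2, the "N rank-1 to N rank-N" mapping can be sketched as below. This assumes `ij` indexing and nested `Vec`s for readability; `meshgrid_ij` is a hypothetical name and the variant itself is still Reserved.

```rust
/// Two rank-1 inputs produce two rank-2 grids of shape [x.len(), y.len()].
/// With `ij` indexing, gx[i][j] == x[i] and gy[i][j] == y[j].
pub fn meshgrid_ij(x: &[i32], y: &[i32]) -> (Vec<Vec<i32>>, Vec<Vec<i32>>) {
    // Each row of gx repeats one element of x across all columns.
    let gx: Vec<Vec<i32>> = x.iter().map(|&xi| vec![xi; y.len()]).collect();
    // Each row of gy is a full copy of y.
    let gy: Vec<Vec<i32>> = x.iter().map(|_| y.to_vec()).collect();
    (gx, gy)
}
```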
Fill = 7
torch.full(shape, value) / Tensor.fill_(value) — fill every
element of an output tensor with a scalar constant. Wired from
fuel-cuda-kernels/fill.cu.
WriteSlice = 8
dest[start_0..end_0, ..., start_{N-1}..end_{N-1}] = source
(assign, not accumulate). Per-axis range write. Phase 13.1
trailblazer — driven by Fuel team’s persistent KV-cache append
(autoregressive decoding). See
baracuda_kernels::WriteSlicePlan.
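A two-dimensional CPU reference for the per-axis range write might look like this. It is a sketch only: `write_slice_2d` and its signature are hypothetical, both buffers are assumed row-major, and the real `WriteSlicePlan` handles N axes on device.

```rust
use std::ops::Range;

/// Assigns `source` into `dest[rows, cols]` (overwrite, not accumulate).
/// `dest` is row-major with `dest_cols` columns; `source` is row-major with
/// shape [rows.len(), cols.len()].
pub fn write_slice_2d(
    dest: &mut [f32],
    dest_cols: usize,
    rows: Range<usize>,
    cols: Range<usize>,
    source: &[f32],
) -> Result<(), String> {
    if dest_cols == 0 || dest.len() % dest_cols != 0 {
        return Err("dest length is not a multiple of dest_cols".into());
    }
    let dest_rows = dest.len() / dest_cols;
    if rows.start > rows.end || cols.start > cols.end {
        return Err("range start exceeds range end".into());
    }
    if rows.end > dest_rows || cols.end > dest_cols {
        return Err("range exceeds dest bounds".into());
    }
    let width = cols.len();
    if source.len() != rows.len() * width {
        return Err("source shape does not match the slice shape".into());
    }
    // Copy one contiguous row segment at a time.
    for (r_src, r_dst) in rows.enumerate() {
        let d0 = r_dst * dest_cols + cols.start;
        let s0 = r_src * width;
        dest[d0..d0 + width].copy_from_slice(&source[s0..s0 + width]);
    }
    Ok(())
}
```

In the KV-cache append case mentioned above, the row range would presumably cover the newly decoded positions while the other axes span their full extent.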
Contiguize = 9
Strided→contiguous materialization (torch.Tensor.contiguous).
Phase 13.2: closes the D2H→CPU contiguize→H2D fallback cliff
for non-contiguous CUDA inputs. Byte-level dtype-agnostic
(sizeof-templated kernel) covering every byte-aligned dtype;
nibble (S4 / U4) shipped behind a documented innermost-stride
constraint. See baracuda_kernels::ContiguizePlan.
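The strided-to-contiguous gather can be sketched on the CPU as follows. This version works in element strides over a generic `Copy` type rather than raw bytes, and `contiguize` is a hypothetical name, so treat it as a statement of the semantics and not of the kernel's design.

```rust
/// Gathers a strided view of `data` into a freshly allocated row-major buffer.
/// `strides` are in elements; `offset` is the view's starting element.
pub fn contiguize<T: Copy>(data: &[T], shape: &[usize], strides: &[usize], offset: usize) -> Vec<T> {
    assert_eq!(shape.len(), strides.len(), "shape and strides must have equal rank");
    let total: usize = shape.iter().product();
    let mut out = Vec::with_capacity(total);
    // Multi-dimensional index into the view, last axis fastest.
    let mut index = vec![0usize; shape.len()];
    for _ in 0..total {
        let src = offset + index.iter().zip(strides).map(|(i, s)| i * s).sum::<usize>();
        out.push(data[src]);
        // Advance the index like an odometer.
        for axis in (0..shape.len()).rev() {
            index[axis] += 1;
            if index[axis] < shape[axis] {
                break;
            }
            index[axis] = 0;
        }
    }
    out
}
```

For example, a transposed view of a 2x3 row-major matrix has shape `[3, 2]` and strides `[1, 3]`; gathering it yields the transposed matrix in contiguous order.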
Triu = 10
torch.triu(input, diagonal) — keep upper triangular part of
the last two dims of input; zero everything below the
diagonal-th diagonal. Batch dims (anything before the last
two) are independently masked. Phase 13.4 trailblazer — driven
by Fuel team’s CPU-only triu/tril gap. See
baracuda_kernels::TriuPlan.
Tril = 11
torch.tril(input, diagonal) — keep lower triangular part of
the last two dims of input; zero everything above the
diagonal-th diagonal. Sibling of Self::Triu with the
predicate flipped. See baracuda_kernels::TrilPlan.
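The two predicates can be written side by side for a single row-major matrix: Triu keeps elements where `col - row >= diagonal`, and Tril keeps those where `col - row <= diagonal`. The function names below are hypothetical; for batched input the same mask would be applied to each trailing matrix independently.

```rust
/// Shared helper: copies elements for which `keep(row, col)` holds, zeroing the rest.
fn mask_triangle(
    m: &[f32],
    rows: usize,
    cols: usize,
    keep: impl Fn(isize, isize) -> bool,
) -> Vec<f32> {
    assert_eq!(m.len(), rows * cols, "matrix length must equal rows * cols");
    let mut out = vec![0.0; m.len()];
    for r in 0..rows {
        for c in 0..cols {
            if keep(r as isize, c as isize) {
                out[r * cols + c] = m[r * cols + c];
            }
        }
    }
    out
}

/// Keeps the upper triangle, starting at the `diagonal`-th diagonal.
pub fn triu(m: &[f32], rows: usize, cols: usize, diagonal: isize) -> Vec<f32> {
    mask_triangle(m, rows, cols, |r, c| c - r >= diagonal)
}

/// Keeps the lower triangle, up to and including the `diagonal`-th diagonal.
pub fn tril(m: &[f32], rows: usize, cols: usize, diagonal: isize) -> Vec<f32> {
    mask_triangle(m, rows, cols, |r, c| c - r <= diagonal)
}
```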
Trait Implementations
impl Clone for ShapeLayoutKind
fn clone(&self) -> ShapeLayoutKind
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Copy for ShapeLayoutKind
impl Debug for ShapeLayoutKind
impl Eq for ShapeLayoutKind
impl Hash for ShapeLayoutKind
impl PartialEq for ShapeLayoutKind
fn eq(&self, other: &ShapeLayoutKind) -> bool
Tests for self and other values to be equal, and is used by ==.
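Taken together, these impls let the kind be copied freely, printed, and used directly as a hash-map key. A small sketch of a per-kind launch counter, with a locally mirrored enum and a hypothetical `count_launches` helper:

```rust
use std::collections::HashMap;

/// Local mirror of two variants, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
#[repr(u16)]
pub enum ShapeLayoutKind {
    Pad = 0,
    Fill = 7,
}

/// Counts how often each kind appears; relies on `Copy`, `Eq`, and `Hash`.
pub fn count_launches(launches: &[ShapeLayoutKind]) -> HashMap<ShapeLayoutKind, u64> {
    let mut counts = HashMap::new();
    for &kind in launches {
        *counts.entry(kind).or_insert(0) += 1;
    }
    counts
}
```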