Struct CpuRuntime

pub struct CpuRuntime;

CPU compute runtime

This is the default runtime that works on any platform. Memory is allocated on the heap using the system allocator.

Trait Implementations

impl ActivationOps<CpuRuntime> for CpuClient

ActivationOps implementation for CPU runtime.

fn relu(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Rectified linear unit: max(0, a)

fn sigmoid(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Sigmoid: 1 / (1 + e^(-a))

fn silu(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    SiLU (Swish): a * sigmoid(a) = a / (1 + e^(-a))

fn gelu(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    GELU (Gaussian Error Linear Unit): 0.5 * a * (1 + tanh(sqrt(2/pi) * (a + 0.044715 * a^3)))

fn silu_mul(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Fused SiLU-Mul: silu(a) * b in a single pass.

fn gelu_mul(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Fused GELU-Mul: gelu(a) * b in a single pass.

fn relu_mul(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Fused ReLU-Mul: relu(a) * b in a single pass.

fn sigmoid_mul(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Fused Sigmoid-Mul: sigmoid(a) * b in a single pass.

fn silu_mul_bwd(&self, grad: &Tensor<CpuRuntime>, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>), Error>
    Fused SiLU-Mul backward: computes gradients for output = silu(a) * b.

fn gelu_mul_bwd(&self, grad: &Tensor<CpuRuntime>, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>), Error>
    Fused GELU-Mul backward: computes gradients for output = gelu(a) * b.

fn relu_mul_bwd(&self, grad: &Tensor<CpuRuntime>, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>), Error>
    Fused ReLU-Mul backward: computes gradients for output = relu(a) * b.

fn sigmoid_mul_bwd(&self, grad: &Tensor<CpuRuntime>, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>), Error>
    Fused Sigmoid-Mul backward: computes gradients for output = sigmoid(a) * b.

fn leaky_relu(&self, a: &Tensor<CpuRuntime>, negative_slope: f64) -> Result<Tensor<CpuRuntime>, Error>
    Leaky ReLU: max(negative_slope * a, a)

fn elu(&self, a: &Tensor<CpuRuntime>, alpha: f64) -> Result<Tensor<CpuRuntime>, Error>
    ELU (Exponential Linear Unit): a if a > 0, else alpha * (exp(a) - 1)

fn softmax(&self, a: &Tensor<CpuRuntime>, dim: isize) -> Result<Tensor<CpuRuntime>, Error>
    Softmax along a dimension

fn softmax_bwd(&self, grad: &Tensor<CpuRuntime>, output: &Tensor<CpuRuntime>, dim: isize) -> Result<Tensor<CpuRuntime>, Error>
    Softmax backward pass: computes the gradient w.r.t. the input given the output gradient and the softmax output.

fn softplus(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Softplus: log(1 + exp(a))

fn log_softmax(&self, a: &Tensor<CpuRuntime>, dim: isize) -> Result<Tensor<CpuRuntime>, Error>
    Log-softmax along a dimension: log(softmax(x, dim))

fn dropout(&self, a: &Tensor<CpuRuntime>, p: f64, training: bool) -> Result<Tensor<CpuRuntime>, Error>
    Dropout: randomly zero elements with probability p during training.
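The activation formulas above can be sketched as scalar reference functions on plain f32 — the math only, not this crate's actual `Tensor<CpuRuntime>` implementations:

```rust
/// SiLU as listed above: x * sigmoid(x) = x / (1 + e^(-x)).
pub fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// GELU using the tanh approximation given in the docs above.
pub fn gelu_tanh(x: f32) -> f32 {
    let c = (2.0_f32 / std::f32::consts::PI).sqrt();
    0.5 * x * (1.0 + (c * (x + 0.044715 * x * x * x)).tanh())
}

fn main() {
    // Both pass through the origin; for large positive x they approach x.
    println!("silu(1) = {}", silu(1.0));
    println!("gelu(1) = {}", gelu_tanh(1.0));
}
```

The fused `*_mul` variants simply multiply the result by a second tensor in the same pass, avoiding a temporary.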
impl AdvancedRandomOps<CpuRuntime> for CpuClient

AdvancedRandomOps implementation for CPU runtime.

fn philox_randn(&self, shape: &[usize], key: u64, counter: u64, dtype: DType) -> Result<Tensor<CpuRuntime>, Error>
    Generate N(0, 1) samples using Philox4x32-10. Args: shape, key (seed), counter, dtype (F32/F64; WebGPU: F32 only)

fn philox_uniform(&self, shape: &[usize], key: u64, counter: u64, dtype: DType) -> Result<Tensor<CpuRuntime>, Error>
    Generate uniform [0, 1) samples using Philox4x32-10.

fn threefry_randn(&self, shape: &[usize], key: u64, counter: u64, dtype: DType) -> Result<Tensor<CpuRuntime>, Error>
    Generate N(0, 1) samples using ThreeFry4x64-20 (cryptographic quality).

fn threefry_uniform(&self, shape: &[usize], key: u64, counter: u64, dtype: DType) -> Result<Tensor<CpuRuntime>, Error>
    Generate uniform [0, 1) samples using ThreeFry4x64-20.

fn pcg64_randn(&self, shape: &[usize], seed: u64, stream: u64, dtype: DType) -> Result<Tensor<CpuRuntime>, Error>
    Generate N(0, 1) samples using PCG64. Args: shape, seed, stream (for parallel generation), dtype

fn pcg64_uniform(&self, shape: &[usize], seed: u64, stream: u64, dtype: DType) -> Result<Tensor<CpuRuntime>, Error>
    Generate uniform [0, 1) samples using PCG64.

fn xoshiro256_randn(&self, shape: &[usize], seed: u64, dtype: DType) -> Result<Tensor<CpuRuntime>, Error>
    Generate N(0, 1) samples using Xoshiro256++. Args: shape, seed, dtype

fn xoshiro256_uniform(&self, shape: &[usize], seed: u64, dtype: DType) -> Result<Tensor<CpuRuntime>, Error>
    Generate uniform [0, 1) samples using Xoshiro256++.
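Each generator above offers both a uniform and a normal variant. One standard way to derive N(0, 1) samples from a uniform [0, 1) stream is the Box–Muller transform, sketched below — an illustration of the relationship, not necessarily the exact mapping this crate uses:

```rust
/// Box–Muller: turn two independent uniform samples into two independent
/// standard normal samples. u1 must lie in (0, 1] so ln(u1) is finite.
pub fn box_muller(u1: f64, u2: f64) -> (f64, f64) {
    let r = (-2.0 * u1.ln()).sqrt();
    let theta = 2.0 * std::f64::consts::PI * u2;
    (r * theta.cos(), r * theta.sin())
}

fn main() {
    // Deterministic inputs give deterministic normal samples, which is the
    // point of the counter-based (key, counter) APIs listed above.
    let (z0, z1) = box_muller(0.5, 0.25);
    println!("z0 = {z0}, z1 = {z1}");
}
```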
impl AlibiOps<CpuRuntime> for CpuClient

fn alibi_add_bias(&self, scores: &Tensor<CpuRuntime>, batch_size: usize, num_heads: usize, seq_len_q: usize, seq_len_k: usize) -> Result<()>
    Add ALiBi bias to attention scores in-place

fn alibi_add_bias_causal(&self, scores: &Tensor<CpuRuntime>, batch_size: usize, num_heads: usize, seq_len_q: usize, seq_len_k: usize, position: usize) -> Result<()>
    Add ALiBi bias + causal mask to attention scores in-place.
impl AttentionOps<CpuRuntime> for CpuClient

impl BinaryOps<CpuRuntime> for CpuClient

BinaryOps implementation for CPU runtime.

fn add(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Element-wise addition: a + b

fn sub(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Element-wise subtraction: a - b

fn mul(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Element-wise multiplication: a * b

fn div(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Element-wise division: a / b

fn pow(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Element-wise power: a^b

fn maximum(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Element-wise maximum: max(a, b)

fn minimum(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Element-wise minimum: min(a, b)

fn atan2(&self, y: &Tensor<CpuRuntime>, x: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Two-argument arctangent: atan2(y, x)

fn fused_mul_add(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, c: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Fused multiply-add: a * b + c

fn fused_add_mul(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, c: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Fused add-multiply: (a + b) * c
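The fused binary ops above can be sketched element-wise on slices; `f32::mul_add` from the standard library performs the multiply and add as one fused operation with a single rounding step, which is the per-element analogue of what a fused tensor kernel does:

```rust
/// Element-wise a * b + c over three equal-length slices, a sketch of
/// fused_mul_add on plain buffers rather than the crate's Tensor type.
pub fn fused_mul_add(a: &[f32], b: &[f32], c: &[f32]) -> Vec<f32> {
    a.iter()
        .zip(b)
        .zip(c)
        .map(|((&a, &b), &c)| a.mul_add(b, c)) // fused: one rounding, not two
        .collect()
}

fn main() {
    let out = fused_mul_add(&[1.0, 2.0], &[3.0, 4.0], &[5.0, 6.0]);
    println!("{out:?}"); // [8.0, 14.0]
}
```

The fusion also matters for memory traffic: a single pass reads each input once instead of materializing the intermediate `a * b`.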
impl CalibrationOps<CpuRuntime> for CpuClient

fn awq_channel_scores(&self, activations: &Tensor<CpuRuntime>, weights: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>>
    AWQ channel importance scoring.

fn fisher_information(&self, gradients: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>>
    Diagonal Fisher information matrix.

fn gptq_hessian_update(&self, hessian: &Tensor<CpuRuntime>, x_block: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>>
    GPTQ Hessian accumulation.

fn gptq_quantize_column(&self, weight: &Tensor<CpuRuntime>, h_inv: &Tensor<CpuRuntime>, num_bits: u32, group_size: u32, symmetric: bool) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>, Tensor<CpuRuntime>)>
    GPTQ column-wise quantization with error compensation.
impl Clone for CpuRuntime

fn clone(&self) -> CpuRuntime
    Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)
    Performs copy-assignment from source.
impl CompareOps<CpuRuntime> for CpuClient

fn eq(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Element-wise equality: a == b

fn ne(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Element-wise inequality: a != b

fn lt(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Element-wise less than: a < b

fn le(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Element-wise less than or equal: a <= b

fn gt(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Element-wise greater than: a > b

fn ge(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Element-wise greater than or equal: a >= b
impl ComplexOps<CpuRuntime> for CpuClient

ComplexOps implementation for CPU runtime.

fn conj(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Complex conjugate: conj(a + bi) = a - bi

fn real(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Extract real part of complex tensor: real(a + bi) = a

fn imag(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Extract imaginary part of complex tensor: imag(a + bi) = b

fn angle(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Compute phase angle of complex tensor: angle(a + bi) = atan2(b, a)

fn make_complex(&self, real: &Tensor<CpuRuntime>, imag: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Construct complex tensor from separate real and imaginary part tensors.

fn complex_mul_real(&self, complex: &Tensor<CpuRuntime>, real: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Multiply complex tensor by real tensor element-wise.

fn complex_div_real(&self, complex: &Tensor<CpuRuntime>, real: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Divide complex tensor by real tensor element-wise.

fn real_mul_complex(&self, real: &Tensor<R>, complex: &Tensor<R>) -> Result<Tensor<R>, Error>
    Multiply real tensor by complex tensor element-wise.
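The per-element formulas in the ComplexOps docs above, sketched on a `(re, im)` tuple rather than the crate's complex tensor representation (the tuple layout is an assumption for illustration only):

```rust
/// conj(a + bi) = a - bi
pub fn conj(z: (f64, f64)) -> (f64, f64) {
    (z.0, -z.1)
}

/// angle(a + bi) = atan2(b, a), the phase in radians
pub fn angle(z: (f64, f64)) -> f64 {
    z.1.atan2(z.0)
}

fn main() {
    let z = (1.0, 1.0); // 1 + i
    println!("conj = {:?}", conj(z));
    println!("angle = {}", angle(z)); // pi/4
}
```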
impl ConditionalOps<CpuRuntime> for CpuClient

ConditionalOps implementation for CPU runtime.

fn where_cond(&self, cond: &Tensor<CpuRuntime>, x: &Tensor<CpuRuntime>, y: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Conditional select: where(cond, x, y) = cond ? x : y
impl ConvOps<CpuRuntime> for CpuClient

fn conv1d(&self, input: &Tensor<CpuRuntime>, weight: &Tensor<CpuRuntime>, bias: Option<&Tensor<CpuRuntime>>, stride: usize, padding: PaddingMode, dilation: usize, groups: usize) -> Result<Tensor<CpuRuntime>, Error>
    Applies a 1D convolution over an input signal.

fn conv2d(&self, input: &Tensor<CpuRuntime>, weight: &Tensor<CpuRuntime>, bias: Option<&Tensor<CpuRuntime>>, stride: (usize, usize), padding: PaddingMode, dilation: (usize, usize), groups: usize) -> Result<Tensor<CpuRuntime>, Error>
    Applies a 2D convolution over an input image.

fn depthwise_conv2d(&self, input: &Tensor<CpuRuntime>, weight: &Tensor<CpuRuntime>, bias: Option<&Tensor<CpuRuntime>>, stride: (usize, usize), padding: PaddingMode, dilation: (usize, usize)) -> Result<Tensor<CpuRuntime>, Error>
    Applies a depthwise separable 2D convolution.
impl CumulativeOps<CpuRuntime> for CpuClient

CumulativeOps implementation for CPU runtime.

fn cumsum(&self, a: &Tensor<CpuRuntime>, dim: isize) -> Result<Tensor<CpuRuntime>, Error>
    Cumulative sum along a dimension

fn cumprod(&self, a: &Tensor<CpuRuntime>, dim: isize) -> Result<Tensor<CpuRuntime>, Error>
    Cumulative product along a dimension

fn logsumexp(&self, a: &Tensor<CpuRuntime>, dims: &[usize], keepdim: bool) -> Result<Tensor<CpuRuntime>, Error>
    Log-sum-exp along specified dimensions (numerically stable)
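The "numerically stable" qualifier on logsumexp refers to the max-subtraction trick: naively computing log(sum(exp(x))) overflows as soon as any input exceeds roughly 709 for f64. A sketch over a flat slice:

```rust
/// Stable log-sum-exp: log(Σ exp(x_i)) = m + log(Σ exp(x_i - m)), m = max x_i.
/// Subtracting the max keeps every exponent <= 0, so nothing overflows.
pub fn logsumexp(xs: &[f64]) -> f64 {
    let m = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    m + xs.iter().map(|x| (x - m).exp()).sum::<f64>().ln()
}

fn main() {
    // The naive form would compute exp(1000.0) = inf; the stable form
    // returns exactly 1000 + ln 2.
    println!("{}", logsumexp(&[1000.0, 1000.0]));
}
```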
impl Debug for CpuRuntime

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>
    Formats the value using the given formatter.

impl Default for CpuRuntime

fn default() -> CpuRuntime
    Returns the "default value" for a type.
impl DequantOps<CpuRuntime> for CpuClient

fn nf4_dequant(&self, nf4_data: &Tensor<CpuRuntime>, absmax: &Tensor<CpuRuntime>, blocksize: usize) -> Result<Tensor<CpuRuntime>>
    NF4 dequantization: nf4_data + absmax → float tensor

fn nf4_gemm(&self, input: &Tensor<CpuRuntime>, nf4_weight: &Tensor<CpuRuntime>, absmax: &Tensor<CpuRuntime>, n_out: usize, k: usize, blocksize: usize) -> Result<Tensor<CpuRuntime>>
    NF4 fused GEMM: input × nf4_weight without materializing the dequantized weight

fn dequantize(&self, qt: &QuantTensor<CpuRuntime>, target_dtype: DType) -> Result<Tensor<CpuRuntime>>
    Dequantize to a standard tensor with the specified dtype
impl DistanceOps<CpuRuntime> for CpuClient

fn cdist(&self, x: &Tensor<CpuRuntime>, y: &Tensor<CpuRuntime>, metric: DistanceMetric) -> Result<Tensor<CpuRuntime>, Error>
    Compute pairwise distances between two point sets.

fn pdist(&self, x: &Tensor<CpuRuntime>, metric: DistanceMetric) -> Result<Tensor<CpuRuntime>, Error>
    Compute pairwise distances within a single point set (condensed form).

fn squareform(&self, condensed: &Tensor<CpuRuntime>, n: usize) -> Result<Tensor<CpuRuntime>, Error>
    Convert a condensed distance vector to a square distance matrix.

fn squareform_inverse(&self, square: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Convert a square distance matrix to condensed form.
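The condensed form that pdist/squareform exchange stores only the n·(n-1)/2 strict upper-triangle distances of the symmetric matrix. A sketch of the usual row-major index mapping (assumed here; it matches scipy's convention, though this crate's layout is not stated):

```rust
/// Index of the (i, j) distance (i < j) inside a condensed vector for n points:
/// all entries for rows 0..i come first, then the offset within row i.
pub fn condensed_index(n: usize, i: usize, j: usize) -> usize {
    assert!(i < j && j < n);
    i * n - i * (i + 1) / 2 + (j - i - 1)
}

fn main() {
    // n = 4 -> 6 condensed entries, ordered (0,1)(0,2)(0,3)(1,2)(1,3)(2,3)
    println!("{}", condensed_index(4, 1, 3)); // 4
}
```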
impl EinsumOps<CpuRuntime> for CpuClient

fn einsum(&self, notation: &str, inputs: &[&Tensor<CpuRuntime>]) -> Result<Tensor<CpuRuntime>, Error>
    Evaluate an einsum expression.
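Einsum notation names each tensor axis with a letter; indices missing from the output are summed over. For example, `"ij,jk->ik"` is a matrix product with j contracted, which a general einsum evaluator would lower to loops like this sketch over row-major buffers:

```rust
/// What "ij,jk->ik" means: out[i][k] = Σ_j a[i][j] * b[j][k],
/// for a of shape (m, j) and b of shape (j, k), stored row-major.
pub fn einsum_ij_jk_ik(a: &[f32], b: &[f32], m: usize, j: usize, k: usize) -> Vec<f32> {
    let mut out = vec![0.0; m * k];
    for i in 0..m {
        for jj in 0..j {
            for kk in 0..k {
                out[i * k + kk] += a[i * j + jj] * b[jj * k + kk];
            }
        }
    }
    out
}

fn main() {
    // [[1, 2], [3, 4]] @ [[5, 6], [7, 8]] = [[19, 22], [43, 50]]
    let out = einsum_ij_jk_ik(&[1.0, 2.0, 3.0, 4.0], &[5.0, 6.0, 7.0, 8.0], 2, 2, 2);
    println!("{out:?}");
}
```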
impl FftAlgorithms<CpuRuntime> for CpuClient

fn fft(&self, input: &Tensor<CpuRuntime>, direction: FftDirection, norm: FftNormalization) -> Result<Tensor<CpuRuntime>, Error>
    1D FFT on complex input using the Stockham autosort algorithm

fn fft_dim(&self, input: &Tensor<CpuRuntime>, dim: isize, direction: FftDirection, norm: FftNormalization) -> Result<Tensor<CpuRuntime>, Error>
    1D FFT along a specific dimension

fn rfft(&self, input: &Tensor<CpuRuntime>, norm: FftNormalization) -> Result<Tensor<CpuRuntime>, Error>
    Real FFT: real input → complex output

fn irfft(&self, input: &Tensor<CpuRuntime>, n: Option<usize>, norm: FftNormalization) -> Result<Tensor<CpuRuntime>, Error>
    Inverse real FFT: complex input → real output

fn fft2(&self, input: &Tensor<CpuRuntime>, direction: FftDirection, norm: FftNormalization) -> Result<Tensor<CpuRuntime>, Error>
    2D FFT on complex input

fn rfft2(&self, input: &Tensor<CpuRuntime>, norm: FftNormalization) -> Result<Tensor<CpuRuntime>, Error>
    2D real FFT

fn irfft2(&self, input: &Tensor<CpuRuntime>, s: Option<(usize, usize)>, norm: FftNormalization) -> Result<Tensor<CpuRuntime>, Error>
    Inverse 2D real FFT

fn fftshift(&self, input: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Frequency shift: shift the zero-frequency component to the center

fn ifftshift(&self, input: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Inverse frequency shift: undo fftshift

fn fftfreq(&self, n: usize, d: f64, dtype: DType, device: &<CpuRuntime as Runtime>::Device) -> Result<Tensor<CpuRuntime>, Error>
    Generate FFT sample frequencies

fn rfftfreq(&self, n: usize, d: f64, dtype: DType, device: &<CpuRuntime as Runtime>::Device) -> Result<Tensor<CpuRuntime>, Error>
    Generate non-negative FFT sample frequencies for rfft
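The frequency grid fftfreq describes follows the conventional layout: non-negative frequencies first, then the negative half, each divided by n·d where d is the sample spacing. A sketch (assuming the crate matches this NumPy-style ordering, which the fftshift entry above suggests):

```rust
/// Sample frequencies for an n-point FFT with sample spacing d:
/// [0, 1, ..., ceil(n/2)-1, -floor(n/2), ..., -1] / (n * d).
pub fn fftfreq(n: usize, d: f64) -> Vec<f64> {
    (0..n)
        .map(|k| {
            let k = if k < (n + 1) / 2 { k as f64 } else { k as f64 - n as f64 };
            k / (n as f64 * d)
        })
        .collect()
}

fn main() {
    println!("{:?}", fftfreq(4, 1.0)); // [0.0, 0.25, -0.5, -0.25]
}
```

fftshift then rotates this vector so the zero frequency sits in the middle.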
impl FlashAttentionOps<CpuRuntime> for CpuClient

fn flash_attention_fwd(&self, q: &Tensor<CpuRuntime>, k: &Tensor<CpuRuntime>, v: &Tensor<CpuRuntime>, num_heads: usize, num_kv_heads: usize, head_dim: usize, causal: bool, window_size: usize, kv_seq_len: Option<usize>) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>)>
    Flash Attention forward pass (standard dtypes: F32, F16, BF16)

fn flash_attention_fwd_fp8(&self, _q: &Tensor<CpuRuntime>, _k: &Tensor<CpuRuntime>, _v: &Tensor<CpuRuntime>, _num_heads: usize, _num_kv_heads: usize, _head_dim: usize, _causal: bool, _q_scale: f32, _k_scale: f32, _v_scale: f32, _o_scale: f32) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>)>
    Flash Attention forward pass for FP8 tensors

fn flash_attention_bwd(&self, dout: &Tensor<CpuRuntime>, q: &Tensor<CpuRuntime>, k: &Tensor<CpuRuntime>, v: &Tensor<CpuRuntime>, output: &Tensor<CpuRuntime>, lse: &Tensor<CpuRuntime>, num_heads: usize, num_kv_heads: usize, _head_dim: usize, causal: bool, window_size: usize) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>, Tensor<CpuRuntime>)>
    Flash Attention backward pass

fn flash_attention_bwd_fp8(&self, _dout: &Tensor<CpuRuntime>, _q: &Tensor<CpuRuntime>, _k: &Tensor<CpuRuntime>, _v: &Tensor<CpuRuntime>, _output: &Tensor<CpuRuntime>, _lse: &Tensor<CpuRuntime>, _num_heads: usize, _num_kv_heads: usize, _head_dim: usize, _causal: bool, _q_scale: f32, _k_scale: f32, _v_scale: f32, _do_scale: f32, _o_scale: f32, _dq_scale: f32, _dk_scale: f32, _dv_scale: f32) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>, Tensor<CpuRuntime>)>
    Flash Attention backward pass for FP8 tensors
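The reason flash attention can run in one pass (and why the forward returns a log-sum-exp tensor alongside the output) is the online softmax: a running max m and running sum l are maintained, and the accumulator is rescaled whenever m grows. A sketch for a single query row over a stream of (score, value) pairs, with scalar values standing in for head_dim vectors:

```rust
/// Streaming softmax(scores) · values without materializing the full
/// score row: track running max m, normalizer l, and accumulator acc.
pub fn online_softmax_weighted_sum(scores: &[f64], values: &[f64]) -> f64 {
    let (mut m, mut l, mut acc) = (f64::NEG_INFINITY, 0.0, 0.0);
    for (&s, &v) in scores.iter().zip(values) {
        let m_new = m.max(s);
        let scale = (m - m_new).exp(); // rescale old state if the max grew
        let p = (s - m_new).exp();
        l = l * scale + p;
        acc = acc * scale + p * v;
        m = m_new;
    }
    acc / l
}

fn main() {
    // Matches the two-pass softmax(1,2,3) · (10,20,30) ≈ 25.752
    println!("{}", online_softmax_weighted_sum(&[1.0, 2.0, 3.0], &[10.0, 20.0, 30.0]));
}
```

The saved `(m, log l)` pair is exactly the per-row log-sum-exp the backward pass consumes via the `lse` argument above.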
impl FusedFp8TrainingOps<CpuRuntime> for CpuClient

fn fused_grad_unscale_clip(&self, grad: &Tensor<CpuRuntime>, max_norm: f64, loss_scale: f64) -> Result<(Tensor<CpuRuntime>, f64, bool)>
    Fused gradient unscale + clip + inf/nan detect.

fn dynamic_loss_scale_update(&self, found_inf: bool, loss_scale: f64, growth_tracker: i32, growth_interval: i32, backoff_factor: f64) -> Result<(f64, i32)>
    Update the dynamic loss scale based on inf/nan history.
impl FusedOptimizerOps<CpuRuntime> for CpuClient

fn fused_adamw_step(&self, param: &Tensor<CpuRuntime>, grad: &Tensor<CpuRuntime>, m: &Tensor<CpuRuntime>, v: &Tensor<CpuRuntime>, lr: f64, beta1: f64, beta2: f64, eps: f64, wd: f64, step_size: f64) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>, Tensor<CpuRuntime>)>
    Fused AdamW step: update param, m, v in a single pass.

fn fused_sgd_step(&self, param: &Tensor<CpuRuntime>, grad: &Tensor<CpuRuntime>, momentum_buf: Option<&Tensor<CpuRuntime>>, lr: f64, momentum: f64, dampening: f64, wd: f64, nesterov: bool) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>)>
    Fused SGD step with optional momentum.

fn fused_adagrad_step(&self, param: &Tensor<CpuRuntime>, grad: &Tensor<CpuRuntime>, accum: &Tensor<CpuRuntime>, lr: f64, eps: f64, wd: f64) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>)>
    Fused AdaGrad step: update param and accumulator in a single pass.

fn fused_lamb_step(&self, param: &Tensor<CpuRuntime>, grad: &Tensor<CpuRuntime>, m: &Tensor<CpuRuntime>, v: &Tensor<CpuRuntime>, beta1: f64, beta2: f64, eps: f64, wd: f64, bias_corr1: f64, bias_corr2: f64) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>, Tensor<CpuRuntime>)>
    Fused LAMB step: update param, m, v in a single pass.

fn fused_multi_tensor_adamw(&self, groups: &[(&Tensor<CpuRuntime>, &Tensor<CpuRuntime>, &Tensor<CpuRuntime>, &Tensor<CpuRuntime>)], lr: f64, beta1: f64, beta2: f64, eps: f64, wd: f64, step_size: f64) -> Result<Vec<(Tensor<CpuRuntime>, Tensor<CpuRuntime>, Tensor<CpuRuntime>)>>
    Fused multi-tensor AdamW: update ALL parameter groups in a single kernel launch.
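The update fused_adamw_step applies, written out for one scalar parameter. Note the `step_size` argument above presumably folds the learning rate and bias correction together; that is an assumption, so this sketch applies the bias correction explicitly instead:

```rust
/// One AdamW update for a scalar parameter p with gradient g, moments (m, v),
/// and step count t (1-based). Weight decay is decoupled, per AdamW.
pub fn adamw_step(
    p: f64, g: f64, m: f64, v: f64,
    lr: f64, beta1: f64, beta2: f64, eps: f64, wd: f64, t: u32,
) -> (f64, f64, f64) {
    let m = beta1 * m + (1.0 - beta1) * g;        // first moment (EMA of g)
    let v = beta2 * v + (1.0 - beta2) * g * g;    // second moment (EMA of g^2)
    let m_hat = m / (1.0 - beta1.powi(t as i32)); // bias correction
    let v_hat = v / (1.0 - beta2.powi(t as i32));
    let p = p - lr * (m_hat / (v_hat.sqrt() + eps) + wd * p);
    (p, m, v)
}

fn main() {
    let (p, m, v) = adamw_step(1.0, 0.1, 0.0, 0.0, 1e-3, 0.9, 0.999, 1e-8, 0.01, 1);
    println!("p = {p}, m = {m}, v = {v}");
}
```

Fusing this into one kernel means param, m, and v are each read and written exactly once per step; the multi-tensor variant extends that to every parameter group in one launch.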
impl FusedQkvOps<CpuRuntime> for CpuClient

fn fused_qkv_projection(&self, input: &Tensor<CpuRuntime>, weight: &Tensor<CpuRuntime>, bias: Option<&Tensor<CpuRuntime>>, num_heads: usize, num_kv_heads: usize, head_dim: usize) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>, Tensor<CpuRuntime>)>
    Fused Q/K/V projection: single matmul + split into Q, K, V

fn fused_output_projection_residual(&self, attn_out: &Tensor<CpuRuntime>, weight: &Tensor<CpuRuntime>, bias: Option<&Tensor<CpuRuntime>>, residual: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>>
    Fused output projection with residual addition

fn fused_qkv_projection_bwd(&self, dq: &Tensor<CpuRuntime>, dk: &Tensor<CpuRuntime>, dv: &Tensor<CpuRuntime>, input: &Tensor<CpuRuntime>, weight: &Tensor<CpuRuntime>, has_bias: bool, num_heads: usize, num_kv_heads: usize, head_dim: usize) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>, Option<Tensor<CpuRuntime>>)>
    Backward pass for the fused QKV projection

fn fused_output_projection_residual_bwd(&self, d_output: &Tensor<CpuRuntime>, attn_out: &Tensor<CpuRuntime>, weight: &Tensor<CpuRuntime>, has_bias: bool) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>, Option<Tensor<CpuRuntime>>, Tensor<CpuRuntime>)>
    Backward pass for the fused output projection with residual
impl FusedQuantOps<CpuRuntime> for CpuClient

fn fused_int4_swiglu(&self, input: &Tensor<CpuRuntime>, gate_qweight: &Tensor<CpuRuntime>, gate_scales: &Tensor<CpuRuntime>, gate_zeros: &Tensor<CpuRuntime>, up_qweight: &Tensor<CpuRuntime>, up_scales: &Tensor<CpuRuntime>, up_zeros: &Tensor<CpuRuntime>, group_size: usize) -> Result<Tensor<CpuRuntime>>
    Fused INT4 dual-GEMM + SwiGLU: silu(input @ gate_w) * (input @ up_w)

fn fused_int4_qkv(&self, input: &Tensor<CpuRuntime>, qweight_q: &Tensor<CpuRuntime>, scales_q: &Tensor<CpuRuntime>, zeros_q: &Tensor<CpuRuntime>, qweight_k: &Tensor<CpuRuntime>, scales_k: &Tensor<CpuRuntime>, zeros_k: &Tensor<CpuRuntime>, qweight_v: &Tensor<CpuRuntime>, scales_v: &Tensor<CpuRuntime>, zeros_v: &Tensor<CpuRuntime>, group_size: usize) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>, Tensor<CpuRuntime>)>
    Fused INT4 triple-GEMM QKV projection: (input@Wq, input@Wk, input@Wv)
impl GemmEpilogueOps<CpuRuntime> for CpuClient

fn matmul_bias_activation(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, bias: &Tensor<CpuRuntime>, activation: GemmActivation) -> Result<Tensor<CpuRuntime>, Error>
    Fused GEMM + bias + activation: activation(A @ B + bias)

fn matmul_bias_residual(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, bias: &Tensor<CpuRuntime>, residual: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Fused GEMM + bias + residual: A @ B + bias + residual

fn matmul_bias_activation_bwd(&self, grad: &Tensor<CpuRuntime>, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, bias: &Tensor<CpuRuntime>, activation: GemmActivation) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>, Tensor<CpuRuntime>), Error>
    Backward pass for fused GEMM + bias + activation.
impl GrammarDfaOps<CpuRuntime> for CpuClient

fn grammar_dfa_mask_logits(&self, logits: &Tensor<CpuRuntime>, grammar: &DeviceGrammarDfa<CpuRuntime>) -> Result<Tensor<CpuRuntime>>
    Apply a grammar DFA mask to the logits tensor in-place.
impl IndexingOps<CpuRuntime> for CpuClient

IndexingOps implementation for CPU runtime.

fn argmax(&self, a: &Tensor<CpuRuntime>, dim: usize, keepdim: bool) -> Result<Tensor<CpuRuntime>, Error>
    Argmax: returns indices of maximum values along a dimension.

fn argmin(&self, a: &Tensor<CpuRuntime>, dim: usize, keepdim: bool) -> Result<Tensor<CpuRuntime>, Error>
    Argmin: returns indices of minimum values along a dimension.

fn gather(&self, a: &Tensor<CpuRuntime>, dim: usize, index: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Gather elements along a dimension using an index tensor.

fn scatter(&self, a: &Tensor<CpuRuntime>, dim: usize, index: &Tensor<CpuRuntime>, src: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Scatter values into a tensor at positions specified by an index tensor.

fn index_select(&self, a: &Tensor<CpuRuntime>, dim: usize, index: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Select elements along a dimension using a 1D index tensor.

fn index_put(&self, a: &Tensor<CpuRuntime>, dim: usize, index: &Tensor<CpuRuntime>, src: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Put values at specified indices along a dimension.

fn masked_select(&self, a: &Tensor<CpuRuntime>, mask: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Select elements where the mask is true, returning a flattened 1D tensor.

fn masked_fill(&self, a: &Tensor<CpuRuntime>, mask: &Tensor<CpuRuntime>, value: f64) -> Result<Tensor<CpuRuntime>, Error>
    Fill elements where the mask is true with a scalar value.

fn embedding_lookup(&self, embeddings: &Tensor<CpuRuntime>, indices: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Look up embeddings from an embedding table using indices.

fn scatter_reduce(&self, dst: &Tensor<CpuRuntime>, dim: usize, index: &Tensor<CpuRuntime>, src: &Tensor<CpuRuntime>, op: ScatterReduceOp, include_self: bool) -> Result<Tensor<CpuRuntime>, Error>
    Scatter values with reduction into a destination tensor.

fn gather_nd(&self, input: &Tensor<CpuRuntime>, indices: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Gather elements using N-dimensional indices.

fn bincount(&self, input: &Tensor<CpuRuntime>, weights: Option<&Tensor<CpuRuntime>>, minlength: usize) -> Result<Tensor<CpuRuntime>, Error>
    Count occurrences of each value in an integer tensor.

fn gather_2d(&self, input: &Tensor<CpuRuntime>, rows: &Tensor<CpuRuntime>, cols: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>
    Gather elements from a 2D matrix using row and column index vectors.

fn slice_assign(&self, dst: &Tensor<CpuRuntime>, src: &Tensor<CpuRuntime>, dim: usize, start: usize) -> Result<Tensor<CpuRuntime>, Error>
    Assign src into a slice of dst along dimension dim starting at start.

fn take(&self, tensor: &Tensor<R>, indices: &Tensor<R>) -> Result<Tensor<R>, Error> where R: Runtime<DType = DType>
    Take values from a tensor using flat indices.

fn put(&self, tensor: &Tensor<R>, indices: &Tensor<R>, values: &Tensor<R>) -> Result<Tensor<R>, Error> where R: Runtime<DType = DType>
    Put values into a tensor at flat indices (functional, non-mutating).
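Gather is the least obvious of the ops above: the index tensor has the output's shape, and each output element pulls from the input along the chosen dimension. A sketch for dim 0 of a row-major 2D buffer (bounds checking omitted here; the real method returns Err on invalid indices, which is an assumption based on the Result return type):

```rust
/// gather along dim 0 of a (rows, cols) matrix:
/// out[i][j] = input[index[i][j]][j], with index flattened row-major.
pub fn gather_dim0(input: &[f32], cols: usize, index: &[usize]) -> Vec<f32> {
    index
        .iter()
        .enumerate()
        .map(|(flat, &row)| input[row * cols + flat % cols])
        .collect()
}

fn main() {
    // input = [[1, 2], [3, 4]], index = [[1, 0], [0, 0]]
    // -> out = [[3, 2], [1, 2]]
    let out = gather_dim0(&[1.0, 2.0, 3.0, 4.0], 2, &[1, 0, 0, 0]);
    println!("{out:?}");
}
```

scatter is the inverse direction: the same index tensor says where each src element lands in the output.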
impl Kernel<CpuRuntime> for CpuClient

unsafe fn binary_op<T>(&self, op: BinaryOp, a: *const T, b: *const T, out: *mut T, len: usize) where T: Element
    Element-wise binary operation

unsafe fn unary_op<T>(&self, op: UnaryOp, a: *const T, out: *mut T, len: usize) where T: Element
    Element-wise unary operation

unsafe fn matmul<T>(&self, a: *const T, b: *const T, out: *mut T, m: usize, n: usize, k: usize, lda: usize, ldb: usize, ldc: usize) where T: Element
    Matrix multiplication: C = A @ B

unsafe fn reduce<T>(&self, op: ReduceOp, a: *const T, out: *mut T, reduce_size: usize, outer_size: usize) where T: Element
    Reduction along the contiguous dimension

unsafe fn fill<T>(&self, out: *mut T, value: T, len: usize) where T: Element
    Fill a buffer with a constant value

unsafe fn copy<T>(&self, src: *const T, dst: *mut T, len: usize) where T: Element
    Copy elements from src to dst
impl KvCacheOps<CpuRuntime> for CpuClient

fn kv_cache_update(&self, k_cache: &Tensor<CpuRuntime>, v_cache: &Tensor<CpuRuntime>, new_k: &Tensor<CpuRuntime>, new_v: &Tensor<CpuRuntime>, position: usize) -> Result<()>

fn reshape_and_cache(&self, key: &Tensor<CpuRuntime>, value: &Tensor<CpuRuntime>, key_cache: &Tensor<CpuRuntime>, value_cache: &Tensor<CpuRuntime>, slot_mapping: &Tensor<CpuRuntime>, block_size: usize) -> Result<()>
    Reshape and cache: writes new K/V tokens into paged KV cache blocks.
impl KvCacheQuantOps<CpuRuntime> for CpuClient

fn quantize_kv_fp8_per_token(&self, input: &Tensor<CpuRuntime>, num_tokens: usize, head_dim: usize) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>)>
    Quantize KV cache to FP8 (E4M3) with per-token scaling

fn dequantize_kv_fp8_per_token(&self, quantized: &Tensor<CpuRuntime>, scales: &Tensor<CpuRuntime>, num_tokens: usize, head_dim: usize, _output_dtype: DType) -> Result<Tensor<CpuRuntime>>
    Dequantize FP8 KV cache back to the original dtype

fn quantize_kv_int4(&self, input: &Tensor<CpuRuntime>, num_tokens: usize, head_dim: usize, group_size: Int4GroupSize) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>, Tensor<CpuRuntime>)>
    Quantize KV cache to INT4 with per-group asymmetric scaling

fn dequantize_kv_int4(&self, packed: &Tensor<CpuRuntime>, scales: &Tensor<CpuRuntime>, zeros: &Tensor<CpuRuntime>, num_tokens: usize, head_dim: usize, group_size: Int4GroupSize) -> Result<Tensor<CpuRuntime>>
    Dequantize INT4 KV cache back to F32

fn quantize_kv_int8(&self, input: &Tensor<CpuRuntime>, num_tokens: usize, head_dim: usize) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>)>
    Quantize KV cache to INT8 with per-token scaling

fn dequantize_kv_int8(&self, quantized: &Tensor<CpuRuntime>, scales: &Tensor<CpuRuntime>, num_tokens: usize, head_dim: usize) -> Result<Tensor<CpuRuntime>>
    Dequantize INT8 KV cache back to F32
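Per-token INT8 quantization, as used by quantize_kv_int8 above, sketched for one token's head_dim values. The single returned scale tensor suggests a symmetric scheme (scale = max|x| / 127, no zero point); that is an assumption here:

```rust
/// Symmetric per-token INT8: scale = absmax / 127, q = round(x / scale).
pub fn quantize_token_int8(xs: &[f32]) -> (Vec<i8>, f32) {
    let absmax = xs.iter().fold(0.0_f32, |a, &x| a.max(x.abs()));
    let scale = if absmax == 0.0 { 1.0 } else { absmax / 127.0 };
    (xs.iter().map(|&x| (x / scale).round() as i8).collect(), scale)
}

/// Inverse: x ≈ q * scale (lossy by at most half a quantization step).
pub fn dequantize_token_int8(qs: &[i8], scale: f32) -> Vec<f32> {
    qs.iter().map(|&q| q as f32 * scale).collect()
}

fn main() {
    let (q, scale) = quantize_token_int8(&[0.0, 0.5, -1.0]);
    println!("{q:?} scale = {scale}");
    println!("{:?}", dequantize_token_int8(&q, scale));
}
```

The INT4 variant works the same way per group, but with an explicit zero point (the `zeros` tensor) because 4 bits are too few for a symmetric range to be a good fit.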
Source§

impl LinalgOps<CpuRuntime> for CpuClient

LinalgOps implementation for CPU runtime.

Source§

fn solve( &self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Solve linear system Ax = b using LU decomposition Read more
Source§

fn lstsq( &self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Least squares solution: minimize ||Ax - b||² Read more
Source§

fn pinverse( &self, a: &Tensor<CpuRuntime>, rcond: Option<f64>, ) -> Result<Tensor<CpuRuntime>, Error>

Moore-Penrose pseudo-inverse via SVD: A^+ = V @ diag(1/S) @ U^T Read more
Source§

fn matrix_norm( &self, a: &Tensor<CpuRuntime>, ord: MatrixNormOrder, ) -> Result<Tensor<CpuRuntime>, Error>

Matrix norm Read more
Source§

fn inverse(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Matrix inverse using LU decomposition Read more
Source§

fn det(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Matrix determinant using LU decomposition Read more
Source§

fn trace(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Matrix trace: sum of diagonal elements Read more
Source§

fn diag(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Extract diagonal elements Read more
Source§

fn diagflat(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Create diagonal matrix from 1D tensor Read more
Source§

fn matrix_rank( &self, a: &Tensor<CpuRuntime>, tol: Option<f64>, ) -> Result<Tensor<CpuRuntime>, Error>

Matrix rank via SVD Read more
Source§

fn kron( &self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Kronecker product: A ⊗ B Read more
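What `kron` computes can be written out directly for row-major matrices. A minimal sketch on flat slices (names and signature are illustrative, not this crate's API):

```rust
/// Kronecker product of an (m x n) and a (p x q) row-major matrix,
/// producing an (m*p) x (n*q) matrix: each a[i][j] scales a full copy of B.
fn kron(a: &[f32], (m, n): (usize, usize), b: &[f32], (p, q): (usize, usize)) -> Vec<f32> {
    let mut out = vec![0.0; m * p * n * q];
    for i in 0..m {
        for j in 0..n {
            for k in 0..p {
                for l in 0..q {
                    out[(i * p + k) * (n * q) + (j * q + l)] = a[i * n + j] * b[k * q + l];
                }
            }
        }
    }
    out
}
```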
Source§

fn solve_banded( &self, ab: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, kl: usize, ku: usize, ) -> Result<Tensor<CpuRuntime>, Error>

Solve banded linear system using LAPACK-style band storage Read more
Source§

fn khatri_rao( &self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Khatri-Rao product (column-wise Kronecker product) Read more
Source§

fn triu( &self, a: &Tensor<CpuRuntime>, diagonal: i64, ) -> Result<Tensor<CpuRuntime>, Error>

Upper triangular part of a matrix Read more
Source§

fn tril( &self, a: &Tensor<CpuRuntime>, diagonal: i64, ) -> Result<Tensor<CpuRuntime>, Error>

Lower triangular part of a matrix Read more
Source§

fn slogdet( &self, a: &Tensor<CpuRuntime>, ) -> Result<SlogdetResult<CpuRuntime>, Error>

Sign and log-absolute-determinant Read more
Source§

impl LinearAlgebraAlgorithms<CpuRuntime> for CpuClient

Source§

fn lu_decompose( &self, a: &Tensor<CpuRuntime>, ) -> Result<LuDecomposition<CpuRuntime>, Error>

LU Decomposition with partial pivoting: PA = LU
Source§

fn cholesky_decompose( &self, a: &Tensor<CpuRuntime>, ) -> Result<CholeskyDecomposition<CpuRuntime>, Error>

Cholesky Decomposition: A = LL^T
Source§

fn qr_decompose( &self, a: &Tensor<CpuRuntime>, ) -> Result<QrDecomposition<CpuRuntime>, Error>

QR Decomposition using Householder reflections: A = QR
Source§

fn qr_decompose_thin( &self, a: &Tensor<CpuRuntime>, ) -> Result<QrDecomposition<CpuRuntime>, Error>

Thin QR Decomposition: A = QR where Q is [m, k] and R is [k, n]
Source§

fn solve( &self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Solve linear system Ax = b using LU decomposition
Source§

fn solve_triangular_lower( &self, l: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, unit_diagonal: bool, ) -> Result<Tensor<CpuRuntime>, Error>

Solve triangular system Lx = b (forward substitution)
Source§

fn solve_triangular_upper( &self, u: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Solve triangular system Ux = b (backward substitution)
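The two triangular solves above are the classic substitution loops. A sketch on row-major slices, assuming a non-singular diagonal (illustrative code, not the crate's implementation):

```rust
/// Forward substitution: solves L*x = b for lower-triangular L (n x n, row-major).
fn solve_lower(l: &[f64], b: &[f64], n: usize) -> Vec<f64> {
    let mut x = vec![0.0; n];
    for i in 0..n {
        // Subtract the contribution of already-solved unknowns x[0..i].
        let s: f64 = (0..i).map(|j| l[i * n + j] * x[j]).sum();
        x[i] = (b[i] - s) / l[i * n + i];
    }
    x
}

/// Backward substitution: solves U*x = b for upper-triangular U.
fn solve_upper(u: &[f64], b: &[f64], n: usize) -> Vec<f64> {
    let mut x = vec![0.0; n];
    for i in (0..n).rev() {
        let s: f64 = (i + 1..n).map(|j| u[i * n + j] * x[j]).sum();
        x[i] = (b[i] - s) / u[i * n + i];
    }
    x
}
```

These are also the building blocks of the LU-based `solve`: factor PA = LU, then forward-solve with L and back-solve with U.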
Source§

fn lstsq( &self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Least squares solution: minimize ||Ax - b||²
Source§

fn solve_banded( &self, ab: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, kl: usize, ku: usize, ) -> Result<Tensor<CpuRuntime>, Error>

Solve banded linear system A·x = b, where A is supplied in band storage ab Read more
Source§

fn inverse(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Matrix inverse using LU decomposition
Source§

fn det(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Matrix determinant using LU decomposition
Source§

fn trace(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Matrix trace: sum of diagonal elements
Source§

fn diag(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Extract diagonal elements
Source§

fn diagflat(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Create diagonal matrix from 1D tensor
Source§

fn kron( &self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Kronecker product: A ⊗ B Read more
Source§

fn triu( &self, a: &Tensor<CpuRuntime>, diagonal: i64, ) -> Result<Tensor<CpuRuntime>, Error>

Upper triangular part of a matrix Read more
Source§

fn tril( &self, a: &Tensor<CpuRuntime>, diagonal: i64, ) -> Result<Tensor<CpuRuntime>, Error>

Lower triangular part of a matrix Read more
Source§

fn slogdet( &self, a: &Tensor<CpuRuntime>, ) -> Result<SlogdetResult<CpuRuntime>, Error>

Sign and log-absolute-determinant Read more
Source§

fn khatri_rao( &self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Khatri-Rao product (column-wise Kronecker product): A ⊙ B Read more
Source§

fn matrix_rank( &self, a: &Tensor<CpuRuntime>, tol: Option<f64>, ) -> Result<Tensor<CpuRuntime>, Error>

Matrix rank via SVD
Source§

fn matrix_norm( &self, a: &Tensor<CpuRuntime>, ord: MatrixNormOrder, ) -> Result<Tensor<CpuRuntime>, Error>

Matrix norm (Frobenius, Spectral, or Nuclear)
Source§

fn svd_decompose( &self, a: &Tensor<CpuRuntime>, ) -> Result<SvdDecomposition<CpuRuntime>, Error>

Singular Value Decomposition: A = U @ diag(S) @ V^T
Source§

fn eig_decompose_symmetric( &self, a: &Tensor<CpuRuntime>, ) -> Result<EigenDecomposition<CpuRuntime>, Error>

Eigendecomposition for symmetric matrices: A = V @ diag(λ) @ V^T
Source§

fn pinverse( &self, a: &Tensor<CpuRuntime>, rcond: Option<f64>, ) -> Result<Tensor<CpuRuntime>, Error>

Moore-Penrose pseudo-inverse via SVD
Source§

fn cond(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Matrix condition number via SVD
Source§

fn cov( &self, a: &Tensor<CpuRuntime>, ddof: Option<usize>, ) -> Result<Tensor<CpuRuntime>, Error>

Covariance matrix
Source§

fn corrcoef(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Correlation coefficient matrix (Pearson correlation)
Source§

fn schur_decompose( &self, a: &Tensor<CpuRuntime>, ) -> Result<SchurDecomposition<CpuRuntime>, Error>

Schur Decomposition: A = Z @ T @ Z^T
Source§

fn eig_decompose( &self, a: &Tensor<CpuRuntime>, ) -> Result<GeneralEigenDecomposition<CpuRuntime>, Error>

General Eigendecomposition for non-symmetric matrices
Source§

fn rsf2csf( &self, schur: &SchurDecomposition<CpuRuntime>, ) -> Result<ComplexSchurDecomposition<CpuRuntime>, Error>

Convert Real Schur form to Complex Schur form: rsf2csf Read more
Source§

fn qz_decompose( &self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, ) -> Result<GeneralizedSchurDecomposition<CpuRuntime>, Error>

Generalized Schur (QZ) decomposition for matrix pencil (A, B) Read more
Source§

fn polar_decompose( &self, a: &Tensor<CpuRuntime>, ) -> Result<PolarDecomposition<CpuRuntime>, Error>

Polar decomposition: A = U @ P Read more
Source§

impl LogicalOps<CpuRuntime> for CpuClient

Source§

fn logical_and( &self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Element-wise logical AND: a && b
Source§

fn logical_or( &self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Element-wise logical OR: a || b
Source§

fn logical_xor( &self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Element-wise logical XOR: a ^ b
Source§

fn logical_not( &self, a: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Element-wise logical NOT: !a
Source§

impl MatmulOps<CpuRuntime> for CpuClient

MatmulOps implementation for CPU runtime.

Source§

fn matmul( &self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Matrix multiplication: a @ b Read more
Source§

fn matmul_bias( &self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, bias: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Fused matrix multiplication with bias addition: C = A @ B + bias Read more
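The fused form computes the same result as a matmul followed by a broadcast add, but in one pass. A naive reference sketch for row-major inputs (illustrative only; the runtime's kernel is of course optimized):

```rust
/// C = A @ B + bias, with A: m x k, B: k x n, bias: n (broadcast over rows).
fn matmul_bias(a: &[f32], b: &[f32], bias: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    let mut c = vec![0.0; m * n];
    for i in 0..m {
        for j in 0..n {
            // Start the accumulator at the bias so the add is fused into the loop.
            let mut acc = bias[j];
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
    c
}
```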
Source§

impl MatrixFunctionsAlgorithms<CpuRuntime> for CpuClient

Source§

fn expm(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Matrix exponential: e^A Read more
Source§

fn logm(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Matrix logarithm: log(A) (principal branch) Read more
Source§

fn sqrtm(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Matrix square root: A^{1/2} (principal branch) Read more
Source§

fn signm(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Matrix sign function: sign(A) Read more
Source§

fn fractional_matrix_power( &self, a: &Tensor<CpuRuntime>, p: f64, ) -> Result<Tensor<CpuRuntime>, Error>

Fractional matrix power: A^p for any real p Read more
Source§

fn funm<F>( &self, a: &Tensor<CpuRuntime>, f: F, ) -> Result<Tensor<CpuRuntime>, Error>
where F: Fn(f64) -> f64 + Send + Sync,

General matrix function: f(A) for any scalar function f Read more
Source§

impl MlaOps<CpuRuntime> for CpuClient

Source§

impl MoEOps<CpuRuntime> for CpuClient

Source§

fn moe_top_k_routing( &self, logits: &Tensor<CpuRuntime>, k: usize, ) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>)>

Top-k expert routing with softmax normalization. Read more
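For a single token, top-k routing picks the k largest expert logits and renormalizes with a softmax over just those k, so the routing weights sum to 1. A scalar sketch under that reading (the crate's tensor-level batching and tie-breaking may differ):

```rust
/// Returns (expert indices, routing weights) for one token's logits.
fn top_k_route(logits: &[f32], k: usize) -> (Vec<usize>, Vec<f32>) {
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    // Sort expert indices by descending logit, keep the top k.
    idx.sort_by(|&a, &b| logits[b].partial_cmp(&logits[a]).unwrap());
    idx.truncate(k);
    // Softmax over the selected logits only (max-subtracted for stability).
    let max = logits[idx[0]];
    let exps: Vec<f32> = idx.iter().map(|&i| (logits[i] - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    (idx.clone(), exps.iter().map(|e| e / sum).collect())
}
```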
Source§

fn moe_permute_tokens( &self, tokens: &Tensor<CpuRuntime>, indices: &Tensor<CpuRuntime>, num_experts: usize, ) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>, Tensor<CpuRuntime>)>

Permute tokens into expert-grouped order. Read more
Source§

fn moe_unpermute_tokens( &self, expert_output: &Tensor<CpuRuntime>, sort_indices: &Tensor<CpuRuntime>, weights: &Tensor<CpuRuntime>, num_tokens: usize, ) -> Result<Tensor<CpuRuntime>>

Unpermute expert outputs back to original token order. Read more
Source§

fn moe_grouped_gemm( &self, permuted_tokens: &Tensor<CpuRuntime>, expert_weights: &Tensor<CpuRuntime>, expert_offsets: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>>

Grouped GEMM across experts. Read more
Source§

fn moe_grouped_gemm_fused( &self, permuted_tokens: &Tensor<CpuRuntime>, expert_weights: &Tensor<CpuRuntime>, expert_offsets: &Tensor<CpuRuntime>, activation: MoEActivation, ) -> Result<Tensor<CpuRuntime>>

Fused grouped GEMM with activation. Read more
Source§

impl MultivariateRandomOps<CpuRuntime> for CpuClient

Source§

fn multivariate_normal( &self, mean: &Tensor<CpuRuntime>, cov: &Tensor<CpuRuntime>, n_samples: usize, ) -> Result<Tensor<CpuRuntime>, Error>

Sample from a multivariate normal distribution: X ~ N(μ, Σ) Read more
Source§

fn wishart( &self, scale: &Tensor<CpuRuntime>, df: usize, n_samples: usize, ) -> Result<Tensor<CpuRuntime>, Error>

Sample from a Wishart distribution: W ~ W(V, df) Read more
Source§

fn dirichlet( &self, alpha: &Tensor<CpuRuntime>, n_samples: usize, ) -> Result<Tensor<CpuRuntime>, Error>

Sample from a Dirichlet distribution: X ~ Dir(α) Read more
Source§

fn multinomial_samples( &self, probs: &Tensor<CpuRuntime>, n_trials: usize, n_samples: usize, ) -> Result<Tensor<CpuRuntime>, Error>

Sample from a multinomial distribution with counts: X ~ Multinomial(probs, n_trials) Read more
Source§

impl NormalizationOps<CpuRuntime> for CpuClient

NormalizationOps implementation for CPU runtime.

Source§

fn rms_norm( &self, input: &Tensor<CpuRuntime>, weight: &Tensor<CpuRuntime>, eps: f32, ) -> Result<Tensor<CpuRuntime>, Error>

RMS Normalization: output = input * rsqrt(mean(input^2) + eps) * weight Read more
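The RMS-norm formula above, applied to a single row, is short enough to write out. A sketch on slices (illustrative; the op normalizes over the last dimension of a tensor):

```rust
/// RMS norm over one row: x * rsqrt(mean(x^2) + eps) * weight.
fn rms_norm(x: &[f32], w: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq: f32 = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let r = (mean_sq + eps).sqrt().recip();
    x.iter().zip(w).map(|(v, g)| v * r * g).collect()
}
```

Unlike layer norm, no mean is subtracted; only the root-mean-square magnitude is divided out.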
Source§

fn layer_norm( &self, input: &Tensor<CpuRuntime>, weight: &Tensor<CpuRuntime>, bias: &Tensor<CpuRuntime>, eps: f32, ) -> Result<Tensor<CpuRuntime>, Error>

Layer Normalization: output = (input - mean) / sqrt(variance + eps) * weight + bias Read more
Source§

fn group_norm( &self, input: &Tensor<CpuRuntime>, weight: &Tensor<CpuRuntime>, bias: &Tensor<CpuRuntime>, num_groups: usize, eps: f32, ) -> Result<Tensor<CpuRuntime>, Error>

Group Normalization: normalize over groups of channels. Read more
Source§

fn fused_add_rms_norm( &self, x: &Tensor<CpuRuntime>, residual: &Tensor<CpuRuntime>, weight: &Tensor<CpuRuntime>, eps: f32, ) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>), Error>

Fused Add + RMS Normalization: pre_norm = x + residual, output = rms_norm(pre_norm, weight, eps) Read more
Source§

fn fused_add_rms_norm_bwd( &self, grad: &Tensor<CpuRuntime>, pre_norm: &Tensor<CpuRuntime>, weight: &Tensor<CpuRuntime>, eps: f32, ) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>), Error>

Backward pass for fused add + RMS normalization. Read more
Source§

fn fused_add_layer_norm( &self, x: &Tensor<CpuRuntime>, residual: &Tensor<CpuRuntime>, weight: &Tensor<CpuRuntime>, bias: &Tensor<CpuRuntime>, eps: f32, ) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>), Error>

Fused Add + Layer Normalization: pre_norm = x + residual, output = layer_norm(pre_norm, weight, bias, eps) Read more
Source§

fn fused_add_layer_norm_bwd( &self, grad: &Tensor<CpuRuntime>, pre_norm: &Tensor<CpuRuntime>, weight: &Tensor<CpuRuntime>, bias: &Tensor<CpuRuntime>, eps: f32, ) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>, Tensor<CpuRuntime>), Error>

Backward pass for fused add + layer normalization. Read more
Source§

impl PagedAttentionOps<CpuRuntime> for CpuClient

Source§

fn paged_attention_fwd( &self, q: &Tensor<CpuRuntime>, k_blocks: &Tensor<CpuRuntime>, v_blocks: &Tensor<CpuRuntime>, block_table: &Tensor<CpuRuntime>, num_heads: usize, _num_kv_heads: usize, _seq_len_q: usize, seq_len_k: usize, head_dim: usize, block_size: usize, causal: bool, ) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>)>

Paged attention forward pass (F32, F16, BF16)
Source§

fn paged_attention_fwd_fp8( &self, _q: &Tensor<CpuRuntime>, _k_blocks: &Tensor<CpuRuntime>, _v_blocks: &Tensor<CpuRuntime>, _block_table: &Tensor<CpuRuntime>, _num_heads: usize, _num_kv_heads: usize, _seq_len_q: usize, _seq_len_k: usize, _head_dim: usize, _block_size: usize, _causal: bool, _q_scale: f32, _k_scale: f32, _v_scale: f32, _o_scale: f32, ) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>)>

Paged attention forward pass for FP8 tensors
Source§

fn paged_attention_bwd( &self, dout: &Tensor<CpuRuntime>, q: &Tensor<CpuRuntime>, k_blocks: &Tensor<CpuRuntime>, v_blocks: &Tensor<CpuRuntime>, output: &Tensor<CpuRuntime>, lse: &Tensor<CpuRuntime>, block_table: &Tensor<CpuRuntime>, num_heads: usize, _num_kv_heads: usize, _seq_len_q: usize, seq_len_k: usize, head_dim: usize, block_size: usize, causal: bool, ) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>, Tensor<CpuRuntime>)>

Paged attention backward pass Read more
Source§

impl PolynomialAlgorithms<CpuRuntime> for CpuClient

Source§

fn polyroots( &self, coeffs: &Tensor<CpuRuntime>, ) -> Result<PolynomialRoots<CpuRuntime>, Error>

Find roots of a polynomial via companion matrix eigendecomposition Read more
Source§

fn polyval( &self, coeffs: &Tensor<CpuRuntime>, x: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Evaluate polynomial at given points using Horner’s method Read more
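Horner's method evaluates a degree-n polynomial with n multiplies and n adds by nesting: c0 + x(c1 + x(c2 + ...)). A one-point sketch, assuming ascending coefficient order c[0] + c[1]·x + ... (the crate's coefficient convention is an assumption here):

```rust
/// Evaluate c[0] + c[1]*x + ... + c[n]*x^n at a point via Horner's method.
fn polyval(coeffs: &[f64], x: f64) -> f64 {
    // Fold from the highest coefficient down: acc -> acc*x + c.
    coeffs.iter().rev().fold(0.0, |acc, &c| acc * x + c)
}
```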
Source§

fn polyfromroots( &self, roots_real: &Tensor<CpuRuntime>, roots_imag: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Construct polynomial coefficients from roots Read more
Source§

fn polymul( &self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Multiply two polynomials via convolution Read more
Source§

impl QuantMatmulOps<CpuRuntime> for CpuClient

Source§

fn int4_gemm( &self, input: &Tensor<CpuRuntime>, qweight: &Tensor<CpuRuntime>, scales: &Tensor<CpuRuntime>, zeros: &Tensor<CpuRuntime>, group_size: usize, ) -> Result<Tensor<CpuRuntime>>

AWQ W4A16 GEMM: input × dequantized INT4 weight Read more
Source§

fn int4_gemm_gptq( &self, input: &Tensor<CpuRuntime>, qweight: &Tensor<CpuRuntime>, qzeros: &Tensor<CpuRuntime>, scales: &Tensor<CpuRuntime>, g_idx: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>>

GPTQ W4A16 GEMM: input × dequantized INT4 weight (GPTQ layout) Read more
Source§

fn marlin_gemm( &self, input: &Tensor<CpuRuntime>, weight: &Tensor<CpuRuntime>, scales: &Tensor<CpuRuntime>, zeros: &Tensor<CpuRuntime>, group_size: usize, ) -> Result<Tensor<CpuRuntime>>

Marlin-format W4A16 GEMM: tensor-core-friendly sequential INT4 packing Read more
Source§

fn quant_matmul_batch( &self, activation: &Tensor<CpuRuntime>, weights: &[&QuantTensor<CpuRuntime>], ) -> Result<Vec<Tensor<CpuRuntime>>>

Batched quant_matmul: same activation × multiple quantized weights. Read more
Source§

fn quant_matmul( &self, activation: &Tensor<CpuRuntime>, weight: &QuantTensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>>

Source§

fn quant_swiglu( &self, activation: &Tensor<CpuRuntime>, gate_weight: &QuantTensor<CpuRuntime>, up_weight: &QuantTensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>>

Fused SwiGLU: silu(activation × gate_weight) * (activation × up_weight) Read more
Source§

impl QuasiRandomOps<CpuRuntime> for CpuClient

Source§

fn sobol( &self, n_points: usize, dimension: usize, skip: usize, dtype: DType, ) -> Result<Tensor<CpuRuntime>, Error>

Generate Sobol sequence points. Read more
Source§

fn halton( &self, n_points: usize, dimension: usize, skip: usize, dtype: DType, ) -> Result<Tensor<CpuRuntime>, Error>

Generate Halton sequence points. Read more
Source§

fn latin_hypercube( &self, n_samples: usize, dimension: usize, dtype: DType, ) -> Result<Tensor<CpuRuntime>, Error>

Generate Latin Hypercube Sampling points. Read more
Source§

impl RandomOps<CpuRuntime> for CpuClient

RandomOps implementation for CPU runtime.

Source§

fn rand( &self, shape: &[usize], dtype: DType, ) -> Result<Tensor<CpuRuntime>, Error>

Generate uniform random values in [0, 1) Read more
Source§

fn rand_seeded( &self, shape: &[usize], dtype: DType, seed: u64, ) -> Result<Tensor<CpuRuntime>, Error>

Generate uniform random values in [0, 1) with a deterministic seed Read more
Source§

fn randn( &self, shape: &[usize], dtype: DType, ) -> Result<Tensor<CpuRuntime>, Error>

Generate standard normal random values (mean=0, std=1) Read more
Source§

fn randint( &self, low: i64, high: i64, shape: &[usize], dtype: DType, ) -> Result<Tensor<CpuRuntime>, Error>

Generate random integers in the range [low, high) Read more
Source§

fn multinomial( &self, probs: &Tensor<CpuRuntime>, num_samples: usize, replacement: bool, ) -> Result<Tensor<CpuRuntime>, Error>

Sample from a multinomial (categorical) distribution Read more
Source§

fn bernoulli( &self, p: f64, shape: &[usize], dtype: DType, ) -> Result<Tensor<CpuRuntime>, Error>

Sample from a Bernoulli distribution Read more
Source§

fn beta( &self, alpha: f64, beta: f64, shape: &[usize], dtype: DType, ) -> Result<Tensor<CpuRuntime>, Error>

Sample from a Beta distribution Read more
Source§

fn gamma( &self, shape_param: f64, scale: f64, shape: &[usize], dtype: DType, ) -> Result<Tensor<CpuRuntime>, Error>

Sample from a Gamma distribution Read more
Source§

fn exponential( &self, rate: f64, shape: &[usize], dtype: DType, ) -> Result<Tensor<CpuRuntime>, Error>

Sample from an Exponential distribution Read more
Source§

fn poisson( &self, lambda: f64, shape: &[usize], dtype: DType, ) -> Result<Tensor<CpuRuntime>, Error>

Sample from a Poisson distribution Read more
Source§

fn binomial( &self, n: u64, p: f64, shape: &[usize], dtype: DType, ) -> Result<Tensor<CpuRuntime>, Error>

Sample from a Binomial distribution Read more
Source§

fn laplace( &self, loc: f64, scale: f64, shape: &[usize], dtype: DType, ) -> Result<Tensor<CpuRuntime>, Error>

Sample from a Laplace (double exponential) distribution Read more
Source§

fn chi_squared( &self, df: f64, shape: &[usize], dtype: DType, ) -> Result<Tensor<CpuRuntime>, Error>

Sample from a Chi-squared distribution Read more
Source§

fn student_t( &self, df: f64, shape: &[usize], dtype: DType, ) -> Result<Tensor<CpuRuntime>, Error>

Sample from a Student’s t distribution Read more
Source§

fn randperm(&self, n: usize) -> Result<Tensor<CpuRuntime>, Error>

Generate a random permutation of integers [0, n) Read more
Source§

fn f_distribution( &self, df1: f64, df2: f64, shape: &[usize], dtype: DType, ) -> Result<Tensor<CpuRuntime>, Error>

Sample from an F distribution Read more
Source§

impl ReduceOps<CpuRuntime> for CpuClient

ReduceOps implementation for CPU runtime.

Source§

fn sum( &self, a: &Tensor<CpuRuntime>, dims: &[usize], keepdim: bool, ) -> Result<Tensor<CpuRuntime>, Error>

Sum along specified dimensions
Source§

fn sum_with_precision( &self, a: &Tensor<CpuRuntime>, dims: &[usize], keepdim: bool, precision: AccumulationPrecision, ) -> Result<Tensor<CpuRuntime>, Error>

Sum along specified dimensions with explicit accumulation precision. Read more
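Why an accumulation-precision knob matters: summing many f32 values in an f32 accumulator loses low-order bits, while a wider accumulator keeps them. A minimal sketch of the trade-off (illustrative; the actual `AccumulationPrecision` options and kernels are the crate's):

```rust
/// Sum f32 data in an f32 accumulator (round-off accumulates).
fn sum_f32(data: &[f32]) -> f32 {
    data.iter().sum()
}

/// Sum the same data in an f64 accumulator, rounding once at the end.
fn sum_f64_acc(data: &[f32]) -> f32 {
    data.iter().map(|&v| v as f64).sum::<f64>() as f32
}
```

With data like `[1e8, 1.0, -1e8]`, the f32 accumulator absorbs the 1.0 entirely (the ulp of f32 near 1e8 is 8), while the f64 accumulator preserves it.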
Source§

fn mean( &self, a: &Tensor<CpuRuntime>, dims: &[usize], keepdim: bool, ) -> Result<Tensor<CpuRuntime>, Error>

Mean along specified dimensions
Source§

fn max( &self, a: &Tensor<CpuRuntime>, dims: &[usize], keepdim: bool, ) -> Result<Tensor<CpuRuntime>, Error>

Maximum along specified dimensions
Source§

fn max_with_precision( &self, a: &Tensor<CpuRuntime>, dims: &[usize], keepdim: bool, precision: AccumulationPrecision, ) -> Result<Tensor<CpuRuntime>, Error>

Maximum along specified dimensions with explicit accumulation precision. Read more
Source§

fn min( &self, a: &Tensor<CpuRuntime>, dims: &[usize], keepdim: bool, ) -> Result<Tensor<CpuRuntime>, Error>

Minimum along specified dimensions
Source§

fn min_with_precision( &self, a: &Tensor<CpuRuntime>, dims: &[usize], keepdim: bool, precision: AccumulationPrecision, ) -> Result<Tensor<CpuRuntime>, Error>

Minimum along specified dimensions with explicit accumulation precision. Read more
Source§

fn prod( &self, a: &Tensor<CpuRuntime>, dims: &[usize], keepdim: bool, ) -> Result<Tensor<CpuRuntime>, Error>

Product along specified dimensions Read more
Source§

fn prod_with_precision( &self, a: &Tensor<CpuRuntime>, dims: &[usize], keepdim: bool, precision: AccumulationPrecision, ) -> Result<Tensor<CpuRuntime>, Error>

Product along specified dimensions with explicit accumulation precision. Read more
Source§

fn any( &self, a: &Tensor<CpuRuntime>, dims: &[usize], keepdim: bool, ) -> Result<Tensor<CpuRuntime>, Error>

Test if any element is true (non-zero) along specified dimensions. Read more
Source§

fn all( &self, a: &Tensor<CpuRuntime>, dims: &[usize], keepdim: bool, ) -> Result<Tensor<CpuRuntime>, Error>

Test if all elements are true (non-zero) along specified dimensions. Read more
Source§

impl RoPEOps<CpuRuntime> for CpuClient

Source§

fn apply_rope( &self, x: &Var<CpuRuntime>, cos_cache: &Var<CpuRuntime>, sin_cache: &Var<CpuRuntime>, ) -> Result<Var<CpuRuntime>>

Standard (split-half) RoPE: pairs are (x[..., d], x[..., d+D/2]). Used by LLaMA, Mistral, and most modern LLMs.
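The split-half pairing can be sketched for a single head vector of even length D: element d is paired with element d + D/2 and the pair is rotated by the cached (cos, sin) for that position. Illustrative code on slices, not the crate's `Var` API:

```rust
/// Split-half RoPE on one vector: rotate each (x[d], x[d + D/2]) pair.
fn rope_split_half(x: &[f32], cos: &[f32], sin: &[f32]) -> Vec<f32> {
    let half = x.len() / 2;
    let mut out = vec![0.0; x.len()];
    for d in 0..half {
        let (a, b) = (x[d], x[d + half]);
        // Standard 2D rotation by the angle whose (cos, sin) are cached per position.
        out[d] = a * cos[d] - b * sin[d];
        out[d + half] = a * sin[d] + b * cos[d];
    }
    out
}
```

The interleaved variant below applies the same rotation but pairs adjacent elements (2d, 2d+1) instead.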
Source§

fn apply_rope_interleaved( &self, x: &Var<CpuRuntime>, cos_cache: &Var<CpuRuntime>, sin_cache: &Var<CpuRuntime>, ) -> Result<Var<CpuRuntime>>

Interleaved RoPE: pairs are (x[..., 2d], x[..., 2d+1]). Used by GPT-NeoX, Qwen, and RoFormer; the "mathematically pure" form, treating each adjacent pair of elements as a complex number (real, imaginary). Read more
Source§

fn apply_rope_yarn( &self, x: &Var<CpuRuntime>, cos_cache: &Var<CpuRuntime>, sin_cache: &Var<CpuRuntime>, attn_scale: f32, ) -> Result<Var<CpuRuntime>>

YaRN (Yet another RoPE extensioN) for extended context lengths. Reference: https://arxiv.org/abs/2309.00071 Read more
Source§

impl Runtime for CpuRuntime

Source§

type Device = CpuDevice

Device identifier type
Source§

type Client = CpuClient

Client for dispatching operations
Source§

type Allocator = DefaultAllocator<CpuDevice>

Memory allocator type
Source§

type Graph = NoOpGraph

Captured computation graph for replay Read more
Source§

type RawHandle = ()

Raw handle for custom kernel launching (escape hatch) Read more
Source§

type DType = DType

Data type enum for tensor elements. Read more
Source§

fn name() -> &'static str

Human-readable name of this runtime
Source§

fn capture_graph<F, T>( client: &<CpuRuntime as Runtime>::Client, f: F, ) -> Result<(<CpuRuntime as Runtime>::Graph, T), Error>
where F: FnOnce(&<CpuRuntime as Runtime>::Client) -> Result<T, Error>,

Capture a sequence of operations as a replayable graph. Read more
Source§

fn allocate( size_bytes: usize, _device: &<CpuRuntime as Runtime>::Device, ) -> Result<u64, Error>

Allocate device memory Read more
Source§

fn deallocate( ptr: u64, size_bytes: usize, _device: &<CpuRuntime as Runtime>::Device, )

Deallocate device memory
Source§

fn copy_to_device( src: &[u8], dst: u64, _device: &<CpuRuntime as Runtime>::Device, ) -> Result<(), Error>

Copy data from host to device Read more
Source§

fn copy_from_device( src: u64, dst: &mut [u8], _device: &<CpuRuntime as Runtime>::Device, ) -> Result<(), Error>

Copy data from device to host Read more
Source§

fn copy_within_device( src: u64, dst: u64, size_bytes: usize, _device: &<CpuRuntime as Runtime>::Device, ) -> Result<(), Error>

Copy data within device (device to device) Read more
Source§

fn copy_strided( src_handle: u64, src_byte_offset: usize, dst_handle: u64, shape: &[usize], strides: &[isize], elem_size: usize, _device: &<CpuRuntime as Runtime>::Device, ) -> Result<(), Error>

Copy strided data to a contiguous buffer Read more
Source§

fn default_device() -> <CpuRuntime as Runtime>::Device

Get the default device
Source§

fn default_client( device: &<CpuRuntime as Runtime>::Device, ) -> <CpuRuntime as Runtime>::Client

Get the default client for a device
Source§

fn raw_handle( _client: &<CpuRuntime as Runtime>::Client, ) -> &<CpuRuntime as Runtime>::RawHandle

Get the raw handle from a client (escape hatch for custom kernels)
Source§

fn supports_graph_capture() -> bool

Does this backend support graph capture (e.g., CUDA Graphs)? Read more
Source§

fn record_compute_event(_device: &Self::Device) -> Result<u64, Error>

Record an event on the compute stream. Returns an opaque handle. On non-CUDA backends, returns 0 (no-op).
Source§

fn copy_from_device_pipelined( src: u64, dst: &mut [u8], device: &Self::Device, event: u64, ) -> Result<(), Error>

Copy data from device to host using a dedicated copy stream, synchronized via a previously recorded event. Read more
Source§

impl RuntimeClient<CpuRuntime> for CpuClient

Source§

fn device(&self) -> &CpuDevice

Get the device this client operates on
Source§

fn synchronize(&self)

Synchronize: wait for all pending operations to complete
Source§

fn allocator(&self) -> &DefaultAllocator<CpuDevice>

Get the allocator for this client
Source§

fn compute_stream_handle(&self) -> Option<u64>

Get the raw CUDA stream handle for compute-communication overlap. Read more
Source§

impl SamplingOps<CpuRuntime> for CpuClient

Source§

fn apply_sampling_penalties( &self, logits: &Tensor<CpuRuntime>, token_ids: &Tensor<CpuRuntime>, token_counts: &Tensor<CpuRuntime>, repeat_penalty: f32, frequency_penalty: f32, presence_penalty: f32, ) -> Result<()>

Apply repetition, frequency, and presence penalties to logits in-place. Read more
Source§

fn sample_token( &self, logits: &Tensor<CpuRuntime>, temperature: f32, top_k: usize, top_p: f32, min_p: f32, ) -> Result<u32>

Sample a single token from logits using the full stochastic pipeline. Read more
Source§

fn logits_to_token( &self, logits: &Tensor<CpuRuntime>, token_ids: &Tensor<CpuRuntime>, token_counts: &Tensor<CpuRuntime>, num_unique: usize, repeat_penalty: f32, frequency_penalty: f32, presence_penalty: f32, temperature: f32, top_k: usize, top_p: f32, min_p: f32, seed: Option<u64>, ) -> Result<Tensor<CpuRuntime>>

Fused logits-to-token: narrow last position → cast F32 → apply penalties → argmax/sample. Read more
Source§

impl ScalarOps<CpuRuntime> for CpuClient

Source§

fn add_scalar( &self, a: &Tensor<CpuRuntime>, scalar: f64, ) -> Result<Tensor<CpuRuntime>, Error>

Add scalar to tensor: a + scalar
Source§

fn sub_scalar( &self, a: &Tensor<CpuRuntime>, scalar: f64, ) -> Result<Tensor<CpuRuntime>, Error>

Subtract scalar from tensor: a - scalar
Source§

fn mul_scalar( &self, a: &Tensor<CpuRuntime>, scalar: f64, ) -> Result<Tensor<CpuRuntime>, Error>

Multiply tensor by scalar: a * scalar
Source§

fn div_scalar( &self, a: &Tensor<CpuRuntime>, scalar: f64, ) -> Result<Tensor<CpuRuntime>, Error>

Divide tensor by scalar: a / scalar
Source§

fn pow_scalar( &self, a: &Tensor<CpuRuntime>, scalar: f64, ) -> Result<Tensor<CpuRuntime>, Error>

Raise tensor to scalar power: a^scalar
Source§

fn rsub_scalar( &self, a: &Tensor<CpuRuntime>, scalar: f64, ) -> Result<Tensor<CpuRuntime>, Error>

Reverse subtract: scalar - a
Source§

fn fused_mul_add_scalar( &self, a: &Tensor<CpuRuntime>, scale: f64, bias: f64, ) -> Result<Tensor<CpuRuntime>, Error>

Fused multiply-add scalar: a * scale + bias Read more
Source§

impl SemiringMatmulOps<CpuRuntime> for CpuClient

Source§

fn semiring_matmul( &self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, op: SemiringOp, ) -> Result<Tensor<CpuRuntime>, Error>

Generalized semiring matrix multiplication. Read more
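One concrete instance of a semiring matmul is the min-plus ("tropical") product, where + replaces × and min replaces the sum; iterating it on an adjacency matrix yields shortest paths. A sketch of that instance (the crate's `SemiringOp` variants are not assumed here):

```rust
/// Min-plus matrix product: C[i][j] = min over k of (A[i][k] + B[k][j]).
fn min_plus_matmul(a: &[f64], b: &[f64], n: usize) -> Vec<f64> {
    let mut c = vec![f64::INFINITY; n * n];
    for i in 0..n {
        for j in 0..n {
            for k in 0..n {
                c[i * n + j] = c[i * n + j].min(a[i * n + k] + b[k * n + j]);
            }
        }
    }
    c
}
```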
Source§

impl ShapeOps<CpuRuntime> for CpuClient

ShapeOps implementation for CPU runtime.

Source§

fn cat( &self, tensors: &[&Tensor<CpuRuntime>], dim: isize, ) -> Result<Tensor<CpuRuntime>, Error>

Concatenate tensors along a dimension Read more
Source§

fn stack( &self, tensors: &[&Tensor<CpuRuntime>], dim: isize, ) -> Result<Tensor<CpuRuntime>, Error>

Stack tensors along a new dimension Read more
Source§

fn split( &self, tensor: &Tensor<CpuRuntime>, split_size: usize, dim: isize, ) -> Result<Vec<Tensor<CpuRuntime>>, Error>

Split a tensor into chunks of a given size along a dimension Read more
Source§

fn chunk( &self, tensor: &Tensor<CpuRuntime>, chunks: usize, dim: isize, ) -> Result<Vec<Tensor<CpuRuntime>>, Error>

Split a tensor into a specific number of chunks along a dimension Read more
Source§

fn repeat( &self, tensor: &Tensor<CpuRuntime>, repeats: &[usize], ) -> Result<Tensor<CpuRuntime>, Error>

Repeat tensor along each dimension Read more
Source§

fn pad( &self, tensor: &Tensor<CpuRuntime>, padding: &[usize], value: f64, ) -> Result<Tensor<CpuRuntime>, Error>

Pad tensor with a constant value Read more
Source§

fn roll( &self, tensor: &Tensor<CpuRuntime>, shift: isize, dim: isize, ) -> Result<Tensor<CpuRuntime>, Error>

Roll tensor elements along a dimension Read more
Source§

fn unfold( &self, tensor: &Tensor<CpuRuntime>, dim: isize, size: usize, step: usize, ) -> Result<Tensor<CpuRuntime>, Error>

Extract sliding local windows along a dimension. Read more
Source§

fn repeat_interleave( &self, tensor: &Tensor<CpuRuntime>, repeats: usize, dim: Option<isize>, ) -> Result<Tensor<CpuRuntime>, Error>

Repeat each element along a dimension. Read more
Source§

impl SortingOps<CpuRuntime> for CpuClient

SortingOps implementation for CPU runtime.

Source§

fn sort( &self, a: &Tensor<CpuRuntime>, dim: isize, descending: bool, ) -> Result<Tensor<CpuRuntime>, Error>

Sort tensor along a dimension. Read more
Source§

fn sort_with_indices( &self, a: &Tensor<CpuRuntime>, dim: isize, descending: bool, ) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>), Error>

Sort tensor along a dimension, returning both sorted values and indices. Read more
Source§

fn argsort( &self, a: &Tensor<CpuRuntime>, dim: isize, descending: bool, ) -> Result<Tensor<CpuRuntime>, Error>

Return indices that would sort the tensor along a dimension. Read more
Source§

fn topk( &self, a: &Tensor<CpuRuntime>, k: usize, dim: isize, largest: bool, sorted: bool, ) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>), Error>

Return top K largest (or smallest) values and their indices along a dimension. Read more
Source§

fn unique( &self, a: &Tensor<CpuRuntime>, sorted: bool, ) -> Result<Tensor<CpuRuntime>, Error>

Return unique elements of the input tensor. Read more
Source§

fn unique_with_counts( &self, a: &Tensor<CpuRuntime>, ) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>, Tensor<CpuRuntime>), Error>

Return unique elements with inverse indices and counts. Read more
Source§

fn nonzero(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Return indices of non-zero elements. Read more
Source§

fn searchsorted( &self, sorted_sequence: &Tensor<CpuRuntime>, values: &Tensor<CpuRuntime>, right: bool, ) -> Result<Tensor<CpuRuntime>, Error>

Find insertion points for values in a sorted sequence. Read more
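`searchsorted` follows the usual bisect convention: with `right = false` it returns the leftmost valid insertion index, with `right = true` the rightmost one. A standalone sketch of that logic on a plain slice (independent of the `Tensor` API):

```rust
/// Find the insertion index for `value` in `sorted`, keeping it sorted.
/// `right == false` returns the leftmost valid position (bisect_left);
/// `right == true` returns the position after any duplicates (bisect_right).
fn search_sorted(sorted: &[f64], value: f64, right: bool) -> usize {
    let (mut lo, mut hi) = (0usize, sorted.len());
    while lo < hi {
        let mid = (lo + hi) / 2;
        let go_right = if right { sorted[mid] <= value } else { sorted[mid] < value };
        if go_right { lo = mid + 1 } else { hi = mid }
    }
    lo
}

fn main() {
    let seq = [1.0, 2.0, 2.0, 3.0];
    assert_eq!(search_sorted(&seq, 2.0, false), 1); // leftmost slot
    assert_eq!(search_sorted(&seq, 2.0, true), 3);  // after the duplicates
    println!("ok");
}
```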
Source§

impl SpecialFunctions<CpuRuntime> for CpuClient

Source§

fn erf(&self, x: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Compute the error function element-wise. Read more
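The exact kernel the CPU backend uses is not shown here; as an illustration of what an element-wise erf entails, the classic Abramowitz–Stegun rational approximation (maximum error about 1.5e-7):

```rust
/// Abramowitz & Stegun 7.1.26 approximation of erf(x), accurate to
/// roughly 1.5e-7. Illustrative only; the backend's kernel may differ.
fn erf(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    let poly = t * (0.254829592
        + t * (-0.284496736
        + t * (1.421413741
        + t * (-1.453152027 + t * 1.061405429))));
    sign * (1.0 - poly * (-x * x).exp())
}

fn main() {
    assert!(erf(0.0).abs() < 1e-7);
    assert!((erf(1.0) - 0.8427007929).abs() < 1e-6);
    assert!((erf(-1.0) + 0.8427007929).abs() < 1e-6); // odd symmetry
    println!("ok");
}
```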
Source§

fn erfc(&self, x: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Compute the complementary error function element-wise. Read more
Source§

fn erfinv(&self, x: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Compute the inverse error function element-wise. Read more
Source§

fn gamma(&self, x: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Compute the gamma function element-wise. Read more
Source§

fn lgamma(&self, x: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Compute the log-gamma function element-wise. Read more
Source§

fn digamma(&self, x: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Compute the digamma (psi) function element-wise. Read more
Source§

fn beta( &self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Compute the beta function element-wise. Read more
Source§

fn betainc( &self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, x: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Compute the regularized incomplete beta function element-wise. Read more
Source§

fn gammainc( &self, a: &Tensor<CpuRuntime>, x: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Compute the lower regularized incomplete gamma function. Read more
Source§

fn gammaincc( &self, a: &Tensor<CpuRuntime>, x: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Compute the upper regularized incomplete gamma function. Read more
Source§

fn gammaincinv( &self, a: &Tensor<CpuRuntime>, p: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Compute the inverse of the lower regularized incomplete gamma function. Read more
Source§

fn betaincinv( &self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, p: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Compute the inverse of the regularized incomplete beta function. Read more
Source§

fn bessel_j0(&self, x: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Compute Bessel function of the first kind, order 0. Read more
Source§

fn bessel_j1(&self, x: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Compute Bessel function of the first kind, order 1. Read more
Source§

fn bessel_y0(&self, x: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Compute Bessel function of the second kind, order 0 (Neumann function). Read more
Source§

fn bessel_y1(&self, x: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Compute Bessel function of the second kind, order 1 (Neumann function). Read more
Source§

fn bessel_i0(&self, x: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Compute modified Bessel function of the first kind, order 0. Read more
Source§

fn bessel_i1(&self, x: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Compute modified Bessel function of the first kind, order 1. Read more
Source§

fn bessel_k0(&self, x: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Compute modified Bessel function of the second kind, order 0. Read more
Source§

fn bessel_k1(&self, x: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Compute modified Bessel function of the second kind, order 1. Read more
Source§

fn ellipk(&self, m: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Compute the complete elliptic integral of the first kind K(m). Read more
Source§

fn ellipe(&self, m: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Compute the complete elliptic integral of the second kind E(m). Read more
Source§

fn hyp2f1( &self, a: f64, b: f64, c: f64, z: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Compute the Gauss hypergeometric function ₂F₁(a, b; c; z). Read more
Source§

fn hyp1f1( &self, a: f64, b: f64, z: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Compute the confluent hypergeometric function ₁F₁(a; b; z) (Kummer’s M). Read more
Source§

fn airy_ai(&self, x: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Compute the Airy function of the first kind Ai(x). Read more
Source§

fn airy_bi(&self, x: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Compute the Airy function of the second kind Bi(x). Read more
Source§

fn legendre_p( &self, n: i32, x: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Compute the Legendre polynomial P_n(x). Read more
Source§

fn legendre_p_assoc( &self, n: i32, m: i32, x: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Compute the associated Legendre function P_n^m(x). Read more
Source§

fn sph_harm( &self, n: i32, m: i32, theta: &Tensor<CpuRuntime>, phi: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Compute the real spherical harmonic Y_n^m(θ, φ). Read more
Source§

fn fresnel_s(&self, x: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Compute the Fresnel sine integral S(x). Read more
Source§

fn fresnel_c(&self, x: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Compute the Fresnel cosine integral C(x). Read more
Source§

impl SpeculativeOps<CpuRuntime> for CpuClient

Source§

fn verify_speculative_tokens( &self, draft_probs: &Tensor<CpuRuntime>, target_probs: &Tensor<CpuRuntime>, draft_tokens: &Tensor<CpuRuntime>, seed: u64, ) -> Result<Vec<VerificationResult>>

Verify draft tokens against target model probabilities. Read more
Source§

fn compute_acceptance_probs( &self, draft_probs: &Tensor<CpuRuntime>, target_probs: &Tensor<CpuRuntime>, ) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>)>

Compute acceptance probabilities for analysis/diagnostics. Read more
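Assuming the standard speculative-sampling scheme, a draft token with draft probability p and target probability q is accepted with probability min(1, q/p). A scalar sketch of that rule, plus the expected accepted-token count under a geometric acceptance model (both are assumptions about this backend, not its verbatim implementation):

```rust
/// Standard speculative-sampling acceptance probability for one token:
/// accept with probability min(1, p_target / p_draft).
fn acceptance_prob(p_draft: f64, p_target: f64) -> f64 {
    if p_draft <= 0.0 { 1.0 } else { (p_target / p_draft).min(1.0) }
}

/// Expected accepted tokens out of `k` drafts when each step accepts
/// independently with rate `alpha`: sum_{i=1..k} alpha^i.
fn expected_tokens(alpha: f64, k: u32) -> f64 {
    (1..=k).map(|i| alpha.powi(i as i32)).sum()
}

fn main() {
    assert!((acceptance_prob(0.5, 0.25) - 0.5).abs() < 1e-12); // target half as likely
    assert!((acceptance_prob(0.2, 0.8) - 1.0).abs() < 1e-12);  // capped at 1
    assert!((expected_tokens(0.5, 2) - 0.75).abs() < 1e-12);   // 0.5 + 0.25
    println!("ok");
}
```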
Source§

fn compute_expected_tokens( &self, acceptance_rates: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>>

Compute expected tokens per verification step for adaptive depth. Read more
Source§

impl SsmKernelOps<CpuRuntime> for CpuClient

Source§

fn ssd_chunk_cumsum( &self, dt: &Tensor<CpuRuntime>, a: &Tensor<CpuRuntime>, dt_bias: Option<&Tensor<CpuRuntime>>, chunk_size: usize, dt_softplus: bool, ) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>)>

Compute discretised decay cumulative sum. Read more
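The `dt_softplus` flag suggests the Mamba-style discretization: `dt` is passed through softplus, and the decay `dt * a` is accumulated along the sequence. A single-channel sketch of that step (the exact formula here is an assumption):

```rust
/// Numerically stable softplus: ln(1 + e^x).
fn softplus(x: f64) -> f64 {
    if x > 20.0 { x } else { x.exp().ln_1p() }
}

/// Discretize per-step dt and accumulate dA = dt * a along the sequence,
/// mirroring the shape of `ssd_chunk_cumsum` for a single channel.
fn chunk_cumsum(dt: &[f64], a: f64, dt_softplus: bool) -> Vec<f64> {
    let mut acc = 0.0;
    dt.iter()
        .map(|&d| {
            let d = if dt_softplus { softplus(d) } else { d };
            acc += d * a;
            acc
        })
        .collect()
}

fn main() {
    // a < 0 makes dA a running (negative) decay exponent
    let out = chunk_cumsum(&[1.0, 1.0], -1.0, false);
    assert!((out[0] + 1.0).abs() < 1e-12);
    assert!((out[1] + 2.0).abs() < 1e-12);
    println!("ok");
}
```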
Source§

fn ssd_chunk_state( &self, x: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>, dt: &Tensor<CpuRuntime>, dA_cumsum: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>>

Compute per-chunk final hidden states. Read more
Source§

fn ssd_state_passing( &self, states: &Tensor<CpuRuntime>, dA_cumsum: &Tensor<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>>

Propagate hidden states across chunks (sequential scan). Read more
Source§

fn ssd_chunk_scan( &self, x: &Tensor<CpuRuntime>, states: &Tensor<CpuRuntime>, c: &Tensor<CpuRuntime>, dA_cumsum: &Tensor<CpuRuntime>, d: Option<&Tensor<CpuRuntime>>, ) -> Result<Tensor<CpuRuntime>>

Compute chunk output via state projection. Read more
Source§

impl StatisticalOps<CpuRuntime> for CpuClient

StatisticalOps implementation for CPU runtime.

Source§

fn var( &self, a: &Tensor<CpuRuntime>, dims: &[usize], keepdim: bool, correction: usize, ) -> Result<Tensor<CpuRuntime>, Error>

Variance along specified dimensions Read more
Source§

fn std( &self, a: &Tensor<CpuRuntime>, dims: &[usize], keepdim: bool, correction: usize, ) -> Result<Tensor<CpuRuntime>, Error>

Standard deviation along specified dimensions Read more
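The `correction` parameter selects the divisor `n - correction`: 0 gives the population variance, 1 the Bessel-corrected sample variance. A minimal scalar sketch:

```rust
/// Variance with a Bessel-style correction: divide by (n - correction).
/// correction = 0 gives the population variance, 1 the sample variance.
fn var(xs: &[f64], correction: usize) -> f64 {
    let n = xs.len();
    let mean = xs.iter().sum::<f64>() / n as f64;
    let ss: f64 = xs.iter().map(|x| (x - mean).powi(2)).sum();
    ss / (n - correction) as f64
}

fn std(xs: &[f64], correction: usize) -> f64 {
    var(xs, correction).sqrt()
}

fn main() {
    let xs = [1.0, 2.0, 3.0, 4.0];
    assert!((var(&xs, 0) - 1.25).abs() < 1e-12);      // population: divide by 4
    assert!((var(&xs, 1) - 5.0 / 3.0).abs() < 1e-12); // sample: divide by 3
    assert!((std(&xs, 0) - 1.25f64.sqrt()).abs() < 1e-12);
    println!("ok");
}
```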
Source§

fn quantile( &self, a: &Tensor<CpuRuntime>, q: f64, dim: Option<isize>, keepdim: bool, interpolation: &str, ) -> Result<Tensor<CpuRuntime>, Error>

Compute the q-th quantile along a dimension Read more
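Assuming the `"linear"` interpolation mode follows the NumPy/PyTorch convention, the quantile of sorted data interpolates between the two nearest order statistics:

```rust
/// q-th quantile of already-sorted data with "linear" interpolation,
/// assuming the same convention as numpy/torch's default mode.
fn quantile_linear(sorted: &[f64], q: f64) -> f64 {
    let pos = q * (sorted.len() - 1) as f64; // fractional rank
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    let frac = pos - lo as f64;
    sorted[lo] + frac * (sorted[hi] - sorted[lo])
}

fn main() {
    let xs = [1.0, 2.0, 3.0, 4.0];
    assert!((quantile_linear(&xs, 0.5) - 2.5).abs() < 1e-12); // median of even count
    assert!((quantile_linear(&xs, 0.0) - 1.0).abs() < 1e-12);
    assert!((quantile_linear(&xs, 1.0) - 4.0).abs() < 1e-12);
    println!("ok");
}
```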
Source§

fn percentile( &self, a: &Tensor<CpuRuntime>, p: f64, dim: Option<isize>, keepdim: bool, ) -> Result<Tensor<CpuRuntime>, Error>

Compute the p-th percentile along a dimension Read more
Source§

fn median( &self, a: &Tensor<CpuRuntime>, dim: Option<isize>, keepdim: bool, ) -> Result<Tensor<CpuRuntime>, Error>

Compute median (50th percentile) along a dimension Read more
Source§

fn histogram( &self, a: &Tensor<CpuRuntime>, bins: usize, range: Option<(f64, f64)>, ) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>), Error>

Compute histogram of input values Read more
Source§

fn cov( &self, a: &Tensor<CpuRuntime>, ddof: Option<usize>, ) -> Result<Tensor<CpuRuntime>, Error>

Covariance matrix of observations Read more
Source§

fn corrcoef(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Pearson correlation coefficient matrix Read more
Source§

fn skew( &self, a: &Tensor<CpuRuntime>, dims: &[usize], keepdim: bool, correction: usize, ) -> Result<Tensor<CpuRuntime>, Error>

Skewness (third standardized moment) Read more
Source§

fn kurtosis( &self, a: &Tensor<CpuRuntime>, dims: &[usize], keepdim: bool, correction: usize, ) -> Result<Tensor<CpuRuntime>, Error>

Kurtosis (fourth standardized moment, excess) Read more
Source§

fn mode( &self, a: &Tensor<CpuRuntime>, dim: Option<isize>, keepdim: bool, ) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>), Error>

Compute the mode (most frequent value) along a dimension Read more
Source§

impl TensorDecomposeAlgorithms<CpuRuntime> for CpuClient

Source§

fn unfold( &self, tensor: &Tensor<CpuRuntime>, mode: usize, ) -> Result<Tensor<CpuRuntime>, Error>

Mode-n unfolding (matricization) of a tensor Read more
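Mode-n unfolding arranges the mode-n fibers of a tensor as matrix rows. A small sketch on row-major data, using the common Kolda–Bader column ordering (the backend's exact convention is an assumption):

```rust
/// Mode-n unfolding of a dense row-major tensor: rows index mode `n`;
/// columns run over the remaining modes, lowest-numbered mode fastest.
fn unfold(data: &[f64], shape: &[usize], n: usize) -> Vec<Vec<f64>> {
    let rows = shape[n];
    let cols = data.len() / rows;
    let mut out = vec![vec![0.0; cols]; rows];
    for (flat, &v) in data.iter().enumerate() {
        // recover the multi-index from the row-major flat offset
        let mut idx = vec![0usize; shape.len()];
        let mut rem = flat;
        for k in (0..shape.len()).rev() {
            idx[k] = rem % shape[k];
            rem /= shape[k];
        }
        // column index over the remaining modes
        let (mut col, mut stride) = (0, 1);
        for k in 0..shape.len() {
            if k == n { continue; }
            col += idx[k] * stride;
            stride *= shape[k];
        }
        out[idx[n]][col] = v;
    }
    out
}

fn main() {
    // mode-0 unfolding of a 2x2 matrix is the matrix itself
    let m = unfold(&[1.0, 2.0, 3.0, 4.0], &[2, 2], 0);
    assert_eq!(m, vec![vec![1.0, 2.0], vec![3.0, 4.0]]);
    println!("ok");
}
```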
Source§

fn fold( &self, matrix: &Tensor<CpuRuntime>, mode: usize, shape: &[usize], ) -> Result<Tensor<CpuRuntime>, Error>

Mode-n folding (tensorization) - inverse of unfolding Read more
Source§

fn mode_n_product( &self, tensor: &Tensor<CpuRuntime>, matrix: &Tensor<CpuRuntime>, mode: usize, ) -> Result<Tensor<CpuRuntime>, Error>

Mode-n product: tensor × matrix along mode n Read more
Source§

fn hosvd( &self, tensor: &Tensor<CpuRuntime>, ranks: &[usize], ) -> Result<TuckerDecomposition<CpuRuntime>, Error>

Higher-Order SVD (HOSVD) decomposition Read more
Source§

fn tucker( &self, tensor: &Tensor<CpuRuntime>, ranks: &[usize], options: TuckerOptions, ) -> Result<TuckerDecomposition<CpuRuntime>, Error>

Tucker decomposition via Higher-Order Orthogonal Iteration (HOOI) Read more
Source§

fn cp_decompose( &self, tensor: &Tensor<CpuRuntime>, rank: usize, options: CpOptions, ) -> Result<CpDecomposition<CpuRuntime>, Error>

CP/PARAFAC decomposition via Alternating Least Squares (ALS) Read more
Source§

fn tensor_train( &self, tensor: &Tensor<CpuRuntime>, max_rank: usize, tolerance: f64, ) -> Result<TensorTrainDecomposition<CpuRuntime>, Error>

Tensor-Train (TT) decomposition via TT-SVD Read more
Source§

fn tucker_reconstruct( &self, decomp: &TuckerDecomposition<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Reconstruct tensor from Tucker decomposition Read more
Source§

fn cp_reconstruct( &self, decomp: &CpDecomposition<CpuRuntime>, shape: &[usize], ) -> Result<Tensor<CpuRuntime>, Error>

Reconstruct tensor from CP decomposition Read more
Source§

fn tt_reconstruct( &self, decomp: &TensorTrainDecomposition<CpuRuntime>, ) -> Result<Tensor<CpuRuntime>, Error>

Reconstruct tensor from Tensor-Train decomposition Read more
Source§

impl TypeConversionOps<CpuRuntime> for CpuClient

TypeConversionOps implementation for CPU runtime.

Source§

fn cast( &self, a: &Tensor<CpuRuntime>, target_dtype: DType, ) -> Result<Tensor<CpuRuntime>, Error>

Cast tensor to a different data type. Read more
Source§

impl UnaryOps<CpuRuntime> for CpuClient

UnaryOps implementation for CPU runtime.

Source§

fn neg(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Negation: -a
Source§

fn abs(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Absolute value: |a|
Source§

fn sqrt(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Square root: sqrt(a)
Source§

fn exp(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Exponential: e^a
Source§

fn log(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Natural logarithm: ln(a)
Source§

fn sin(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Sine: sin(a)
Source§

fn cos(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Cosine: cos(a)
Source§

fn tanh(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Hyperbolic tangent: tanh(a)
Source§

fn tan(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Tangent: tan(a)
Source§

fn asin(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Arc sine (inverse sine): asin(a), domain [-1,1], range [-π/2, π/2]
Source§

fn acos(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Arc cosine (inverse cosine): acos(a), domain [-1,1], range [0, π]
Source§

fn atan(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Arc tangent (inverse tangent): atan(a)
Source§

fn sinh(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Hyperbolic sine: sinh(a)
Source§

fn cosh(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Hyperbolic cosine: cosh(a)
Source§

fn asinh(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Inverse hyperbolic sine: asinh(a)
Source§

fn acosh(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Inverse hyperbolic cosine: acosh(a), domain [1, ∞)
Source§

fn atanh(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Inverse hyperbolic tangent: atanh(a), domain (-1, 1)
Source§

fn recip(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Reciprocal: 1/a
Source§

fn rsqrt(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Reciprocal square root: 1/sqrt(a) - critical for normalization layers
Source§

fn square(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Square: a²
Source§

fn cbrt(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Cube root: cbrt(a)
Source§

fn exp2(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Base-2 exponential: 2^a
Source§

fn expm1(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Exponential minus 1: e^a - 1 (numerically stable for small a)
Source§

fn log2(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Base-2 logarithm: log2(a)
Source§

fn log10(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Base-10 logarithm: log10(a)
Source§

fn log1p(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Natural log of 1+a: ln(1+a) (numerically stable for small a)
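`expm1` and `log1p` exist because the naive forms `e^a - 1` and `ln(1 + a)` suffer catastrophic cancellation for small `a`. Rust's own `f64::exp_m1` / `f64::ln_1p` demonstrate the stable behaviour:

```rust
fn main() {
    let x = 1e-10_f64;
    // Naive forms lose most significant digits for tiny x:
    let naive = (x.exp() - 1.0) / (1.0 + x).ln(); // nominally ~1, but noisy
    let stable = x.exp_m1() / x.ln_1p();          // accurate
    assert!((stable - 1.0).abs() < 1e-6);
    // The stable forms agree with their Taylor expansions near zero:
    assert!((x.exp_m1() - x).abs() < 1e-15);
    assert!((x.ln_1p() - x).abs() < 1e-15);
    let _ = naive;
    println!("ok");
}
```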
Source§

fn floor(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Floor: floor(a)
Source§

fn ceil(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Ceiling: ceil(a)
Source§

fn round(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Round: round(a) to nearest integer
Source§

fn trunc(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Truncate toward zero: trunc(a)
Source§

fn sign(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Sign: returns -1 for negative, 0 for zero, 1 for positive
Source§

fn isnan(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Check for NaN values: returns U8 tensor (1 if NaN, 0 otherwise)
Source§

fn isinf(&self, a: &Tensor<CpuRuntime>) -> Result<Tensor<CpuRuntime>, Error>

Check for Inf values: returns U8 tensor (1 if Inf, 0 otherwise)
Source§

impl UtilityOps<CpuRuntime> for CpuClient

UtilityOps implementation for CPU runtime.

Source§

fn clamp( &self, a: &Tensor<CpuRuntime>, min_val: f64, max_val: f64, ) -> Result<Tensor<CpuRuntime>, Error>

Clamp tensor values to a range: clamp(x, min, max) = min(max(x, min), max) Read more
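A scalar sketch of the clamp formula above, applied element-wise over a slice:

```rust
/// clamp(x, lo, hi) = min(max(x, lo), hi), applied element-wise.
fn clamp_all(xs: &[f64], lo: f64, hi: f64) -> Vec<f64> {
    xs.iter().map(|x| x.max(lo).min(hi)).collect()
}

fn main() {
    let out = clamp_all(&[-2.0, 0.5, 9.0], 0.0, 1.0);
    assert_eq!(out, vec![0.0, 0.5, 1.0]); // below, inside, above the range
    println!("ok");
}
```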
Source§

fn fill( &self, shape: &[usize], value: f64, dtype: DType, ) -> Result<Tensor<CpuRuntime>, Error>

Fill tensor with a constant value Read more
Source§

fn arange( &self, start: f64, stop: f64, step: f64, dtype: DType, ) -> Result<Tensor<CpuRuntime>, Error>

Create a 1D tensor with evenly spaced values within a half-open interval [start, stop) Read more
Source§

fn linspace( &self, start: f64, stop: f64, steps: usize, dtype: DType, ) -> Result<Tensor<CpuRuntime>, Error>

Create a 1D tensor with evenly spaced values over a specified interval Read more
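The two constructors differ in endpoint handling: `arange` excludes `stop`, while `linspace` includes both endpoints. A sketch of both conventions on plain vectors:

```rust
/// [start, stop) with the given step; length is ceil((stop - start) / step).
fn arange(start: f64, stop: f64, step: f64) -> Vec<f64> {
    let n = ((stop - start) / step).ceil().max(0.0) as usize;
    (0..n).map(|i| start + i as f64 * step).collect()
}

/// `steps` points spanning [start, stop], endpoints included.
fn linspace(start: f64, stop: f64, steps: usize) -> Vec<f64> {
    if steps == 1 {
        return vec![start];
    }
    let h = (stop - start) / (steps - 1) as f64;
    (0..steps).map(|i| start + i as f64 * h).collect()
}

fn main() {
    assert_eq!(arange(0.0, 1.0, 0.25), vec![0.0, 0.25, 0.5, 0.75]);        // stop excluded
    assert_eq!(linspace(0.0, 1.0, 5), vec![0.0, 0.25, 0.5, 0.75, 1.0]);    // stop included
    println!("ok");
}
```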
Source§

fn eye( &self, n: usize, m: Option<usize>, dtype: DType, ) -> Result<Tensor<CpuRuntime>, Error>

Create a 2D identity matrix (or batch of identity matrices) Read more
Source§

fn one_hot( &self, indices: &Tensor<CpuRuntime>, num_classes: usize, ) -> Result<Tensor<CpuRuntime>, Error>

One-hot encode integer indices Read more
Source§

fn meshgrid( &self, tensors: &[&Tensor<CpuRuntime>], indexing: MeshgridIndexing, ) -> Result<Vec<Tensor<CpuRuntime>>, Error>

Create coordinate grids from 1-D coordinate vectors Read more
Source§

impl VarLenAttentionOps<CpuRuntime> for CpuClient

Source§

fn varlen_attention_fwd( &self, q: &Tensor<CpuRuntime>, k: &Tensor<CpuRuntime>, v: &Tensor<CpuRuntime>, cu_seqlens_q: &Tensor<CpuRuntime>, cu_seqlens_k: &Tensor<CpuRuntime>, batch_size: usize, num_heads: usize, _max_seqlen_q: usize, _max_seqlen_k: usize, head_dim: usize, causal: bool, ) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>)>

Variable-length attention forward pass Read more
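Variable-length attention packs all sequences into one buffer and delimits them with cumulative sequence lengths: `cu_seqlens` has `batch_size + 1` entries, where entry `i` is the start offset of sequence `i` and the last entry is the total token count. A sketch of how such offsets are built:

```rust
/// Build the cumulative-sequence-length vector used by varlen attention:
/// cu_seqlens[i] is the start offset of sequence i in the packed batch,
/// and cu_seqlens[batch_size] is the total token count.
fn cu_seqlens(lens: &[usize]) -> Vec<usize> {
    let mut out = Vec::with_capacity(lens.len() + 1);
    let mut acc = 0;
    out.push(0);
    for &l in lens {
        acc += l;
        out.push(acc);
    }
    out
}

fn main() {
    // three sequences of lengths 3, 5, 2 packed into one 10-token buffer
    let cu = cu_seqlens(&[3, 5, 2]);
    assert_eq!(cu, vec![0, 3, 8, 10]);
    // sequence i occupies rows cu[i]..cu[i + 1]
    assert_eq!(cu[2] - cu[1], 5);
    println!("ok");
}
```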
Source§

fn varlen_attention_bwd( &self, dout: &Tensor<CpuRuntime>, q: &Tensor<CpuRuntime>, k: &Tensor<CpuRuntime>, v: &Tensor<CpuRuntime>, output: &Tensor<CpuRuntime>, lse: &Tensor<CpuRuntime>, cu_seqlens_q: &Tensor<CpuRuntime>, cu_seqlens_k: &Tensor<CpuRuntime>, batch_size: usize, num_heads: usize, _max_seqlen_q: usize, _max_seqlen_k: usize, head_dim: usize, causal: bool, ) -> Result<(Tensor<CpuRuntime>, Tensor<CpuRuntime>, Tensor<CpuRuntime>)>

Variable-length attention backward pass Read more
Source§

impl TensorOps<CpuRuntime> for CpuClient

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> ArchivePointee for T

Source§

type ArchivedMetadata = ()

The archived version of the pointer metadata for this type.
Source§

fn pointer_metadata( _: &<T as ArchivePointee>::ArchivedMetadata, ) -> <T as Pointee>::Metadata

Converts some archived metadata to the pointer metadata for itself.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T> LayoutRaw for T

Source§

fn layout_raw(_: <T as Pointee>::Metadata) -> Result<Layout, LayoutError>

Returns the layout of the type.
Source§

impl<T, N1, N2> Niching<NichedOption<T, N1>> for N2
where T: SharedNiching<N1, N2>, N1: Niching<T>, N2: Niching<T>,

Source§

unsafe fn is_niched(niched: *const NichedOption<T, N1>) -> bool

Returns whether the given value has been niched. Read more
Source§

fn resolve_niched(out: Place<NichedOption<T, N1>>)

Writes data to out indicating that a T is niched.
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of the pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a value with the given initializer. Read more
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
Source§

impl<T> Pointee for T

Source§

type Metadata = ()

The metadata type for pointers and references to this type.
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.