Skip to main content

UOp

Struct UOp 

Source
pub struct UOp {
    pub id: u64,
    pub content_hash: u64,
    pub tag: Option<SmallVec<[usize; 2]>>,
    /* private fields */
}
Expand description

Micro-operation node in the computation graph.

UOps form a DAG where operations reference their inputs through the Op enum. Hash consing ensures that structurally identical UOps share the same allocation.

Shape inference is lazy and cached - computed on first access via shape() method.

Note: Debug uses derive_more with #[debug(skip)] on cache fields to prevent stack overflow from recursive Arc references in caches.

Fields§

§id: u64

Unique stable ID for this UOp instance. Used for identity-based caching instead of fragile raw pointers.

§content_hash: u64

Structural content hash — deterministic regardless of allocation order. Computed at creation time: hash(op_discriminant, dtype, op_data, children_content_hashes). O(1) per node since children are already created with their content_hash set. Used for schedule-level caching where UOp IDs are not stable across runs.

§tag: Option<SmallVec<[usize; 2]>>

Tag for tracking tensor identity through the rangeify pipeline.

Matches Tinygrad’s UOp.tag (ops.py:128). Tags are tuples of integer indices that track which original tensor UOps map to which final kernel outputs. Tags participate in hash consing — different tag = different UOp.

Values:

  • None — untagged (default)
  • Some([]) — empty tag (e.g., RANGE ops)
  • Some([i]) — single index (assigned by add_tags)
  • Some([i, j, ...]) — merged indices (from buffer folding)

Implementations§

Source§

impl UOp

Source

pub fn try_add(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_sub(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_mul(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_mod(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_div(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Division with automatic type-based operator selection.

Uses Idiv for integer types and Fdiv for float types.

Source

pub fn try_max(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Maximum of two values: max(a, b).

Source

pub fn try_pow(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Power: a^b.

Source

pub fn neg(self: &Arc<UOp>) -> Arc<UOp>

Negation: -x.

Produces MUL(x, -1) instead of Unary(Neg, x), matching Tinygrad’s approach. Unary(Neg) is reintroduced late in codegen decompositions (pm_neg_from_mul) AFTER pm_lower_index_dtype has resolved all Invalid nodes. This ensures propagate_invalid (which only handles Binary ops) can push WHERE+Invalid through negation.

If self has a shape, broadcasts -1 to match (RESHAPE+EXPAND), matching Tinygrad’s Tensor-level _broadcasted(). If shapeless (schedule/symbolic context), uses a scalar const directly.

Source

pub fn abs(self: &Arc<UOp>) -> Arc<UOp>

Absolute value: |x|.

Source

pub fn square(self: &Arc<UOp>) -> Arc<UOp>

Square: x².

Source

pub fn sign(self: &Arc<UOp>) -> Arc<UOp>

Sign: -1 for negative, 0 for zero, 1 for positive.

Source

pub fn try_add_scalar<T>(lhs: Arc<UOp>, rhs: T) -> Result<Arc<UOp>, Error>
where T: IntoUOp,

Source

pub fn try_sub_scalar<T>(lhs: Arc<UOp>, rhs: T) -> Result<Arc<UOp>, Error>
where T: IntoUOp,

Source

pub fn try_mul_scalar<T>(lhs: Arc<UOp>, rhs: T) -> Result<Arc<UOp>, Error>
where T: IntoUOp,

Source

pub fn try_mod_scalar<T>(lhs: Arc<UOp>, rhs: T) -> Result<Arc<UOp>, Error>
where T: IntoUOp,

Source

pub fn try_sqrt(self: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_rsqrt(self: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_exp(self: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_exp2(self: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_log(self: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_log2(self: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_sin(self: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_cos(self: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_tan(self: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn erf(self: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Error function: erf(x) - requires float dtype.

Source

pub fn try_reciprocal(operand: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Reciprocal: 1/x - requires float dtype.

Source

pub fn trunc(operand: Arc<UOp>) -> Arc<UOp>

Truncate towards zero.

Source

pub fn floor(operand: Arc<UOp>) -> Arc<UOp>

Floor: round towards -∞.

Source

pub fn ceil(operand: Arc<UOp>) -> Arc<UOp>

Ceiling: round towards +∞.

Source

pub fn round(operand: Arc<UOp>) -> Arc<UOp>

Round: round to nearest integer (half to even).

Source

pub fn try_and_op(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_or_op(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_xor_op(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_shl_op(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_shr_op(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn not(self: &Arc<UOp>) -> Arc<UOp>

Logical not: !x.

Source

pub fn try_cmplt(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_cmple(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_cmpeq(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_cmpne(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_cmpgt(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_cmpge(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>

Source

pub fn try_where( condition: Arc<UOp>, true_val: Arc<UOp>, false_val: Arc<UOp>, ) -> Result<Arc<UOp>, Error>

Conditional selection: condition ? true_val : false_val.

§Errors
  • WhereConditionNotBool if condition dtype is not bool
Source

pub fn try_mulacc( a: Arc<UOp>, b: Arc<UOp>, c: Arc<UOp>, ) -> Result<Arc<UOp>, Error>

Multiply-accumulate: a * b + c (fused operation).

All operands must have matching dtypes (including vcount) for valid codegen. Returns None if vcounts don’t match - caller should fall back to Add(Mul(a,b), c).

Source

pub fn add(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>

Panicking version of try_add.

For use in pattern rewrites where types are validated. Panics on type mismatch.

Source

pub fn sub(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>

Panicking version of try_sub.

For use in pattern rewrites where types are validated. Panics on type mismatch.

Source

pub fn mul(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>

Panicking version of try_mul.

For use in pattern rewrites where types are validated. Panics on type mismatch.

Source

pub fn idiv(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>

Panicking version of try_div.

For use in pattern rewrites where types are validated. Panics on type mismatch.

Source

pub fn mod_(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>

Panicking version of try_mod.

For use in pattern rewrites where types are validated. Panics on type mismatch.

Source

pub fn max(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>

Panicking version of try_max.

For use in pattern rewrites where types are validated. Panics on type mismatch.

Source

pub fn and_(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>

Panicking version of try_and_op.

For use in pattern rewrites where types are validated. Panics on type mismatch.

Source

pub fn or_(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>

Panicking version of try_or_op.

For use in pattern rewrites where types are validated. Panics on type mismatch.

Source

pub fn xor(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>

Panicking version of try_xor_op.

For use in pattern rewrites where types are validated. Panics on type mismatch.

Source

pub fn shl(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>

Panicking version of try_shl_op.

For use in pattern rewrites where types are validated. Panics on type mismatch.

Source

pub fn shr(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>

Panicking version of try_shr_op.

For use in pattern rewrites where types are validated. Panics on type mismatch.

Source

pub fn lt(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>

Panicking version of try_cmplt.

For use in pattern rewrites where types are validated. Panics on type mismatch.

Source

pub fn le(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>

Panicking version of try_cmple.

For use in pattern rewrites where types are validated. Panics on type mismatch.

Source

pub fn gt(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>

Panicking version of try_cmpgt.

For use in pattern rewrites where types are validated. Panics on type mismatch.

Source

pub fn ge(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>

Panicking version of try_cmpge.

For use in pattern rewrites where types are validated. Panics on type mismatch.

Source

pub fn eq(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>

Panicking version of try_cmpeq.

For use in pattern rewrites where types are validated. Panics on type mismatch.

Source

pub fn ne(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>

Panicking version of try_cmpne.

For use in pattern rewrites where types are validated. Panics on type mismatch.

Source

pub fn alu(op: BinaryOp, lhs: Arc<UOp>, rhs: Arc<UOp>) -> Arc<UOp>

Low-level binary op constructor that auto-selects result dtype.

Comparisons produce Bool; everything else inherits lhs dtype. Matches Tinygrad’s UOp.alu(). No type promotion or validation — use only in rewrites where types are already correct.

Source

pub fn threefry(lhs: Arc<UOp>, rhs: Arc<UOp>) -> Result<Arc<UOp>, Error>

Threefry PRNG: threefry(x, key).

Source§

impl UOp

Source

pub fn range_axis( end: Arc<UOp>, axis_id: AxisId, axis_type: AxisType, ) -> Arc<UOp>

Create a Range operation with specified axis type.

Source

pub fn range(end: Arc<UOp>, axis_id: usize) -> Arc<UOp>

Create a RANGE operation with Loop axis type (convenience for tests).

Uses AxisId::Renumbered since tests typically work with renumbered kernels.

Source

pub fn range_const(end_value: i64, axis_id: usize) -> Arc<UOp>

Create a RANGE operation with constant end value (convenience for tests).

Uses AxisId::Renumbered since tests typically work with renumbered kernels. Creates a Loop range (inside kernels).

Source

pub fn range_outer_const(end_value: i64, axis_id: usize) -> Arc<UOp>

Create an OUTER RANGE operation with constant end value (convenience for tests).

Uses AxisId::Renumbered since tests typically work with renumbered kernels. Creates an Outer range (wraps entire kernels).

Source

pub fn if_(condition: Arc<UOp>, body: SmallVec<[Arc<UOp>; 4]>) -> Arc<UOp>

Create a conditional block that executes body when condition is true.

Body contains operations to execute; use endif to close the block.

Source

pub fn endif(if_op: Arc<UOp>) -> Arc<UOp>

End if block.

Source

pub fn end(self: &Arc<UOp>, ranges: SmallVec<[Arc<UOp>; 4]>) -> Arc<UOp>

End of range or reduce scope.

Wraps self (the computation) and closes the specified ranges. This marks the end of RANGE or REDUCE loops.

§Arguments
  • ranges - The RANGE or REDUCE operations being closed
Source

pub fn barrier(self: &Arc<UOp>, deps: SmallVec<[Arc<UOp>; 4]>) -> Arc<UOp>

Insert a synchronization barrier.

Self passes through; deps are operations that must complete before any consumer of this barrier executes.

Source

pub fn var( name: impl Into<String>, dtype: DType, min_val: i64, max_val: i64, ) -> Arc<UOp>

Create a DefineVar operation for range-bounded variables.

Used in testing and symbolic analysis to define variables with known ranges. Range is [min_val, max_val] inclusive.

Source

pub fn define_var(name: String, min_val: i64, max_val: i64) -> Arc<UOp>

Define a symbolic variable with known bounds for range analysis.

Range is [min_val, max_val] inclusive.

Source

pub fn bind(self: &Arc<UOp>, value: Arc<UOp>) -> Arc<UOp>

Bind concrete value to symbolic variable.

Source

pub fn special(end: Arc<UOp>, name: String) -> Arc<UOp>

Create a GPU-specific dimension variable (e.g., blockIdx.x, threadIdx.y).

Unlike RANGE which is a loop, SPECIAL represents hardware-provided indices. The name identifies the dimension (rendered as-is in codegen).

Source§

impl UOp

Source

pub fn const_(dtype: DType, value: ConstValue) -> Arc<UOp>

Create a constant UOp with explicit dtype and value.

Normalizes the value to match the target dtype (e.g., Float(5.0) becomes Int(5) when dtype is Int32). This prevents codegen from emitting mismatched literals.

Use native_const for type-inferred constants from Rust values.

Source

pub fn native_const<T>(value: T) -> Arc<UOp>
where T: HasDType + IntoUOp,

Create a constant UOp from a Rust native value with automatic dtype inference.

Source

pub fn index_const(value: i64) -> Arc<UOp>

Create an index constant.

Source

pub fn const_like<T>(self: &Arc<UOp>, value: T) -> Arc<UOp>
where T: IntoUOp,

Create a constant with the same dtype as self.

This is the Rust equivalent of Tinygrad’s x.const_like(value). Useful for creating identity elements, zeros, or other constants that match an existing UOp’s type.

§Examples
let x = UOp::const_(DType::Float32, morok_ir::ConstValue::Float(5.0));
let zero = x.const_like(0.0);
assert_eq!(zero.dtype(), DType::Float32);
Source

pub fn vconst(values: Vec<ConstValue>, scalar_dtype: DType) -> Arc<UOp>

Create a vector constant from multiple values.

Dtype is inferred from the first value; all values must be same type.

Source

pub fn buffer_id(num: Option<usize>) -> Arc<UOp>

Create a unique buffer identifier.

Source

pub fn new_buffer(device: DeviceSpec, size: usize, dtype: DType) -> Arc<UOp>

Create a new buffer.

Equivalent to: UOp(Ops.BUFFER, dtype, (unique(), device(device_spec)), size)

Source

pub fn param( slot: usize, size: usize, dtype: DType, device: Option<Arc<UOp>>, ) -> Arc<UOp>

Create a normalized buffer parameter with positional slot. Used by pre-schedule normalization (BUFFER→PARAM) to erase buffer identity. Matches Tinygrad’s UOp.param(slot, dtype, shape, device) (ops.py:817-819).

Source

pub fn view(self: &Arc<UOp>, size: usize, offset: usize) -> Arc<UOp>

Create a buffer view.

Source

pub fn device(device: DeviceSpec) -> Arc<UOp>

Create a device specification.

Source

pub fn noop() -> Arc<UOp>

Create a no-op.

Source

pub fn cast(self: &Arc<UOp>, dtype: DType) -> Arc<UOp>

Cast to a different dtype.

If casting a vector to a scalar type, automatically promotes the target dtype to a matching vector type. This prevents invalid scalar-to-vector casts in the IR. (Matches Tinygrad’s cast behavior.)

Source

pub fn bitcast(self: &Arc<UOp>, dtype: DType) -> Arc<UOp>

Bitcast: reinterpret bits as different type.

Source§

impl UOp

Source

pub fn sink(sources: Vec<Arc<UOp>>) -> Arc<UOp>

Create a sink operation (graph termination).

Sink marks outputs that must be evaluated. All sources are dependencies.

Source

pub fn group(sources: Vec<Arc<UOp>>) -> Arc<UOp>

Create a group operation (merging/organizing related ops).

Group is a NOOP that helps organize related operations together. It passes through the first source while ensuring all sources are dependencies.

Source

pub fn assign(target: Arc<UOp>, value: Arc<UOp>) -> Arc<UOp>

In-place assignment.

§Arguments
  • target - The INDEX operation for the assignment destination
  • value - The value to assign
Source

pub fn assign_with_mops( target: Arc<UOp>, value: Arc<UOp>, movement_ops: Option<Arc<UOp>>, ) -> Arc<UOp>

In-place assignment with movement ops chain.

The movement_ops parameter captures shape transformations from the original target, used during bufferize_to_store to apply the same transformations to the result buffer.

Source

pub fn after(self: &Arc<UOp>, deps: SmallVec<[Arc<UOp>; 4]>) -> Arc<UOp>

Ordering constraint: self depends on deps.

§Arguments
  • deps - Dependencies that must complete before this value is used
§Panics (debug only)

Panics if self is a control flow node (Range, End)

Source

pub fn detach(self: &Arc<UOp>) -> Arc<UOp>

Detach from gradient flow / force materialization.

Source

pub fn contiguous(self: &Arc<UOp>) -> Arc<UOp>

Ensure contiguous memory layout.

Elides the CONTIGUOUS wrapper when the source is already contiguous:

  • Already a CONTIGUOUS node (no double wrapping)
  • Has buffer identity (BUFFER, or RESHAPE/MULTI chain to BUFFER)

Based on Tinygrad’s UOp.contiguous() (ops.py:463-466).

Source

pub fn contiguous_with_opts( self: &Arc<UOp>, opts: SmallVec<[ContiguousHint; 4]>, ) -> Arc<UOp>

Ensure contiguous memory layout with optimization hints.

The hints are extracted during rangeify and passed to the optimizer. Based on Tinygrad’s CONTIGUOUS.arg which carries Opt tuples.

Source

pub fn contiguous_backward(self: &Arc<UOp>) -> Arc<UOp>

Contiguous backward pass.

Source

pub fn precast(self: &Arc<UOp>) -> Arc<UOp>

Optimizer hint to force materialization before type conversion.

Inserted before BITCAST to ensure the source is rendered separately in codegen (prevents invalid cast fusion).

Source

pub fn custom( deps: SmallVec<[Arc<UOp>; 4]>, code: String, dtype: DType, ) -> Arc<UOp>

Inject custom code as a statement in the generated kernel.

deps are UOps whose rendered names can be referenced in code. dtype specifies the result type (often Void for statements).

Source

pub fn customi( deps: SmallVec<[Arc<UOp>; 4]>, code: String, dtype: DType, ) -> Arc<UOp>

Inject custom code as an inline expression.

Unlike custom (statement), this is substituted directly into expressions. deps provide values to reference; result has specified dtype.

Source§

impl UOp

Source

pub fn wmma( a: Arc<UOp>, b: Arc<UOp>, c: Arc<UOp>, metadata: WmmaMetadata, ) -> Arc<UOp>

Warp Matrix Multiply-Accumulate for tensor cores.

Computes D = A × B + C using hardware matrix units. metadata specifies dimensions, dtypes, and upcast axes for vectorization.

Source

pub fn try_vectorize( elements: SmallVec<[Arc<UOp>; 4]>, ) -> Result<Arc<UOp>, Error>

Create vector from scalar elements (fallible version with validation).

§Errors
  • VectorizeRequiresMultiple if elements is empty
  • VectorizeDTypeMismatch if elements have different scalar dtypes
Source

pub fn vectorize(elements: SmallVec<[Arc<UOp>; 4]>) -> Arc<UOp>

Create vector from scalar elements (panics on violation).

Source

pub fn try_broadcast(self: &Arc<UOp>, count: usize) -> Result<Arc<UOp>, Error>

Broadcast a scalar value to a vector by replication (fallible version).

Creates a VECTORIZE operation with count copies of the source. If count == 1, returns the source unchanged.

§Errors
  • BroadcastRequiresScalar if source has vcount > 1
Source

pub fn broadcast(self: &Arc<UOp>, count: usize) -> Arc<UOp>

Broadcast a scalar value to a vector by replication.

Creates a VECTORIZE operation with count copies of the source. If count == 1, returns the source unchanged.

§Example
let vector = scalar.broadcast(4);
Source

pub fn try_gep(self: &Arc<UOp>, indices: Vec<usize>) -> Result<Arc<UOp>, Error>

Extract element(s) from vector (fallible version with validation).

§Errors
  • GepRequiresVector if source has vcount <= 1
  • GepIndexOutOfBounds if any index >= source vcount
Source

pub fn gep(self: &Arc<UOp>, indices: Vec<usize>) -> Arc<UOp>

Extract element(s) from vector (Get Element Pointer).

§Example
let elem = vector.gep(vec![0]);      // Extract single element
let sub = vector.gep(vec![0, 2]);    // Extract multiple elements
Source

pub fn try_contract( self: &Arc<UOp>, upcast_ranges: Vec<(usize, usize)>, ) -> Result<Arc<UOp>, Error>

Contract unrolled values back into vectorized form (fallible version).

§Errors
  • ContractCountMismatch if dtype.vcount != product of axis sizes
Source

pub fn contract(self: &Arc<UOp>, upcast_ranges: Vec<(usize, usize)>) -> Arc<UOp>

Contract unrolled values back into vectorized form.

Pairs with UNROLL: UNROLL expands loops for optimization, CONTRACT combines the results. Used in WMMA and vectorization passes.

Source

pub fn try_unroll( self: &Arc<UOp>, unroll_axes: Vec<(usize, usize)>, ) -> Result<Arc<UOp>, Error>

Expand a value across unrolled loop iterations (fallible version).

§Errors
  • UnrollCountMismatch if src.dtype.vcount != product of axis sizes
Source

pub fn unroll(self: &Arc<UOp>, unroll_axes: Vec<(usize, usize)>) -> Arc<UOp>

Expand a value across unrolled loop iterations.

Creates multiple versions of the computation for each unroll axis. Pairs with CONTRACT which combines results back together.

Source

pub fn unroll_with_dtype( self: &Arc<UOp>, unroll_axes: Vec<(usize, usize)>, dtype: DType, ) -> Arc<UOp>

Create UNROLL with explicit dtype (for do_contract pattern).

Used when UNROLL dtype should differ from source dtype, specifically when CONTRACT collapses UNROLL via GEP and we need to preserve the per-iteration element type.

Based on Tinygrad’s pattern where partial contraction creates UNROLL with remaining axes but CONTRACT’s dtype.

Source

pub fn mstack(buffers: SmallVec<[Arc<UOp>; 4]>) -> Arc<UOp>

Stack multiple buffers (multi-device tensors).

MStack combines buffers from multiple devices into a single logical tensor. Used for distributed/multi-GPU tensor operations.

Source

pub fn mselect(self: &Arc<UOp>, device_index: usize) -> Arc<UOp>

Select buffer by device index (multi-device access).

MSelect retrieves a specific device’s buffer from a multi-device tensor.

Source

pub fn kernel(sources: SmallVec<[Arc<UOp>; 4]>, ast: Arc<UOp>) -> Arc<UOp>

Kernel wrapper.

Creates a KERNEL operation with the given sources (kernel arguments) and AST (computation).

§Arguments
  • sources - Kernel arguments (buffers and variables)
  • ast - The computation graph (usually SINK, COPY, or BUFFER_VIEW)
Source

pub fn cat() -> UOpCatBuilder

Create a CAT operation (concatenate vectors).

§Example
// Infer dtype (sum of vcounts)
UOp::cat().sources(vec![a, b]).call()

// Explicit dtype
UOp::cat().sources(vec![a, b]).dtype(vec8_dtype).call()
Source

pub fn ptrcat() -> UOpPtrcatBuilder

Create a PTRCAT operation (concatenate pointers).

§Example
UOp::ptrcat().sources(vec![a, b]).dtype(ptr_dtype).call()
Source§

impl UOp

Source

pub fn pointer_index( self: &Arc<UOp>, offset: Arc<UOp>, ) -> Result<Arc<UOp>, Error>

Create a pointer index operation (pointer arithmetic).

Performs pointer + offset arithmetic for address calculation in kernels. Both self (ptr) and offset should have Index dtype.

Source

pub fn slice(buffer: Arc<UOp>, specs: Vec<IndexSpec>) -> Result<Arc<UOp>, Error>

Multi-dimensional slicing with IndexSpec.

Note: Range and NewAxis specs are not fully implemented; currently only Single indices are properly supported.

Source

pub fn slice_gated( buffer: Arc<UOp>, specs: Vec<IndexSpec>, gate: Arc<UOp>, ) -> Result<Arc<UOp>, Error>

Gated slicing - conditional access with gate.

Source

pub fn valid(self: &Arc<UOp>, cond: Arc<UOp>) -> Arc<UOp>

Wrap index with validity condition.

This is the Rust equivalent of Tinygrad’s idx.valid(cond). Creates WHERE(cond, self, Invalid) to mark conditional index validity.

§Examples
// Create a conditionally valid index
let valid_idx = idx.valid(cond);
// Equivalent to: WHERE(cond, idx, INVALID)
Source

pub fn store(self: &Arc<UOp>, value: Arc<UOp>) -> Arc<UOp>

Create a STORE operation without ranges.

Stores a value at self (INDEX location). The buffer is accessed indirectly through the INDEX node. For stores with ranges (e.g., output upcasting), use store_with_ranges.

For gated stores, use an INDEX with a gate (INDEX has optional gate field).

Source

pub fn store_with_ranges( self: &Arc<UOp>, value: Arc<UOp>, ranges: SmallVec<[Arc<UOp>; 4]>, ) -> Arc<UOp>

Create a STORE operation with ranges.

Stores a value at self (INDEX location), with explicit ranges that define the scope of the store operation. This matches Tinygrad’s architecture where STORE sources are (index, value, *ranges).

Ranges are used for output upcasting: Range(Upcast) becomes UNROLL during expansion, which fix_store_unroll contracts via CONTRACT.

For gated stores, use an INDEX with a gate (INDEX has optional gate field).

Source

pub fn copy_to_device(self: &Arc<UOp>, device: DeviceSpec) -> Arc<UOp>

Copy to a different device.

Source

pub fn copy(self: &Arc<UOp>, device: Arc<UOp>) -> Arc<UOp>

Create a COPY operation with explicit device UOp.

Unlike copy_to_device which takes a DeviceSpec, this takes a device UOp directly (useful when you already have one).

Source

pub fn bufferize( compute: Arc<UOp>, ranges: Vec<Arc<UOp>>, opts: BufferizeOpts, ) -> Arc<UOp>

Create a BUFFERIZE operation.

Marks a computation to be materialized into a buffer. The computation is evaluated over the given ranges and stored.

Source

pub fn bufferize_global(compute: Arc<UOp>, ranges: Vec<Arc<UOp>>) -> Arc<UOp>

Create a BUFFERIZE operation with Global address space.

This is the most common pattern - bufferize to global memory.

Source

pub fn bufferize_local(compute: Arc<UOp>, ranges: Vec<Arc<UOp>>) -> Arc<UOp>

Create a BUFFERIZE operation with Local address space.

For shared/local memory bufferization.

Source

pub fn define_local(id: usize, dtype: DType) -> Arc<UOp>

Create a DEFINE_LOCAL operation.

Defines a local (shared) memory allocation with the given ID.

Source

pub fn define_reg(size: usize) -> Arc<UOp>

Define register memory (void pointer - type determined by usage).

Source

pub fn define_reg_typed(size: usize, element_dtype: DType) -> Arc<UOp>

Define register memory with explicit element type.

Creates a typed register accumulator for use in reductions. The element_dtype specifies the type of each element (e.g., Float32 for a float accumulator).

Source

pub fn index<I>() -> UOpIndexBuilder<I>
where I: Into<SmallVec<[Arc<UOp>; 4]>>,

Create a buffer index operation for multi-dimensional access.

All indices must have Index dtype.

§Dtype behavior (matches Tinygrad’s buf.index(idx, ptr=False, dtype=None))
  • If dtype is provided: use it directly (explicit dtype takes precedence)
  • If ptr is true: keep the buffer’s Ptr dtype (for STORE targets)
  • Otherwise (ptr=false, default): extract element type from buffer (for LOAD sources)
§Examples
// Element dtype (default) - for LOAD
UOp::index().buffer(buf).indices(vec![idx]).call()?

// Ptr dtype via ptr=true - for STORE (preferred, Tinygrad-aligned)
UOp::index().buffer(buf).indices(vec![idx]).ptr(true).call()?

// Explicit Ptr dtype - for STORE (legacy, works but prefer .ptr(true))
let ptr_dtype = DType::Float32.ptr(Some(size), AddrSpace::Global);
UOp::index().buffer(buf).indices(vec![idx]).dtype(ptr_dtype).call()?

// With gate
UOp::index().buffer(buf).indices(vec![idx]).gate(gate_uop).call()?
Source

pub fn load() -> UOpLoadBuilder

Create a LOAD operation.

§Example
// Infer dtype from buffer
UOp::load().buffer(buf).index(idx).call()

// Explicit dtype for vector loads
UOp::load().buffer(buf).index(idx).dtype(vec4_dtype).call()

// With alt value for gated loads
UOp::load().buffer(buf).index(idx).alt(zero).call()
Source§

impl UOp

Source

pub fn try_reduce_axis( self: &Arc<UOp>, reduce_op: ReduceOp, axes: Vec<usize>, ) -> Result<Arc<UOp>, Error>

Reduce along specified axes using reduce_op.

Implements Tinygrad’s early-return pattern: when all axes are reduced or when all reduction axes have dimension 1, returns self instead of creating a ReduceAxis operation.

§Errors

Returns error if any axis is >= number of dimensions.

Source

pub fn reduce( self: &Arc<UOp>, ranges: SmallVec<[Arc<UOp>; 4]>, reduce_op: ReduceOp, ) -> Arc<UOp>

Reduce across loop ranges using reduce_op.

Unlike try_reduce_axis (operates on tensor axes), this reduces values accumulated across RANGE loop iterations.

Source

pub fn allreduce( src: Arc<UOp>, device: Arc<UOp>, reduce_op: ReduceOp, ) -> Arc<UOp>

All-reduce across multiple devices.

Source§

impl UOp

Source

pub fn try_reshape( self: &Arc<UOp>, new_shape: &SmallVec<[SInt; 4]>, ) -> Result<Arc<UOp>, Error>

Reshape with strict validation (fail-fast).

Validates:

  • No negative dimensions in new_shape
  • Product of input shape == product of output shape
Source

pub fn try_expand( self: &Arc<UOp>, new_shape: &SmallVec<[SInt; 4]>, ) -> Result<Arc<UOp>, Error>

Expand (broadcast) with strict validation.

Validates:

  • Number of dimensions matches
  • Each dimension either matches or src dimension is 1
Source

pub fn try_permute(self: &Arc<UOp>, axes: Vec<usize>) -> Result<Arc<UOp>, Error>

Permute with strict validation.

Validates:

  • Permutation is valid (contains each index 0..n exactly once)
Source

pub fn try_pad( self: &Arc<UOp>, padding: &[(SInt, SInt)], ) -> Result<Arc<UOp>, Error>

Pad with strict validation.

Validates:

  • Padding values are concrete (not symbolic)
  • Number of padding pairs matches dimensions
Source

pub fn try_shrink( self: &Arc<UOp>, ranges: &[(SInt, SInt)], ) -> Result<Arc<UOp>, Error>

Shrink (slice) with strict validation.

Validates:

  • Range values are concrete (not symbolic)
  • begin <= end for each dimension
  • 0 <= begin, end <= dimension_size
Source

pub fn try_flip(self: &Arc<UOp>, axes: Vec<bool>) -> Result<Arc<UOp>, Error>

Flip with strict validation.

Validates:

  • Flip specification length matches shape dimensions
Source

pub fn multi(src: Arc<UOp>, axis: usize) -> Arc<UOp>

Split tensor across multiple devices along specified axis.

Creates a multi-device tensor where each device holds a shard. Use with MSTACK/MSELECT for distributed tensor operations.

Source§

impl UOp

Source

pub fn op(&self) -> &Op

Get the operation.

Source

pub fn dtype(&self) -> DType

Get the data type.

Source

pub fn tag(&self) -> &Option<SmallVec<[usize; 2]>>

Get the tag.

Source

pub fn rtag(self: &Arc<UOp>, tag: Option<SmallVec<[usize; 2]>>) -> Arc<UOp>

Create a new UOp with the given tag (Tinygrad: rtag()). Returns self unchanged if tag is already equal.

Source

pub fn with_tag(self: &Arc<UOp>, tag: SmallVec<[usize; 2]>) -> Arc<UOp>

Create a new UOp with the given tag set.

Source

pub fn has_buffer_identity(&self) -> bool

Check if this UOp has a concrete buffer identity in the graph.

Returns true for BUFFER or RESHAPE/MULTI chains leading to BUFFER. These are already contiguous by definition, so wrapping in CONTIGUOUS is a no-op.

Based on Tinygrad’s UOp.has_buffer_identity() (ops.py:616-619).

Source

pub fn ptrdtype(&self) -> Option<(&DType, AddrSpace, Option<usize>)>

Get pointer dtype components if this UOp has a Ptr dtype.

Returns (base, addrspace, size) for Ptr types, None otherwise. This simplifies pattern matching on pointer types.

§Examples
let buffer = UOp::new_buffer(DeviceSpec::Cpu, 10, DType::Float32);
if let Some((base, addrspace, size)) = buffer.ptrdtype() {
    assert_eq!(*base, DType::Float32);
    assert_eq!(addrspace, AddrSpace::Global);
}
Source

pub fn with_dtype(self: &Arc<UOp>, dtype: DType) -> Arc<UOp>

Create a copy of this UOp with a different dtype.

If the dtype is unchanged, returns self (clone of Arc). This is the Rust equivalent of Tinygrad’s buf.replace(dtype=x).

§Examples
let int_const = UOp::const_(DType::Int32, morok_ir::ConstValue::Int(5));
let float_const = int_const.with_dtype(DType::Float32);
assert_eq!(float_const.dtype(), DType::Float32);
Source

pub fn unwrap_after(self: &Arc<UOp>) -> Arc<UOp>

Walk through AFTER nodes to get the passthrough value.

This is the Rust equivalent of Tinygrad’s .or_after() pattern. Recursively unwraps AFTER nodes to find the underlying value.

§Examples
// Given: AFTER(AFTER(value, [dep1]), [dep2])
// Returns: value
let inner = wrapped.unwrap_after();
Source

pub fn unwrap_cast(self: &Arc<UOp>) -> Arc<UOp>

Walk through CAST nodes to get the inner value.

This is the Rust equivalent of Tinygrad’s .or_casted() pattern. Recursively unwraps CAST nodes to find the underlying value.

§Examples
// Given: CAST(CAST(value, dtype1), dtype2)
// Returns: value
let inner = casted.unwrap_cast();
Source

pub fn store_buffer(&self) -> Option<&Arc<UOp>>

Get the buffer from a STORE operation (via its INDEX child).

STORE operations reference the buffer indirectly through an INDEX node. This helper extracts the buffer from STORE.index.buffer.

Returns None if:

  • This is not a STORE operation
  • The STORE’s index is not an INDEX operation
Source

pub fn load_buffer(&self) -> Option<Arc<UOp>>

Get the buffer from a LOAD operation.

Returns None if this is not a LOAD operation.

Source

pub fn store_value(self: &Arc<UOp>, value: Arc<UOp>) -> Arc<UOp>

Store a value at this INDEX node.

Convenience method for self.store(value). Matches Tinygrad’s idx.store(val) pattern.

§Panics

Debug-asserts that self is an INDEX operation.

Source

pub fn with_src(self: &Arc<UOp>, new_srcs: Vec<Arc<UOp>>) -> Arc<UOp>

Alias for with_sources().

Creates a new UOp with the same operation type and dtype, but with the provided sources replacing the original ones.

Source

pub fn shape(self: &Arc<UOp>) -> Result<Option<&SmallVec<[SInt; 4]>>, Error>

Get the shape of this UOp.

Shape is computed lazily on first access and cached. Returns Ok(None) if shape cannot be determined (e.g., for control flow ops). Returns Err if there is a shape mismatch error.

§Examples
let scalar = UOp::const_(DType::Float32, ConstValue::Float(1.0));
assert_eq!(scalar.shape().unwrap().as_ref().map(|s| s.len()), Some(0)); // Scalar has empty shape
Source

pub fn vmin(self: &Arc<UOp>) -> &ConstValue

Get the minimum possible value of this UOp.

Returns the minimum value based on range analysis. Computed lazily on first access and cached.

§Examples
let five = UOp::const_(DType::Int32, ConstValue::Int(5));
assert_eq!(five.vmin(), &ConstValue::Int(5));
Source

pub fn vmax(self: &Arc<UOp>) -> &ConstValue

Get the maximum possible value of this UOp.

Returns the maximum value based on range analysis. Computed lazily on first access and cached.

§Examples
let five = UOp::const_(DType::Int32, ConstValue::Int(5));
assert_eq!(five.vmax(), &ConstValue::Int(5));
Source

pub fn device_spec(&self) -> Option<DeviceSpec>

Extract device specification from this UOp graph.

Traverses the graph to find Op::Device nodes following Tinygrad’s _device recursive property (ops.py:585-599):

  • DEVICE: returns the DeviceSpec directly
  • BUFFER: returns device from the device child
  • COPY: returns device from the device child (target device)
  • Otherwise: searches children recursively
§Examples
let buffer = UOp::new_buffer(DeviceSpec::Cpu, 10, DType::Float32);
assert_eq!(buffer.device_spec(), Some(DeviceSpec::Cpu));
Source

pub fn base(self: &Arc<UOp>) -> Arc<UOp>

Get the base UOp by walking through movement operations.

Movement operations (RESHAPE, PERMUTE, EXPAND, etc.) are views that don’t change the underlying data. This method recursively walks through these operations to find the actual buffer or computation that owns the data.

Based on Tinygrad’s base property (ops.py:524-527).

§Examples
let buffer = UOp::new_buffer(DeviceSpec::Cpu, 10, DType::Float32);
let shape = Shape::from_iter([SInt::Const(2), SInt::Const(5)]);
let reshaped = buffer.try_reshape(&shape).unwrap();

// base() walks through RESHAPE to get the original BUFFER
assert!(std::sync::Arc::ptr_eq(&reshaped.base(), &buffer));
Source

pub fn buf_uop(self: &Arc<UOp>) -> Arc<UOp>

Get the underlying buffer UOp, walking through AFTER/MSELECT/MSTACK chains.

Based on Tinygrad’s buf_uop property (ops.py:601-606). This recursively unwraps AFTER chains to find the actual buffer.

§Examples
use morok_ir::UOp;

// AFTER wrapping a buffer
let buffer = UOp::new_buffer(...);
let after = buffer.after(deps);

// buf_uop() walks through AFTER to get the underlying buffer
assert!(Arc::ptr_eq(&after.buf_uop(), &buffer));
Source

pub fn toposort(self: &Arc<UOp>) -> Vec<Arc<UOp>>

Topological sort of the computation graph.

Returns nodes in an order where all dependencies come before their dependents.

Source

pub fn toposort_filtered<F>(self: &Arc<UOp>, gate: F) -> Vec<Arc<UOp>>
where F: Fn(&Arc<UOp>) -> bool,

Topological sort with gate function (filtered toposort).

Only traverses nodes for which gate(node) returns true. Nodes for which gate returns false are excluded from the traversal entirely (along with their ancestors).

This is a key optimization for cached property computation, allowing us to skip nodes that already have a property cached.

§Performance

For a graph with 10,000 nodes where 9,900 already have a cached property:

  • Full toposort: 10,000 nodes visited
  • Filtered toposort: 100 nodes visited
  • Speedup: 100x
§Example
// Only process nodes that don't have shape cached
let uncached = uop.toposort_filtered(|node| {
    node.shape_cache.get().is_none()
});
Source

pub fn any_in_subtree<F>(self: &Arc<UOp>, pred: F) -> bool
where F: Fn(&Arc<UOp>) -> bool,

Check if any node in the backward slice satisfies a predicate.

Early-exit DFS — returns true as soon as a matching node is found, without building the full toposort Vec. Use this instead of toposort().iter().any(pred) when you only need an existential check.

Source

pub fn collect_in_subtree<F>(self: &Arc<UOp>, pred: F) -> Vec<Arc<UOp>>
where F: Fn(&Arc<UOp>) -> bool,

Collect all nodes in the backward slice that match a predicate.

DFS collecting matches — cheaper than toposort().iter().filter(pred).collect() when you don’t need topological ordering.

Source

pub fn node_count(self: &Arc<UOp>) -> usize

Count unique nodes in the DAG rooted at this UOp.

Much cheaper than toposort().len() — no result Vec, no ordering. Uses pointer-based visited set for O(1) identity checks.

Source

pub fn has_index_in_sources(self: &Arc<UOp>) -> bool

O(1) cached check: does this node or any of its sources contain an INDEX op?

Computed lazily and cached. Each node checks itself and its direct sources’ cached values, so the total cost across the graph is O(N).

Source

pub fn tree(self: &Arc<UOp>) -> String

Render this UOp and its sources as a compact ASCII tree.

Shared nodes (appearing multiple times due to hash-consing) are shown as back-references: [id] → (see above)

§Example Output
[42] STORE : Void
├── [10] PARAM(0) : Ptr<Float32> shape=[4]
├── [35] INDEX : Ptr<Float32> shape=[4]
│   ├── [10] → (see above)
│   └── [30] RANGE(0, Reduce) : Index
│       └── [5] CONST(Int(4)) : Index
└── [40] REDUCE(Add) : Float32 shape=[]
    └── [35] → (see above)
Source

pub fn tree_full(self: &Arc<UOp>) -> String

Render this UOp and its sources as a full ASCII tree.

Shared nodes are expanded every time they appear (verbose but complete). Use this when you need to see the full subtree at every occurrence.

Source

pub fn ranges(self: &Arc<UOp>) -> &Vec<Arc<UOp>>

Get all RANGE operations in this UOp’s computation graph.

Lazily computed and cached. Useful for rangeify pass to track loop variables.

Source

pub fn in_scope_ranges(self: &Arc<UOp>) -> &HashSet<UOpKey>

Get the RANGE operations that are in scope at this UOp.

Returns only the ranges that are currently “active” (not yet ended). This is computed by:

  1. Merging ranges from all source operations
  2. Removing ranges that are ended by this operation
  3. Adding self if this is a RANGE operation

Based on Tinygrad’s ranges property (ops.py:318-320) and _ranges recursive property (ops.py:302-315).

§Returns

A HashSet of RANGE UOps that are in scope at this point in the graph. The result is cached for performance.

§Examples
use morok_ir::{UOp, AxisType};

// A simple computation inside a range
let range = UOp::range(end, 0, AxisType::Loop);
let value = UOp::const_(...);
let end_op = value.end(vec![range.clone()]);

// Value has range in scope
assert!(value.in_scope_ranges().contains(&range));

// After END, range is no longer in scope
assert!(!end_op.in_scope_ranges().contains(&range));
Source

pub fn all_in_scope_ranges_are(self: &Arc<UOp>, axis_type: AxisType) -> bool

Check if all in-scope ranges at this UOp have the given AxisType.

Returns true if the in-scope ranges set is empty or all ranges match the specified axis type.

§Use Cases
  • all_in_scope_ranges_are(AxisType::Outer) - Used in split_store to determine if we’re at a kernel boundary
§Examples
use morok_ir::{UOp, AxisType};

// At kernel boundary: only OUTER ranges in scope
assert!(uop.all_in_scope_ranges_are(AxisType::Outer));

// Inside kernel: has non-OUTER ranges
assert!(!uop.all_in_scope_ranges_are(AxisType::Outer));
Source

pub fn has_non_outer_ranges(self: &Arc<UOp>) -> bool

Check if any in-scope range is NOT of the given AxisType.

Inverse of all_in_scope_ranges_are. Useful for Tinygrad-style filtering: “skip if any range is not OUTER”.

§Examples
use morok_ir::{UOp, AxisType};

// Has non-OUTER ranges: should skip in split_store
if uop.has_non_outer_ranges() {
    return None;  // Don't split here
}
Source

pub fn get_consumer_map(self: &Arc<UOp>) -> HashMap<UOpKey, Vec<Arc<UOp>>>

Build a consumer map for this UOp’s computation graph.

Returns a HashMap where each UOp maps to the list of UOps that consume it. Useful for reverse traversal and dependency analysis.

Source

pub fn reverse_toposort( self: &Arc<UOp>, consumer_map: &HashMap<UOpKey, Vec<Arc<UOp>>>, ) -> Vec<Arc<UOp>>

Reverse topological sort of the computation graph.

Returns nodes in bottom-up order (leaves first, root last). Requires a consumer map to traverse from leaves to roots.

Source

pub fn substitute(self: &Arc<UOp>, map: &HashMap<UOpKey, Arc<UOp>>) -> Arc<UOp>

Replace UOps in the computation graph according to a substitution map.

Delegates to graph_rewrite_bottom_up with a wildcard pattern that looks up each node in the map — exactly like Tinygrad’s substitute. The rewrite engine provides O(n) memoization via its result cache.

Source

pub fn substitute_gated( self: &Arc<UOp>, map: &HashMap<UOpKey, Arc<UOp>>, ) -> Arc<UOp>

Replace UOps with range-gated substitution (Tinygrad: extra_pm=pm_gate_substitute).

Like substitute, but skips subtrees whose in_scope_ranges() don’t contain any of the substitution keys. This prevents substituting ranges in subexpressions that don’t reference them, matching Tinygrad’s gate_substitute behavior.

Source

pub fn with_sources(self: &Arc<UOp>, new_srcs: Vec<Arc<UOp>>) -> Arc<UOp>

Reconstruct this UOp with new sources.

Creates a new UOp with the same operation type and dtype, but with the provided sources replacing the original ones. Hash consing ensures that if an identical UOp already exists, it will be reused.

This is used by the graph rewrite engine when sources have been rewritten.

§Panics

Panics if the number of sources doesn’t match the operation’s arity.

§Examples
// Original: a + b
let add = UOp::add(a.clone(), b.clone());

// Rewrite sources: a' + b'
let new_add = add.with_sources(vec![a_prime, b_prime]);
Source§

impl UOp

Source

pub fn replace<'f1>(self: &'f1 Arc<UOp>) -> UOpReplaceBuilder<'f1>

Create a modified copy with optional field overrides.

Enables concise pattern implementations by allowing selective field modification. Returns self.clone() if nothing changed (optimization for hash consing).

§Examples
let new_load = load.replace().dtype(new_dtype).src(new_sources).call();
let dtype_only = load.replace().dtype(new_dtype).call();
Source§

impl UOp

Source

pub fn new(op: Op, dtype: DType) -> Arc<UOp>

Create a new UOp with hash consing.

If an identical UOp already exists (in any thread) and is still alive, returns a reference to it. Otherwise, creates a new UOp and caches it.

§Thread Safety

This function is thread-safe. Creating the same UOp from different threads will return the same Arc<UOp>, so Arc::ptr_eq works across threads.

§Memory Management

The cache stores weak references. UOps are automatically cleaned up when no strong references remain (Tinygrad-aligned behavior).

Source

pub fn new_tagged( op: Op, dtype: DType, tag: Option<SmallVec<[usize; 2]>>, ) -> Arc<UOp>

Create a UOp with an explicit tag (Tinygrad: UOp(op, dtype, src, arg, tag)). Tag participates in hash consing — same structure + different tag = different UOp.

Source

pub fn with_metadata<T>(self: &Arc<UOp>, metadata: T) -> Arc<UOp>
where T: Any + Send + Sync + 'static,

Attach metadata to this UOp, creating a new instance.

Metadata is NOT part of hash consing - this method creates a new UOp with a different ID but the same operation structure. This allows attaching metadata (like kernel info) after optimization.

§Examples
let ast = /* ... optimized AST ... */;
let with_info = ast.with_metadata(KernelInfo::new("r_g16l16", vec![], false));
Source

pub fn metadata<T>(&self) -> Option<Arc<T>>
where T: Any + Send + Sync,

Get metadata of a specific type if it exists.

Returns None if no metadata is attached or if the metadata is of a different type.

§Examples
if let Some(info) = ast.metadata::<KernelInfo>() {
    println!("Kernel name: {}", info.name);
}
Source

pub fn metadata_raw(&self) -> Option<Arc<dyn Any + Send + Sync>>

Get raw metadata (type-erased).

Used to preserve metadata across graph rewrites that create new root nodes.

Source

pub fn with_metadata_raw( self: &Arc<UOp>, metadata: Arc<dyn Any + Send + Sync>, ) -> Arc<UOp>

Attach raw metadata (type-erased), creating a new instance.

Used to re-attach metadata that was saved before graph rewrites.

Source§

impl UOp

Source

pub fn const_factor(&self) -> i64

Returns the largest known integer that divides this UOp.

Based on Tinygrad’s const_factor() (ops.py:693-700). For MUL, only checks immediate CONST children (not recursive).

Source

pub fn divides(self: &Arc<UOp>, v: &Arc<UOp>) -> Option<Arc<UOp>>

Returns self / v if v divides self exactly, otherwise None.

Based on Tinygrad’s divides() (ops.py lines 703-711). Delegates to [divides_int] for constant divisors.

Source

pub fn divides_int(self: &Arc<UOp>, v: i64) -> Option<Arc<UOp>>

Returns self / v if integer v divides all terms exactly, otherwise None.

Based on Tinygrad’s divides(v: int) (ops.py:701-709). Recursively handles Const, Add, and Mul operations.

Source

pub fn divide_exact(self: &Arc<UOp>, v: &Arc<UOp>) -> Option<Arc<UOp>>

Returns self / v if exact division by UOp v is possible.

Based on Tinygrad’s divide_exact(v: UOp) (ops.py:717-726). Handles identity, constant divisors, Add recursion, and Mul factoring.

Source

pub fn symbolic_gcd(uops: &[Arc<UOp>]) -> Arc<UOp>

Computes the symbolic GCD of multiple UOps, returning a UOp.

Based on Tinygrad’s UOp.gcd() (ops.py:713-716). Finds both numeric GCD of const_factors AND common symbolic MUL factors.

For inputs 6*a*b and 4*a*c, returns 2*a (numeric GCD=2, common factor=a).

Source

pub fn pop_const( self: &Arc<UOp>, op: BinaryOp, ) -> (Arc<UOp>, Option<ConstValue>)

Separates a constant term from a binary expression.

Returns (non_const_part, const_value). Based on Tinygrad’s pop_const() (ops.py lines 712-713).

§Examples
// (x + 5).pop_const(ADD) = (x, Some(Int(5)))
// (x + y).pop_const(ADD) = (x + y, None)
// x.pop_const(ADD) = (x, None)
Source

pub fn split_uop(self: &Arc<UOp>, sep: BinaryOp) -> Vec<Arc<UOp>>

Splits an associative operation chain into its individual terms.

Based on Tinygrad’s split_uop() (ops.py lines 464-467).

§Examples
// (x + y + z).split_uop(ADD) = [x, y, z]
// (x + y).split_uop(ADD) = [x, y]
// x.split_uop(ADD) = [x]
Source

pub fn backward_slice_ids(self: &Arc<UOp>) -> &HashSet<u64>

Cached backward slice: set of all node IDs reachable from this UOp.

O(1) membership test via contains(). Computed once and cached per-node. Prefer this over backward_slice() when you only need to check if a node is in the dependency set.

Source

pub fn backward_slice(self: &Arc<UOp>) -> Vec<Arc<UOp>>

Returns all nodes that this UOp depends on (backward slice / dependency set).

For membership tests, prefer [backward_slice_ids()] which returns a cached HashSet<u64> with O(1) lookup.

Source

pub fn divisible_by(self: &Arc<UOp>, amount: usize) -> Option<usize>

Check if this UOp’s size is divisible by the given amount.

Returns Some(quotient) if divisible, None otherwise. This is a convenience method for the optimizer to validate transformations.

§Examples
let range = UOp::range(SInt::Const(16), 0, AxisType::Loop);
assert_eq!(range.divisible_by(4), Some(4)); // 16 / 4 = 4
assert_eq!(range.divisible_by(5), None);    // 16 not divisible by 5
Source

pub fn with_axis_type(self: &Arc<UOp>, new_type: AxisType) -> Arc<UOp>

Create a new RANGE UOp with a different axis type.

This is a convenience method for the optimizer to convert ranges between axis types (e.g., LOOP → GLOBAL for parallelization).

§Panics

Panics if called on a non-RANGE operation.

§Examples
let loop_range = UOp::range_axis(UOp::index_const(16), 0, AxisType::Loop);
let global_range = loop_range.with_axis_type(AxisType::Global);
// global_range has same size and axis_id, but different axis type
Source

pub fn get_idx(self: &Arc<UOp>) -> Arc<UOp>

Extract the actual index from a range, stripping validity checks.

If the range is a WHERE(valid, idx, invalid_marker), returns idx. Otherwise, returns the range itself.

This is used for range merging when comparing indexing patterns across multiple consumers.

Based on Tinygrad’s get_idx() (ops.py:438-439).

§Examples
// Range with padding: WHERE(i < 5, i, SENTINEL)
let padded_range = UOp::where_op(valid, idx.clone(), invalid_marker)?;
assert!(Arc::ptr_eq(&padded_range.get_idx(), &idx));

// Plain range: returns itself
let plain_range = UOp::range_axis(...);
assert!(Arc::ptr_eq(&plain_range.get_idx(), &plain_range));
Source

pub fn get_valid(self: &Arc<UOp>) -> Arc<UOp>

Extract the validity mask from a range.

If the range is a WHERE(valid, idx, invalid_marker), returns valid. Otherwise, returns constant true (always valid).

This is used for range merging to combine validity conditions when multiple consumers share compatible indexing patterns.

Based on Tinygrad’s get_valid() (ops.py:440-441).

§Examples
// Range with padding: WHERE(i < 5, i, SENTINEL)
let padded_range = UOp::where_op(valid.clone(), idx, invalid_marker)?;
assert!(Arc::ptr_eq(&padded_range.get_valid(), &valid));

// Plain range: returns constant true
let plain_range = UOp::range_axis(...);
if let Op::Const(cv) = plain_range.get_valid().op() {
    assert_eq!(cv.0, ConstValue::Bool(true));
}
Source

pub fn is_invalid_marker(uop: &Arc<UOp>) -> bool

Check if a UOp represents an invalid index marker.

Matches both scalar Op::Invalid and vectorized VECTORIZE(Invalid, ..., Invalid) where ALL elements are Invalid. The vectorized form appears after expansion broadcasts scalar Invalid across lanes.

Uses all() semantics (entire vector must be Invalid). This differs from has_invalid() in symbolic patterns which uses any() for guard semantics.

Source

pub fn invalid_marker() -> Arc<UOp>

Create an invalid index marker.

Invalid markers are used with WHERE operations to indicate out-of-bounds or padded regions. The value is undefined and should never be used directly - it exists only to be masked away by validity checks.

§Returns

A UOp representing an invalid index value.

§Examples
// Padding: WHERE(i < actual_size, i, invalid)
let invalid = UOp::invalid_marker();
let padded = UOp::where_op(valid, actual_idx, invalid)?;
Source

pub fn is_increasing(self: &Arc<UOp>) -> bool

Check if this UOp is a monotonically increasing function of its inputs.

Returns true for:

  • Irreducible ops (RANGE, CONST, DEFINE_VAR)
  • ADD of increasing ops
  • MUL/IDIV by non-negative constants

Based on Tinygrad’s is_increasing() (ops.py:689-694).

§Examples
// Constants are increasing
let c = UOp::const_(DType::Int32, ConstValue::Int(5));
assert!(c.is_increasing());

// Range variables are increasing
let range = UOp::range_axis(UOp::index_const(16), 0, AxisType::Loop);
assert!(range.is_increasing());

// x + y is increasing if both x and y are increasing
let sum = range.try_add(&c).unwrap();
assert!(sum.is_increasing());

// x * 2 is increasing if x is increasing
let two = UOp::const_(DType::Index, ConstValue::Int(2));
let scaled = range.try_mul(&two).unwrap();
assert!(scaled.is_increasing());

Trait Implementations§

Source§

impl Clone for UOp

Source§

fn clone(&self) -> UOp

Returns a duplicate of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl Debug for UOp

Source§

fn fmt(&self, __derive_more_f: &mut Formatter<'_>) -> Result<(), Error>

Formats the value using the given formatter. Read more
Source§

impl Hash for UOp

Hash implementation for UOp based on content (dtype + op).

This enables content-based hashing for cross-run caching. The hash traverses the DAG structure since Op contains Arc children that also get hashed. Cache fields are intentionally skipped - they don’t affect semantic identity.

Source§

fn hash<H>(&self, state: &mut H)
where H: Hasher,

Feeds this value into the given Hasher. Read more
1.3.0 · Source§

fn hash_slice<H>(data: &[Self], state: &mut H)
where H: Hasher, Self: Sized,

Feeds a slice of this type into the given Hasher. Read more

Auto Trait Implementations§

§

impl !Freeze for UOp

§

impl !RefUnwindSafe for UOp

§

impl Send for UOp

§

impl Sync for UOp

§

impl Unpin for UOp

§

impl UnsafeUnpin for UOp

§

impl !UnwindSafe for UOp

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper. Read more
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer. Read more
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper. Read more
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper. Read more