pub struct UOp {
pub id: u64,
pub content_hash: u64,
pub tag: Option<SmallVec<[usize; 2]>>,
/* private fields */
}Expand description
Micro-operation node in the computation graph.
UOps form a DAG where operations reference their inputs through the Op enum. Hash consing ensures that structurally identical UOps share the same allocation.
Shape inference is lazy and cached - computed on first access via shape() method.
Note: Debug uses derive_more with #[debug(skip)] on cache fields to prevent
stack overflow from recursive Arc
Fields§
§id: u64Unique stable ID for this UOp instance. Used for identity-based caching instead of fragile raw pointers.
content_hash: u64Structural content hash — deterministic regardless of allocation order. Computed at creation time: hash(op_discriminant, dtype, op_data, children_content_hashes). O(1) per node since children are already created with their content_hash set. Used for schedule-level caching where UOp IDs are not stable across runs.
tag: Option<SmallVec<[usize; 2]>>Tag for tracking tensor identity through the rangeify pipeline.
Matches Tinygrad’s UOp.tag (ops.py:128). Tags are tuples of integer indices
that track which original tensor UOps map to which final kernel outputs.
Tags participate in hash consing — different tag = different UOp.
Values:
None— untagged (default)Some([])— empty tag (e.g., RANGE ops)Some([i])— single index (assigned by add_tags)Some([i, j, ...])— merged indices (from buffer folding)
Implementations§
Source§impl UOp
impl UOp
pub fn try_add(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_sub(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_mul(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_mod(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>
Sourcepub fn try_div(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_div(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>
Division with automatic type-based operator selection.
Uses Idiv for integer types and Fdiv for float types.
Sourcepub fn try_max(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_max(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>
Maximum of two values: max(a, b).
Sourcepub fn neg(self: &Arc<UOp>) -> Arc<UOp>
pub fn neg(self: &Arc<UOp>) -> Arc<UOp>
Negation: -x.
Produces MUL(x, -1) instead of Unary(Neg, x), matching Tinygrad’s approach.
Unary(Neg) is reintroduced late in codegen decompositions (pm_neg_from_mul)
AFTER pm_lower_index_dtype has resolved all Invalid nodes. This ensures
propagate_invalid (which only handles Binary ops) can push WHERE+Invalid
through negation.
If self has a shape, broadcasts -1 to match (RESHAPE+EXPAND), matching
Tinygrad’s Tensor-level _broadcasted(). If shapeless (schedule/symbolic
context), uses a scalar const directly.
pub fn try_add_scalar<T>(lhs: Arc<UOp>, rhs: T) -> Result<Arc<UOp>, Error>where
T: IntoUOp,
pub fn try_sub_scalar<T>(lhs: Arc<UOp>, rhs: T) -> Result<Arc<UOp>, Error>where
T: IntoUOp,
pub fn try_mul_scalar<T>(lhs: Arc<UOp>, rhs: T) -> Result<Arc<UOp>, Error>where
T: IntoUOp,
pub fn try_mod_scalar<T>(lhs: Arc<UOp>, rhs: T) -> Result<Arc<UOp>, Error>where
T: IntoUOp,
pub fn try_sqrt(self: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_rsqrt(self: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_exp(self: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_exp2(self: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_log(self: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_log2(self: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_sin(self: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_cos(self: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_tan(self: &Arc<UOp>) -> Result<Arc<UOp>, Error>
Sourcepub fn erf(self: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn erf(self: &Arc<UOp>) -> Result<Arc<UOp>, Error>
Error function: erf(x) - requires float dtype.
Sourcepub fn try_reciprocal(operand: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_reciprocal(operand: &Arc<UOp>) -> Result<Arc<UOp>, Error>
Reciprocal: 1/x - requires float dtype.
pub fn try_and_op(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_or_op(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_xor_op(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_shl_op(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_shr_op(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_cmplt(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_cmple(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_cmpeq(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_cmpne(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_cmpgt(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>
pub fn try_cmpge(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Result<Arc<UOp>, Error>
Sourcepub fn try_where(
condition: Arc<UOp>,
true_val: Arc<UOp>,
false_val: Arc<UOp>,
) -> Result<Arc<UOp>, Error>
pub fn try_where( condition: Arc<UOp>, true_val: Arc<UOp>, false_val: Arc<UOp>, ) -> Result<Arc<UOp>, Error>
Conditional selection: condition ? true_val : false_val.
§Errors
WhereConditionNotBoolif condition dtype is not bool
Sourcepub fn try_mulacc(
a: Arc<UOp>,
b: Arc<UOp>,
c: Arc<UOp>,
) -> Result<Arc<UOp>, Error>
pub fn try_mulacc( a: Arc<UOp>, b: Arc<UOp>, c: Arc<UOp>, ) -> Result<Arc<UOp>, Error>
Multiply-accumulate: a * b + c (fused operation).
All operands must have matching dtypes (including vcount) for valid codegen. Returns None if vcounts don’t match - caller should fall back to Add(Mul(a,b), c).
Sourcepub fn add(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
pub fn add(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
Panicking version of try_add.
For use in pattern rewrites where types are validated. Panics on type mismatch.
Sourcepub fn sub(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
pub fn sub(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
Panicking version of try_sub.
For use in pattern rewrites where types are validated. Panics on type mismatch.
Sourcepub fn mul(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
pub fn mul(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
Panicking version of try_mul.
For use in pattern rewrites where types are validated. Panics on type mismatch.
Sourcepub fn idiv(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
pub fn idiv(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
Panicking version of try_div.
For use in pattern rewrites where types are validated. Panics on type mismatch.
Sourcepub fn mod_(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
pub fn mod_(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
Panicking version of try_mod.
For use in pattern rewrites where types are validated. Panics on type mismatch.
Sourcepub fn max(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
pub fn max(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
Panicking version of try_max.
For use in pattern rewrites where types are validated. Panics on type mismatch.
Sourcepub fn and_(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
pub fn and_(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
Panicking version of try_and_op.
For use in pattern rewrites where types are validated. Panics on type mismatch.
Sourcepub fn or_(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
pub fn or_(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
Panicking version of try_or_op.
For use in pattern rewrites where types are validated. Panics on type mismatch.
Sourcepub fn xor(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
pub fn xor(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
Panicking version of try_xor_op.
For use in pattern rewrites where types are validated. Panics on type mismatch.
Sourcepub fn shl(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
pub fn shl(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
Panicking version of try_shl_op.
For use in pattern rewrites where types are validated. Panics on type mismatch.
Sourcepub fn shr(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
pub fn shr(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
Panicking version of try_shr_op.
For use in pattern rewrites where types are validated. Panics on type mismatch.
Sourcepub fn lt(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
pub fn lt(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
Panicking version of try_cmplt.
For use in pattern rewrites where types are validated. Panics on type mismatch.
Sourcepub fn le(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
pub fn le(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
Panicking version of try_cmple.
For use in pattern rewrites where types are validated. Panics on type mismatch.
Sourcepub fn gt(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
pub fn gt(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
Panicking version of try_cmpgt.
For use in pattern rewrites where types are validated. Panics on type mismatch.
Sourcepub fn ge(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
pub fn ge(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
Panicking version of try_cmpge.
For use in pattern rewrites where types are validated. Panics on type mismatch.
Sourcepub fn eq(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
pub fn eq(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
Panicking version of try_cmpeq.
For use in pattern rewrites where types are validated. Panics on type mismatch.
Sourcepub fn ne(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
pub fn ne(self: &Arc<UOp>, rhs: &Arc<UOp>) -> Arc<UOp>
Panicking version of try_cmpne.
For use in pattern rewrites where types are validated. Panics on type mismatch.
Sourcepub fn alu(op: BinaryOp, lhs: Arc<UOp>, rhs: Arc<UOp>) -> Arc<UOp>
pub fn alu(op: BinaryOp, lhs: Arc<UOp>, rhs: Arc<UOp>) -> Arc<UOp>
Low-level binary op constructor that auto-selects result dtype.
Comparisons produce Bool; everything else inherits lhs dtype.
Matches Tinygrad’s UOp.alu(). No type promotion or validation —
use only in rewrites where types are already correct.
Source§impl UOp
impl UOp
Sourcepub fn range_axis(
end: Arc<UOp>,
axis_id: AxisId,
axis_type: AxisType,
) -> Arc<UOp>
pub fn range_axis( end: Arc<UOp>, axis_id: AxisId, axis_type: AxisType, ) -> Arc<UOp>
Create a Range operation with specified axis type.
Sourcepub fn range(end: Arc<UOp>, axis_id: usize) -> Arc<UOp>
pub fn range(end: Arc<UOp>, axis_id: usize) -> Arc<UOp>
Create a RANGE operation with Loop axis type (convenience for tests).
Uses AxisId::Renumbered since tests typically work with renumbered kernels.
Sourcepub fn range_const(end_value: i64, axis_id: usize) -> Arc<UOp>
pub fn range_const(end_value: i64, axis_id: usize) -> Arc<UOp>
Create a RANGE operation with constant end value (convenience for tests).
Uses AxisId::Renumbered since tests typically work with renumbered kernels.
Creates a Loop range (inside kernels).
Sourcepub fn range_outer_const(end_value: i64, axis_id: usize) -> Arc<UOp>
pub fn range_outer_const(end_value: i64, axis_id: usize) -> Arc<UOp>
Create an OUTER RANGE operation with constant end value (convenience for tests).
Uses AxisId::Renumbered since tests typically work with renumbered kernels.
Creates an Outer range (wraps entire kernels).
Sourcepub fn if_(condition: Arc<UOp>, body: SmallVec<[Arc<UOp>; 4]>) -> Arc<UOp>
pub fn if_(condition: Arc<UOp>, body: SmallVec<[Arc<UOp>; 4]>) -> Arc<UOp>
Create a conditional block that executes body when condition is true.
Body contains operations to execute; use endif to close the block.
Sourcepub fn end(self: &Arc<UOp>, ranges: SmallVec<[Arc<UOp>; 4]>) -> Arc<UOp>
pub fn end(self: &Arc<UOp>, ranges: SmallVec<[Arc<UOp>; 4]>) -> Arc<UOp>
End of range or reduce scope.
Wraps self (the computation) and closes the specified ranges. This marks the end of RANGE or REDUCE loops.
§Arguments
ranges- The RANGE or REDUCE operations being closed
Sourcepub fn barrier(self: &Arc<UOp>, deps: SmallVec<[Arc<UOp>; 4]>) -> Arc<UOp>
pub fn barrier(self: &Arc<UOp>, deps: SmallVec<[Arc<UOp>; 4]>) -> Arc<UOp>
Insert a synchronization barrier.
Self passes through; deps are operations that must complete before
any consumer of this barrier executes.
Sourcepub fn var(
name: impl Into<String>,
dtype: DType,
min_val: i64,
max_val: i64,
) -> Arc<UOp>
pub fn var( name: impl Into<String>, dtype: DType, min_val: i64, max_val: i64, ) -> Arc<UOp>
Create a DefineVar operation for range-bounded variables.
Used in testing and symbolic analysis to define variables with known ranges. Range is [min_val, max_val] inclusive.
Sourcepub fn define_var(name: String, min_val: i64, max_val: i64) -> Arc<UOp>
pub fn define_var(name: String, min_val: i64, max_val: i64) -> Arc<UOp>
Define a symbolic variable with known bounds for range analysis.
Range is [min_val, max_val] inclusive.
Source§impl UOp
impl UOp
Sourcepub fn const_(dtype: DType, value: ConstValue) -> Arc<UOp>
pub fn const_(dtype: DType, value: ConstValue) -> Arc<UOp>
Create a constant UOp with explicit dtype and value.
Normalizes the value to match the target dtype (e.g., Float(5.0) becomes
Int(5) when dtype is Int32). This prevents codegen from emitting
mismatched literals.
Use native_const for type-inferred constants from Rust values.
Sourcepub fn native_const<T>(value: T) -> Arc<UOp>
pub fn native_const<T>(value: T) -> Arc<UOp>
Create a constant UOp from a Rust native value with automatic dtype inference.
Sourcepub fn index_const(value: i64) -> Arc<UOp>
pub fn index_const(value: i64) -> Arc<UOp>
Create an index constant.
Sourcepub fn const_like<T>(self: &Arc<UOp>, value: T) -> Arc<UOp>where
T: IntoUOp,
pub fn const_like<T>(self: &Arc<UOp>, value: T) -> Arc<UOp>where
T: IntoUOp,
Create a constant with the same dtype as self.
This is the Rust equivalent of Tinygrad’s x.const_like(value).
Useful for creating identity elements, zeros, or other constants
that match an existing UOp’s type.
§Examples
let x = UOp::const_(DType::Float32, morok_ir::ConstValue::Float(5.0));
let zero = x.const_like(0.0);
assert_eq!(zero.dtype(), DType::Float32);Sourcepub fn vconst(values: Vec<ConstValue>, scalar_dtype: DType) -> Arc<UOp>
pub fn vconst(values: Vec<ConstValue>, scalar_dtype: DType) -> Arc<UOp>
Create a vector constant from multiple values.
Dtype is inferred from the first value; all values must be same type.
Sourcepub fn new_buffer(device: DeviceSpec, size: usize, dtype: DType) -> Arc<UOp>
pub fn new_buffer(device: DeviceSpec, size: usize, dtype: DType) -> Arc<UOp>
Create a new buffer.
Equivalent to: UOp(Ops.BUFFER, dtype, (unique(), device(device_spec)), size)
Sourcepub fn param(
slot: usize,
size: usize,
dtype: DType,
device: Option<Arc<UOp>>,
) -> Arc<UOp>
pub fn param( slot: usize, size: usize, dtype: DType, device: Option<Arc<UOp>>, ) -> Arc<UOp>
Create a normalized buffer parameter with positional slot.
Used by pre-schedule normalization (BUFFER→PARAM) to erase buffer identity.
Matches Tinygrad’s UOp.param(slot, dtype, shape, device) (ops.py:817-819).
Sourcepub fn device(device: DeviceSpec) -> Arc<UOp>
pub fn device(device: DeviceSpec) -> Arc<UOp>
Create a device specification.
Source§impl UOp
impl UOp
Sourcepub fn sink(sources: Vec<Arc<UOp>>) -> Arc<UOp>
pub fn sink(sources: Vec<Arc<UOp>>) -> Arc<UOp>
Create a sink operation (graph termination).
Sink marks outputs that must be evaluated. All sources are dependencies.
Sourcepub fn group(sources: Vec<Arc<UOp>>) -> Arc<UOp>
pub fn group(sources: Vec<Arc<UOp>>) -> Arc<UOp>
Create a group operation (merging/organizing related ops).
Group is a NOOP that helps organize related operations together. It passes through the first source while ensuring all sources are dependencies.
Sourcepub fn assign(target: Arc<UOp>, value: Arc<UOp>) -> Arc<UOp>
pub fn assign(target: Arc<UOp>, value: Arc<UOp>) -> Arc<UOp>
In-place assignment.
§Arguments
target- The INDEX operation for the assignment destinationvalue- The value to assign
Sourcepub fn assign_with_mops(
target: Arc<UOp>,
value: Arc<UOp>,
movement_ops: Option<Arc<UOp>>,
) -> Arc<UOp>
pub fn assign_with_mops( target: Arc<UOp>, value: Arc<UOp>, movement_ops: Option<Arc<UOp>>, ) -> Arc<UOp>
In-place assignment with movement ops chain.
The movement_ops parameter captures shape transformations from the
original target, used during bufferize_to_store to apply the same
transformations to the result buffer.
Sourcepub fn contiguous(self: &Arc<UOp>) -> Arc<UOp>
pub fn contiguous(self: &Arc<UOp>) -> Arc<UOp>
Ensure contiguous memory layout.
Elides the CONTIGUOUS wrapper when the source is already contiguous:
- Already a CONTIGUOUS node (no double wrapping)
- Has buffer identity (BUFFER, or RESHAPE/MULTI chain to BUFFER)
Based on Tinygrad’s UOp.contiguous() (ops.py:463-466).
Sourcepub fn contiguous_with_opts(
self: &Arc<UOp>,
opts: SmallVec<[ContiguousHint; 4]>,
) -> Arc<UOp>
pub fn contiguous_with_opts( self: &Arc<UOp>, opts: SmallVec<[ContiguousHint; 4]>, ) -> Arc<UOp>
Ensure contiguous memory layout with optimization hints.
The hints are extracted during rangeify and passed to the optimizer. Based on Tinygrad’s CONTIGUOUS.arg which carries Opt tuples.
Sourcepub fn precast(self: &Arc<UOp>) -> Arc<UOp>
pub fn precast(self: &Arc<UOp>) -> Arc<UOp>
Optimizer hint to force materialization before type conversion.
Inserted before BITCAST to ensure the source is rendered separately in codegen (prevents invalid cast fusion).
Source§impl UOp
impl UOp
Sourcepub fn wmma(
a: Arc<UOp>,
b: Arc<UOp>,
c: Arc<UOp>,
metadata: WmmaMetadata,
) -> Arc<UOp>
pub fn wmma( a: Arc<UOp>, b: Arc<UOp>, c: Arc<UOp>, metadata: WmmaMetadata, ) -> Arc<UOp>
Warp Matrix Multiply-Accumulate for tensor cores.
Computes D = A × B + C using hardware matrix units.
metadata specifies dimensions, dtypes, and upcast axes for vectorization.
Sourcepub fn try_vectorize(
elements: SmallVec<[Arc<UOp>; 4]>,
) -> Result<Arc<UOp>, Error>
pub fn try_vectorize( elements: SmallVec<[Arc<UOp>; 4]>, ) -> Result<Arc<UOp>, Error>
Create vector from scalar elements (fallible version with validation).
§Errors
VectorizeRequiresMultipleif elements is emptyVectorizeDTypeMismatchif elements have different scalar dtypes
Sourcepub fn vectorize(elements: SmallVec<[Arc<UOp>; 4]>) -> Arc<UOp>
pub fn vectorize(elements: SmallVec<[Arc<UOp>; 4]>) -> Arc<UOp>
Create vector from scalar elements (panics on violation).
Sourcepub fn try_broadcast(self: &Arc<UOp>, count: usize) -> Result<Arc<UOp>, Error>
pub fn try_broadcast(self: &Arc<UOp>, count: usize) -> Result<Arc<UOp>, Error>
Broadcast a scalar value to a vector by replication (fallible version).
Creates a VECTORIZE operation with count copies of the source.
If count == 1, returns the source unchanged.
§Errors
BroadcastRequiresScalarif source has vcount > 1
Sourcepub fn try_gep(self: &Arc<UOp>, indices: Vec<usize>) -> Result<Arc<UOp>, Error>
pub fn try_gep(self: &Arc<UOp>, indices: Vec<usize>) -> Result<Arc<UOp>, Error>
Extract element(s) from vector (fallible version with validation).
§Errors
GepRequiresVectorif source has vcount <= 1GepIndexOutOfBoundsif any index >= source vcount
Sourcepub fn try_contract(
self: &Arc<UOp>,
upcast_ranges: Vec<(usize, usize)>,
) -> Result<Arc<UOp>, Error>
pub fn try_contract( self: &Arc<UOp>, upcast_ranges: Vec<(usize, usize)>, ) -> Result<Arc<UOp>, Error>
Contract unrolled values back into vectorized form (fallible version).
§Errors
ContractCountMismatchif dtype.vcount != product of axis sizes
Sourcepub fn contract(self: &Arc<UOp>, upcast_ranges: Vec<(usize, usize)>) -> Arc<UOp>
pub fn contract(self: &Arc<UOp>, upcast_ranges: Vec<(usize, usize)>) -> Arc<UOp>
Contract unrolled values back into vectorized form.
Pairs with UNROLL: UNROLL expands loops for optimization, CONTRACT combines the results. Used in WMMA and vectorization passes.
Sourcepub fn try_unroll(
self: &Arc<UOp>,
unroll_axes: Vec<(usize, usize)>,
) -> Result<Arc<UOp>, Error>
pub fn try_unroll( self: &Arc<UOp>, unroll_axes: Vec<(usize, usize)>, ) -> Result<Arc<UOp>, Error>
Expand a value across unrolled loop iterations (fallible version).
§Errors
UnrollCountMismatchif src.dtype.vcount != product of axis sizes
Sourcepub fn unroll(self: &Arc<UOp>, unroll_axes: Vec<(usize, usize)>) -> Arc<UOp>
pub fn unroll(self: &Arc<UOp>, unroll_axes: Vec<(usize, usize)>) -> Arc<UOp>
Expand a value across unrolled loop iterations.
Creates multiple versions of the computation for each unroll axis. Pairs with CONTRACT which combines results back together.
Sourcepub fn unroll_with_dtype(
self: &Arc<UOp>,
unroll_axes: Vec<(usize, usize)>,
dtype: DType,
) -> Arc<UOp>
pub fn unroll_with_dtype( self: &Arc<UOp>, unroll_axes: Vec<(usize, usize)>, dtype: DType, ) -> Arc<UOp>
Create UNROLL with explicit dtype (for do_contract pattern).
Used when UNROLL dtype should differ from source dtype, specifically when CONTRACT collapses UNROLL via GEP and we need to preserve the per-iteration element type.
Based on Tinygrad’s pattern where partial contraction creates UNROLL with remaining axes but CONTRACT’s dtype.
Sourcepub fn mstack(buffers: SmallVec<[Arc<UOp>; 4]>) -> Arc<UOp>
pub fn mstack(buffers: SmallVec<[Arc<UOp>; 4]>) -> Arc<UOp>
Stack multiple buffers (multi-device tensors).
MStack combines buffers from multiple devices into a single logical tensor. Used for distributed/multi-GPU tensor operations.
Sourcepub fn mselect(self: &Arc<UOp>, device_index: usize) -> Arc<UOp>
pub fn mselect(self: &Arc<UOp>, device_index: usize) -> Arc<UOp>
Select buffer by device index (multi-device access).
MSelect retrieves a specific device’s buffer from a multi-device tensor.
Sourcepub fn kernel(sources: SmallVec<[Arc<UOp>; 4]>, ast: Arc<UOp>) -> Arc<UOp>
pub fn kernel(sources: SmallVec<[Arc<UOp>; 4]>, ast: Arc<UOp>) -> Arc<UOp>
Kernel wrapper.
Creates a KERNEL operation with the given sources (kernel arguments) and AST (computation).
§Arguments
sources- Kernel arguments (buffers and variables)ast- The computation graph (usually SINK, COPY, or BUFFER_VIEW)
Sourcepub fn cat() -> UOpCatBuilder
pub fn cat() -> UOpCatBuilder
Sourcepub fn ptrcat() -> UOpPtrcatBuilder
pub fn ptrcat() -> UOpPtrcatBuilder
Source§impl UOp
impl UOp
Sourcepub fn pointer_index(
self: &Arc<UOp>,
offset: Arc<UOp>,
) -> Result<Arc<UOp>, Error>
pub fn pointer_index( self: &Arc<UOp>, offset: Arc<UOp>, ) -> Result<Arc<UOp>, Error>
Create a pointer index operation (pointer arithmetic).
Performs pointer + offset arithmetic for address calculation in kernels. Both self (ptr) and offset should have Index dtype.
Sourcepub fn slice(buffer: Arc<UOp>, specs: Vec<IndexSpec>) -> Result<Arc<UOp>, Error>
pub fn slice(buffer: Arc<UOp>, specs: Vec<IndexSpec>) -> Result<Arc<UOp>, Error>
Multi-dimensional slicing with IndexSpec.
Note: Range and NewAxis specs are not fully implemented; currently only Single indices are properly supported.
Sourcepub fn slice_gated(
buffer: Arc<UOp>,
specs: Vec<IndexSpec>,
gate: Arc<UOp>,
) -> Result<Arc<UOp>, Error>
pub fn slice_gated( buffer: Arc<UOp>, specs: Vec<IndexSpec>, gate: Arc<UOp>, ) -> Result<Arc<UOp>, Error>
Gated slicing - conditional access with gate.
Sourcepub fn store(self: &Arc<UOp>, value: Arc<UOp>) -> Arc<UOp>
pub fn store(self: &Arc<UOp>, value: Arc<UOp>) -> Arc<UOp>
Create a STORE operation without ranges.
Stores a value at self (INDEX location).
The buffer is accessed indirectly through the INDEX node.
For stores with ranges (e.g., output upcasting), use store_with_ranges.
For gated stores, use an INDEX with a gate (INDEX has optional gate field).
Sourcepub fn store_with_ranges(
self: &Arc<UOp>,
value: Arc<UOp>,
ranges: SmallVec<[Arc<UOp>; 4]>,
) -> Arc<UOp>
pub fn store_with_ranges( self: &Arc<UOp>, value: Arc<UOp>, ranges: SmallVec<[Arc<UOp>; 4]>, ) -> Arc<UOp>
Create a STORE operation with ranges.
Stores a value at self (INDEX location), with explicit ranges
that define the scope of the store operation. This matches Tinygrad’s
architecture where STORE sources are (index, value, *ranges).
Ranges are used for output upcasting: Range(Upcast) becomes UNROLL
during expansion, which fix_store_unroll contracts via CONTRACT.
For gated stores, use an INDEX with a gate (INDEX has optional gate field).
Sourcepub fn copy_to_device(self: &Arc<UOp>, device: DeviceSpec) -> Arc<UOp>
pub fn copy_to_device(self: &Arc<UOp>, device: DeviceSpec) -> Arc<UOp>
Copy to a different device.
Sourcepub fn copy(self: &Arc<UOp>, device: Arc<UOp>) -> Arc<UOp>
pub fn copy(self: &Arc<UOp>, device: Arc<UOp>) -> Arc<UOp>
Create a COPY operation with explicit device UOp.
Unlike copy_to_device which takes a DeviceSpec, this takes
a device UOp directly (useful when you already have one).
Sourcepub fn bufferize(
compute: Arc<UOp>,
ranges: Vec<Arc<UOp>>,
opts: BufferizeOpts,
) -> Arc<UOp>
pub fn bufferize( compute: Arc<UOp>, ranges: Vec<Arc<UOp>>, opts: BufferizeOpts, ) -> Arc<UOp>
Create a BUFFERIZE operation.
Marks a computation to be materialized into a buffer. The computation is evaluated over the given ranges and stored.
Sourcepub fn bufferize_global(compute: Arc<UOp>, ranges: Vec<Arc<UOp>>) -> Arc<UOp>
pub fn bufferize_global(compute: Arc<UOp>, ranges: Vec<Arc<UOp>>) -> Arc<UOp>
Create a BUFFERIZE operation with Global address space.
This is the most common pattern - bufferize to global memory.
Sourcepub fn bufferize_local(compute: Arc<UOp>, ranges: Vec<Arc<UOp>>) -> Arc<UOp>
pub fn bufferize_local(compute: Arc<UOp>, ranges: Vec<Arc<UOp>>) -> Arc<UOp>
Create a BUFFERIZE operation with Local address space.
For shared/local memory bufferization.
Sourcepub fn define_local(id: usize, dtype: DType) -> Arc<UOp>
pub fn define_local(id: usize, dtype: DType) -> Arc<UOp>
Create a DEFINE_LOCAL operation.
Defines a local (shared) memory allocation with the given ID.
Sourcepub fn define_reg(size: usize) -> Arc<UOp>
pub fn define_reg(size: usize) -> Arc<UOp>
Define register memory (void pointer - type determined by usage).
Sourcepub fn define_reg_typed(size: usize, element_dtype: DType) -> Arc<UOp>
pub fn define_reg_typed(size: usize, element_dtype: DType) -> Arc<UOp>
Define register memory with explicit element type.
Creates a typed register accumulator for use in reductions. The element_dtype specifies the type of each element (e.g., Float32 for a float accumulator).
Sourcepub fn index<I>() -> UOpIndexBuilder<I>
pub fn index<I>() -> UOpIndexBuilder<I>
Create a buffer index operation for multi-dimensional access.
All indices must have Index dtype.
§Dtype behavior (matches Tinygrad’s buf.index(idx, ptr=False, dtype=None))
- If
dtypeis provided: use it directly (explicit dtype takes precedence) - If
ptris true: keep the buffer’s Ptr dtype (for STORE targets) - Otherwise (ptr=false, default): extract element type from buffer (for LOAD sources)
§Examples
// Element dtype (default) - for LOAD
UOp::index().buffer(buf).indices(vec![idx]).call()?
// Ptr dtype via ptr=true - for STORE (preferred, Tinygrad-aligned)
UOp::index().buffer(buf).indices(vec![idx]).ptr(true).call()?
// Explicit Ptr dtype - for STORE (legacy, works but prefer .ptr(true))
let ptr_dtype = DType::Float32.ptr(Some(size), AddrSpace::Global);
UOp::index().buffer(buf).indices(vec![idx]).dtype(ptr_dtype).call()?
// With gate
UOp::index().buffer(buf).indices(vec![idx]).gate(gate_uop).call()?Sourcepub fn load() -> UOpLoadBuilder
pub fn load() -> UOpLoadBuilder
Source§impl UOp
impl UOp
Sourcepub fn try_reduce_axis(
self: &Arc<UOp>,
reduce_op: ReduceOp,
axes: Vec<usize>,
) -> Result<Arc<UOp>, Error>
pub fn try_reduce_axis( self: &Arc<UOp>, reduce_op: ReduceOp, axes: Vec<usize>, ) -> Result<Arc<UOp>, Error>
Reduce along specified axes using reduce_op.
Implements Tinygrad’s early-return pattern: when all axes are reduced or when all reduction axes have dimension 1, returns self instead of creating a ReduceAxis operation.
§Errors
Returns error if any axis is >= number of dimensions.
Source§impl UOp
impl UOp
Sourcepub fn try_reshape(
self: &Arc<UOp>,
new_shape: &SmallVec<[SInt; 4]>,
) -> Result<Arc<UOp>, Error>
pub fn try_reshape( self: &Arc<UOp>, new_shape: &SmallVec<[SInt; 4]>, ) -> Result<Arc<UOp>, Error>
Reshape with strict validation (fail-fast).
Validates:
- No negative dimensions in new_shape
- Product of input shape == product of output shape
Sourcepub fn try_expand(
self: &Arc<UOp>,
new_shape: &SmallVec<[SInt; 4]>,
) -> Result<Arc<UOp>, Error>
pub fn try_expand( self: &Arc<UOp>, new_shape: &SmallVec<[SInt; 4]>, ) -> Result<Arc<UOp>, Error>
Expand (broadcast) with strict validation.
Validates:
- Number of dimensions matches
- Each dimension either matches or src dimension is 1
Sourcepub fn try_permute(self: &Arc<UOp>, axes: Vec<usize>) -> Result<Arc<UOp>, Error>
pub fn try_permute(self: &Arc<UOp>, axes: Vec<usize>) -> Result<Arc<UOp>, Error>
Permute with strict validation.
Validates:
- Permutation is valid (contains each index 0..n exactly once)
Sourcepub fn try_pad(
self: &Arc<UOp>,
padding: &[(SInt, SInt)],
) -> Result<Arc<UOp>, Error>
pub fn try_pad( self: &Arc<UOp>, padding: &[(SInt, SInt)], ) -> Result<Arc<UOp>, Error>
Pad with strict validation.
Validates:
- Padding values are concrete (not symbolic)
- Number of padding pairs matches dimensions
Sourcepub fn try_shrink(
self: &Arc<UOp>,
ranges: &[(SInt, SInt)],
) -> Result<Arc<UOp>, Error>
pub fn try_shrink( self: &Arc<UOp>, ranges: &[(SInt, SInt)], ) -> Result<Arc<UOp>, Error>
Shrink (slice) with strict validation.
Validates:
- Range values are concrete (not symbolic)
- begin <= end for each dimension
- 0 <= begin, end <= dimension_size
Source§impl UOp
impl UOp
Sourcepub fn rtag(self: &Arc<UOp>, tag: Option<SmallVec<[usize; 2]>>) -> Arc<UOp>
pub fn rtag(self: &Arc<UOp>, tag: Option<SmallVec<[usize; 2]>>) -> Arc<UOp>
Create a new UOp with the given tag (Tinygrad: rtag()).
Returns self unchanged if tag is already equal.
Sourcepub fn with_tag(self: &Arc<UOp>, tag: SmallVec<[usize; 2]>) -> Arc<UOp>
pub fn with_tag(self: &Arc<UOp>, tag: SmallVec<[usize; 2]>) -> Arc<UOp>
Create a new UOp with the given tag set.
Sourcepub fn has_buffer_identity(&self) -> bool
pub fn has_buffer_identity(&self) -> bool
Check if this UOp has a concrete buffer identity in the graph.
Returns true for BUFFER or RESHAPE/MULTI chains leading to BUFFER. These are already contiguous by definition, so wrapping in CONTIGUOUS is a no-op.
Based on Tinygrad’s UOp.has_buffer_identity() (ops.py:616-619).
Sourcepub fn ptrdtype(&self) -> Option<(&DType, AddrSpace, Option<usize>)>
pub fn ptrdtype(&self) -> Option<(&DType, AddrSpace, Option<usize>)>
Get pointer dtype components if this UOp has a Ptr dtype.
Returns (base, addrspace, size) for Ptr types, None otherwise.
This simplifies pattern matching on pointer types.
§Examples
let buffer = UOp::new_buffer(DeviceSpec::Cpu, 10, DType::Float32);
if let Some((base, addrspace, size)) = buffer.ptrdtype() {
assert_eq!(*base, DType::Float32);
assert_eq!(addrspace, AddrSpace::Global);
}Sourcepub fn with_dtype(self: &Arc<UOp>, dtype: DType) -> Arc<UOp>
pub fn with_dtype(self: &Arc<UOp>, dtype: DType) -> Arc<UOp>
Create a copy of this UOp with a different dtype.
If the dtype is unchanged, returns self (clone of Arc).
This is the Rust equivalent of Tinygrad’s buf.replace(dtype=x).
§Examples
let int_const = UOp::const_(DType::Int32, morok_ir::ConstValue::Int(5));
let float_const = int_const.with_dtype(DType::Float32);
assert_eq!(float_const.dtype(), DType::Float32);Sourcepub fn store_buffer(&self) -> Option<&Arc<UOp>>
pub fn store_buffer(&self) -> Option<&Arc<UOp>>
Get the buffer from a STORE operation (via its INDEX child).
STORE operations reference the buffer indirectly through an INDEX node.
This helper extracts the buffer from STORE.index.buffer.
Returns None if:
- This is not a STORE operation
- The STORE’s index is not an INDEX operation
Sourcepub fn load_buffer(&self) -> Option<Arc<UOp>>
pub fn load_buffer(&self) -> Option<Arc<UOp>>
Get the buffer from a LOAD operation.
Returns None if this is not a LOAD operation.
Sourcepub fn store_value(self: &Arc<UOp>, value: Arc<UOp>) -> Arc<UOp>
pub fn store_value(self: &Arc<UOp>, value: Arc<UOp>) -> Arc<UOp>
Store a value at this INDEX node.
Convenience method for self.store(value).
Matches Tinygrad’s idx.store(val) pattern.
§Panics
Debug-asserts that self is an INDEX operation.
Sourcepub fn with_src(self: &Arc<UOp>, new_srcs: Vec<Arc<UOp>>) -> Arc<UOp>
pub fn with_src(self: &Arc<UOp>, new_srcs: Vec<Arc<UOp>>) -> Arc<UOp>
Alias for with_sources().
Creates a new UOp with the same operation type and dtype, but with the provided sources replacing the original ones.
Sourcepub fn shape(self: &Arc<UOp>) -> Result<Option<&SmallVec<[SInt; 4]>>, Error>
pub fn shape(self: &Arc<UOp>) -> Result<Option<&SmallVec<[SInt; 4]>>, Error>
Get the shape of this UOp.
Shape is computed lazily on first access and cached. Returns Ok(None) if shape cannot be determined (e.g., for control flow ops). Returns Err if there is a shape mismatch error.
§Examples
let scalar = UOp::const_(DType::Float32, ConstValue::Float(1.0));
assert_eq!(scalar.shape().unwrap().as_ref().map(|s| s.len()), Some(0)); // Scalar has empty shapeSourcepub fn vmin(self: &Arc<UOp>) -> &ConstValue
pub fn vmin(self: &Arc<UOp>) -> &ConstValue
Get the minimum possible value of this UOp.
Returns the minimum value based on range analysis. Computed lazily on first access and cached.
§Examples
let five = UOp::const_(DType::Int32, ConstValue::Int(5));
assert_eq!(five.vmin(), &ConstValue::Int(5));Sourcepub fn vmax(self: &Arc<UOp>) -> &ConstValue
pub fn vmax(self: &Arc<UOp>) -> &ConstValue
Get the maximum possible value of this UOp.
Returns the maximum value based on range analysis. Computed lazily on first access and cached.
§Examples
let five = UOp::const_(DType::Int32, ConstValue::Int(5));
assert_eq!(five.vmax(), &ConstValue::Int(5));Sourcepub fn device_spec(&self) -> Option<DeviceSpec>
pub fn device_spec(&self) -> Option<DeviceSpec>
Extract device specification from this UOp graph.
Traverses the graph to find Op::Device nodes following Tinygrad’s
_device recursive property (ops.py:585-599):
- DEVICE: returns the DeviceSpec directly
- BUFFER: returns device from the device child
- COPY: returns device from the device child (target device)
- Otherwise: searches children recursively
§Examples
let buffer = UOp::new_buffer(DeviceSpec::Cpu, 10, DType::Float32);
assert_eq!(buffer.device_spec(), Some(DeviceSpec::Cpu));Sourcepub fn base(self: &Arc<UOp>) -> Arc<UOp>
pub fn base(self: &Arc<UOp>) -> Arc<UOp>
Get the base UOp by walking through movement operations.
Movement operations (RESHAPE, PERMUTE, EXPAND, etc.) are views that don’t change the underlying data. This method recursively walks through these operations to find the actual buffer or computation that owns the data.
Based on Tinygrad’s base property (ops.py:524-527).
§Examples
let buffer = UOp::new_buffer(DeviceSpec::Cpu, 10, DType::Float32);
let shape = Shape::from_iter([SInt::Const(2), SInt::Const(5)]);
let reshaped = buffer.try_reshape(&shape).unwrap();
// base() walks through RESHAPE to get the original BUFFER
assert!(std::sync::Arc::ptr_eq(&reshaped.base(), &buffer));Sourcepub fn buf_uop(self: &Arc<UOp>) -> Arc<UOp>
pub fn buf_uop(self: &Arc<UOp>) -> Arc<UOp>
Get the underlying buffer UOp, walking through AFTER/MSELECT/MSTACK chains.
Based on Tinygrad’s buf_uop property (ops.py:601-606).
This recursively unwraps AFTER chains to find the actual buffer.
§Examples
use morok_ir::UOp;
// AFTER wrapping a buffer
let buffer = UOp::new_buffer(...);
let after = buffer.after(deps);
// buf_uop() walks through AFTER to get the underlying buffer
assert!(Arc::ptr_eq(&after.buf_uop(), &buffer));Sourcepub fn toposort(self: &Arc<UOp>) -> Vec<Arc<UOp>>
pub fn toposort(self: &Arc<UOp>) -> Vec<Arc<UOp>>
Topological sort of the computation graph.
Returns nodes in an order where all dependencies come before their dependents.
Sourcepub fn toposort_filtered<F>(self: &Arc<UOp>, gate: F) -> Vec<Arc<UOp>>
pub fn toposort_filtered<F>(self: &Arc<UOp>, gate: F) -> Vec<Arc<UOp>>
Topological sort with gate function (filtered toposort).
Only traverses nodes for which gate(node) returns true.
Nodes for which gate returns false are excluded from the
traversal entirely (along with their ancestors).
This is a key optimization for cached property computation, allowing us to skip nodes that already have a property cached.
§Performance
For a graph with 10,000 nodes where 9,900 already have a cached property:
- Full toposort: 10,000 nodes visited
- Filtered toposort: 100 nodes visited
- Speedup: 100x
§Example
// Only process nodes that don't have shape cached
let uncached = uop.toposort_filtered(|node| {
node.shape_cache.get().is_none()
});Sourcepub fn any_in_subtree<F>(self: &Arc<UOp>, pred: F) -> bool
pub fn any_in_subtree<F>(self: &Arc<UOp>, pred: F) -> bool
Check if any node in the backward slice satisfies a predicate.
Early-exit DFS — returns true as soon as a matching node is found,
without building the full toposort Vec. Use this instead of
toposort().iter().any(pred) when you only need an existential check.
Sourcepub fn collect_in_subtree<F>(self: &Arc<UOp>, pred: F) -> Vec<Arc<UOp>>
pub fn collect_in_subtree<F>(self: &Arc<UOp>, pred: F) -> Vec<Arc<UOp>>
Collect all nodes in the backward slice that match a predicate.
DFS collecting matches — cheaper than toposort().iter().filter(pred).collect()
when you don’t need topological ordering.
Sourcepub fn node_count(self: &Arc<UOp>) -> usize
pub fn node_count(self: &Arc<UOp>) -> usize
Count unique nodes in the DAG rooted at this UOp.
Much cheaper than toposort().len() — no result Vec, no ordering.
Uses pointer-based visited set for O(1) identity checks.
Sourcepub fn has_index_in_sources(self: &Arc<UOp>) -> bool
pub fn has_index_in_sources(self: &Arc<UOp>) -> bool
O(1) cached check: does this node or any of its sources contain an INDEX op?
Computed lazily and cached. Each node checks itself and its direct sources’ cached values, so the total cost across the graph is O(N).
Sourcepub fn tree(self: &Arc<UOp>) -> String
pub fn tree(self: &Arc<UOp>) -> String
Render this UOp and its sources as a compact ASCII tree.
Shared nodes (appearing multiple times due to hash-consing) are shown
as back-references: [id] → (see above)
§Example Output
[42] STORE : Void
├── [10] PARAM(0) : Ptr<Float32> shape=[4]
├── [35] INDEX : Ptr<Float32> shape=[4]
│ ├── [10] → (see above)
│ └── [30] RANGE(0, Reduce) : Index
│ └── [5] CONST(Int(4)) : Index
└── [40] REDUCE(Add) : Float32 shape=[]
└── [35] → (see above)Sourcepub fn tree_full(self: &Arc<UOp>) -> String
pub fn tree_full(self: &Arc<UOp>) -> String
Render this UOp and its sources as a full ASCII tree.
Shared nodes are expanded every time they appear (verbose but complete). Use this when you need to see the full subtree at every occurrence.
Sourcepub fn ranges(self: &Arc<UOp>) -> &Vec<Arc<UOp>>
pub fn ranges(self: &Arc<UOp>) -> &Vec<Arc<UOp>>
Get all RANGE operations in this UOp’s computation graph.
Lazily computed and cached. Useful for rangeify pass to track loop variables.
Sourcepub fn in_scope_ranges(self: &Arc<UOp>) -> &HashSet<UOpKey>
pub fn in_scope_ranges(self: &Arc<UOp>) -> &HashSet<UOpKey>
Get the RANGE operations that are in scope at this UOp.
Returns only the ranges that are currently “active” (not yet ended). This is computed by:
- Merging ranges from all source operations
- Removing ranges that are ended by this operation
- Adding self if this is a RANGE operation
Based on Tinygrad’s ranges property (ops.py:318-320) and
_ranges recursive property (ops.py:302-315).
§Returns
A HashSet of RANGE UOps that are in scope at this point in the graph. The result is cached for performance.
§Examples
use morok_ir::{UOp, AxisType};
// A simple computation inside a range
let range = UOp::range(end, 0, AxisType::Loop);
let value = UOp::const_(...);
let end_op = value.end(vec![range.clone()]);
// Value has range in scope
assert!(value.in_scope_ranges().contains(&range));
// After END, range is no longer in scope
assert!(!end_op.in_scope_ranges().contains(&range));Sourcepub fn all_in_scope_ranges_are(self: &Arc<UOp>, axis_type: AxisType) -> bool
pub fn all_in_scope_ranges_are(self: &Arc<UOp>, axis_type: AxisType) -> bool
Check if all in-scope ranges at this UOp have the given AxisType.
Returns true if the in-scope ranges set is empty or all ranges match the specified axis type.
§Use Cases
all_in_scope_ranges_are(AxisType::Outer)- Used in split_store to determine if we’re at a kernel boundary
§Examples
use morok_ir::{UOp, AxisType};
// At kernel boundary: only OUTER ranges in scope
assert!(uop.all_in_scope_ranges_are(AxisType::Outer));
// Inside kernel: has non-OUTER ranges
assert!(!uop.all_in_scope_ranges_are(AxisType::Outer));Sourcepub fn has_non_outer_ranges(self: &Arc<UOp>) -> bool
pub fn has_non_outer_ranges(self: &Arc<UOp>) -> bool
Check if any in-scope range is NOT of the given AxisType.
Inverse of all_in_scope_ranges_are. Useful for Tinygrad-style
filtering: “skip if any range is not OUTER”.
§Examples
use morok_ir::{UOp, AxisType};
// Has non-OUTER ranges: should skip in split_store
if uop.has_non_outer_ranges() {
return None; // Don't split here
}Sourcepub fn get_consumer_map(self: &Arc<UOp>) -> HashMap<UOpKey, Vec<Arc<UOp>>>
pub fn get_consumer_map(self: &Arc<UOp>) -> HashMap<UOpKey, Vec<Arc<UOp>>>
Build a consumer map for this UOp’s computation graph.
Returns a HashMap where each UOp maps to the list of UOps that consume it. Useful for reverse traversal and dependency analysis.
Sourcepub fn reverse_toposort(
self: &Arc<UOp>,
consumer_map: &HashMap<UOpKey, Vec<Arc<UOp>>>,
) -> Vec<Arc<UOp>>
pub fn reverse_toposort( self: &Arc<UOp>, consumer_map: &HashMap<UOpKey, Vec<Arc<UOp>>>, ) -> Vec<Arc<UOp>>
Reverse topological sort of the computation graph.
Returns nodes in bottom-up order (leaves first, root last). Requires a consumer map to traverse from leaves to roots.
Sourcepub fn substitute(self: &Arc<UOp>, map: &HashMap<UOpKey, Arc<UOp>>) -> Arc<UOp>
pub fn substitute(self: &Arc<UOp>, map: &HashMap<UOpKey, Arc<UOp>>) -> Arc<UOp>
Replace UOps in the computation graph according to a substitution map.
Delegates to graph_rewrite_bottom_up with a wildcard pattern that looks up
each node in the map — exactly like Tinygrad’s substitute. The rewrite engine
provides O(n) memoization via its result cache.
Sourcepub fn substitute_gated(
self: &Arc<UOp>,
map: &HashMap<UOpKey, Arc<UOp>>,
) -> Arc<UOp>
pub fn substitute_gated( self: &Arc<UOp>, map: &HashMap<UOpKey, Arc<UOp>>, ) -> Arc<UOp>
Replace UOps with range-gated substitution (Tinygrad: extra_pm=pm_gate_substitute).
Like substitute, but skips subtrees whose in_scope_ranges() don’t contain
any of the substitution keys. This prevents substituting ranges in subexpressions
that don’t reference them, matching Tinygrad’s gate_substitute behavior.
Sourcepub fn with_sources(self: &Arc<UOp>, new_srcs: Vec<Arc<UOp>>) -> Arc<UOp>
pub fn with_sources(self: &Arc<UOp>, new_srcs: Vec<Arc<UOp>>) -> Arc<UOp>
Reconstruct this UOp with new sources.
Creates a new UOp with the same operation type and dtype, but with the provided sources replacing the original ones. Hash consing ensures that if an identical UOp already exists, it will be reused.
This is used by the graph rewrite engine when sources have been rewritten.
§Panics
Panics if the number of sources doesn’t match the operation’s arity.
§Examples
// Original: a + b
let add = UOp::add(a.clone(), b.clone());
// Rewrite sources: a' + b'
let new_add = add.with_sources(vec![a_prime, b_prime]);Source§impl UOp
impl UOp
Sourcepub fn replace<'f1>(self: &'f1 Arc<UOp>) -> UOpReplaceBuilder<'f1>
pub fn replace<'f1>(self: &'f1 Arc<UOp>) -> UOpReplaceBuilder<'f1>
Create a modified copy with optional field overrides.
Enables concise pattern implementations by allowing selective field modification.
Returns self.clone() if nothing changed (optimization for hash consing).
§Examples
let new_load = load.replace().dtype(new_dtype).src(new_sources).call();
let dtype_only = load.replace().dtype(new_dtype).call();Source§impl UOp
impl UOp
Sourcepub fn new(op: Op, dtype: DType) -> Arc<UOp>
pub fn new(op: Op, dtype: DType) -> Arc<UOp>
Create a new UOp with hash consing.
If an identical UOp already exists (in any thread) and is still alive, returns a reference to it. Otherwise, creates a new UOp and caches it.
§Thread Safety
This function is thread-safe. Creating the same UOp from different threads
will return the same Arc<UOp>, so Arc::ptr_eq works across threads.
§Memory Management
The cache stores weak references. UOps are automatically cleaned up when no strong references remain (Tinygrad-aligned behavior).
Sourcepub fn new_tagged(
op: Op,
dtype: DType,
tag: Option<SmallVec<[usize; 2]>>,
) -> Arc<UOp>
pub fn new_tagged( op: Op, dtype: DType, tag: Option<SmallVec<[usize; 2]>>, ) -> Arc<UOp>
Create a UOp with an explicit tag (Tinygrad: UOp(op, dtype, src, arg, tag)).
Tag participates in hash consing — same structure + different tag = different UOp.
Sourcepub fn with_metadata<T>(self: &Arc<UOp>, metadata: T) -> Arc<UOp>
pub fn with_metadata<T>(self: &Arc<UOp>, metadata: T) -> Arc<UOp>
Attach metadata to this UOp, creating a new instance.
Metadata is NOT part of hash consing - this method creates a new UOp with a different ID but the same operation structure. This allows attaching metadata (like kernel info) after optimization.
§Examples
let ast = /* ... optimized AST ... */;
let with_info = ast.with_metadata(KernelInfo::new("r_g16l16", vec![], false));Source§impl UOp
impl UOp
Sourcepub fn const_factor(&self) -> i64
pub fn const_factor(&self) -> i64
Returns the largest known integer that divides this UOp.
Based on Tinygrad’s const_factor() (ops.py:693-700).
For MUL, only checks immediate CONST children (not recursive).
Sourcepub fn divides(self: &Arc<UOp>, v: &Arc<UOp>) -> Option<Arc<UOp>>
pub fn divides(self: &Arc<UOp>, v: &Arc<UOp>) -> Option<Arc<UOp>>
Returns self / v if v divides self exactly, otherwise None.
Based on Tinygrad’s divides() (ops.py lines 703-711).
Delegates to [divides_int] for constant divisors.
Sourcepub fn divides_int(self: &Arc<UOp>, v: i64) -> Option<Arc<UOp>>
pub fn divides_int(self: &Arc<UOp>, v: i64) -> Option<Arc<UOp>>
Returns self / v if integer v divides all terms exactly, otherwise None.
Based on Tinygrad’s divides(v: int) (ops.py:701-709).
Recursively handles Const, Add, and Mul operations.
Sourcepub fn divide_exact(self: &Arc<UOp>, v: &Arc<UOp>) -> Option<Arc<UOp>>
pub fn divide_exact(self: &Arc<UOp>, v: &Arc<UOp>) -> Option<Arc<UOp>>
Returns self / v if exact division by UOp v is possible.
Based on Tinygrad’s divide_exact(v: UOp) (ops.py:717-726).
Handles identity, constant divisors, Add recursion, and Mul factoring.
Sourcepub fn symbolic_gcd(uops: &[Arc<UOp>]) -> Arc<UOp>
pub fn symbolic_gcd(uops: &[Arc<UOp>]) -> Arc<UOp>
Computes the symbolic GCD of multiple UOps, returning a UOp.
Based on Tinygrad’s UOp.gcd() (ops.py:713-716).
Finds both numeric GCD of const_factors AND common symbolic MUL factors.
For inputs 6*a*b and 4*a*c, returns 2*a (numeric GCD=2, common factor=a).
Sourcepub fn backward_slice_ids(self: &Arc<UOp>) -> &HashSet<u64>
pub fn backward_slice_ids(self: &Arc<UOp>) -> &HashSet<u64>
Cached backward slice: set of all node IDs reachable from this UOp.
O(1) membership test via contains(). Computed once and cached per-node.
Prefer this over backward_slice() when you only need to check if a
node is in the dependency set.
Sourcepub fn backward_slice(self: &Arc<UOp>) -> Vec<Arc<UOp>>
pub fn backward_slice(self: &Arc<UOp>) -> Vec<Arc<UOp>>
Returns all nodes that this UOp depends on (backward slice / dependency set).
For membership tests, prefer [backward_slice_ids()] which returns a
cached HashSet<u64> with O(1) lookup.
Sourcepub fn divisible_by(self: &Arc<UOp>, amount: usize) -> Option<usize>
pub fn divisible_by(self: &Arc<UOp>, amount: usize) -> Option<usize>
Check if this UOp’s size is divisible by the given amount.
Returns Some(quotient) if divisible, None otherwise.
This is a convenience method for the optimizer to validate transformations.
§Examples
let range = UOp::range(SInt::Const(16), 0, AxisType::Loop);
assert_eq!(range.divisible_by(4), Some(4)); // 16 / 4 = 4
assert_eq!(range.divisible_by(5), None); // 16 not divisible by 5Sourcepub fn with_axis_type(self: &Arc<UOp>, new_type: AxisType) -> Arc<UOp>
pub fn with_axis_type(self: &Arc<UOp>, new_type: AxisType) -> Arc<UOp>
Create a new RANGE UOp with a different axis type.
This is a convenience method for the optimizer to convert ranges between axis types (e.g., LOOP → GLOBAL for parallelization).
§Panics
Panics if called on a non-RANGE operation.
§Examples
let loop_range = UOp::range_axis(UOp::index_const(16), 0, AxisType::Loop);
let global_range = loop_range.with_axis_type(AxisType::Global);
// global_range has same size and axis_id, but different axis typeSourcepub fn get_idx(self: &Arc<UOp>) -> Arc<UOp>
pub fn get_idx(self: &Arc<UOp>) -> Arc<UOp>
Extract the actual index from a range, stripping validity checks.
If the range is a WHERE(valid, idx, invalid_marker), returns idx. Otherwise, returns the range itself.
This is used for range merging when comparing indexing patterns across multiple consumers.
Based on Tinygrad’s get_idx() (ops.py:438-439).
§Examples
// Range with padding: WHERE(i < 5, i, SENTINEL)
let padded_range = UOp::where_op(valid, idx.clone(), invalid_marker)?;
assert!(Arc::ptr_eq(&padded_range.get_idx(), &idx));
// Plain range: returns itself
let plain_range = UOp::range_axis(...);
assert!(Arc::ptr_eq(&plain_range.get_idx(), &plain_range));Sourcepub fn get_valid(self: &Arc<UOp>) -> Arc<UOp>
pub fn get_valid(self: &Arc<UOp>) -> Arc<UOp>
Extract the validity mask from a range.
If the range is a WHERE(valid, idx, invalid_marker), returns valid. Otherwise, returns constant true (always valid).
This is used for range merging to combine validity conditions when multiple consumers share compatible indexing patterns.
Based on Tinygrad’s get_valid() (ops.py:440-441).
§Examples
// Range with padding: WHERE(i < 5, i, SENTINEL)
let padded_range = UOp::where_op(valid.clone(), idx, invalid_marker)?;
assert!(Arc::ptr_eq(&padded_range.get_valid(), &valid));
// Plain range: returns constant true
let plain_range = UOp::range_axis(...);
if let Op::Const(cv) = plain_range.get_valid().op() {
assert_eq!(cv.0, ConstValue::Bool(true));
}Sourcepub fn is_invalid_marker(uop: &Arc<UOp>) -> bool
pub fn is_invalid_marker(uop: &Arc<UOp>) -> bool
Check if a UOp represents an invalid index marker.
Matches both scalar Op::Invalid and vectorized VECTORIZE(Invalid, ..., Invalid)
where ALL elements are Invalid. The vectorized form appears after expansion
broadcasts scalar Invalid across lanes.
Uses all() semantics (entire vector must be Invalid). This differs from
has_invalid() in symbolic patterns which uses any() for guard semantics.
Sourcepub fn invalid_marker() -> Arc<UOp>
pub fn invalid_marker() -> Arc<UOp>
Create an invalid index marker.
Invalid markers are used with WHERE operations to indicate out-of-bounds or padded regions. The value is undefined and should never be used directly - it exists only to be masked away by validity checks.
§Returns
A UOp representing an invalid index value.
§Examples
// Padding: WHERE(i < actual_size, i, invalid)
let invalid = UOp::invalid_marker();
let padded = UOp::where_op(valid, actual_idx, invalid)?;Sourcepub fn is_increasing(self: &Arc<UOp>) -> bool
pub fn is_increasing(self: &Arc<UOp>) -> bool
Check if this UOp is a monotonically increasing function of its inputs.
Returns true for:
- Irreducible ops (RANGE, CONST, DEFINE_VAR)
- ADD of increasing ops
- MUL/IDIV by non-negative constants
Based on Tinygrad’s is_increasing() (ops.py:689-694).
§Examples
// Constants are increasing
let c = UOp::const_(DType::Int32, ConstValue::Int(5));
assert!(c.is_increasing());
// Range variables are increasing
let range = UOp::range_axis(UOp::index_const(16), 0, AxisType::Loop);
assert!(range.is_increasing());
// x + y is increasing if both x and y are increasing
let sum = range.try_add(&c).unwrap();
assert!(sum.is_increasing());
// x * 2 is increasing if x is increasing
let two = UOp::const_(DType::Index, ConstValue::Int(2));
let scaled = range.try_mul(&two).unwrap();
assert!(scaled.is_increasing());Trait Implementations§
Source§impl Hash for UOp
Hash implementation for UOp based on content (dtype + op).
impl Hash for UOp
Hash implementation for UOp based on content (dtype + op).
This enables content-based hashing for cross-run caching. The hash traverses
the DAG structure since Op contains Arc
Auto Trait Implementations§
impl !Freeze for UOp
impl !RefUnwindSafe for UOp
impl Send for UOp
impl Sync for UOp
impl Unpin for UOp
impl UnsafeUnpin for UOp
impl !UnwindSafe for UOp
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> CloneToUninit for Twhere
T: Clone,
impl<T> CloneToUninit for Twhere
T: Clone,
Source§impl<T> Instrument for T
impl<T> Instrument for T
Source§fn instrument(self, span: Span) -> Instrumented<Self>
fn instrument(self, span: Span) -> Instrumented<Self>
Source§fn in_current_span(self) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read more