Unified parallel execution for heterogeneous devices.
The UnifiedExecutor handles kernel execution across any mix of devices
(CPU, CUDA, Metal, etc.) with proper synchronization and dependency tracking.
Design Principles
- Single abstraction - One executor handles any device mix
- Device-agnostic sync - Timeline signals abstract over device-specific primitives
- Zero overhead for single-device - Fast path skips synchronization when possible
- Buffer dependency tracking - Follows Tinygrad’s _access_resources() pattern
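To make the timeline-signal idea concrete, here is a minimal host-side sketch. It assumes a signal is a monotonically increasing 64-bit counter that a producer advances and a consumer waits on, similar in spirit to Vulkan timeline semaphores. The TimelineSignal type and its signal, wait, and current methods are hypothetical names for illustration, not this crate's API; a real implementation would map them onto device primitives such as CUDA events or Metal shared events.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical timeline signal: a monotonically increasing counter that
/// producers advance and consumers wait on. Host-side stand-in only;
/// this is not the crate's type.
#[derive(Default)]
struct TimelineSignal {
    value: Mutex<u64>,
    cond: Condvar,
}

impl TimelineSignal {
    /// Advance the timeline to `v` (it never moves backwards) and wake all waiters.
    fn signal(&self, v: u64) {
        let mut cur = self.value.lock().unwrap();
        if v > *cur {
            *cur = v;
        }
        self.cond.notify_all();
    }

    /// Block until the timeline has reached at least `v`.
    fn wait(&self, v: u64) {
        let mut cur = self.value.lock().unwrap();
        while *cur < v {
            cur = self.cond.wait(cur).unwrap();
        }
    }

    /// Read the current timeline value.
    fn current(&self) -> u64 {
        *self.value.lock().unwrap()
    }
}

fn main() {
    let signal = Arc::new(TimelineSignal::default());

    // Stand-in for a producer device that completes kernel 1, then kernel 2.
    let producer = {
        let s = Arc::clone(&signal);
        thread::spawn(move || {
            s.signal(1);
            s.signal(2);
        })
    };

    // Stand-in for a consumer device that needs kernel 2's output.
    signal.wait(2);
    producer.join().unwrap();
    assert_eq!(signal.current(), 2);
    println!("consumer observed timeline value {}", signal.current());
}
```

Because waiters compare against a value rather than a one-shot flag, a single signal can order many dependent kernels on the same timeline.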
Example
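```rust
// Illustrative only: `schedule` is built elsewhere, and `?` requires an
// enclosing function that returns a Result.
let mut executor = UnifiedExecutor::new();
executor.add_device(DeviceSpec::Cpu)?;

// Execute the schedule; buffer dependencies are resolved automatically.
let output_id = executor.execute(&schedule)?;
```
Execution Graph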
For complex schedules with multiple devices, the executor builds an execution graph (DAG) where nodes are kernel operations and edges are buffer dependencies. Independent kernels on the same device can be batched, and kernels on different devices can run in parallel (with appropriate synchronization).
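The sketch below shows one way buffer dependencies could become DAG edges, in the spirit of the access-tracking pattern listed under the design principles. Each kernel lists the buffer ids it reads and writes, and a kernel depends on earlier kernels through read-after-write, write-after-write, and write-after-read hazards. Kernels are then grouped into levels: kernels on the same level have no edges between them and could run concurrently. The Access struct and the build_deps and levels functions are hypothetical, ignore device placement, and are not the crate's ExecutionGraph or KernelBufferAccess.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical per-kernel access record; buffers are plain integer ids.
struct Access {
    reads: Vec<u32>,
    writes: Vec<u32>,
}

/// For kernels in schedule order, return the set of earlier kernels each must wait for.
fn build_deps(kernels: &[Access]) -> Vec<BTreeSet<usize>> {
    let mut last_writer: HashMap<u32, usize> = HashMap::new();
    let mut readers_since_write: HashMap<u32, Vec<usize>> = HashMap::new();
    let mut deps = vec![BTreeSet::new(); kernels.len()];

    for (i, k) in kernels.iter().enumerate() {
        for b in &k.reads {
            // Read-after-write: wait for the most recent writer of `b`.
            if let Some(&w) = last_writer.get(b) {
                deps[i].insert(w);
            }
        }
        for b in &k.writes {
            // Write-after-write: order against the previous writer.
            if let Some(&w) = last_writer.get(b) {
                deps[i].insert(w);
            }
            // Write-after-read: wait for kernels still reading the old contents.
            if let Some(rs) = readers_since_write.get(b) {
                deps[i].extend(rs.iter().copied());
            }
        }
        // Record this kernel's accesses only after its dependencies are computed.
        for b in &k.reads {
            readers_since_write.entry(*b).or_default().push(i);
        }
        for b in &k.writes {
            last_writer.insert(*b, i);
            readers_since_write.remove(b);
        }
    }
    deps
}

/// Level = length of the longest dependency chain ending at the kernel.
/// Edges always point to earlier kernels, so one forward pass suffices.
fn levels(deps: &[BTreeSet<usize>]) -> Vec<usize> {
    let mut level = vec![0; deps.len()];
    for i in 0..deps.len() {
        let l = deps[i].iter().map(|&d| level[d] + 1).max().unwrap_or(0);
        level[i] = l;
    }
    level
}

fn main() {
    // k0 and k1 write independent buffers; k2 reads both; k3 overwrites buffer 0.
    let kernels = vec![
        Access { reads: vec![], writes: vec![0] },
        Access { reads: vec![], writes: vec![1] },
        Access { reads: vec![0, 1], writes: vec![2] },
        Access { reads: vec![], writes: vec![0] },
    ];
    let deps = build_deps(&kernels);
    println!("deps = {:?}", deps);
    println!("levels = {:?}", levels(&deps));
}
```

With this schedule, k0 and k1 land on level 0, k2 on level 1, and k3 on level 2, because overwriting buffer 0 must wait until k2 has finished reading it.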
Structs
- DeviceContext - Per-device execution context.
- ExecutionGraph - Execution graph representing a DAG of kernel operations.
- ExecutionNode - A node in the execution graph representing a kernel or transfer operation.
- KernelBufferAccess - Buffer access information for parallel kernel execution.
- UnifiedExecutor - Unified executor for heterogeneous device execution.
Enums
- SyncStrategy - Cross-device synchronization strategy.
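As an illustration of the single-device fast path, the following sketch decides, per dependency edge, whether any cross-device synchronization is needed. It assumes (hypothetically) that work on one device is submitted to a single in-order queue, so a same-device edge needs no explicit wait, while a cross-device edge waits for the producer's timeline value. SyncPlan, plan_sync, and DeviceId are invented for this sketch and do not describe the actual variants of SyncStrategy.

```rust
/// Hypothetical device identifier for this sketch.
type DeviceId = u32;

/// Hypothetical per-edge synchronization decision.
#[derive(Debug, PartialEq, Eq)]
enum SyncPlan {
    /// Same device: assumed in-order submission already orders the kernels.
    Skip,
    /// Different devices: the consumer waits until the producer's timeline reaches `value`.
    WaitTimeline { value: u64 },
}

/// Choose the cheapest synchronization for an edge from `producer` to `consumer`.
fn plan_sync(producer: DeviceId, consumer: DeviceId, producer_value: u64) -> SyncPlan {
    if producer == consumer {
        SyncPlan::Skip
    } else {
        SyncPlan::WaitTimeline { value: producer_value }
    }
}

fn main() {
    let cpu: DeviceId = 0;
    let gpu: DeviceId = 1;
    println!("cpu -> cpu: {:?}", plan_sync(cpu, cpu, 5));
    println!("cpu -> gpu: {:?}", plan_sync(cpu, gpu, 5));
}
```

Deciding per edge keeps a schedule that never leaves one device free of any synchronization cost.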
Functions
- global_executor - Get access to the global executor.
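A process-wide accessor like global_executor is commonly built on lazy one-time initialization. This sketch uses std::sync::OnceLock (stable since Rust 1.70) with a Mutex for shared mutation. The Executor stand-in type and the &'static Mutex return type are assumptions, since this page does not state the real signature.

```rust
use std::sync::{Mutex, OnceLock};

/// Stand-in for the real executor type (hypothetical; most fields omitted).
#[derive(Default)]
struct Executor {
    devices: Vec<String>,
}

/// Hypothetical accessor: initializes the executor on first use and returns
/// the same instance on every later call.
fn global_executor() -> &'static Mutex<Executor> {
    static GLOBAL: OnceLock<Mutex<Executor>> = OnceLock::new();
    GLOBAL.get_or_init(|| Mutex::new(Executor::default()))
}

fn main() {
    global_executor().lock().unwrap().devices.push("cpu".to_string());
    // A second call sees the device registered by the first.
    let n = global_executor().lock().unwrap().devices.len();
    println!("registered devices: {n}");
}
```

The Mutex serializes access to the shared executor; a finer-grained design could lock per device instead.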