//! [`PipelineStage`] trait.
use Arc;
use ;
use crateGpuError;
/// One stage in a multi-stream GPU pipeline.
///
/// Implementations enqueue their kernel onto `stream` synchronously
/// (no host wait) and return a `CudaEvent` marking the completion of
/// that stage's GPU work, plus the typed output. The executor
/// arranges that the next stage's `wait_for` is the previous stage's
/// returned event, so cross-stage synchronization is on-device only.