Crate hydro2_async_scheduler

Source

Macros§

task_item
try_build_async_scheduler_config: A macro to build an AsyncSchedulerConfig without using .unwrap() or .expect(). It returns a Result<AsyncSchedulerConfig, NetworkError> so you can use ?.

Structs§

AsyncScheduler
AsyncSchedulerBuilder: Builder for AsyncScheduler.
AsyncSchedulerConfig: Configuration for our upgraded asynchronous scheduler.
AsyncSchedulerConfigBuilder: Builder for AsyncSchedulerConfig.
InFlightCounter: Tracks the in-flight tasks: how many nodes are actively being processed by workers.
MockCheckpointCallback: A mock checkpoint callback that records each checkpoint invocation.
MockCheckpointCallbackBuilder: Builder for MockCheckpointCallback.
NoOpCheckpointCallback
SharedCompletedNodes: A concurrency‐safe set of completed node indices, using tokio::sync::AsyncMutex<HashSet<usize>>.
TaskItem: Each node execution is one TaskItem.
TaskItemBuilder: Builder for TaskItem.
TaskResult: The result we get back from a worker.
TaskResultBuilder: Builder for TaskResult.
WorkerPool: A pool of OS threads:
WorkerPoolBuilder: Builder for WorkerPool.

Enums§

AsyncSchedulerBuilderError: Error type for AsyncSchedulerBuilder
AsyncSchedulerConfigBuilderError: Error type for AsyncSchedulerConfigBuilder
BatchingStrategy: A simple enum to govern how nodes are scheduled:
FreedChildReceivedOperationOutcome
InsertOutcome: Outcome when inserting a node index into SharedCompletedNodes.
MockCheckpointCallbackBuilderError: Error type for MockCheckpointCallbackBuilder
ReadyNodeReceivedOperationOutcome
TaskItemBuilderError: Error type for TaskItemBuilder
TaskResultBuilderError: Error type for TaskResultBuilder
WorkerPoolBuilderError: Error type for WorkerPoolBuilder

Traits§

CheckpointCallback: A trait invoked periodically to record checkpoint data. This can write partial progress to disk, a DB, etc.

Functions§

aggregator_thread_behavior: Aggregator thread that reads incoming TaskItems, distributing them to workers. We now log each phase of the aggregator’s loop in more detail, which can reveal if aggregator is stuck waiting for input or if workers are closed prematurely.
build_and_send_task_result: Builds and sends a TaskResult to the aggregator (or whomever is listening). Added logs detail the error state and the exact moment we send the result.
check_all_nodes_done: Utility function for your scheduling loop, logging the “done_count”.
clear_checkpoint_invocations: Clears the global list, so each test can start fresh.
compute_freed_children
create_worker_channels
current_memory_usage_in_bytes
drain_channel: Drains a Receiver<usize> until None, returning items read.
execute_node: Executes the specified node (node_idx) from net_guard, optionally sends streaming output (if output_tx is Some(...)), and decrements in-degs to compute Freed children. Returns (freed_children, error).
fetch_next_task: Retrieves the next TaskItem from the worker’s channel. We now log more details about whether we’re waiting, whether a task is present, etc. This can help confirm if a worker is truly idle or if a hang might be from unreceived tasks.
get_mock_checkpoint_invocations: Returns the global list of checkpointed nodes, for assertion in tests.
handle_freed_children: Sends Freed children to ready_nodes_tx. This step can be a source of hangs if the channel is at capacity or closed. We add logs at each step to ensure we see exactly where Freed children get queued.
mock_failing_operator_task: Creates a TaskItem that uses a FailingOperator at node_idx. The reason is used for the failing operator’s error message.
mock_permit
mock_task_with_checkpoint
poll_worker_results
process_immediate
process_immediate_freed_child_received
process_immediate_ready_node_received
process_task: Processes the task by locking the network, executing the node operator, and computing Freed children. More detailed logs to help ensure we see exact timing and concurrency behavior.
process_waves: A wave-based scheduling function that:
read_next_wave: Reads the first node from ready_nodes_rx, then drains from child_nodes_rx and ready_nodes_rx.
release_concurrency: Releases any concurrency permit if present. This step is crucial in preventing a hang if a worker forgets to release. Added explicit logs to confirm the permit is dropped.
spawn_aggregator
spawn_aggregator_and_workers: Spawns an aggregator plus N worker threads, returning a Vec of handles.
spawn_worker_thread: Spawns one worker OS thread within the given scope. The worker runs a mini tokio runtime that calls worker_main_loop. Returns a ScopedJoinHandle you can store or join later.
submit_chunk_to_worker_pool: Submits each node in chunk to the worker pool, attempting to acquire a concurrency permit. We now log every step more carefully to aid in debugging concurrency/hangs.
validate_network: Validates the network by locking it and invoking validate(). Returns an error if validation fails.
wait_until_all_tasks_in_chunk_are_done: Wait until all tasks in chunk are done. We do a select! with worker results first, Freed children second, and a small sleep if neither are ready.
worker_main_loop: A worker’s main loop: continuously fetch tasks, process them, and send results. We’ve added extra logs at each stage. If a deadlock or hang occurs, these logs can help identify if the channel is closed, concurrency is blocked, etc.

Type Aliases§

MockCheckpointType
StreamingOutput: Optional streaming outputs from each operator:
StreamingOutputSender

Crate hydro2_async_schedulerCopy item path

Macros§

Structs§

Enums§

Traits§

Functions§

Type Aliases§

Crate hydro2_async_scheduler