Crate hydro2_async_scheduler

Source

Macros§

task_item
try_build_async_scheduler_config
A macro to build an AsyncSchedulerConfig without using .unwrap() or .expect(). It returns a Result<AsyncSchedulerConfig, NetworkError> so you can use ?.

Structs§

AsyncScheduler
AsyncSchedulerBuilder
Builder for AsyncScheduler.
AsyncSchedulerConfig
Configuration for our upgraded asynchronous scheduler.
AsyncSchedulerConfigBuilder
Builder for AsyncSchedulerConfig.
InFlightCounter
Tracks the in-flight tasks: how many nodes are actively being processed by workers.
MockCheckpointCallback
A mock checkpoint callback that records each checkpoint invocation.
MockCheckpointCallbackBuilder
Builder for MockCheckpointCallback.
NoOpCheckpointCallback
SharedCompletedNodes
A concurrency‐safe set of completed node indices, using tokio::sync::AsyncMutex<HashSet<usize>>.
TaskItem
Each node execution is one TaskItem.
TaskItemBuilder
Builder for TaskItem.
TaskResult
The result we get back from a worker.
TaskResultBuilder
Builder for TaskResult.
WorkerPool
A pool of OS threads:
WorkerPoolBuilder
Builder for WorkerPool.

Enums§

AsyncSchedulerBuilderError
Error type for AsyncSchedulerBuilder
AsyncSchedulerConfigBuilderError
Error type for AsyncSchedulerConfigBuilder
BatchingStrategy
A simple enum to govern how nodes are scheduled:
FreedChildReceivedOperationOutcome
InsertOutcome
Outcome when inserting a node index into SharedCompletedNodes.
MockCheckpointCallbackBuilderError
Error type for MockCheckpointCallbackBuilder
ReadyNodeReceivedOperationOutcome
TaskItemBuilderError
Error type for TaskItemBuilder
TaskResultBuilderError
Error type for TaskResultBuilder
WorkerPoolBuilderError
Error type for WorkerPoolBuilder

Traits§

CheckpointCallback
A trait invoked periodically to record checkpoint data. This can write partial progress to disk, a DB, etc.

Functions§

aggregator_thread_behavior
Aggregator thread that reads incoming TaskItems, distributing them to workers. We now log each phase of the aggregator’s loop in more detail, which can reveal if aggregator is stuck waiting for input or if workers are closed prematurely.
build_and_send_task_result
Builds and sends a TaskResult to the aggregator (or whomever is listening). Added logs detail the error state and the exact moment we send the result.
check_all_nodes_done
Utility function for your scheduling loop, logging the “done_count”.
clear_checkpoint_invocations
Clears the global list, so each test can start fresh.
compute_freed_children
create_worker_channels
current_memory_usage_in_bytes
drain_channel
Drains a Receiver<usize> until None, returning items read.
execute_node
Executes the specified node (node_idx) from net_guard, optionally sends streaming output (if output_tx is Some(...)), and decrements in-degs to compute Freed children. Returns (freed_children, error).
fetch_next_task
Retrieves the next TaskItem from the worker’s channel. We now log more details about whether we’re waiting, whether a task is present, etc. This can help confirm if a worker is truly idle or if a hang might be from unreceived tasks.
get_mock_checkpoint_invocations
Returns the global list of checkpointed nodes, for assertion in tests.
handle_freed_children
Sends Freed children to ready_nodes_tx. This step can be a source of hangs if the channel is at capacity or closed. We add logs at each step to ensure we see exactly where Freed children get queued.
mock_failing_operator_task
Creates a TaskItem that uses a FailingOperator at node_idx. The reason is used for the failing operator’s error message.
mock_permit
mock_task_with_checkpoint
poll_worker_results
process_immediate
process_immediate_freed_child_received
process_immediate_ready_node_received
process_task
Processes the task by locking the network, executing the node operator, and computing Freed children. More detailed logs to help ensure we see exact timing and concurrency behavior.
process_waves
A wave-based scheduling function that:
read_next_wave
Reads the first node from ready_nodes_rx, then drains from child_nodes_rx and ready_nodes_rx.
release_concurrency
Releases any concurrency permit if present. This step is crucial in preventing a hang if a worker forgets to release. Added explicit logs to confirm the permit is dropped.
spawn_aggregator
spawn_aggregator_and_workers
Spawns an aggregator plus N worker threads, returning a Vec of handles.
spawn_worker_thread
Spawns one worker OS thread within the given scope. The worker runs a mini tokio runtime that calls worker_main_loop. Returns a ScopedJoinHandle you can store or join later.
submit_chunk_to_worker_pool
Submits each node in chunk to the worker pool, attempting to acquire a concurrency permit. We now log every step more carefully to aid in debugging concurrency/hangs.
validate_network
Validates the network by locking it and invoking validate(). Returns an error if validation fails.
wait_until_all_tasks_in_chunk_are_done
Wait until all tasks in chunk are done. We do a select! with worker results first, Freed children second, and a small sleep if neither are ready.
worker_main_loop
A worker’s main loop: continuously fetch tasks, process them, and send results. We’ve added extra logs at each stage. If a deadlock or hang occurs, these logs can help identify if the channel is closed, concurrency is blocked, etc.

Type Aliases§

MockCheckpointType
StreamingOutput
Optional streaming outputs from each operator:
StreamingOutputSender