Expand description
This crate enables you to create nodes for the Dora dataflow framework.
§The Dora Framework
Dora is a dataflow frame work that models applications as a directed graph, with nodes representing operations and edges representing data transfer. The layout of the dataflow graph is defined through a YAML file in Dora. For details, see our Dataflow Specification chapter.
Dora nodes are typically spawned by the Dora framework, instead of spawning them manually. If you want to spawn a node manually, define it as a dynamic node.
§Normal Usage
In order to connect your executable to Dora, you need to initialize a DoraNode
.
For standard nodes, the recommended initialization function is init_from_env
.
This function will return two values, a DoraNode
instance and an EventStream
:
use dora_node_api::DoraNode;
let (mut node, mut events) = DoraNode::init_from_env()?;
You can use the node
instance to send outputs and retrieve information about the node and
the dataflow. The events
stream yields the inputs that the node defines in the dataflow
YAML file and other incoming events.
§Sending Outputs
The DoraNode
instance enables you to send outputs in different formats.
For best performance, use the Arrow data format
and one of the output functions that utilizes shared memory.
§Receiving Events
The EventStream
is an AsyncIterator
that yields the incoming Event
s.
Nodes should iterate over this event stream and react to events that they are interested in.
Typically, the most important event type is Event::Input
.
You don’t need to handle all events, it’s fine to ignore events that are not relevant to your node.
The event stream will close itself after a Event::Stop
was received.
A manual break
on Event::Stop
is typically not needed.
(You probably do need to use a manual break
on stop events when using the
StreamExt::merge
implementation on
EventStream
to combine the stream with an external one.)
Once the event stream finished, nodes should exit.
Note that Dora kills nodes that don’t exit quickly after a Event::Stop
of type
StopCause::Manual
was received.
§Dynamic Nodes
Dynamic nodes have certain limitations. Use with care.
Nodes can be defined as dynamic
by setting their path
attribute to dynamic
in the
dataflow YAML file. Dynamic nodes are not spawned by the Dora framework and need to be started
manually.
Dynamic nodes cannot use the DoraNode::init_from_env
function for initialization.
Instead, they can be initialized through the DoraNode::init_from_node_id
function.
§Limitations
- Dynamic nodes don’t work with
dora run
. - As dynamic nodes are identified by their node ID, this ID must be unique across all running dataflows.
- For distributed dataflows, nodes need to be manually spawned on the correct machine.
Re-exports§
Modules§
- arrow_
utils - Utility functions for converting Arrow arrays to/from raw data.
- merged
- Merge external stream into an
EventStream
.
Structs§
- Arrow
Data - Wrapper type for an Arrow
ArrayRef
. - Data
Sample - A data region suitable for sending as an output message.
- Dora
Node - Allows sending outputs and retrieving node information.
- Event
Scheduler - This scheduler will make sure that there is fairness between inputs.
- Event
Stream - Asynchronous iterator over the incoming
Event
s destined for this node. - Metadata
- Additional data that is sent as part of output messages.
- Receiver
- The receiving end of a channel.
Enums§
- Event
- Represents an incoming Dora event.
- Parameter
- A metadata parameter that can be sent as part of output messages.
- Stop
Cause - The reason for closing the event stream.
Constants§
- ZERO_
COPY_ THRESHOLD - The data size threshold at which we start using shared memory.
Traits§
- Into
Arrow - Data that can be converted to an Arrow array.
Functions§
- into_
vec - Tries to convert the given Arrow array into a
Vec
of integers or floats.
Type Aliases§
- Dataflow
Id - Unique identifier for a dataflow instance.
- Metadata
Parameters - Additional metadata that can be sent as part of output messages.