Crate directed

Expand description

§directed

This crate is a Directed-Acyclic-Graph (DAG)-based evaluation system for Rust. It allows you to wrap functions in a way that converts them into stateful Nodes in a graph. These can then be executed in the shortest-path to be able to evaluate one or more output nodes. Inputs and outputs can be cached (memoization), and nodes can have internal state (or not, anything can be stateless as well). Graph connections can be rewired at runtime without the loss of node state.

Here is a visualization of a trivial program structure using this:

flowchart TB
    subgraph Node_1_["Node 1 (TransparentStage)"]
        1_in_input[/"input: i32"\]
        1_out__[\"i32"/]
    end
    subgraph Node_0_["Node 0 (SourceStage)"]
        0_out__[\"i32"/]
    end
    subgraph Node_3_["Node 3 (SinkStage)"]
        3_in_o_input[/"o_input: & i32"\]
        3_in_t_input[/"t_input: & i32"\]
    end
    style Node_3_ stroke:yellow,stroke-width:3;
    subgraph Node_2_["Node 2 (OpaqueStage)"]
        2_in_input[/"input: & i32"\]
        2_out__[\"i32"/]
    end
    0_out__ --> 1_in_input
    0_out__ --> 2_in_input
    1_out__ --> 3_in_t_input
    2_out__ --> 3_in_o_input
    linkStyle 3 stroke:yellow,stroke-width:3;

When possible, the error types in this crate contain a trace of the graph and have the ability to generate a mermaid graph like the above, highlighting areas relevant to the error output. These can be placed into markdown or into an online viewer.

§Current project status

WIP: Things work but quite a few key features are yet to be implemented, and a lot of cleanup/refactoring is still ongoing.

§Core API Concepts

§Stage

A Stage is a wrapped function that can be used to create a Node. Think of a Stage as a definition and a Node as a stateful instantiation.

When a function is annotated with the #[stage] macro, it will be converted to a struct of the same name, and given an implementation of the Stage trait. For this reason, struct naming conventions should be followed rather than function naming conventions:

use directed::*;

#[stage]
fn SimpleStage() -> String {
    String::from("Hello graph!")
}

§Multi-output

Stages can support multiple named outputs by making use of the NodeOutput type and the output macro. This can be used to make connections between specific outputs of one node to specific inputs of another:

use directed::*;

// When multiple outputs exist, they must be specified within 'out'. Syntax is siumilar to typical input arguments.
#[stage(out(output1: u32, output2: String))]
fn MultiOutputStage() -> NodeOutput {
    let output2 = String::from("Hello graph!");
    output! {
        output1: 42,
        // Typical struct creation rules apply, no need to specify the name twice
        output2
    }
}

§Lazy

Stages can be annotated as lazy. This will indicate that it’s node will never be evaluated until a child node needs its output to evaluate. Typical graphs will have multiple lazy nodes, and one or possibly a few non-lazy nodes. A graph with only lazy nodes will do nothing at all:

use directed::*;

#[stage(lazy)]
fn LazyStage() -> String {
    String::from("Hello dependant node!")
}

§Cache Last

Stages can be annotated as cache_last. This will indicate that if reevaluated with identical inputs to the previous evaluation, it will just return cached outputs without rerunning the function:

use directed::*;

// If this is run with 31 as an input twice, "to_string" will not be called the 2nd time.
#[stage(cache_last)]
fn CacheLastStage(num: u32) -> String {
    num.to_string()
}

Preconditions:

All inputs must be PartialEq (compile-time error if condition is not met)
All inputs must be Clone (compile-time error if condition is not met)
Outputs must be Clone UNLESS all connected child nodes take input only by reference (runtime error if neither of these conditions are met)

§Cache All

Stages can be annotated with cache_all. This means that for any previously identical input, return the associated output without reevaluating.

Preconditions:

All previous conditions for cache_last
All inputs must be Hash

§State

Stages can be annotated with state. This will indicate a type that can be used to store internal state for the node. This could also be used to store some kind of configuration for the node that might be modified outside the graph’s evaluation time. State is never accessed by other nodes or transferred throughout the graph in any way:

use directed::*;

// Example state struct (any type can be used)
#[derive(Default)]
struct SomeState {
    num_times_run: u32
}

// Use `state(TypeOfState)` to indicate the usage of state
#[stage(state(SomeState))]
fn StatefulStage() -> String {
    // this will automatically put an '&mut SomeState' in scope called 'state'
    let result = if state.num_times_run == 0 {
        format!("I've never been run!")
    } else {
        format!("I've been run {} times.", state.num_times_run)
    };
    state.num_times_run += 1;
    result
}

It is possible to access or mutate state outside of graph evaluation. See the Registry section for more details.

§Registry

A Registry stores nodes and their state. It’s distinctly seperate from Graph itself which just stores information on how nodes are connected. This come swith a few benefits:

Any number of distinct Graphs can be created for a single Registry. Node state can be reused to evaluate a single graph or among distinct graphs.
To evaluate a graph, an &mut Registry is passed in. Graphs don’t take exclusive ownership of the registry, and are thus stateless.

Here’s an example of creating a registry and adding nodes to it:

use directed::*;

#[stage]
fn SimpleStage() -> String {
    String::from("Hello graph!")
}

fn main() {
    let mut registry = Registry::new();
    // This returns a simple incremented integer ID, which can be used to lookup the node in the registry.
    let node_1 = registry.register(SimpleStage::new());
}

§Node State

As mentioned in the Stage section, nodes can have internal state. When creating a node, the following methods is provided:

let mut registry = Registry::new();
let node_1 = registry.register_with_state(StatefulStage::new(), SomeState { num_times_run: 0 });

Note: If SomeState implements Default, the simple register function can be used instead. If no state is explicitly stated, state will simple be set to ().

State can also be accessed via one of these methods:

let mut registry = Registry::new();
let node_1 = registry.register(StatefulStage::new());
// Get a reference to internal state
println!("Node 1 state: {:?}", registry.state(node_1));
// Get a mutable reference to internal state
registry.state_mut(node_1).num_times_run = 10;

§Graph

Putting it all together, the Graph struct stores node IDs and the connections between the outputs of nodes to the inputs of other nodes. Creating one is easy, and the graph macro exists to make the connections more visually intuitive. See the example below of putting a variety of concepts together and finally making a graph:

use directed::*;

#[stage(lazy, cache_last)]
fn TinyStage1() -> String {
    println!("Running stage 1");
    String::from("This is the output!")
}

#[stage(lazy)]
fn TinyStage2(input: String, input2: String) -> String {
    println!("Running stage 2");
    input.to_uppercase() + " [" + &input2.chars().count().to_string() + " chars in 2nd string]"
}

#[stage]
fn TinyStage3(input: String) {
    println!("Running stage 3");
    assert_eq!("THIS IS THE OUTPUT! [19 chars in 2nd string]", input);
}

fn main() {
    let mut registry = Registry::new();
    let node_1 = registry.register(TinyStage1::new());
    let node_2 = registry.register(TinyStage2::new());
    let node_3 = registry.register(TinyStage3::new());

    // This macro is basic syntax sugar for a few calls.
    let graph = graph! {
        // Nodes that will be a part of the graph must be defined.
        nodes: [node_1, node_2, node_3],
        connections: {
            // The below uses unnamed outputs only. Named outputs can be
            // indicated the same way as named inputs, `node_name: output_name`
            node_1 => node_2: input,
            node_1 => node_2: input2,
            node_2 => node_3: input,
            // It is also possible to make connections between nodes without
            // any data being passed between them by leaving out the names 
            // of both the input and output parameters:
            // `node_name => dependant_node_name`
        }
    }
    .unwrap();

    // This will do the following:
    // - Find the first non-lazy node (node_3).
    // - Recursively evaluate it's parents (so node_3 will request node_2, which will request node_1 twice)
    // - node_1 will evaluate, printing "Running stage 1", and pass a clone of its output to "input" on node_2.
    // - node_1 will not evaluate again, and just pass a clone of its output to "input2" on node_2.
    // - node_2 will evaluate, printing its output then moving (no`t cloning) its output to node_3.
    // - node_3 will evaluate, printing its output that passing the assert successfully.
    graph.execute(&mut registry).unwrap();
}

As stated before, multiple graphs can be created from that same registry, executed in any order.

§Get graph output

Here is a simple example of how to access the outputs of a graph:

use directed::*;

#[stage]
fn TinyStage1() -> String {
    String::from("This is the output!")
}

let mut registry = Registry::new();
let node_1 = registry.register(TinyStage1::new());
let graph = graph! {
    nodes: [node_1],
    connections: {}
}
.unwrap();

graph.execute(&mut registry).unwrap();
let mut outputs = graph.get_output(&mut registry).unwrap();
assert_eq!(
    outputs.take_unnamed::<String>(node_1).unwrap(),
    String::from("This is the output!")
)

See GraphOutput::get, GraphOutput::get_unnamed, GraphOutput::take, GraphOutput::take_unnamed

§Inject input to top-level unconnected nodes

Similar to the above, the “top” of the graph can also be ineracted with outside of the evaluatyion context

use directed::*;

#[stage]
fn TinyStage1(simple_input: String) -> String {
    simple_input.replace("input", "output")
}

let mut registry = Registry::new();
let node_1 = registry.register(TinyStage1::new());
let graph = graph! {
    nodes: [node_1],
    connections: {}
}
.unwrap();

// Note: Inputs have to be referred to by arg name.
// TODO: Implement a few macros to make the above note into something more 
//       elegant, and more consistent between inputs and outputs.
graph.set_input(&mut registry, node_1, "simple_input", String::from("This is the simple input!")).unwrap();
graph.execute(&mut registry).unwrap();
let mut outputs = graph.get_output(&mut registry).unwrap();
assert_eq!(
    outputs.take_unnamed::<String>(node_1).unwrap(),
    String::from("This is the simple output!")
)

§Features

§tokio

The tokio feature adds async evaluation. This simply means that a node will evaluate all of its parents nodes concurrently before evaluating itself. Enabling this kind of execution is simple:

Enable the tokio feature
Wrap your graph in an Arc: let graph = Arc::new(graph);
Instead of calling execute, call execute_async on the Arc<Graph>.

Stages marked async will behave as expected - executing within the async context.

TODO: Add pallatable example. For now, Take a look at this test for an example.

§WIP features/ideas/TODOs

High priority: Handle when a node is unavailable from the registry in async execution (wait until it’s available again)
- There is in general more testing and work needed around this bullet. The registry serves as a node library that hands out nodes to be evaluated - and expects them to be returned when evaluation is done. Right now it will just give up if it attempts to concurrently execute the same node at the same time. Waiting is easy - but there are likely some situations where it CAN be valid to execute the same node at the same time. This will take a bit of plumbing in the proc macro.
Automatic validators to make sure correct input and output types are present if required (right now this would halt graph evaluation mid-way through and give an error, but there’s no reason it can’t do that before even starting evaluation)
Outside of async some Send+Sync bounds can be relaxed, some Arc usage can be replaced with Rc
Improve error system to be cleaner (it works but the different types of errors feel non-intuitive)
A Graph + Registry could be combined to create a Node (with a baked stage). Right now we combine nodes with stages to make the registry, and registries with graphs. If we could instead combine STAGES with graphs, then output a valid registry full of nodes based on that combination, it would avoid the possibility of combining a registry with an invalid graph entirely. (or even, full graph sharding?)
An attribute that makes it serialize the cache and store between runs (this may be out of scope, but if so at least make sure the design doesn’t prohibit someone from doing this).
A way to reset all registry state at once (probably only slightly harder to implement than it was to write this bullet point)
get_output and inject_input could probably be more elegant than they currently are
Make a cool visual “rust playgraph” based on this crate
- Ability to create stages, and compile
- Ability to create nodes from stages, and attach them and execute (without recompiling!)

Macros§

graph: Syntax sugar for building a graph
graph_internal
output: Simple macro to simulate a function that can return mutlieple names outputs

Structs§

Cached: This represents a cached input/output pair TODO: handle state as well
ConnectionInfo: Information about a connection in the graph.
DataLabel: Associates a name with a type, internal detail
EdgeInfo: Information about connections between nodes, purely an implementation detail of the graph.
EdgeNotFoundInGraphError
ErrorWithTrace: Wrapper error type, wraps errors from this crate and stores a graph information with them.
Graph: Directed Acryllic Graph representing the flow of execution in that pipeline. Only operates on index and edge information - doesn’t store actual state.
GraphOutput: Used to represent graph outputs.
GraphTrace: A trace of a graph, containing information about nodes and connections.
Node: Every node wraps a Stage, which is a decorated function that has some number of inputs and some number of outputs.
NodeInfo: Information about a node in the graph.
NodeNotFoundInGraphError
NodeTypeMismatchError
NodesNotFoundError
Registry: A Registry stores each node, its state, and the logical Stage associated with it.

Enums§

EdgeCreationError
EvalStrategy
InjectionError
NodeExecutionError
NodeOutput: Represents one or more outputs of a node. This can be used as a return type to represent a function with multiple outputs. The output macro is the preferred way to construct this in that case. Any more direct interaction with this type is not recommended.
ReevaluationRule
RefType
RegistryError
SetInputError

Traits§

AnyNode: This is used to type-erase a node. It’s public because the macro needs to use this, but there should be no reason anyone should manually implement this.
DowncastEq: Trait used to downcast and compare equality
Stage: Defines all the information about how a stage is handled.

Type Aliases§

NodeId: Used to access nodes within the registry, just a simple usize alias

Attribute Macros§

stage: A macro that wraps a function with the standardized interface: fn fn_name(&mut DataMap, &DataMap) -> Result

Crate directedCopy item path