timely 0.0.10

A low-latency data-parallel dataflow system in Rust

Timely Dataflow

Timely dataflow is a low-latency, cyclic dataflow computational model, introduced in the paper Naiad: a timely dataflow system. This project is an extended and more modular implementation of timely dataflow in Rust.

Be sure to read the documentation for timely dataflow. It is a work in progress, but mostly improving.

The timely dataflow wiki has more long-form text, introducing programming and explaining concepts in more detail.

An example

To use timely dataflow, add the following to the dependencies section of your project's Cargo.toml file:

[dependencies]
timely = "*"

This will bring in the timely crate from crates.io, which should allow you to start writing timely dataflow programs like this one (also available in examples/hello.rs):

extern crate timely;

use timely::dataflow::*;
use timely::dataflow::operators::{Input, Inspect};

fn main() {

    // initializes and runs a timely dataflow computation
    timely::execute_from_args(std::env::args(), |root| {

        // create a new input, and inspect its output
        let mut input = root.scoped(move |scope| {
            let (input, stream) = scope.new_input();
            stream.inspect(|x| println!("hello {:?}", x));
            input
        });

        // introduce data and watch!
        for round in 0..10 {
            input.send(round);
            input.advance_to(round + 1);
            root.step();
        }

        // seal the input
        input.close();

        // finish off any remaining work
        while root.step() { }

    });
}

You can run this example from the root directory of the timely-dataflow repository by typing

% cargo run --example hello
Running `target/debug/examples/hello`
hello 0
hello 1
hello 2
hello 3
hello 4
hello 5
hello 6
hello 7
hello 8
hello 9

Execution

A program like the one above can be run as written, and will use a single worker thread by default.

To use multiple threads within a process, pass the -w or --workers option followed by the number of threads you would like to use.

To use multiple processes, you will need the -h or --hostfile option to specify a text file whose lines are hostname:port entries, one for each location where you plan to spawn a process. You will also need the -n or --processes argument to indicate how many processes you will spawn (a prefix of the hostfile), and each process must use the -p or --process argument to indicate its index among those processes.

Said differently, you want a hostfile that looks like so,

% cat hostfile.txt
host0:port
host1:port
host2:port
host3:port
...

and then to launch the processes like so:

host0% cargo run -- -w 2 -h hostfile.txt -n 4 -p 0
host1% cargo run -- -w 2 -h hostfile.txt -n 4 -p 1
host2% cargo run -- -w 2 -h hostfile.txt -n 4 -p 2
host3% cargo run -- -w 2 -h hostfile.txt -n 4 -p 3

The number of workers should be the same for each process.
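The hostfile handling and worker arithmetic described above can be illustrated with a short sketch. This is plain Rust, independent of timely: the prefix-selection logic mirrors the description of -n above rather than timely's actual parsing code, and the port number 2101 is a placeholder.

```rust
// Illustrative sketch (not timely's actual parsing code): select the
// prefix of a hostfile, as `-n <processes>` does, and count workers.
fn select_hosts(hostfile: &str, processes: usize) -> Vec<String> {
    hostfile
        .lines()
        .map(|line| line.trim().to_string())
        .filter(|line| !line.is_empty())
        .take(processes) // `-n` uses only a prefix of the hostfile
        .collect()
}

fn main() {
    let hostfile = "host0:2101\nhost1:2101\nhost2:2101\nhost3:2101\n";

    // With `-n 4`, all four entries are used; `-n 2` would use the first two.
    let hosts = select_hosts(hostfile, 4);
    assert_eq!(hosts.len(), 4);

    // From the launch commands above: `-w 2` workers in each process.
    // Since every process must use the same `-w`, the cluster has:
    let workers_per_process = 2;
    let total_workers = workers_per_process * hosts.len();
    assert_eq!(total_workers, 8);
    println!("{} processes, {} total workers", hosts.len(), total_workers);
}
```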

The ecosystem

Timely dataflow is intended to support multiple levels of abstraction, from low-level manual dataflow assembly to higher-level "declarative" abstractions.

There are currently a few options for writing timely dataflow programs. Ideally this set will expand with time, as interested people write their own layers (or build on those of others).

  • Timely dataflow: Timely dataflow includes several primitive operators, including standard operators like map, filter, and concat. It also includes more exotic operators for tasks like entering and exiting loops (enter and leave), as well as generic operators whose implementations can be supplied with closures (unary and binary).

  • Differential dataflow: A higher-level language built on timely dataflow, differential dataflow includes operators like group_by, join, and iterate. Its implementation is fully incrementalized, and the details are pretty cool (if mysterious).
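As a rough analogy (not timely code), the standard operators named above behave much like Rust's iterator adaptors: map transforms each record, filter drops records, and concat merges two streams. The sketch below uses plain iterators to illustrate the shape of each operator; the real timely versions operate on timestamped streams rather than in-memory vectors.

```rust
fn main() {
    let a = vec![1, 2, 3, 4];
    let b = vec![10, 20];

    // `map`: transform a stream record by record.
    let mapped: Vec<i32> = a.iter().map(|x| x * 2).collect();
    assert_eq!(mapped, vec![2, 4, 6, 8]);

    // `filter`: keep only records satisfying a predicate.
    let filtered: Vec<i32> = a.iter().cloned().filter(|x| x % 2 == 0).collect();
    assert_eq!(filtered, vec![2, 4]);

    // `concat`: merge two streams into one.
    let concatenated: Vec<i32> = a.iter().chain(b.iter()).cloned().collect();
    assert_eq!(concatenated, vec![1, 2, 3, 4, 10, 20]);

    println!("{:?} {:?} {:?}", mapped, filtered, concatenated);
}
```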

There are also a few applications built on timely dataflow, including a streaming worst-case optimal join implementation and a PageRank implementation, both of which should provide helpful examples of writing timely dataflow programs.