Crate dbsp

The dbsp crate implements a computational engine for continuous analysis of changing data. With DBSP, a programmer writes code in terms of computations on a complete data set, but DBSP evaluates it incrementally: processing a change to the data set takes time proportional to the size of the change rather than the size of the whole data set. This is a major advantage for applications that work with large data sets that change frequently in small ways.
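To make the incremental idea concrete, here is a minimal standard-library sketch that does not use the dbsp API. It assumes a change is a list of records with signed weights (+1 for an insertion, -1 for a deletion). The names `Delta`, `totals_from_scratch`, and `IncrementalTotals` are hypothetical and exist only for this illustration. The baseline function scans the whole data set, while `apply` touches only the records in the change.

```rust
use std::collections::HashMap;

/// A change to the data set: each record carries a signed weight
/// (+1 = insertion, -1 = deletion). Hypothetical type, not part of dbsp.
type Delta = Vec<((String, i64), i64)>;

/// Non-incremental baseline: recompute (count, sum) per key by scanning
/// the complete data set. Cost grows with the size of the data set.
fn totals_from_scratch(data: &[(String, i64)]) -> HashMap<String, (i64, i64)> {
    let mut totals = HashMap::new();
    for (key, amount) in data {
        let entry = totals.entry(key.clone()).or_insert((0, 0));
        entry.0 += 1;
        entry.1 += *amount;
    }
    totals
}

/// Incremental version: keeps the previous result and patches it.
struct IncrementalTotals {
    totals: HashMap<String, (i64, i64)>,
}

impl IncrementalTotals {
    fn new() -> Self {
        IncrementalTotals { totals: HashMap::new() }
    }

    /// Applies one change. Cost grows with the size of the change only.
    fn apply(&mut self, delta: &Delta) {
        for ((key, amount), weight) in delta {
            let entry = self.totals.entry(key.clone()).or_insert((0, 0));
            entry.0 += *weight;
            entry.1 += *amount * *weight;
            // Drop keys that no longer have any records.
            let is_empty = entry.0 == 0;
            if is_empty {
                self.totals.remove(key);
            }
        }
    }
}
```

Feeding the initial data set as a change in which every record has weight +1, and then applying later insertions and deletions, should leave `totals` equal to what `totals_from_scratch` returns for the updated data set.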

The tutorial is a good place to start for a guided tour. After that, if you want to look through the API on your own, the circuit module is a reasonable starting point. For complete examples, visit the examples directory in the DBSP repository.

Theory

DBSP is underpinned by a formal theory.

The model provides two things:

  1. Semantics. DBSP defines a formal language of streaming operators and queries built out of these operators, and precisely specifies how these queries must transform input streams to output streams.

  2. Algorithm. DBSP also gives an algorithm that takes an arbitrary query and generates an incremental dataflow program that implements this query correctly (in accordance with its formal semantics) and efficiently. Efficiency here means, in a nutshell, that the cost of processing a set of input events is proportional to the size of that input rather than to the size of the entire state of the database.
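The relationship between the two points can be sketched in plain Rust. This is an illustration under stated assumptions, not the dbsp API or its implementation. It models a stream as a slice of weighted sets (the hypothetical type `ZSet`) and assumes the construction usually associated with the DBSP theory: the incremental version of a query is obtained by integrating the input changes, applying the query to each full snapshot, and differentiating the result. For a per-record query such as the hypothetical `keep_even` filter, that pipeline should yield the same output as applying the query directly to each change, which is where the efficiency comes from.

```rust
use std::collections::BTreeMap;

/// A set with signed weights per record. Hypothetical minimal model,
/// not the dbsp type of the same idea.
type ZSet = BTreeMap<i64, i64>;

/// Adds the weights of `b` to `a`, dropping records whose weight becomes zero.
fn add(a: &ZSet, b: &ZSet) -> ZSet {
    let mut out = a.clone();
    for (rec, w) in b {
        let entry = out.entry(*rec).or_insert(0);
        *entry += *w;
        let is_zero = *entry == 0;
        if is_zero {
            out.remove(rec);
        }
    }
    out
}

/// Negates every weight.
fn neg(a: &ZSet) -> ZSet {
    a.iter().map(|(r, w)| (*r, -*w)).collect()
}

/// Integration: turns a stream of changes into a stream of full snapshots.
fn integrate(changes: &[ZSet]) -> Vec<ZSet> {
    let mut acc = ZSet::new();
    changes
        .iter()
        .map(|d| {
            acc = add(&acc, d);
            acc.clone()
        })
        .collect()
}

/// Differentiation: turns a stream of snapshots back into a stream of changes.
fn differentiate(snapshots: &[ZSet]) -> Vec<ZSet> {
    let mut prev = ZSet::new();
    snapshots
        .iter()
        .map(|s| {
            let d = add(s, &neg(&prev));
            prev = s.clone();
            d
        })
        .collect()
}

/// Example query on one snapshot or one change: keep even records only.
fn keep_even(z: &ZSet) -> ZSet {
    z.iter()
        .filter(|(r, _)| **r % 2 == 0)
        .map(|(r, w)| (*r, *w))
        .collect()
}
```

Running `keep_even` over `integrate(&changes)` and then calling `differentiate` follows the definition step by step, at a cost tied to the size of each snapshot. Mapping `keep_even` over `changes` directly does work proportional to each change only.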

Modules

  • This module contains declarations of abstract algebraic concepts: monoids, groups, rings, etc.
  • Synchronous circuits over streams.
  • Trace monitor that validates event traces from a circuit. It is used to test both the tracing mechanism and the circuit engine itself.
  • DBSP stream operators.
  • Built-in profiling capabilities.
  • Logical time
  • Traces
  • Developer tutorial

Functions

  • Default hashing function used to shard records across workers.
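As a rough picture of what hash-based sharding means, the sketch below assigns a record to one of `workers` shards using the standard library hasher. This is an assumption-laden illustration: `shard_of` is a hypothetical helper, and neither the hasher nor the modulo step is claimed to match what dbsp does internally.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: maps a record to a worker index in `0..workers`
/// by hashing it. Not the dbsp function, only a sketch of the general idea.
fn shard_of<T: Hash>(record: &T, workers: usize) -> usize {
    assert!(workers > 0, "need at least one worker");
    let mut hasher = DefaultHasher::new();
    record.hash(&mut hasher);
    // Equal records hash identically, so they always land on the same worker.
    (hasher.finish() % workers as u64) as usize
}
```

The property that matters for sharding is that equal records always map to the same worker, so all updates for a given record are processed in one place.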