The dbsp crate implements a computational engine for continuous analysis
of changing data. With DBSP, a programmer writes code in terms of
computations on a complete data set, but DBSP evaluates it incrementally,
meaning that a change to the data set is processed in time proportional to
the size of the change rather than the size of the data set. This is a major
advantage for applications that work with large data sets that change
frequently in small ways.
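To make the idea of incremental evaluation concrete, here is a minimal sketch in plain Rust using only the standard library. It does not use the dbsp API. The function names `full_counts` and `apply_changes`, and the `(key, weight)` encoding of a change, are assumptions made for illustration. The baseline scans the whole data set, while the incremental version touches only the records that changed.

```rust
use std::collections::HashMap;

/// Baseline: count records per key by scanning the whole data set.
/// Cost is proportional to the size of the data set.
fn full_counts(data: &[&str]) -> HashMap<String, i64> {
    let mut counts = HashMap::new();
    for key in data {
        *counts.entry(key.to_string()).or_insert(0) += 1;
    }
    counts
}

/// Incremental: fold a batch of changes into the previous result.
/// Each change is `(key, weight)`: +1 inserts a record, -1 deletes one
/// (a hypothetical encoding chosen for this sketch).
/// Cost is proportional to the number of changes only.
fn apply_changes(counts: &mut HashMap<String, i64>, changes: &[(&str, i64)]) {
    for (key, weight) in changes {
        let count = counts.entry(key.to_string()).or_insert(0);
        *count += weight;
        // Drop keys that no longer have any records.
        if *count == 0 {
            counts.remove(*key);
        }
    }
}
```

Starting from `full_counts(&["a", "b", "a"])` and applying the changes `[("b", -1), ("c", 1)]` gives the same result as recomputing `full_counts(&["a", "a", "c"])` from scratch, without rescanning the unchanged records.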
The tutorial is a good place to start for a guided tour. After that, if
you want to look through the API on your own, the circuit module is a
reasonable starting point. For complete examples, visit the examples
directory in the DBSP repository.
Theory
DBSP is underpinned by a formal theory:
- Here is a presentation about DBSP at the 2023 Apache Calcite Meetup.
The model provides two things:
- Semantics. DBSP defines a formal language of streaming operators and queries built out of these operators, and precisely specifies how these queries must transform input streams to output streams.
- Algorithm. DBSP also gives an algorithm that takes an arbitrary query and generates an incremental dataflow program that implements this query correctly (in accordance with its formal semantics) and efficiently. Efficiency here means, in a nutshell, that the cost of processing a set of input events is proportional to the size of the input rather than the entire state of the database.
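The two points above can be illustrated with a toy model, sketched here under strong simplifying assumptions: streams are slices of integers rather than collections, and all function names are hypothetical, not part of the dbsp API. Differentiation turns a stream of snapshots into a stream of changes, integration does the reverse, and an incremental version of a query can be obtained by integrating the input changes, running the query, and differentiating the result. For a linear query such as the doubling used here, that composition gives the same output as running the query directly on the changes.

```rust
/// Differentiation: out[t] = s[t] - s[t-1], with s[-1] taken to be 0.
/// Turns a stream of snapshots into a stream of changes.
fn differentiate(stream: &[i64]) -> Vec<i64> {
    let mut prev = 0;
    stream
        .iter()
        .map(|&x| {
            let delta = x - prev;
            prev = x;
            delta
        })
        .collect()
}

/// Integration: out[t] = s[0] + ... + s[t].
/// Turns a stream of changes back into a stream of snapshots.
fn integrate(stream: &[i64]) -> Vec<i64> {
    let mut acc = 0;
    stream
        .iter()
        .map(|&delta| {
            acc += delta;
            acc
        })
        .collect()
}

/// A toy linear query applied pointwise to a stream (illustrative only).
fn query(stream: &[i64]) -> Vec<i64> {
    stream.iter().map(|x| x * 2).collect()
}

/// Incremental version of `query`: consumes changes, produces changes.
fn incremental_query(changes: &[i64]) -> Vec<i64> {
    differentiate(&query(&integrate(changes)))
}
```

In this sketch, `incremental_query(&[3, -1, 4])` and `query(&[3, -1, 4])` both evaluate to `[6, -2, 8]`, which is why a linear operator can process changes directly without keeping the integrated state.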
Re-exports
pub use crate::time::Timestamp;
pub use algebra::IndexedZSet;
pub use algebra::ZSet;
pub use circuit::ChildCircuit;
pub use circuit::Circuit;
pub use circuit::CircuitHandle;
pub use circuit::DBSPHandle;
pub use circuit::RootCircuit;
pub use circuit::Runtime;
pub use circuit::RuntimeError;
pub use circuit::SchedulerError;
pub use circuit::Stream;
pub use operator::CollectionHandle;
pub use operator::InputHandle;
pub use operator::OutputHandle;
pub use operator::UpsertHandle;
pub use trace::ord::OrdIndexedZSet;
pub use trace::ord::OrdZSet;
pub use trace::DBData;
pub use trace::DBTimestamp;
pub use trace::DBWeight;
pub use trace::Rkyv;
Modules
- This module contains declarations of abstract algebraic concepts: monoids, groups, rings, etc.
- Synchronous circuits over streams.
- Trace monitor that validates event traces from a circuit. It is used to test both the tracing mechanism and the circuit engine itself.
- DBSP stream operators.
- Built-in profiling capabilities.
- Logical time
- Traces
- Developer tutorial
Macros
- Declare an anonymous struct type to be used as a key in the cache and associated value type.
- Create an indexed Z-set with specified elements.
- Create a Z-set with specified elements.
- Create a Z-set with specified elements all with weight 1.
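For readers new to the term, a Z-set associates each element with an integer weight, which may be negative. The sketch below models that idea with a standard-library `BTreeMap`. It is not the crate's macros or its Z-set types. The helper names `zset_from_pairs` and `zset_from_elements` are hypothetical, and summing duplicate weights and dropping zero-weight elements are design choices of this sketch.

```rust
use std::collections::BTreeMap;

/// Build a Z-set-like map from `(element, weight)` pairs.
/// Weights of duplicate elements are summed, and elements whose
/// total weight is zero are dropped.
fn zset_from_pairs(pairs: &[(&str, i64)]) -> BTreeMap<String, i64> {
    let mut zset = BTreeMap::new();
    for (elem, weight) in pairs {
        *zset.entry(elem.to_string()).or_insert(0) += weight;
    }
    zset.retain(|_, w| *w != 0);
    zset
}

/// Build a Z-set-like map in which every listed element has weight 1.
fn zset_from_elements(elems: &[&str]) -> BTreeMap<String, i64> {
    let pairs: Vec<(&str, i64)> = elems.iter().map(|e| (*e, 1)).collect();
    zset_from_pairs(&pairs)
}
```

With these helpers, `zset_from_pairs(&[("a", 2), ("b", 1), ("a", -2)])` contains only `"b"` with weight 1, because the two weights for `"a"` cancel.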
Enums
Traits
- Trait to report object size as the number of entries.
- Trait for types that can be converted into a pair of references
Functions
- Default hashing function used to shard records across workers.
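As a rough illustration of hash-based sharding, the sketch below maps a record to a worker index by hashing it and taking the remainder modulo the number of workers. This is an assumption-laden example: it uses the standard library's `DefaultHasher`, and the name `shard_for` is hypothetical. The hash function and worker assignment the crate actually uses may differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map a record to one of `workers` shards by hashing it.
/// Equal records always land on the same shard.
fn shard_for<T: Hash>(record: &T, workers: usize) -> usize {
    assert!(workers > 0, "at least one worker is required");
    // `DefaultHasher::new()` stands in for whatever hasher a real system uses.
    let mut hasher = DefaultHasher::new();
    record.hash(&mut hasher);
    (hasher.finish() % workers as u64) as usize
}
```

Because the shard depends only on the record's hash, all updates for the same record are routed to the same worker, which lets each worker maintain its slice of the state independently.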