The dbsp crate implements a computational engine for continuous analysis of changing data. With DBSP, a programmer writes code in terms of computations on a complete data set, but DBSP evaluates it incrementally: processing a change to the data set takes time proportional to the size of the change rather than the size of the data set. This is a major advantage for applications that work with large data sets that change frequently in small ways.
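To make the incremental idea concrete, here is a minimal sketch in plain Rust that does not use the dbsp API. It assumes a change is a list of (value, weight) pairs, where weight +1 inserts one copy of a value and -1 deletes one copy. The names `Change`, `full_sum`, `IncrementalSum`, and `apply_to_data` are hypothetical and chosen only for this illustration.

```rust
use std::collections::BTreeMap;

/// A change to a multiset of integers: each entry is (value, weight),
/// where weight +1 inserts one copy and -1 deletes one copy.
type Change = Vec<(i64, i64)>;

/// Non-incremental version: recompute the sum by scanning the whole data set.
fn full_sum(data: &BTreeMap<i64, i64>) -> i64 {
    data.iter().map(|(value, weight)| value * weight).sum()
}

/// Incremental version: keeps only the running result and updates it
/// in time proportional to the size of each change.
struct IncrementalSum {
    total: i64,
}

impl IncrementalSum {
    fn new() -> Self {
        Self { total: 0 }
    }

    fn apply(&mut self, change: &Change) -> i64 {
        for (value, weight) in change {
            self.total += value * weight;
        }
        self.total
    }
}

/// Apply a change to the stored data set (needed only by the non-incremental path).
fn apply_to_data(data: &mut BTreeMap<i64, i64>, change: &Change) {
    for (value, weight) in change {
        let updated = data.get(value).copied().unwrap_or(0) + weight;
        if updated == 0 {
            data.remove(value);
        } else {
            data.insert(*value, updated);
        }
    }
}

fn main() {
    let mut data = BTreeMap::new();
    let mut inc = IncrementalSum::new();
    let changes: Vec<Change> = vec![
        vec![(10, 1), (20, 1)], // insert 10 and 20
        vec![(10, -1), (5, 1)], // delete 10, insert 5
    ];
    for change in &changes {
        apply_to_data(&mut data, change);
        let incremental = inc.apply(change);
        // Both paths agree, but only `full_sum` touches the whole data set.
        assert_eq!(incremental, full_sum(&data));
        println!("sum = {incremental}");
    }
}
```

The incremental path never reads `data`; its cost per step depends only on the length of the change.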
The tutorial is a good place to start for a guided tour. After that, if you want to look through the API on your own, the circuit module is a reasonable starting point. For complete examples, visit the examples directory in the DBSP repository.
§Theory
DBSP is underpinned by a formal theory:
- Here is a presentation about DBSP at the 2023 Apache Calcite Meetup.
The model provides two things:
- Semantics. DBSP defines a formal language of streaming operators and queries built out of these operators, and precisely specifies how these queries must transform input streams to output streams.
- Algorithm. DBSP also gives an algorithm that takes an arbitrary query and generates an incremental dataflow program that implements this query correctly (in accordance with its formal semantics) and efficiently. Efficiency here means, in a nutshell, that the cost of processing a set of input events is proportional to the size of the input rather than the entire state of the database.
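As an illustration of what an incremental program computes, the sketch below applies the standard delta rule for a join, Δ(A ⋈ B) = ΔA ⋈ B + A ⋈ ΔB + ΔA ⋈ ΔB, where A and B are the states before the change. It is plain Rust and not the dbsp implementation: the `ZSet` alias and the functions `add_into`, `join`, and `join_delta` are hypothetical names, and the nested-loop join is used only for brevity (an engine would use an index so that the work scales with the deltas).

```rust
use std::collections::BTreeMap;

/// A Z-set sketch: maps each record to an integer weight
/// (positive for insertions, negative for deletions).
type ZSet<T> = BTreeMap<T, i64>;

/// Add `delta` into `target`, dropping records whose weight becomes zero.
fn add_into<T: Ord + Clone>(target: &mut ZSet<T>, delta: &ZSet<T>) {
    for (record, weight) in delta {
        let updated = target.get(record).copied().unwrap_or(0) + weight;
        if updated == 0 {
            target.remove(record);
        } else {
            target.insert(record.clone(), updated);
        }
    }
}

/// Equi-join on the first tuple field; output weights are products.
fn join(a: &ZSet<(u32, char)>, b: &ZSet<(u32, char)>) -> ZSet<(u32, char, char)> {
    let mut out = ZSet::new();
    for ((ka, va), wa) in a {
        for ((kb, vb), wb) in b {
            if ka == kb {
                *out.entry((*ka, *va, *vb)).or_insert(0) += wa * wb;
            }
        }
    }
    out.retain(|_, w| *w != 0);
    out
}

/// Change to the join output, computed from the old states and the two deltas
/// without rejoining the full `a` with the full `b`.
fn join_delta(
    a: &ZSet<(u32, char)>,
    b: &ZSet<(u32, char)>,
    da: &ZSet<(u32, char)>,
    db: &ZSet<(u32, char)>,
) -> ZSet<(u32, char, char)> {
    let mut out = join(da, b);
    add_into(&mut out, &join(a, db));
    add_into(&mut out, &join(da, db));
    out
}

fn main() {
    let mut a: ZSet<(u32, char)> = ZSet::from([((1, 'x'), 1), ((2, 'y'), 1)]);
    let mut b: ZSet<(u32, char)> = ZSet::from([((1, 'p'), 1)]);
    let mut output = join(&a, &b);

    // Change: delete (2, 'y') from a; insert (2, 'q') and (1, 'r') into b.
    let da = ZSet::from([((2, 'y'), -1)]);
    let db = ZSet::from([((2, 'q'), 1), ((1, 'r'), 1)]);

    let delta = join_delta(&a, &b, &da, &db);
    add_into(&mut output, &delta);
    add_into(&mut a, &da);
    add_into(&mut b, &db);

    // The incrementally maintained output matches a full recomputation.
    assert_eq!(output, join(&a, &b));
    println!("{output:?}");
}
```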
§Crate overview
This crate consists of several layers.
- dynamic - Types and traits that support dynamic dispatch. We rely heavily on dynamic dispatch to limit the amount of monomorphization performed by the compiler when building complex dataflow graphs, balancing compilation speed and runtime performance. This module implements the type machinery necessary to support this architecture.
- typed_batch - Strongly typed wrappers around dynamically typed batches and traces.
- trace - This module implements batches and traces, which are core DBSP data structures that represent tables, indexes and changes to tables and indexes. We provide both in-memory and persistent batch and trace implementations.
- operator::dynamic - Dynamically typed operator API. Operators transform data streams (usually carrying data in the form of batches and traces). DBSP provides many relational operators, such as map, filter, aggregate, join, etc. The operator API in this module is dynamically typed and unsafe.
- operator - Statically typed wrappers around the dynamic API in operator::dynamic.
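The layering described above, a type-erased core with thin statically typed wrappers on top, can be sketched in plain Rust as follows. This is only an illustration of the general pattern under simplifying assumptions and does not reflect the actual types in the dynamic, typed_batch, or operator modules. `dyn_filter` and `TypedBatch` are hypothetical names, and the sketch uses safe `std::any::Any` downcasts rather than the unsafe machinery the crate mentions.

```rust
use std::any::Any;
use std::marker::PhantomData;

/// Dynamic layer: operates on type-erased records. This function is not
/// generic, so the compiler emits it once regardless of how many record
/// types are used with it.
fn dyn_filter(
    items: Vec<Box<dyn Any>>,
    predicate: &dyn Fn(&dyn Any) -> bool,
) -> Vec<Box<dyn Any>> {
    items.into_iter().filter(|item| predicate(&**item)).collect()
}

/// Typed layer: a thin wrapper that remembers the record type `T` and
/// converts to and from the type-erased representation at the boundary.
struct TypedBatch<T> {
    inner: Vec<Box<dyn Any>>,
    _marker: PhantomData<T>,
}

impl<T: Any> TypedBatch<T> {
    fn from_vec(records: Vec<T>) -> Self {
        let inner = records
            .into_iter()
            .map(|r| Box::new(r) as Box<dyn Any>)
            .collect();
        Self { inner, _marker: PhantomData }
    }

    /// Statically typed API; the body only adapts the closure and delegates.
    fn filter(self, predicate: impl Fn(&T) -> bool) -> Self {
        let erased = |record: &dyn Any| {
            // The wrapper only ever stores `T`, so this downcast succeeds.
            predicate(record.downcast_ref::<T>().expect("record type mismatch"))
        };
        Self { inner: dyn_filter(self.inner, &erased), _marker: PhantomData }
    }

    fn into_vec(self) -> Vec<T> {
        self.inner
            .into_iter()
            .map(|boxed| *boxed.downcast::<T>().expect("record type mismatch"))
            .collect()
    }
}

fn main() {
    let evens = TypedBatch::from_vec(vec![1u64, 2, 3, 4])
        .filter(|n| n % 2 == 0)
        .into_vec();
    let long = TypedBatch::from_vec(vec!["join", "map", "aggregate"])
        .filter(|s| s.len() > 3)
        .into_vec();
    println!("{evens:?} {long:?}");
}
```

Only the small wrapper methods are monomorphized per record type; the filtering loop itself lives in the single non-generic function, which is the trade-off between compilation speed and runtime performance that the dynamic layer is described as making.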
Re-exports§
pub use crate::time::Timestamp;
pub use algebra::DynZWeight;
pub use algebra::ZWeight;
pub use circuit::ChildCircuit;
pub use circuit::Circuit;
pub use circuit::CircuitHandle;
pub use circuit::DBSPHandle;
pub use circuit::RootCircuit;
pub use circuit::Runtime;
pub use circuit::RuntimeError;
pub use circuit::SchedulerError;
pub use circuit::Stream;
pub use operator::input::IndexedZSetHandle;
pub use operator::input::InputHandle;
pub use operator::input::MapHandle;
pub use operator::input::SetHandle;
pub use operator::input::ZSetHandle;
pub use operator::CmpFunc;
pub use operator::FilterMap;
pub use operator::OrdPartitionedIndexedZSet;
pub use operator::OutputHandle;
pub use trace::DBData;
pub use trace::DBWeight;
pub use typed_batch::Batch;
pub use typed_batch::BatchReader;
pub use typed_batch::FallbackKeyBatch;
pub use typed_batch::FallbackValBatch;
pub use typed_batch::FallbackWSet;
pub use typed_batch::FallbackZSet;
pub use typed_batch::FileIndexedWSet;
pub use typed_batch::FileIndexedZSet;
pub use typed_batch::FileKeyBatch;
pub use typed_batch::FileValBatch;
pub use typed_batch::FileWSet;
pub use typed_batch::FileZSet;
pub use typed_batch::IndexedZSet;
pub use typed_batch::OrdIndexedWSet;
pub use typed_batch::OrdIndexedZSet;
pub use typed_batch::OrdWSet;
pub use typed_batch::OrdZSet;
pub use typed_batch::Trace;
pub use typed_batch::TypedBox;
pub use typed_batch::ZSet;
Modules§
- This module contains declarations of abstract algebraic concepts: monoids, groups, rings, etc.
- Synchronous circuits over streams.
- This module contains type and trait declarations that support the DBSP dynamic dispatch architecture.
- Trace monitor that validates event traces from a circuit. It is used to test both the tracing mechanism and the circuit engine itself.
- DBSP stream operator implementations.
- Built-in profiling capabilities.
- Storage APIs for Feldera.
- Logical time
- Traces
- Developer tutorial
- Strongly typed wrappers around dynamically typed batch types.
Macros§
- Declare an anonymous struct type to be used as a key in the cache and associated value type.
- Derive PartialEq, Eq, PartialOrd, Ord for trait objects that wrap around concrete types that implement these traits.
- Create an indexed Z-set with specified elements.
- Macro to implement NumEntries for a scalar type whose size is 1.
- Create a Z-set with specified elements.
- Create a Z-set with specified elements all with weight 1.
Enums§
Traits§
- Trait to report object size as the number of entries.
Functions§
- Default hashing function used to shard records across workers.
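For readers unfamiliar with hash-based sharding, the following sketch shows the general technique of mapping a record to a worker by hashing it. It is an assumption-laden illustration, not the crate's function: `shard_of` is a hypothetical helper, and it uses the standard library's `DefaultHasher`, which may differ from the hash the crate actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: map a record to one of `workers` shards.
/// Equal records always hash to the same shard, so all updates to a
/// given record are routed to the same worker.
fn shard_of<T: Hash>(record: &T, workers: usize) -> usize {
    assert!(workers > 0, "need at least one worker");
    let mut hasher = DefaultHasher::new();
    record.hash(&mut hasher);
    (hasher.finish() % workers as u64) as usize
}

fn main() {
    let workers = 4;
    for record in ["alice", "bob", "carol"] {
        println!("{record} -> worker {}", shard_of(&record, workers));
    }
}
```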