Crate dbsp

The dbsp crate implements a computational engine for continuous analysis of changing data. With DBSP, a programmer writes code in terms of computations on a complete data set, but DBSP evaluates it incrementally: processing a change to the data set takes time proportional to the size of the change rather than the size of the data set. This is a major advantage for applications that work with large data sets that change frequently in small ways.
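To make the cost difference concrete, here is a minimal sketch in plain Rust. It does not use the dbsp API; the names `Delta`, `count_from_scratch`, and `apply_delta` are hypothetical and exist only for illustration. It maintains a per-word count either by rescanning the whole data set or by touching only the records named in a change.

```rust
use std::collections::HashMap;

/// A change to the data set: records paired with signed weights
/// (+1 inserts one copy, -1 deletes one copy). Hypothetical type.
type Delta = Vec<(String, i64)>;

/// Non-incremental evaluation: cost grows with the size of the data set.
fn count_from_scratch(data: &[String]) -> HashMap<String, i64> {
    let mut counts = HashMap::new();
    for word in data {
        *counts.entry(word.clone()).or_insert(0) += 1;
    }
    counts
}

/// Incremental evaluation: cost grows with the size of the change only.
fn apply_delta(counts: &mut HashMap<String, i64>, delta: &Delta) {
    for (word, weight) in delta {
        let updated = {
            let entry = counts.entry(word.clone()).or_insert(0);
            *entry += *weight;
            *entry
        };
        // Drop keys whose count has returned to zero.
        if updated == 0 {
            counts.remove(word);
        }
    }
}

fn main() {
    let mut data: Vec<String> = ["a", "b", "a"].iter().map(|s| s.to_string()).collect();
    let mut counts = count_from_scratch(&data);

    // The change: delete one "a" and insert one "c".
    let delta: Delta = vec![("a".to_string(), -1), ("c".to_string(), 1)];
    apply_delta(&mut counts, &delta);

    // Apply the same change to the full data set and recount from scratch.
    let pos = data.iter().position(|w| w.as_str() == "a").unwrap();
    data.remove(pos);
    data.push("c".to_string());

    // Both evaluation strategies agree on the result.
    assert_eq!(counts, count_from_scratch(&data));
}
```

In DBSP the programmer writes only the from-scratch form of the query; the engine derives the incremental form.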

The tutorial is a good place to start for a guided tour. After that, if you want to look through the API on your own, the circuit module is a reasonable starting point. For complete examples, visit the examples directory in the DBSP repository.

§Theory

DBSP is underpinned by a formal theory.

The model provides two things:

  1. Semantics. DBSP defines a formal language of streaming operators and queries built out of these operators, and precisely specifies how these queries must transform input streams to output streams.

  2. Algorithm. DBSP also gives an algorithm that takes an arbitrary query and generates an incremental dataflow program that implements this query correctly (in accordance with its formal semantics) and efficiently. Efficiency here means, in a nutshell, that the cost of processing a set of input events is proportional to the size of that input rather than to the size of the entire database state.
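The two points above can be sketched in plain Rust for the simplest case, a linear query such as a filter. This is an illustrative sketch only, not the crate's API: `ZSet`, `add`, `negate`, `integrate`, `differentiate`, and `filter_even` are hypothetical names. The incremental version of a query is obtained by integrating the input changes, running the query on the full state, and differentiating the result. For a linear query this composition equals the query applied directly to each change, which is why it can run in time proportional to the change.

```rust
use std::collections::BTreeMap;

/// A Z-set sketch: each record maps to a signed integer weight.
type ZSet = BTreeMap<i64, i64>;

/// Pointwise sum of two Z-sets, dropping records whose weight becomes zero.
fn add(a: &ZSet, b: &ZSet) -> ZSet {
    let mut out = a.clone();
    for (k, w) in b {
        let sum = out.get(k).copied().unwrap_or(0) + *w;
        if sum == 0 {
            out.remove(k);
        } else {
            out.insert(*k, sum);
        }
    }
    out
}

/// Flip the sign of every weight.
fn negate(a: &ZSet) -> ZSet {
    a.iter().map(|(k, w)| (*k, -*w)).collect()
}

/// Integration: a stream of changes becomes a stream of running totals.
fn integrate(deltas: &[ZSet]) -> Vec<ZSet> {
    let mut acc = ZSet::new();
    deltas
        .iter()
        .map(|d| {
            acc = add(&acc, d);
            acc.clone()
        })
        .collect()
}

/// Differentiation: a stream of snapshots becomes a stream of changes.
fn differentiate(snapshots: &[ZSet]) -> Vec<ZSet> {
    let mut prev = ZSet::new();
    snapshots
        .iter()
        .map(|s| {
            let d = add(s, &negate(&prev));
            prev = s.clone();
            d
        })
        .collect()
}

/// A linear query: keep only the even records.
fn filter_even(z: &ZSet) -> ZSet {
    z.iter()
        .filter(|(k, _)| **k % 2 == 0)
        .map(|(k, w)| (*k, *w))
        .collect()
}

fn main() {
    // Three input changes arriving over three steps.
    let deltas = vec![
        ZSet::from([(1, 1), (2, 1)]),
        ZSet::from([(2, -1), (4, 1)]),
        ZSet::from([(1, -1)]),
    ];

    // Reference semantics: integrate, run the query on full state, differentiate.
    let full: Vec<ZSet> = integrate(&deltas).iter().map(|z| filter_even(z)).collect();
    let reference = differentiate(&full);

    // Incremental evaluation of a linear query: apply it to each change directly.
    let incremental: Vec<ZSet> = deltas.iter().map(|d| filter_even(d)).collect();

    assert_eq!(reference, incremental);
}
```

Non-linear operators such as joins and aggregates need more than this direct substitution; the sketch covers the linear case only.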

§Crate overview

This crate consists of several layers.

  • dynamic - Types and traits that support dynamic dispatch. We rely heavily on dynamic dispatch to limit the amount of monomorphization performed by the compiler when building complex dataflow graphs, balancing compilation speed and runtime performance. This module implements the type machinery necessary to support this architecture.

  • typed_batch - Strongly typed wrappers around dynamically typed batches and traces.

  • trace - This module implements batches and traces, which are core DBSP data structures that represent tables, indexes and changes to tables and indexes. We provide both in-memory and persistent batch and trace implementations.

  • operator::dynamic - Dynamically typed operator API. Operators transform data streams (usually carrying data in the form of batches and traces). DBSP provides many relational operators, such as map, filter, aggregate, join, etc. The operator API in this module is dynamically typed and unsafe.

  • operator - Statically typed wrappers around the dynamic API in operator::dynamic.
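The layering above, a dynamically typed core with thin statically typed wrappers, can be illustrated with a small stand-alone sketch. This is not the crate's implementation: `DynBatch` and `TypedBatch` are hypothetical names, and the sketch uses `std::any::Any` with checked downcasts, whereas the dynamic API described above is unsafe. The point is only that the type-erased container is compiled once, while the generic wrapper adds a small, cheap layer per element type.

```rust
use std::any::Any;
use std::marker::PhantomData;

/// Dynamically typed layer (hypothetical): compiled once, whatever the element type.
struct DynBatch {
    items: Vec<Box<dyn Any>>,
}

impl DynBatch {
    fn new() -> Self {
        DynBatch { items: Vec::new() }
    }

    fn push(&mut self, item: Box<dyn Any>) {
        self.items.push(item);
    }

    fn get(&self, index: usize) -> Option<&Box<dyn Any>> {
        self.items.get(index)
    }

    fn len(&self) -> usize {
        self.items.len()
    }
}

/// Statically typed wrapper (hypothetical): restores type safety at the API boundary.
struct TypedBatch<T: 'static> {
    inner: DynBatch,
    _marker: PhantomData<T>,
}

impl<T: 'static> TypedBatch<T> {
    fn new() -> Self {
        TypedBatch { inner: DynBatch::new(), _marker: PhantomData }
    }

    /// Only values of type `T` can be inserted through the wrapper.
    fn push(&mut self, item: T) {
        self.inner.push(Box::new(item));
    }

    /// Recover the concrete type with a checked downcast.
    fn get(&self, index: usize) -> Option<&T> {
        self.inner.get(index).and_then(|b| b.downcast_ref::<T>())
    }

    fn len(&self) -> usize {
        self.inner.len()
    }
}

fn main() {
    let mut batch: TypedBatch<String> = TypedBatch::new();
    batch.push("hello".to_string());
    batch.push("world".to_string());

    assert_eq!(batch.len(), 2);
    assert_eq!(batch.get(1).map(|s| s.as_str()), Some("world"));
    assert!(batch.get(2).is_none());
}
```

Only the wrapper methods are monomorphized per element type, and each is a few lines, so the bulk of the logic lives in code that the compiler builds once.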

Re-exports§

pub use crate::time::Timestamp;
pub use algebra::DynZWeight;
pub use algebra::ZWeight;
pub use circuit::ChildCircuit;
pub use circuit::Circuit;
pub use circuit::CircuitHandle;
pub use circuit::DBSPHandle;
pub use circuit::NestedCircuit;
pub use circuit::RootCircuit;
pub use circuit::Runtime;
pub use circuit::RuntimeError;
pub use circuit::SchedulerError;
pub use circuit::Stream;
pub use operator::input::IndexedZSetHandle;
pub use operator::input::InputHandle;
pub use operator::input::MapHandle;
pub use operator::input::SetHandle;
pub use operator::input::ZSetHandle;
pub use operator::CmpFunc;
pub use operator::OrdPartitionedIndexedZSet;
pub use operator::OutputHandle;
pub use trace::DBData;
pub use trace::DBWeight;
pub use typed_batch::Batch;
pub use typed_batch::BatchReader;
pub use typed_batch::FallbackKeyBatch;
pub use typed_batch::FallbackValBatch;
pub use typed_batch::FallbackWSet;
pub use typed_batch::FallbackZSet;
pub use typed_batch::FileIndexedWSet;
pub use typed_batch::FileIndexedZSet;
pub use typed_batch::FileKeyBatch;
pub use typed_batch::FileValBatch;
pub use typed_batch::FileWSet;
pub use typed_batch::FileZSet;
pub use typed_batch::IndexedZSet;
pub use typed_batch::OrdIndexedWSet;
pub use typed_batch::OrdIndexedZSet;
pub use typed_batch::OrdWSet;
pub use typed_batch::OrdZSet;
pub use typed_batch::Trace;
pub use typed_batch::TypedBox;
pub use typed_batch::ZSet;

Modules§

algebra
This module contains declarations of abstract algebraic concepts: monoids, groups, rings, etc.
circuit
Synchronous circuits over streams.
dynamic
This module contains type and trait declarations that support the DBSP dynamic dispatch architecture.
ir
mimalloc
monitor
Trace monitor that validates event traces from a circuit. It is used to test both the tracing mechanism and the circuit engine itself.
mono
Monomorphic versions of DBSP operators for use by the SQL compiler.
operator
DBSP stream operator implementations.
profile
Built-in profiling capabilities.
storage
Storage APIs for Feldera.
time
Logical time
trace
Traces
tutorial
Developer tutorial
typed_batch
Strongly typed wrappers around dynamically typed batch types.
utils

Macros§

circuit_cache_key
Declare an anonymous struct type to be used as a key in the cache and associated value type.
circuit_cache_key_unsized
count_items
declare_trait_object
declare_trait_object_with_archived
declare_tuples
declare_typed_trait_object
derive_comparison_traits
Derive PartialEq, Eq, PartialOrd, Ord for trait objects that wrap around concrete types that implement these traits.
derive_erase
indexed_zset
Create an indexed Z-set with specified elements.
lean_vec
measure_items
metadata
num_entries_scalar
Macro to implement NumEntries for a scalar type whose size is 1.
zset
Create a Z-set with specified elements.
zset_set
Create a Z-set with specified elements all with weight 1.

Enums§

Error

Traits§

DetailedError
NumEntries
Trait to report object size as the number of entries.

Functions§

default_hash
Default hashing function used to shard records across workers.

Type Aliases§

Scope