dbsp/lib.rs
//! The `dbsp` crate implements a computational engine for continuous analysis
//! of changing data. With DBSP, a programmer writes code in terms of
//! computations on a complete data set, but DBSP evaluates it incrementally:
//! processing a change to the data set takes time proportional to the size of
//! the change rather than the size of the data set. This is a major advantage
//! for applications that work with large data sets that change frequently in
//! small ways.
//!
//! The [`tutorial`] is a good place to start for a guided tour. After that, if
//! you want to look through the API on your own, the [`circuit`] module is a
//! reasonable starting point. For complete examples, visit the [`examples`][1]
//! directory in the DBSP repository.
//!
//! [1]: https://github.com/feldera/feldera/tree/main/crates/dbsp/examples
//!
//! # Theory
//!
//! DBSP is underpinned by a formal theory:
//!
//! - [Budiu, Chajed, McSherry, Ryzhyk, Tannen. DBSP: Automatic Incremental View
//!   Maintenance for Rich Query Languages, Conference on Very Large Databases, August
//!   2023, Vancouver, Canada](https://www.feldera.com/vldb23.pdf)
//!
//! - [A presentation about DBSP](https://www.youtube.com/watch?v=iT4k5DCnvPU)
//!   at the 2023 Apache Calcite Meetup.
//!
//! The model provides two things:
//!
//! 1. **Semantics.** DBSP defines a formal language of streaming operators and
//!    queries built out of these operators, and precisely specifies how these
//!    queries must transform input streams to output streams.
//!
//! 2. **Algorithm.** DBSP also gives an algorithm that takes an arbitrary query
//!    and generates an incremental dataflow program that implements this query
//!    correctly (in accordance with its formal semantics) and efficiently. Efficiency
//!    here means, in a nutshell, that the cost of processing a set of
//!    input events is proportional to the size of that set rather than the size of
//!    the entire state of the database.
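//!
//! The efficiency claim can be illustrated with a minimal, self-contained sketch.
//! It does not use the DBSP API: `Delta`, `View`, `recompute`, and `apply_delta`
//! are hypothetical names introduced here, and a per-key total stands in for a
//! query. As an assumption made for illustration, a change is modeled as a list
//! of records with signed weights (positive for insertions, negative for deletions).
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! // Hypothetical types for illustration only; these are not DBSP types.
//! // A change is a list of (key, signed weight) pairs:
//! // a positive weight inserts records, a negative weight deletes them.
//! type Delta = Vec<(&'static str, i64)>;
//! // The maintained query result: the net weight of each key.
//! type View = HashMap<&'static str, i64>;
//!
//! // Non-incremental evaluation: rescans every change ever received,
//! // so the cost grows with the size of the whole data set.
//! fn recompute(history: &[Delta]) -> View {
//!     let mut view = View::new();
//!     for delta in history {
//!         for &(key, weight) in delta {
//!             *view.entry(key).or_insert(0) += weight;
//!         }
//!     }
//!     // Keys whose weights cancel out are no longer part of the result.
//!     view.retain(|_, weight| *weight != 0);
//!     view
//! }
//!
//! // Incremental evaluation: updates the existing view in place,
//! // so the expected cost depends on `delta.len()` only.
//! fn apply_delta(view: &mut View, delta: &Delta) {
//!     for &(key, weight) in delta {
//!         let total = view.entry(key).or_insert(0);
//!         *total += weight;
//!         let cancelled = *total == 0;
//!         if cancelled {
//!             view.remove(key);
//!         }
//!     }
//! }
//! ```
//!
//! In this sketch, applying each change with `apply_delta` should produce the same
//! view as calling `recompute` on the full history, while touching only the keys
//! named in the change.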
//!
//! # Crate overview
//!
//! This crate consists of several layers.
//!
//! * [`dynamic`] - Types and traits that support dynamic dispatch. We rely heavily on
//!   dynamic dispatch to limit the amount of monomorphization performed by the compiler when
//!   building complex dataflow graphs, balancing compilation speed and runtime performance.
//!   This module implements the type machinery necessary to support this architecture.
//!
//! * [`typed_batch`] - Strongly typed wrappers around dynamically typed batches and traces.
//!
//! * [`trace`] - This module implements batches and traces, which are core DBSP data structures
//!   that represent tables, indexes and changes to tables and indexes. We provide both in-memory
//!   and persistent batch and trace implementations.
//!
//! * [`operator::dynamic`] - Dynamically typed operator API. Operators transform data streams
//!   (usually carrying data in the form of batches and traces). DBSP provides many relational
//!   operators, such as map, filter, aggregate, join, etc. The operator API in this module is
//!   dynamically typed and unsafe.
//!
//! * [`operator`] - Statically typed wrappers around the dynamic API in [`operator::dynamic`].

#![allow(clippy::type_complexity)]

// allow referring to self as ::dbsp for macros to work universally (from this crate and from others)
// see https://github.com/rust-lang/rust/issues/54647
extern crate self as dbsp;

pub mod dynamic;
mod error;
mod hash;
mod num_entries;
//mod ref_pair;

pub mod typed_batch;

#[macro_use]
pub mod circuit;
pub mod algebra;
pub mod ir;
pub mod mimalloc;
pub mod monitor;
pub mod operator;
pub mod profile;
pub mod storage;
pub mod time;
pub mod trace;
pub mod utils;

#[cfg(feature = "backend-mode")]
pub mod mono;

pub use crate::{
    error::{DetailedError, Error},
    hash::{default_hash, default_hasher},
    num_entries::NumEntries,
};
// // pub use crate::ref_pair::RefPair;
pub use crate::time::Timestamp;

pub use algebra::{DynZWeight, ZWeight};

pub use circuit::{
    ChildCircuit, Circuit, CircuitHandle, DBSPHandle, NestedCircuit, RootCircuit, Runtime,
    RuntimeError, SchedulerError, Stream, WeakRuntime,
};
#[cfg(not(feature = "backend-mode"))]
pub use operator::FilterMap;
pub use operator::{
    CmpFunc, OrdPartitionedIndexedZSet, OutputHandle,
    input::{IndexedZSetHandle, InputHandle, MapHandle, SetHandle, ZSetHandle},
};
pub use trace::{DBData, DBWeight, cursor::Position};
pub use typed_batch::{
    Batch, BatchReader, FallbackKeyBatch, FallbackValBatch, FallbackWSet, FallbackZSet,
    FileIndexedWSet, FileIndexedZSet, FileKeyBatch, FileValBatch, FileWSet, FileZSet, IndexedZSet,
    IndexedZSetReader, OrdIndexedWSet, OrdIndexedZSet, OrdWSet, OrdZSet, Trace, TypedBox, ZSet,
};

#[cfg(doc)]
pub mod tutorial;

// TODO: import from `circuit`.
pub type Scope = u16;