1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
//! ### `sea-streamer-file`: File Backend
//!
//! This is very similar to `sea-streamer-stdio`, but the difference is SeaStreamerStdio works in real-time,
//! while `sea-streamer-file` works in real-time and replay. That means, SeaStreamerFile has the ability to
//! traverse a `.ss` (sea-stream) file and seek/rewind to a particular timestamp/offset.
//!
//! In addition, Stdio can only work with UTF-8 text data, while File is able to work with binary data.
//! In Stdio, there is only one Streamer per process. In File, there can be multiple independent Streamers
//! in the same process. Afterall, a Streamer is just a file.
//!
//! The basic idea of SeaStreamerFile is like a `tail -f` with one message per line, with a custom message frame
//! carrying binary payloads. The interesting part is, in SeaStreamer, we do not use delimiters to separate messages.
//! This removes the overhead of encoding/decoding message payloads. But it added some complexity to the file format.
//!
//! The SeaStreamerFile format is designed for efficient fast-forward and seeking. This is enabled by placing an array
//! of Beacons at fixed interval in the file. A Beacon contains a summary of the streams, so it acts like an inplace
//! index. It also allows readers to align with the message boundaries. To learn more about the file format, read
//! [`src/format.rs`](https://github.com/SeaQL/sea-streamer/blob/main/sea-streamer-file/src/format.rs).
//!
//! On top of that, are the high-level SeaStreamer multi-producer, multi-consumer stream semantics, resembling
//! the behaviour of other SeaStreamer backends. In particular, the load-balancing behaviour is same as Stdio,
//! i.e. round-robin.
//!
//! ### Decoder
//!
//! We provide a small utility to decode `.ss` files:
//!
//! ```sh
//! cargo install sea-streamer-file --features=executables --bin ss-decode
//!  # local version
//! alias ss-decode='cargo run --package sea-streamer-file --features=executables --bin ss-decode'
//! ss-decode -- --file <file> --format <format>
//! ```
//!
//! Pro tip: pipe it to `less` for pagination
//!
//! ```sh
//! ss-decode --file mystream.ss | less
//! ```
//!
//! Example `log` format:
//!
//! ```log
//!  # header
//! [2023-06-05T13:55:53.001 | hello | 1 | 0] message-1
//!  # beacon
//! ```
//!
//! Example `ndjson` format:
//!
//! ```json
//! /* header */
//! {"header":{"stream_key":"hello","shard_id":0,"sequence":1,"timestamp":"2023-06-05T13:55:53.001"},"payload":"message-1"}
//! /* beacon */
//! ```
//!
//! There is also a Typescript implementation under [`sea-streamer-file-reader`](https://github.com/SeaQL/sea-streamer/tree/main/sea-streamer-file/sea-streamer-file-reader).
//!
//! ### TODO
//!
//! 1. Resumable: currently unimplemented. A potential implementation might be to commit into a local SQLite database.
//! 2. Sharding: currently it only streams to Shard ZERO.
//! 3. Verify: a utility program to verify and repair SeaStreamer binary file.
mod buffer;
mod consumer;
mod crc;
mod dyn_file;
mod error;
mod file;
pub mod format;
mod messages;
mod producer;
mod sink;
mod source;
mod streamer;
mod surveyor;
mod watcher;

pub use buffer::*;
pub use consumer::*;
pub use dyn_file::*;
pub use error::*;
pub use file::*;
pub use messages::*;
pub use producer::*;
pub use sink::*;
pub use source::*;
pub use streamer::*;
pub use surveyor::*;

pub const DEFAULT_BEACON_INTERVAL: u32 = 1024 * 1024; // 1MB
pub const DEFAULT_FILE_SIZE_LIMIT: u64 = 16 * 1024 * 1024 * 1024; // 16GB
pub const DEFAULT_PREFETCH_MESSAGE: usize = 1000;