sea_streamer_file/
lib.rs

1//! ### `sea-streamer-file`: File Backend
2//!
3//! This is very similar to `sea-streamer-stdio`, but the difference is SeaStreamerStdio works in real-time,
4//! while `sea-streamer-file` works in real-time and replay. That means, SeaStreamerFile has the ability to
5//! traverse a `.ss` (sea-stream) file and seek/rewind to a particular timestamp/offset.
6//!
7//! In addition, Stdio can only work with UTF-8 text data, while File is able to work with binary data.
8//! In Stdio, there is only one Streamer per process. In File, there can be multiple independent Streamers
9//! in the same process. Afterall, a Streamer is just a file.
10//!
11//! The basic idea of SeaStreamerFile is like a `tail -f` with one message per line, with a custom message frame
12//! carrying binary payloads. The interesting part is, in SeaStreamer, we do not use delimiters to separate messages.
13//! This removes the overhead of encoding/decoding message payloads. But it added some complexity to the file format.
14//!
15//! The SeaStreamerFile format is designed for efficient fast-forward and seeking. This is enabled by placing an array
16//! of Beacons at fixed interval in the file. A Beacon contains a summary of the streams, so it acts like an inplace
17//! index. It also allows readers to align with the message boundaries. To learn more about the file format, read
18//! [`src/format.rs`](https://github.com/SeaQL/sea-streamer/blob/main/sea-streamer-file/src/format.rs).
19//!
20//! On top of that, are the high-level SeaStreamer multi-producer, multi-consumer stream semantics, resembling
21//! the behaviour of other SeaStreamer backends. In particular, the load-balancing behaviour is same as Stdio,
22//! i.e. round-robin.
23//!
24//! ### Decoder
25//!
26//! We provide a small utility to decode `.ss` files:
27//!
28//! ```sh
29//! cargo install sea-streamer-file --features=executables --bin ss-decode
30//!  # local version
31//! alias ss-decode='cargo run --package sea-streamer-file --features=executables --bin ss-decode'
32//! ss-decode -- --file <file> --format <format>
33//! ```
34//!
35//! Pro tip: pipe it to `less` for pagination
36//!
37//! ```sh
38//! ss-decode --file mystream.ss | less
39//! ```
40//!
41//! Example `log` format:
42//!
43//! ```log
44//!  # header
45//! [2023-06-05T13:55:53.001 | hello | 1 | 0] message-1
46//!  # beacon
47//! ```
48//!
49//! Example `ndjson` format:
50//!
51//! ```json
52//! /* header */
53//! {"header":{"stream_key":"hello","shard_id":0,"sequence":1,"timestamp":"2023-06-05T13:55:53.001"},"payload":"message-1"}
54//! /* beacon */
55//! ```
56//!
57//! There is also a Typescript implementation under [`sea-streamer-file-reader`](https://github.com/SeaQL/sea-streamer/tree/main/sea-streamer-file/sea-streamer-file-reader).
58//!
59//! ### TODO
60//!
61//! 1. Resumable: currently unimplemented. A potential implementation might be to commit into a local SQLite database.
62//! 2. Sharding: currently it only streams to Shard ZERO.
63//! 3. Verify: a utility program to verify and repair SeaStreamer binary file.
64mod buffer;
65mod consumer;
66mod crc;
67mod dyn_file;
68mod error;
69pub mod export;
70mod file;
71pub mod format;
72mod messages;
73mod producer;
74mod sink;
75mod source;
76mod streamer;
77mod surveyor;
78mod watcher;
79
80pub use buffer::*;
81pub use consumer::*;
82pub use dyn_file::*;
83pub use error::*;
84pub use file::*;
85pub use messages::*;
86pub use producer::*;
87pub use sink::*;
88pub use source::*;
89pub use streamer::*;
90pub use surveyor::*;
91
92pub const DEFAULT_BEACON_INTERVAL: u32 = 1024 * 1024; // 1MB
93pub const DEFAULT_FILE_SIZE_LIMIT: u64 = 16 * 1024 * 1024 * 1024; // 16GB
94pub const DEFAULT_PREFETCH_MESSAGE: usize = 1000;
95
96/// Reserved by SeaStreamer. Avoid using this as StreamKey.
97pub const SEA_STREAMER_WILDCARD: &str = "SEA_STREAMER_WILDCARD";