§sea-streamer-file: File Backend
This is very similar to sea-streamer-stdio, but the difference is that SeaStreamerStdio works in real-time only, while sea-streamer-file works in both real-time and replay. That means SeaStreamerFile has the ability to traverse a .ss (sea-stream) file and seek/rewind to a particular timestamp/offset.
In addition, Stdio can only work with UTF-8 text data, while File is able to work with binary data. In Stdio, there is only one Streamer per process. In File, there can be multiple independent Streamers in the same process. After all, a Streamer is just a file.
The basic idea of SeaStreamerFile is like a tail -f with one message per line, with a custom message frame carrying binary payloads. The interesting part is that, in SeaStreamer, we do not use delimiters to separate messages. This removes the overhead of encoding/decoding message payloads, but it adds some complexity to the file format.
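To illustrate why delimiter-free framing avoids payload encoding, here is a minimal sketch of length-prefixed frames. The layout used here (a 4-byte big-endian length followed by the payload) and the names write_frame and read_frame are hypothetical and chosen only for illustration; the actual frame layout is the one documented in src/format.rs.

```rust
use std::convert::{TryFrom, TryInto};

// Hypothetical frame layout, for illustration only (NOT the real .ss format):
// [u32 big-endian payload length][payload bytes]

/// Append one length-prefixed frame to `out`.
/// The payload is copied verbatim: no escaping is needed, because the reader
/// never scans the payload for a delimiter.
fn write_frame(out: &mut Vec<u8>, payload: &[u8]) {
    let len = u32::try_from(payload.len()).expect("payload too large for a u32 length");
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(payload);
}

/// Read one frame from the front of `input`.
/// Returns the payload and the remaining bytes, or `None` if the frame is incomplete.
fn read_frame(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let header: [u8; 4] = input.get(..4)?.try_into().ok()?;
    let len = u32::from_be_bytes(header) as usize;
    let rest = &input[4..];
    if rest.len() < len {
        return None; // the writer has not finished this frame yet
    }
    Some(rest.split_at(len))
}
```

A payload containing newlines or arbitrary bytes round-trips unchanged. The cost is that a reader landing at an arbitrary offset cannot tell where a frame starts without some extra help.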
The SeaStreamerFile format is designed for efficient fast-forward and seeking. This is enabled by placing an array of Beacons at fixed intervals in the file. A Beacon contains a summary of the streams, so it acts like an in-place index. It also allows readers to align with the message boundaries. To learn more about the file format, read src/format.rs.
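As a rough sketch of how fixed-interval Beacons let a reader re-align from an arbitrary byte offset, the helpers below compute the neighbouring Beacon offsets. They assume, for illustration only, that Beacons sit at positive multiples of the interval; the function names are hypothetical, and the real placement rules are in src/format.rs.

```rust
/// Offset of the last Beacon at or before `offset`.
/// Assumption for this sketch: Beacons sit at positive multiples of `interval`,
/// so there is no Beacon before the first full interval.
fn beacon_at_or_before(offset: u64, interval: u64) -> Option<u64> {
    assert!(interval > 0, "interval must be non-zero");
    let n = offset / interval;
    if n == 0 {
        None
    } else {
        Some(n * interval)
    }
}

/// Offset of the first Beacon strictly after `offset`, under the same assumption.
fn next_beacon_after(offset: u64, interval: u64) -> u64 {
    assert!(interval > 0, "interval must be non-zero");
    (offset / interval + 1) * interval
}
```

Under this assumption, a reader dropped at any offset can jump to a Beacon with simple arithmetic instead of scanning the file from the start.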
On top of that are the high-level SeaStreamer multi-producer, multi-consumer stream semantics, resembling the behaviour of other SeaStreamer backends. In particular, the load-balancing behaviour is the same as Stdio, i.e. round-robin.
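For readers unfamiliar with the term, round-robin dispatch can be sketched as follows. This is a generic illustration, not the crate's implementation: the RoundRobin type is hypothetical, and it assumes a fixed number of consumers sharing the messages.

```rust
/// Hypothetical round-robin dispatcher: hands each message to the next
/// consumer in turn, wrapping around after the last one.
struct RoundRobin {
    next: usize,
    consumers: usize,
}

impl RoundRobin {
    /// `consumers` is the number of consumers sharing the load (assumed fixed here).
    fn new(consumers: usize) -> Self {
        assert!(consumers > 0, "need at least one consumer");
        RoundRobin { next: 0, consumers }
    }

    /// Index of the consumer that should receive the next message.
    fn assign(&mut self) -> usize {
        let chosen = self.next;
        self.next = (self.next + 1) % self.consumers;
        chosen
    }
}
```

With three consumers, successive calls to assign yield indices 0, 1, 2 and then wrap back to 0.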
§Decoder
We provide a small utility to decode .ss files:
cargo install sea-streamer-file --features=executables --bin ss-decode
# local version
alias ss-decode='cargo run --package sea-streamer-file --features=executables --bin ss-decode'
ss-decode -- --file <file> --format <format>
Pro tip: pipe it to less for pagination
ss-decode --file mystream.ss | less
Example log format:
# header
[2023-06-05T13:55:53.001 | hello | 1 | 0] message-1
# beacon
Example ndjson format:
/* header */
{"header":{"stream_key":"hello","shard_id":0,"sequence":1,"timestamp":"2023-06-05T13:55:53.001"},"payload":"message-1"}
/* beacon */
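If you want to post-process the log output, a line like the one above can be split into fields with a few string operations. This is a minimal sketch, not part of the crate: LogLine and parse_log_line are hypothetical names, and the field order (timestamp, stream key, sequence, shard id) is inferred by comparing the two samples above. It also assumes the stream key does not itself contain " | ".

```rust
/// Hypothetical holder for one decoded line of the `log` format.
#[derive(Debug, PartialEq)]
struct LogLine<'a> {
    timestamp: &'a str,
    stream_key: &'a str,
    sequence: u64,
    shard_id: u64,
    payload: &'a str,
}

/// Parse `[timestamp | stream_key | sequence | shard_id] payload`.
/// Returns `None` for anything else, such as the `# header` and `# beacon` lines.
fn parse_log_line<'a>(line: &'a str) -> Option<LogLine<'a>> {
    let rest = line.strip_prefix('[')?;
    let (header, payload) = rest.split_once("] ")?;
    let mut fields = header.split(" | ");
    let parsed = LogLine {
        timestamp: fields.next()?,
        stream_key: fields.next()?,
        sequence: fields.next()?.parse().ok()?,
        shard_id: fields.next()?.parse().ok()?,
        payload,
    };
    // Reject headers with more than four fields.
    if fields.next().is_some() {
        return None;
    }
    Some(parsed)
}
```

For anything beyond quick inspection, the ndjson format is likely the easier one to consume, since any JSON parser can read it.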
There is also a TypeScript implementation under sea-streamer-file-reader.
§TODO
- Resumable: currently unimplemented. A potential implementation might be to commit into a local SQLite database.
- Sharding: currently it only streams to Shard ZERO.
- Verify: a utility program to verify and repair SeaStreamer binary files.
Modules§
- export
- format: The SeaStreamer file format is a container format designed to be seekable. It does not concern itself with the format the payload is encoded in. It has an internal checksum to ensure integrity. It is a binary file format, but is readable with a plain text editor (if the payload is UTF-8).
Structs§
- AsyncFile: A minimal wrapper over the async runtime’s File.
- ByteBuffer: A FIFO queue of Bytes.
- FileConnectOptions
- FileConsumer
- FileConsumerOptions
- FileId: Basically a file path.
- FileMessageStream
- FileProducer
- FileProducerOptions
- FileReader: A simple buffered and bounded file reader. The implementation is much simpler than FileSource.
- FileSink: Buffered file writer.
- FileSinkWriter: A delegate that implements std::io::Write.
- FileSource: FileSource treats files as a live stream of bytes. It will read until the end, and will resume reading when the file grows. It relies on notify::RecommendedWatcher, which is the OS’s native notify mechanism. The async API allows you to request how many bytes you need, and it will wait for those bytes to come in a non-blocking fashion.
- FileSourceFuture
- FileStreamer
- MessageSink: A high-level file writer that muxes messages and beacons.
- MessageSource: A high-level file reader that demuxes messages and beacons.
- MockBeacon
- SendFuture
- Surveyor: The goal of Surveyor is to find the two closest Beacons that pinch our search target. It would be pretty simple, if not for the fact that a given location may not contain a relevant Beacon, which could yield Undecided.
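The search problem the Surveyor description refers to can be sketched as a binary search that tolerates undecided probes. This is a simplified illustration, not the crate's algorithm: the survey function is hypothetical, Beacons are modelled as a slice of Option<u64> timestamps (None standing for a location with no relevant Beacon), and decided timestamps are assumed to be non-decreasing.

```rust
/// Find the pair of decided Beacons that pinch `target`:
/// the last index whose timestamp is <= target, and the first whose timestamp is > target.
/// `None` entries model locations without a relevant Beacon (undecided).
/// Assumption: decided timestamps are non-decreasing along the slice.
fn survey(beacons: &[Option<u64>], target: u64) -> (Option<usize>, Option<usize>) {
    let (mut lo, mut hi) = (0usize, beacons.len());
    let (mut left, mut right) = (None, None);
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        // Walk left from `mid` until a decided Beacon is found within [lo, mid].
        let probe = (lo..=mid).rev().find(|&i| beacons[i].is_some());
        match probe {
            Some(i) => {
                let ts = beacons[i].expect("probe only returns decided entries");
                if ts <= target {
                    left = Some(i);
                    lo = mid + 1; // everything in (i, mid] is undecided
                } else {
                    right = Some(i);
                    hi = i;
                }
            }
            // Nothing decided in [lo, mid]: continue in the right half.
            None => lo = mid + 1,
        }
    }
    (left, right)
}
```

The walk over undecided entries makes the worst case linear in this sketch; it is only meant to show why undecided locations complicate an otherwise plain binary search.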
Enums§
- AutoStreamReset: Where to start streaming from.
- Bytes: A blob of bytes; optimized over byte and word.
- ConfigErr
- DynFileSource: A runtime adapter of FileReader and FileSource, also able to switch between the two modes of operation dynamically.
- DynReadFuture
- FileErr
- FileSourceType
- NextFuture
- ReadFrom
- SeekErr
- SeekTarget
- StreamMode
- SurveyResult
Constants§
- DEFAULT_BEACON_INTERVAL
- DEFAULT_FILE_SIZE_LIMIT
- DEFAULT_PREFETCH_MESSAGE
- END_OF_STREAM
- PULSE_MESSAGE
- SEA_STREAMER_WILDCARD: Reserved by SeaStreamer. Avoid using this as StreamKey.
Traits§
Functions§
- end_of_stream: This can be written to a file to properly end the stream.
- is_end_of_stream
- query_streamer: Query info about global Streamer(s) topology.