Core library for the octo-flow GitHub event processing pipeline.
octo-flow is a streaming CLI tool for processing GitHub Archive
(GHArchive) event datasets, which are distributed as
newline-delimited JSON (NDJSON).
The library provides the core event-processing pipeline used by the CLI.
It reads events from an input source, parses them using serde_json,
optionally filters them by event type, and outputs selected fields
in a tab-separated format.
§Architecture
The processing pipeline follows a streaming architecture designed to handle large datasets efficiently:
input source (file or stdin)
↓
BufReader
↓
line-by-line iterator
↓
serde_json parsing
↓
optional event filtering
↓
tab-separated output
This approach ensures:
- constant memory usage (bounded by the longest line, not by the size of the input)
- fast startup time
- efficient processing of large NDJSON datasets
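The stages above can be sketched with the standard library alone. This is a minimal illustration, not the crate's actual implementation: `process_lines` and `string_field` are hypothetical names, the choice of `id` and `type` as output fields is an assumption, and a naive string search stands in for the serde_json parsing step so that the sketch has no dependencies.

```rust
use std::io::{self, BufRead, BufReader, Read, Write};

/// Hypothetical stand-in for the serde_json step: returns the string value of
/// the first `"key":"value"` pair found in one NDJSON line.
/// Deliberately naive: no escape handling and no awareness of nesting.
fn string_field<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let needle = format!("\"{}\":", key);
    let start = line.find(needle.as_str())? + needle.len();
    let rest = line[start..].trim_start().strip_prefix('"')?;
    let end = rest.find('"')?;
    Some(&rest[..end])
}

/// Hypothetical pipeline: read lines, parse, optionally filter by event type,
/// and write `id<TAB>type` rows. Returns the number of rows written.
fn process_lines<R: Read, W: Write>(
    input: R,
    mut out: W,
    event_filter: Option<&str>,
) -> io::Result<usize> {
    // BufReader + lines() keeps only one line in memory at a time.
    let reader = BufReader::new(input);
    let mut written = 0;
    for line in reader.lines() {
        let line = line?;
        if line.trim().is_empty() {
            continue; // tolerate blank lines in the input
        }
        // Assumed output fields: the event id and the event type.
        let (id, kind) = match (string_field(&line, "id"), string_field(&line, "type")) {
            (Some(id), Some(kind)) => (id, kind),
            _ => continue, // skip lines missing either field
        };
        // Optional event filtering: keep only the requested event type.
        if let Some(wanted) = event_filter {
            if kind != wanted {
                continue;
            }
        }
        writeln!(out, "{}\t{}", id, kind)?;
        written += 1;
    }
    Ok(written)
}
```

Because the function is generic over `Read` and `Write`, the same loop would accept a `File`, `io::stdin()`, or an in-memory byte slice as input, which mirrors the "file or stdin" input stage in the diagram.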
§Example
use octo_flow::run;
// Process a GitHub event file without filtering
run("events.json".to_string(), None).unwrap();
// Process only PushEvent events
run("events.json".to_string(), Some("PushEvent".to_string())).unwrap();
§CLI Usage
octo-flow --input events.json --event PushEvent
Modules§
Enums§
- OctoFlowError - Error type for the octo-flow processing pipeline.
Functions§
- process_events - Process a stream of GitHub events.
- run - Entry point for the event processing pipeline.