
Crate octo_flow


Core library for the octo-flow GitHub event processing pipeline.

octo-flow is a streaming CLI tool for processing GitHub Archive (GHArchive) event datasets, which are distributed as newline-delimited JSON (NDJSON).

The library provides the core event-processing pipeline used by the CLI. It reads events from an input source, parses them using serde_json, optionally filters them by event type, and outputs selected fields in a tab-separated format.

§Architecture

The processing pipeline follows a streaming architecture designed to handle large datasets efficiently:

input source (file or stdin)
       ↓
    BufReader
       ↓
  line-by-line iterator
       ↓
  serde_json parsing
       ↓
  optional event filtering
       ↓
  tab-separated output

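Because events are read and handled one line at a time, this approach provides:

  • memory usage bounded by the largest single event rather than by the size of the dataset
  • fast startup, since output can begin as soon as the first line is parsed
  • efficient processing of large NDJSON datasets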
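The stages in the diagram can be sketched with the standard library alone. This is a minimal illustration, not the crate's actual implementation: `process_lines` and `extract_string_field` are hypothetical names, the choice of `id` and `type` as output columns is an assumption, and the naive string search only stands in for the serde_json parsing step that the real pipeline uses.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical stand-in for the serde_json parsing step: returns the value of
/// the first string field named `field` found in one NDJSON line.
/// It ignores escapes and nesting; a real implementation would deserialize
/// the line with serde_json instead.
fn extract_string_field(line: &str, field: &str) -> Option<String> {
    let key = format!("\"{}\":", field);
    let start = line.find(&key)? + key.len();
    let rest = line[start..].trim_start().strip_prefix('"')?;
    let end = rest.find('"')?;
    Some(rest[..end].to_string())
}

/// Streams events from `reader` one line at a time, keeps those matching
/// `event_filter` (or all of them when it is `None`), and writes
/// tab-separated fields to `out`. Returns the number of events written.
fn process_lines<R: BufRead, W: Write>(
    reader: R,
    mut out: W,
    event_filter: Option<&str>,
) -> io::Result<usize> {
    let mut written = 0;
    for line in reader.lines() {
        let line = line?;
        if line.trim().is_empty() {
            continue; // skip blank lines
        }
        // Lines without a recognizable event type are skipped in this sketch.
        let event_type = match extract_string_field(&line, "type") {
            Some(t) => t,
            None => continue,
        };
        if let Some(wanted) = event_filter {
            if event_type != wanted {
                continue;
            }
        }
        // Assumed output columns: event id and event type.
        let id = extract_string_field(&line, "id").unwrap_or_default();
        writeln!(out, "{}\t{}", id, event_type)?;
        written += 1;
    }
    Ok(written)
}
```

Only the current line is held in memory at any point, which is what keeps the footprint independent of the dataset size. Because the function is generic over `BufRead`, the same code would accept a `BufReader` wrapping a file or a locked stdin handle.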

§Example

use octo_flow::run;

// Process a GitHub event file without filtering
run("events.json".to_string(), None).unwrap();

// Process only PushEvent events
run("events.json".to_string(), Some("PushEvent".to_string())).unwrap();

§CLI Usage

octo-flow --input events.json --event PushEvent
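As a rough illustration of how the two flags in the command above could map onto the arguments passed to `run`, here is a standard-library-only sketch. `CliArgs` and `parse_args` are hypothetical names, treating both flags as optional is an assumption, and the actual CLI may well rely on an argument-parsing crate and accept further options.

```rust
/// Hypothetical holder for the two options shown in the command above.
#[derive(Debug, PartialEq)]
struct CliArgs {
    /// Value of `--input` (assumed optional in this sketch).
    input: Option<String>,
    /// Value of `--event`; `None` means no filtering.
    event: Option<String>,
}

/// Parses `--input <path>` and `--event <type>` from an argument list that
/// excludes the program name, e.g. `std::env::args().skip(1)`.
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<CliArgs, String> {
    let mut parsed = CliArgs { input: None, event: None };
    let mut iter = args.into_iter();
    while let Some(flag) = iter.next() {
        match flag.as_str() {
            "--input" => {
                parsed.input = Some(iter.next().ok_or("--input requires a value")?);
            }
            "--event" => {
                parsed.event = Some(iter.next().ok_or("--event requires a value")?);
            }
            other => return Err(format!("unrecognized argument: {}", other)),
        }
    }
    Ok(parsed)
}
```

Under these assumptions, the parsed `input` and `event` values would then be handed to `run` in the same way as in the library example.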

Modules§

github_event

Enums§

OctoFlowError
Error type for the octo-flow processing pipeline.

Functions§

process_events
Process a stream of GitHub events.
run
Entry point for the event processing pipeline.