Fluxus Source Gharchive
A Fluxus source component for processing and analyzing GitHub Archive data streams, providing efficient access to historical GitHub event data.
Overview
fluxus-source-gharchive is a Rust library that connects Fluxus to GitHub Archive data. It provides an interface for streaming and processing historical GitHub events, either over HTTP from gharchive.org or from local files.
Features
- Flexible Data Source Support
  - HTTP streaming from gharchive.org
  - Local file processing for offline analysis
  - Automatic handling of gzip compression
- Advanced Time Range Control
  - Date-based data retrieval (YYYY-MM-DD format)
  - Hour-specific data access (hours 0-23)
  - Configurable date ranges with start and end dates
- Comprehensive Event Data
  - Full GitHub event information, including:
    - Event type and ID
    - Repository details
    - Actor information
    - Organization data
    - Event payload
    - Timestamps
- Robust Error Handling
  - Configurable I/O timeouts
  - Detailed error reporting
  - Stream-based error handling
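To make the date and hour controls concrete, here is a minimal sketch of how one date and hour map to a single hourly archive file. The URL pattern follows the public GH Archive naming (`YYYY-MM-DD-H.json.gz`, hour not zero-padded). The helper names `has_date_shape` and `archive_url` are hypothetical and are not part of this crate's API; whether the crate builds URLs this way internally is an assumption.

```rust
/// Hypothetical helper: returns true if `date` has the shape YYYY-MM-DD.
/// This checks the format only, not whether the calendar day exists.
fn has_date_shape(date: &str) -> bool {
    let bytes = date.as_bytes();
    bytes.len() == 10
        && bytes.iter().enumerate().all(|(i, b)| match i {
            4 | 7 => *b == b'-',
            _ => b.is_ascii_digit(),
        })
}

/// Hypothetical helper: builds the URL of one hourly GH Archive file.
/// Rejects malformed dates and hours outside 0..=23.
fn archive_url(date: &str, hour: u8) -> Result<String, String> {
    if !has_date_shape(date) {
        return Err(format!("expected YYYY-MM-DD, got {date:?}"));
    }
    if hour > 23 {
        return Err(format!("hour must be in 0..=23, got {hour}"));
    }
    // GH Archive file names do not zero-pad the hour.
    Ok(format!("https://data.gharchive.org/{date}-{hour}.json.gz"))
}
```

Validating before building the URL means a bad date or hour is reported as an error up front rather than as a failed HTTP request later.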
Installation
Add this to your Cargo.toml:
[dependencies]
fluxus-source-gharchive = "0.1"
Usage
Basic HTTP Source
use GithubArchiveSource;
use Source;
async
Date Range Processing
use GithubArchiveSource;
use Source;
async
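As an illustration of what a date range covers, the sketch below expands a start and end date into the hourly archive slots between them, using only the standard library. All function names here are hypothetical, and treating the end date as inclusive is an assumption rather than documented behavior of this crate.

```rust
fn is_leap(year: u32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

fn days_in_month(year: u32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 => if is_leap(year) { 29 } else { 28 },
        _ => 0,
    }
}

/// Parses "YYYY-MM-DD" into (year, month, day); returns None for
/// malformed input or days that do not exist. Parsing is lenient
/// about zero-padding.
fn parse_date(s: &str) -> Option<(u32, u32, u32)> {
    let mut parts = s.split('-');
    let y: u32 = parts.next()?.parse().ok()?;
    let m: u32 = parts.next()?.parse().ok()?;
    let d: u32 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || m == 0 || m > 12 || d == 0 || d > days_in_month(y, m) {
        return None;
    }
    Some((y, m, d))
}

/// Advances a valid date by one calendar day.
fn next_day((y, m, d): (u32, u32, u32)) -> (u32, u32, u32) {
    if d < days_in_month(y, m) {
        (y, m, d + 1)
    } else if m < 12 {
        (y, m + 1, 1)
    } else {
        (y + 1, 1, 1)
    }
}

/// Lists every hourly slot ("YYYY-MM-DD-H") from `start` to `end`,
/// both inclusive (an assumption). Returns an empty list if start > end.
fn hourly_slots(start: &str, end: &str) -> Option<Vec<String>> {
    let mut day = parse_date(start)?;
    let last = parse_date(end)?;
    let mut slots = Vec::new();
    // Tuples compare lexicographically, which matches date order.
    while day <= last {
        let (y, m, d) = day;
        for hour in 0..24 {
            slots.push(format!("{y:04}-{m:02}-{d:02}-{hour}"));
        }
        day = next_day(day);
    }
    Some(slots)
}
```

Each day contributes 24 slots, so a range spanning a full year corresponds to several thousand archive files.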
Local File Processing
use GithubArchiveSource;
use Source;
use std::path::Path;
async
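For context on local files: GH Archive data is newline-delimited JSON, normally gzip-compressed. The sketch below uses only the standard library to show two pieces of that pipeline: sniffing the gzip magic bytes, and reading records line by line so that an I/O error surfaces at the record where it occurs. The names `looks_like_gzip` and `count_events` are hypothetical and not part of this crate. Decompression itself is not shown; in practice it would need a gzip decoder (for example from a crate such as flate2) wrapped around the file reader.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: gzip streams begin with the magic bytes
/// 0x1f 0x8b (RFC 1952).
fn looks_like_gzip(header: &[u8]) -> bool {
    header.len() >= 2 && header[0] == 0x1f && header[1] == 0x8b
}

/// Hypothetical helper: counts non-empty lines in an already
/// decompressed stream, treating each line as one JSON event record.
/// Stops and returns the error at the first failed read.
fn count_events<R: BufRead>(reader: R) -> io::Result<usize> {
    let mut count = 0;
    for line in reader.lines() {
        // Each item is an io::Result, so errors arrive per record.
        let line = line?;
        if !line.trim().is_empty() {
            count += 1;
        }
    }
    Ok(count)
}
```

Because `count_events` accepts any `BufRead`, the same function works on a `BufReader` over a file, an in-memory `Cursor`, or a reader wrapped around a decompressor.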
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.