Crate fetcher

fetcher is a flexible async framework designed to make it easy to create robust applications that build data pipelines to extract, transform, and deliver data from various sources to diverse destinations. In simpler terms, it makes it easy to create an app that periodically checks a source, for example a website, for data, formats it nicely, and sends it to users.

fetcher is designed to be easily extensible so that it can support as many use cases as possible, while providing out-of-the-box tools for the most common ones.

Architecture

At the heart of fetcher is the Task. It represents a specific instance of a data pipeline, which consists of three main stages:

  • Source: Fetches data from an external source (e.g. HTTP endpoint, email inbox).
  • Action: Applies transformations (filters, modifications, parsing) to the fetched data.
  • Sink: Sends the transformed data to a destination (e.g. Discord channel, Telegram bot, another program’s stdin).
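
The three stages compose into a simple chain: fetch, apply each action in order, then hand every result to the sink. The sketch below illustrates that flow only. fetcher's real stages are async and have different names and signatures; the traits and types here (Source, Action, Sink, StaticSource, TrimAndDropEmpty, CollectSink, Task) are hypothetical stand-ins written for this example.

```rust
// Hypothetical sketch of the Source -> Action -> Sink flow. NOT fetcher's real API:
// fetcher's traits are async, this sketch is synchronous for brevity.

/// Stage 1: produces raw items from somewhere external.
trait Source {
    fn fetch(&mut self) -> Vec<String>;
}

/// Stage 2: transforms or filters the fetched items.
trait Action {
    fn apply(&self, items: Vec<String>) -> Vec<String>;
}

/// Stage 3: delivers the final items to a destination.
trait Sink {
    fn send(&mut self, item: &str);
}

/// Returns a fixed list, standing in for e.g. an HTTP endpoint.
struct StaticSource(Vec<String>);

impl Source for StaticSource {
    fn fetch(&mut self) -> Vec<String> {
        self.0.clone()
    }
}

/// Trims whitespace and drops items that end up empty.
struct TrimAndDropEmpty;

impl Action for TrimAndDropEmpty {
    fn apply(&self, items: Vec<String>) -> Vec<String> {
        items
            .into_iter()
            .map(|s| s.trim().to_string())
            .filter(|s| !s.is_empty())
            .collect()
    }
}

/// Collects everything it receives, standing in for e.g. a chat bot.
#[derive(Default)]
struct CollectSink(Vec<String>);

impl Sink for CollectSink {
    fn send(&mut self, item: &str) {
        self.0.push(item.to_string());
    }
}

/// One pipeline instance: a source, a chain of actions, and a sink.
struct Task<S, K> {
    source: S,
    actions: Vec<Box<dyn Action>>,
    sink: K,
}

impl<S: Source, K: Sink> Task<S, K> {
    fn run(&mut self) {
        let mut items = self.source.fetch();
        // Each action receives the output of the previous one.
        for action in &self.actions {
            items = action.apply(items);
        }
        for item in &items {
            self.sink.send(item);
        }
    }
}

fn main() {
    let mut task = Task {
        source: StaticSource(vec!["  hello ".into(), "".into(), "world".into()]),
        actions: vec![Box::new(TrimAndDropEmpty) as Box<dyn Action>],
        sink: CollectSink::default(),
    };
    task.run();
    println!("{:?}", task.sink.0); // ["hello", "world"]
}
```

Keeping actions as a list of trait objects is what lets a single task chain any number of independent transformations.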

An Entry is the unit of data flowing through the pipeline. It contains:

  • id: A unique identifier for the entry, used for tracking read/unread status and replies.
  • raw_contents: The raw, untransformed data fetched from the source.
  • msg: A Message that contains the formatted and structured data (such as title, body, and link) that will eventually be sent to a sink.
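
The id is what makes read/unread tracking possible: an entry whose id was already seen can be skipped on the next fetch. The sketch below models the three fields and one such check. The struct layouts and the keep_unread helper are hypothetical; fetcher's real Entry and Message types have more fields and different types, and its read filter may mark entries as read at a different point (for example only after a successful send).

```rust
use std::collections::HashSet;

// Hypothetical sketch; NOT fetcher's real Entry/Message definitions.
#[allow(dead_code)]
#[derive(Debug, Clone, Default)]
struct Message {
    title: Option<String>,
    body: Option<String>,
    link: Option<String>,
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Entry {
    /// Unique id, used to remember which entries were already seen.
    id: Option<String>,
    /// Untouched data exactly as fetched from the source.
    raw_contents: Option<String>,
    /// Formatted data that a sink will eventually send.
    msg: Message,
}

/// Keeps only entries whose id has not been seen before and records those ids.
fn keep_unread(entries: Vec<Entry>, read: &mut HashSet<String>) -> Vec<Entry> {
    entries
        .into_iter()
        .filter(|e| match &e.id {
            // HashSet::insert returns false if the id was already present.
            Some(id) => read.insert(id.clone()),
            // Entries without an id cannot be tracked, so they always pass.
            None => true,
        })
        .collect()
}

fn entry(id: &str, title: &str) -> Entry {
    Entry {
        id: Some(id.to_string()),
        raw_contents: Some(format!("<item>{title}</item>")),
        msg: Message { title: Some(title.to_string()), ..Default::default() },
    }
}

fn main() {
    let mut read = HashSet::new();
    let first = keep_unread(vec![entry("1", "First"), entry("2", "Second")], &mut read);
    let second = keep_unread(vec![entry("2", "Second"), entry("3", "Third")], &mut read);
    println!("{}", first.len()); // 2
    let titles: Vec<_> = second.iter().filter_map(|e| e.msg.title.as_deref()).collect();
    println!("{:?}", titles); // ["Third"]
}
```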

A Job is a collection of tasks that are executed together, potentially on a schedule. Jobs can also be run either concurrently or in parallel as part of a JobGroup.
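
To make the Job/JobGroup relationship concrete, here is a minimal synchronous sketch, assuming a job simply runs all of its tasks in sequence on each scheduled run and a group runs its jobs on separate OS threads. Everything here (the Task alias, Job's fields, run_group) is hypothetical; fetcher's Job and JobGroup are async and expose a different API.

```rust
use std::thread;
use std::time::Duration;

// Hypothetical sketch; NOT fetcher's Job/JobGroup API.
/// Stand-in for a task: something that runs once and reports how many items it handled.
type Task = Box<dyn FnMut() -> usize + Send>;

struct Job {
    name: &'static str,
    tasks: Vec<Task>,
    /// Pause between runs; `None` means "run once".
    every: Option<Duration>,
    /// Number of scheduled runs (finite so the sketch terminates).
    runs: u32,
}

impl Job {
    /// Runs all tasks together, repeating on the schedule. Returns the total item count.
    fn run(&mut self) -> usize {
        let runs = if self.every.is_some() { self.runs } else { 1 };
        let mut total = 0;
        for i in 0..runs {
            for task in &mut self.tasks {
                total += (*task)();
            }
            // Sleep between runs, but not after the last one.
            if let Some(every) = self.every {
                if i + 1 < runs {
                    thread::sleep(every);
                }
            }
        }
        total
    }
}

/// Runs several jobs in parallel, one scoped OS thread per job.
fn run_group(jobs: &mut [Job]) -> Vec<(&'static str, usize)> {
    thread::scope(|s| {
        let handles: Vec<_> = jobs
            .iter_mut()
            .map(|job| s.spawn(move || (job.name, job.run())))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let counter = |n: usize| -> Task { Box::new(move || n) };
    let mut jobs = vec![
        Job { name: "news", tasks: vec![counter(2), counter(3)], every: Some(Duration::from_millis(10)), runs: 3 },
        Job { name: "mail", tasks: vec![counter(1)], every: None, runs: 0 },
    ];
    println!("{:?}", run_group(&mut jobs)); // [("news", 15), ("mail", 1)]
}
```

A thread per job gives true parallelism; running jobs concurrently on a single thread would instead need an async runtime, which is closer to what an async framework like fetcher uses.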

Getting started

To use fetcher, add it, along with tokio, to the dependencies in your Cargo.toml file:

[dependencies]
fetcher = "0.15"
tokio = { version = "1", features = ["full"] }

For a minimal example of how to use fetcher, see examples/simple.rs.

More complete examples can be found in the examples/ directory. They demonstrate how to:

  • Fetch data from various sources.
  • Transform and filter data using regular expressions, HTML parsing, and JSON parsing.
  • Send data to sinks like Telegram and Discord.
  • Implement custom sources, actions, and sinks.
  • Persist the read filter state in an external storage system.
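
As an illustration of what a custom sink can look like, the sketch below writes each message as plain text to any std::io::Write implementor, which covers files, stdout, and a child process's stdin (std::process::ChildStdin implements Write). The SendMessage trait, Message struct, and WriterSink type are hypothetical and are not fetcher's Sink trait; see the examples/ directory for the real extension points.

```rust
use std::io::{self, Write};

// Hypothetical sketch of a custom sink; NOT fetcher's Sink trait or Message type.
struct Message {
    title: Option<String>,
    body: String,
    link: Option<String>,
}

trait SendMessage {
    fn send(&mut self, msg: &Message) -> io::Result<()>;
}

/// Writes each message as plain text to any writer (file, stdout, a child's stdin, ...).
struct WriterSink<W: Write> {
    out: W,
}

impl<W: Write> SendMessage for WriterSink<W> {
    fn send(&mut self, msg: &Message) -> io::Result<()> {
        if let Some(title) = &msg.title {
            writeln!(self.out, "# {title}")?;
        }
        writeln!(self.out, "{}", msg.body)?;
        if let Some(link) = &msg.link {
            writeln!(self.out, "<{link}>")?;
        }
        writeln!(self.out)?; // blank line between messages
        self.out.flush()
    }
}

fn main() -> io::Result<()> {
    // A Vec<u8> stands in for a real destination so the output can be inspected.
    let mut sink = WriterSink { out: Vec::new() };
    sink.send(&Message {
        title: Some("Release".into()),
        body: "v1.0 is out".into(),
        link: Some("https://example.com/v1".into()),
    })?;
    print!("{}", String::from_utf8(sink.out).unwrap());
    Ok(())
}
```

Being generic over the writer keeps the formatting logic testable in memory while the same type can feed a real destination.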

§Contributing

Contributions are very welcome! Please feel free to submit a pull request or open issues for any bugs, feature requests, or general feedback.

Re-exports

pub use crate::job::Job;
pub use crate::task::Task;
pub use either;
pub use url;

Modules

actions
This module contains all Actions that a list of Entries can be run through to view, modify, or filter them out
auth
This module contains all external manual authentication implementations. For now it’s just Google OAuth2
ctrl_c_signal
This module contains the CtrlCSignalChannel
entry
This module contains the basic building block of fetcher, the Entry, which is passed throughout the program and which all modules either create, modify, or consume
error
This module contains all errors that fetcher can emit
exec
This module contains Exec source and sink. It is re-exported in the crate::sinks and crate::sources modules
external_save
This module contains the ExternalSave trait, which implementors can use to save read filter data and the entry-to-message map externally
job
This module contains the Job struct, the entry point to the library
maybe_send
This module contains MaybeSend, MaybeSync, and MaybeSendSync traits
read_filter
This module contains the ReadFilter, which is used to keep track of which Entries have or have not been read, including all of its strategies
scaffold
This module contains a “scaffold”, in other words, functions that pre-configure your application for common uses of fetcher.
sinks
This module contains Sinks that can be used to consume a composed Message, as well as the message module itself
sources
This module contains Sources that can fetch data and create new Entries out of it
task
This module contains the Task, the basic building block of fetcher
utils
Miscellaneous utility extension traits for external types
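
The read_filter and external_save modules work together: the filter decides what counts as unread, and external storage lets that decision survive a restart. The sketch below shows one possible strategy (remember every id marked as read) backed by a pluggable store. The SaveReadState trait, StringStore, and this ReadFilter are hypothetical illustrations, not the ReadFilter or ExternalSave types from fetcher.

```rust
use std::collections::HashSet;

// Hypothetical sketch; NOT fetcher's ReadFilter or ExternalSave types.
/// Somewhere outside the process to persist read state (a file, a database, ...).
trait SaveReadState {
    fn save(&mut self, ids: &[String]);
    fn load(&self) -> Vec<String>;
}

/// In-memory stand-in for external storage: a newline-separated string.
#[derive(Default)]
struct StringStore(String);

impl SaveReadState for StringStore {
    fn save(&mut self, ids: &[String]) {
        self.0 = ids.join("\n");
    }
    fn load(&self) -> Vec<String> {
        self.0.lines().map(str::to_owned).collect()
    }
}

/// One possible strategy: remember every id that has been marked as read.
struct ReadFilter<S: SaveReadState> {
    read: HashSet<String>,
    store: S,
}

impl<S: SaveReadState> ReadFilter<S> {
    /// Restores previously saved state from the store.
    fn new(store: S) -> Self {
        let read = store.load().into_iter().collect();
        Self { read, store }
    }

    fn is_unread(&self, id: &str) -> bool {
        !self.read.contains(id)
    }

    /// Marks an id as read and persists the whole set if it changed.
    fn mark_as_read(&mut self, id: &str) {
        if self.read.insert(id.to_owned()) {
            let mut ids: Vec<String> = self.read.iter().cloned().collect();
            ids.sort(); // deterministic order in storage
            self.store.save(&ids);
        }
    }
}

fn main() {
    let mut filter = ReadFilter::new(StringStore::default());
    filter.mark_as_read("b");
    filter.mark_as_read("a");
    // Simulate a restart by rebuilding the filter from what was saved.
    let restored = ReadFilter::new(StringStore(filter.store.0.clone()));
    println!("{} {}", restored.is_unread("a"), restored.is_unread("c")); // false true
}
```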

Structs

StaticStr
A string that always has a ’static lifetime.
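
One common way to build a string type that is always 'static is to wrap Cow<'static, str>: literals are borrowed without allocating, and runtime strings are owned, so no borrow with a shorter lifetime is ever stored. The sketch below shows that idea under the assumption that this is roughly what StaticStr provides; fetcher's actual definition and trait impls may differ.

```rust
use std::borrow::Cow;
use std::ops::Deref;

// Hypothetical sketch of a 'static string wrapper; fetcher's StaticStr may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
struct StaticStr(Cow<'static, str>);

impl From<&'static str> for StaticStr {
    fn from(s: &'static str) -> Self {
        StaticStr(Cow::Borrowed(s)) // string literals: no allocation
    }
}

impl From<String> for StaticStr {
    fn from(s: String) -> Self {
        StaticStr(Cow::Owned(s)) // runtime strings are owned, so nothing short-lived is borrowed
    }
}

impl Deref for StaticStr {
    type Target = str;
    fn deref(&self) -> &str {
        &self.0
    }
}

fn main() {
    let literal: StaticStr = "hello".into();
    let runtime: StaticStr = ["hel", "lo"].concat().into();
    println!("{}", literal == runtime); // true: Cow compares by contents
    println!("{}", runtime.len()); // 5, via Deref to str
    println!("{}", matches!(literal.0, Cow::Borrowed(_))); // true
}
```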