article_extractor/
lib.rs

//! # article scraper
//!
//! The `article_scraper` crate provides a simple way to extract meaningful content from the web.
//! It offers two ways of locating the desired content:
//!
//! ## 1. Rust implementation of [Full-Text RSS](https://www.fivefilters.org/full-text-rss/)
//!
//! This approach uses website-specific extraction rules, which give fast and accurate results.
//! The disadvantages, however, are that a separate extraction rule is needed for every website and that each rule has to be updated whenever the website changes.
//!
//! A central repository of extraction rules, along with information about writing your own rules, can be found here: [ftr-site-config](https://github.com/fivefilters/ftr-site-config).
//! Please consider contributing new rules or updates to it.
//!
//! `article_scraper` embeds all the rules in the ftr-site-config repository for convenience. Custom and updated rules can be loaded from a `user_configs` path.
//!
//! ## 2. Mozilla Readability
//!
//! If the ftr-config based extraction fails, the [Mozilla Readability](https://github.com/mozilla/readability) algorithm is used as a fall-back.
//! This re-implementation tries to mimic the original as closely as possible.

mod article;
pub mod clean;
mod constants;
mod error;
mod full_text_parser;
mod image_object;
mod util;
mod video_object;

pub use article::Article;
#[doc(hidden)]
pub use full_text_parser::config::ConfigEntry as FtrConfigEntry;
#[doc(hidden)]
pub use full_text_parser::FullTextParser;
pub use full_text_parser::Readability;
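
The two-stage strategy described in the crate documentation above (site-specific rules first, a generic algorithm as the fall-back) can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical and is not this crate's API: `SiteRule`, `extract_with_rule`, `extract_generic`, and `extract` are made-up names, a "rule" is reduced to a pair of literal start/end markers, and the generic stage is a crude longest-paragraph heuristic standing in for Readability.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a site-specific extraction rule:
/// two literal markers that delimit the article body.
struct SiteRule {
    start: &'static str,
    end: &'static str,
}

/// Stage 1: apply the rule registered for `host`, if there is one.
/// Returns `None` when no rule exists or its markers are not found.
fn extract_with_rule<'a>(
    rules: &HashMap<&str, SiteRule>,
    host: &str,
    html: &'a str,
) -> Option<&'a str> {
    let rule = rules.get(host)?;
    let from = html.find(rule.start)? + rule.start.len();
    let len = html[from..].find(rule.end)?;
    Some(html[from..from + len].trim())
}

/// Stage 2: generic fall-back. This picks the longest `<p>` paragraph,
/// a deliberately simple assumption, not the Readability algorithm.
fn extract_generic(html: &str) -> Option<&str> {
    html.split("<p>")
        .skip(1) // everything before the first `<p>` is not a paragraph
        .filter_map(|chunk| chunk.split("</p>").next())
        .map(str::trim)
        .filter(|text| !text.is_empty())
        .max_by_key(|text| text.len())
}

/// Rule-based extraction first; the generic heuristic only runs if it fails.
fn extract<'a>(
    rules: &HashMap<&str, SiteRule>,
    host: &str,
    html: &'a str,
) -> Option<&'a str> {
    extract_with_rule(rules, host, html).or_else(|| extract_generic(html))
}
```

In this sketch a host with a matching rule never reaches the generic stage, which mirrors the trade-off in the crate docs: rules are precise but must exist and stay current for each site, while the fall-back works anywhere at lower accuracy.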