Crate web_archive

This crate downloads a web page, fetches the image, JavaScript, and CSS resources it links to, and embeds them in the HTML.

Both async and blocking APIs are provided, making use of reqwest’s support for both. The blocking APIs are enabled with the blocking feature.
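
To make "embed them in the HTML" concrete, the sketch below inlines one fetched resource as a base64 `data:` URI. It is not the crate's implementation: `base64_encode`, `data_uri`, and `embed_resource` are hypothetical helpers written only for illustration, and the plain string replacement stands in for the HTML parsing a real archiver would need.

```rust
/// Standard Base64 alphabet (RFC 4648).
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode bytes as padded Base64 (hypothetical helper, std only).
fn base64_encode(data: &[u8]) -> String {
    let mut out = String::with_capacity((data.len() + 2) / 3 * 4);
    for chunk in data.chunks(3) {
        // Pack up to three bytes into the low 24 bits of a u32.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;

        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        // Emit '=' padding when the final chunk is short.
        if chunk.len() > 1 {
            out.push(ALPHABET[(n >> 6) as usize & 63] as char);
        } else {
            out.push('=');
        }
        if chunk.len() > 2 {
            out.push(ALPHABET[n as usize & 63] as char);
        } else {
            out.push('=');
        }
    }
    out
}

/// Build a `data:` URI carrying the resource bytes inline.
fn data_uri(mime: &str, bytes: &[u8]) -> String {
    format!("data:{};base64,{}", mime, base64_encode(bytes))
}

/// Replace a double-quoted attribute value equal to `url` with a data URI.
/// Assumption: the URL appears verbatim in double quotes; no HTML parsing.
fn embed_resource(html: &str, url: &str, mime: &str, bytes: &[u8]) -> String {
    let quoted_url = format!("\"{}\"", url);
    let quoted_uri = format!("\"{}\"", data_uri(mime, bytes));
    html.replace(&quoted_url, &quoted_uri)
}
```

Calling `embed_resource(page, "a.png", "image/png", &png_bytes)` on a page that references `"a.png"` returns HTML that no longer needs a network request for that image, which is the property an embedded archive relies on.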

§Examples

§Async

use web_archive::archive;

// Fetch page and all its resources
let archive = archive("http://example.com", Default::default())
    .await
    .unwrap();

// Embed the resources into the page
let page = archive.embed_resources();
println!("{}", page);

§Blocking

use web_archive::blocking;

// Fetch page and all its resources
let archive =
    blocking::archive("http://example.com", Default::default()).unwrap();

// Embed the resources into the page
let page = archive.embed_resources();
println!("{}", page);

§Ignore certificate errors (dangerous!)

use web_archive::{archive, ArchiveOptions};

// Accept invalid TLS certificates; all other options keep their defaults
let archive_options = ArchiveOptions {
    accept_invalid_certificates: true,
    ..Default::default()
};

// Fetch page and all its resources
let archive = archive("http://example.com", archive_options)
    .await
    .unwrap();

// Embed the resources into the page
let page = archive.embed_resources();
println!("{}", page);
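
The `..Default::default()` in the example above is Rust's struct update syntax: the named field is overridden and every other field takes its `Default` value. The sketch below shows the pattern on a stand-in type, since the full field list of `ArchiveOptions` is not shown here. `OptionsSketch`, its `max_redirects` field, and `insecure_options` are hypothetical names invented for illustration; only `accept_invalid_certificates` comes from the example.

```rust
/// Hypothetical stand-in for `ArchiveOptions`. Only
/// `accept_invalid_certificates` appears in the crate's example;
/// `max_redirects` is invented to show a defaulted field.
#[derive(Debug, Default, Clone, PartialEq)]
struct OptionsSketch {
    accept_invalid_certificates: bool,
    max_redirects: u32,
}

/// Build options that override one field and default the rest.
fn insecure_options() -> OptionsSketch {
    OptionsSketch {
        accept_invalid_certificates: true,
        // Every field not named above is taken from `Default::default()`.
        ..Default::default()
    }
}
```

Writing options this way keeps call sites stable if the options struct later gains fields, because unnamed fields continue to fall back to their defaults.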

Re-exports§

pub use error::Error;
pub use page_archive::PageArchive;
pub use parsing::ImageResource;
pub use parsing::Resource;
pub use parsing::ResourceMap;
pub use parsing::ResourceUrl;

Modules§

blocking
Blocking counterpart of the async API, enabled with the blocking feature
error
Module for the error handling functionality
page_archive
Module for the core archiving functionality
parsing
Module for the core parsing functionality

Structs§

ArchiveOptions
Configuration options to control aspects of the archiving behaviour.

Functions§

archive
The async archive function.