Crate web_archive

This crate downloads a web page, then downloads its linked image, JavaScript, and CSS resources and embeds them in the HTML.
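
To make "embed them in the HTML" concrete, here is a minimal std-only sketch of one common way to inline a fetched resource: rewriting an image's src attribute to a base64 data URI. This is an illustration of the general idea, not the crate's actual implementation; base64_encode and embed_image are hypothetical helpers that do not exist in web_archive, and the naive string replacement assumes the URL appears verbatim in the markup.

```rust
// Illustrative only: none of these names come from web_archive.

const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 (RFC 4648) with `=` padding.
fn base64_encode(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = chunk.get(1).copied().unwrap_or(0) as u32;
        let b2 = chunk.get(2).copied().unwrap_or(0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;

        out.push(ALPHABET[((n >> 18) & 63) as usize] as char);
        out.push(ALPHABET[((n >> 12) & 63) as usize] as char);
        // Emit padding for the positions that had no input byte.
        if chunk.len() > 1 {
            out.push(ALPHABET[((n >> 6) & 63) as usize] as char);
        } else {
            out.push('=');
        }
        if chunk.len() > 2 {
            out.push(ALPHABET[(n & 63) as usize] as char);
        } else {
            out.push('=');
        }
    }
    out
}

/// Replace every `src="<url>"` in `html` with a data URI built from `data`.
fn embed_image(html: &str, url: &str, mime: &str, data: &[u8]) -> String {
    let needle = format!("src=\"{}\"", url);
    let replacement = format!("src=\"data:{};base64,{}\"", mime, base64_encode(data));
    html.replace(&needle, &replacement)
}

fn main() {
    let html = r#"<img src="http://example.com/logo.png">"#;
    // The three bytes "Man" stand in for real image data.
    let page = embed_image(html, "http://example.com/logo.png", "image/png", b"Man");
    println!("{}", page); // prints <img src="data:image/png;base64,TWFu">
}
```

The resulting page no longer needs the network to render that image, which is the point of a self-contained archive.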

Both async and blocking APIs are provided, built on reqwest’s support for both. The blocking API is opt-in: enable the crate's blocking Cargo feature to use it.

§Examples

§Async

use web_archive::archive;

// Fetch page and all its resources
let archive = archive("http://example.com", Default::default())
    .await
    .unwrap();

// Embed the resources into the page
let page = archive.embed_resources();
println!("{}", page);

§Blocking

use web_archive::blocking;

// Fetch page and all its resources
let archive =
    blocking::archive("http://example.com", Default::default()).unwrap();

// Embed the resources into the page
let page = archive.embed_resources();
println!("{}", page);

§Ignore certificate errors (dangerous!)

use web_archive::{archive, ArchiveOptions};

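// Accept invalid TLS certificates; keep every other option at its default
let archive_options = ArchiveOptions {
    accept_invalid_certificates: true,
    ..Default::default()
};

// Fetch page and all its resources (certificates only matter over HTTPS)
let archive = archive("https://example.com", archive_options)
    .await
    .unwrap();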

// Embed the resources into the page
let page = archive.embed_resources();
println!("{}", page);
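
The ..Default::default() in the example above is Rust's struct update syntax: fields named explicitly take the given values, and every remaining field is copied from the default instance. A small std-only sketch of the pattern follows. DemoOptions is a hypothetical stand-in, not the crate's real ArchiveOptions, and its max_redirects field is invented purely to show a field that keeps its default.

```rust
// Hypothetical stand-in; NOT the actual definition of web_archive::ArchiveOptions.
#[derive(Debug, Default, PartialEq)]
struct DemoOptions {
    accept_invalid_certificates: bool,
    // Invented field, present only to show that unnamed fields keep their default.
    max_redirects: u32,
}

/// Override one field and take the rest from `Default`.
fn dangerous_options() -> DemoOptions {
    DemoOptions {
        accept_invalid_certificates: true,
        ..Default::default()
    }
}

fn main() {
    let opts = dangerous_options();
    println!("{:?}", opts);
}
```

Writing options this way means the call site names only the fields it overrides and picks up the default for any field it does not mention.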

Re-exports§

pub use error::Error;
pub use page_archive::PageArchive;
pub use parsing::ImageResource;
pub use parsing::Resource;
pub use parsing::ResourceMap;
pub use parsing::ResourceUrl;

Modules§

blocking: Blocking versions of the archiving API (requires the blocking feature)
error: Module for the crate's error type
page_archive: Module for the core archiving functionality
parsing: Module for the core parsing functionality

Structs§

ArchiveOptions: Configuration options to control aspects of the archiving behaviour.

Functions§

archive: The async archive function; fetches a page and all of its resources.
