Crate web_archive

This crate downloads a web page, then downloads the image, JavaScript, and CSS resources it links to and embeds them in the HTML.

Both async and blocking APIs are provided, making use of reqwest's support for both. The blocking APIs are enabled with the blocking feature.
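
To make the idea of embedding concrete, here is a minimal sketch of inlining already-downloaded stylesheets into a page using only the standard library. It is not this crate's implementation: the function name embed_stylesheets, the exact link-tag shape it matches, and the use of plain string replacement are all assumptions made for illustration. A real archiver would parse the HTML rather than match strings.

```rust
use std::collections::HashMap;

/// Hypothetical helper, NOT part of web_archive.
///
/// For every (URL, CSS text) pair in `fetched`, replaces the exact tag
/// `<link rel="stylesheet" href="URL">` with an inline `<style>` element
/// holding the CSS. Tags written in any other form are left untouched.
fn embed_stylesheets(html: &str, fetched: &HashMap<String, String>) -> String {
    let mut page = html.to_string();
    for (url, css) in fetched {
        // The tag as it is assumed to appear in the page source.
        let link_tag = format!(r#"<link rel="stylesheet" href="{}">"#, url);
        // The self-contained replacement that needs no network access.
        let style_tag = format!("<style>{}</style>", css);
        page = page.replace(&link_tag, &style_tag);
    }
    page
}
```

Calling this with a map from "http://example.com/site.css" to the downloaded CSS text yields a page whose stylesheet link has been swapped for a style element, so the result can be viewed offline. Images and scripts can be embedded along the same lines, for instance with data URIs and inline script elements.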

Examples

Async

use web_archive::archive;

// Fetch page and all its resources
let archive = archive("http://example.com", Default::default())
    .await
    .unwrap();

// Embed the resources into the page
let page = archive.embed_resources();
println!("{}", page);

Blocking

use web_archive::blocking;

// Fetch page and all its resources
let archive =
    blocking::archive("http://example.com", Default::default()).unwrap();

// Embed the resources into the page
let page = archive.embed_resources();
println!("{}", page);

Ignore certificate errors (dangerous!)

use web_archive::{archive, ArchiveOptions};

// Accept invalid TLS certificates (dangerous: disables certificate checks)
let archive_options = ArchiveOptions {
    accept_invalid_certificates: true,
    ..Default::default()
};

// Fetch page and all its resources
let archive = archive("http://example.com", archive_options)
    .await
    .unwrap();

// Embed the resources into the page
let page = archive.embed_resources();
println!("{}", page);

Re-exports

pub use error::Error;
pub use page_archive::PageArchive;
pub use parsing::ImageResource;
pub use parsing::Resource;
pub use parsing::ResourceMap;
pub use parsing::ResourceUrl;

Modules

blocking

Blocking API, enabled with the blocking feature

error

Module for the error parsing functionality

page_archive

Module for the core archiving functionality

parsing

Module for the core parsing functionality

Structs

ArchiveOptions

Configuration options to control aspects of the archiving behaviour.

Functions

archive

The async archive function.