internetarchive-rs 0.1.4

Async Rust client for Internet Archive item metadata, search, uploads, metadata updates, and downloads.

internetarchive-rs

internetarchive-rs is an async Rust client for working with Internet Archive items. It supports public metadata reads, advanced search, authenticated uploads and deletes, metadata updates, public downloads, and higher-level create or upsert workflows.

InternetArchiveClient is the main entry point. Use SearchQuery for advanced search, ItemMetadata and UploadSpec to describe uploads, and PatchOperation with MetadataTarget for exact low-level metadata writes. For higher-level item creation or updates, use InternetArchiveClient::publish_item and InternetArchiveClient::upsert_item.

Read Example

use internetarchive_rs::{InternetArchiveClient, ItemIdentifier};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = InternetArchiveClient::new()?;
    let identifier = ItemIdentifier::new("xfetch")?;
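    // Resolve the download for one file in the item. The resulting URL
    // ends with /download/<identifier>/<filename>.
    let download = client.resolve_download(&identifier, "xfetch.pdf")?;
    assert!(download.url.as_str().ends_with("/download/xfetch/xfetch.pdf"));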

    Ok(())
}

Search Example

use internetarchive_rs::{Endpoint, SearchQuery, SortDirection};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let query = SearchQuery::builder("collection:opensource AND mediatype:texts")
        .field("identifier")
        .field("title")
        .rows(5)
        .sort("publicdate", SortDirection::Desc)
        .build();

    let url = query.into_url(Endpoint::default().search_url()?)?;
    assert!(url.as_str().contains("collection%3Aopensource"));
    assert!(url.as_str().contains("sort%5B%5D=publicdate+desc"));

    Ok(())
}
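
The two assertions in the search example check form-encoded text: `:` becomes `%3A`, `[` and `]` become `%5B` and `%5D`, and a space becomes `+`. The following is a minimal std-only sketch of that encoding, not the crate's implementation. The helper names `form_encode` and `query_string` are hypothetical, and the `sort[]` parameter name is inferred from the assertion above.

```rust
/// Hypothetical helper: encodes one component the way HTML form encoding does.
/// ASCII letters, digits, `*`, `-`, `.`, and `_` pass through, a space
/// becomes `+`, and every other byte becomes `%XX` in uppercase hex.
fn form_encode(input: &str) -> String {
    let mut out = String::new();
    for b in input.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'*' | b'-' | b'.' | b'_' => {
                out.push(b as char)
            }
            b' ' => out.push('+'),
            _ => out.push_str(&format!("%{b:02X}")),
        }
    }
    out
}

/// Hypothetical helper: joins encoded key/value pairs into a query string.
fn query_string(pairs: &[(&str, &str)]) -> String {
    pairs
        .iter()
        .map(|(k, v)| format!("{}={}", form_encode(k), form_encode(v)))
        .collect::<Vec<_>>()
        .join("&")
}
```

Calling `query_string(&[("sort[]", "publicdate desc")])` produces the `sort%5B%5D=publicdate+desc` fragment that the example asserts on.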

Publish Example

use internetarchive_rs::{
    InternetArchiveClient, ItemIdentifier, ItemMetadata, MediaType, PublishRequest, UploadSpec,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = InternetArchiveClient::new()?;
    let upload = UploadSpec::from_path_as("/tmp/build/artifact.tmp", "artifact.txt")?;
    let request = PublishRequest::new(
        ItemIdentifier::new("my-demo-item-2026-04-18")?,
        ItemMetadata::builder()
            .mediatype(MediaType::Texts)
            .title("internetarchive-rs example")
            .description_html("<p>Created from Rust</p>")
            .date("2026-04-18")
            .collection("opensource")
            .publisher("internetarchive-rs")
            .language("eng")
            .rights("CC BY 4.0")
            .build(),
        vec![upload],
    );

    assert!(!client.has_auth());
    assert_eq!(request.identifier.as_str(), "my-demo-item-2026-04-18");
    assert_eq!(request.uploads[0].filename, "artifact.txt");

    Ok(())
}

Low-Level Metadata Patch Example

use internetarchive_rs::{MetadataChange, MetadataTarget, PatchOperation};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let change = MetadataChange::new(
        &MetadataTarget::Metadata,
        vec![PatchOperation::replace("/title", "Updated title")],
    );
    let json = serde_json::to_string(&change)?;
    assert!(json.contains("\"target\":\"metadata\""));
    assert!(json.contains("\"op\":\"replace\""));

    Ok(())
}

Authentication

InternetArchiveClient::new() is enough for public metadata reads, searches, and downloads.

Authenticated write helpers use Internet Archive's S3-style LOW authorization scheme and read credentials from these standard environment variables: INTERNET_ARCHIVE_ACCESS_KEY and INTERNET_ARCHIVE_SECRET_KEY. You can create the S3-style keys on the official Internet Archive API key page at https://archive.org/account/s3.php.
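
As a rough illustration of what those two variables feed into, here is a std-only sketch that reads them and formats an `authorization` header value in the `LOW access:secret` form used by IA-S3. Both helper names are hypothetical and are not part of internetarchive-rs. The crate's own credential loading may differ, for example in how it treats empty values.

```rust
use std::env;

/// Hypothetical helper (not a crate API): reads the two environment
/// variables named above. Returns None if either is unset or empty.
fn credentials_from_env() -> Option<(String, String)> {
    let access = env::var("INTERNET_ARCHIVE_ACCESS_KEY").ok()?;
    let secret = env::var("INTERNET_ARCHIVE_SECRET_KEY").ok()?;
    if access.is_empty() || secret.is_empty() {
        return None;
    }
    Some((access, secret))
}

/// Hypothetical helper: formats the header value as `LOW access:secret`.
fn low_auth_header(access: &str, secret: &str) -> String {
    format!("LOW {access}:{secret}")
}
```

Keeping the formatting separate from the environment lookup makes the header logic easy to test without touching process-wide state.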

Identifier Rules

General item identifiers follow the official Internet Archive metadata schema: ASCII letters and digits, underscores, dashes, and periods are allowed. The first character must be a letter or digit. The maximum length is 100 characters.

IA-S3 maps items to S3-style buckets when creating new items, so create, publish, and upsert paths that create an item validate a conservative bucket-compatible subset locally before making that create request: 3 to 63 characters, lowercase ASCII letters, digits, periods, and dashes only, starting and ending with a letter or digit, with no adjacent periods, no period next to a dash, and no IPv4-address shape. This bucket-creation check is intentionally narrower than IA's general identifier rules and the Python client's optional S3 identifier validator.

Existing-item upload, delete, and upload-limit checks still accept the broader documented item identifier shape and leave any endpoint-specific rejection to IA. Identifier validation failures are returned as InternetArchiveError::Identifier.
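
The two rule sets above can be sketched with std-only Rust as follows. This is an illustration of the stated rules, not the crate's validator. The function names are hypothetical, and the IPv4-shape check (four dot-separated groups of one to three digits) is an assumption about what "IPv4-address shape" means.

```rust
/// Hypothetical helper: the general item identifier shape.
/// ASCII letters, digits, `_`, `-`, `.`; first character a letter or digit;
/// at most 100 characters.
fn is_general_identifier(id: &str) -> bool {
    let first_ok = matches!(id.chars().next(), Some(c) if c.is_ascii_alphanumeric());
    first_ok
        && id.len() <= 100
        && id
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '-' | '.'))
}

/// Assumed definition of "IPv4-address shape": four dot-separated
/// groups, each made of one to three ASCII digits.
fn looks_like_ipv4(id: &str) -> bool {
    let parts: Vec<&str> = id.split('.').collect();
    parts.len() == 4
        && parts
            .iter()
            .all(|p| (1..=3).contains(&p.len()) && p.bytes().all(|b| b.is_ascii_digit()))
}

/// Hypothetical helper: the narrower bucket-compatible subset.
fn is_bucket_compatible(id: &str) -> bool {
    let bytes = id.as_bytes();
    if !(3..=63).contains(&bytes.len()) {
        return false;
    }
    let edge_ok = |b: u8| b.is_ascii_lowercase() || b.is_ascii_digit();
    // Only lowercase letters, digits, periods, and dashes are allowed.
    if !bytes.iter().all(|&b| edge_ok(b) || b == b'.' || b == b'-') {
        return false;
    }
    // Must start and end with a letter or digit.
    if !edge_ok(bytes[0]) || !edge_ok(bytes[bytes.len() - 1]) {
        return false;
    }
    // Reject "..", ".-", and "-." anywhere in the identifier.
    let bad_pair = |w: &[u8]| {
        matches!((w[0], w[1]), (b'.', b'.') | (b'.', b'-') | (b'-', b'.'))
    };
    if bytes.windows(2).any(bad_pair) {
        return false;
    }
    !looks_like_ipv4(id)
}
```

Under these rules, an identifier such as `My_Item.v2` passes the general check but fails the bucket-compatible one, which mirrors the distinction between existing-item operations and item creation described above.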

Progress Bars

Enable the optional indicatif feature if you want upload and download helpers that update a progress bar:

internetarchive-rs = { version = "0.1.3", features = ["indicatif"] }

The crate re-exports indicatif when that feature is enabled, so you can use internetarchive_rs::indicatif::ProgressBar without adding a separate direct dependency.

Operational Notes

Internet Archive's own upload-limit guidance is inconsistent, so the safest choice is to plan conservatively. The official Uploading - Troubleshooting page, updated on August 2, 2021, says a single file should stay around 500 to 700 GB, recommends keeping an item under 10,000 files and 1 TB total, and notes that the API can technically accept up to 250,000 files. The official Uploading - Tips page, updated on August 25, 2021, instead says there is no hard size or file-count limit, but still recommends staying under 50 GB and 1,000 files per single page. For automated ingest, it is better to treat these pages as operational guidance than as a strict contract.
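
One way to plan conservatively is a local pre-flight check before starting an automated upload. The sketch below uses the lower figures quoted above from the Uploading - Tips page (50 GB and 1,000 files) as soft thresholds. It is not a crate API: the names are hypothetical, and treating 1 GB as 10^9 bytes is an assumption.

```rust
/// Soft planning thresholds from the figures quoted above.
/// Assumption: 1 GB is treated as 10^9 bytes.
const MAX_ITEM_BYTES: u64 = 50 * 1_000_000_000;
const MAX_ITEM_FILES: usize = 1_000;

/// Hypothetical warning type for an upload plan that exceeds a threshold.
#[derive(Debug, PartialEq)]
enum PlanWarning {
    TooManyFiles { count: usize },
    TooLarge { total_bytes: u64 },
}

/// Hypothetical helper: takes the planned file sizes in bytes and
/// returns a warning for each soft threshold the plan exceeds.
fn check_plan(file_sizes: &[u64]) -> Vec<PlanWarning> {
    let mut warnings = Vec::new();
    if file_sizes.len() > MAX_ITEM_FILES {
        warnings.push(PlanWarning::TooManyFiles {
            count: file_sizes.len(),
        });
    }
    // Saturating addition avoids overflow on absurdly large inputs.
    let total_bytes = file_sizes
        .iter()
        .fold(0u64, |acc, size| acc.saturating_add(*size));
    if total_bytes > MAX_ITEM_BYTES {
        warnings.push(PlanWarning::TooLarge { total_bytes });
    }
    warnings
}
```

Returning warnings instead of an error reflects the point that these figures are guidance, so the caller decides whether to split the item or proceed.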

Visibility is eventually consistent rather than immediate. The official Uploading - A Basic Guide says item creation and follow-on tasks can take seconds, hours, or days depending on the amount and type of uploaded data. The official Problems or errors and Uploading - Troubleshooting pages mention queued, running, paused, or failed tasks, 503-slowdown-spam responses, temporary read-only item servers, and cases where users are told to wait up to 24 hours before assuming an upload is missing.
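
Because of that delay, code that needs to observe a freshly written item usually polls with a growing delay instead of checking once. The following std-only sketch shows one such loop with capped exponential backoff. The helper name and its parameters are hypothetical. It blocks the current thread, so in async code you would presumably swap `std::thread::sleep` for your runtime's sleep.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: calls `check` until it returns true or the
/// attempts run out. After each miss it sleeps, then doubles the delay,
/// capped at `max_delay`. Returns whether `check` ever succeeded.
fn wait_until_visible<F>(
    mut check: F,
    max_attempts: u32,
    initial_delay: Duration,
    max_delay: Duration,
) -> bool
where
    F: FnMut() -> bool,
{
    let mut delay = initial_delay;
    for attempt in 1..=max_attempts {
        if check() {
            return true;
        }
        // Do not sleep after the final failed attempt.
        if attempt < max_attempts {
            sleep(delay);
            delay = delay.saturating_mul(2).min(max_delay);
        }
    }
    false
}
```

The `check` closure is where a metadata read or download probe would go. Given the multi-hour delays mentioned above, a generous `max_delay` and a clear "still pending" outcome are likely more useful than treating the first miss as a failure.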

On retention, the official Archive.org Information page says uploads are duplicated or backed up at various locations and that the Archive's intention is to store materials in perpetuity. That is a strong preservation statement, but it is not presented as a formal durability or uptime SLA. The official sources linked above do not publish an uptime guarantee. The closest operational reference they provide is archive.org/stats, which is mentioned by the Help Center's Internet Archive Statistics page.