Module gdocs

Google Docs capture module.

Supports API-based capture of Google Docs documents via the export URL pattern: https://docs.google.com/document/d/{DOCUMENT_ID}/export?format={FORMAT}
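A minimal sketch of the URL handling this pattern implies, assuming the document ID is the path segment that follows `/document/d/`. The helpers `doc_id_from_url` and `export_url` are hypothetical. They are not the module's `extract_document_id` or `build_export_url`, whose exact signatures are not shown here, and the sketch ignores less common URL variants.

```rust
/// Hypothetical helper: pull the document ID out of a URL such as
/// https://docs.google.com/document/d/{DOCUMENT_ID}/edit
fn doc_id_from_url(url: &str) -> Option<&str> {
    const MARKER: &str = "docs.google.com/document/d/";
    // Locate the marker; return None if this is not a Google Docs document URL.
    let start = url.find(MARKER)? + MARKER.len();
    let rest = &url[start..];
    // The ID ends at the next '/', '?' or '#', or at the end of the string.
    let end = rest
        .find(|c: char| c == '/' || c == '?' || c == '#')
        .unwrap_or(rest.len());
    let id = &rest[..end];
    if id.is_empty() {
        None
    } else {
        Some(id)
    }
}

/// Hypothetical helper: fill in the export URL pattern for a given format.
fn export_url(doc_id: &str, format: &str) -> String {
    format!("https://docs.google.com/document/d/{doc_id}/export?format={format}")
}

fn main() {
    let url = "https://docs.google.com/document/d/abc123/edit";
    if let Some(id) = doc_id_from_url(url) {
        println!("{}", export_url(id, "md"));
    }
}
```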

Supported Export Formats

  • html — HTML document (images as base64 data URIs)
  • txt — Plain text
  • md — Markdown (native Google Docs export)
  • pdf — PDF document
  • docx — Microsoft Word document
  • epub — EPUB ebook format
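The format strings above could be validated before they reach the export URL. The sketch below uses a hypothetical `ExportFormat` enum, which is not part of the documented API (the module example passes the format as a plain string such as "html"). It pairs each listed format with its standard MIME type and marks which formats are text rather than binary.

```rust
use std::str::FromStr;

/// Hypothetical enum mirroring the six export formats listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExportFormat {
    Html,
    Txt,
    Md,
    Pdf,
    Docx,
    Epub,
}

impl ExportFormat {
    /// Value for the `format=` query parameter of the export URL.
    fn as_param(self) -> &'static str {
        match self {
            Self::Html => "html",
            Self::Txt => "txt",
            Self::Md => "md",
            Self::Pdf => "pdf",
            Self::Docx => "docx",
            Self::Epub => "epub",
        }
    }

    /// Standard MIME type of the exported file.
    fn mime_type(self) -> &'static str {
        match self {
            Self::Html => "text/html",
            Self::Txt => "text/plain",
            Self::Md => "text/markdown",
            Self::Pdf => "application/pdf",
            Self::Docx => "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            Self::Epub => "application/epub+zip",
        }
    }

    /// Text formats fit in a `String`; PDF, DOCX and EPUB are binary.
    fn is_text(self) -> bool {
        matches!(self, Self::Html | Self::Txt | Self::Md)
    }
}

impl FromStr for ExportFormat {
    type Err = String;

    /// Case-insensitive parse; anything outside the six formats is rejected.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "html" => Ok(Self::Html),
            "txt" => Ok(Self::Txt),
            "md" => Ok(Self::Md),
            "pdf" => Ok(Self::Pdf),
            "docx" => Ok(Self::Docx),
            "epub" => Ok(Self::Epub),
            other => Err(format!("unsupported export format: {other}")),
        }
    }
}

fn main() {
    for name in ["html", "PDF", "odt"] {
        match name.parse::<ExportFormat>() {
            Ok(f) => println!("{} -> {} (text: {})", f.as_param(), f.mime_type(), f.is_text()),
            Err(e) => println!("{e}"),
        }
    }
}
```

Case-insensitive parsing is a design choice of this sketch, not documented behavior of the module.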

Example

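use web_capture::gdocs;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // An ordinary "edit" link; the module fetches the document via its export URL.
    let url = "https://docs.google.com/document/d/abc123/edit";
    // Only attempt a capture when the URL points at a Google Docs document.
    if gdocs::is_google_docs_url(url) {
        // Request the HTML export of the document.
        let result = gdocs::fetch_google_doc(url, "html", None).await?;
        println!("Content length: {}", result.content.len());
    }
    Ok(())
}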

Structs

ExtractedImage
An image extracted from base64 data URIs in HTML.
GDocsArchiveResult
Result of fetching a Google Doc as an archive.
GDocsResult
Result of fetching a Google Docs document.
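`ExtractedImage` and `extract_base64_images` deal with images that the HTML export inlines as base64 data URIs. The following is a self-contained sketch of that idea under simplifying assumptions: it scans for `data:image/...;base64,` URIs that end at a quote or closing parenthesis, and decodes each payload with a small hand-written RFC 4648 decoder so no external crate is needed. The `Image` struct and both functions are hypothetical stand-ins, not the module's types or implementation.

```rust
/// Hypothetical stand-in for an image pulled out of a data URI.
#[derive(Debug, PartialEq)]
struct Image {
    mime_type: String,
    data: Vec<u8>,
}

/// Decode standard base64 (RFC 4648 alphabet). Whitespace is skipped,
/// decoding stops at the first '=' and any other character yields None.
fn decode_base64(input: &str) -> Option<Vec<u8>> {
    fn value(c: u8) -> Option<u32> {
        match c {
            b'A'..=b'Z' => Some((c - b'A') as u32),
            b'a'..=b'z' => Some((c - b'a' + 26) as u32),
            b'0'..=b'9' => Some((c - b'0' + 52) as u32),
            b'+' => Some(62),
            b'/' => Some(63),
            _ => None,
        }
    }
    let mut out = Vec::new();
    let mut buf: u32 = 0; // pending bits, never more than 14 wide
    let mut bits: u32 = 0; // number of pending bits in `buf`
    for &c in input.as_bytes() {
        if c == b'=' {
            break;
        }
        if c.is_ascii_whitespace() {
            continue;
        }
        buf = (buf << 6) | value(c)?;
        bits += 6;
        if bits >= 8 {
            // Emit the top 8 pending bits and keep only the remainder.
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1 << bits) - 1;
        }
    }
    Some(out)
}

/// Scan HTML for `data:image/...;base64,...` URIs and decode each payload.
fn extract_data_uri_images(html: &str) -> Vec<Image> {
    let mut images = Vec::new();
    let mut rest = html;
    while let Some(pos) = rest.find("data:image/") {
        let after = &rest[pos + "data:".len()..];
        // Assume the URI ends at a closing quote or, in CSS, a ')'.
        let end = after
            .find(|c: char| c == '"' || c == '\'' || c == ')')
            .unwrap_or(after.len());
        let uri = &after[..end];
        if let Some((mime, payload)) = uri.split_once(";base64,") {
            if let Some(data) = decode_base64(payload) {
                images.push(Image { mime_type: mime.to_string(), data });
            }
        }
        rest = &after[end..];
    }
    images
}

fn main() {
    let html = r#"<p>Hi</p><img src="data:image/png;base64,aGVsbG8="><img src="https://example.com/a.png">"#;
    for img in extract_data_uri_images(html) {
        println!("{} ({} bytes)", img.mime_type, img.data.len());
    }
}
```

A production decoder would also validate padding and trailing bits; the `base64` crate is a common choice for that.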

Functions

build_export_url
Build a Google Docs export URL.
create_archive_zip
Create a ZIP archive from a GDocsArchiveResult.
extract_base64_images
Extract base64 data URI images from HTML content.
extract_bearer_token
Extract a Bearer token from an Authorization header value.
extract_document_id
Extract the document ID from a Google Docs URL.
fetch_google_doc
Fetch a Google Docs document via the export URL.
fetch_google_doc_as_archive
Fetch a Google Docs document as a ZIP archive.
fetch_google_doc_as_markdown
Fetch a Google Docs document and convert it to Markdown.
is_google_docs_url
Check if a URL is a Google Docs document URL.
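`extract_bearer_token` is described only as pulling a Bearer token out of an `Authorization` header value. A minimal sketch of that parsing follows; `bearer_token` is a hypothetical name, and returning `None` for other schemes or an empty token is an assumption about the desired behavior. The scheme name is compared case-insensitively because HTTP authentication scheme names are case-insensitive.

```rust
/// Hypothetical sketch: return the token from an `Authorization` header
/// value of the form "Bearer <token>", or None if the value does not match.
fn bearer_token(header_value: &str) -> Option<&str> {
    let value = header_value.trim();
    // Split the scheme from the credentials at the first space.
    let (scheme, token) = value.split_once(' ')?;
    // Authentication scheme names are case-insensitive in HTTP.
    if !scheme.eq_ignore_ascii_case("bearer") {
        return None;
    }
    // Tolerate extra spaces between the scheme and the token.
    let token = token.trim();
    if token.is_empty() {
        None
    } else {
        Some(token)
    }
}

fn main() {
    for header in ["Bearer abc.def", "Basic dXNlcjpwYXNz"] {
        println!("{header:?} -> {:?}", bearer_token(header));
    }
}
```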