Google Docs capture module.
Supports API-based capture of Google Docs documents via the export URL pattern:
`https://docs.google.com/document/d/{DOCUMENT_ID}/export?format={FORMAT}`
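The URL pattern above can be illustrated with a small standalone sketch that uses only the standard library. The helpers `document_id` and `export_url` are hypothetical and are not this module's `extract_document_id` or `build_export_url`. The sketch assumes the document ID is the path segment that directly follows `/document/d/`.

```rust
/// Hypothetical helper (not part of this module): pull the document ID out of
/// a URL such as `https://docs.google.com/document/d/{DOCUMENT_ID}/edit`.
fn document_id(url: &str) -> Option<&str> {
    // Take everything after the `/document/d/` marker...
    let rest = url.split_once("docs.google.com/document/d/")?.1;
    // ...up to the next path separator, query string, or fragment.
    let id = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    if id.is_empty() {
        None
    } else {
        Some(id)
    }
}

/// Hypothetical helper: fill in the export URL pattern for a given format.
fn export_url(document_id: &str, format: &str) -> String {
    format!("https://docs.google.com/document/d/{document_id}/export?format={format}")
}

fn main() {
    let url = "https://docs.google.com/document/d/abc123/edit";
    let id = document_id(url).expect("not a Google Docs document URL");
    println!("{}", export_url(id, "md"));
}
```

Splitting on `/`, `?`, and `#` means the same ID is recovered from `/edit`, `/view`, or URLs carrying query strings and fragments.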
## Supported Export Formats

- `html`: HTML document (images embedded as base64 data URIs)
- `txt`: plain text
- `md`: Markdown (native Google Docs export)
- `pdf`: PDF document
- `docx`: Microsoft Word document
- `epub`: EPUB ebook
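A caller that accepts the format as a string may want to validate it before building the export URL. The following sketch models the six formats listed above as an enum with `FromStr`. `ExportFormat` is a hypothetical type, not part of this module, and the MIME types are the conventional ones for each format; the `Content-Type` actually returned by Google's export endpoint is not confirmed here and may differ.

```rust
use std::str::FromStr;

/// Hypothetical enum for the export formats listed above (not part of this module).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExportFormat {
    Html,
    Txt,
    Md,
    Pdf,
    Docx,
    Epub,
}

impl FromStr for ExportFormat {
    type Err = String;

    /// Accept only the exact lowercase names used in `format={FORMAT}`.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "html" => Ok(Self::Html),
            "txt" => Ok(Self::Txt),
            "md" => Ok(Self::Md),
            "pdf" => Ok(Self::Pdf),
            "docx" => Ok(Self::Docx),
            "epub" => Ok(Self::Epub),
            other => Err(format!("unsupported export format: {other}")),
        }
    }
}

impl ExportFormat {
    /// Conventional MIME type for each format (an assumption, not taken
    /// from the export endpoint's responses).
    fn mime_type(self) -> &'static str {
        match self {
            Self::Html => "text/html",
            Self::Txt => "text/plain",
            Self::Md => "text/markdown",
            Self::Pdf => "application/pdf",
            Self::Docx => {
                "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
            }
            Self::Epub => "application/epub+zip",
        }
    }
}

fn main() {
    // `rtf` is not in the list above, so this sketch rejects it.
    for name in ["md", "docx", "rtf"] {
        match name.parse::<ExportFormat>() {
            Ok(format) => println!("{name}: {}", format.mime_type()),
            Err(err) => println!("{err}"),
        }
    }
}
```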
## Example

```rust
use web_capture::gdocs;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let url = "https://docs.google.com/document/d/abc123/edit";

    // Only Google Docs document URLs are handled by this module.
    if gdocs::is_google_docs_url(url) {
        // Fetch the document through the export URL as HTML.
        let result = gdocs::fetch_google_doc(url, "html", None).await?;
        println!("Content length: {}", result.content.len());
    }

    Ok(())
}
```

## Structs
- `CapturedDocument`: Parsed Google Docs model/document capture.
- `ExtractedImage`: An image extracted from base64 data URIs in HTML.
- `GDocsArchiveResult`: Result of fetching a Google Doc as an archive.
- `GDocsExportPreprocessResult`: Result of running the Google Docs export HTML pre-processor.
- `GDocsRenderedOutput`: Rendered document output.
- `GDocsRenderedResult`: Rendered Google Docs content from either the Docs API or editor model data.
- `GDocsResult`: Result of fetching a Google Docs document.
- `ListMeta`
- `RemoteImage`: Remote image reference extracted from browser-model capture.
- `TableBlock`: Captured table.
- `TableCell`: Captured table cell.
- `TableRow`: Captured table row.
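`ExtractedImage` describes images pulled from base64 data URIs in exported HTML. The sketch below shows one way to locate such URIs (`data:image/...;base64,...`) with the standard library only. `DataUriImage` and `find_data_uri_images` are hypothetical names, not this module's types; the sketch assumes each URI ends at a quote, closing parenthesis, or whitespace, and it leaves the payload base64-encoded because decoding would need an extra crate.

```rust
/// Hypothetical stand-in for an extracted image: MIME type plus the
/// still-encoded base64 payload, both borrowed from the input HTML.
#[derive(Debug, PartialEq)]
struct DataUriImage<'a> {
    mime_type: &'a str,
    base64_payload: &'a str,
}

/// Scan `html` for `data:image/...;base64,...` URIs.
fn find_data_uri_images(html: &str) -> Vec<DataUriImage<'_>> {
    let mut images = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find("data:image/") {
        // Skip the `data:` scheme so the slice starts at the MIME type.
        let after_scheme = &rest[start + "data:".len()..];
        // Assume the URI ends at a quote, closing paren, or whitespace.
        let end = after_scheme
            .find(|c: char| c == '"' || c == '\'' || c == ')' || c.is_whitespace())
            .unwrap_or(after_scheme.len());
        let uri_body = &after_scheme[..end];
        // Non-base64 data URIs (e.g. URL-encoded SVG) are skipped.
        if let Some((mime_type, payload)) = uri_body.split_once(";base64,") {
            images.push(DataUriImage { mime_type, base64_payload: payload });
        }
        // Continue scanning after this URI.
        rest = &after_scheme[end..];
    }
    images
}

fn main() {
    let html = r#"<p><img src="data:image/png;base64,iVBORw0KGgo="><img src="https://example.com/a.png"></p>"#;
    for image in find_data_uri_images(html) {
        println!("{} ({} base64 chars)", image.mime_type, image.base64_payload.len());
    }
}
```

Only the inline image is reported; the remote `https://` image is ignored, which matches the split between `ExtractedImage` and `RemoteImage` in the list above.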
## Enums

- `CapturedBlock`: Captured block.
- `ContentNode`: Captured inline content node.
- `GDocsCaptureMethod`: Google Docs capture backend selected from the CLI `--capture` flag.
## Functions

- `build_docs_api_url`: Build a Google Docs REST API URL.
- `build_edit_url`: Build a Google Docs editor URL.
- `build_export_url`: Build a Google Docs export URL.
- `create_archive_zip`: Create a ZIP archive from a `GDocsArchiveResult`.
- `extract_base64_images`: Extract base64 data URI images from HTML content.
- `extract_bearer_token`: Extract a Bearer token from an Authorization header value.
- `extract_document_id`: Extract the document ID from a Google Docs URL.
- `fetch_google_doc`: Fetch a Google Docs document via the export URL.
- `fetch_google_doc_as_archive`: Fetch a Google Docs document as a ZIP archive.
- `fetch_google_doc_as_markdown`: Fetch a Google Docs document and convert it to Markdown.
- `fetch_google_doc_from_docs_api`: Fetch and render a Google Docs document via the authenticated REST API.
- `fetch_google_doc_from_model`: Fetch and render the model data embedded in the Google Docs `/edit` route.
- `is_google_docs_url`: Check whether a URL is a Google Docs document URL.
- `localize_rendered_remote_images_for_archive`: Build a self-contained archive result from browser-model rendered output.
- `normalize_google_docs_export_markdown`: Normalize Markdown emitted from Google Docs public-export HTML converters.
- `parse_model_chunks`: Parse captured `DOCS_modelChunk` values.
- `parse_model_chunks_with_export_html`: Parse captured `DOCS_modelChunk` values and optionally merge semantic hints from Google Docs export HTML.
- `preprocess_google_docs_export_html`: Pre-process Google Docs export HTML so the generic `html2md` pipeline preserves inline formatting, heading numbering, and link targets.
- `render_captured_document`: Render a parsed Google Docs capture as Markdown, HTML, or text.
- `render_docs_api_document`: Render a Google Docs REST API document value.
- `select_capture_method`: Select a Google Docs capture backend from the CLI `--capture` value.
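`extract_bearer_token` pulls a token out of an `Authorization` header value, which matters for the authenticated REST API path. A minimal standalone sketch of that step follows; `bearer_token` is a hypothetical name and this is not the module's implementation. It assumes the header has the form `<scheme> <token>` and treats the scheme name case-insensitively, as HTTP authentication schemes are.

```rust
/// Hypothetical helper (not this module's code): return the token from a
/// header value such as `Bearer abc.def`, or `None` for other schemes.
fn bearer_token(header_value: &str) -> Option<&str> {
    // Split `<scheme> <credentials>` on the first space.
    let (scheme, token) = header_value.trim().split_once(' ')?;
    // Authentication scheme names are case-insensitive, so accept `bearer` too.
    if !scheme.eq_ignore_ascii_case("Bearer") {
        return None;
    }
    // Tolerate extra spaces between the scheme and the token.
    let token = token.trim();
    if token.is_empty() {
        None
    } else {
        Some(token)
    }
}

fn main() {
    for value in ["Bearer abc.def", "bearer   abc.def ", "Basic dXNlcjpwYXNz", "Bearer"] {
        println!("{value:?} -> {:?}", bearer_token(value));
    }
}
```

Returning `Option<&str>` borrows from the header instead of allocating, and leaves it to the caller to decide whether a missing token is an error.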