Module document_processor

Document Processing for Skills using Vision Models

Implements OpenAI-style document processing: PDF, DOCX, and spreadsheet files are rendered to images, which are then passed to a vision model for analysis. Rendering preserves layout, formatting, and other visual information that text extraction would lose.

§Supported Formats

PDF: Multi-page documents converted to page-by-page PNGs
DOCX/DOC: Word documents rendered per-page
Spreadsheets: Excel/CSV files rendered as visual tables
Images: Direct vision model processing
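
As a rough illustration of how the four format families above could be told apart, the sketch below classifies a file by its extension. It is not this module's implementation: `SketchDocumentType` and `classify_by_extension` are hypothetical names, the real `DocumentType` variants are not shown on this page, and the extension lists (especially the image ones) are assumptions.

```rust
use std::path::Path;

/// Hypothetical stand-in for the module's `DocumentType` enum.
/// The real variant names are not documented on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchDocumentType {
    Pdf,
    Word,
    Spreadsheet,
    Image,
    Unknown,
}

/// Classify a file by its extension, ignoring ASCII case.
/// The extension lists below are assumptions made for illustration.
pub fn classify_by_extension(path: &Path) -> SketchDocumentType {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());

    match ext.as_deref() {
        Some("pdf") => SketchDocumentType::Pdf,
        Some("docx") | Some("doc") => SketchDocumentType::Word,
        Some("xlsx") | Some("xls") | Some("csv") => SketchDocumentType::Spreadsheet,
        Some("png") | Some("jpg") | Some("jpeg") => SketchDocumentType::Image,
        // No extension, non-UTF-8 extension, or an unrecognized one.
        _ => SketchDocumentType::Unknown,
    }
}
```

Extension matching is cheap but easy to fool; a real classifier might also inspect the file's leading bytes before choosing a renderer.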

§Architecture

Document → Renderer → PNG Images → Vision Model → Structured Data

Inspired by OpenAI’s implementation in ChatGPT’s Code Interpreter.
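
The pipeline above can be pictured as two pluggable stages chained by a small driver function. The following is a minimal sketch under stated assumptions, not the crate's actual API: `SketchPageImage`, `Renderer`, `VisionModel`, `process_document`, and both stub types are hypothetical names, and the stubs neither render real PNGs nor call a real model.

```rust
/// Hypothetical stand-in for the module's `PageImage` struct.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchPageImage {
    pub page_number: usize,
    /// A real renderer would store PNG-encoded bytes here.
    pub data: Vec<u8>,
}

/// Stage 1 (assumed interface): turn raw document bytes into page images.
pub trait Renderer {
    fn render(&self, document: &[u8]) -> Result<Vec<SketchPageImage>, String>;
}

/// Stage 2 (assumed interface): turn one page image into structured output.
pub trait VisionModel {
    fn analyze(&self, page: &SketchPageImage) -> Result<String, String>;
}

/// Document -> Renderer -> page images -> VisionModel -> one result per page.
/// Stops at the first error from either stage.
pub fn process_document<R: Renderer, V: VisionModel>(
    renderer: &R,
    model: &V,
    document: &[u8],
) -> Result<Vec<String>, String> {
    let pages = renderer.render(document)?;
    pages.iter().map(|page| model.analyze(page)).collect()
}

/// Stub renderer: treats each form-feed-separated chunk as one "page".
pub struct StubRenderer;

impl Renderer for StubRenderer {
    fn render(&self, document: &[u8]) -> Result<Vec<SketchPageImage>, String> {
        Ok(document
            .split(|byte| *byte == 0x0C)
            .enumerate()
            .map(|(index, chunk)| SketchPageImage {
                page_number: index + 1,
                data: chunk.to_vec(),
            })
            .collect())
    }
}

/// Stub model: reports the page number and payload size instead of analyzing.
pub struct StubModel;

impl VisionModel for StubModel {
    fn analyze(&self, page: &SketchPageImage) -> Result<String, String> {
        Ok(format!("page {}: {} bytes", page.page_number, page.data.len()))
    }
}
```

Keeping the renderer and the model behind separate traits means either stage can be swapped (for example, a different rendering backend per format) without touching the driver.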

Structs§

DocumentMetadata: Document metadata
DocumentProcessor: Main document processor
DocumentProcessorConfig: Document processing configuration
ImageDimensions: Image dimensions
PageImage: Single page image data
ProcessedDocument: Processed document result

Enums§

DocumentType: Document type classification
