Module document_processor

Document Processing for Skills using Vision Models

Implements OpenAI-style document processing: PDF, DOCX, and spreadsheet files are rendered to images, which are then passed to a vision model for analysis. Rendering preserves layout, formatting, and visual information that text extraction would lose.

§Supported Formats

  • PDF: Multi-page documents converted to one PNG per page
  • DOCX/DOC: Word documents rendered page by page
  • Spreadsheets: Excel/CSV files rendered as visual tables
  • Images: Passed directly to the vision model
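
As a rough illustration of how the formats above might be told apart, the sketch below classifies a file by its extension. `DocKind` and `classify` are hypothetical names introduced only for this example; the module's actual `DocumentType` variants and its detection logic are not shown on this page and may differ (for instance, by inspecting file contents rather than extensions).

```rust
use std::path::Path;

/// Hypothetical stand-in for the module's `DocumentType`.
/// The real enum's variants are an assumption here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DocKind {
    Pdf,
    Word,
    Spreadsheet,
    Image,
    Unknown,
}

/// Classify a path by its (case-insensitive) file extension.
/// The extension lists are illustrative, not exhaustive.
pub fn classify(path: &Path) -> DocKind {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());

    match ext.as_deref() {
        Some("pdf") => DocKind::Pdf,
        Some("docx") | Some("doc") => DocKind::Word,
        Some("xlsx") | Some("xls") | Some("csv") => DocKind::Spreadsheet,
        Some("png") | Some("jpg") | Some("jpeg") | Some("gif") | Some("webp") => DocKind::Image,
        _ => DocKind::Unknown,
    }
}
```

Extension matching is cheap but easy to fool; a production classifier would likely also check magic bytes before choosing a renderer.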

§Architecture

Document → Renderer → PNG Images → Vision Model → Structured Data

Inspired by OpenAI’s implementation in ChatGPT’s Code Interpreter.
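
The pipeline above can be sketched as two traits composed by one function. Everything in this block (`Page`, `Renderer`, `VisionModel`, `process`, and the stub types) is a hypothetical illustration of the data flow, not the module's real API; the actual `DocumentProcessor`, `PageImage`, and `ProcessedDocument` types may be shaped quite differently.

```rust
/// Hypothetical stand-in for the module's `PageImage`.
#[derive(Debug, Clone, PartialEq)]
pub struct Page {
    pub number: usize,
    pub png_bytes: Vec<u8>,
}

/// Document -> PNG images (assumed interface).
pub trait Renderer {
    fn render(&self, document: &[u8]) -> Result<Vec<Page>, String>;
}

/// PNG image -> structured data, simplified to a `String` (assumed interface).
pub trait VisionModel {
    fn analyze(&self, page: &Page) -> Result<String, String>;
}

/// Render the document, then analyze each page in order.
/// The first error from either stage aborts the whole run.
pub fn process<R: Renderer, V: VisionModel>(
    renderer: &R,
    model: &V,
    document: &[u8],
) -> Result<Vec<String>, String> {
    renderer
        .render(document)?
        .iter()
        .map(|page| model.analyze(page))
        .collect()
}

/// Stub renderer that emits a fixed number of empty pages.
pub struct StubRenderer {
    pub pages: usize,
}

impl Renderer for StubRenderer {
    fn render(&self, document: &[u8]) -> Result<Vec<Page>, String> {
        if document.is_empty() {
            return Err("empty document".to_string());
        }
        Ok((1..=self.pages)
            .map(|number| Page { number, png_bytes: Vec::new() })
            .collect())
    }
}

/// Stub model that only reports the page number and image size.
pub struct StubModel;

impl VisionModel for StubModel {
    fn analyze(&self, page: &Page) -> Result<String, String> {
        Ok(format!("page {}: {} bytes", page.number, page.png_bytes.len()))
    }
}
```

Keeping the renderer and the model behind separate traits means either stage can be replaced with a stub in tests, as done here, without touching the other.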

Structs§

DocumentMetadata
Document metadata
DocumentProcessor
Main document processor
DocumentProcessorConfig
Document processing configuration
ImageDimensions
Image dimensions
PageImage
Single page image data
ProcessedDocument
Processed document result

Enums§

DocumentType
Document type classification