Document Processing for Skills using Vision Models
Implements OpenAI-style document processing by converting PDFs, DOCX, and spreadsheets to rendered images for vision model analysis. This preserves layout, formatting, and visual information that would be lost in text extraction.
§Supported Formats
- PDF: Multi-page documents converted to page-by-page PNGs
- DOCX/DOC: Word documents rendered per-page
- Spreadsheets: Excel/CSV files rendered as visual tables
- Images: Direct vision model processing
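As a rough illustration of how the formats above might be told apart, the sketch below classifies a file by its extension. `DocKind` and `classify` are hypothetical names introduced here; the variants of the crate's own `DocumentType` enum are not listed in this page and may differ. The specific extensions, in particular the image ones, are assumptions.

```rust
use std::path::Path;

/// Hypothetical classification for illustration only.
/// The crate's real `DocumentType` variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DocKind {
    Pdf,
    Word,
    Spreadsheet,
    Image,
    Unknown,
}

/// Classify a file by its extension, ignoring ASCII case.
fn classify(path: &Path) -> DocKind {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());

    match ext.as_deref() {
        Some("pdf") => DocKind::Pdf,
        Some("docx") | Some("doc") => DocKind::Word,
        Some("xlsx") | Some("xls") | Some("csv") => DocKind::Spreadsheet,
        // Assumed image extensions; the source only says "Images".
        Some("png") | Some("jpg") | Some("jpeg") => DocKind::Image,
        _ => DocKind::Unknown,
    }
}

fn main() {
    for name in ["report.PDF", "notes.docx", "sales.csv", "scan.png", "README"] {
        println!("{name}: {:?}", classify(Path::new(name)));
    }
}
```

Extension matching is cheap but easy to fool; a production classifier would more likely inspect the file's leading bytes as well.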
§Architecture
Document → Renderer → PNG Images → Vision Model → Structured Data
Inspired by OpenAI’s implementation in ChatGPT’s Code Interpreter.
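The pipeline can be pictured as two pluggable stages. The following is a minimal sketch under stated assumptions, not the crate's actual API: the `Renderer` and `VisionModel` traits, `RenderedPage`, `process`, and both stub implementations are hypothetical, and the stubs fake rendering and analysis so the example runs without any external dependency.

```rust
/// One rendered page (hypothetical; not the crate's `PageImage`).
struct RenderedPage {
    page_number: usize,
    png_bytes: Vec<u8>,
}

/// Stage 1 (assumed interface): turn raw document bytes into page images.
trait Renderer {
    fn render(&self, document: &[u8]) -> Vec<RenderedPage>;
}

/// Stage 2 (assumed interface): turn one page image into structured output.
/// A plain `String` stands in for structured data here.
trait VisionModel {
    fn analyze(&self, page: &RenderedPage) -> String;
}

/// Document -> Renderer -> images -> VisionModel -> one result per page.
fn process<R: Renderer, V: VisionModel>(renderer: &R, model: &V, document: &[u8]) -> Vec<String> {
    renderer
        .render(document)
        .iter()
        .map(|page| model.analyze(page))
        .collect()
}

/// Stub: treats every 4 input bytes as one "page" instead of rasterizing.
struct StubRenderer;

impl Renderer for StubRenderer {
    fn render(&self, document: &[u8]) -> Vec<RenderedPage> {
        document
            .chunks(4)
            .enumerate()
            .map(|(i, chunk)| RenderedPage {
                page_number: i + 1,
                png_bytes: chunk.to_vec(),
            })
            .collect()
    }
}

/// Stub: reports the page size instead of calling a real vision model.
struct StubModel;

impl VisionModel for StubModel {
    fn analyze(&self, page: &RenderedPage) -> String {
        format!("page {}: {} bytes", page.page_number, page.png_bytes.len())
    }
}

fn main() {
    let results = process(&StubRenderer, &StubModel, b"abcdefghij");
    for line in &results {
        println!("{line}");
    }
}
```

Keeping the renderer and the model behind separate traits lets either stage be swapped or mocked independently, which is one plausible reason for the staged layout shown in the diagram.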
Structs§
- DocumentMetadata - Document metadata
- DocumentProcessor - Main document processor
- DocumentProcessorConfig - Document processing configuration
- ImageDimensions - Image dimensions
- PageImage - Single page image data
- ProcessedDocument - Processed document result
Enums§
- DocumentType - Document type classification