Advanced content processing for multiple document formats
This module provides comprehensive document parsing and content extraction capabilities for PDF, HTML, XML, office documents, and multimedia content.
This module is only available when the content-processing feature is enabled.
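As a minimal sketch of what feature gating means here: Cargo features are switched on per dependency in `Cargo.toml` (via a `features = ["content-processing"]` entry), and gated code is compiled only when the feature is active. The function name `content_processing_status` below is hypothetical and not part of this crate; it only illustrates the `#[cfg(feature = ...)]` mechanism.

```rust
/// Hypothetical helper: this version is compiled only when the
/// `content-processing` feature is enabled for the build.
#[cfg(feature = "content-processing")]
pub fn content_processing_status() -> &'static str {
    "content-processing enabled"
}

/// Fallback compiled when the feature is absent, so callers still build.
#[cfg(not(feature = "content-processing"))]
pub fn content_processing_status() -> &'static str {
    "content-processing disabled"
}
```

Exactly one of the two definitions exists in any given build, which is why items in a gated module are simply missing (a compile error at the use site, not a runtime error) when the feature is off.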
Structs
- AudioEnergyMetrics - Audio energy and loudness metrics
- AudioFeatures - Audio feature extraction results
- AudioHandler - Audio handler for various audio formats (MP3, WAV, OGG, FLAC, etc.)
- ClassificationLabel - Classification label with confidence
- ContentChunk - Content chunk for embedding
- ContentExtractionConfig - Content extraction configuration
- ContentLocation - Content location information
- ContentProcessor - Advanced content processor
- CrossModalEmbedding - Cross-modal embedding that combines multiple modalities
- CsvHandler - CSV handler
- DetectedObject - Object detection result
- DocumentStructure - Document structure information
- DocxHandler - DOCX document handler
- ExtractedAudio - Extracted audio information
- ExtractedContent - Extracted document content
- ExtractedImage - Extracted image information
- ExtractedLink - Extracted link information
- ExtractedTable - Extracted table information
- ExtractedVideo - Extracted video information
- FallbackHandler - Fallback handler for unsupported formats
- Heading - Heading information
- HtmlHandler - HTML handler
- ImageComplexityMetrics - Image complexity metrics
- ImageFeatures - Image feature extraction results
- ImageHandler - Image handler for various image formats (JPEG, PNG, GIF, WebP, etc.)
- JsonHandler - JSON handler
- MarkdownHandler - Markdown handler
- MotionAnalysis - Motion analysis for video
- MusicAnalysis - Music analysis results
- ObjectMotion - Object motion tracking result
- PdfHandler - PDF document handler
- PitchStatistics - Pitch statistics for speech analysis
- PlainTextHandler - Plain text handler
- PptxHandler - PPTX document handler
- ProcessingStats - Processing statistics
- SpeechAnalysis - Speech analysis results
- TocEntry - Table of contents entry
- VideoAnalysis - Video analysis results
- VideoHandler - Video handler for various video formats (MP4, AVI, MKV, WebM, etc.)
- VideoKeyframe - Video keyframe information
- VideoScene - Video scene detection result
- XlsxHandler - XLSX document handler
- XmlHandler - XML handler
Enums
- ChunkType - Content chunk types
- ChunkingStrategy - Content chunking strategies
- DocumentFormat - Document format types supported by the content processor
- FusionStrategy - Fusion strategies for combining modalities
- Modality - Modality types for cross-modal processing
Traits
- FormatHandler - Trait for format-specific content handlers
- OfficeDocumentHandler - Base handler for Office documents (DOCX, PPTX, XLSX)
Type Aliases
- ColorTimelineEntry - Type alias for color timeline entries (timestamp, dominant_colors)
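
To illustrate how a (timestamp, dominant_colors) tuple alias might be used, here is a sketch under stated assumptions. The listing does not show the alias's actual element types, so the `f64` timestamp in seconds and the `Vec<[u8; 3]>` of RGB triples below are guesses, and the helper `dominant_at` is hypothetical rather than part of the module.

```rust
// ASSUMPTION: the real element types are not given in the listing above.
// Here a timestamp is seconds as f64 and each color is an RGB triple.
type ColorTimelineEntry = (f64, Vec<[u8; 3]>);

/// Hypothetical helper: returns the dominant colors of the latest entry
/// whose timestamp is at or before `t`. Assumes `timeline` is sorted by
/// ascending timestamp.
fn dominant_at(timeline: &[ColorTimelineEntry], t: f64) -> Option<&[[u8; 3]]> {
    timeline
        .iter()
        .rev() // scan from the most recent entry backwards
        .find(|(ts, _)| *ts <= t)
        .map(|(_, colors)| colors.as_slice())
}
```

A tuple alias keeps each entry lightweight; if entries grew more fields, a named struct would likely read better than positional access.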