Module content_processing

Advanced content processing for multiple document formats

This module provides document parsing and content extraction for PDF, HTML, XML, Office documents (DOCX, PPTX, XLSX), and multimedia content (images, audio, and video).

This module is only available when the content-processing Cargo feature is enabled.

Structs

AudioEnergyMetrics
Audio energy and loudness metrics
AudioFeatures
Audio feature extraction results
AudioHandler
Audio handler for various audio formats (MP3, WAV, OGG, FLAC, etc.)
ClassificationLabel
Classification label with confidence
ContentChunk
Content chunk for embedding
ContentExtractionConfig
Content extraction configuration
ContentLocation
Content location information
ContentProcessor
Advanced content processor
CrossModalEmbedding
Cross-modal embedding that combines multiple modalities
CsvHandler
CSV handler
DetectedObject
Object detection result
DocumentStructure
Document structure information
DocxHandler
DOCX document handler
ExtractedAudio
Extracted audio information
ExtractedContent
Extracted document content
ExtractedImage
Extracted image information
ExtractedLink
Extracted link information
ExtractedTable
Extracted table information
ExtractedVideo
Extracted video information
FallbackHandler
Fallback handler for unsupported formats
Heading
Heading information
HtmlHandler
HTML handler
ImageComplexityMetrics
Image complexity metrics
ImageFeatures
Image feature extraction results
ImageHandler
Image handler for various image formats (JPEG, PNG, GIF, WebP, etc.)
JsonHandler
JSON handler
MarkdownHandler
Markdown handler
MotionAnalysis
Motion analysis for video
MusicAnalysis
Music analysis results
ObjectMotion
Object motion tracking result
PdfHandler
PDF document handler
PitchStatistics
Pitch statistics for speech analysis
PlainTextHandler
Plain text handler
PptxHandler
PPTX document handler
ProcessingStats
Processing statistics
SpeechAnalysis
Speech analysis results
TocEntry
Table of contents entry
VideoAnalysis
Video analysis results
VideoHandler
Video handler for various video formats (MP4, AVI, MKV, WebM, etc.)
VideoKeyframe
Video keyframe information
VideoScene
Video scene detection result
XlsxHandler
XLSX document handler
XmlHandler
XML handler

Enums

ChunkType
Content chunk types
ChunkingStrategy
Content chunking strategies
DocumentFormat
Document format types supported by the content processor
FusionStrategy
Fusion strategies for combining modalities
Modality
Modality types for cross-modal processing
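
This listing names the chunking strategies but does not show their variants. As a rough illustration of what a chunking strategy usually does when preparing text for embedding, the sketch below implements fixed-size chunking with overlap and simple paragraph chunking. The `SketchStrategy` enum, its variants, and the `chunk_text` function are hypothetical names invented for this example; they are not taken from this module's API.

```rust
/// Hypothetical stand-in for a chunking strategy. The real
/// `ChunkingStrategy` variants are not shown in this listing.
#[derive(Debug, Clone, Copy)]
enum SketchStrategy {
    /// Windows of `size` characters; consecutive windows share `overlap` characters.
    FixedSize { size: usize, overlap: usize },
    /// One chunk per paragraph, where paragraphs are separated by a blank line.
    Paragraph,
}

/// Splits `text` into chunks according to the (hypothetical) strategy.
fn chunk_text(text: &str, strategy: SketchStrategy) -> Vec<String> {
    match strategy {
        SketchStrategy::FixedSize { size, overlap } => {
            // A step of zero would never advance, so reject it up front.
            assert!(size > 0 && overlap < size, "overlap must be smaller than size");
            // Work on chars rather than bytes so multi-byte text is never split mid-character.
            let chars: Vec<char> = text.chars().collect();
            let step = size - overlap;
            let mut chunks = Vec::new();
            let mut start = 0;
            while start < chars.len() {
                let end = (start + size).min(chars.len());
                chunks.push(chars[start..end].iter().collect());
                if end == chars.len() {
                    break; // the last window reached the end of the text
                }
                start += step;
            }
            chunks
        }
        SketchStrategy::Paragraph => text
            .split("\n\n")
            .map(str::trim)
            .filter(|p| !p.is_empty())
            .map(String::from)
            .collect(),
    }
}

fn main() {
    let strategy = SketchStrategy::FixedSize { size: 4, overlap: 1 };
    println!("{:?}", chunk_text("abcdefghij", strategy));
}
```

Overlap keeps some shared context between neighbouring chunks, at the cost of embedding a little text twice.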

Traits

FormatHandler
Trait for format-specific content handlers
OfficeDocumentHandler
Base handler for Office documents (DOCX, PPTX, XLSX)
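
The methods of `FormatHandler` are not shown in this listing. The sketch below illustrates the general pattern that a handler trait with a catch-all fallback suggests: try each format-specific handler in order and let the fallback accept whatever remains. The `SketchHandler` trait, its two methods, the three handler structs, and the `extract` function are all assumptions made for illustration, not this crate's actual signatures.

```rust
/// Hypothetical trait; the real `FormatHandler` methods may differ.
trait SketchHandler {
    /// Returns true if this handler recognizes the file extension.
    fn can_handle(&self, extension: &str) -> bool;
    /// Extracts plain text from the raw bytes of a document.
    fn extract_text(&self, data: &[u8]) -> Result<String, String>;
}

struct PlainText;
struct Csv;
struct Fallback;

impl SketchHandler for PlainText {
    fn can_handle(&self, extension: &str) -> bool {
        extension.eq_ignore_ascii_case("txt")
    }
    fn extract_text(&self, data: &[u8]) -> Result<String, String> {
        String::from_utf8(data.to_vec()).map_err(|e| e.to_string())
    }
}

impl SketchHandler for Csv {
    fn can_handle(&self, extension: &str) -> bool {
        extension.eq_ignore_ascii_case("csv")
    }
    fn extract_text(&self, data: &[u8]) -> Result<String, String> {
        let text = std::str::from_utf8(data).map_err(|e| e.to_string())?;
        // Naive split on commas: quoted fields are not supported in this sketch.
        let rows: Vec<String> = text
            .lines()
            .map(|line| line.split(',').map(str::trim).collect::<Vec<_>>().join(" "))
            .collect();
        Ok(rows.join("\n"))
    }
}

impl SketchHandler for Fallback {
    fn can_handle(&self, _extension: &str) -> bool {
        true // accepts anything, so it must come last in the handler list
    }
    fn extract_text(&self, data: &[u8]) -> Result<String, String> {
        Ok(String::from_utf8_lossy(data).into_owned())
    }
}

/// Handlers in priority order, with the catch-all fallback last.
fn default_handlers() -> Vec<Box<dyn SketchHandler>> {
    vec![Box::new(PlainText), Box::new(Csv), Box::new(Fallback)]
}

/// Dispatches to the first handler that accepts the extension.
fn extract(
    handlers: &[Box<dyn SketchHandler>],
    extension: &str,
    data: &[u8],
) -> Result<String, String> {
    handlers
        .iter()
        .find(|h| h.can_handle(extension))
        .ok_or_else(|| format!("no handler for .{}", extension))?
        .extract_text(data)
}

fn main() {
    let handlers = default_handlers();
    println!("{:?}", extract(&handlers, "csv", b"a, b\nc,d"));
}
```

Keeping the handlers behind a trait object means a new format can be supported by adding one struct and one list entry, without touching the dispatch code.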

Type Aliases

ColorTimelineEntry
Type alias for color timeline entries (timestamp, dominant_colors)
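
The listing describes `ColorTimelineEntry` only as a (timestamp, dominant_colors) pair and does not give the concrete types. As a sketch under assumed types (seconds as `f64`, colors as RGB byte triples), a timeline of such entries could be queried as follows. The `TimelineEntry` alias and the `colors_at` function are hypothetical and exist only in this example.

```rust
/// Hypothetical concrete shape: seconds from the start of the video,
/// paired with the dominant colors as RGB triples. The real element
/// types of `ColorTimelineEntry` are not shown in this listing.
type TimelineEntry = (f64, Vec<[u8; 3]>);

/// Returns the dominant colors in effect at `time`: those of the latest
/// entry whose timestamp is not after `time`.
/// Assumes `timeline` is sorted by ascending timestamp.
fn colors_at(timeline: &[TimelineEntry], time: f64) -> Option<&[[u8; 3]]> {
    timeline
        .iter()
        .take_while(|(timestamp, _)| *timestamp <= time)
        .last()
        .map(|(_, colors)| colors.as_slice())
}

fn main() {
    let timeline: Vec<TimelineEntry> = vec![
        (0.0, vec![[255, 0, 0]]),
        (2.5, vec![[0, 0, 255], [0, 255, 0]]),
    ];
    println!("{:?}", colors_at(&timeline, 1.0));
}
```

A tuple alias keeps each entry lightweight; a named struct would be the usual alternative if more fields were needed.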