Module preprocessing

Preprocessing module for filtering noise from conversation exports before embedding.

Problem: roughly 36-40% of the content in conversation exports is noise (tool outputs, metadata, CLI commands). This module filters that noise out before embedding, which saves vector storage and improves search quality.
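As a rough illustration of the idea, the sketch below drops noisy messages before they would be handed to an embedder and counts what was kept and dropped. It is not this module's API: `SketchMessage`, `SketchStats`, `is_noise`, and `filter_noise` are hypothetical names, and the noise heuristics (a `tool` role, a leading `$ ` shell prompt, blank content) are assumptions chosen for the example, since the real fields and rules of `Message`, `Preprocessor`, and `PreprocessingStats` are not shown on this page.

```rust
/// Hypothetical message type for this sketch; the real `Message` fields are not shown here.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchMessage {
    pub role: String,
    pub content: String,
}

/// Hypothetical counters, loosely mirroring the idea behind `PreprocessingStats`.
#[derive(Debug, Default, PartialEq)]
pub struct SketchStats {
    pub total: usize,
    pub kept: usize,
    pub dropped: usize,
}

/// Assumed heuristic: blank messages, tool output, and shell commands count as noise.
fn is_noise(msg: &SketchMessage) -> bool {
    let text = msg.content.trim();
    text.is_empty() || msg.role == "tool" || text.starts_with("$ ")
}

/// Removes noisy messages and reports how many were kept and dropped.
pub fn filter_noise(messages: Vec<SketchMessage>) -> (Vec<SketchMessage>, SketchStats) {
    let total = messages.len();
    let kept: Vec<SketchMessage> = messages
        .into_iter()
        .filter(|m| !is_noise(m))
        .collect();
    let stats = SketchStats {
        total,
        kept: kept.len(),
        dropped: total - kept.len(),
    };
    (kept, stats)
}
```

Only the messages returned by `filter_noise` would go on to the embedding step; the returned counters make it easy to check how much of an export was discarded.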

Structs

Message
Message structure for conversation filtering.
PreprocessingConfig
Configuration for the preprocessing pipeline.
PreprocessingStats
Statistics from preprocessing.
Preprocessor
The main preprocessor for filtering conversation noise.
TextIntegrityMetrics
Text integrity metrics for embedding quality assessment.

Enums

IntegrityRecommendation
Recommendation based on integrity metrics.