Preprocessing module for filtering noise from conversation exports before embedding.
Problem: roughly 36-40% of a conversation export is noise (tool outputs, metadata, CLI commands). This module filters that noise before embedding, which reduces the number of vectors stored and improves search quality.
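The idea of dropping noise before embedding can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `SketchMessage`, `looks_like_noise`, `filter_noise`, and the heuristics themselves are hypothetical and are not this module's actual API or rules.

```rust
// Hypothetical message type for this sketch only; the module's real
// `Message` struct may have different fields.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchMessage {
    pub role: String,
    pub content: String,
}

/// Heuristic noise check. The rules are assumed for illustration:
/// empty text, tool-role messages, shell prompt lines, and bare JSON blobs.
pub fn looks_like_noise(msg: &SketchMessage) -> bool {
    let text = msg.content.trim();
    text.is_empty()
        || msg.role == "tool"
        || text.starts_with("$ ")
        || (text.starts_with('{') && text.ends_with('}'))
}

/// Splits off noise and returns the kept messages together with the
/// number of messages that were removed.
pub fn filter_noise(messages: Vec<SketchMessage>) -> (Vec<SketchMessage>, usize) {
    let total = messages.len();
    let kept: Vec<SketchMessage> = messages
        .into_iter()
        .filter(|m| !looks_like_noise(m))
        .collect();
    let removed = total - kept.len();
    (kept, removed)
}
```

Only the messages returned in `kept` would then be passed on to the embedding step; the removed count is the kind of figure a statistics struct could report.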
Structs§
- Message
- Message structure for conversation filtering
- PreprocessingConfig
- Configuration for the preprocessing pipeline.
- PreprocessingStats
- Statistics from preprocessing
- Preprocessor
- The main preprocessor for filtering conversation noise
- TextIntegrityMetrics
- Text integrity metrics for embedding quality assessment.
Enums§
- IntegrityRecommendation
- Recommendation based on integrity metrics