VL-model image preprocessor.
When the active main provider does not accept images and the user submits one, this module routes the image, together with only the current-turn caption, through a configurable vision-language (VL) provider. The VL provider returns a textual description, which callers splice into the user message before forwarding it to the main provider as plain text.
Key invariant: the VL call NEVER sees the main conversation history. The
Vec<Message> passed to the VL provider is constructed locally from
caption + images and contains exactly one user turn.
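As an illustration of the invariant above, here is a minimal sketch of how the VL request could be assembled. The `Role` and `Message` types and the `build_vl_messages` helper are hypothetical stand-ins, not the crate's actual definitions; the point is only that the function takes no conversation history as input, so none can leak into the VL call.

```rust
/// Hypothetical message role; the real crate's type may differ.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Role {
    User,
    Assistant,
}

/// Hypothetical message type standing in for the crate's `Message`.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: Role,
    text: String,
    /// Raw image bytes attached to this turn (assumed representation).
    images: Vec<Vec<u8>>,
}

/// Build the message list for the VL provider from the current-turn
/// caption and images only. The main conversation history is not a
/// parameter, so the result always contains exactly one user turn.
fn build_vl_messages(caption: &str, images: &[Vec<u8>]) -> Vec<Message> {
    vec![Message {
        role: Role::User,
        text: caption.to_string(),
        images: images.to_vec(),
    }]
}
```

Keeping the history out of the function signature makes the invariant structural rather than something each caller has to remember.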
Enums
- PreprocessOutcome - Outcome of a preprocessing attempt.
Functions
- maybe_preprocess - Decide whether and how to preprocess images before a main-provider turn.
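
The decision described in the module overview might look roughly like the sketch below. Everything here is an assumption made for illustration: the `Outcome` variants, the `maybe_preprocess_sketch` signature, the `describe` closure standing in for the VL provider call, and the splice format are all hypothetical and do not reflect the real `PreprocessOutcome` or `maybe_preprocess` definitions.

```rust
/// Hypothetical stand-in for `PreprocessOutcome`; the real variants may differ.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The main provider accepts images, or there are no images to handle.
    NotNeeded,
    /// The VL provider produced a textual description.
    Described(String),
    /// The VL call failed; carries an error message.
    Failed(String),
}

/// Decide whether to preprocess. `describe` stands in for the VL provider
/// call and receives only the current-turn caption and images.
fn maybe_preprocess_sketch<F>(
    main_accepts_images: bool,
    caption: &str,
    images: &[Vec<u8>],
    describe: F,
) -> Outcome
where
    F: FnOnce(&str, &[Vec<u8>]) -> Result<String, String>,
{
    if main_accepts_images || images.is_empty() {
        return Outcome::NotNeeded;
    }
    match describe(caption, images) {
        Ok(description) => Outcome::Described(description),
        Err(error) => Outcome::Failed(error),
    }
}

/// Splice a description into the user's text so the main provider
/// receives plain text only (the bracketed format is an assumption).
fn splice_description(caption: &str, description: &str) -> String {
    format!("{caption}\n\n[Image description: {description}]")
}
```

A caller would match on the outcome and, in the described case, forward the spliced text to the main provider in place of the original image-bearing message.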