Module vision_preprocessor

VL-model image preprocessor.

When the active main provider does not accept images and the user submits one, this module routes the image, together with the current-turn caption only, through a configurable vision-language (VL) provider. The VL provider returns a textual description, which callers splice into the user message before forwarding it to the main provider as plain text.
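The splicing step on the caller side can be sketched as follows. This is a minimal illustration, not this module's code: the helper name `splice_description` and the bracketed `[Image description: ...]` layout are assumptions made for the example.

```rust
/// Hypothetical helper (not part of this module's documented API):
/// combine a VL-generated description with the user's caption so the
/// result can be sent to a text-only main provider.
fn splice_description(caption: &str, description: &str) -> String {
    let description = description.trim();
    if caption.trim().is_empty() {
        // The user sent an image with no text: the description stands alone.
        format!("[Image description: {description}]")
    } else {
        // Put the description first, then the user's own words unchanged.
        format!("[Image description: {description}]\n\n{caption}")
    }
}
```

Placing the description ahead of the caption keeps the user's own text last, where a question such as "What is this?" reads naturally after the context it refers to. The real module may format the spliced text differently.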

Key invariant: the VL call NEVER sees the main conversation history. The Vec<Message> passed to the VL provider is constructed locally from caption + images and contains exactly one user turn.
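One way to uphold this invariant is to build the VL request in a function that never receives the conversation history at all. The sketch below illustrates the idea with stand-in types: `Role`, `ContentPart`, the fields of `Message`, and `build_vl_request` are all assumptions for this example, and the crate's actual `Message` type may look different.

```rust
// Stand-in types for illustration only; the crate's real definitions may differ.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
enum ContentPart {
    Text(String),
    Image(Vec<u8>),
}

#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: Role,
    parts: Vec<ContentPart>,
}

/// Hypothetical constructor for the VL request. It takes only the
/// current-turn caption and images, so there is no history it could leak.
fn build_vl_request(caption: &str, images: &[Vec<u8>]) -> Vec<Message> {
    let mut parts = Vec::with_capacity(images.len() + 1);
    if !caption.is_empty() {
        parts.push(ContentPart::Text(caption.to_owned()));
    }
    parts.extend(images.iter().cloned().map(ContentPart::Image));
    // Exactly one user turn, regardless of how long the main conversation is.
    vec![Message { role: Role::User, parts }]
}
```

Because the function signature has no history parameter, the invariant is enforced by construction rather than by a runtime check.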

Enums

PreprocessOutcome: Outcome of a preprocessing attempt.

Functions

maybe_preprocess: Decide whether and how to preprocess images before a main-provider turn.
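The signature of `maybe_preprocess` and the variants of `PreprocessOutcome` are not shown on this page, so the following is only a rough sketch of the kind of decision the description implies. The enum `SketchOutcome`, its variants, and the function `decide` are hypothetical names invented for this example.

```rust
/// Hypothetical stand-in for the outcome of the decision step.
/// These variants are assumptions, not the real `PreprocessOutcome`.
#[derive(Debug, PartialEq)]
enum SketchOutcome {
    /// No preprocessing needed: forward the turn to the main provider as-is.
    PassThrough,
    /// Route the images through the VL provider first.
    Preprocess,
    /// Images need describing, but no VL provider is configured.
    Unavailable,
}

/// Hypothetical decision function based on the module description:
/// preprocess only when the main provider cannot accept images and the
/// current turn actually contains some.
fn decide(main_accepts_images: bool, image_count: usize, vl_configured: bool) -> SketchOutcome {
    if image_count == 0 || main_accepts_images {
        return SketchOutcome::PassThrough;
    }
    if vl_configured {
        SketchOutcome::Preprocess
    } else {
        SketchOutcome::Unavailable
    }
}
```

Keeping the decision in a pure function like this makes it easy to unit-test separately from the network call to the VL provider.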