pub struct PreprocessedImages {
    pub pixel_values: ArrayD<f32>,
    pub num_img_tokens: Vec<usize>,
    pub image_sizes: Vec<(u32, u32)>,
    pub model_specific: HashMap<String, ModelSpecificValue>,
}
Preprocessed images ready for model consumption.
This struct contains all the outputs needed by the SGLang scheduler
to construct MultimodalInputs for the model.
Fields§
pixel_values: ArrayD<f32>
Pixel values as a dynamic-dimensional float32 tensor.
This is the primary input to the vision encoder. Shape varies by model:
- Standard: [B, C, H, W] (4D)
- Phi3-Vision: [B, num_crops+1, C, H, W] (5D)
num_img_tokens: Vec<usize>
Number of image tokens per image in the batch.
Used to expand placeholder tokens in the text input. For example, LLaVA with 336x336 and patch_size=14 produces 576 tokens.
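The 576 figure in the LLaVA example follows from splitting the image into non-overlapping square patches: 336 / 14 = 24 patches per side, and 24 × 24 = 576. A minimal sketch of that arithmetic, assuming a square image whose side is an exact multiple of the patch size and no extra tokens (such as a class token). `patch_token_count` is a hypothetical helper, not part of this crate:

```rust
/// Number of patch tokens for a square image cut into square patches.
/// Hypothetical helper for illustration only; not part of this crate.
fn patch_token_count(image_size: usize, patch_size: usize) -> usize {
    // Patches along one side, e.g. 336 / 14 = 24.
    let per_side = image_size / patch_size;
    // One token per patch in the per_side x per_side grid.
    per_side * per_side
}

fn main() {
    let tokens = patch_token_count(336, 14);
    println!("{tokens}"); // prints 576
}
```

Real processors may add or merge tokens, so the value stored in num_img_tokens should be taken from the processor rather than recomputed this way.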
image_sizes: Vec<(u32, u32)>
Original image sizes as (width, height) before preprocessing.
Some models need this for proper attention masking or position encoding.
model_specific: HashMap<String, ModelSpecificValue>
Model-specific auxiliary outputs.
Examples:
- Qwen-VL: image_grid_thw for rotary position encoding
- LLaMA-Vision: aspect_ratio_ids, aspect_ratio_mask
- Phi3-Vision: num_img_tokens per crop
Implementations§
impl PreprocessedImages
pub fn new(
    pixel_values: Array4<f32>,
    num_img_tokens: Vec<usize>,
    image_sizes: Vec<(u32, u32)>,
) -> Self
Create a new PreprocessedImages with required fields (4D pixel values).
pub fn new_dynamic(
    pixel_values: ArrayD<f32>,
    num_img_tokens: Vec<usize>,
    image_sizes: Vec<(u32, u32)>,
) -> Self
Create a new PreprocessedImages with dynamic-dimensional pixel values.
Use this for models like Phi3-Vision that have 5D tensors.
pub fn with_extra(
    self,
    key: impl Into<String>,
    value: ModelSpecificValue,
) -> Self
Add a model-specific value.
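Because with_extra takes self by value and returns Self, calls can be chained in builder style. The sketch below illustrates that pattern with simplified stand-ins: `Extra` and `Batch` are hypothetical types invented for this example (the variants of the real ModelSpecificValue are not shown on this page), and whether the real method overwrites an existing key is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `ModelSpecificValue`; the real variants may differ.
#[derive(Debug, Clone, PartialEq)]
enum Extra {
    IntList(Vec<i64>),
}

/// Simplified stand-in for `PreprocessedImages` that keeps only the auxiliary map.
#[derive(Debug, Default)]
struct Batch {
    model_specific: HashMap<String, Extra>,
}

impl Batch {
    /// Consumes `self`, stores the entry, and returns the updated value
    /// so that further calls can be chained.
    fn with_extra(mut self, key: impl Into<String>, value: Extra) -> Self {
        // Assumption: a repeated key replaces the earlier value.
        self.model_specific.insert(key.into(), value);
        self
    }
}

fn main() {
    let batch = Batch::default()
        .with_extra("image_grid_thw", Extra::IntList(vec![1, 24, 24]))
        .with_extra("aspect_ratio_ids", Extra::IntList(vec![1]));
    println!("{} extras stored", batch.model_specific.len());
}
```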
pub fn batch_size(&self) -> usize
Get the batch size.
pub fn channels(&self) -> Result<usize, TransformError>
Get the number of channels.
For 4D tensors [B, C, H, W], returns shape[1]. For 5D tensors [B, N, C, H, W] (Phi3-Vision), returns shape[2].
§Errors
Returns TransformError::InvalidShape if pixel_values is not 4D or 5D.
pub fn height(&self) -> Result<usize, TransformError>
Get the height of processed images.
For 4D tensors [B, C, H, W], returns shape[2]. For 5D tensors [B, N, C, H, W] (Phi3-Vision), returns shape[3].
§Errors
Returns TransformError::InvalidShape if pixel_values is not 4D or 5D.
pub fn width(&self) -> Result<usize, TransformError>
Get the width of processed images.
For 4D tensors [B, C, H, W], returns shape[3]. For 5D tensors [B, N, C, H, W] (Phi3-Vision), returns shape[4].
§Errors
Returns TransformError::InvalidShape if pixel_values is not 4D or 5D.
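The index rules documented for channels, height, and width can be summarized in one match on the shape. This is a sketch of the documented behavior, not the crate's implementation: `chw` and `ShapeError` are hypothetical names, and the payload carried by the real TransformError::InvalidShape is not shown on this page.

```rust
/// Hypothetical stand-in for the crate's error type; the real
/// `TransformError::InvalidShape` may carry different data.
#[derive(Debug, PartialEq)]
enum ShapeError {
    /// Holds the number of dimensions that was rejected.
    InvalidShape(usize),
}

/// Returns (channels, height, width) for a 4D or 5D pixel tensor shape.
fn chw(shape: &[usize]) -> Result<(usize, usize, usize), ShapeError> {
    match shape {
        // [B, C, H, W]: channels at index 1, height at 2, width at 3.
        [_, c, h, w] => Ok((*c, *h, *w)),
        // [B, N, C, H, W]: every index shifts by one past the crop dimension.
        [_, _, c, h, w] => Ok((*c, *h, *w)),
        // Any other rank is rejected.
        other => Err(ShapeError::InvalidShape(other.len())),
    }
}

fn main() {
    println!("{:?}", chw(&[2, 3, 336, 336]));
    println!("{:?}", chw(&[2, 5, 3, 336, 336]));
    println!("{:?}", chw(&[3, 336, 336]));
}
```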
pub fn total_tokens(&self) -> usize
Get total number of image tokens across all images.
pub fn pixel_values_flat(&self) -> Cow<'_, [f32]>
Get pixel values as a flat f32 slice without copying if possible.
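The Cow return type is what makes "without copying if possible" expressible: a contiguous buffer can be borrowed, while any other layout has to be gathered into a new allocation. The sketch below shows that pattern on a simplified stand-in. `Strided` is a hypothetical type for illustration, and the condition the real method uses to decide between borrowing and copying is an assumption.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a tensor: a buffer plus the step between
/// logical elements (1 means the data is contiguous).
struct Strided {
    data: Vec<f32>,
    stride: usize,
}

impl Strided {
    /// Borrows the buffer when it is contiguous, otherwise copies the
    /// logical elements into a new `Vec`.
    fn flat(&self) -> Cow<'_, [f32]> {
        if self.stride == 1 {
            Cow::Borrowed(self.data.as_slice())
        } else {
            Cow::Owned(self.data.iter().step_by(self.stride).copied().collect())
        }
    }
}

fn main() {
    let contiguous = Strided { data: vec![1.0, 2.0, 3.0, 4.0], stride: 1 };
    let strided = Strided { data: vec![1.0, 2.0, 3.0, 4.0], stride: 2 };
    println!("borrowed: {}", matches!(contiguous.flat(), Cow::Borrowed(_)));
    println!("owned: {}", matches!(strided.flat(), Cow::Owned(_)));
}
```

Callers can treat both cases uniformly, since Cow<'_, [f32]> dereferences to &[f32].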
pub fn pixel_values_shape(&self) -> Vec<usize>
Get the shape of pixel values as a vector.
pub fn num_images(&self) -> usize
Number of images in this batch.
pub fn batched_keys(layouts: &HashMap<String, FieldLayout>) -> Vec<String>
Extract batched tensor keys from explicit field layout declarations.
Trait Implementations§
impl Clone for PreprocessedImages
fn clone(&self) -> PreprocessedImages
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations§
impl Freeze for PreprocessedImages
impl RefUnwindSafe for PreprocessedImages
impl Send for PreprocessedImages
impl Sync for PreprocessedImages
impl Unpin for PreprocessedImages
impl UnsafeUnpin for PreprocessedImages
impl UnwindSafe for PreprocessedImages
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.