Struct PreprocessedImages

pub struct PreprocessedImages {
    pub pixel_values: ArrayD<f32>,
    pub num_img_tokens: Vec<usize>,
    pub image_sizes: Vec<(u32, u32)>,
    pub model_specific: HashMap<String, ModelSpecificValue>,
}

Preprocessed images ready for model consumption.

This struct contains all the outputs needed by the SGLang scheduler to construct MultimodalInputs for the model.

Fields

pixel_values: ArrayD<f32>

Pixel values as a dynamic-dimensional float32 tensor.

This is the primary input to the vision encoder. Shape varies by model:

Standard: [B, C, H, W] (4D)
Phi3-Vision: [B, num_crops+1, C, H, W] (5D)

num_img_tokens: Vec<usize>

Number of image tokens per image in the batch.

Used to expand placeholder tokens in the text input. For example, LLaVA with a 336x336 input and patch_size=14 produces (336 / 14)^2 = 24 x 24 = 576 tokens.
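
The arithmetic behind the 576 figure, and the placeholder expansion it feeds, can be sketched as follows. This is a minimal, standard-library-only illustration rather than this crate's code: the names tokens_per_image and expand_placeholders and the placeholder id are hypothetical, and the formula assumes a plain patch grid with no class token or separator tokens.

```rust
/// Number of patch tokens for a square-patch vision encoder.
/// Assumes `height` and `width` are exact multiples of `patch_size`.
fn tokens_per_image(height: usize, width: usize, patch_size: usize) -> usize {
    (height / patch_size) * (width / patch_size)
}

/// Replace the i-th occurrence of `placeholder` with `num_img_tokens[i]`
/// copies of it. Placeholders beyond the end of `num_img_tokens` are kept
/// as a single token (an arbitrary choice made for this sketch).
fn expand_placeholders(
    input_ids: &[u32],
    placeholder: u32,
    num_img_tokens: &[usize],
) -> Vec<u32> {
    let extra: usize = num_img_tokens.iter().sum();
    let mut out = Vec::with_capacity(input_ids.len() + extra);
    let mut counts = num_img_tokens.iter();
    for &id in input_ids {
        if id == placeholder {
            // One count is consumed per placeholder, in order of appearance.
            let n = counts.next().copied().unwrap_or(1);
            out.extend(std::iter::repeat(placeholder).take(n));
        } else {
            out.push(id);
        }
    }
    out
}
```

With the LLaVA numbers above, tokens_per_image(336, 336, 14) evaluates to 576, so a prompt with one image placeholder would grow by 575 tokens after expansion.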

image_sizes: Vec<(u32, u32)>

Original image sizes as (width, height) before preprocessing.

Some models need this for proper attention masking or position encoding.

model_specific: HashMap<String, ModelSpecificValue>

Model-specific auxiliary outputs.

Examples:

Qwen-VL: image_grid_thw for rotary position encoding
LLaMA-Vision: aspect_ratio_ids, aspect_ratio_mask
Phi3-Vision: num_img_tokens per crop

Implementations

impl PreprocessedImages

pub fn new(
    pixel_values: Array4<f32>,
    num_img_tokens: Vec<usize>,
    image_sizes: Vec<(u32, u32)>,
) -> Self

Create a new PreprocessedImages with required fields (4D pixel values).

pub fn new_dynamic(
    pixel_values: ArrayD<f32>,
    num_img_tokens: Vec<usize>,
    image_sizes: Vec<(u32, u32)>,
) -> Self

Create a new PreprocessedImages with dynamic-dimensional pixel values.

Use this for models such as Phi3-Vision whose pixel values are 5D tensors.

pub fn with_extra(
    self,
    key: impl Into<String>,
    value: ModelSpecificValue,
) -> Self

Add a model-specific value under the given key. Takes self by value and returns it, so calls can be chained.
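
The consuming-builder shape of this signature can be sketched with standard-library types only. Everything below is an assumption made for illustration: Extra stands in for ModelSpecificValue (whose variants are not shown on this page), Outputs is a reduced stand-in for PreprocessedImages, and the has_crops key is made up; only image_grid_thw comes from the examples listed for model_specific.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for ModelSpecificValue.
#[derive(Debug, Clone, PartialEq)]
enum Extra {
    IntVec(Vec<i64>),
    Flag(bool),
}

/// Hypothetical, reduced stand-in for PreprocessedImages.
#[derive(Debug, Default)]
struct Outputs {
    model_specific: HashMap<String, Extra>,
}

impl Outputs {
    /// Consumes `self`, stores the value under `key`, and returns the
    /// updated struct so that further calls can be chained.
    fn with_extra(mut self, key: impl Into<String>, value: Extra) -> Self {
        self.model_specific.insert(key.into(), value);
        self
    }
}

/// Builds an `Outputs` carrying two auxiliary values via chained calls.
fn build_example() -> Outputs {
    Outputs::default()
        .with_extra("image_grid_thw", Extra::IntVec(vec![1, 24, 24]))
        .with_extra("has_crops", Extra::Flag(false))
}
```

Because key is impl Into<String>, callers can pass either a &str literal or an owned String without an explicit conversion.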

pub fn batch_size(&self) -> usize

Get the batch size.

pub fn channels(&self) -> Result<usize, TransformError>

Get the number of channels.

For 4D tensors [B, C, H, W], returns shape[1]. For 5D tensors [B, N, C, H, W] (Phi3-Vision), returns shape[2].

Errors

Returns TransformError::InvalidShape if pixel_values is not 4D or 5D.

pub fn height(&self) -> Result<usize, TransformError>

Get the height of processed images.

For 4D tensors [B, C, H, W], returns shape[2]. For 5D tensors [B, N, C, H, W] (Phi3-Vision), returns shape[3].

Errors

Returns TransformError::InvalidShape if pixel_values is not 4D or 5D.

pub fn width(&self) -> Result<usize, TransformError>

Get the width of processed images.

For 4D tensors [B, C, H, W], returns shape[3]. For 5D tensors [B, N, C, H, W] (Phi3-Vision), returns shape[4].

Errors

Returns TransformError::InvalidShape if pixel_values is not 4D or 5D.
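
The index mapping that channels, height, and width describe can be sketched over a plain shape slice. This is an illustration under stated assumptions, not the crate's implementation: the chw function and the InvalidShape struct are hypothetical, with InvalidShape standing in for TransformError::InvalidShape.

```rust
/// Hypothetical stand-in for TransformError::InvalidShape; carries the
/// number of dimensions that was actually found.
#[derive(Debug, PartialEq)]
struct InvalidShape(usize);

/// Returns (channels, height, width) for a 4D [B, C, H, W] shape or a
/// 5D [B, N, C, H, W] shape, and an error for any other rank.
fn chw(shape: &[usize]) -> Result<(usize, usize, usize), InvalidShape> {
    match shape {
        // 4D: channels at index 1, height at 2, width at 3.
        [_, c, h, w] => Ok((*c, *h, *w)),
        // 5D: the extra crop axis shifts every index up by one.
        [_, _, c, h, w] => Ok((*c, *h, *w)),
        other => Err(InvalidShape(other.len())),
    }
}
```

Matching on the slice length keeps the 4D and 5D cases side by side and makes the error path explicit for every other rank.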

pub fn ndim(&self) -> usize

Get the number of dimensions of pixel_values.

pub fn total_tokens(&self) -> usize

Get total number of image tokens across all images.

pub fn pixel_values_flat(&self) -> Cow<'_, [f32]>

Get pixel values as a flat f32 slice. The data is borrowed when possible and copied otherwise, which is why the return type is Cow.
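
The borrow-or-copy behaviour that a Cow return type allows can be sketched without any tensor library. The View type below is hypothetical and only models a 2D buffer with a row stride; how this crate decides between borrowing and copying is not shown on this page.

```rust
use std::borrow::Cow;

/// Hypothetical 2D view over a buffer: `rows` rows of `cols` elements,
/// with consecutive rows starting `row_stride` elements apart.
struct View<'a> {
    data: &'a [f32],
    rows: usize,
    cols: usize,
    row_stride: usize,
}

impl<'a> View<'a> {
    /// Borrows the buffer when rows are packed back to back; otherwise
    /// gathers the rows into a freshly allocated Vec.
    fn flat(&self) -> Cow<'a, [f32]> {
        if self.row_stride == self.cols {
            // Contiguous: the first rows * cols elements are the answer.
            Cow::Borrowed(&self.data[..self.rows * self.cols])
        } else {
            // Strided: skip the padding between rows while copying.
            let mut out = Vec::with_capacity(self.rows * self.cols);
            for r in 0..self.rows {
                let start = r * self.row_stride;
                out.extend_from_slice(&self.data[start..start + self.cols]);
            }
            Cow::Owned(out)
        }
    }
}
```

Callers can treat both cases uniformly through deref to &[f32], and only pay for an allocation when the layout forces one.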

pub fn pixel_values_shape(&self) -> Vec<usize>

Get the shape of pixel values as a vector.

pub fn num_images(&self) -> usize

Number of images in this batch.

pub fn batched_keys(layouts: &HashMap<String, FieldLayout>) -> Vec<String>

Extract batched tensor keys from explicit field layout declarations.

pub fn flat_keys(
    layouts: &HashMap<String, FieldLayout>,
) -> HashMap<String, String>

Extract flat-slicing tensor keys from explicit field layout declarations.

Returns a map of tensor name → sizes tensor name.

Trait Implementations

impl Clone for PreprocessedImages

fn clone(&self) -> PreprocessedImages

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for PreprocessedImages

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations

impl UnwindSafe for PreprocessedImages
