Module modality


Model Modality Support

Defines modalities for different types of AI models:

  • LLM (Large Language Models) - Text-only
  • VLM (Vision-Language Models) - Text + Images
  • VLA (Vision-Language-Action Models) - Text + Images + Actions/Robotics
  • ALM (Audio-Language Models) - Text + Audio
  • VALM (Video-Audio-Language Models) - Text + Video + Audio
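
The family-to-modality mapping in the list above can be sketched as a pair of enums. This is only an illustration: the `Modality` and `ModelCategory` types here are hypothetical stand-ins that reuse the names from this module, and the variants and the `modalities` and `supports` methods are assumptions, not the crate's actual definitions.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the module's `Modality` enum (variants assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Modality {
    Text,
    Image,
    Audio,
    Video,
    Action,
}

/// Hypothetical stand-in for `ModelCategory`, one variant per family in the list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelCategory {
    Llm,
    Vlm,
    Vla,
    Alm,
    Valm,
}

impl ModelCategory {
    /// Modalities associated with each family, copied from the list above.
    pub fn modalities(self) -> BTreeSet<Modality> {
        use Modality::*;
        let items: &[Modality] = match self {
            ModelCategory::Llm => &[Text],
            ModelCategory::Vlm => &[Text, Image],
            ModelCategory::Vla => &[Text, Image, Action],
            ModelCategory::Alm => &[Text, Audio],
            ModelCategory::Valm => &[Text, Video, Audio],
        };
        items.iter().copied().collect()
    }

    /// True if this family handles the given modality.
    pub fn supports(self, m: Modality) -> bool {
        self.modalities().contains(&m)
    }
}
```

With these assumed types, `ModelCategory::Vla.supports(Modality::Action)` is true, while `ModelCategory::Llm.supports(Modality::Image)` is false.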

Structs

ActionCommand
Action command for VLA models
AudioContent
Audio content for audio models
BoundingBoxRegion
Bounding box region in an image
ImageContent
Image content for vision models
ModalityCapabilities
Describes what modalities a model can accept and produce
ModelPricing
Model pricing information
MultimodalMessage
A multimodal message that can contain mixed content types
MultimodalModel
Known VLM/VLA model with capabilities
SensorData
Sensor data input for VLA models
VideoContent
Video content for video models
Waypoint
Waypoint in a trajectory

Enums

ActionParameters
Parameters for different action types
ActionType
Types of robot actions
AudioData
Audio data
ContentPart
Content part of a multimodal message
ImageData
Image data - either base64 encoded or URL reference
ImageDetail
Image detail level for vision models
ImageFormat
Supported image formats
Modality
Input/Output modality types
ModelCategory
Model category based on modality support
SensorType
Types of sensors
SensorValues
Sensor values for different sensor types
VideoData
Video data - URL or uploaded frames
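
The `ImageData` entry above describes a two-way choice between base64-encoded data and a URL reference. A minimal sketch of how such a type might look follows. The variant names, the `media_type` and `data` fields, and the `to_uri` helper are all assumptions made for illustration and may not match the crate's real definition.

```rust
/// Hypothetical stand-in for `ImageData`; variant and field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum ImageData {
    /// Inline image bytes, already base64 encoded by the caller.
    Base64 { media_type: String, data: String },
    /// Reference to an image hosted elsewhere.
    Url(String),
}

impl ImageData {
    /// Render either variant as a single URI string.
    /// Inline data becomes an RFC 2397 `data:` URI; URLs pass through unchanged.
    pub fn to_uri(&self) -> String {
        match self {
            ImageData::Base64 { media_type, data } => {
                format!("data:{};base64,{}", media_type, data)
            }
            ImageData::Url(url) => url.clone(),
        }
    }
}
```

Modelling the choice as an enum means each consumer has to handle both cases explicitly, so an inline image cannot be mistaken for a remote one.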

Functions

vla_models
Get built-in VLA models
vlm_models
Get built-in VLM models