Model Modality Support
Defines modalities for different types of AI models:
- LLM (Language Models) - Text-only
- VLM (Vision-Language Models) - Text + Images
- VLA (Vision-Language-Action Models) - Text + Images + Actions/Robotics
- ALM (Audio-Language Models) - Text + Audio
- VALM (Video-Audio-Language Models) - Text + Video + Audio
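The taxonomy above can be sketched as a small Rust enum plus a categorization helper. The names below mirror this module's `Modality` and `ModelCategory` items, but the exact variants and signature are illustrative assumptions, not the crate's actual API.

```rust
// Hypothetical sketch of the modality taxonomy described above;
// the crate's real definitions may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Modality {
    Text,
    Image,
    Audio,
    Video,
    Action,
}

/// Classify a model by the input modalities it accepts (assumed logic,
/// following the LLM/VLM/VLA/ALM/VALM breakdown above).
pub fn category(inputs: &[Modality]) -> &'static str {
    let has = |m: Modality| inputs.contains(&m);
    match (
        has(Modality::Image),
        has(Modality::Audio),
        has(Modality::Video),
        has(Modality::Action),
    ) {
        (_, _, true, _) => "VALM",     // video input implies the video+audio class
        (true, _, _, true) => "VLA",   // vision plus actions/robotics
        (true, _, _, false) => "VLM",  // vision only
        (false, true, _, _) => "ALM",  // audio only
        _ => "LLM",                    // text only
    }
}
```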
Structs

- ActionCommand - Action command for VLA models
- AudioContent - Audio content for audio models
- BoundingBoxRegion - Bounding box region in an image
- ImageContent - Image content for vision models
- ModalityCapabilities - Describes what modalities a model can accept and produce
- ModelPricing - Model pricing information
- MultimodalMessage - A multimodal message that can contain mixed content types
- MultimodalModel - Known VLM/VLA model with capabilities
- SensorData - Sensor data input for VLA models
- VideoContent - Video content for video models
- Waypoint - Waypoint in a trajectory
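`ModalityCapabilities` describes what a model can accept and produce; a capability check against it might look like the sketch below. The field names and the `accepts_all` helper are assumptions for illustration, not the crate's actual definitions.

```rust
// Assumed shape of ModalityCapabilities; fields and methods are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Modality {
    Text,
    Image,
    Audio,
    Video,
    Action,
}

#[derive(Debug, Clone, Default)]
pub struct ModalityCapabilities {
    /// Modalities the model can accept as input.
    pub inputs: Vec<Modality>,
    /// Modalities the model can produce as output.
    pub outputs: Vec<Modality>,
}

impl ModalityCapabilities {
    /// True if the model can accept every modality in `wanted`.
    pub fn accepts_all(&self, wanted: &[Modality]) -> bool {
        wanted.iter().all(|m| self.inputs.contains(m))
    }
}
```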
Enums

- ActionParameters - Parameters for different action types
- ActionType - Types of robot actions
- AudioData - Audio data
- ContentPart - Content part of a multimodal message
- ImageData - Image data, either base64-encoded or a URL reference
- ImageDetail - Image detail level for vision models
- ImageFormat - Supported image formats
- Modality - Input/output modality types
- ModelCategory - Model category based on modality support
- SensorType - Types of sensors
- SensorValues - Sensor values for different sensor types
- VideoData - Video data, either a URL or uploaded frames
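Putting `MultimodalMessage`, `ContentPart`, and `ImageData` together, building a mixed text-and-image message could look like the sketch below. The type shapes and builder methods are assumptions that follow the item descriptions above, not the crate's real API.

```rust
// Illustrative shapes for ImageData / ContentPart / MultimodalMessage;
// the crate's actual variants and fields may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum ImageData {
    /// Base64-encoded image bytes.
    Base64(String),
    /// Reference to an image by URL.
    Url(String),
}

#[derive(Debug, Clone, PartialEq)]
pub enum ContentPart {
    Text(String),
    Image(ImageData),
}

#[derive(Debug, Clone)]
pub struct MultimodalMessage {
    pub role: String,
    pub content: Vec<ContentPart>,
}

impl MultimodalMessage {
    /// Start an empty user message (hypothetical builder).
    pub fn user() -> Self {
        Self { role: "user".into(), content: Vec::new() }
    }
    /// Append a text part.
    pub fn text(mut self, t: &str) -> Self {
        self.content.push(ContentPart::Text(t.into()));
        self
    }
    /// Append an image part referenced by URL.
    pub fn image_url(mut self, url: &str) -> Self {
        self.content.push(ContentPart::Image(ImageData::Url(url.into())));
        self
    }
}
```

A caller would then chain the builder, e.g. `MultimodalMessage::user().text("What is in this photo?").image_url("https://example.com/cat.png")`.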
Functions

- vla_models - Get built-in VLA models
- vlm_models - Get built-in VLM models
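A caller might use these lookup functions to pick a model by capability. The sketch below stands in for `vlm_models` with an assumed `MultimodalModel` shape and invented placeholder model names; none of this is the crate's real data.

```rust
// Assumed, simplified shape of MultimodalModel for illustration only.
#[derive(Debug, Clone)]
pub struct MultimodalModel {
    pub name: &'static str,
    pub supports_video: bool,
}

/// Stand-in for the module's `vlm_models()` built-in list
/// (placeholder entries, not real models).
pub fn vlm_models() -> Vec<MultimodalModel> {
    vec![
        MultimodalModel { name: "example-vlm-small", supports_video: false },
        MultimodalModel { name: "example-vlm-video", supports_video: true },
    ]
}

/// Pick the first model that can accept video input, if any.
pub fn first_video_capable(models: &[MultimodalModel]) -> Option<&MultimodalModel> {
    models.iter().find(|m| m.supports_video)
}
```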