pub struct EndpointUpdateInput {
pub allowed_cuda_versions: Option<Vec<CudaVersion>>,
pub cpu_flavor_ids: Option<Vec<CpuFlavorId>>,
pub data_center_ids: Option<Vec<DataCenterId>>,
pub execution_timeout_ms: Option<i32>,
pub flashboot: Option<bool>,
pub gpu_count: Option<i32>,
pub gpu_type_ids: Option<Vec<GpuTypeId>>,
pub idle_timeout: Option<i32>,
pub name: Option<String>,
pub network_volume_id: Option<String>,
pub scaler_type: Option<ScalerType>,
pub scaler_value: Option<i32>,
pub template_id: Option<String>,
pub vcpu_count: Option<i32>,
pub workers_max: Option<i32>,
pub workers_min: Option<i32>,
}
Input parameters for updating an existing serverless endpoint.
This struct allows you to modify endpoint configuration and trigger a rolling release that updates all workers with the new settings. All fields are optional, allowing you to update only the properties you want to change.
§Rolling Release Process
When an endpoint is updated:
1. Validation: The new configuration is validated for compatibility
2. Version Increment: The endpoint version number is incremented
3. Rolling Update: Workers are gradually replaced with the new configuration
4. Traffic Migration: Requests are routed to updated workers as they become available
5. Cleanup: Old workers are terminated once traffic migration is complete
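The gradual replacement step can be sketched as a simulation. This is illustrative only: the batch size, the `u32` version representation, and the `rolling_update` helper are assumptions for the sketch, not SDK behavior.

```rust
/// Illustrative sketch of a batched rolling update: workers still on the
/// old version are replaced a few at a time, so capacity is never zero.
fn rolling_update(workers: &mut Vec<u32>, new_version: u32, batch_size: usize) -> usize {
    let mut batches = 0;
    while workers.iter().any(|&v| v != new_version) {
        // Replace at most `batch_size` outdated workers per batch.
        let mut replaced = 0;
        for v in workers.iter_mut() {
            if *v != new_version && replaced < batch_size {
                *v = new_version; // new worker comes up before the old one is terminated
                replaced += 1;
            }
        }
        batches += 1;
    }
    batches
}

fn main() {
    let mut workers = vec![1; 6]; // six workers on version 1
    let batches = rolling_update(&mut workers, 2, 2);
    assert!(workers.iter().all(|&v| v == 2));
    println!("updated in {batches} batches");
}
```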
§Important Notes
- Zero Downtime: Updates are performed without service interruption
- Gradual Rollout: Workers are updated in batches to maintain availability
- Rollback: Previous versions can be restored if issues are detected
- Template Changes: Updating template_id deploys new container images
§Examples
use runpod_sdk::model::{EndpointUpdateInput, ScalerType};

// Scale up for increased traffic
let scale_up = EndpointUpdateInput {
    workers_max: Some(20),  // Double capacity
    scaler_value: Some(2),  // More aggressive scaling
    idle_timeout: Some(10), // Keep workers longer
    ..Default::default()
};

// Enable flash boot for better performance
let performance_upgrade = EndpointUpdateInput {
    flashboot: Some(true),
    execution_timeout_ms: Some(60_000), // 60-second timeout
    ..Default::default()
};

// Switch to cost-optimized scaling
let cost_optimization = EndpointUpdateInput {
    scaler_type: Some(ScalerType::RequestCount),
    scaler_value: Some(10), // 1 worker per 10 requests
    workers_min: Some(0),   // No reserved capacity
    flashboot: Some(false), // Standard startup
    ..Default::default()
};
Fields§
§allowed_cuda_versions: Option<Vec<CudaVersion>>
If the endpoint is a GPU endpoint, acceptable CUDA versions for workers.
Updates the CUDA version constraints for worker allocation. Triggers rolling release to ensure all workers use compatible CUDA versions.
Note: Set to None to keep current setting unchanged.
§cpu_flavor_ids: Option<Vec<CpuFlavorId>>
If the endpoint is a CPU endpoint, list of CPU flavors for workers.
Updates the available CPU configurations for workers. The order determines rental priority for new workers.
CPU endpoints only: Ignored for GPU endpoints
Note: Set to None to keep current setting unchanged.
§data_center_ids: Option<Vec<DataCenterId>>
List of data center IDs where workers can be located.
Updates the geographic distribution of workers. Existing workers in removed data centers will be gradually replaced.
Note: Set to None to keep current setting unchanged.
§execution_timeout_ms: Option<i32>
Maximum execution time in milliseconds for individual requests.
Updates the timeout for request processing. Affects new requests immediately, existing requests continue with previous timeout.
Range: 1,000ms to 3,600,000ms (1 second to 1 hour)
Note: Set to None to keep current setting unchanged.
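Given the documented 1,000 ms to 3,600,000 ms range, a client-side pre-check could look like the sketch below. The `validate_execution_timeout_ms` helper is an assumption for illustration; the SDK's actual validation happens server-side.

```rust
/// Validate an execution timeout against the documented range of
/// 1,000 ms (1 second) to 3,600,000 ms (1 hour).
fn validate_execution_timeout_ms(ms: i32) -> Result<i32, String> {
    const MIN_MS: i32 = 1_000;
    const MAX_MS: i32 = 3_600_000;
    if (MIN_MS..=MAX_MS).contains(&ms) {
        Ok(ms)
    } else {
        Err(format!("execution_timeout_ms must be in {MIN_MS}..={MAX_MS}, got {ms}"))
    }
}

fn main() {
    assert_eq!(validate_execution_timeout_ms(60_000), Ok(60_000)); // one minute: valid
    assert!(validate_execution_timeout_ms(500).is_err());          // below the 1-second floor
}
```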
§flashboot: Option<bool>
Whether to enable flash boot for faster worker startup.
Updates the startup optimization for new workers. Affects cold start performance and per-request costs.
Trade-off: Higher per-request cost for faster startup
Note: Set to None to keep current setting unchanged.
§gpu_count: Option<i32>
If the endpoint is a GPU endpoint, number of GPUs per worker.
Updates GPU allocation for new workers. Triggers rolling release to deploy workers with the new GPU configuration.
GPU endpoints only: Ignored for CPU endpoints
Range: 1-8 depending on GPU type availability
Note: Set to None to keep current setting unchanged.
§gpu_type_ids: Option<Vec<GpuTypeId>>
If the endpoint is a GPU endpoint, list of GPU types for workers.
Updates available GPU hardware types for workers. The order determines rental priority for new workers.
GPU endpoints only: Ignored for CPU endpoints
Note: Set to None to keep current setting unchanged.
§idle_timeout: Option<i32>
Number of seconds workers can be idle before scaling down.
Updates the idle timeout for worker lifecycle management. Affects cost optimization and cold start frequency.
Range: 1-3600 seconds (1 second to 1 hour)
Note: Set to None to keep current setting unchanged.
§name: Option<String>
A user-defined name for the endpoint.
Updates the display name used in dashboards and API responses. This change is applied immediately without triggering a rolling release.
Max length: 191 characters
Note: Set to None to keep current name unchanged.
§network_volume_id: Option<String>
The unique ID of a network volume to attach to workers.
Updates the persistent storage attached to workers. Triggers rolling release to mount/unmount volumes on all workers.
Requirements: Volume must exist in same data centers as workers
Note: Set to None to keep current volume unchanged.
§scaler_type: Option<ScalerType>
The scaling strategy for managing worker count.
Updates the auto-scaling algorithm used for worker management. Change takes effect immediately for new scaling decisions.
Strategies:
- QueueDelay: Scale based on request wait time
- RequestCount: Scale based on queue depth
Note: Set to None to keep current strategy unchanged.
§scaler_value: Option<i32>
The scaling sensitivity parameter.
Updates the scaling behavior sensitivity. Change takes effect immediately for new scaling decisions.
For QueueDelay: Maximum seconds requests can wait
For RequestCount: Target requests per worker
Range: 1-3600
Note: Set to None to keep current value unchanged.
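For the RequestCount strategy, the documented semantics amount to one worker per scaler_value queued requests, clamped to the configured worker bounds. The sketch below only illustrates that arithmetic; the platform's actual scaling algorithm is internal and may differ.

```rust
/// Illustrative RequestCount scaling: one worker per `scaler_value`
/// queued requests, clamped to [workers_min, workers_max].
fn target_workers(queue_depth: i32, scaler_value: i32, workers_min: i32, workers_max: i32) -> i32 {
    // Ceiling division: 25 requests at 10 per worker needs 3 workers.
    let wanted = (queue_depth + scaler_value - 1) / scaler_value;
    wanted.clamp(workers_min, workers_max)
}

fn main() {
    assert_eq!(target_workers(25, 10, 0, 20), 3);
    assert_eq!(target_workers(0, 10, 2, 20), 2);    // workers_min keeps capacity warm
    assert_eq!(target_workers(500, 10, 0, 20), 20); // capped at workers_max
}
```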
§template_id: Option<String>
The unique ID of the template used to create the endpoint.
Updates the container image and environment configuration. Triggers rolling release to deploy all workers with the new template.
Impact: Changes container image, environment, resource allocation
Rolling Release: All workers are gradually replaced
Note: Set to None to keep current template unchanged.
§vcpu_count: Option<i32>
If the endpoint is a CPU endpoint, number of vCPUs per worker.
Updates CPU allocation for new workers. Triggers rolling release to deploy workers with the new CPU configuration.
CPU endpoints only: Ignored for GPU endpoints
Range: 1-32 vCPUs depending on CPU flavor
Note: Set to None to keep current setting unchanged.
§workers_max: Option<i32>
Maximum number of workers that can run simultaneously.
Updates the scaling limit for worker count. Change takes effect immediately for new scaling decisions.
Range: 0-1000+ depending on account limits
Note: Set to None to keep current limit unchanged.
§workers_min: Option<i32>
Minimum number of workers that always remain running.
Updates the reserved capacity for immediate availability. Change triggers immediate scaling to meet the new minimum.
Range: 0-100 depending on account limits
Billing: Reserved workers are always charged (at reduced rate)
Note: Set to None to keep current minimum unchanged.
Trait Implementations§
impl Clone for EndpointUpdateInput
fn clone(&self) -> EndpointUpdateInput
fn clone_from(&mut self, source: &Self)