pub struct Endpoint {
pub id: String,
pub name: Option<String>,
pub user_id: String,
pub template_id: String,
pub version: i32,
pub compute_type: ComputeType,
pub created_at: String,
pub data_center_ids: Vec<DataCenterId>,
pub env: Option<EnvVars>,
pub execution_timeout_ms: i32,
pub gpu_count: Option<i32>,
pub gpu_type_ids: Option<Vec<GpuTypeId>>,
pub instance_ids: Option<Vec<String>>,
pub idle_timeout: i32,
pub network_volume_id: Option<String>,
pub scaler_type: ScalerType,
pub scaler_value: i32,
pub workers_max: i32,
pub workers_min: i32,
pub allowed_cuda_versions: Option<Vec<CudaVersion>>,
pub template: Option<Template>,
pub workers: Option<Vec<Pod>>,
}
Serverless endpoint resource providing auto-scaling compute infrastructure.
Represents a fully configured serverless endpoint with all deployment settings, scaling configuration, and runtime status. Endpoints automatically manage worker lifecycle based on request load and configured policies.
§Key Properties
- Auto-scaling: Workers spin up/down based on request queue and scaling policy
- Template-driven: Consistent runtime environment from pre-configured templates
- Multi-region: Distributed deployment across multiple data centers
- Cost-optimized: Pay-per-use billing with idle timeout management
- High-availability: Automatic failover and redundancy
§Examples
use runpod_sdk::model::Endpoint;
// Endpoint instances are typically obtained from API responses
// when listing, creating, or retrieving serverless endpoints
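A fuller sketch of typical usage: the get_endpoint call and GetEndpointQuery mirror the method examples further down this page, field access follows the struct definition above, and imports are omitted as in those examples.
let client = RunpodClient::from_env()?;
let endpoint = client
    .get_endpoint("endpoint_id", GetEndpointQuery::default())
    .await?;
// Inspect a few of the fields documented below.
println!(
    "endpoint {} scales between {} and {} workers",
    endpoint.id, endpoint.workers_min, endpoint.workers_max
);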
§Fields
id: String
A unique string identifying the serverless endpoint.
name: Option<String>
A user-defined name for the endpoint. The name does not need to be unique.
Used for organization and identification in dashboards and monitoring. Can be updated without affecting endpoint functionality.
user_id: String
A unique string identifying the RunPod user who created the endpoint.
template_id: String
The unique string identifying the template used to create the endpoint.
Templates define the container image, environment, and resource configuration that will be deployed across all workers for this endpoint.
version: i32
The current version of the endpoint configuration.
Incremented whenever the template or environment variables are changed, triggering a rolling update of all workers.
compute_type: ComputeType
The type of compute used by workers on this endpoint.
Determines whether workers will have GPU or CPU compute resources attached. This setting affects pricing, available hardware types, and performance characteristics.
created_at: String
The UTC timestamp when the endpoint was created.
ISO 8601 format string representing the endpoint creation time.
data_center_ids: Vec<DataCenterId>
List of RunPod data center IDs where workers can be located.
Workers are distributed across these data centers for availability and performance. The system automatically selects the best available data center based on resource availability and proximity to users.
env: Option<EnvVars>
Environment variables for the endpoint’s container runtime.
These variables are injected into all worker containers and can be used for configuration, API keys, feature flags, and other runtime settings.
execution_timeout_ms: i32
The maximum execution time in milliseconds for individual requests.
If a request exceeds this timeout, the worker is stopped and the request is marked as failed. This prevents runaway processes and ensures predictable resource usage.
Common values:
- Web APIs: 30,000ms (30 seconds)
- AI inference: 300,000ms (5 minutes)
- Batch processing: 3,600,000ms (1 hour)
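For example, a 5-minute inference timeout can be derived from std::time::Duration instead of writing the millisecond count by hand (a small sketch that uses only the standard library, not any RunPod API):
use std::time::Duration;

// 300_000 ms, matching the "AI inference" value above.
let execution_timeout_ms = Duration::from_secs(5 * 60).as_millis() as i32;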
gpu_count: Option<i32>
The number of GPUs attached to each worker (GPU endpoints only).
Only relevant when compute_type is GPU. Determines the GPU resources
allocated to each worker instance for parallel processing workloads.
gpu_type_ids: Option<Vec<GpuTypeId>>
List of RunPod GPU types that can be attached to workers (GPU endpoints only).
The system tries to allocate GPUs in the order specified, falling back
to subsequent types if the preferred options are unavailable.
Only relevant when compute_type is GPU.
instance_ids: Option<Vec<String>>
List of CPU instance IDs that can be attached to workers (CPU endpoints only).
For CPU endpoints, specifies the available instance types that workers can use, allowing the system to choose based on availability and cost.
idle_timeout: i32
The number of seconds a worker can be idle before being scaled down.
Workers that haven’t processed requests for this duration are automatically terminated to reduce costs. Shorter timeouts reduce costs but may increase cold start latency for subsequent requests.
Typical values:
- Cost-optimized: 30-60 seconds
- Balanced: 5-15 seconds
- Performance-optimized: 1-5 seconds
network_volume_id: Option<String>
The unique ID of the network volume attached to workers, if any.
Network volumes provide persistent, shared storage across all workers, useful for model weights, datasets, and other shared assets.
scaler_type: ScalerType
The scaling strategy used to manage worker count.
Determines how the system responds to request load by scaling workers up or down automatically.
scaler_value: i32
The scaling sensitivity parameter.
For QueueDelay scaling:
- Seconds a request can wait in queue before scaling up
- Lower values = more responsive but potentially higher costs
For RequestCount scaling:
- Target requests per worker (queue_size / scaler_value = worker_count)
- Higher values = fewer workers, more cost-efficient
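A rough sketch of the RequestCount relationship above; the real scaler may round or clamp differently, and the variable values here are illustrative only:
let queue_size: i32 = 40;   // requests currently waiting
let scaler_value: i32 = 4;  // target requests per worker
let (workers_min, workers_max) = (0, 20);

// queue_size / scaler_value = worker_count, kept within the configured worker bounds.
let worker_count = (queue_size / scaler_value).clamp(workers_min, workers_max); // 10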
workers_max: i32
The maximum number of workers that can run simultaneously.
Hard limit preventing runaway scaling and controlling maximum costs. Set based on expected peak load and budget constraints.
workers_min: i32
The minimum number of workers that always remain running.
Reserved capacity that’s always available, even during idle periods. These workers are billed at a lower rate but provide immediate availability. Set to 0 for maximum cost efficiency, or >0 for better responsiveness.
allowed_cuda_versions: Option<Vec<CudaVersion>>
List of acceptable CUDA versions for GPU workers.
If specified, only workers with compatible CUDA runtimes will be used. Useful for ensuring compatibility with specific AI/ML frameworks. Only relevant for GPU endpoints.
template: Option<Template>
Detailed template information (included when include_template is true).
Contains the full template configuration including container image, environment setup, and resource requirements.
workers: Option<Vec<Pod>>
Current worker instances (included when include_workers is true).
List of active worker pods with their current status, resource allocation, and performance metrics.
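When workers were requested, the list can be inspected directly (a minimal sketch using only the Option and Vec APIs, on an endpoint retrieved as in the examples above):
if let Some(workers) = &endpoint.workers {
    println!("{} workers currently attached", workers.len());
}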
§Implementations
impl Endpoint
pub fn to_runner(&self, client: RunpodClient) -> ServerlessEndpoint
Available on crate feature serverless only.
Creates an endpoint runner from this endpoint.
§Example
let client = RunpodClient::from_env()?;
let serverless_endpoint = client.get_endpoint("endpoint_id", GetEndpointQuery::default()).await?;
let runner = serverless_endpoint.to_runner(client);

pub fn run<I>(&self, client: RunpodClient, input: &I) -> Result<ServerlessJob>
where
    I: Serialize,
Available on crate feature serverless only.
Runs a job on this endpoint.
This is a convenience method that creates a runner and submits a job in one call.
§Example
let client = RunpodClient::from_env()?;
let serverless_endpoint = client.get_endpoint("endpoint_id", GetEndpointQuery::default()).await?;
let job = serverless_endpoint.run(client, &json!({"prompt": "Hello"}))?;