pub struct Endpoint {
pub id: String,
pub name: Option<String>,
pub user_id: String,
pub template_id: String,
pub version: i32,
pub compute_type: ComputeType,
pub created_at: String,
pub data_center_ids: Vec<DataCenterId>,
pub env: Option<EnvVars>,
pub execution_timeout_ms: i32,
pub gpu_count: Option<i32>,
pub gpu_type_ids: Option<Vec<GpuTypeId>>,
pub instance_ids: Option<Vec<String>>,
pub idle_timeout: i32,
pub network_volume_id: Option<String>,
pub scaler_type: ScalerType,
pub scaler_value: i32,
pub workers_max: i32,
pub workers_min: i32,
pub allowed_cuda_versions: Option<Vec<CudaVersion>>,
pub template: Option<Template>,
pub workers: Option<Vec<Pod>>,
}
Serverless endpoint resource providing auto-scaling compute infrastructure.
Represents a fully configured serverless endpoint with all deployment settings, scaling configuration, and runtime status. Endpoints automatically manage worker lifecycle based on request load and configured policies.
§Key Properties
- Auto-scaling: Workers spin up/down based on request queue and scaling policy
- Template-driven: Consistent runtime environment from pre-configured templates
- Multi-region: Distributed deployment across multiple data centers
- Cost-optimized: Pay-per-use billing with idle timeout management
- High-availability: Automatic failover and redundancy
§Examples
use runpod_sdk::model::Endpoint;
// Endpoint instances are typically obtained from API responses
// when listing, creating, or retrieving serverless endpoints
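A fuller sketch of typical usage: the get_endpoint call and GetEndpointQuery mirror the method examples further down this page, field access follows the struct definition above, and imports are omitted as in those examples.
let client = RunpodClient::from_env()?;
let endpoint = client
    .get_endpoint("endpoint_id", GetEndpointQuery::default())
    .await?;
// Inspect a few of the fields documented below.
println!(
    "endpoint {} scales between {} and {} workers",
    endpoint.id, endpoint.workers_min, endpoint.workers_max
);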
§Fields
id: String
A unique string identifying the serverless endpoint.
name: Option<String>
A user-defined name for the endpoint. The name does not need to be unique.
Used for organization and identification in dashboards and monitoring. Can be updated without affecting endpoint functionality.
user_id: String
A unique string identifying the RunPod user who created the endpoint.
template_id: String
The unique string identifying the template used to create the endpoint.
Templates define the container image, environment, and resource configuration that will be deployed across all workers for this endpoint.
version: i32
The current version of the endpoint configuration.
Incremented whenever the template or environment variables are changed, triggering a rolling update of all workers.
compute_type: ComputeType
The type of compute used by workers on this endpoint.
Determines whether workers will have GPU or CPU compute resources attached. This setting affects pricing, available hardware types, and performance characteristics.
created_at: String
The UTC timestamp when the endpoint was created.
ISO 8601 format string representing the endpoint creation time.
data_center_ids: Vec<DataCenterId>
List of RunPod data center IDs where workers can be located.
Workers are distributed across these data centers for availability and performance. The system automatically selects the best available data center based on resource availability and proximity to users.
env: Option<EnvVars>
Environment variables for the endpoint’s container runtime.
These variables are injected into all worker containers and can be used for configuration, API keys, feature flags, and other runtime settings.
execution_timeout_ms: i32
The maximum execution time in milliseconds for individual requests.
If a request exceeds this timeout, the worker is stopped and the request is marked as failed. This prevents runaway processes and ensures predictable resource usage.
Common values:
- Web APIs: 30,000ms (30 seconds)
- AI inference: 300,000ms (5 minutes)
- Batch processing: 3,600,000ms (1 hour)
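For example, a 5-minute inference timeout can be derived from std::time::Duration instead of writing the millisecond count by hand (a small sketch that uses only the standard library, not any RunPod API):
use std::time::Duration;

// 300_000 ms, matching the "AI inference" value above.
let execution_timeout_ms = Duration::from_secs(5 * 60).as_millis() as i32;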
gpu_count: Option<i32>
The number of GPUs attached to each worker (GPU endpoints only).
Only relevant when compute_type is GPU. Determines the GPU resources
allocated to each worker instance for parallel processing workloads.
gpu_type_ids: Option<Vec<GpuTypeId>>
List of RunPod GPU types that can be attached to workers (GPU endpoints only).
The system tries to allocate GPUs in the order specified, falling back
to subsequent types if the preferred options are unavailable.
Only relevant when compute_type is GPU.
instance_ids: Option<Vec<String>>
List of CPU instance IDs that can be attached to workers (CPU endpoints only).
For CPU endpoints, specifies the available instance types that workers can use, allowing the system to choose based on availability and cost.
idle_timeout: i32
The number of seconds a worker can be idle before being scaled down.
Workers that haven’t processed requests for this duration are automatically terminated to reduce costs. Shorter timeouts reduce costs but may increase cold start latency for subsequent requests.
Typical values:
- Cost-optimized: 30-60 seconds
- Balanced: 5-15 seconds
- Performance-optimized: 1-5 seconds
network_volume_id: Option<String>
The unique ID of the network volume attached to workers, if any.
Network volumes provide persistent, shared storage across all workers, useful for model weights, datasets, and other shared assets.
scaler_type: ScalerType
The scaling strategy used to manage worker count.
Determines how the system responds to request load by scaling workers up or down automatically.
scaler_value: i32
The scaling sensitivity parameter.
For QueueDelay scaling:
- Seconds a request can wait in queue before scaling up
- Lower values = more responsive but potentially higher costs
For RequestCount scaling:
- Target requests per worker (queue_size / scaler_value = worker_count)
- Higher values = fewer workers, more cost-efficient
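A rough sketch of the RequestCount relationship above; the real scaler may round or clamp differently, and the variable values here are illustrative only:
let queue_size: i32 = 40;   // requests currently waiting
let scaler_value: i32 = 4;  // target requests per worker
let (workers_min, workers_max) = (0, 20);

// queue_size / scaler_value = worker_count, kept within the configured worker bounds.
let worker_count = (queue_size / scaler_value).clamp(workers_min, workers_max); // 10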
workers_max: i32
The maximum number of workers that can run simultaneously.
Hard limit preventing runaway scaling and controlling maximum costs. Set based on expected peak load and budget constraints.
workers_min: i32
The minimum number of workers that always remain running.
Reserved capacity that’s always available, even during idle periods. These workers are billed at a lower rate but provide immediate availability. Set to 0 for maximum cost efficiency, or >0 for better responsiveness.
allowed_cuda_versions: Option<Vec<CudaVersion>>
List of acceptable CUDA versions for GPU workers.
If specified, only workers with compatible CUDA runtimes will be used. Useful for ensuring compatibility with specific AI/ML frameworks. Only relevant for GPU endpoints.
template: Option<Template>
Detailed template information (included when include_template is true).
Contains the full template configuration including container image, environment setup, and resource requirements.
workers: Option<Vec<Pod>>
Current worker instances (included when include_workers is true).
List of active worker pods with their current status, resource allocation, and performance metrics.
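When workers were requested, the list can be inspected directly (a minimal sketch using only the Option and Vec APIs, on an endpoint retrieved as in the examples above):
if let Some(workers) = &endpoint.workers {
    println!("{} workers currently attached", workers.len());
}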
§Implementations
impl Endpoint
pub fn to_runner(&self, client: RunpodClient) -> ServerlessEndpoint
Available on crate feature serverless only.
Creates an endpoint runner from this endpoint.
§Example
let client = RunpodClient::from_env()?;
let serverless_endpoint = client.get_endpoint("endpoint_id", GetEndpointQuery::default()).await?;
let runner = serverless_endpoint.to_runner(client);

pub fn run<I>(&self, client: RunpodClient, input: &I) -> Result<ServerlessJob>
where
    I: Serialize,
Available on crate feature serverless only.
Runs a job on this endpoint.
This is a convenience method that creates a runner and submits a job in one call.
§Example
let client = RunpodClient::from_env()?;
let serverless_endpoint = client.get_endpoint("endpoint_id", GetEndpointQuery::default()).await?;
let job = serverless_endpoint.run(client, &json!({"prompt": "Hello"}))?;