Struct Endpoint

pub struct Endpoint {
    pub id: String,
    pub name: Option<String>,
    pub user_id: String,
    pub template_id: String,
    pub version: i32,
    pub compute_type: ComputeType,
    pub created_at: String,
    pub data_center_ids: Vec<DataCenterId>,
    pub env: Option<EnvVars>,
    pub execution_timeout_ms: i32,
    pub gpu_count: Option<i32>,
    pub gpu_type_ids: Option<Vec<GpuTypeId>>,
    pub instance_ids: Option<Vec<String>>,
    pub idle_timeout: i32,
    pub network_volume_id: Option<String>,
    pub scaler_type: ScalerType,
    pub scaler_value: i32,
    pub workers_max: i32,
    pub workers_min: i32,
    pub allowed_cuda_versions: Option<Vec<CudaVersion>>,
    pub template: Option<Template>,
    pub workers: Option<Vec<Pod>>,
}

Serverless endpoint resource providing auto-scaling compute infrastructure.

Represents a fully configured serverless endpoint with all deployment settings, scaling configuration, and runtime status. Endpoints automatically manage worker lifecycle based on request load and configured policies.

§Key Properties

  • Auto-scaling: Workers spin up/down based on request queue and scaling policy
  • Template-driven: Consistent runtime environment from pre-configured templates
  • Multi-region: Distributed deployment across multiple data centers
  • Cost-optimized: Pay-per-use billing with idle timeout management
  • High-availability: Automatic failover and redundancy

§Examples

use runpod_sdk::model::Endpoint;

// Endpoint instances are typically obtained from API responses
// when listing, creating, or retrieving serverless endpoints
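
As a sketch of typical usage, an endpoint fetched from the API can be inspected field by field. This assumes the serverless crate feature and the RunpodClient and GetEndpointQuery types shown in the method examples below:

let client = RunpodClient::from_env()?;
let endpoint = client.get_endpoint("endpoint_id", GetEndpointQuery::default()).await?;

// Inspect the scaling configuration on the returned Endpoint.
println!(
    "{}: {}-{} workers, idle timeout {}s",
    endpoint.id, endpoint.workers_min, endpoint.workers_max, endpoint.idle_timeout
);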

Fields§

§id: String

A unique string identifying the serverless endpoint.

§name: Option<String>

A user-defined name for the endpoint. The name does not need to be unique.

Used for organization and identification in dashboards and monitoring. Can be updated without affecting endpoint functionality.

§user_id: String

A unique string identifying the RunPod user who created the endpoint.

§template_id: String

The unique string identifying the template used to create the endpoint.

Templates define the container image, environment, and resource configuration that will be deployed across all workers for this endpoint.

§version: i32

The current version of the endpoint configuration.

Incremented whenever the template or environment variables are changed, triggering a rolling update of all workers.

§compute_type: ComputeType

The type of compute used by workers on this endpoint.

Determines whether workers will have GPU or CPU compute resources attached. This setting affects pricing, available hardware types, and performance characteristics.

§created_at: String

The UTC timestamp when the endpoint was created.

ISO 8601 format string representing the endpoint creation time.

§data_center_ids: Vec<DataCenterId>

List of RunPod data center IDs where workers can be located.

Workers are distributed across these data centers for availability and performance. The system automatically selects the best available data center based on resource availability and proximity to users.

§env: Option<EnvVars>

Environment variables for the endpoint’s container runtime.

These variables are injected into all worker containers and can be used for configuration, API keys, feature flags, and other runtime settings.

§execution_timeout_ms: i32

The maximum execution time in milliseconds for individual requests.

If a request exceeds this timeout, the worker is stopped and the request is marked as failed. This prevents runaway processes and ensures predictable resource usage.

Common values (see the conversion sketch after this list):

  • Web APIs: 30,000ms (30 seconds)
  • AI inference: 300,000ms (5 minutes)
  • Batch processing: 3,600,000ms (1 hour)
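Since the field is a plain millisecond count, std::time::Duration can make the intended duration explicit when picking a value (a small sketch, not specific to this SDK):

use std::time::Duration;

// Five minutes for AI inference, expressed in milliseconds (300_000).
let execution_timeout_ms = Duration::from_secs(5 * 60).as_millis() as i32;
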
§gpu_count: Option<i32>

The number of GPUs attached to each worker (GPU endpoints only).

Only relevant when compute_type is GPU. Determines the GPU resources allocated to each worker instance for parallel processing workloads.

§gpu_type_ids: Option<Vec<GpuTypeId>>

List of RunPod GPU types that can be attached to workers (GPU endpoints only).

The system tries to allocate GPUs in the order specified, falling back to subsequent types if the preferred options are unavailable. Only relevant when compute_type is GPU.

§instance_ids: Option<Vec<String>>

List of CPU instance IDs that can be attached to workers (CPU endpoints only).

For CPU endpoints, specifies the available instance types that workers can use, allowing the system to choose based on availability and cost.

§idle_timeout: i32

The number of seconds a worker can be idle before being scaled down.

Workers that haven’t processed requests for this duration are automatically terminated to reduce costs. Shorter timeouts reduce costs but may increase cold start latency for subsequent requests.

Typical values:

  • Cost-optimized: 1-5 seconds
  • Balanced: 5-15 seconds
  • Performance-optimized: 30-60 seconds

§network_volume_id: Option<String>

The unique ID of the network volume attached to workers, if any.

Network volumes provide persistent, shared storage across all workers, useful for model weights, datasets, and other shared assets.

§scaler_type: ScalerType

The scaling strategy used to manage worker count.

Determines how the system responds to request load by scaling workers up or down automatically.

§scaler_value: i32

The scaling sensitivity parameter.

For QueueDelay scaling:

  • Seconds a request can wait in queue before scaling up
  • Lower values = more responsive but potentially higher costs

For RequestCount scaling:

  • Target requests per worker (queue_size / scaler_value = worker_count; see the sketch after this list)
  • Higher values = fewer workers, more cost-efficient
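A minimal sketch of the RequestCount arithmetic above; this is illustrative only, not the SDK's scaler, and clamping to workers_min/workers_max is an assumption based on the field descriptions below:

// Illustrative helper (not part of runpod_sdk): target worker count under
// RequestCount scaling, assuming the result is clamped to the endpoint's bounds.
fn target_workers(queue_size: i32, scaler_value: i32, workers_min: i32, workers_max: i32) -> i32 {
    (queue_size / scaler_value).clamp(workers_min, workers_max)
}

// With scaler_value = 4, a queue of 20 requests targets 5 workers.
assert_eq!(target_workers(20, 4, 0, 10), 5);
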
§workers_max: i32

The maximum number of workers that can run simultaneously.

Hard limit preventing runaway scaling and controlling maximum costs. Set based on expected peak load and budget constraints.

§workers_min: i32

The minimum number of workers that always remain running.

Reserved capacity that’s always available, even during idle periods. These workers are billed at a lower rate but provide immediate availability. Set to 0 for maximum cost efficiency, or >0 for better responsiveness.

§allowed_cuda_versions: Option<Vec<CudaVersion>>

List of acceptable CUDA versions for GPU workers.

If specified, only workers with compatible CUDA runtimes will be used. Useful for ensuring compatibility with specific AI/ML frameworks. Only relevant for GPU endpoints.

§template: Option<Template>

Detailed template information (included when include_template is true).

Contains the full template configuration including container image, environment setup, and resource requirements.

§workers: Option<Vec<Pod>>

Current worker instances (included when include_workers is true).

List of active worker pods with their current status, resource allocation, and performance metrics.

Implementations§

impl Endpoint

pub fn to_runner(&self, client: RunpodClient) -> ServerlessEndpoint

Available on crate feature serverless only.

Creates an endpoint runner from this endpoint.

§Example
let client = RunpodClient::from_env()?;
let serverless_endpoint = client.get_endpoint("endpoint_id", GetEndpointQuery::default()).await?;

let runner = serverless_endpoint.to_runner(client);

pub fn run<I>(&self, client: RunpodClient, input: &I) -> Result<ServerlessJob>
where I: Serialize,

Available on crate feature serverless only.

Runs a job on this endpoint.

This is a convenience method that creates a runner and submits a job in one call.

§Example
let client = RunpodClient::from_env()?;
let serverless_endpoint = client.get_endpoint("endpoint_id", GetEndpointQuery::default()).await?;

let job = serverless_endpoint.run(client, &json!({"prompt": "Hello"}))?;
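
Because run accepts any input implementing Serialize, a typed struct can replace serde_json::json!. A sketch, assuming a client and serverless_endpoint set up as in the example above; the prompt field is hypothetical and must match whatever schema the endpoint's handler expects:

use serde::Serialize;

#[derive(Serialize)]
struct PromptInput {
    // Hypothetical field name; the endpoint's handler defines the real schema.
    prompt: String,
}

let job = serverless_endpoint.run(client, &PromptInput { prompt: "Hello".into() })?;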

Trait Implementations§

impl Clone for Endpoint

fn clone(&self) -> Endpoint

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Endpoint

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<'de> Deserialize<'de> for Endpoint

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl Serialize for Endpoint

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.

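Because Endpoint implements both Serialize and Deserialize, it round-trips through serde_json. A small sketch, assuming endpoint was fetched as in the examples above and serde_json is available:

// Round-trip the endpoint through JSON.
let json = serde_json::to_string_pretty(&endpoint)?;
let parsed: Endpoint = serde_json::from_str(&json)?;
assert_eq!(parsed.id, endpoint.id);
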
Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,