Crate triton_sys

Re-exports§

Structs§

Constants§

Functions§

  • Get the TRITONBACKEND API version supported by Triton. This value can be compared against the TRITONBACKEND_API_VERSION_MAJOR and TRITONBACKEND_API_VERSION_MINOR used to build the backend to ensure that Triton is compatible with the backend. A version-check sketch appears after this list.
  • Get the location of the files that make up the backend implementation. This location contains the backend shared library and any other files located with the shared library. The ‘location’ communicated depends on how the backend is being communicated to Triton as indicated by ‘artifact_type’.
  • Add the preferred instance group of the backend. This function can be called multiple times to cover the different instance group kinds that the backend supports, with the calls made in priority order: the first call describes the most preferred group. When instance groups are not explicitly provided, Triton will use this attribute to create a model deployment that better aligns with the backend’s preference.
  • Sets whether or not the backend supports concurrently loading multiple TRITONBACKEND_ModelInstances in a thread-safe manner.
  • Get the backend configuration. The ‘backend_config’ message is owned by Triton and should not be modified or freed by the caller.
  • Get the execution policy for this backend. By default the execution policy is TRITONBACKEND_EXECUTION_BLOCKING.
  • Get the memory manager associated with a backend.
  • Get the name of the backend. The caller does not own the returned string and must not modify or delete it. The lifetime of the returned string extends only as long as ‘backend’.
  • Set the execution policy for this backend. By default the execution policy is TRITONBACKEND_EXECUTION_BLOCKING. Triton reads the backend’s execution policy after calling TRITONBACKEND_Initialize, so to be recognized, changes to the execution policy must be made in TRITONBACKEND_Initialize. Also note that if the sequence batcher is used for the model, Triton will use the TRITONBACKEND_EXECUTION_BLOCKING policy irrespective of the policy specified by this setter function.
  • Set the user-specified state associated with the backend. The state is completely owned and managed by the backend.
  • Get the user-specified state associated with the backend. The state is completely owned and managed by the backend.
  • Finalize for a backend. This function is optional; a backend is not required to implement it. This function is called once, just before the backend is unloaded. All state associated with the backend should be freed and any threads created for the backend should be exited/joined before returning from this function.
  • Query the backend for different model attributes. This function is optional; a backend is not required to implement it. The backend is also not required to set all backend attributes listed. This function is called when Triton requires further backend / model information to perform operations. This function may be called multiple times within the lifetime of the backend (between TRITONBACKEND_Initialize and TRITONBACKEND_Finalize). The backend may return an error to indicate failure to set the backend attributes, in which case the attributes specified in that call will be ignored. Triton will apply the specified attributes only if ‘nullptr’ (success) is returned.
  • Get all information about an output tensor by its index. The caller does not own any of the referenced return values and must not modify or delete them. The lifetime of all returned values extends until ‘response’ is deleted.
  • Get all information about an output tensor by its name. The caller does not own any of the referenced return values and must not modify or delete them. The lifetime of all returned values extends until ‘response’ is deleted.
  • Initialize a backend. This function is optional; a backend is not required to implement it. This function is called once when a backend is loaded to allow the backend to initialize any state associated with the backend. A backend has a single state that is shared across all models that use the backend.
  • Get a buffer holding (part of) the tensor data for an input. For a given input, the number of buffers composing the input is found from ‘buffer_count’ returned by TRITONBACKEND_InputProperties. The returned buffer is owned by the input and so should not be modified or freed by the caller. The lifetime of the buffer matches that of the input, and so the buffer should not be accessed after the input tensor object is released. An input-reading sketch appears after this list.
  • Get the buffer attributes associated with the given input buffer. For a given input, the number of buffers composing the input is found from ‘buffer_count’ returned by TRITONBACKEND_InputProperties. The returned ‘buffer_attributes’ is owned by the input and so should not be modified or freed by the caller. The lifetime of the ‘buffer_attributes’ matches that of the input, and so the ‘buffer_attributes’ should not be accessed after the input tensor object is released.
  • Get a buffer holding (part of) the tensor data for an input for a specific host policy. If there are no input buffers specified for this host policy, the fallback input buffer is returned. For a given input, the number of buffers composing the input is found from ‘buffer_count’ returned by TRITONBACKEND_InputPropertiesForHostPolicy. The returned buffer is owned by the input and so should not be modified or freed by the caller. The lifetime of the buffer matches that of the input, and so the buffer should not be accessed after the input tensor object is released.
  • Get the name and properties of an input tensor. The returned strings and other properties are owned by the input, not the caller, and so should not be modified or freed.
  • Get the name and properties of an input tensor associated with a given host policy. If there are no input buffers for the specified host policy, the properties of the fallback input buffers are returned. The returned strings and other properties are owned by the input, not the caller, and so should not be modified or freed.
  • Allocate a contiguous block of memory of a specific type using a memory manager. Two error codes have specific interpretations for this function:
  • Free a buffer that was previously allocated with TRITONBACKEND_MemoryManagerAllocate. The call must provide the same values for ‘memory_type’ and ‘memory_type_id’ as were used when the buffer was allocated, or else the behavior is undefined.
  • Whether the backend should attempt to auto-complete the model configuration. If true, the model should fill the inputs, outputs, and max batch size in the model configuration if incomplete. If the model configuration is changed, the new configuration must be reported to Triton using TRITONBACKEND_ModelSetConfig.
  • Get the backend used by the model.
  • Callback to be invoked when Triton has finished forming a batch.
  • Check whether a request should be added to the pending model batch.
  • Callback to be invoked when Triton begins forming a batch. The ‘userp’ parameter is a placeholder for the backend to store and retrieve information about this pending batch. Returns a TRITONSERVER_Error indicating success or failure.
  • Free memory associated with batcher. This is called during model unloading.
  • Create a new batcher for use with custom batching. This is called during model loading. The batcher will point to a user-defined data structure that holds read-only data used for custom batching.
  • Get the model configuration. The caller takes ownership of the message object and must call TRITONSERVER_MessageDelete to release the object. The configuration is available via this call even before the model is loaded and so can be used in TRITONBACKEND_ModelInitialize. TRITONSERVER_ServerModelConfig returns equivalent information but is not usable until after the model loads.
  • Finalize for a model. This function is optional; a backend is not required to implement it. This function is called once for a model, just before the model is unloaded from Triton. All state associated with the model should be freed and any threads created for the model should be exited/joined before returning from this function.
  • Initialize for a model. This function is optional; a backend is not required to implement it. This function is called once when a model that uses the backend is loaded to allow the backend to initialize any state associated with the model. The backend should also examine the model configuration to determine if the configuration is suitable for the backend. Any errors reported by this function will prevent the model from loading.
  • Get the device ID of the model instance.
  • Execute a batch of one or more requests on a model instance. This function is required. Triton will not perform multiple simultaneous calls to this function for a given model ‘instance’; however, there may be simultaneous calls for different model instances (for the same or different models).
  • Finalize for a model instance. This function is optional; a backend is not required to implement it. This function is called once for an instance, just before the corresponding model is unloaded from Triton. All state associated with the instance should be freed and any threads created for the instance should be exited/joined before returning from this function.
  • Get the host policy setting. The ‘host_policy’ message is owned by Triton and should not be modified or freed by the caller.
  • Initialize for a model instance. This function is optional; a backend is not required to implement it. This function is called once when a model instance is created to allow the backend to initialize any state associated with the instance.
  • Whether the model instance is passive.
  • Get the kind of the model instance.
  • Get the model associated with a model instance.
  • Get the name of the model instance. The returned string is owned by the model object, not the caller, and so should not be modified or freed.
  • Get the number of optimization profiles to be loaded for the instance.
  • Get the name of optimization profile. The caller does not own the returned string and must not modify or delete it. The lifetime of the returned string extends only as long as ‘instance’.
  • Record statistics for the execution of an entire batch of inference requests.
  • Report the memory usage of the model instance that will be released on TRITONBACKEND_ModelInstanceFinalize. The backend may call this function within the lifecycle of the TRITONBACKEND_ModelInstance object (between TRITONBACKEND_ModelInstanceInitialize and TRITONBACKEND_ModelInstanceFinalize) to report the latest usage. To report the memory usage of the model, see TRITONBACKEND_ModelReportMemoryUsage.
  • Record statistics for an inference request.
  • Get the number of secondary devices configured for the instance.
  • Get the properties of indexed secondary device. The returned strings and other properties are owned by the instance, not the caller, and so should not be modified or freed.
  • Set the user-specified state associated with the model instance. The state is completely owned and managed by the backend.
  • Get the user-specified state associated with the model instance. The state is completely owned and managed by the backend.
  • Get the name of the model. The returned string is owned by the model object, not the caller, and so should not be modified or freed.
  • Report the memory usage of the model that will be released on TRITONBACKEND_ModelFinalize. The backend may call this function within the lifecycle of the TRITONBACKEND_Model object (between TRITONBACKEND_ModelInitialize and TRITONBACKEND_ModelFinalize) to report the latest usage. To report the memory usage of a model instance, see TRITONBACKEND_ModelInstanceReportMemoryUsage.
  • Get the location of the files that make up the model. The ‘location’ communicated depends on how the model is being communicated to Triton as indicated by ‘artifact_type’.
  • Get the TRITONSERVER_Server object that this model is being served by.
  • Set the model configuration in Triton server. This API should only be called when the backend implements auto-completion of the model configuration and TRITONBACKEND_ModelAutoCompleteConfig returns true in auto_complete_config. Only the inputs, outputs, max batch size, and scheduling choice can be changed; the scheduling choice can only be changed if none was previously set. Any other changes to the model configuration will be ignored by Triton. This function can only be called from TRITONBACKEND_ModelInitialize; calling it in any other context will result in an error being returned. Additionally, Triton server may add some of the missing fields in the provided config with this call, so the backend must get the complete configuration again by using TRITONBACKEND_ModelConfig. TRITONBACKEND_ModelSetConfig does not take ownership of the message object, and so the caller should call TRITONSERVER_MessageDelete to release the object once the function returns.
  • Set the user-specified state associated with the model. The state is completely owned and managed by the backend.
  • Get the user-specified state associated with the model. The state is completely owned and managed by the backend.
  • Get the version of the model.
  • Get a buffer to use to hold the tensor data for the output. The returned buffer is owned by the output and so should not be freed by the caller. The caller can and should fill the buffer with the output data for the tensor. The lifetime of the buffer matches that of the output and so the buffer should not be accessed after the output tensor object is released.
  • Get the buffer attributes associated with the given output buffer. The returned ‘buffer_attributes’ is owned by the output and so should not be modified or freed by the caller. The lifetime of the ‘buffer_attributes’ matches that of the output, and so the ‘buffer_attributes’ should not be accessed after the output tensor object is released. This function must be called after TRITONBACKEND_OutputBuffer; otherwise, it might contain incorrect data.
  • Get the correlation ID of the request if it is an unsigned integer. Zero indicates that the request does not have a correlation ID. Returns a failure if the correlation ID for the given request is not an unsigned integer.
  • Get the correlation ID of the request if it is a string. An empty string indicates that the request does not have a correlation ID. Returns an error if the correlation ID for the given request is not a string.
  • Get the flag(s) associated with a request. On return ‘flags’ holds a bitwise-or of all flag values, see TRITONSERVER_RequestFlag for available flags.
  • Get the ID of the request. Can be nullptr if request doesn’t have an ID. The returned string is owned by the request, not the caller, and so should not be modified or freed.
  • Get a named request input. The lifetime of the returned input object matches that of the request and so the input object should not be accessed after the request object is released.
  • Get a request input by index. The order of inputs in a given request is not necessarily consistent with other requests, even if the requests are in the same batch. As a result, you cannot assume that an index obtained from one request will point to the same input in a different request.
  • Get the number of input tensors specified in the request.
  • Get the name of an input tensor. The caller does not own the returned string and must not modify or delete it. The lifetime of the returned string extends only as long as ‘request’.
  • Query whether the request is cancelled or not.
  • Returns the preferred memory type and memory type ID of the output buffer for the request. As much as possible, Triton will attempt to return the same memory_type and memory_type_id values that will be returned by the subsequent call to TRITONBACKEND_OutputBuffer, however, the backend must be capable of handling cases where the values differ.
  • Get the number of output tensors requested to be returned in the request.
  • Get the name of a requested output tensor. The caller does not own the returned string and must not modify or delete it. The lifetime of the returned string extends only as long as ‘request’.
  • Get a request parameter by index. The order of parameters in a given request is not necessarily consistent with other requests, even if the requests are in the same batch. As a result, you cannot assume that an index obtained from one request will point to the same parameter in a different request.
  • Get the number of parameters specified in the inference request.
  • Release the request. The request should be released when it is no longer needed by the backend. If this call returns with an error (i.e. non-nullptr) then the request was not released and ownership remains with the backend. If this call returns with success, the ‘request’ object is no longer owned by the backend and must not be used. Any tensor names, data types, shapes, input tensors, etc. returned by TRITONBACKEND_Request* functions for this request are no longer valid. If a persistent copy of that data is required it must be created before calling this function.
  • Get the trace associated with a request. The returned trace is owned by the request, not the caller, and so should not be modified or freed. If the request is not being traced, then nullptr will be returned.
  • Destroy a response. It is not necessary to delete a response if TRITONBACKEND_ResponseSend is called as that function transfers ownership of the response object to Triton.
  • Destroy a response factory.
  • Query whether the response factory is cancelled or not.
  • Create the response factory associated with a request.
  • Send response flags without a corresponding response.
  • Create a response for a request.
  • Create a response using a factory.
  • Create an output tensor in the response. The lifetime of the returned output tensor object matches that of the response and so the output tensor object should not be accessed after the response object is deleted.
  • Send a response. Calling this function transfers ownership of the response object to Triton. The caller must not access or delete the response object after calling this function. A response-sending sketch appears after this list.
  • Set a boolean parameter in the response.
  • Set an integer parameter in the response.
  • Set a string parameter in the response.
  • Get a buffer to use to hold the tensor data for the state. The returned buffer is owned by the state and so should not be freed by the caller. The caller can and should fill the buffer with the state data. The buffer must not be accessed by the backend after TRITONBACKEND_StateUpdate is called. The caller should fill the buffer before calling TRITONBACKEND_StateUpdate.
  • Get the buffer attributes associated with the given state buffer. The returned ‘buffer_attributes’ is owned by the state and so should not be modified or freed by the caller. The lifetime of the ‘buffer_attributes’ matches that of the state.
  • Create a state in the request. The returned state object is only valid before the TRITONBACKEND_StateUpdate is called. The state should not be freed by the caller. If TRITONBACKEND_StateUpdate is not called, the lifetime of the state matches the lifetime of the request. If the state name does not exist in the “state” section of the model configuration, the state will not be created and an error will be returned. If this function is called when sequence batching is not enabled or there is no ‘states’ section in the sequence batching section of the model configuration, this call will return an error.
  • Update the state for the sequence. Calling this function will replace the state stored for this sequence in Triton with ‘state’ provided in the function argument. If this function is called when sequence batching is not enabled or there is no ‘states’ section in the sequence batching section of the model configuration, this call will return an error. The backend is not required to call this function. If the backend doesn’t call TRITONBACKEND_StateUpdate function, this particular state for the sequence will not be updated and the next inference request in the sequence will use the same state as the current inference request.
  • Get the TRITONSERVER API version supported by the Triton shared library. This value can be compared against the TRITONSERVER_API_VERSION_MAJOR and TRITONSERVER_API_VERSION_MINOR used to build the client to ensure that the Triton shared library is compatible with the client.
  • Get the byte size field of the buffer attributes.
  • Get the CudaIpcHandle field of the buffer attributes object.
  • Delete a buffer attributes object.
  • Get the memory type field of the buffer attributes.
  • Get the memory type id field of the buffer attributes.
  • Create a new buffer attributes object. The caller takes ownership of the TRITONSERVER_BufferAttributes object and must call TRITONSERVER_BufferAttributesDelete to release the object.
  • Set the byte size field of the buffer attributes.
  • Set the CudaIpcHandle field of the buffer attributes.
  • Set the memory type field of the buffer attributes.
  • Set the memory type id field of the buffer attributes.
  • Get the size of a Triton datatype in bytes. Zero is returned for TRITONSERVER_TYPE_BYTES because it has a variable size. Zero is also returned for TRITONSERVER_TYPE_INVALID.
  • Get the string representation of a data type. The returned string is not owned by the caller and so should not be modified or freed.
  • Get the error code.
  • Get the string representation of an error code. The returned string is not owned by the caller and so should not be modified or freed. The lifetime of the returned string extends only as long as ‘error’ and must not be accessed once ‘error’ is deleted.
  • Delete an error object.
  • Get the error message. The returned string is not owned by the caller and so should not be modified or freed. The lifetime of the returned string extends only as long as ‘error’ and must not be accessed once ‘error’ is deleted.
  • Create a new error object. The caller takes ownership of the TRITONSERVER_Error object and must call TRITONSERVER_ErrorDelete to release the object. An error-handling sketch appears after this list.
  • Get the TRITONSERVER_MetricKind of a metric and its corresponding family.
  • Add an input to a request.
  • Add a raw input to a request. The name recognized by the model, and the data type and shape of the input, will be deduced from the model configuration. This function must be called at most once on a request that has no other inputs, to ensure the deduction is accurate.
  • Add an output request to an inference request.
  • Assign a buffer of data to an input. The buffer will be appended to any existing buffers for that input. The ‘inference_request’ object takes ownership of the buffer and so the caller should not modify or free the buffer until that ownership is released by ‘inference_request’ being deleted or by the input being removed from ‘inference_request’.
  • Assign a buffer of data to an input. The buffer will be appended to any existing buffers for that input. The ‘inference_request’ object takes ownership of the buffer and so the caller should not modify or free the buffer until that ownership is released by ‘inference_request’ being deleted or by the input being removed from ‘inference_request’.
  • Assign a buffer of data to an input for execution on all model instances with the specified host policy. The buffer will be appended to any existing buffers for that input on all devices with this host policy. The ‘inference_request’ object takes ownership of the buffer and so the caller should not modify or free the buffer until that ownership is released by ‘inference_request’ being deleted or by the input being removed from ‘inference_request’. If the execution is scheduled on a device that does not have an input buffer specified using this function, then the input buffer specified with TRITONSERVER_InferenceRequestAppendInputData will be used, so a non-host-policy-specific version of the data must be added using that API. Parameters: ‘inference_request’ is the request object; ‘name’ is the name of the input; ‘base’ is the base address of the input data; ‘byte_size’ is the size, in bytes, of the input data; ‘memory_type’ is the memory type of the input data; ‘memory_type_id’ is the memory type id of the input data; ‘host_policy_name’ names the host policy whose model instances will use this input buffer for execution. Returns a TRITONSERVER_Error indicating success or failure.
  • Cancel an inference request. Requests are canceled on a best effort basis and no guarantee is provided that cancelling a request will result in early termination. Note that the inference request cancellation status will be reset after TRITONSERVER_InferAsync is run. This means that if you cancel the request before calling TRITONSERVER_InferAsync the request will not be cancelled.
  • Get the correlation ID of the inference request as an unsigned integer. The default is 0, which indicates that the request has no correlation ID. If the correlation ID associated with the inference request is a string, this function will return a failure. The correlation ID is used to indicate that two or more inference requests are related to each other. How this relationship is handled by the inference server is determined by the model’s scheduling policy.
  • Get the correlation ID of the inference request as a string. The default is the empty string “”, which indicates that the request has no correlation ID. If the correlation ID associated with the inference request is an unsigned integer, this function will return a failure. The correlation ID is used to indicate that two or more inference requests are related to each other. How this relationship is handled by the inference server is determined by the model’s scheduling policy.
  • Delete an inference request object.
  • Get the flag(s) associated with a request. On return ‘flags’ holds a bitwise-or of all flag values, see TRITONSERVER_RequestFlag for available flags.
  • Get the ID for a request. The returned ID is owned by ‘inference_request’ and must not be modified or freed by the caller.
  • Query whether the request is cancelled or not.
  • Create a new inference request object. A request-building sketch appears after this list.
  • Deprecated. See TRITONSERVER_InferenceRequestPriorityUInt64 instead.
  • Get the priority for a request. The default is 0 indicating that the request does not specify a priority and so will use the model’s default priority.
  • Clear all input data from an input, releasing ownership of the buffer(s) that were appended to the input with TRITONSERVER_InferenceRequestAppendInputData or TRITONSERVER_InferenceRequestAppendInputDataWithHostPolicy. ‘inference_request’ is the request object and ‘name’ is the name of the input.
  • Remove all inputs from a request.
  • Remove all output requests from an inference request.
  • Remove an input from a request.
  • Remove an output request from an inference request.
  • Set a boolean parameter in the request.
  • Set the correlation ID of the inference request to be an unsigned integer. The default is 0, which indicates that the request has no correlation ID. The correlation ID is used to indicate that two or more inference requests are related to each other. How this relationship is handled by the inference server is determined by the model’s scheduling policy.
  • Set the correlation ID of the inference request to be a string. The correlation ID is used to indicate that two or more inference requests are related to each other. How this relationship is handled by the inference server is determined by the model’s scheduling policy.
  • Set the flag(s) associated with a request. ‘flags’ should hold a bitwise-or of all flag values, see TRITONSERVER_RequestFlag for available flags.
  • Set the ID for a request.
  • Set an integer parameter in the request.
  • Deprecated. See TRITONSERVER_InferenceRequestSetPriorityUInt64 instead.
  • Set the priority for a request. The default is 0 indicating that the request does not specify a priority and so will use the model’s default priority.
  • Set the release callback for an inference request. The release callback is called by Triton to return ownership of the request object.
  • Set the allocator and response callback for an inference request. The allocator is used to allocate buffers for any output tensors included in responses that are produced for this request. The response callback is called to return response objects representing responses produced for this request.
  • Set a string parameter in the request.
  • Set the timeout for a request, in microseconds. The default is 0 which indicates that the request has no timeout.
  • Get the timeout for a request, in microseconds. The default is 0 which indicates that the request has no timeout.
  • Delete an inference response object.
  • Return the error status of an inference response. Return a TRITONSERVER_Error object on failure, return nullptr on success. The returned error object is owned by ‘inference_response’ and so should not be deleted by the caller.
  • Get the ID of the request corresponding to a response. The caller does not own the returned ID and must not modify or delete it. The lifetime of all returned values extends until ‘inference_response’ is deleted.
  • Get model used to produce a response. The caller does not own the returned model name value and must not modify or delete it. The lifetime of all returned values extends until ‘inference_response’ is deleted.
  • Get all information about an output tensor. The tensor data is returned as the base pointer to the data and the size, in bytes, of the data. The caller does not own any of the returned values and must not modify or delete them. The lifetime of all returned values extends until ‘inference_response’ is deleted.
  • Get a classification label associated with an output for a given index. The caller does not own the returned label and must not modify or delete it. The lifetime of the returned label extends until ‘inference_response’ is deleted.
  • Get the number of outputs available in the response.
  • Get all information about a parameter. The caller does not own any of the returned values and must not modify or delete them. The lifetime of all returned values extends until ‘inference_response’ is deleted.
  • Get the number of parameters available in the response.
  • Get the string representation of a trace activity. The returned string is not owned by the caller and so should not be modified or freed.
  • Delete a trace object.
  • Get the id associated with a trace. Every trace is assigned an id that is unique across all traces created for a Triton server.
  • Get the string representation of a trace level. The returned string is not owned by the caller and so should not be modified or freed.
  • Get the name of the model associated with a trace. The caller does not own the returned string and must not modify or delete it. The lifetime of the returned string extends only as long as ‘trace’.
  • Get the version of the model associated with a trace.
  • Create a new inference trace object. The caller takes ownership of the TRITONSERVER_InferenceTrace object and must call TRITONSERVER_InferenceTraceDelete to release the object.
  • Get the parent id associated with a trace. The parent id indicates a parent-child relationship between two traces. A parent id value of 0 indicates that there is no parent trace.
  • Get the request id associated with a trace. The caller does not own the returned string and must not modify or delete it. The lifetime of the returned string extends only as long as ‘trace’.
  • Get the child trace, spawned from the parent trace. The caller owns the returned object and must call TRITONSERVER_InferenceTraceDelete to release the object, unless ownership is transferred through other APIs (see TRITONSERVER_ServerInferAsync).
  • Create a new inference trace object. The caller takes ownership of the TRITONSERVER_InferenceTrace object and must call TRITONSERVER_InferenceTraceDelete to release the object.
  • Get the string representation of an instance-group kind. The returned string is not owned by the caller and so should not be modified or freed.
  • Is a log level enabled?
  • Log a message at a given log level if that level is enabled.
  • Get the string representation of a memory type. The returned string is not owned by the caller and so should not be modified or freed.
  • Delete a message object.
  • Create a new message object from a serialized JSON string.
  • Get the base and size of the buffer containing the serialized message in JSON format. The buffer is owned by the TRITONSERVER_Message object and should not be modified or freed by the caller. The lifetime of the buffer extends only as long as ‘message’ and must not be accessed once ‘message’ is deleted.
  • Delete a metric object. All TRITONSERVER_Metric* objects should be deleted BEFORE their corresponding TRITONSERVER_MetricFamily* objects have been deleted. If a family is deleted before its metrics, an error will be returned.
  • Delete a metric family object. A TRITONSERVER_MetricFamily* object should be deleted AFTER its corresponding TRITONSERVER_Metric* objects have been deleted. Attempting to delete a family before its metrics will return an error.
  • Create a new metric family object. The caller takes ownership of the TRITONSERVER_MetricFamily object and must call TRITONSERVER_MetricFamilyDelete to release the object.
  • Increment the current value of metric by value. Supports metrics of kind TRITONSERVER_METRIC_KIND_GAUGE for any value, and TRITONSERVER_METRIC_KIND_COUNTER for non-negative values. Returns TRITONSERVER_ERROR_UNSUPPORTED for unsupported TRITONSERVER_MetricKind and TRITONSERVER_ERROR_INVALID_ARG for negative values on a TRITONSERVER_METRIC_KIND_COUNTER metric.
  • Create a new metric object. The caller takes ownership of the TRITONSERVER_Metric object and must call TRITONSERVER_MetricDelete to release the object. The caller is also responsible for ownership of the labels passed in. Each label can be deleted immediately after creating the metric with TRITONSERVER_ParameterDelete if not re-using the labels. A metrics sketch appears after this list.
  • Set the current value of metric to value. Supports metrics of kind TRITONSERVER_METRIC_KIND_GAUGE and returns TRITONSERVER_ERROR_UNSUPPORTED for unsupported TRITONSERVER_MetricKind.
  • Get the current value of a metric object. Supports metrics of kind TRITONSERVER_METRIC_KIND_COUNTER and TRITONSERVER_METRIC_KIND_GAUGE, and returns TRITONSERVER_ERROR_UNSUPPORTED for unsupported TRITONSERVER_MetricKind.
  • Delete a metrics object.
  • Get a buffer containing the metrics in the specified format. For each format the buffer contains the following:
  • Create a new parameter object with type TRITONSERVER_PARAMETER_BYTES. The caller takes ownership of the TRITONSERVER_Parameter object and must call TRITONSERVER_ParameterDelete to release the object. The object only maintains a shallow copy of the ‘byte_ptr’ so the data content must be valid until the parameter object is deleted.
  • Delete a parameter object.
  • Create a new parameter object. The caller takes ownership of the TRITONSERVER_Parameter object and must call TRITONSERVER_ParameterDelete to release the object. The object will maintain its own copy of the ‘value’.
  • Get the string representation of a parameter type. The returned string is not owned by the caller and so should not be modified or freed.
  • Delete a response allocator.
  • Create a new response allocator object.
  • Set the buffer attributes function for a response allocator object. The function will be called after alloc_fn to set the buffer attributes associated with the output buffer.
  • Set the query function for a response allocator object. Usually the function will be called before alloc_fn to learn the allocator’s preferred memory type and memory type ID in the current situation, so that a different execution decision can be made.
  • Delete a server object. If server is not already stopped it is stopped before being deleted.
  • Perform inference using the meta-data and inputs supplied by the ‘inference_request’. If the function returns success, then the caller releases ownership of ‘inference_request’ and must not access it in any way after this call, until ownership is returned via the ‘request_release_fn’ callback registered in the request object with TRITONSERVER_InferenceRequestSetReleaseCallback.
  • Is the server live?
  • Is the server ready?
  • Load the requested model, or reload the model if it is already loaded. The function does not return until the model is loaded or fails to load. The returned error indicates whether the model loaded successfully.
  • Load the requested model, or reload the model if it is already loaded, with the provided load parameters. The function does not return until the model is loaded or fails to load. The returned error indicates whether the model loaded successfully. Currently the below parameter names are recognized:
  • Get the metadata of the server as a TRITONSERVER_Message object. The caller takes ownership of the message object and must call TRITONSERVER_MessageDelete to release the object.
  • Get the current metrics for the server. The caller takes ownership of the metrics object and must call TRITONSERVER_MetricsDelete to release the object.
  • Get the batch properties of the model. The properties are communicated by a flags value and an (optional) object returned by ‘voidp’.
  • Get the configuration of a model as a TRITONSERVER_Message object. The caller takes ownership of the message object and must call TRITONSERVER_MessageDelete to release the object.
  • Get the index of all unique models in the model repositories as a TRITONSERVER_Message object. The caller takes ownership of the message object and must call TRITONSERVER_MessageDelete to release the object.
  • Is the model ready?
  • Get the metadata of a model as a TRITONSERVER_Message object. The caller takes ownership of the message object and must call TRITONSERVER_MessageDelete to release the object.
  • Get the statistics of a model as a TRITONSERVER_Message object. The caller takes ownership of the object and must call TRITONSERVER_MessageDelete to release the object.
  • Get the transaction policy of the model. The policy is communicated by a flags value.
  • Create a new server object. The caller takes ownership of the TRITONSERVER_Server object and must call TRITONSERVER_ServerDelete to release the object. A server bootstrap sketch appears after this list.
  • Add resource count for rate limiting.
  • Delete a server options object.
  • Create a new server options object. The caller takes ownership of the TRITONSERVER_ServerOptions object and must call TRITONSERVER_ServerOptionsDelete to release the object.
  • Set a configuration setting for a named backend in a server options.
  • Set the directory containing backend shared libraries. This directory is searched last after the version and model directory in the model repository when looking for the backend shared library for a model. If the backend is named ‘be’ the directory searched is ‘backend_dir’/be/libtriton_be.so.
  • Set the number of threads used in buffer manager in a server options.
  • Set the cache config that will be used to initialize the cache implementation for “cache_name”.
  • Set the directory containing cache shared libraries. This directory is searched when looking for cache implementations.
  • Enable or disable CPU metrics collection in a server options. CPU metrics are collected if both this option and TRITONSERVER_ServerOptionsSetMetrics are true.
  • Set the total CUDA memory byte size that the server can allocate on a given GPU device in a server options. The CUDA memory pool will be shared across Triton itself and the backends that use TRITONBACKEND_MemoryManager to allocate memory.
  • Enable or disable exit-on-error in a server options.
  • Set the exit timeout, in seconds, for the server in a server options.
  • Enable or disable GPU metrics collection in a server options. GPU metrics are collected if both this option and TRITONSERVER_ServerOptionsSetMetrics are true.
  • Set a host policy setting for a given policy name in a server options.
  • Enable or disable error level logging.
  • Provide a log output file.
  • Set the logging format.
  • Enable or disable info level logging.
  • Set verbose logging level. Level zero disables verbose logging.
  • Enable or disable warning level logging.
  • Enable or disable metrics collection in a server options.
  • Set a configuration setting for metrics in server options.
  • Set the interval for metrics collection in a server options. This is 2000 milliseconds by default.
  • Set the minimum supported CUDA compute capability in a server options.
  • Set the model control mode in a server options. For each mode the models will be managed as the following:
  • Specify the limit on memory usage, as a fraction, for the device identified by ‘kind’ and ‘device_id’. If model loading on the device is requested and the current memory usage exceeds the limit, the load will be rejected. If not specified, the limit will not be set.
  • Set the number of threads to concurrently load models in a server options.
  • Enable model namespacing to allow serving models with the same name if they are in different namespaces.
  • Set the model repository path in a server options. The path must be the full absolute path to the model repository. This function can be called multiple times with different paths to set multiple model repositories. Note that if a model is not unique across all model repositories at any time, the model will not be available.
  • Set the total pinned memory byte size that the server can allocate in a server options. The pinned memory pool will be shared across Triton itself and the backends that use TRITONBACKEND_MemoryManager to allocate memory.
  • Set the rate limit mode in a server options.
  • Set the directory containing repository agent shared libraries. This directory is searched when looking for the repository agent shared library for a model. If the repo agent is named ‘ra’ the directory searched is ‘repoagent_dir’/ra/libtritonrepoagent_ra.so.
  • Deprecated. See TRITONSERVER_ServerOptionsSetCacheConfig instead.
  • Set the textual ID for the server in a server options. The ID is a name that identifies the server.
  • Set a model to be loaded at startup in a server options. The model must be present in one, and only one, of the specified model repositories. This function can be called multiple times with different model names to set multiple startup models. Note that it only takes effect in TRITONSERVER_MODEL_CONTROL_EXPLICIT mode.
  • Enable or disable strict model configuration handling in a server options.
  • Enable or disable strict readiness handling in a server options.
  • Check the model repository for changes and update server state based on those changes.
  • Register a new model repository. Not available in polling mode.
  • Stop a server object. A server can’t be restarted once it is stopped.
  • Unload the requested model. Unloading a model that is not loaded on the server has no effect, and a success code will be returned. The function does not wait for the requested model to be fully unloaded; a success code will be returned once the unload begins. The returned error indicates whether the unload was initiated successfully.
  • Unload the requested model, and also unload any dependent models that were loaded along with it (for example, the models composing an ensemble). Unloading a model that is not loaded on the server has no effect, and a success code will be returned. The function does not wait for the requested model and all dependent models to be fully unloaded; a success code will be returned once the unload begins. The returned error indicates whether the unload was initiated successfully.
  • Unregister a model repository. Not available in polling mode.
  • Get the Triton datatype corresponding to a string representation of a datatype.
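
Examples§

The sketches below illustrate common usage of these raw bindings. They are minimal, hedged illustrations, not verbatim crate documentation: model names, tensor names, and paths are hypothetical, and enum constant names are assumed to follow default bindgen output from tritonbackend.h/tritonserver.h. Where a generated constant name is uncertain, the numeric value from the C headers is used with the C constant named in a comment.

First, the version handshake described for TRITONBACKEND_ApiVersion: a backend’s TRITONBACKEND_Initialize entry point verifies that the running Triton supports the TRITONBACKEND API the backend was built against.

```rust
use std::ffi::CString;
use std::ptr;
use triton_sys::{
    TRITONBACKEND_ApiVersion, TRITONBACKEND_Backend, TRITONBACKEND_API_VERSION_MAJOR,
    TRITONBACKEND_API_VERSION_MINOR, TRITONSERVER_Error, TRITONSERVER_ErrorNew,
};

/// Entry point Triton looks up in the backend shared library; called once
/// when the backend is loaded. Returning non-null aborts the load.
#[no_mangle]
pub unsafe extern "C" fn TRITONBACKEND_Initialize(
    _backend: *mut TRITONBACKEND_Backend,
) -> *mut TRITONSERVER_Error {
    let mut major: u32 = 0;
    let mut minor: u32 = 0;
    let err = TRITONBACKEND_ApiVersion(&mut major, &mut minor);
    if !err.is_null() {
        return err; // hand the error straight back to Triton
    }
    // The server must support at least the API version this crate's
    // headers were generated from.
    if major != TRITONBACKEND_API_VERSION_MAJOR || minor < TRITONBACKEND_API_VERSION_MINOR {
        let msg = CString::new("backend requires a newer TRITONBACKEND API").unwrap();
        // 5 == TRITONSERVER_ERROR_UNSUPPORTED; TRITONSERVER_ErrorNew copies
        // the message, so 'msg' may be dropped after the call.
        return TRITONSERVER_ErrorNew(5, msg.as_ptr());
    }
    ptr::null_mut() // null signals success
}
```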
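
Reading input tensors inside TRITONBACKEND_ModelInstanceExecute follows the pattern in the TRITONBACKEND_InputProperties and TRITONBACKEND_InputBuffer entries above: query the properties, then walk the ‘buffer_count’ buffers. A sketch that copies CPU-resident input data out of the request-owned buffers, assuming the data was not placed in GPU memory:

```rust
use std::os::raw::{c_char, c_void};
use std::ptr;
use triton_sys::{
    TRITONBACKEND_Input, TRITONBACKEND_InputBuffer, TRITONBACKEND_InputProperties,
    TRITONSERVER_DataType, TRITONSERVER_Error, TRITONSERVER_MemoryType,
};

/// Gather all buffers of one input tensor into a Vec<u8>. A real backend
/// must also handle buffers reported as GPU or CPU-pinned memory.
unsafe fn collect_input_bytes(
    input: *mut TRITONBACKEND_Input,
) -> Result<Vec<u8>, *mut TRITONSERVER_Error> {
    let mut name: *const c_char = ptr::null();
    let mut dtype: TRITONSERVER_DataType = 0;
    let mut shape: *const i64 = ptr::null();
    let mut dims_count: u32 = 0;
    let mut byte_size: u64 = 0;
    let mut buffer_count: u32 = 0;
    let err = TRITONBACKEND_InputProperties(
        input, &mut name, &mut dtype, &mut shape,
        &mut dims_count, &mut byte_size, &mut buffer_count,
    );
    if !err.is_null() {
        return Err(err);
    }

    let mut data = Vec::with_capacity(byte_size as usize);
    for idx in 0..buffer_count {
        let mut buffer: *const c_void = ptr::null();
        let mut buffer_byte_size: u64 = 0;
        // 'memory_type' is in/out: on input it states our preference
        // (0 == TRITONSERVER_MEMORY_CPU), on output the actual location.
        let mut memory_type: TRITONSERVER_MemoryType = 0;
        let mut memory_type_id: i64 = 0;
        let err = TRITONBACKEND_InputBuffer(
            input, idx, &mut buffer, &mut buffer_byte_size,
            &mut memory_type, &mut memory_type_id,
        );
        if !err.is_null() {
            return Err(err);
        }
        // The buffer is owned by the input, so copy the bytes out before
        // the request is released (see TRITONBACKEND_InputBuffer above).
        data.extend_from_slice(std::slice::from_raw_parts(
            buffer as *const u8,
            buffer_byte_size as usize,
        ));
    }
    Ok(data)
}
```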
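
Producing output mirrors the TRITONBACKEND_ResponseNew, TRITONBACKEND_ResponseOutput, TRITONBACKEND_OutputBuffer, and TRITONBACKEND_ResponseSend sequence above. The sketch below sends a single hypothetical FP32 tensor named "OUTPUT0" and then releases the request; note how ownership of the response passes to Triton at the send.

```rust
use std::ffi::CString;
use std::os::raw::c_void;
use std::ptr;
use triton_sys::{
    TRITONBACKEND_Output, TRITONBACKEND_OutputBuffer, TRITONBACKEND_Request,
    TRITONBACKEND_RequestRelease, TRITONBACKEND_Response, TRITONBACKEND_ResponseNew,
    TRITONBACKEND_ResponseOutput, TRITONBACKEND_ResponseSend, TRITONSERVER_Error,
    TRITONSERVER_MemoryType,
};

/// Send one response carrying a 1-D FP32 tensor, then release the request.
unsafe fn send_fp32_response(
    request: *mut TRITONBACKEND_Request,
    payload: &[f32],
) -> *mut TRITONSERVER_Error {
    let mut response: *mut TRITONBACKEND_Response = ptr::null_mut();
    let err = TRITONBACKEND_ResponseNew(&mut response, request);
    if !err.is_null() { return err; }

    let name = CString::new("OUTPUT0").unwrap(); // hypothetical output name
    let shape: [i64; 1] = [payload.len() as i64];
    let mut output: *mut TRITONBACKEND_Output = ptr::null_mut();
    // 11 == TRITONSERVER_TYPE_FP32 in the C enum.
    let err = TRITONBACKEND_ResponseOutput(
        response, &mut output, name.as_ptr(), 11, shape.as_ptr(), shape.len() as u32,
    );
    if !err.is_null() { return err; }

    let byte_size = (payload.len() * std::mem::size_of::<f32>()) as u64;
    let mut buffer: *mut c_void = ptr::null_mut();
    let mut memory_type: TRITONSERVER_MemoryType = 0; // prefer CPU memory
    let mut memory_type_id: i64 = 0;
    let err = TRITONBACKEND_OutputBuffer(
        output, &mut buffer, byte_size, &mut memory_type, &mut memory_type_id,
    );
    if !err.is_null() { return err; }
    if memory_type == 0 {
        // Triton honored the CPU preference; copy the data in.
        ptr::copy_nonoverlapping(
            payload.as_ptr() as *const u8, buffer as *mut u8, byte_size as usize,
        );
    } // a real backend must also handle a GPU-resident buffer here

    // 1 == TRITONSERVER_RESPONSE_COMPLETE_FINAL. Sending transfers ownership
    // of 'response' to Triton; null means no error is reported to the client.
    let err = TRITONBACKEND_ResponseSend(response, 1, ptr::null_mut());
    if !err.is_null() { return err; }

    // 1 == TRITONSERVER_REQUEST_RELEASE_ALL. After this the request, and
    // everything obtained from it, must no longer be touched.
    TRITONBACKEND_RequestRelease(request, 1)
}
```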
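
Error objects follow a simple create/inspect/delete lifecycle (TRITONSERVER_ErrorNew, TRITONSERVER_ErrorMessage, TRITONSERVER_ErrorCodeString, TRITONSERVER_ErrorDelete). A small sketch; the message text is illustrative only:

```rust
use std::ffi::{CStr, CString};
use triton_sys::{
    TRITONSERVER_ErrorCodeString, TRITONSERVER_ErrorDelete, TRITONSERVER_ErrorMessage,
    TRITONSERVER_ErrorNew,
};

fn main() {
    unsafe {
        let msg = CString::new("model directory is missing").unwrap();
        // 2 == TRITONSERVER_ERROR_NOT_FOUND in the C enum.
        let err = TRITONSERVER_ErrorNew(2, msg.as_ptr());

        // Both strings are owned by 'err' and are valid only until ErrorDelete.
        let code = CStr::from_ptr(TRITONSERVER_ErrorCodeString(err));
        let text = CStr::from_ptr(TRITONSERVER_ErrorMessage(err));
        eprintln!("{}: {}", code.to_string_lossy(), text.to_string_lossy());

        TRITONSERVER_ErrorDelete(err); // we created it, so we delete it
    }
}
```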
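
For the in-process server API, the TRITONSERVER_ServerOptionsNew and TRITONSERVER_ServerNew entries above combine into a bootstrap sequence like the following sketch. The "/models" repository path is hypothetical, and the check helper is sketch-only shorthand for proper error handling:

```rust
use std::ffi::CString;
use std::ptr;
use triton_sys::{
    TRITONSERVER_Error, TRITONSERVER_Server, TRITONSERVER_ServerDelete,
    TRITONSERVER_ServerIsReady, TRITONSERVER_ServerNew, TRITONSERVER_ServerOptions,
    TRITONSERVER_ServerOptionsDelete, TRITONSERVER_ServerOptionsNew,
    TRITONSERVER_ServerOptionsSetModelRepositoryPath, TRITONSERVER_ServerStop,
};

/// Abort on the first non-null error; a sketch-only convenience.
unsafe fn check(err: *mut TRITONSERVER_Error) {
    assert!(err.is_null(), "triton call failed");
}

fn main() {
    let repo = CString::new("/models").unwrap(); // hypothetical path
    unsafe {
        let mut options: *mut TRITONSERVER_ServerOptions = ptr::null_mut();
        check(TRITONSERVER_ServerOptionsNew(&mut options));
        check(TRITONSERVER_ServerOptionsSetModelRepositoryPath(options, repo.as_ptr()));

        let mut server: *mut TRITONSERVER_Server = ptr::null_mut();
        check(TRITONSERVER_ServerNew(&mut server, options));
        // The options object is no longer needed once the server exists.
        check(TRITONSERVER_ServerOptionsDelete(options));

        let mut ready = false;
        check(TRITONSERVER_ServerIsReady(server, &mut ready));
        println!("server ready: {ready}");

        // Stopping is final: a stopped server cannot be restarted.
        check(TRITONSERVER_ServerStop(server));
        check(TRITONSERVER_ServerDelete(server));
    }
}
```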
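
Building a request with the TRITONSERVER_InferenceRequest* functions above: create the request, declare an input, attach a data buffer (which the request then borrows), and name a requested output. The model "simple" and the tensor names are hypothetical, and the release and response callbacks must still be set before TRITONSERVER_ServerInferAsync:

```rust
use std::ffi::CString;
use std::os::raw::c_void;
use std::ptr;
use triton_sys::{
    TRITONSERVER_Error, TRITONSERVER_InferenceRequest,
    TRITONSERVER_InferenceRequestAddInput, TRITONSERVER_InferenceRequestAddRequestedOutput,
    TRITONSERVER_InferenceRequestAppendInputData, TRITONSERVER_InferenceRequestNew,
    TRITONSERVER_Server,
};

/// Build a request for the hypothetical model "simple" (version -1 selects
/// the latest version) with one FP32 input of shape [1, 4].
unsafe fn build_request(
    server: *mut TRITONSERVER_Server,
    input_data: &[f32; 4],
) -> Result<*mut TRITONSERVER_InferenceRequest, *mut TRITONSERVER_Error> {
    let model = CString::new("simple").unwrap();
    let mut request: *mut TRITONSERVER_InferenceRequest = ptr::null_mut();
    let err = TRITONSERVER_InferenceRequestNew(&mut request, server, model.as_ptr(), -1);
    if !err.is_null() { return Err(err); }

    let input = CString::new("INPUT0").unwrap();
    let shape: [i64; 2] = [1, 4];
    // 11 == TRITONSERVER_TYPE_FP32 in the C enum.
    let err = TRITONSERVER_InferenceRequestAddInput(
        request, input.as_ptr(), 11, shape.as_ptr(), shape.len() as u64,
    );
    if !err.is_null() { return Err(err); }

    // The request borrows this buffer until the request is deleted or the
    // input is removed, so the caller must keep 'input_data' alive that long.
    let err = TRITONSERVER_InferenceRequestAppendInputData(
        request,
        input.as_ptr(),
        input_data.as_ptr() as *const c_void,
        std::mem::size_of_val(input_data),
        0, // TRITONSERVER_MEMORY_CPU
        0, // memory_type_id
    );
    if !err.is_null() { return Err(err); }

    let output = CString::new("OUTPUT0").unwrap();
    let err = TRITONSERVER_InferenceRequestAddRequestedOutput(request, output.as_ptr());
    if !err.is_null() { return Err(err); }

    Ok(request)
}
```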
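
Finally, the custom-metrics functions: create a family, create a metric in it, update it, and delete the metric before its family, as the TRITONSERVER_MetricDelete entry above requires. A minimal counter sketch with a hypothetical metric name; passing a null labels pointer with a zero count is assumed to mean "no labels":

```rust
use std::ffi::CString;
use std::ptr;
use triton_sys::{
    TRITONSERVER_Metric, TRITONSERVER_MetricDelete, TRITONSERVER_MetricFamily,
    TRITONSERVER_MetricFamilyDelete, TRITONSERVER_MetricFamilyNew,
    TRITONSERVER_MetricIncrement, TRITONSERVER_MetricNew,
};

fn main() {
    // Hypothetical counter reporting how many requests a backend handled.
    let name = CString::new("example_requests_total").unwrap();
    let desc = CString::new("Requests seen by the example backend").unwrap();
    unsafe {
        let mut family: *mut TRITONSERVER_MetricFamily = ptr::null_mut();
        // 0 == TRITONSERVER_METRIC_KIND_COUNTER in the C enum.
        let mut err =
            TRITONSERVER_MetricFamilyNew(&mut family, 0, name.as_ptr(), desc.as_ptr());
        assert!(err.is_null());

        // No labels on this metric (null pointer, zero count).
        let mut metric: *mut TRITONSERVER_Metric = ptr::null_mut();
        err = TRITONSERVER_MetricNew(&mut metric, family, ptr::null_mut(), 0);
        assert!(err.is_null());

        err = TRITONSERVER_MetricIncrement(metric, 1.0); // counter += 1
        assert!(err.is_null());

        // Metrics must be deleted before their family (see above).
        err = TRITONSERVER_MetricDelete(metric);
        assert!(err.is_null());
        err = TRITONSERVER_MetricFamilyDelete(family);
        assert!(err.is_null());
    }
}
```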

Type Aliases§