
Module model_runtime

Model Runtime Abstraction Layer

Provides a unified interface for hosting multiple model formats (GGUF, ONNX, TensorRT, Safetensors, GGML, CoreML) through a trait-based runtime system.

Architecture:

  • Each model format has its own runtime adapter
  • All runtimes expose OpenAI-compatible HTTP API
  • Maintains 1-hop architecture: Rust → HTTP → Runtime Server
  • Automatic format detection from file extension
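The extension-based format detection described above can be sketched as follows. This is an illustrative sketch, not the module's actual implementation: the `ModelFormat` variant names and the extension-to-format mapping (e.g. `.engine`/`.plan` for TensorRT) are assumptions, since the real `FormatDetector` internals are not shown on this page.

```rust
/// Hypothetical mirror of the re-exported ModelFormat enum.
#[derive(Debug, PartialEq)]
enum ModelFormat {
    Gguf,
    Onnx,
    TensorRt,
    Safetensors,
    Ggml,
    CoreMl,
}

/// Detect a model's format from its file extension (assumed mapping).
fn detect_format(path: &str) -> Option<ModelFormat> {
    // rsplit('.') yields at least one element, so next() cannot fail here.
    let ext = path.rsplit('.').next()?.to_ascii_lowercase();
    match ext.as_str() {
        "gguf" => Some(ModelFormat::Gguf),
        "onnx" => Some(ModelFormat::Onnx),
        "engine" | "plan" => Some(ModelFormat::TensorRt),
        "safetensors" => Some(ModelFormat::Safetensors),
        "ggml" => Some(ModelFormat::Ggml),
        "mlmodel" | "mlpackage" => Some(ModelFormat::CoreMl),
        _ => None,
    }
}

fn main() {
    assert_eq!(detect_format("llama-7b.gguf"), Some(ModelFormat::Gguf));
    assert_eq!(detect_format("model.onnx"), Some(ModelFormat::Onnx));
    assert_eq!(detect_format("weights.safetensors"), Some(ModelFormat::Safetensors));
    assert_eq!(detect_format("unknown.xyz"), None);
    println!("ok");
}
```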

Re-exports

pub use runtime_trait::ModelRuntime;
pub use runtime_trait::ModelFormat;
pub use runtime_trait::RuntimeConfig;
pub use runtime_trait::InferenceRequest;
pub use runtime_trait::InferenceResponse;
pub use gguf_runtime::GGUFRuntime;
pub use onnx_runtime::ONNXRuntime;
pub use tensorrt_runtime::TensorRTRuntime;
pub use safetensors_runtime::SafetensorsRuntime;
pub use ggml_runtime::GGMLRuntime;
pub use coreml_runtime::CoreMLRuntime;
pub use format_detector::FormatDetector;
pub use platform_detector::HardwareCapabilities;
pub use runtime_manager::RuntimeManager;
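The re-exports suggest that each format adapter (`GGUFRuntime`, `ONNXRuntime`, etc.) implements the shared `ModelRuntime` trait, letting `RuntimeManager` dispatch to any of them behind a trait object. The sketch below illustrates that shape; the method names and signatures are assumptions, as the real trait is defined in `runtime_trait` and not shown on this page.

```rust
/// Hypothetical stand-in for the runtime_trait::ModelRuntime trait;
/// the actual trait's methods are defined in the runtime_trait module.
trait ModelRuntime {
    fn format_name(&self) -> &'static str;
    fn infer(&self, prompt: &str) -> String;
}

/// Illustrative adapter; the real GGUFRuntime would forward the request
/// over HTTP to its runtime server (the 1-hop architecture).
struct GgufRuntime;

impl ModelRuntime for GgufRuntime {
    fn format_name(&self) -> &'static str {
        "GGUF"
    }

    fn infer(&self, prompt: &str) -> String {
        // Placeholder: echo instead of a real HTTP round-trip.
        format!("[GGUF] echo: {prompt}")
    }
}

fn main() {
    // A manager can hold any adapter behind the common trait.
    let rt: Box<dyn ModelRuntime> = Box::new(GgufRuntime);
    assert_eq!(rt.format_name(), "GGUF");
    println!("{}", rt.infer("hello"));
}
```

Boxing adapters behind `dyn ModelRuntime` is what allows one manager to serve GGUF, ONNX, TensorRT, and the other formats through a single OpenAI-compatible API surface.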

Modules

coreml_runtime
CoreML Runtime Adapter (macOS only), for Apple Silicon optimized models
format_detector
Format Detector
ggml_runtime
GGML Runtime Adapter (legacy format), similar to GGUF but for older llama.cpp GGML models
gguf_runtime
GGUF Runtime Adapter
onnx_runtime
ONNX Runtime Adapter
platform_detector
Platform and Hardware Detection
runtime_manager
Runtime Manager
runtime_trait
Core trait and types for model runtime abstraction
safetensors_runtime
Safetensors Runtime Adapter
tensorrt_runtime
TensorRT Runtime Adapter