Model Runtime Abstraction Layer
Provides a unified interface for hosting multiple model formats (GGUF, ONNX, TensorRT, Safetensors, GGML, CoreML) through a trait-based runtime system.
Architecture:
- Each model format has its own runtime adapter
- All runtimes expose OpenAI-compatible HTTP API
- Maintains 1-hop architecture: Rust → HTTP → Runtime Server
- Automatic format detection from file extension
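The automatic format detection described above can be sketched as a mapping from file extension to format variant. This is an illustrative sketch only: the enum variants, extension mappings, and the `detect_format` function are assumptions and not the crate's actual `FormatDetector` API.

```rust
/// Illustrative subset of the supported model formats (assumed names,
/// not the crate's real `ModelFormat` enum).
#[derive(Debug, PartialEq)]
enum ModelFormat {
    Gguf,
    Onnx,
    TensorRt,
    Safetensors,
    CoreMl,
}

/// Guess the model format from the file extension.
/// Returns `None` when the extension is missing or unrecognized.
fn detect_format(path: &str) -> Option<ModelFormat> {
    // `rsplit('.')` yields the extension first; lowercase for a
    // case-insensitive match.
    let ext = path.rsplit('.').next()?.to_ascii_lowercase();
    match ext.as_str() {
        "gguf" => Some(ModelFormat::Gguf),
        "onnx" => Some(ModelFormat::Onnx),
        // TensorRT serialized engines commonly use these extensions
        // (an assumption; the real detector may differ).
        "engine" | "plan" => Some(ModelFormat::TensorRt),
        "safetensors" => Some(ModelFormat::Safetensors),
        "mlmodel" | "mlpackage" => Some(ModelFormat::CoreMl),
        _ => None,
    }
}
```

Keying detection on the extension keeps the dispatch cheap; a real detector might additionally check magic bytes when extensions are ambiguous.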
Re-exports
pub use runtime_trait::ModelRuntime;
pub use runtime_trait::ModelFormat;
pub use runtime_trait::RuntimeConfig;
pub use runtime_trait::InferenceRequest;
pub use runtime_trait::InferenceResponse;
pub use gguf_runtime::GGUFRuntime;
pub use onnx_runtime::ONNXRuntime;
pub use tensorrt_runtime::TensorRTRuntime;
pub use safetensors_runtime::SafetensorsRuntime;
pub use ggml_runtime::GGMLRuntime;
pub use coreml_runtime::CoreMLRuntime;
pub use format_detector::FormatDetector;
pub use platform_detector::HardwareCapabilities;
pub use runtime_manager::RuntimeManager;
Modules
- coreml_runtime - CoreML Runtime Adapter (macOS only), for Apple Silicon-optimized models
- format_detector - Format Detector
- ggml_runtime - GGML Runtime Adapter (legacy format), similar to GGUF but for older llama.cpp GGML models
- gguf_runtime - GGUF Runtime Adapter
- onnx_runtime - ONNX Runtime Adapter
- platform_detector - Platform and Hardware Detection
- runtime_manager - Runtime Manager
- runtime_trait - Core trait and types for model runtime abstraction
- safetensors_runtime - Safetensors Runtime Adapter
- tensorrt_runtime - TensorRT Runtime Adapter