pub struct InferenceEngine {
    pub config: InferenceConfig,
    pub unified_registry: UnifiedRegistry,
    pub adaptive_router: AdaptiveRouter,
    pub outcome_tracker: Arc<RwLock<OutcomeTracker>>,
    pub registry: ModelRegistry,
    pub router: ModelRouter,
    /* private fields */
}
The main inference engine. It is thread-safe and loads models lazily.
It includes the unified registry, adaptive router, and outcome tracker for schema-driven model selection with learned performance profiles.
Fields

config: InferenceConfig
unified_registry: UnifiedRegistry
    Unified model registry (local + remote).
adaptive_router: AdaptiveRouter
    Adaptive router with three-phase selection.
outcome_tracker: Arc<RwLock<OutcomeTracker>>
    Outcome tracker for learning from results.
registry: ModelRegistry
router: ModelRouter

Implementations
impl InferenceEngine
pub fn new(config: InferenceConfig) -> Self

Create a new inference engine from the given configuration.
pub async fn route_adaptive(&self, prompt: &str) -> AdaptiveRoutingDecision

Route a prompt using the adaptive router (new). Returns the full decision context.
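A hypothetical usage sketch; the crate name `inference`, the `Default` impl on `InferenceConfig`, and a `Debug` impl on `AdaptiveRoutingDecision` are assumptions, not confirmed by this page:

```rust
use inference::{InferenceConfig, InferenceEngine};

#[tokio::main]
async fn main() {
    let engine = InferenceEngine::new(InferenceConfig::default());

    // Inspect where the prompt would be routed, without running inference.
    let decision = engine.route_adaptive("Summarize this log file").await;
    println!("{decision:?}"); // assumes AdaptiveRoutingDecision derives Debug
}
```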
pub fn route(&self, prompt: &str) -> RoutingDecision

Route a prompt to the best model without executing it (legacy compat).
pub async fn generate_tracked(
    &self,
    req: GenerateRequest,
) -> Result<InferenceResult, InferenceError>

Generate text from a prompt with outcome tracking.
Returns an InferenceResult whose trace_id can be used to report outcomes.
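A sketch of tracked generation; the fields shown on `GenerateRequest` and `InferenceResult` are assumptions for illustration:

```rust
use inference::{GenerateRequest, InferenceConfig, InferenceEngine};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let engine = InferenceEngine::new(InferenceConfig::default());

    // model: None defers model choice to the adaptive router.
    let req = GenerateRequest {
        prompt: "Explain borrow checking in one paragraph".into(),
        model: None,
        ..Default::default()
    };
    let result = engine.generate_tracked(req).await?;

    // Keep the trace_id so the eventual outcome can be reported
    // back to the outcome tracker.
    println!("trace_id = {}", result.trace_id);
    Ok(())
}
```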
pub async fn generate(
    &self,
    req: GenerateRequest,
) -> Result<String, InferenceError>

Generate text from a prompt (legacy API, no outcome tracking).
When req.model is None, uses intelligent routing based on prompt complexity.
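A minimal sketch of the legacy path, under the same assumed `GenerateRequest` shape:

```rust
use inference::{GenerateRequest, InferenceConfig, InferenceEngine};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let engine = InferenceEngine::new(InferenceConfig::default());

    // With model: None, the engine routes by prompt complexity
    // and returns the generated text directly (no trace_id).
    let text = engine
        .generate(GenerateRequest {
            prompt: "Write a haiku about compilers".into(),
            model: None,
            ..Default::default()
        })
        .await?;
    println!("{text}");
    Ok(())
}
```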
pub async fn embed(
    &self,
    req: EmbedRequest,
) -> Result<Vec<Vec<f32>>, InferenceError>

Generate embeddings for text using the dedicated embedding model. Uses Qwen3-Embedding with proper last-token hidden-state extraction.
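An embedding sketch; the `texts` field on `EmbedRequest` is an assumption:

```rust
use inference::{EmbedRequest, InferenceConfig, InferenceEngine};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let engine = InferenceEngine::new(InferenceConfig::default());

    // Field names on EmbedRequest are assumed for illustration.
    let req = EmbedRequest {
        texts: vec!["hello world".into()],
        ..Default::default()
    };
    let vectors = engine.embed(req).await?;

    // The return type is one f32 vector per input text.
    assert_eq!(vectors.len(), 1);
    Ok(())
}
```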
pub async fn classify(
    &self,
    req: ClassifyRequest,
) -> Result<Vec<ClassifyResult>, InferenceError>

Classify text against candidate labels.
When req.model is None, routes to the smallest available model.
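A classification sketch; the field names on `ClassifyRequest` and a `Debug` impl on `ClassifyResult` are assumptions:

```rust
use inference::{ClassifyRequest, InferenceConfig, InferenceEngine};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let engine = InferenceEngine::new(InferenceConfig::default());

    let req = ClassifyRequest {
        text: "The build fails with a linker error".into(),
        labels: vec!["bug".into(), "feature".into(), "question".into()],
        model: None, // routes to the smallest available model
        ..Default::default()
    };
    for result in engine.classify(req).await? {
        println!("{result:?}");
    }
    Ok(())
}
```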
pub fn list_models_unified(&self) -> Vec<ModelInfo>

List all known models and their status (new registry).
pub fn list_models(&self) -> Vec<ModelInfo>

List all known models and their download status (legacy).
pub async fn pull_model(&self, name: &str) -> Result<PathBuf, InferenceError>

Download a model if not already present.
pub fn remove_model(&self, name: &str) -> Result<(), InferenceError>

Remove a downloaded model.
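The model-management methods above can be combined as in this sketch; `"some-model"` is a placeholder name, not a real registry entry:

```rust
use inference::{InferenceConfig, InferenceEngine};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let engine = InferenceEngine::new(InferenceConfig::default());

    // Enumerate known models and their status (assumes ModelInfo: Debug).
    for info in engine.list_models_unified() {
        println!("{info:?}");
    }

    // Download on demand, then clean up.
    let path = engine.pull_model("some-model").await?;
    println!("downloaded to {}", path.display());
    engine.remove_model("some-model")?;
    Ok(())
}
```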
pub fn register_model(&mut self, schema: ModelSchema)

Register a model in the unified registry.
pub fn outcome_tracker(&self) -> Arc<RwLock<OutcomeTracker>>

Get the outcome tracker for external use (e.g., memgine integration).
pub async fn export_profiles(&self) -> Vec<ModelProfile>

Export model performance profiles for persistence.
pub async fn import_profiles(&self, profiles: Vec<ModelProfile>)

Import model performance profiles (from persistence).
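A persistence round-trip sketch for the export/import pair; serializing with serde_json assumes `ModelProfile` implements `Serialize`/`Deserialize`, which this page does not confirm:

```rust
use inference::{InferenceConfig, InferenceEngine, ModelProfile};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let engine = InferenceEngine::new(InferenceConfig::default());

    // Export learned profiles before shutdown.
    let profiles = engine.export_profiles().await;
    std::fs::write("profiles.json", serde_json::to_string(&profiles)?)?;

    // On the next startup, restore them so routing keeps its
    // learned performance history.
    let restored: Vec<ModelProfile> =
        serde_json::from_str(&std::fs::read_to_string("profiles.json")?)?;
    engine.import_profiles(restored).await;
    Ok(())
}
```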
Auto Trait Implementations
impl Freeze for InferenceEngine
impl !RefUnwindSafe for InferenceEngine
impl Send for InferenceEngine
impl Sync for InferenceEngine
impl Unpin for InferenceEngine
impl UnsafeUnpin for InferenceEngine
impl !UnwindSafe for InferenceEngine
Blanket Implementations

impl<T> BorrowMut<T> for T
where
    T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true; otherwise converts self into a Right variant.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true; otherwise converts self into a Right variant.