car_inference/handle.rs
//! `InferenceHandle` — minimal trait surface for embedders that hold
//! a "thing that can do inference" without committing to the concrete
//! `InferenceEngine`.
//!
//! Motivation: pre-v0.8, `MemgineEngine` held `Arc<InferenceEngine>`
//! directly. Every binary that wanted memgine — `car-cli`'s
//! `cmd_distill` / `cmd_reason` / `cmd_dream`, the bench harness,
//! the eval bridge — therefore had to instantiate a full in-process
//! engine even when a daemon was running and serving the same
//! capabilities over WebSocket. That was a v0.7 holdover: cold-start
//! cost, doubled model weights in memory, and a CLI tracker that
//! couldn't see any of the daemon's accumulated outcome data.
//!
//! This trait lets memgine accept any handle that satisfies the
//! narrow surface it actually uses (`generate` + `embed`). The
//! concrete `InferenceEngine` implements it for the in-process
//! path; downstream binaries can provide their own daemon-proxy
//! implementation that dispatches each call over the daemon's
//! existing `infer` / `embed` JSON-RPC methods, no second engine
//! needed.
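//!
//! A minimal sketch of such a proxy, assuming a hypothetical
//! `DaemonClient` whose `call` sends one JSON-RPC request over the
//! daemon's WebSocket and decodes the result (the wire shapes here
//! are illustrative, not the daemon's actual schema):
//!
//! ```ignore
//! struct DaemonHandle {
//!     client: DaemonClient, // hypothetical WebSocket JSON-RPC client
//! }
//!
//! #[async_trait::async_trait]
//! impl InferenceHandle for DaemonHandle {
//!     async fn generate(&self, req: GenerateRequest) -> Result<String, InferenceError> {
//!         // Dispatch to the daemon's existing `infer` method; no
//!         // in-process engine (or second copy of the weights) needed.
//!         self.client.call("infer", &req).await
//!     }
//!
//!     async fn embed(&self, req: EmbedRequest) -> Result<Vec<Vec<f32>>, InferenceError> {
//!         self.client.call("embed", &req).await
//!     }
//! }
//! ```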
//!
//! Tracked in Parslee-ai/car#188.

use crate::tasks::{EmbedRequest, GenerateRequest};
use crate::InferenceError;

/// Inference operations memgine (and other embedders) need.
///
/// Implementations must be `Send + Sync` because memgine holds the
/// handle in an `Arc` and reaches it from `&self` methods that may
/// run inside `tokio::spawn` tasks during consolidation passes.
///
/// **Why these two methods.** The CLI / memgine call sites only
/// reach `generate` (for skill distillation, reasoning, dream
/// consolidation) and `embed` (for semantic similarity in
/// retrieval + the speculative summary pre-compute). Other engine
/// methods — classification, routing, tokenization, image / video
/// generation — are reached either through the daemon directly
/// or via the concrete engine path. Adding them to the trait
/// would broaden the daemon-proxy implementation surface with
/// no benefit to memgine.
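///
/// A minimal usage sketch (hypothetical names: `engine` is a
/// constructed `InferenceEngine`, and the request values are
/// assumed to be built elsewhere):
///
/// ```ignore
/// // In-process path: the concrete engine already implements the trait.
/// let handle: Arc<dyn InferenceHandle> = Arc::new(engine);
///
/// let h = Arc::clone(&handle);
/// tokio::spawn(async move {
///     // `Send + Sync` plus the owned `Arc` clone are what let the
///     // handle cross into this task.
///     if let Ok(text) = h.generate(generate_request).await {
///         // ... consolidation work with `text` ...
///     }
///     if let Ok(vectors) = h.embed(embed_request).await {
///         // one Vec<f32> per input text, in input order
///         // ... similarity scoring with `vectors` ...
///     }
/// });
/// ```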
#[async_trait::async_trait]
pub trait InferenceHandle: Send + Sync {
    /// Run a generation request to completion. Same contract as
    /// `InferenceEngine::generate`: the caller passes a `GenerateRequest`
    /// (which may carry an explicit `model`, a routing hint, tools,
    /// or a thinking budget) and receives the final text or an
    /// `InferenceError`.
    async fn generate(&self, req: GenerateRequest) -> Result<String, InferenceError>;

    /// Encode one or more texts as embedding vectors. Same contract
    /// as `InferenceEngine::embed`: returns one `Vec<f32>` per input
    /// text, in the same order.
    async fn embed(&self, req: EmbedRequest) -> Result<Vec<Vec<f32>>, InferenceError>;
}

#[async_trait::async_trait]
impl InferenceHandle for crate::InferenceEngine {
    async fn generate(&self, req: GenerateRequest) -> Result<String, InferenceError> {
        // Fully qualified so the call unambiguously targets the
        // inherent method rather than recursing into this trait impl.
        crate::InferenceEngine::generate(self, req).await
    }

    async fn embed(&self, req: EmbedRequest) -> Result<Vec<Vec<f32>>, InferenceError> {
        crate::InferenceEngine::embed(self, req).await
    }
}