pub struct LLMEngine {
pub tokenizer: Arc<dyn TokenizerTrait>,
/* private fields */
}
Engine Wrapper
A universal wrapper for LLM engines.
It aggregates the specific inference backend and a tokenizer
used to calculate context management overhead.
Fields
tokenizer: Arc<dyn TokenizerTrait>
The active tokenizer, used for high-speed length estimation.
Implementations
impl LLMEngine
pub fn load(cfg: LLMEngineConfig) -> Result<Self>
Loads the specific underlying engine selected by the unified configuration.
pub fn from_custom(backend: Box<dyn LLMEngineTrait>) -> Result<Self>
👎 Deprecated since 0.3.3: use LLMEngine::load(LLMEngineConfig::Custom(backend)) instead
Injects a custom LLM backend via the LLMEngineConfig::Custom variant.
Deprecation
This method is deprecated. Use LLMEngine::load(LLMEngineConfig::Custom(backend)) instead.
Migration
// Old (deprecated):
let engine = LLMEngine::from_custom(Box::new(MyEngine))?;
// New (recommended):
let engine = LLMEngine::load(LLMEngineConfig::Custom(Box::new(MyEngine)))?;
pub fn with_custom_tokenizer<T: TokenizerTrait + 'static>(self, tokenizer: T) -> Self
Replaces the framework’s default cl100k_base tokenizer with a tokenization
algorithm that more accurately matches the specific model.
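The full TokenizerTrait surface is not shown on this page; as a minimal sketch, assuming a trait with a single token-counting method, a custom tokenizer might look like the hypothetical stand-in below (the trait and struct names are illustrative, not the crate's real API):

```rust
// Hypothetical stand-in for TokenizerTrait (its real methods are not
// shown on this page); a naive whitespace tokenizer for illustration.
trait Tokenizer {
    fn count_tokens(&self, text: &str) -> usize;
}

struct WhitespaceTokenizer;

impl Tokenizer for WhitespaceTokenizer {
    // Count whitespace-separated words as "tokens".
    fn count_tokens(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

fn main() {
    let tok = WhitespaceTokenizer;
    assert_eq!(tok.count_tokens("one two three"), 3);
}
```

A real implementation would wrap the model's own vocabulary, since mismatched tokenizers skew the length estimates used for context management.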
pub async fn chat(&self, request: LLMRequest) -> Result<String>
Proxy Method Routing
Executes a complete (non-streaming) chat inference, returning the full response string.
pub async fn chat_stream(&self, request: LLMRequest, tx: Sender<Result<String>>)
Executes streaming chat inference.
Generated text slices must be wrapped in Ok(String) and sent via tx.
If an error occurs, it should send Err(AmbiError) via tx, and the
framework will automatically interrupt the stream and clean up resources.
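The channel flavor behind Sender is not specified here; as a sketch of the send contract using a std::sync::mpsc channel (the real API may use an async channel, and AmbiError's definition is assumed), an implementation's send loop might look like:

```rust
use std::sync::mpsc;

// Hypothetical stand-in for AmbiError, whose definition is not shown here.
#[derive(Debug)]
struct AmbiError(String);

// Sketch of the contract described above: each generated text slice is
// wrapped in Ok(String) and sent via tx; a failed send means the receiver
// was dropped, i.e. the framework interrupted the stream.
fn stream_slices(slices: &[&str], tx: mpsc::Sender<Result<String, AmbiError>>) {
    for s in slices {
        if tx.send(Ok((*s).to_string())).is_err() {
            break; // receiver gone: stop generating and clean up
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    stream_slices(&["Hel", "lo"], tx);
    let out: String = rx.iter().filter_map(|r| r.ok()).collect();
    assert_eq!(out, "Hello");
}
```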
pub fn reset_context(&self)
Resets the engine context (e.g., clearing the KV Cache for local inference engines).
pub fn supports_multimodal(&self) -> bool
Declares whether the engine backend supports multimodal input (image parsing).
If false, the framework will perform a fail-fast interception when
processing inputs containing images.
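The request shape is not shown on this page; a minimal sketch of the fail-fast interception, assuming a hypothetical has_images flag on the request:

```rust
// Hypothetical request shape for illustration only.
struct Request {
    has_images: bool,
}

// Sketch of the fail-fast interception: reject image inputs up front
// when the backend reports no multimodal support, instead of failing
// mid-inference.
fn intercept(supports_multimodal: bool, req: &Request) -> Result<(), String> {
    if req.has_images && !supports_multimodal {
        return Err("backend does not support multimodal input".to_string());
    }
    Ok(())
}

fn main() {
    assert!(intercept(true, &Request { has_images: true }).is_ok());
    assert!(intercept(false, &Request { has_images: true }).is_err());
    assert!(intercept(false, &Request { has_images: false }).is_ok());
}
```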
pub fn count_tokens(&self, text: &str) -> Result<usize>
Fast, purely synchronous token calculation to support the Agent’s memory eviction algorithm.
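As a sketch of the kind of eviction loop this supports (the estimator below is a hypothetical 4-characters-per-token stand-in, not the framework's tokenizer):

```rust
// Hypothetical stand-in for count_tokens: a rough 4-chars-per-token estimate.
fn count_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

// Drop the oldest messages until the history fits the token budget,
// mirroring the memory-eviction use case described above.
fn evict_to_budget(mut history: Vec<String>, budget: usize) -> Vec<String> {
    while history.iter().map(|m| count_tokens(m)).sum::<usize>() > budget {
        history.remove(0); // evict the oldest message first
    }
    history
}

fn main() {
    let history = vec!["a".repeat(40), "b".repeat(40), "c".repeat(8)];
    // 10 + 10 + 2 estimated tokens = 22; a budget of 12 forces the
    // oldest message out.
    let kept = evict_to_budget(history, 12);
    assert_eq!(kept.len(), 2);
    assert_eq!(kept[0], "b".repeat(40));
}
```

Because the count is purely synchronous, an agent can run this loop on every turn without awaiting the backend.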
pub fn backend_downcast_ref<T: 'static>(&self) -> Result<&T>
Downcasts the engine backend to a concrete type.
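This presumably relies on the standard Any-based downcast pattern from the Rust standard library; a self-contained sketch of that pattern (MockBackend and downcast_backend are illustrative names, not the engine's actual internals):

```rust
use std::any::Any;

// A concrete backend type used only for this illustration.
struct MockBackend {
    name: &'static str,
}

// Sketch of the Any-based downcast that backend_downcast_ref exposes:
// yields a reference when the erased backend is actually a T, and
// nothing otherwise.
fn downcast_backend<T: 'static>(backend: &dyn Any) -> Option<&T> {
    backend.downcast_ref::<T>()
}

fn main() {
    let backend = MockBackend { name: "mock" };
    let erased: &dyn Any = &backend;
    let concrete = downcast_backend::<MockBackend>(erased).expect("wrong type");
    assert_eq!(concrete.name, "mock");
    // Downcasting to the wrong type fails rather than panicking.
    assert!(downcast_backend::<String>(erased).is_none());
}
```

This is useful when a caller needs backend-specific methods that the generic LLMEngineTrait does not expose.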