Struct llm_base::InferenceSession
pub struct InferenceSession { /* private fields */ }
An inference session represents the state of text generation. It holds the full context window, as well as several additional parameters used during sampling.
Safety
This type implements Send, as it can be sent to another thread. However, it does
not implement Sync: it cannot be used from multiple threads at the same time.
If you need to use a model from multiple threads, spawn a separate inference session for each thread.
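The Send-but-not-Sync pattern described above can be sketched with a self-contained toy type: each "session" is moved into its own thread rather than shared. `ToySession` is a stand-in invented for this example, not a type from `llm_base`.

```rust
use std::thread;

// Toy stand-in for an inference session: it is Send (can be moved to
// another thread), but no single instance is ever shared between threads,
// so Sync is never required.
struct ToySession {
    tokens_generated: usize,
}

impl ToySession {
    fn new() -> Self {
        ToySession { tokens_generated: 0 }
    }

    fn generate(&mut self, n: usize) -> usize {
        self.tokens_generated += n;
        self.tokens_generated
    }
}

fn main() {
    // One session per thread: each session is *moved* into its thread.
    let handles: Vec<_> = (0..4)
        .map(|_| {
            thread::spawn(move || {
                let mut session = ToySession::new();
                session.generate(10)
            })
        })
        .collect();

    for h in handles {
        assert_eq!(h.join().unwrap(), 10);
    }
    println!("all sessions finished");
}
```

The same shape applies to `InferenceSession`: construct one per worker thread against the shared model rather than guarding a single session with a lock.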
Implementations§
impl InferenceSession
pub fn feed_prompt<E: Error + 'static>(
    &mut self,
    model: &dyn Model,
    params: &InferenceParameters,
    prompt: &str,
    output_request: &mut EvaluateOutputRequest,
    callback: impl FnMut(&[u8]) -> Result<(), E>
) -> Result<(), InferenceError>
Feed a prompt to the model for this session.
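As a hedged sketch only (it assumes a loaded `model: &dyn Model`, a `session` created for it, and that `InferenceParameters` and `EvaluateOutputRequest` implement `Default`), a call might look like:

```rust
// Sketch: `Infallible` is used as the error type because this callback
// cannot fail.
use std::convert::Infallible;

let mut output_request = EvaluateOutputRequest::default();
session.feed_prompt::<Infallible>(
    model,
    &InferenceParameters::default(),
    "The quick brown fox",
    &mut output_request,
    |bytes| {
        // Token text arrives as raw UTF-8 fragments; stream them out.
        print!("{}", String::from_utf8_lossy(bytes));
        Ok(())
    },
)?;
```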
pub fn infer_next_token<'v>(
    &mut self,
    model: &'v dyn Model,
    params: &InferenceParameters,
    output_request: &mut EvaluateOutputRequest,
    rng: &mut impl Rng
) -> Result<&'v [u8], InferenceError>
Infer the next token for this session.
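A hedged sketch of a token-by-token loop after the prompt has been fed (`model`, `session`, `params`, and `output_request` are assumed to exist; the `rand` crate supplies the RNG, and an `InferenceError::EndOfText` variant is assumed to signal the end of generation):

```rust
// Sketch: generate up to 64 tokens one at a time.
use rand::thread_rng;

let mut rng = thread_rng();
for _ in 0..64 {
    match session.infer_next_token(model, &params, &mut output_request, &mut rng) {
        Ok(bytes) => print!("{}", String::from_utf8_lossy(bytes)),
        Err(InferenceError::EndOfText) => break, // assumed EOT variant
        Err(e) => return Err(e),
    }
}
```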
pub fn infer<E: Error + 'static>(
    &mut self,
    model: &dyn Model,
    prompt: &str,
    output_request: &mut EvaluateOutputRequest,
    rng: &mut impl Rng,
    callback: impl FnMut(&str) -> Result<(), E>
) -> Result<InferenceStats, InferenceError>
Calls Self::infer_with_params with the InferenceParameters and InferenceWithPromptParameters provided by the Model; refer to Self::infer_with_params for more information.
pub fn infer_with_params<E: Error + 'static>(
    &mut self,
    model: &dyn Model,
    params: &InferenceParameters,
    prompt_params: &InferenceWithPromptParameters,
    prompt: &str,
    output_request: &mut EvaluateOutputRequest,
    rng: &mut impl Rng,
    callback: impl FnMut(&str) -> Result<(), E>
) -> Result<InferenceStats, InferenceError>
Generate text by using the provided Model to evaluate the prompt.
The callback is called with each new token until an end-of-text (EOT)
token is encountered or the maximum number of tokens has been
generated (as specified by InferenceWithPromptParameters::maximum_token_count).
The EvaluateOutputRequest is used to specify additional data to fetch from
the model.
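As a hedged sketch of a full generation call via the simpler `infer` entry point (assuming a loaded `model: &dyn Model`, a matching `session`, a `Default` impl on `EvaluateOutputRequest`, the `rand` crate for the RNG, and a `Debug` impl on `InferenceStats`):

```rust
// Sketch only: `model` and `session` are assumed to exist already.
use std::convert::Infallible;
use rand::thread_rng;

let stats = session.infer::<Infallible>(
    model,
    "Tell me about llamas: ",
    &mut EvaluateOutputRequest::default(),
    &mut thread_rng(),
    |text| {
        // Stream each decoded token fragment to stdout as it arrives.
        print!("{text}");
        Ok(())
    },
)?;
println!("\n{stats:?}"); // timing/token statistics
```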
pub fn sample_top_p_top_k(
    &self,
    params: &InferenceParameters,
    rng: &mut impl Rng
) -> TokenId
Sample a token using Top-P/Top-K sampling and the last logits from this session.
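The sampling strategy this method is named after can be illustrated with a self-contained sketch. The parameter names (`top_k`, `top_p`, `temperature`) mirror common llama.cpp-style samplers and are assumptions here, not the exact `InferenceParameters` fields; the uniform value `u` stands in for the `Rng`.

```rust
// Self-contained sketch of Top-K/Top-P (nucleus) sampling over raw logits.
fn sample_top_p_top_k(logits: &[f32], top_k: usize, top_p: f32, temperature: f32, u: f32) -> usize {
    // Sort token ids by logit, descending, and keep only the top_k.
    let mut ids: Vec<usize> = (0..logits.len()).collect();
    ids.sort_by(|&a, &b| logits[b].partial_cmp(&logits[a]).unwrap());
    ids.truncate(top_k.max(1));

    // Softmax over the kept logits, with temperature scaling.
    let max = logits[ids[0]];
    let mut probs: Vec<f32> = ids
        .iter()
        .map(|&i| ((logits[i] - max) / temperature).exp())
        .collect();
    let sum: f32 = probs.iter().sum();
    for p in &mut probs {
        *p /= sum;
    }

    // Nucleus (Top-P) cut: keep the smallest prefix whose mass reaches top_p.
    let mut cumulative = 0.0;
    let mut keep = probs.len();
    for (n, &p) in probs.iter().enumerate() {
        cumulative += p;
        if cumulative >= top_p {
            keep = n + 1;
            break;
        }
    }
    probs.truncate(keep);
    ids.truncate(keep);

    // Sample from the renormalized distribution using uniform `u` in [0, 1).
    let mass: f32 = probs.iter().sum();
    let mut threshold = u * mass;
    for (n, &p) in probs.iter().enumerate() {
        threshold -= p;
        if threshold <= 0.0 {
            return ids[n];
        }
    }
    ids[ids.len() - 1]
}

fn main() {
    let logits = [1.0, 3.0, 0.5, 2.0];
    // With u near 0, sampling lands on the most probable token (index 1).
    assert_eq!(sample_top_p_top_k(&logits, 4, 0.95, 1.0, 0.0), 1);
    println!("ok");
}
```

The real method additionally applies the session's repetition-penalty state and returns a `TokenId`; this sketch only shows the core filtering-and-sampling step.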
pub unsafe fn get_snapshot(&mut self) -> InferenceSnapshotRef<'_>
Obtains a serializable snapshot of the current inference state. This can be used to cache the state of the model and store it in a file.
Safety
This function provides raw access to the underlying memory owned by the
ggml context. While the provided InferenceSnapshotRef object is alive,
no other methods for this model object should be called.
pub fn from_snapshot(
    snapshot: InferenceSnapshot,
    model: &dyn Model
) -> Result<Self, SnapshotError>
Creates an InferenceSession from a snapshot.
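A hedged sketch of the round trip: snapshot a session whose prompt has already been evaluated, then restore it later to skip re-evaluating that prompt. The `to_owned` conversion from `InferenceSnapshotRef` to `InferenceSnapshot` is an assumption, and persistence (e.g. via serde) is out of scope.

```rust
// Sketch only: `session` and `model` are assumed to exist already.
let snapshot_ref = unsafe { session.get_snapshot() };
// Safety: per the docs above, no other methods may be called while
// `snapshot_ref` is alive.
let snapshot: InferenceSnapshot = snapshot_ref.to_owned(); // assumed conversion
let restored = InferenceSession::from_snapshot(snapshot, model)?;
```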
impl InferenceSession
pub fn new(
    params: InferenceSessionParameters,
    n_ctx: usize,
    n_layer: usize,
    n_embd: usize,
    n_vocab: usize
) -> InferenceSession
Creates a new InferenceSession.
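A hedged sketch of direct construction (assuming `InferenceSessionParameters` implements `Default`). The hyperparameter values shown are typical of a 7B LLaMA-family model; in practice they would come from the loaded model's own hyperparameters rather than being hard-coded.

```rust
// Sketch only: illustrative values, not a real configuration.
let session = InferenceSession::new(
    InferenceSessionParameters::default(),
    2048,  // n_ctx: context window length in tokens
    32,    // n_layer: transformer layers
    4096,  // n_embd: embedding dimension
    32000, // n_vocab: vocabulary size
);
```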