pub struct InferenceSession { /* private fields */ }

An inference session represents the state of text generation. It holds the full context window, as well as several additional parameters used during sampling.

Safety

This type implements Send, so it can be moved to another thread. However, it does not implement Sync: it cannot be used from multiple threads at the same time.

Consider spawning multiple inference sessions for the same model if you need to use it from multiple threads.
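
A minimal sketch of the one-session-per-thread pattern, using only the standard library. ToySession is a hypothetical stand-in, not the crate's type. The real InferenceSession gets Send from the manual impl listed under Trait Implementations. This sketch instead uses a Cell field to get the same Send-but-not-Sync combination safely.

```rust
use std::cell::Cell;
use std::thread;

/// Hypothetical stand-in for a session. The `Cell` field makes the type
/// `Send` (it may move between threads) but not `Sync` (it may not be shared).
struct ToySession {
    tokens_seen: Cell<usize>,
}

impl ToySession {
    fn new() -> Self {
        ToySession { tokens_seen: Cell::new(0) }
    }

    /// Placeholder for real work: counts whitespace-separated words as "tokens".
    fn feed(&mut self, text: &str) {
        self.tokens_seen
            .set(self.tokens_seen.get() + text.split_whitespace().count());
    }
}

/// Creates one session per prompt, moves each into its own worker thread
/// (which `Send` permits), and collects the sessions back afterwards.
/// Sharing a single `&ToySession` across threads (e.g. via `std::thread::scope`)
/// would be rejected at compile time, because `Cell<usize>` is not `Sync`.
fn run_in_parallel(prompts: &[&str]) -> Vec<usize> {
    let handles: Vec<thread::JoinHandle<ToySession>> = prompts
        .iter()
        .map(|prompt| {
            let mut session = ToySession::new();
            let owned_prompt = prompt.to_string();
            // `move` transfers ownership of the session into the worker thread.
            thread::spawn(move || {
                session.feed(&owned_prompt);
                session
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked").tokens_seen.get())
        .collect()
}
```

Each thread owns its session outright, so no locking is needed. The cost is one session's worth of state per thread.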

Implementations

impl InferenceSession

pub fn feed_prompt<E: Error + 'static>(
    &mut self,
    model: &dyn Model,
    params: &InferenceParameters,
    prompt: &str,
    output_request: &mut EvaluateOutputRequest,
    callback: impl FnMut(&[u8]) -> Result<(), E>
) -> Result<(), InferenceError>

Feed a prompt to the model for this session.
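
feed_prompt hands its callback raw bytes (&[u8]) rather than &str. One plausible reason, stated here as an assumption, is that a single token need not be valid UTF-8 on its own. The hypothetical Utf8Buffer below is not part of the crate. It shows how a callback could buffer those bytes and emit only complete characters.

```rust
/// Hypothetical helper: buffers raw token bytes and hands back only the
/// prefix that is complete UTF-8, keeping a trailing partial character for later.
#[derive(Default)]
struct Utf8Buffer {
    pending: Vec<u8>,
}

impl Utf8Buffer {
    /// Appends `bytes` and returns whatever text can now be decoded.
    fn push(&mut self, bytes: &[u8]) -> String {
        self.pending.extend_from_slice(bytes);
        let ready = match std::str::from_utf8(&self.pending) {
            // Everything decodes: flush it all.
            Ok(text) => text.len(),
            // Genuinely invalid bytes (not just an unfinished character):
            // flush everything and let the lossy decode below replace them.
            Err(e) if e.error_len().is_some() => self.pending.len(),
            // Input ends mid-character: flush the valid part, keep the rest.
            Err(e) => e.valid_up_to(),
        };
        let complete: Vec<u8> = self.pending.drain(..ready).collect();
        String::from_utf8_lossy(&complete).into_owned()
    }
}
```

Inside a feed_prompt callback, each incoming slice would be passed to push, and the returned String appended to the caller's output.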

pub fn infer_next_token<'v>(
    &mut self,
    model: &'v dyn Model,
    params: &InferenceParameters,
    output_request: &mut EvaluateOutputRequest,
    rng: &mut impl Rng
) -> Result<&'v [u8], InferenceError>

Infer the next token for this session.
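
In the signature of infer_next_token, the returned bytes carry the lifetime 'v of the model reference, not the lifetime of &mut self. The std-only sketch below shows what that permits: earlier results stay usable while the session is called again. The hypothetical ToyVocab and ToySession types stand in for the model and the session.

```rust
/// Hypothetical vocabulary standing in for `dyn Model`.
struct ToyVocab {
    tokens: Vec<Vec<u8>>,
}

/// Hypothetical session that only records which token ids it has produced.
struct ToySession {
    history: Vec<usize>,
}

impl ToySession {
    /// Same lifetime shape as `infer_next_token`: the returned slice borrows
    /// from `vocab` (lifetime `'v`), so the `&mut self` borrow ends at return.
    fn next_token<'v>(&mut self, vocab: &'v ToyVocab) -> &'v [u8] {
        // Deterministic placeholder for sampling: cycle through the vocabulary.
        let id = self.history.len() % vocab.tokens.len();
        self.history.push(id);
        &vocab.tokens[id]
    }
}
```

If the return type were tied to &mut self instead, holding `first` across the second call in the test below would not compile.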

pub fn infer<E: Error + 'static>(
    &mut self,
    model: &dyn Model,
    prompt: &str,
    output_request: &mut EvaluateOutputRequest,
    rng: &mut impl Rng,
    callback: impl FnMut(&str) -> Result<(), E>
) -> Result<InferenceStats, InferenceError>

Calls Self::infer_with_params with the InferenceParameters and InferenceWithPromptParameters provided by the Model; refer to Self::infer_with_params for more information.

pub fn infer_with_params<E: Error + 'static>(
    &mut self,
    model: &dyn Model,
    params: &InferenceParameters,
    prompt_params: &InferenceWithPromptParameters,
    prompt: &str,
    output_request: &mut EvaluateOutputRequest,
    rng: &mut impl Rng,
    callback: impl FnMut(&str) -> Result<(), E>
) -> Result<InferenceStats, InferenceError>

Generate text by using the provided Model to evaluate the prompt. The callback is called with each new token until an end-of-text (EOT) token is encountered or the maximum number of tokens has been generated (as specified by InferenceWithPromptParameters::maximum_token_count). The EvaluateOutputRequest specifies additional data to fetch from the model.
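
The stopping rules above can be sketched as a plain loop. This is a hypothetical simplification, not the crate's implementation:

- next_token stands in for evaluating the model and sampling.
- Id 0 is assumed to be EOT.
- The prompt-feeding step is omitted.

It does mirror the callback shape FnMut(&str) -> Result<(), E>. In this sketch, returning Err from the callback stops generation and propagates the error.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical end-of-text id; real models define their own.
const EOT: usize = 0;

/// Example error a callback can return to stop generation early.
#[derive(Debug, PartialEq)]
struct Stop;

impl fmt::Display for Stop {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("stopped by callback")
    }
}

impl Error for Stop {}

/// Calls `callback` with each new token until EOT is produced or
/// `maximum_token_count` tokens have been emitted; a callback error aborts.
/// Returns how many tokens were passed to the callback.
fn generate<E: Error + 'static>(
    vocab: &[&str],
    mut next_token: impl FnMut() -> usize,
    maximum_token_count: usize,
    mut callback: impl FnMut(&str) -> Result<(), E>,
) -> Result<usize, E> {
    let mut generated = 0;
    while generated < maximum_token_count {
        let id = next_token();
        if id == EOT {
            break;
        }
        callback(vocab[id])?;
        generated += 1;
    }
    Ok(generated)
}
```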

pub fn sample_top_p_top_k(&self, params: &InferenceParameters, rng: &mut impl Rng) -> TokenId

Sample a token from this session's most recent logits using top-p/top-k sampling.
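
A std-only sketch of top-k followed by top-p (nucleus) sampling over a logits vector. It makes these assumptions:

- It applies a temperature scaling step.
- It uses the parameter names top_k, top_p and temperature.
- It takes a caller-supplied uniform number in [0, 1) instead of an Rng, so it needs no external crates.

The real method may order these steps differently or apply further adjustments, such as a repetition penalty. Treat this as an illustration of the technique, not of the crate's code.

```rust
/// Top-k then top-p sampling over raw logits. Preconditions: `logits` is
/// non-empty, `temperature > 0`, and `uniform` lies in [0, 1).
fn sample_top_p_top_k(
    logits: &[f32],
    top_k: usize,
    top_p: f32,
    temperature: f32,
    uniform: f32,
) -> usize {
    assert!(!logits.is_empty(), "need at least one logit");

    // Pair each token id with its temperature-scaled logit, highest first.
    let mut candidates: Vec<(usize, f32)> = logits
        .iter()
        .enumerate()
        .map(|(id, &logit)| (id, logit / temperature))
        .collect();
    candidates.sort_by(|a, b| b.1.total_cmp(&a.1));

    // Top-k: keep only the k most likely tokens (at least one).
    candidates.truncate(top_k.max(1));

    // Softmax over the survivors; subtracting the max keeps exp() from overflowing.
    let max = candidates[0].1;
    for c in &mut candidates {
        c.1 = (c.1 - max).exp();
    }
    let sum: f32 = candidates.iter().map(|c| c.1).sum();
    for c in &mut candidates {
        c.1 /= sum;
    }

    // Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    let mut cumulative = 0.0;
    let mut keep = candidates.len();
    for (i, c) in candidates.iter().enumerate() {
        cumulative += c.1;
        if cumulative >= top_p {
            keep = i + 1;
            break;
        }
    }
    candidates.truncate(keep);

    // Draw from what is left, scaling the target instead of renormalising.
    let total: f32 = candidates.iter().map(|c| c.1).sum();
    let mut target = uniform * total;
    for &(id, p) in &candidates {
        if target < p {
            return id;
        }
        target -= p;
    }
    // Only reachable through floating-point round-off.
    candidates[candidates.len() - 1].0
}
```

With top_k = 1 this reduces to greedy decoding. With top_p = 1.0 and top_k at least the vocabulary size, it samples from the full softmax.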

pub unsafe fn get_snapshot(&mut self) -> InferenceSnapshotRef<'_>

Obtains a serializable snapshot of the current inference state. This can be used to cache the state of the model and store it in a file.

Safety

This function provides raw access to the underlying memory owned by the ggml context. While the returned InferenceSnapshotRef is alive, no other methods should be called on this object.

pub fn from_snapshot(snapshot: InferenceSnapshot, model: &dyn Model) -> Result<Self, SnapshotError>

Creates an InferenceSession from a snapshot.
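
get_snapshot and from_snapshot round-trip session state through a serializable form. The format below is a hypothetical stand-in, not the crate's snapshot layout. ToySnapshot and its fields (n_past, tokens, last_logits) are illustrative assumptions, and it leaves out the key/value memory a real snapshot would presumably need. It shows the round-trip property a snapshot must satisfy: decoding what was encoded gives back the same state, and truncated input is rejected.

```rust
/// Hypothetical, heavily simplified session state (field names are illustrative).
#[derive(Debug, Clone, PartialEq)]
struct ToySnapshot {
    n_past: u64,
    tokens: Vec<u32>,
    last_logits: Vec<f32>,
}

/// Splits the next `N` bytes off the front of `bytes`, or returns `None` if too short.
fn take<const N: usize>(bytes: &mut &[u8]) -> Option<[u8; N]> {
    if bytes.len() < N {
        return None;
    }
    let (head, rest) = bytes.split_at(N);
    let mut array = [0u8; N];
    array.copy_from_slice(head);
    *bytes = rest;
    Some(array)
}

impl ToySnapshot {
    /// Little-endian layout: n_past, then each array as a u64 length plus its elements.
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        out.extend_from_slice(&self.n_past.to_le_bytes());
        out.extend_from_slice(&(self.tokens.len() as u64).to_le_bytes());
        for token in &self.tokens {
            out.extend_from_slice(&token.to_le_bytes());
        }
        out.extend_from_slice(&(self.last_logits.len() as u64).to_le_bytes());
        for logit in &self.last_logits {
            out.extend_from_slice(&logit.to_le_bytes());
        }
        out
    }

    /// Inverse of `to_bytes`; returns `None` if the buffer is truncated.
    fn from_bytes(mut bytes: &[u8]) -> Option<Self> {
        let n_past = u64::from_le_bytes(take(&mut bytes)?);
        let n_tokens = u64::from_le_bytes(take(&mut bytes)?) as usize;
        let tokens = (0..n_tokens)
            .map(|_| take(&mut bytes).map(u32::from_le_bytes))
            .collect::<Option<Vec<_>>>()?;
        let n_logits = u64::from_le_bytes(take(&mut bytes)?) as usize;
        let last_logits = (0..n_logits)
            .map(|_| take(&mut bytes).map(f32::from_le_bytes))
            .collect::<Option<Vec<_>>>()?;
        Some(ToySnapshot { n_past, tokens, last_logits })
    }
}
```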

impl InferenceSession

pub fn new(
    params: InferenceSessionParameters,
    n_ctx: usize,
    n_layer: usize,
    n_embd: usize,
    n_vocab: usize
) -> InferenceSession

Create a new InferenceSession.
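
The dimensions passed to new suggest how much memory a session allocates. This sketch assumes, as in llama.cpp-style implementations, that the session keeps:

- one key and one value vector of n_embd elements per layer per context position, and
- one f32 logit per vocabulary entry.

The helper names and the 2-byte element size in the usage below are assumptions. The actual element type is configured elsewhere and may differ.

```rust
/// Hypothetical estimate of the key/value cache size in bytes:
/// one key tensor plus one value tensor, each n_layer * n_ctx * n_embd elements.
fn kv_cache_bytes(n_ctx: usize, n_layer: usize, n_embd: usize, bytes_per_element: usize) -> usize {
    2 * n_layer * n_ctx * n_embd * bytes_per_element
}

/// Hypothetical size of the logits buffer: one f32 per vocabulary entry.
fn logits_bytes(n_vocab: usize) -> usize {
    n_vocab * std::mem::size_of::<f32>()
}
```

Consider dimensions roughly those of LLaMA-7B (n_ctx 2048, n_layer 32, n_embd 4096, n_vocab 32000) with 2-byte elements. The key/value cache comes to 2 × 32 × 2048 × 4096 × 2 = 2^30 bytes (1 GiB), while the logits need only 128,000 bytes. Context length therefore dominates a session's footprint.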

Trait Implementations

impl Clone for InferenceSession

fn clone(&self) -> Self

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.
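
Because InferenceSession implements Clone, one possible use is to feed a shared prefix once and then clone the session for each continuation. This is an assumption, not something these docs prescribe. A std-only sketch with a hypothetical ToySession:

```rust
/// Hypothetical session whose state is just the tokens fed so far.
#[derive(Clone, Debug)]
struct ToySession {
    tokens: Vec<String>,
}

impl ToySession {
    /// Placeholder for feeding text: splits on whitespace.
    fn feed(&mut self, text: &str) {
        self.tokens.extend(text.split_whitespace().map(str::to_owned));
    }
}

/// Feeds the shared prefix once, then clones the session per continuation.
fn fork_continuations(prefix: &str, continuations: &[&str]) -> Vec<ToySession> {
    let mut base = ToySession { tokens: Vec::new() };
    base.feed(prefix); // shared work happens once
    continuations
        .iter()
        .map(|c| {
            let mut fork = base.clone(); // independent copy of the state
            fork.feed(c);
            fork
        })
        .collect()
}
```

Whether cloning beats re-feeding the prefix depends on how large the session's state is, which this page does not state.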

impl Send for InferenceSession

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T where V: MultiLane<T>,

fn vzip(self) -> V