Struct InferenceSession

pub struct InferenceSession { /* private fields */ }

An inference session represents the state of the text generation. This holds the full context window, as well as several additional parameters used during sampling.

§Safety

InferenceSession implements Send, so it can be moved to another thread. However, it does not implement Sync, so it cannot be used from multiple threads at the same time.

Consider spawning multiple inference sessions for the same model if you need to use it from multiple threads.
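
A minimal sketch of the one-session-per-thread pattern. It uses a hypothetical `FakeSession` stand-in rather than a real `InferenceSession` (which needs a loaded model); the `Cell` field makes the stand-in `Send` but not `Sync`, mirroring the guarantees described above. The names `FakeSession`, `feed`, and `run_workers` are assumptions made for illustration only.

```rust
use std::cell::Cell;
use std::thread;

/// Hypothetical stand-in for a session: `Cell` makes it `Send` but not `Sync`.
struct FakeSession {
    tokens_seen: Cell<usize>,
}

impl FakeSession {
    fn new() -> Self {
        FakeSession { tokens_seen: Cell::new(0) }
    }

    /// Pretend to process `n` tokens and return the running total.
    fn feed(&self, n: usize) -> usize {
        self.tokens_seen.set(self.tokens_seen.get() + n);
        self.tokens_seen.get()
    }
}

/// Spawn one worker per session; each session is moved into the thread that uses it.
fn run_workers(worker_count: usize, tokens_per_worker: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..worker_count)
        .map(|_| {
            let session = FakeSession::new(); // one session per thread
            thread::spawn(move || session.feed(tokens_per_worker))
        })
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker panicked"))
        .collect()
}
```

Moving the session into the closure compiles because the type is `Send`. Sharing a `&FakeSession` between two threads would be rejected by the compiler because the type is not `Sync`.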

Implementations§

impl InferenceSession

pub fn feed_prompt<E: Error + 'static>( &mut self, model: &dyn Model, params: &InferenceParameters, prompt: &str, output_request: &mut OutputRequest, callback: impl FnMut(&[u8]) -> Result<(), E>, ) -> Result<(), InferenceError>

Feed a prompt to the model for this session.

pub fn infer_next_token<'v>( &mut self, model: &'v dyn Model, params: &InferenceParameters, output_request: &mut OutputRequest, rng: &mut impl Rng, ) -> Result<&'v [u8], InferenceError>

Infer the next token for this session.

pub fn infer<E: Error + 'static>( &mut self, model: &dyn Model, rng: &mut impl Rng, request: &InferenceRequest<'_>, output_request: &mut OutputRequest, callback: impl FnMut(&str) -> Result<(), E>, ) -> Result<InferenceStats, InferenceError>

Generate text by using the provided Model to evaluate the prompt.

The callback is called with each new token until an end-of-text (EOT) token is encountered or the maximum number of tokens has been generated (specified by InferenceRequest::maximum_token_count).

This is a wrapper around Self::feed_prompt and Self::infer_next_token.
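
The wrapper relationship can be sketched with a toy session. This is an illustration under stated assumptions, not the crate's implementation: `ToySession`, `Stop`, and the scripted token list are hypothetical, the callback is infallible, and running out of script stands in for the EOT token.

```rust
/// Hypothetical toy session: "generates" by replaying a fixed script of tokens.
struct ToySession {
    context: Vec<String>,
    script: Vec<&'static str>,
    cursor: usize,
}

/// Why generation stopped (toy type, not part of the library).
#[derive(Debug, PartialEq)]
enum Stop {
    EndOfText,
    MaxTokens,
}

impl ToySession {
    fn new(script: Vec<&'static str>) -> Self {
        ToySession { context: Vec::new(), script, cursor: 0 }
    }

    /// Step 1: push the prompt into the context (whitespace-split for simplicity).
    fn feed_prompt(&mut self, prompt: &str) {
        self.context.extend(prompt.split_whitespace().map(String::from));
    }

    /// Step 2: produce one token, or `None` once the script is exhausted
    /// (the stand-in for an EOT token).
    fn infer_next_token(&mut self) -> Option<&'static str> {
        let token = self.script.get(self.cursor).copied()?;
        self.cursor += 1;
        self.context.push(token.to_string());
        Some(token)
    }

    /// The wrapper: feed the prompt, then pull tokens until EOT or the limit.
    fn infer(&mut self, prompt: &str, max_tokens: usize, mut callback: impl FnMut(&str)) -> Stop {
        self.feed_prompt(prompt);
        for _ in 0..max_tokens {
            match self.infer_next_token() {
                Some(token) => callback(token),
                None => return Stop::EndOfText,
            }
        }
        Stop::MaxTokens
    }
}
```

Calling the two steps directly instead of the wrapper gives the caller control over each token, at the cost of handling the stop conditions by hand.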

pub fn sample_top_p_top_k( &self, params: &InferenceParameters, rng: &mut impl Rng, ) -> TokenId

Sample a token using Top-P/Top-K sampling and the last logits from this session.
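
For readers unfamiliar with the technique, here is a generic sketch of Top-K followed by Top-P (nucleus) filtering over a slice of logits. It is not the crate's implementation: the function name and parameters are hypothetical, it omits temperature and repetition penalties, and the random draw `u` (uniform in [0, 1)) is passed in by the caller so that no RNG crate is required.

```rust
/// Sample a token index from `logits` using top-k, then top-p filtering.
/// `u` is a uniform random number in [0, 1) supplied by the caller.
fn sample_top_k_top_p(logits: &[f32], top_k: usize, top_p: f32, u: f32) -> usize {
    assert!(!logits.is_empty() && top_k > 0);

    // Pair each logit with its token index and sort by logit, highest first.
    let mut candidates: Vec<(usize, f32)> = logits.iter().copied().enumerate().collect();
    candidates.sort_by(|a, b| b.1.total_cmp(&a.1));

    // Top-k: keep only the k most likely tokens.
    candidates.truncate(top_k);

    // Unnormalized softmax weights (subtract the max logit for numerical stability).
    let max_logit = candidates[0].1;
    let weights: Vec<f32> = candidates.iter().map(|c| (c.1 - max_logit).exp()).collect();
    let total: f32 = weights.iter().sum();

    // Top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    let mut cumulative = 0.0;
    let mut keep = candidates.len();
    for (i, w) in weights.iter().enumerate() {
        cumulative += w / total;
        if cumulative >= top_p {
            keep = i + 1;
            break;
        }
    }

    // Draw from the kept tokens in proportion to their weights.
    let kept_total: f32 = weights[..keep].iter().sum();
    let mut threshold = u * kept_total;
    for i in 0..keep {
        if threshold < weights[i] {
            return candidates[i].0;
        }
        threshold -= weights[i];
    }
    candidates[keep - 1].0 // guard against floating-point rounding
}
```

With `top_k = 1` the function reduces to greedy decoding, and with `top_p = 1.0` only the Top-K filter is active.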

pub unsafe fn get_snapshot(&mut self) -> InferenceSnapshotRef<'_>

Obtains a serializable snapshot of the current inference status. This can be used to cache the state of the model and store it in a file.

§Safety

This function provides raw access to the underlying memory owned by the ggml context. While the provided InferenceSnapshotRef object is alive, no other methods for this model object should be called.

pub fn from_snapshot( snapshot: InferenceSnapshot, model: &dyn Model, ) -> Result<Self, SnapshotError>

Creates an InferenceSession from a snapshot.

impl InferenceSession

pub fn new( config: InferenceSessionConfig, n_ctx: usize, n_layer: usize, n_embd: usize, n_vocab: usize, ) -> InferenceSession

Creates a new InferenceSession.

Trait Implementations§

impl Clone for InferenceSession

fn clone(&self) -> Self

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Send for InferenceSession

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V