pub struct InferenceSession { /* private fields */ }
An inference session represents the state of text generation. It holds the full context window, as well as several additional parameters used during sampling.
§Safety
This struct implements Send, as it can be sent to another thread. However, it does not implement Sync: it cannot be used from multiple threads at the same time.
Consider spawning multiple inference sessions for the same model if you need to use it from multiple threads.
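The threading contract above can be illustrated with a small sketch. The `Session` type here is a hypothetical stand-in, not the crate's real session: a `Cell` field makes it Send but not Sync, and each thread owns its own instance, mirroring the one-session-per-thread pattern recommended above.

```rust
use std::cell::Cell;
use std::thread;

// Hypothetical stand-in for a session type that, like InferenceSession,
// is Send (can be moved to another thread) but not Sync (cannot be
// shared between threads): interior mutability via Cell removes Sync.
struct Session {
    tokens_generated: Cell<usize>,
}

impl Session {
    fn new() -> Self {
        Session { tokens_generated: Cell::new(0) }
    }
    fn step(&self) -> usize {
        let n = self.tokens_generated.get() + 1;
        self.tokens_generated.set(n);
        n
    }
}

// One session per thread: each thread owns its session outright, so the
// session value is moved (Send) rather than shared (which would need Sync).
fn run_parallel(threads: usize, steps: usize) -> usize {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                let session = Session::new(); // moved into the thread: Send
                (0..steps).map(|_| session.step()).last().unwrap_or(0)
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Attempting to share one `Session` behind an `Arc` across threads would fail to compile, which is how the missing Sync bound is enforced at compile time.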
Implementations§
impl InferenceSession
pub fn feed_prompt<E: Error + 'static>(
    &mut self,
    model: &dyn Model,
    params: &InferenceParameters,
    prompt: &str,
    output_request: &mut OutputRequest,
    callback: impl FnMut(&[u8]) -> Result<(), E>,
) -> Result<(), InferenceError>
Feed a prompt to the model for this session.
pub fn infer_next_token<'v>(
    &mut self,
    model: &'v dyn Model,
    params: &InferenceParameters,
    output_request: &mut OutputRequest,
    rng: &mut impl Rng,
) -> Result<&'v [u8], InferenceError>
Infer the next token for this session.
pub fn infer<E: Error + 'static>(
    &mut self,
    model: &dyn Model,
    rng: &mut impl Rng,
    request: &InferenceRequest<'_>,
    output_request: &mut OutputRequest,
    callback: impl FnMut(&str) -> Result<(), E>,
) -> Result<InferenceStats, InferenceError>
Generate text by using the provided Model to evaluate the prompt.

The callback is called with each new token until an end-of-text (EOT) token is encountered or the maximum number of tokens has been generated (as specified by InferenceRequest::maximum_token_count).

This is a wrapper around Self::feed_prompt and Self::infer_next_token.
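The wrapper pattern can be sketched as follows. The `Model` trait and `StubModel` below are hypothetical stubs, not the crate's real types; the sketch only shows the control flow: ingest the prompt once, then step token by token until EOT or the maximum count.

```rust
// Sketch of what infer() does: feed the prompt, then loop the next-token
// step until an end-of-text token or a maximum count is reached.
const EOT: u32 = 0;

// Hypothetical stub trait; the crate's real Model trait differs.
trait Model {
    fn next_token(&self, position: usize) -> u32;
}

struct StubModel {
    tokens: Vec<u32>, // canned output; positions past the end yield EOT
}

impl Model for StubModel {
    fn next_token(&self, position: usize) -> u32 {
        *self.tokens.get(position).unwrap_or(&EOT)
    }
}

fn infer(
    model: &dyn Model,
    prompt_len: usize,          // stands in for feed_prompt()
    maximum_token_count: usize, // cf. InferenceRequest::maximum_token_count
    mut callback: impl FnMut(u32),
) -> usize {
    let _context = prompt_len; // a real session would ingest the prompt here
    let mut generated = 0;
    while generated < maximum_token_count {
        let token = model.next_token(generated); // cf. infer_next_token()
        if token == EOT {
            break; // end-of-text: stop generating
        }
        callback(token); // caller sees each new token as it is produced
        generated += 1;
    }
    generated
}
```

In the real API the callback can also return an error to abort generation early, which the sketch omits for brevity.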
pub fn sample_top_p_top_k(
    &self,
    params: &InferenceParameters,
    rng: &mut impl Rng,
) -> TokenId
Sample a token using Top-P/Top-K sampling and the last logits from this session.
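For readers unfamiliar with the sampling scheme, here is an illustrative top-k/top-p (nucleus) sampler over a logits slice, written against std only. The function signature and parameter handling are assumptions for illustration; the crate's implementation and its InferenceParameters fields may differ in detail, and it draws randomness from the `rng` argument rather than a supplied uniform value.

```rust
// Illustrative top-k/top-p sampling: restrict to the k most likely tokens,
// softmax them, keep the smallest prefix with cumulative mass >= top_p,
// then sample from that renormalized "nucleus".
fn sample_top_p_top_k(logits: &[f32], top_k: usize, top_p: f32, rand01: f32) -> usize {
    // Rank token ids by logit, keep the top_k best.
    let mut ranked: Vec<(usize, f32)> =
        logits.iter().copied().enumerate().collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    ranked.truncate(top_k.max(1));

    // Softmax over the survivors (shift by the max logit for stability).
    let max_logit = ranked[0].1;
    let mut probs: Vec<(usize, f32)> = ranked
        .iter()
        .map(|&(id, l)| (id, (l - max_logit).exp()))
        .collect();
    let total: f32 = probs.iter().map(|&(_, p)| p).sum();
    for p in probs.iter_mut() {
        p.1 /= total;
    }

    // Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    let mut cumulative = 0.0;
    let mut cutoff = probs.len();
    for (i, &(_, p)) in probs.iter().enumerate() {
        cumulative += p;
        if cumulative >= top_p {
            cutoff = i + 1;
            break;
        }
    }
    probs.truncate(cutoff);

    // Sample from the nucleus using a uniform draw in [0, 1); the real
    // method would take this from its rng parameter.
    let mass: f32 = probs.iter().map(|&(_, p)| p).sum();
    let mut target = rand01 * mass;
    for &(id, p) in &probs {
        if target < p {
            return id;
        }
        target -= p;
    }
    probs.last().unwrap().0
}
```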
pub unsafe fn get_snapshot(&mut self) -> InferenceSnapshotRef<'_>
Obtains a serializable snapshot of the current inference state. This can be used to cache the state of the model and store it in a file.
§Safety
This function provides raw access to the underlying memory owned by the ggml context. While the provided InferenceSnapshotRef object is alive, no other methods for this model object should be called.
pub fn from_snapshot(
    snapshot: InferenceSnapshot,
    model: &dyn Model,
) -> Result<Self, SnapshotError>
Creates an InferenceSession from a snapshot.
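Conceptually, get_snapshot and from_snapshot form a round trip: state out to bytes, bytes back to state. The sketch below illustrates that round trip with a plain Vec<f32> standing in for the real context-window memory; the type and its methods are hypothetical, not the crate's snapshot API.

```rust
// Conceptual snapshot round trip: capture state as bytes (cf. get_snapshot),
// persist them somewhere, and later rebuild the state (cf. from_snapshot).
// SessionState and its fields are a hypothetical stand-in.
struct SessionState {
    memory: Vec<f32>,
}

impl SessionState {
    // Capture the state as a byte buffer suitable for writing to a file.
    fn to_snapshot(&self) -> Vec<u8> {
        self.memory.iter().flat_map(|f| f.to_le_bytes()).collect()
    }

    // Rebuild a session state from a previously captured snapshot.
    fn from_snapshot(bytes: &[u8]) -> SessionState {
        let memory = bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect();
        SessionState { memory }
    }
}
```

The real from_snapshot additionally needs the Model, since the restored state is only meaningful for the model that produced it.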
impl InferenceSession
pub fn new(
    config: InferenceSessionConfig,
    n_ctx: usize,
    n_layer: usize,
    n_embd: usize,
    n_vocab: usize,
) -> InferenceSession
Create a new InferenceSession for a model with the given context length (n_ctx), layer count (n_layer), embedding size (n_embd), and vocabulary size (n_vocab).