This crate provides a unified interface for loading and using Large Language Models (LLMs).
This is the base crate that implementors can use to implement their own LLMs.
As a user, you probably want to use the llm crate instead.
Re-exports
pub use ggml;
pub use model::Hyperparameters;
pub use model::KnownModel;
pub use model::Model;
pub use model::ModelParameters;
pub use model::OutputRequest;
pub use util::TokenUtf8Buffer;
Modules
- Large language model traits and types
- Utilities for interacting with LLMs and loading them.
Structs
- The parameters for text generation.
- Settings specific to InferenceSession::infer.
- An inference session represents the state of the text generation. This holds the full context window, as well as several additional parameters used during sampling.
- Configuration for an inference session.
- A serializable snapshot of the inference process. Can be restored by calling InferenceSession::from_snapshot.
- Statistics about the inference process.
- A GGML format loader for LLMs.
- A handle to an immutable memory mapped buffer.
- A list of tokens to bias during inference.
- The vocabulary used by a model.
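To make the idea of a token bias list concrete, here is a minimal, self-contained sketch. It does not use this crate's types: `TokenBiasSketch`, its methods, and the `u32` token identifier are all hypothetical names chosen for illustration, and the choice to have a bias replace the logit (rather than add to it) is an assumption, not something this page states.

```rust
/// Hypothetical token identifier; the crate's real type may differ.
type TokenId = u32;

/// Hypothetical bias list: (token, bias) pairs kept sorted by token id.
pub struct TokenBiasSketch {
    entries: Vec<(TokenId, f32)>,
}

impl TokenBiasSketch {
    /// Build the list, sorting by token id so lookups can use binary search.
    pub fn new(mut entries: Vec<(TokenId, f32)>) -> Self {
        entries.sort_by_key(|&(id, _)| id);
        Self { entries }
    }

    /// Return the bias for `id`, if one was configured.
    pub fn get(&self, id: TokenId) -> Option<f32> {
        self.entries
            .binary_search_by_key(&id, |&(tid, _)| tid)
            .ok()
            .map(|i| self.entries[i].1)
    }

    /// Assumption: a configured bias overrides the model's logit for that token.
    pub fn apply(&self, logits: &mut [f32]) {
        for (id, logit) in logits.iter_mut().enumerate() {
            if let Some(bias) = self.get(id as TokenId) {
                *logit = bias;
            }
        }
    }
}
```

Under these assumptions, a large negative bias effectively bans a token from being sampled, while a large positive bias makes it very likely to be chosen.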
Enums
- The format of the file containing the model.
- The type of a value in ggml.
- How the tensors are stored in GGML LLM models.
- Errors encountered during the inference process.
- Errors encountered during the loading process.
- Each variant represents a step within the process of loading the model. These can be used to report progress to the user.
- Allowed types for the model memory K/V tensors.
- Errors encountered during the quantization process.
- Progress of quantization.
- Errors encountered during the snapshot process.
Traits
- Used by models to fetch tensors from a loader.
Functions
- Load a GGML model from the path and configure it per the params. The status of the loading process will be reported through load_progress_callback.
- An implementation for load_progress_callback that outputs to stdout.
- Quantizes a model.
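The progress-callback pattern described for the load function can be sketched as follows. This is an illustration only: `LoadStep`, `load_with_progress`, `describe`, and `progress_to_stdout` are hypothetical names and do not reproduce this crate's actual signatures or enum variants. The "loader" here only simulates loading a given number of tensors.

```rust
/// Hypothetical loading steps; NOT the crate's actual progress enum.
#[derive(Debug, Clone, PartialEq)]
pub enum LoadStep {
    HeaderRead,
    TensorLoaded { current: usize, total: usize },
    Done { tensor_count: usize },
}

/// Hypothetical loader: simulates loading `tensor_count` tensors and
/// reports each step through the caller-supplied callback.
pub fn load_with_progress(tensor_count: usize, mut on_progress: impl FnMut(LoadStep)) {
    on_progress(LoadStep::HeaderRead);
    for i in 0..tensor_count {
        on_progress(LoadStep::TensorLoaded {
            current: i + 1,
            total: tensor_count,
        });
    }
    on_progress(LoadStep::Done { tensor_count });
}

/// Turn a step into a human-readable line.
pub fn describe(step: &LoadStep) -> String {
    match step {
        LoadStep::HeaderRead => "header read".to_string(),
        LoadStep::TensorLoaded { current, total } => {
            format!("loaded tensor {current}/{total}")
        }
        LoadStep::Done { tensor_count } => format!("done: {tensor_count} tensors"),
    }
}

/// A ready-made callback that prints every step to stdout.
pub fn progress_to_stdout(step: LoadStep) {
    println!("{}", describe(&step));
}
```

In this sketch a caller can pass `progress_to_stdout` directly, as in `load_with_progress(3, progress_to_stdout)`, or supply a closure that collects the steps, for example to drive a progress bar. Taking the callback as `impl FnMut` lets the closure mutate captured state.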
Type Definitions
- The identifier of a token in a vocabulary.