This crate provides a unified interface for loading and using Large Language Models (LLMs).
This is the base crate that implementors can use to implement their own LLMs.
As a user, you probably want to use the llm crate instead.
Re-exports§
pub use model::Hyperparameters;
pub use model::KnownModel;
pub use model::Model;
pub use model::ModelParameters;
pub use model::OutputRequest;
pub use util::TokenUtf8Buffer;
pub use ggml;
Modules§
Structs§
- InferenceParameters - The parameters for text generation.
- InferenceRequest - Settings specific to InferenceSession::infer.
- InferenceSession - An inference session represents the state of the text generation. This holds the full context window, as well as several additional parameters used during sampling.
- InferenceSessionConfig - Configuration for an inference session.
- InferenceSnapshot - A serializable snapshot of the inference process. Can be restored by calling InferenceSession::from_snapshot.
- InferenceStats - Statistics about the inference process.
- Loader - A GGML format loader for LLMs.
- Mmap - A handle to an immutable memory mapped buffer.
- TokenBias - A list of tokens to bias during the process of inferencing.
- Vocabulary - The vocabulary used by a model.
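To make the idea of a token bias list concrete, here is a minimal, self-contained sketch. It does not use this crate: `BiasList`, `apply_bias`, and the local `TokenId` alias are hypothetical stand-ins, and the sketch assumes that a bias replaces a token's logit outright (an additive bias would be an equally plausible design).

```rust
/// Hypothetical stand-in for a token identifier (not the crate's own alias).
type TokenId = u32;

/// Illustrative bias list: (token id, bias) pairs kept sorted by token id.
#[derive(Debug, Default, Clone, PartialEq)]
struct BiasList(Vec<(TokenId, f32)>);

impl BiasList {
    /// Builds the list, sorting by token id so lookups can binary search.
    /// If a token id appears more than once, the first entry given wins,
    /// because the sort is stable and `dedup_by_key` keeps the first of a run.
    fn new(mut pairs: Vec<(TokenId, f32)>) -> Self {
        pairs.sort_by_key(|pair| pair.0);
        pairs.dedup_by_key(|pair| pair.0);
        BiasList(pairs)
    }

    /// Returns the bias for `id`, if one was configured.
    fn get(&self, id: TokenId) -> Option<f32> {
        self.0
            .binary_search_by_key(&id, |pair| pair.0)
            .ok()
            .map(|idx| self.0[idx].1)
    }
}

/// Assumption: a bias overrides the logit of its token.
/// Token ids that fall outside `logits` are ignored.
fn apply_bias(logits: &mut [f32], bias: &BiasList) {
    for &(id, value) in &bias.0 {
        if let Some(logit) = logits.get_mut(id as usize) {
            *logit = value;
        }
    }
}
```

Keeping the pairs sorted makes each lookup logarithmic in the number of biased tokens, which matters when the lookup runs once per vocabulary entry during sampling.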
Enums§
- ContainerType - The format of the file containing the model.
- ElementType - The type of a value in ggml.
- FileType - How the tensors are stored in GGML LLM models.
- InferenceError - Errors encountered during the inference process.
- LoadError - Errors encountered during the loading process.
- LoadProgress - Each variant represents a step within the process of loading the model. These can be used to report progress to the user.
- ModelKVMemoryType - Allowed types for the model memory K/V tensors.
- QuantizeError - Errors encountered during the quantization process.
- QuantizeProgress - Progress of quantization.
- SnapshotError - Errors encountered during the snapshot process.
Traits§
- TensorLoader - Used by models to fetch tensors from a loader.
Functions§
- load - Load a GGML model from the path and configure it per the params. The status of the loading process will be reported through load_progress_callback.
- load_progress_callback_stdout - An implementation for load_progress_callback that outputs to stdout.
- quantize - Quantizes a model.
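The progress-callback pattern described for load can be illustrated with a small, self-contained sketch. Nothing here comes from this crate: the `Progress` enum, its variants, `load_parts`, and `describe` are all hypothetical and do not mirror LoadProgress or the real load signature. The sketch only shows how a loader can report each step through a caller-supplied closure.

```rust
/// Hypothetical progress events; the variants are illustrative only
/// and are not the crate's `LoadProgress` variants.
#[derive(Debug, Clone, PartialEq)]
enum Progress {
    Started,
    PartLoaded { current: usize, total: usize },
    Finished { parts: usize },
}

/// Pretends to load `total` parts, reporting every step through `on_progress`.
/// Returns the number of parts "loaded".
fn load_parts(total: usize, mut on_progress: impl FnMut(Progress)) -> usize {
    on_progress(Progress::Started);
    for current in 1..=total {
        // A real loader would read one tensor or file part here.
        on_progress(Progress::PartLoaded { current, total });
    }
    on_progress(Progress::Finished { parts: total });
    total
}

/// Formats an event as one line of text, as a stdout reporter might.
fn describe(event: &Progress) -> String {
    match event {
        Progress::Started => "loading started".to_string(),
        Progress::PartLoaded { current, total } => {
            format!("loaded part {current}/{total}")
        }
        Progress::Finished { parts } => format!("finished: {parts} parts"),
    }
}
```

A caller that wants console output could pass `|event| println!("{}", describe(&event))`, while a GUI or test could collect the events into a `Vec` instead. Accepting `impl FnMut` lets the callback mutate captured state, such as a progress bar.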
Type Aliases§
- TokenId - The identifier of a token in a vocabulary.