Bindings to the llama.cpp library.
As llama.cpp is a very fast moving target, this crate does not attempt to create a stable API with all the Rust idioms. Instead it provides safe wrappers around nearly direct bindings to llama.cpp. This makes it easier to keep up with changes in llama.cpp, but does mean that the API is not as nice as it could be.
§Examples
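Here is a minimal, hedged sketch of the typical flow (the crate repository also ships runnable examples such as simple): initialize the backend, load a model, create a context, then tokenize and decode a prompt. The model path and prompt are placeholders, and exact signatures may differ between crate versions.

```rust
use std::num::NonZeroU32;

use llama_cpp_2::context::params::LlamaContextParams;
use llama_cpp_2::llama_backend::LlamaBackend;
use llama_cpp_2::llama_batch::LlamaBatch;
use llama_cpp_2::model::params::LlamaModelParams;
use llama_cpp_2::model::{AddBos, LlamaModel};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize the llama.cpp backend once per process.
    let backend = LlamaBackend::init()?;

    // Load a GGUF model from disk ("model.gguf" is a placeholder path).
    let model =
        LlamaModel::load_from_file(&backend, "model.gguf", &LlamaModelParams::default())?;

    // Create an inference context with a 2048-token window.
    let ctx_params = LlamaContextParams::default().with_n_ctx(NonZeroU32::new(2048));
    let mut ctx = model.new_context(&backend, ctx_params)?;

    // Tokenize the prompt and queue it in a batch (capacity 512 tokens
    // here). Only the final token requests logits, since that is where
    // sampling starts.
    let tokens = model.str_to_token("Hello my name is", AddBos::Always)?;
    let last_index = tokens.len() as i32 - 1;
    let mut batch = LlamaBatch::new(512, 1);
    for (i, token) in tokens.into_iter().enumerate() {
        batch.add(token, i as i32, &[0], i as i32 == last_index)?;
    }
    ctx.decode(&mut batch)?;

    // Logits for the last position are now available for sampling
    // (see the generation-loop sketch in the module list below).
    Ok(())
}
```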
§Feature Flags
- cuda enables CUDA GPU support (a feature-gated sketch follows this list).
- sampler adds the context::sample::sampler struct for a more rusty way of sampling.
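With the cuda feature enabled, model parameters can request GPU offload. A hedged sketch: the with_n_gpu_layers builder appears in the crate's examples, but treat it as an assumption that may vary by version.

```rust
use llama_cpp_2::model::params::LlamaModelParams;

// Build model params that offload layers to the GPU only when the
// crate was compiled with the `cuda` feature. The value 1000 is an
// "offload everything" convention borrowed from the simple example.
fn model_params() -> LlamaModelParams {
    let params = LlamaModelParams::default();
    #[cfg(feature = "cuda")]
    let params = params.with_n_gpu_layers(1000);
    params
}
```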
Modules§
- context
- Safe wrapper around llama_context.
- llama_backend
- Representation of an initialized llama backend.
- llama_batch
- Safe wrapper around llama_batch.
- model
- A safe wrapper around llama_model.
- sampling
- Safe wrapper around llama_sampler (a generation-loop sketch follows this list).
- timing
- Safe wrapper around llama_timings.
- token
- Safe wrappers around llama_token_data and llama_token_data_array.
- token_type
- Utilities for working with llama_token_type values.
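To make the roles of these modules concrete, here is a hedged sketch of a generation loop tying context, llama_batch, model, and sampling together. It assumes a model, context, and prompt batch prepared as in the Examples sketch above; names follow the crate's simple example and may differ between versions.

```rust
use llama_cpp_2::context::LlamaContext;
use llama_cpp_2::llama_batch::LlamaBatch;
use llama_cpp_2::model::{LlamaModel, Special};
use llama_cpp_2::sampling::LlamaSampler;

// Sample up to `max_tokens` continuation tokens. `batch` must already
// contain a decoded prompt whose final position requested logits.
fn generate(
    model: &LlamaModel,
    ctx: &mut LlamaContext,
    batch: &mut LlamaBatch,
    max_tokens: i32,
) -> Result<(), Box<dyn std::error::Error>> {
    // A sampler chain: temperature, then a seeded random draw.
    let mut sampler =
        LlamaSampler::chain_simple([LlamaSampler::temp(0.8), LlamaSampler::dist(1234)]);

    let mut pos = batch.n_tokens();
    for _ in 0..max_tokens {
        // Sample from the logits of the last decoded position.
        let token = sampler.sample(ctx, batch.n_tokens() - 1);
        sampler.accept(token);

        // Stop on an end-of-generation token.
        if model.is_eog_token(token) {
            break;
        }
        print!("{}", model.token_to_str(token, Special::Tokenize)?);

        // Feed the sampled token back in and decode the next step.
        batch.clear();
        batch.add(token, pos, &[0], true)?;
        ctx.decode(batch)?;
        pos += 1;
    }
    Ok(())
}
```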
Structs§
- LogOptions
- Options to configure how llama.cpp logs are intercepted.
Enums§
- ApplyChatTemplateError
- Failed to apply model chat template.
- ChatTemplateError
- There was an error while getting the chat template from a model.
- DecodeError
- Failed to decode a batch.
- EmbeddingsError
- When embedding-related functions fail.
- EncodeError
- Failed to encode a batch.
- LLamaCppError
- All errors that can occur in the llama-cpp crate (an error-handling sketch follows this list).
- LlamaContextLoadError
- Failed to load a context.
- LlamaLoraAdapterInitError
- An error that can occur when initializing a LoRA adapter.
- LlamaLoraAdapterRemoveError
- An error that can occur when removing a LoRA adapter.
- LlamaLoraAdapterSetError
- An error that can occur when setting a LoRA adapter.
- LlamaModelLoadError
- An error that can occur when loading a model.
- MetaValError
- Failed to fetch a metadata value.
- NewLlamaChatMessageError
- Failed to create a new LlamaChatMessage.
- StringToTokenError
- Failed to convert a string to a token sequence.
- TokenToStringError
- An error that can occur when converting a token to a string.
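Each failure mode has its own enum, and all of them implement std::error::Error, so they compose with ? or can be inspected directly. A hedged sketch (the path is a placeholder, and variant names are deliberately not matched since they change between versions):

```rust
use llama_cpp_2::llama_backend::LlamaBackend;
use llama_cpp_2::model::params::LlamaModelParams;
use llama_cpp_2::model::LlamaModel;
use llama_cpp_2::LlamaModelLoadError;

// Load a model, reporting a load failure specifically while letting
// other errors bubble up as Box<dyn Error>.
fn load(path: &str) -> Result<LlamaModel, Box<dyn std::error::Error>> {
    let backend = LlamaBackend::init()?;
    match LlamaModel::load_from_file(&backend, path, &LlamaModelParams::default()) {
        Ok(model) => Ok(model),
        Err(err) => {
            // `err` is a LlamaModelLoadError; implementing
            // std::error::Error is what makes `.into()` work below.
            let err: LlamaModelLoadError = err;
            eprintln!("failed to load {path}: {err}");
            Err(err.into())
        }
    }
}
```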
Functions§
- ggml_time_us
- Get the time in microseconds according to ggml.
- llama_supports_mlock
- Checks whether mlock is supported.
- llama_time_us
- Get the time (in microseconds) according to llama.cpp.
- max_devices
- Get the maximum number of devices according to llama.cpp (generally CUDA devices).
- mlock_supported
- Checks whether memory locking is supported according to llama.cpp.
- mmap_supported
- Checks whether memory mapping is supported according to llama.cpp.
- send_logs_to_tracing
- Redirect llama.cpp logs into tracing (a usage sketch follows this list).
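A hedged sketch exercising the free functions above together with LogOptions; it assumes LogOptions implements Default, and the exact options it carries vary by version.

```rust
use llama_cpp_2::{
    ggml_time_us, llama_time_us, max_devices, mlock_supported, mmap_supported,
    send_logs_to_tracing, LogOptions,
};

fn main() {
    // Route llama.cpp's native logs through the `tracing` ecosystem;
    // install a subscriber (e.g. tracing-subscriber) to see them.
    send_logs_to_tracing(LogOptions::default());

    // Query what this build of llama.cpp supports before configuring
    // models and contexts.
    println!("max devices:     {}", max_devices());
    println!("mmap supported:  {}", mmap_supported());
    println!("mlock supported: {}", mlock_supported());

    // Both clocks report microseconds; useful for coarse timing.
    println!("llama.cpp clock: {} us", llama_time_us());
    println!("ggml clock:      {} us", ggml_time_us());
}
```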
Type Aliases§
- Result
- A fallible result from a llama.cpp function.