§llm_client: The Easiest Rust Interface for Local LLMs
The llm_client crate is a workspace member of the llm_client project.
Add to your Cargo.toml:
# For Mac (CPU and GPU), Windows (CPU and CUDA), or Linux (CPU and CUDA)
llm_client = "*"
This will download and build llama.cpp. See build.md for other features and backends like mistral.rs.
use llm_client::prelude::*;
let llm_client = LlmClient::llama_cpp()
.mistral7b_instruct_v0_3() // Uses a preset model
.init() // Downloads the model from Hugging Face and starts the inference interface
.await?;
Several of the most common models are available as presets. Loading from local models is also fully supported. See models.md for more information.
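For example, a local GGUF file could be used in place of a preset. This is a minimal sketch that assumes a local_quant_file_path setter on the llama.cpp builder; see models.md for the exact loader methods:
// Load a model from a local GGUF file (path and setter name are illustrative)
let llm_client = LlmClient::llama_cpp()
    .local_quant_file_path("/models/mistral-7b-instruct-v0.3.Q4_K_M.gguf")
    .init() // Starts the inference interface without downloading anything
    .await?;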
§An Interface for Deterministic Signals from Probabilistic LLM Vibes
§Reasoning with Primitive Outcomes
A constraint-enforced CoT process for reasoning. First, we get the LLM to ‘justify’ an answer in plain English. This allows the LLM to ‘think’ by outputting the stream of tokens required to come to an answer. Then we take that ‘justification’ and prompt the LLM to parse it for the answer. See the workflow for implementation details.
- Currently supports returning booleans, u32s, and strings from a list of options
- Can be ‘None’ when run with return_optional_primitive()
// boolean outcome
let mut reason_request = llm_client.reason().boolean();
reason_request
.instructions()
.set_content("Does this email subject indicate that the email is spam?");
reason_request
.supporting_material()
.set_content("You'll never believe these low, low prices 💲💲💲!!!");
let res: bool = reason_request.return_primitive().await.unwrap();
assert_eq!(res, true);
// u32 outcome
let mut reason_request = llm_client.reason().integer();
reason_request.primitive.lower_bound(0).upper_bound(10000);
reason_request
.instructions()
.set_content("How many times is the word 'llm' mentioned in these comments?");
reason_request
.supporting_material()
.set_content(hacker_news_comment_section);
// Can be None
let response: Option<u32> = reason_request.return_optional_primitive().await.unwrap();
assert!(response > Some(9000));
// string from a list of options outcome
let mut reason_request = llm_client.reason().exact_string();
reason_request
.instructions()
.set_content("Based on this readme, what is the name of the creator of this project?");
reason_request
.supporting_material()
.set_content(llm_client_readme);
reason_request
.primitive
.add_strings_to_allowed(&["shelby", "jack", "camacho", "john"]);
let response: String = reason_request.return_primitive().await.unwrap();
assert_eq!(response, "shelby");
See the reason example for more
§Decisions with N number of Votes Across a Temperature Gradient
Runs the same process as above N times, where N is the number of votes required to reach a consensus. We dynamically alter the temperature across votes to ensure an accurate consensus. See the workflow for implementation details.
- Supports primitives that implement the reasoning trait
- The consensus vote count can be set with best_of_n_votes()
- By default dynamic_temperture is enabled, and the temperature of each ‘vote’ increases across a gradient
// An integer decision request
let mut decision_request = llm_client.reason().integer().decision();
decision_request.best_of_n_votes(5);
decision_request
.instructions()
.set_content("How many fingers do you have?");
let response = decision_request.return_primitive().await.unwrap();
assert_eq!(response, 5);
See the decision example for more
§Structured Outputs and NLP
- Data extraction, summarization, and semantic splitting on text
- The only NLP workflow currently implemented is URL extraction
§Basic Primitives
A generation where the output is constrained to one of the defined primitive types. See the currently implemented primitive types. These are used within other workflows, but only some are available as outputs for specific workflows like reason and decision.
- These are fairly easy to add, so feel free to open an issue if you’d like one added
See the basic_primitive example
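As a rough sketch, a boolean request might look like the following, assuming a basic_primitive() entry point whose builder mirrors the reason() API shown above; the linked basic_primitive example has the exact methods:
// Constrain the output to a boolean (entry point and builder methods assumed above)
let mut basic_primitive = llm_client.basic_primitive().boolean();
basic_primitive
    .instructions()
    .set_content("Does this sentence contain a date? 'The meeting is on Tuesday.'");
let response: bool = basic_primitive.return_primitive().await?;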
§API LLMs
- Basic support for API-based LLMs. Currently: Anthropic, OpenAI, and Perplexity
- Perplexity does not currently return documents, but it does create its responses from live data
let llm_client = LlmClient::perplexity().sonar_large().init();
let mut basic_completion = llm_client.basic_completion();
basic_completion
.prompt()
.add_user_message()
.set_content("Can you help me use the llm_client rust crate? I'm having trouble getting cuda to work.");
let response = basic_completion.run().await?;
See the basic_completion example
§Configuring Requests
- All requests and workflows implement the RequestConfigTrait, which gives access to the parameters sent to the LLM
- These settings are normalized across both local and API requests
let llm_client = LlmClient::llama_cpp()
.available_vram(48)
.mistral7b_instruct_v0_3()
.init()
.await?;
let mut basic_completion = llm_client.basic_completion();
basic_completion
.temperature(1.5)
.frequency_penalty(0.9)
.max_tokens(200);
Re-exports§
pub use components::InstructPromptTrait;
pub use primitives::PrimitiveTrait;
pub use workflows::reason::decision::DecisionTrait;
pub use workflows::reason::ReasonTrait;
pub use llm_interface;
Modules§
Structs§
- ApiPrompt: A prompt formatter for API-based language models that follow OpenAI’s message format.
- CompletionRequest
- CompletionResponse
- CpuConfig: Configuration for managing CPU resources in LLM inference workloads.
- CudaConfig: Configuration for NVIDIA CUDA devices on Linux and Windows platforms.
- DeviceConfig: Configuration for hardware devices used in LLM inference.
- GenerationSettings: The settings used to generate the completion.
- InferenceProbabilities: The log probability of the completion.
- LlamaCppLogitBias
- LlmClient
- LlmPrompt: A prompt management system that supports both API-based LLMs (like OpenAI) and local LLMs.
- LocalPrompt: A prompt formatter for local LLMs that use chat templates.
- LoggingConfig: Configuration for the logging system.
- LogitBias
- MaxTokenState
- OpenAiLogitBias
- PromptMessage: An individual message within a prompt sequence.
- PromptMessages: A collection of prompt messages with thread-safe mutability.
- RequestConfig
- StopSequences
- TimingUsage: Timing statistics for the completion request.
- TokenUsage: Token statistics for the completion request.
- TopProbabilities
Enums§
- CompletionError
- CompletionFinishReason
- PromptMessageType: Represents the type of message in a prompt sequence.
- RequestTokenLimitError
- StoppingSequence
- TextConcatenator: Controls how text segments are joined together in prompt messages.
Traits§
- AnthropicModelTrait
- GgufPresetTrait
- LlmLocalTrait
- LoggingConfigTrait: Trait for configuring logging behavior.
- LogitBiasTrait
- OpenAiModelTrait
- PerplexityModelTrait
- PromptTokenizer: A trait for tokenizers that can be used with the prompt management system.
- RequestConfigTrait
- TextConcatenatorTrait: Provides methods for managing text concatenation behavior.
Functions§
- apply_chat_template: Applies a chat template to a message, given a message and a chat template.
- build_repo: Clones and builds a repository at a specified tag with appropriate platform-specific optimizations.
- check_and_get_max_tokens: Sets and validates the ‘max_tokens’, ‘n_ctx’, or ‘n_predict’ parameter for a request. First, it checks that the total_prompt_tokens is less than the ctx_size - safety_tokens. Then returns ‘available_tokens’ as the lower of either ctx_size - total_prompt_tokens - safety_tokens or, if provided, inference_ctx_size. If ‘requested_tokens’ is provided, ‘requested_tokens’ is returned if less than ‘available_tokens’. If ‘requested_tokens’ is ‘None’ or greater than ‘available_tokens’, ‘available_tokens’ is returned. (A sketch of this logic follows this list.)
- get_target_directory: Resolves the Cargo target directory path.
- i_ln: Writes an indented line without a newline.
- i_lns: Writes multiple indented lines without newlines.
- i_nln: Writes an indented line with a newline.
- i_nlns: Writes multiple indented lines with newlines.
- init_nvml_wrapper: Initializes the NVIDIA Management Library (NVML).
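A standalone sketch of the check_and_get_max_tokens logic described above; the function name, parameters, and error type here are illustrative rather than the crate’s actual signature:
// Illustrative re-implementation of the token-budget rules described for check_and_get_max_tokens
fn available_token_budget(
    ctx_size: u64,
    total_prompt_tokens: u64,
    safety_tokens: u64,
    inference_ctx_size: Option<u64>,
    requested_tokens: Option<u64>,
) -> Result<u64, String> {
    // The prompt must fit inside the context window with a safety margin
    if total_prompt_tokens >= ctx_size.saturating_sub(safety_tokens) {
        return Err("total_prompt_tokens exceeds ctx_size - safety_tokens".to_string());
    }
    // available_tokens is the lower of the remaining context or, if provided, inference_ctx_size
    let mut available_tokens = ctx_size - total_prompt_tokens - safety_tokens;
    if let Some(inference_ctx_size) = inference_ctx_size {
        available_tokens = available_tokens.min(inference_ctx_size);
    }
    // Honor requested_tokens only if it fits within available_tokens
    match requested_tokens {
        Some(requested) if requested < available_tokens => Ok(requested),
        _ => Ok(available_tokens),
    }
}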