llm_client/lib.rs
//! # llm_client: The Easiest Rust Interface for Local LLMs
//! [docs.rs](https://docs.rs/llm_client)
//!
//! The llm_client crate is a workspace member of the [llm_client](https://github.com/ShelbyJenkins/llm_client) project.
//!
//! Add to your Cargo.toml:
//! ```toml
//! # For Mac (CPU and GPU), Windows (CPU and CUDA), or Linux (CPU and CUDA)
//! llm_client = "*"
//! ```
//!
//! This will download and build [llama.cpp](https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md).
//! See [build.md](../docs/build.md) for other features and backends like mistral.rs.
//!
//! ```rust
//! use llm_client::*;
//! let llm_client = LlmClient::llama_cpp()
//!     .mistral7b_instruct_v0_3() // Uses a preset model
//!     .init() // Downloads the model from Hugging Face and starts the inference interface
//!     .await?;
//! ```
//!
//! Several of the most common models are available as presets. Loading models from local files is also fully supported.
//! See [models.md](./docs/models.md) for more information.
//!
//! # An Interface for Deterministic Signals from Probabilistic LLM Vibes
//!
//! ## Reasoning with Primitive Outcomes
//!
//! A constraint-enforced CoT process for reasoning. First, we get the LLM to 'justify' an answer in plain English.
//! This allows the LLM to 'think' by outputting the stream of tokens required to come to an answer. Then we take
//! that 'justification' and prompt the LLM to parse it for the answer.
//! See [the workflow for implementation details](./src/workflows/reason/one_round.rs).
//!
//! - Currently supports returning booleans, u32s, and strings from a list of options
//! - Can be `None` when run with `return_optional_primitive()`
//!
//! ```rust
//! // boolean outcome
//! let mut reason_request = llm_client.reason().boolean();
//! reason_request
//!     .instructions()
//!     .set_content("Does this email subject indicate that the email is spam?");
//! reason_request
//!     .supporting_material()
//!     .set_content("You'll never believe these low, low prices 💲💲💲!!!");
//! let res: bool = reason_request.return_primitive().await.unwrap();
//! assert_eq!(res, true);
//!
//! // u32 outcome
//! let mut reason_request = llm_client.reason().integer();
//! reason_request.primitive.lower_bound(0).upper_bound(10000);
//! reason_request
//!     .instructions()
//!     .set_content("How many times is the word 'llm' mentioned in these comments?");
//! reason_request
//!     .supporting_material()
//!     .set_content(hacker_news_comment_section);
//! // Can be None
//! let response: Option<u32> = reason_request.return_optional_primitive().await.unwrap();
//! assert!(response > Some(9000));
//!
//! // string from a list of options outcome
//! let mut reason_request = llm_client.reason().exact_string();
//! reason_request
//!     .instructions()
//!     .set_content("Based on this readme, what is the name of the creator of this project?");
//! reason_request
//!     .supporting_material()
//!     .set_content(llm_client_readme);
//! reason_request
//!     .primitive
//!     .add_strings_to_allowed(&["shelby", "jack", "camacho", "john"]);
//! let response: String = reason_request.return_primitive().await.unwrap();
//! assert_eq!(response, "shelby");
//! ```
//!
//! See [the reason example for more](./examples/reason.rs)
//!
//! ## Decisions with N Votes Across a Temperature Gradient
//!
//! Runs the same process as above N times, where N is the number of votes required to reach a consensus.
//! We dynamically alter the temperature across votes to ensure an accurate consensus.
//! See [the workflow for implementation details](./src/workflows/reason/decision.rs).
//!
//! - Supports primitives that implement the reasoning trait
//! - The consensus vote count can be set with `best_of_n_votes()`
//! - By default `dynamic_temperature` is enabled, and each 'vote' runs at a temperature further along a gradient
//!
//! ```rust
//! // An integer decision request
//! let mut decision_request = llm_client.reason().integer().decision();
//! decision_request.best_of_n_votes(5);
//! decision_request
//!     .instructions()
//!     .set_content("How many fingers do you have?");
//! let response = decision_request.return_primitive().await.unwrap();
//! assert_eq!(response, 5);
//! ```
//!
//! See [the decision example for more](./examples/decision.rs)
//!
//! ## Structured Outputs and NLP
//!
//! - Data extraction, summarization, and semantic splitting of text
//! - The currently implemented NLP workflow is URL extraction
//!
//! See [the extract_urls example](./examples/extract_urls.rs)
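//!
//! As a rough sketch of the call shape, URL extraction hangs off the `nlp()` entry point. The builder method
//! names below (`extract_urls()`, `set_content()`, `run()`) are assumptions for illustration only, not the
//! confirmed API; see the linked example for the real calls.
//!
//! ```rust
//! // Hypothetical sketch of the URL extraction workflow; method names are assumptions.
//! let extracted_urls = llm_client
//!     .nlp()
//!     .extract_urls() // hypothetical constructor for the URL extraction workflow
//!     .set_content(scraped_page_text) // hypothetical: the text to pull URLs from
//!     .run()
//!     .await?;
//! ```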
//!
//! ## Basic Primitives
//!
//! A generation where the output is constrained to one of the defined primitive types.
//! See [the currently implemented primitive types](./src/primitives/mod.rs).
//! These are used within other workflows, though only some serve as the output type for workflows like reason and decision.
//!
//! - These are fairly easy to add, so feel free to open an issue if you'd like one added
//!
//! See [the basic_primitive example](./examples/basic_primitive.rs)
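//!
//! As a sketch of the call shape, assuming (without confirmation here) that the basic primitive builder mirrors
//! the reason API with per-primitive constructors and the same `instructions()` / `return_primitive()` methods:
//!
//! ```rust
//! // Sketch only: `boolean()`, `instructions()`, and `return_primitive()` are assumed to mirror the reason API.
//! let mut basic_primitive = llm_client.basic_primitive().boolean();
//! basic_primitive
//!     .instructions()
//!     .set_content("Is this sentence written in English?");
//! let res: bool = basic_primitive.return_primitive().await.unwrap();
//! assert!(res);
//! ```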
//!
//! ## API LLMs
//!
//! - Basic support for API-based LLMs. Currently supported: Anthropic, OpenAI, and Perplexity
//! - Perplexity does not *currently* return documents, but it does create its responses from live data
//!
//! ```rust
//! let llm_client = LlmClient::perplexity().sonar_large().init();
//! let mut basic_completion = llm_client.basic_completion();
//! basic_completion
//!     .prompt()
//!     .add_user_message()
//!     .set_content("Can you help me use the llm_client rust crate? I'm having trouble getting cuda to work.");
//! let response = basic_completion.run().await?;
//! ```
//!
//! See [the basic_completion example](./examples/basic_completion.rs)
//!
//! ## Configuring Requests
//!
//! - All requests and workflows implement the `RequestConfigTrait`, which gives access to the parameters sent to the LLM
//! - These settings are normalized across both local and API requests
//!
//! ```rust
//! let llm_client = LlmClient::llama_cpp()
//!     .available_vram(48)
//!     .mistral7b_instruct_v0_3()
//!     .init()
//!     .await?;
//!
//! let mut basic_completion = llm_client.basic_completion();
//!
//! basic_completion
//!     .temperature(1.5)
//!     .frequency_penalty(0.9)
//!     .max_tokens(200);
//! ```
//!
//! See [all the settings here](../llm_interface/src/requests/req_components.rs)

// Public modules
pub mod backend_builders;
pub mod basic_completion;
pub mod components;
pub mod primitives;
pub mod workflows;

// Internal imports
#[allow(unused_imports)]
use anyhow::{anyhow, bail, Error, Result};
#[allow(unused_imports)]
use tracing::{debug, error, info, span, trace, warn, Level};

// Public exports
pub use components::InstructPromptTrait;
pub use llm_devices::*;
pub use llm_interface;
pub use llm_interface::llms::local::LlmLocalTrait;
pub use llm_interface::requests::*;
pub use llm_models::api_model::{
    anthropic::AnthropicModelTrait, openai::OpenAiModelTrait, perplexity::PerplexityModelTrait,
};
pub use llm_models::local_model::gguf::preset::GgufPresetTrait;
pub use llm_prompt::LlmPrompt;
pub use llm_prompt::*;
pub use primitives::PrimitiveTrait;
pub use workflows::reason::{decision::DecisionTrait, ReasonTrait};

pub struct LlmClient {
    pub backend: std::sync::Arc<llm_interface::llms::LlmBackend>,
}

impl LlmClient {
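    /// Creates an `LlmClient` from an already initialized backend and prints a ready message.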
    pub fn new(backend: std::sync::Arc<llm_interface::llms::LlmBackend>) -> Self {
        println!(
            "{}",
            colorful::Colorful::bold(colorful::Colorful::color(
                "Llm Client Ready",
                colorful::RGB::new(94, 244, 39)
            ))
        );
        Self { backend }
    }
    #[cfg(feature = "llama_cpp_backend")]
    /// Creates a new instance of the [`LlamaCppBackendBuilder`]. The builder allows you to specify the model and other parameters, and it is converted to an `LlmClient` instance using the `init` method.
    pub fn llama_cpp() -> backend_builders::llama_cpp::LlamaCppBackendBuilder {
        backend_builders::llama_cpp::LlamaCppBackendBuilder::default()
    }

    #[cfg(feature = "mistral_rs_backend")]
    /// Creates a new instance of the [`MistralRsBackendBuilder`]. The builder allows you to specify the model and other parameters, and it is converted to an `LlmClient` instance using the `init` method.
    pub fn mistral_rs() -> backend_builders::mistral_rs::MistralRsBackendBuilder {
        backend_builders::mistral_rs::MistralRsBackendBuilder::default()
    }

    /// Creates a new instance of the [`OpenAiBackendBuilder`]. The builder allows you to specify the model and other parameters, and it is converted to an `LlmClient` instance using the `init` method.
    pub fn openai() -> backend_builders::openai::OpenAiBackendBuilder {
        backend_builders::openai::OpenAiBackendBuilder::default()
    }

    /// Creates a new instance of the [`AnthropicBackendBuilder`]. The builder allows you to specify the model and other parameters, and it is converted to an `LlmClient` instance using the `init` method.
    pub fn anthropic() -> backend_builders::anthropic::AnthropicBackendBuilder {
        backend_builders::anthropic::AnthropicBackendBuilder::default()
    }

    /// Creates a new instance of the [`PerplexityBackendBuilder`]. The builder allows you to specify the model and other parameters, and it is converted to an `LlmClient` instance using the `init` method.
    pub fn perplexity() -> backend_builders::perplexity::PerplexityBackendBuilder {
        backend_builders::perplexity::PerplexityBackendBuilder::default()
    }

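    /// Creates a `BasicCompletion` workflow for plain text completion requests using this client's backend.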
    pub fn basic_completion(&self) -> basic_completion::BasicCompletion {
        basic_completion::BasicCompletion::new(self.backend.clone())
    }

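    /// Creates a `BasicPrimitiveWorkflowBuilder` for generations constrained to one of the defined primitive types.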
    pub fn basic_primitive(&self) -> workflows::basic_primitive::BasicPrimitiveWorkflowBuilder {
        workflows::basic_primitive::BasicPrimitiveWorkflowBuilder::new(self.backend.clone())
    }

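    /// Creates a `ReasonWorkflowBuilder` for the reasoning and decision workflows.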
    pub fn reason(&self) -> workflows::reason::ReasonWorkflowBuilder {
        workflows::reason::ReasonWorkflowBuilder::new(self.backend.clone())
    }

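    /// Creates an `Nlp` workflow entry point for NLP tasks such as URL extraction.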
    pub fn nlp(&self) -> workflows::nlp::Nlp {
        workflows::nlp::Nlp::new(self.backend.clone())
    }

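    /// Shuts down the underlying backend.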
    pub fn shutdown(&self) {
        self.backend.shutdown();
    }

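    /// Creates a bare `CompletionRequest` for manually configured requests.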
    pub fn base_request(&self) -> llm_interface::requests::CompletionRequest {
        llm_interface::requests::CompletionRequest::new(self.backend.clone())
    }
}