llm_api_access 0.1.33

A package to query popular LLMs
llm_api_access

The llm_api_access crate provides a unified way to interact with different large language models (LLMs) like OpenAI, Gemini, Anthropic, and local Llama servers.

Current Status

This crate powers an open-source coding assistant currently in active development. Gemini has been the main test target; OpenAI (including embeddings), Anthropic, and Llama Server are also supported. Recent updates add unified support for "thinking" or "reasoning" blocks from models like OpenAI's o1/o3, Anthropic's Claude 3.7, and Google's Gemini 2.0 Flash Thinking. Development is self-driven, so updates may be few and far between; open an issue on GitHub if you want something specific.

Unified Response Structure

To support models that output both a thought process and a final answer, responses from the text generation methods are returned as an LlmResponse:

pub struct LlmResponse {
    pub text: String,
    pub reasoning: Option<String>,
}
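To keep the snippet self-contained, the struct is redeclared locally below; the helper shows one way to handle both cases — `text` always holds the final answer, while `reasoning` is only populated for thinking models:

```rust
// Local mirror of llm_api_access's LlmResponse, redeclared here so
// this snippet compiles on its own.
pub struct LlmResponse {
    pub text: String,
    pub reasoning: Option<String>,
}

// Render a response, prepending the thought process when present.
fn render(response: &LlmResponse) -> String {
    match &response.reasoning {
        Some(r) => format!("Thought Process:\n{}\n\nFinal Answer:\n{}", r, response.text),
        None => response.text.clone(),
    }
}
```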

LLM Enum

This enum represents the supported LLM providers:

  • OpenAI: Represents the OpenAI language models.
  • Gemini: Represents the Gemini language models.
  • Anthropic: Represents the Anthropic language models.
  • LlamaServer: Represents a local or remote Llama-compatible server.

Access Trait

The Access trait defines asynchronous methods for interacting with LLMs:

  • send_single_message: Sends a single message and returns the generated structured response.
    async fn send_single_message(
          &self,
          message: &str,
          model: Option<&str>,
          config: Option<&LlmConfig>,
      ) -> Result<LlmResponse, Box<dyn std::error::Error + Send + Sync>>;
    
  • send_convo_message: Sends a list of messages as a conversation and returns the generated structured response.
    async fn send_convo_message(
          &self,
          messages: Vec<Message>,
          model: Option<&str>,
          config: Option<&LlmConfig>,
      ) -> Result<LlmResponse, Box<dyn std::error::Error + Send + Sync>>;
    
  • get_model_info: Gets information about a specific LLM model.
    async fn get_model_info(
          &self,
          model: &str,
      ) -> Result<ModelInfo, Box<dyn std::error::Error + Send + Sync>>;
    
  • list_models: Lists all available LLM models.
    async fn list_models(&self)
          -> Result<Vec<ModelInfo>, Box<dyn std::error::Error + Send + Sync>>;
    
  • count_tokens: Counts the number of tokens in a given text.
    async fn count_tokens(
          &self,
          text: &str,
          model: &str,
      ) -> Result<u32, Box<dyn std::error::Error + Send + Sync>>;
    

The LLM enum implements Access, providing specific implementations for each method based on the chosen LLM provider.

Note: Currently, get_model_info, list_models, and count_tokens only work for the Gemini LLM. Other providers return an error indicating this functionality is not yet supported.
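Until the other providers implement count_tokens, a rough client-side estimate can stand in. The heuristic below (roughly four characters per token for English text) is a local approximation for this sketch, not part of the crate's API:

```rust
/// Very rough token estimate: ~4 characters per token is a common
/// rule of thumb for English text. This is a local approximation,
/// not the provider's real tokenizer.
fn estimate_tokens(text: &str) -> u32 {
    let chars = text.chars().count() as u32;
    // Round up so short non-empty strings count as at least one token.
    (chars + 3) / 4
}
```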

LlmConfig

The LlmConfig struct allows you to configure provider-specific settings for the LLM calls. It uses a builder pattern for easy customization.

#[derive(Debug, Clone, Default)]
pub struct LlmConfig {
    pub temperature: Option<f64>,
    pub thinking_budget: Option<i32>,
    pub grounding_with_search: Option<bool>, // Enable grounding with Google Search for Gemini
    pub stream: Option<bool>,
    pub max_tokens: Option<u32>,
    pub stop: Option<Vec<String>>,
    pub cache_prompt: Option<bool>,
    pub json_schema: Option<serde_json::Value>,
    pub top_k: Option<u32>,
    pub top_p: Option<f32>,
}

Thinking Budgets & Reasoning: Passing a thinking_budget automatically configures the underlying provider (like Anthropic) to return reasoning tokens before the final text answer. These reasoning tokens will be populated in the reasoning field of the returned LlmResponse.

Example Usage:

use llm_api_access::config::LlmConfig;

// Default usage (no config)
let config = None;

// With thinking budget (Enables reasoning blocks on compatible models)
let config = Some(LlmConfig::new().with_thinking_budget(1024));

// With Google Search grounding enabled for Gemini
let config = Some(LlmConfig::new().with_grounding_with_search(true));

// Universal parameters
let config = Some(LlmConfig::new()
    .with_temperature(0.7)
    .with_max_tokens(2048));

Loading API Credentials with dotenv

The llm_api_access crate uses the dotenv library to load API credentials from a .env file in your project's root directory. This file should contain key-value pairs for each LLM provider you want to use.

Example Structure:

OPEN_AI_ORG=your_openai_org
OPEN_AI_KEY=your_openai_api_key
GEMINI_API_KEY=your_gemini_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
LLAMA_SERVER_URL=http://127.0.0.1:8080
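dotenv loads these values into process environment variables at runtime. A quick way to check which credentials are visible, using only the standard library (the variable names match the .env entries above):

```rust
use std::env;

// Report which provider credentials are present in the environment.
// The key names match the .env entries listed above.
fn configured_providers() -> Vec<&'static str> {
    let checks = [
        ("OPEN_AI_KEY", "OpenAI"),
        ("GEMINI_API_KEY", "Gemini"),
        ("ANTHROPIC_API_KEY", "Anthropic"),
        ("LLAMA_SERVER_URL", "LlamaServer"),
    ];
    checks
        .iter()
        .filter(|(var, _)| env::var(var).is_ok())
        .map(|(_, name)| *name)
        .collect()
}
```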

Example Usage

send_single_message Example

use llm_api_access::llm::{Access, LLM};
use llm_api_access::config::LlmConfig; 

#[tokio::main]
async fn main() {
    // Create an instance of the OpenAI LLM
    let llm = LLM::OpenAI;

    // Send a single message to the LLM
    let response = llm.send_single_message("Tell me a joke about programmers", None, None).await;

    match response {
        Ok(res) => println!("Joke: {}", res.text),
        Err(err) => eprintln!("Error: {}", err),
    }

    // Send a message asking for reasoning to a thinking model
    let config = Some(LlmConfig::new().with_thinking_budget(1024));
    let response = llm.send_single_message("Calculate how many ping pong balls fit in a bus.", Some("o3-mini"), config.as_ref()).await;

    match response {
        Ok(res) => {
            if let Some(reasoning) = res.reasoning {
                println!("Thought Process:\n{}", reasoning);
            }
            println!("Final Answer:\n{}", res.text);
        },
        Err(err) => eprintln!("Error: {}", err),
    }
}

send_convo_message Example

use llm_api_access::llm::{Access, LLM};
use llm_api_access::structs::general::Message;
use llm_api_access::config::LlmConfig; 

#[tokio::main]
async fn main() {
    // Create an instance of the Gemini LLM
    let llm = LLM::Gemini;

    // Define the conversation messages
    let messages = vec![
        Message {
            role: "user".to_string(),
            content: "You are a helpful coding assistant.".into(),
        },
        Message {
            role: "model".to_string(),
            content: "You got it! I am ready to assist!".into(),
        },
        Message {
            role: "user".to_string(),
            content: "Generate a rust function that reverses a string.".into(),
        },
    ];

    // Send the conversation messages
    let response = llm.send_convo_message(messages.clone(), None, None).await;

    match response {
        Ok(res) => println!("Code: {}", res.text),
        Err(err) => eprintln!("Error: {}", err),
    }
}

Embeddings

The crate provides support for generating text embeddings through the OpenAI API.

OpenAI Embeddings

The openai module includes functionality to generate vector embeddings:

pub async fn get_embedding(
    input: String,
    dimensions: Option<u32>,
) -> Result<Vec<f32>, Box<dyn std::error::Error + Send + Sync>>

This function takes:

  • input: The text to generate embeddings for
  • dimensions: Optional parameter to specify the number of dimensions (if omitted, uses the model default)

It returns a vector of floating point values representing the text embedding.

Example Usage:

use llm_api_access::openai::get_embedding;

#[tokio::main]
async fn main() {
    // Generate an embedding with default dimensions
    match get_embedding("This is a sample text for embedding".to_string(), None).await {
        Ok(embedding) => {
            println!("Generated embedding with {} dimensions", embedding.len());
            // Use embedding for semantic search, clustering, etc.
        },
        Err(err) => eprintln!("Error generating embedding: {}", err),
    }
    
    // Generate an embedding with custom dimensions
    match get_embedding("Custom dimension embedding".to_string(), Some(64)).await {
        Ok(embedding) => {
            println!("Generated custom embedding with {} dimensions", embedding.len());
            assert_eq!(embedding.len(), 64);
        },
        Err(err) => eprintln!("Error generating embedding: {}", err),
    }
}

The function uses the "text-embedding-3-small" model by default and requires the same environment variables as other OpenAI API calls (OPEN_AI_KEY and OPEN_AI_ORG).
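Embedding vectors are typically compared with cosine similarity. A small helper for that (not part of this crate, shown here for context):

```rust
/// Cosine similarity between two embedding vectors of equal length.
/// Returns a value in [-1.0, 1.0]; higher means more similar.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have equal dimensions");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        // Degenerate zero vector: no meaningful direction to compare.
        return 0.0;
    }
    dot / (norm_a * norm_b)
}
```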

Testing

The llm_api_access crate includes unit tests for various methods in the Access trait. To run the tests, use:

cargo test