# llm_api_access 0.1.33

A package to query popular LLMs.

The `llm_api_access` crate provides a unified interface for interacting with different large language model (LLM) providers, including OpenAI, Gemini, Anthropic, and local Llama-compatible servers.

## Current Status

This crate powers an open-source coding assistant currently in active development. Gemini has been the main test target; OpenAI (including embeddings), Anthropic, and Llama Server are also supported. Recent updates add unified support for "thinking" or "reasoning" blocks from models such as OpenAI's `o1`/`o3`, Anthropic's Claude 3.7, and Google's Gemini 2.0 Flash Thinking. Development is self-driven, so updates can be few and far between; open an issue on GitHub if you want something specific.

### Unified Response Structure

To support models that output both a thought process and a final answer, responses from the text generation methods are returned as an `LlmResponse`:

```rust
pub struct LlmResponse {
    pub text: String,
    pub reasoning: Option<String>,
}
```
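A caller can branch on the optional `reasoning` field to decide how to display the response. The sketch below reuses the struct definition above with a hypothetical `format_response` helper (illustration only, not part of the crate):

```rust
// Reuses the LlmResponse shape shown above; `format_response` is a
// hypothetical helper for illustration, not part of the crate's API.
pub struct LlmResponse {
    pub text: String,
    pub reasoning: Option<String>,
}

pub fn format_response(res: &LlmResponse) -> String {
    match &res.reasoning {
        // Thinking models populate `reasoning`; show it before the answer.
        Some(r) => format!("Thought Process:\n{}\n\nFinal Answer:\n{}", r, res.text),
        // Non-thinking models return only the final text.
        None => res.text.clone(),
    }
}

fn main() {
    let res = LlmResponse {
        text: "Roughly 500,000.".to_string(),
        reasoning: Some("Estimate the bus volume and divide by ball volume.".to_string()),
    };
    println!("{}", format_response(&res));
}
```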

### LLM Enum

This enum represents the supported LLM providers:

- `OpenAI`: Represents the OpenAI language models.
- `Gemini`: Represents the Gemini language models.
- `Anthropic`: Represents the Anthropic language models.
- `LlamaServer`: Represents a local or remote Llama-compatible server.

### Access Trait

The `Access` trait defines asynchronous methods for interacting with LLMs:

- `send_single_message`: Sends a single message and returns the generated structured response.
  ```rust
  async fn send_single_message(
        &self,
        message: &str,
        model: Option<&str>,
        config: Option<&LlmConfig>,
    ) -> Result<LlmResponse, Box<dyn std::error::Error + Send + Sync>>;
  ```
- `send_convo_message`: Sends a list of messages as a conversation and returns the generated structured response.
  ```rust
  async fn send_convo_message(
        &self,
        messages: Vec<Message>,
        model: Option<&str>,
        config: Option<&LlmConfig>,
    ) -> Result<LlmResponse, Box<dyn std::error::Error + Send + Sync>>;
  ```
- `get_model_info`: Gets information about a specific LLM model.
  ```rust
  async fn get_model_info(
        &self,
        model: &str,
    ) -> Result<ModelInfo, Box<dyn std::error::Error + Send + Sync>>;
  ```
- `list_models`: Lists all available LLM models.
  ```rust
  async fn list_models(&self)
        -> Result<Vec<ModelInfo>, Box<dyn std::error::Error + Send + Sync>>;
  ```
- `count_tokens`: Counts the number of tokens in a given text.
  ```rust
  async fn count_tokens(
        &self,
        text: &str,
        model: &str,
    ) -> Result<u32, Box<dyn std::error::Error + Send + Sync>>;
  ```

The `LLM` enum implements `Access`, providing specific implementations for each method based on the chosen LLM provider.

**Note:** Currently, `get_model_info`, `list_models`, and `count_tokens` only work for the Gemini LLM. Other providers return an error indicating this functionality is not yet supported.

### LlmConfig

The `LlmConfig` struct allows you to configure provider-specific settings for the LLM calls. It uses a builder pattern for easy customization.

```rust
#[derive(Debug, Clone, Default)]
pub struct LlmConfig {
    pub temperature: Option<f64>,
    pub thinking_budget: Option<i32>,
    pub grounding_with_search: Option<bool>, // Enable grounding with Google Search for Gemini
    pub stream: Option<bool>,
    pub max_tokens: Option<u32>,       
    pub stop: Option<Vec<String>>,
    pub cache_prompt: Option<bool>,
    pub json_schema: Option<serde_json::Value>,    
    pub top_k: Option<u32>,
    pub top_p: Option<f32>,
}
```
**Thinking Budgets & Reasoning:**
Passing a `thinking_budget` automatically configures the underlying provider (like Anthropic) to return reasoning tokens before the final text answer. These reasoning tokens will be populated in the `reasoning` field of the returned `LlmResponse`. 

**Example Usage:**

```rust
use llm_api_access::config::LlmConfig;

// Default usage (no config)
let config = None;

// With thinking budget (Enables reasoning blocks on compatible models)
let config = Some(LlmConfig::new().with_thinking_budget(1024));

// With Google Search grounding enabled for Gemini
let config = Some(LlmConfig::new().with_grounding_with_search(true));

// Universal parameters
let config = Some(LlmConfig::new()
    .with_temperature(0.7)
    .with_max_tokens(2048));
```
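The `with_*` methods used above follow the common consuming-`self` builder pattern. Below is a minimal self-contained sketch of how such a builder works; the field names come from the struct above, but the implementation details are assumed and may differ from the crate's internals:

```rust
// Minimal builder sketch (assumed implementation, field names from the docs).
#[derive(Debug, Clone, Default)]
pub struct LlmConfig {
    pub temperature: Option<f64>,
    pub max_tokens: Option<u32>,
    pub thinking_budget: Option<i32>,
}

impl LlmConfig {
    pub fn new() -> Self {
        Self::default()
    }

    // Each builder method takes ownership of `self`, sets one field,
    // and returns the updated config so calls can be chained.
    pub fn with_temperature(mut self, t: f64) -> Self {
        self.temperature = Some(t);
        self
    }

    pub fn with_max_tokens(mut self, n: u32) -> Self {
        self.max_tokens = Some(n);
        self
    }

    pub fn with_thinking_budget(mut self, b: i32) -> Self {
        self.thinking_budget = Some(b);
        self
    }
}

fn main() {
    let config = LlmConfig::new().with_temperature(0.7).with_max_tokens(2048);
    println!("{:?}", config);
}
```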

### Loading API Credentials with dotenv

The `llm_api_access` crate uses the `dotenv` library to load API credentials from a `.env` file in your project's root directory. This file should contain key-value pairs for each LLM provider you want to use.

**Example Structure:**

```
OPEN_AI_ORG=your_openai_org
OPEN_AI_KEY=your_openai_api_key
GEMINI_API_KEY=your_gemini_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
LLAMA_SERVER_URL=http://127.0.0.1:8080
```
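A quick startup check can catch a missing variable before the first API call fails. The helper below is a self-contained sketch (not part of the crate) that reports which of the expected keys are absent from the environment:

```rust
use std::env;

// Sketch: report which provider credentials are not visible in the
// environment. Hypothetical helper, not part of the crate's API.
fn missing_keys(required: &[&str]) -> Vec<String> {
    required
        .iter()
        .copied()
        .filter(|k| env::var(k).is_err())
        .map(String::from)
        .collect()
}

fn main() {
    let required = ["OPEN_AI_KEY", "GEMINI_API_KEY", "ANTHROPIC_API_KEY"];
    let missing = missing_keys(&required);
    if missing.is_empty() {
        println!("All provider credentials are set.");
    } else {
        println!("Missing credentials: {}", missing.join(", "));
    }
}
```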

## Example Usage

### `send_single_message` Example

```rust
use llm_api_access::llm::{Access, LLM};
use llm_api_access::config::LlmConfig; 

#[tokio::main]
async fn main() {
    // Create an instance of the OpenAI LLM
    let llm = LLM::OpenAI;

    // Send a single message to the LLM
    let response = llm.send_single_message("Tell me a joke about programmers", None, None).await;

    match response {
        Ok(res) => println!("Joke: {}", res.text),
        Err(err) => eprintln!("Error: {}", err),
    }

    // Send a message asking for reasoning to a thinking model
    let config = Some(LlmConfig::new().with_thinking_budget(1024));
    let response = llm.send_single_message("Calculate how many ping pong balls fit in a bus.", Some("o3-mini"), config.as_ref()).await;

    match response {
        Ok(res) => {
            if let Some(reasoning) = res.reasoning {
                println!("Thought Process:\n{}", reasoning);
            }
            println!("Final Answer:\n{}", res.text);
        },
        Err(err) => eprintln!("Error: {}", err),
    }
}
```

### `send_convo_message` Example

```rust
use llm_api_access::llm::{Access, LLM};
use llm_api_access::structs::general::Message;
use llm_api_access::config::LlmConfig; 

#[tokio::main]
async fn main() {
    // Create an instance of the Gemini LLM
    let llm = LLM::Gemini;

    // Define the conversation messages
    let messages = vec![
        Message {
            role: "user".to_string(),
            content: "You are a helpful coding assistant.".into(),
        },
        Message {
            role: "model".to_string(),
            content: "You got it! I am ready to assist!".into(),
        },
        Message {
            role: "user".to_string(),
            content: "Generate a rust function that reverses a string.".into(),
        },
    ];

    // Send the conversation messages
    let response = llm.send_convo_message(messages.clone(), None, None).await;

    match response {
        Ok(res) => println!("Code: {}", res.text),
        Err(err) => eprintln!("Error: {}", err),
    }
}
```

## Embeddings

The crate provides support for generating text embeddings through the OpenAI API.

### OpenAI Embeddings

The `openai` module includes functionality to generate vector embeddings:

```rust
pub async fn get_embedding(
    input: String,
    dimensions: Option<u32>,
) -> Result<Vec<f32>, Box<dyn std::error::Error + Send + Sync>>
```

This function takes:
- `input`: The text to generate embeddings for
- `dimensions`: Optional parameter to specify the number of dimensions (if omitted, uses the model default)

It returns a vector of floating point values representing the text embedding.

### Example Usage:

```rust
use llm_api_access::openai::get_embedding;

#[tokio::main]
async fn main() {
    // Generate an embedding with default dimensions
    match get_embedding("This is a sample text for embedding".to_string(), None).await {
        Ok(embedding) => {
            println!("Generated embedding with {} dimensions", embedding.len());
            // Use embedding for semantic search, clustering, etc.
        },
        Err(err) => eprintln!("Error generating embedding: {}", err),
    }
    
    // Generate an embedding with custom dimensions
    match get_embedding("Custom dimension embedding".to_string(), Some(64)).await {
        Ok(embedding) => {
            println!("Generated custom embedding with {} dimensions", embedding.len());
            assert_eq!(embedding.len(), 64);
        },
        Err(err) => eprintln!("Error generating embedding: {}", err),
    }
}
```

The function uses the "text-embedding-3-small" model by default and requires the same environment variables as other OpenAI API calls (`OPEN_AI_KEY` and `OPEN_AI_ORG`).
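Once generated, embeddings are typically compared with cosine similarity for semantic search or clustering. The function below is a self-contained sketch (not part of the crate) that operates on the `Vec<f32>` returned by `get_embedding`:

```rust
// Cosine similarity between two embedding vectors: the dot product
// divided by the product of the vector norms. Returns a value in
// [-1.0, 1.0], where higher means more semantically similar.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        // Guard against division by zero for degenerate vectors.
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

fn main() {
    let a = vec![1.0_f32, 0.0, 0.0];
    let b = vec![1.0_f32, 0.0, 0.0];
    let c = vec![0.0_f32, 1.0, 0.0];
    println!("identical: {}", cosine_similarity(&a, &b)); // similarity = 1.0
    println!("orthogonal: {}", cosine_similarity(&a, &c)); // similarity = 0.0
}
```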

## Testing

The `llm_api_access` crate includes unit tests for various methods in the `Access` trait. To run the tests, use:

```bash
cargo test
```