llm_api_access
The llm_api_access crate provides a unified way to interact with different large language models (LLMs) like OpenAI, Gemini, Anthropic, and local Llama servers.
Current Status
This crate powers an open-source coding assistant that is in active development. Gemini has been the main test target; OpenAI (including embeddings), Anthropic, and Llama Server are also supported. Recent updates add unified support for "thinking" or "reasoning" blocks from models such as OpenAI's o1/o3, Anthropic's Claude 3.7, and Google's Gemini 2.0 Flash Thinking. Development is self-motivated, so updates can be few and far between; open an issue on GitHub if you want something specific.
Unified Response Structure
To support models that output both a thought process and a final answer, the text generation methods return an LlmResponse, which keeps the reasoning separate from the final text answer.
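As a rough illustration of what such a response could look like, here is a minimal sketch. Only the `reasoning` field is named in this README; the `text` field name and the `render` helper are assumptions made for the example and may not match the crate's actual definition.

```rust
/// Hypothetical shape of a unified response. Only `reasoning` is named in
/// this README; `text` is an assumed field name used for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct LlmResponse {
    /// Thought process emitted by the model, if the provider returned one.
    pub reasoning: Option<String>,
    /// Final answer text (assumed field name).
    pub text: String,
}

/// Hypothetical helper: formats a response for display, printing the
/// reasoning first when it is present.
pub fn render(response: &LlmResponse) -> String {
    match &response.reasoning {
        Some(thoughts) => format!("[reasoning]\n{}\n[answer]\n{}", thoughts, response.text),
        None => response.text.clone(),
    }
}
```

Modeling the reasoning as an `Option` lets callers treat models without a thinking mode exactly like models with one, at the cost of one extra `match`.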
LLM Enum
This enum represents the supported LLM providers:
OpenAI: Represents the OpenAI language models.
Gemini: Represents the Gemini language models.
Anthropic: Represents the Anthropic language models.
LlamaServer: Represents a local or remote Llama-compatible server.
Access Trait
The Access trait defines asynchronous methods for interacting with LLMs:
send_single_message: Sends a single message and returns the generated structured response.
send_convo_message: Sends a list of messages as a conversation and returns the generated structured response.
get_model_info: Gets information about a specific LLM model.
list_models: Lists all available LLM models.
count_tokens: Counts the number of tokens in a given text.
The LLM enum implements Access, providing specific implementations for each method based on the chosen LLM provider.
Note: Currently, get_model_info, list_models, and count_tokens only work for the Gemini LLM. Other providers return an error indicating this functionality is not yet supported.
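The dispatch-by-provider idea, including the "not yet supported" error for non-Gemini providers, can be sketched as follows. This is a simplified, synchronous stand-in and not the crate's real trait: `SyncAccess`, `Unsupported`, and the whitespace-based token count are all assumptions made so the example runs without a network call or an async runtime.

```rust
use std::fmt;

/// Provider variants as listed in this README.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LLM {
    OpenAI,
    Gemini,
    Anthropic,
    LlamaServer,
}

/// Hypothetical error type for operations a provider does not implement.
#[derive(Debug, PartialEq)]
pub struct Unsupported {
    provider: LLM,
    operation: &'static str,
}

impl fmt::Display for Unsupported {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} is not yet supported for {:?}", self.operation, self.provider)
    }
}

/// Synchronous stand-in for the asynchronous Access trait (assumption).
pub trait SyncAccess {
    fn count_tokens(&self, text: &str) -> Result<usize, Unsupported>;
}

impl SyncAccess for LLM {
    fn count_tokens(&self, text: &str) -> Result<usize, Unsupported> {
        match self {
            // Placeholder: counts whitespace-separated words instead of
            // calling the provider's token-counting endpoint.
            LLM::Gemini => Ok(text.split_whitespace().count()),
            // Every other provider reports that the operation is missing.
            other => Err(Unsupported { provider: *other, operation: "count_tokens" }),
        }
    }
}
```

Callers that need token counts across providers would match on the error and fall back to their own estimate.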
LlmConfig
The LlmConfig struct allows you to configure provider-specific settings for the LLM calls. It uses a builder pattern for easy customization.
Thinking Budgets & Reasoning:
Passing a thinking_budget automatically configures the underlying provider (like Anthropic) to return reasoning tokens before the final text answer. These reasoning tokens will be populated in the reasoning field of the returned LlmResponse.
Example Usage:
use LlmConfig;
// Default usage (no config)
let config = None;
// With thinking budget (Enables reasoning blocks on compatible models)
let config = Some;
// With Google Search grounding enabled for Gemini
let config = Some;
// Universal parameters
let config = Some;
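For readers unfamiliar with the builder pattern, the sketch below shows how a config of this kind could be assembled. `SketchConfig`, its field names, and the `with_*` methods are hypothetical and chosen for illustration; check the crate's documentation for the real LlmConfig methods.

```rust
/// Hypothetical stand-in for LlmConfig; all field and method names here are
/// assumptions, not the crate's actual API.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct SketchConfig {
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
    pub thinking_budget: Option<u32>,
    pub grounding_with_search: bool,
}

impl SketchConfig {
    /// Starts from provider defaults (nothing set).
    pub fn new() -> Self {
        Self::default()
    }

    /// Each builder method consumes the config and returns it, so calls chain.
    pub fn with_temperature(mut self, temperature: f32) -> Self {
        self.temperature = Some(temperature);
        self
    }

    pub fn with_max_tokens(mut self, max_tokens: u32) -> Self {
        self.max_tokens = Some(max_tokens);
        self
    }

    pub fn with_thinking_budget(mut self, budget: u32) -> Self {
        self.thinking_budget = Some(budget);
        self
    }

    pub fn with_grounding_with_search(mut self, enabled: bool) -> Self {
        self.grounding_with_search = enabled;
        self
    }
}

/// Reasoning is requested only when a config is present and carries a budget.
pub fn reasoning_enabled(config: Option<&SketchConfig>) -> bool {
    config.and_then(|c| c.thinking_budget).is_some()
}
```

Keeping every setting optional means passing no config at all and passing an empty config behave the same way, which matches the "Default usage (no config)" case above.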
Loading API Credentials with dotenv
The llm_api_access crate uses the dotenv library to securely load API credentials from a .env file in your project's root directory. This file should contain key-value pairs for each LLM provider you want to use.
Example Structure:
OPEN_AI_ORG=your_openai_org
OPEN_AI_KEY=your_openai_api_key
GEMINI_API_KEY=your_gemini_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
LLAMA_SERVER_URL=http://127.0.0.1:8080
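Before making a call, it can help to check which of these variables are actually set. The following sketch assumes the .env file has already been loaded into the process environment; the provider keys ("openai", "gemini", and so on) and both helper functions are hypothetical and not part of the crate.

```rust
use std::env;

/// Environment variables each provider needs, following the example .env
/// structure above. The provider keys are hypothetical labels.
pub fn required_vars(provider: &str) -> &'static [&'static str] {
    match provider {
        "openai" => &["OPEN_AI_ORG", "OPEN_AI_KEY"],
        "gemini" => &["GEMINI_API_KEY"],
        "anthropic" => &["ANTHROPIC_API_KEY"],
        "llama_server" => &["LLAMA_SERVER_URL"],
        _ => &[],
    }
}

/// Returns the required variables that are unset or blank, using `lookup`
/// to read values so the check can be exercised without touching the real
/// environment.
pub fn missing_vars<F>(provider: &str, lookup: F) -> Vec<&'static str>
where
    F: Fn(&str) -> Option<String>,
{
    required_vars(provider)
        .iter()
        .copied()
        .filter(|name| lookup(*name).map_or(true, |value| value.trim().is_empty()))
        .collect()
}

/// Convenience wrapper that reads from the process environment.
pub fn missing_from_env(provider: &str) -> Vec<&'static str> {
    missing_vars(provider, |name| env::var(name).ok())
}
```

Calling `missing_from_env("openai")` at startup and printing the result gives a clearer message than a failed HTTP request caused by an empty key.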
Example Usage
send_single_message Example
use ;
use LlmConfig;
async
send_convo_message Example
use ;
use Message;
use LlmConfig;
async
Embeddings
The crate provides support for generating text embeddings through the OpenAI API.
OpenAI Embeddings
The openai module includes functionality to generate vector embeddings:
pub async
This function takes:
input: The text to generate embeddings for
dimensions: Optional parameter to specify the number of dimensions (if omitted, uses the model default)
It returns a vector of floating point values representing the text embedding.
Example Usage:
use get_embedding;
async
The function uses the "text-embedding-3-small" model by default and requires the same environment variables as other OpenAI API calls (OPEN_AI_KEY and OPEN_AI_ORG).
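A common next step with embedding vectors is comparing them. The sketch below computes cosine similarity between two vectors; it assumes the embeddings are available as `f32` slices, and `cosine_similarity` is a hypothetical helper, not something this crate provides.

```rust
/// Cosine similarity between two embedding vectors.
/// Returns None when the lengths differ, the input is empty, or either
/// vector has zero magnitude (the similarity is undefined in those cases).
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|y| y * y).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

Two embeddings requested with different `dimensions` values cannot be compared this way, which is why the length check returns `None` instead of panicking.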
Testing
The llm_api_access crate includes unit tests for various methods in the Access trait. To run the tests, use cargo test.