# LLM Kit Groq

Groq provider for LLM Kit: ultra-fast LLM inference with open-source models, powered by Groq's LPU architecture.

**Note**: This provider uses the standardized builder pattern. See the Quick Start section for the recommended usage.
## Features
- Text Generation: Generate text with Llama, Gemma, DeepSeek, Qwen, and other open-source models
- Streaming: Real-time response streaming with ultra-fast inference speeds
- Tool Calling: Support for function calling with chat models
- Speech Generation: Text-to-speech with PlayAI models
- Transcription: Audio-to-text with Whisper models (large-v3, large-v3-turbo, distilled)
- Ultra-Fast Inference: Groq's LPU architecture delivers industry-leading inference speeds
- Provider Metadata: Access cached token counts and performance metrics
## Installation

Add this to your `Cargo.toml` (the `llm-kit-*` crate names below are assumed from the project naming; verify against the published crates):

```toml
[dependencies]
llm-kit-core = "0.1"
llm-kit-groq = "0.1"
llm-kit-provider-utils = "0.1"
tokio = { version = "1", features = ["full"] }
```
## Quick Start

### Using the Client Builder (Recommended)

A minimal sketch; the `llm_kit_core` paths and the `GenerateText`/`result.text` shapes are assumptions based on the rest of this README:

```rust
use llm_kit_core::{GenerateText, LanguageModel}; // paths assumed
use llm_kit_groq::GroqClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Reads GROQ_API_KEY from the environment by default.
    let provider = GroqClient::new().build();
    let model = provider.chat_model("llama-3.3-70b-versatile");

    let result = GenerateText::new(&model)
        .prompt("Why is the sky blue?")
        .execute()
        .await?;

    println!("{}", result.text);
    Ok(())
}
```
### Using Settings Directly (Alternative)

The settings type names below are assumptions mirroring the `with_*` methods shown under Configuration:

```rust
use llm_kit_core::{GenerateText, LanguageModel}; // paths assumed
use llm_kit_groq::{GroqProvider, GroqProviderSettings}; // type names assumed

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let settings = GroqProviderSettings::new()
        .with_api_key("gsk_your_api_key");
    let provider = GroqProvider::new(settings);
    let model = provider.chat_model("llama-3.3-70b-versatile");

    let result = GenerateText::new(&model)
        .prompt("Hello!")
        .execute()
        .await?;

    println!("{}", result.text);
    Ok(())
}
```
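Streaming works the same way. A hedged sketch; `do_stream()` is referenced by `examples/stream.rs`, but its argument and chunk shapes here are assumptions:

```rust
use futures::StreamExt; // for .next() on the stream
use llm_kit_groq::GroqClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let provider = GroqClient::new().build();
    let model = provider.chat_model("llama-3.1-8b-instant");

    // do_stream() exists per examples/stream.rs; the prompt argument and
    // the chunk's `text` field are assumptions.
    let mut stream = model.do_stream("Tell me a short story.").await?;
    while let Some(chunk) = stream.next().await {
        print!("{}", chunk?.text);
    }
    Ok(())
}
```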
## Configuration

### Environment Variables

Set your Groq API key as an environment variable:

```bash
export GROQ_API_KEY="your-api-key"
```

Optional settings, such as a custom base URL, provider name, or extra headers, can be supplied through the builder, as shown below.
### Using the Client Builder

```rust
use llm_kit_groq::GroqClient;

let provider = GroqClient::new()
    .api_key("gsk_your_api_key") // overrides GROQ_API_KEY
    .base_url("https://api.groq.com/openai/v1")
    .header("X-Custom-Header", "value")
    .name("groq")
    .build();
```
### Using Settings Directly

```rust
use llm_kit_groq::{GroqProvider, GroqProviderSettings}; // type names assumed

let settings = GroqProviderSettings::new()
    .with_api_key("gsk_your_api_key")
    .with_base_url("https://api.groq.com/openai/v1")
    .add_header("X-Custom-Header", "value")
    .with_name("groq");

let provider = GroqProvider::new(settings);
```
### Builder Methods

The `GroqClient` builder supports:

- `.api_key(key)` - Set the API key (overrides the `GROQ_API_KEY` environment variable)
- `.load_api_key_from_env()` - Explicitly load the API key from the `GROQ_API_KEY` environment variable
- `.base_url(url)` - Set a custom base URL (default: `https://api.groq.com/openai/v1`)
- `.name(name)` - Set the provider name (optional)
- `.header(key, value)` - Add a single custom header
- `.headers(map)` - Add multiple custom headers
- `.build()` - Build the provider
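For `.headers(map)`, a short sketch; the `HashMap<String, String>` parameter type and the header names are assumptions:

```rust
use std::collections::HashMap;
use llm_kit_groq::GroqClient;

let mut extra = HashMap::new();
extra.insert("X-Request-Source".to_string(), "docs-demo".to_string());
extra.insert("X-Team".to_string(), "platform".to_string());

// Adds every entry in the map as a custom header.
let provider = GroqClient::new().headers(extra).build();
```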
## Supported Models

All Groq models are supported, across multiple model families.

### Chat Models

- Llama: `llama-3.1-8b-instant`, `llama-3.3-70b-versatile`, `meta-llama/llama-guard-4-12b`, `meta-llama/llama-4-maverick-17b-128e-instruct`, `meta-llama/llama-4-scout-17b-16e-instruct`
- Gemma: `gemma2-9b-it`
- DeepSeek: `deepseek-r1-distill-llama-70b`
- Qwen: `qwen/qwen3-32b`
- Mixtral: `mixtral-8x7b-32768`
### Transcription Models (Whisper)

- `whisper-large-v3` - Most accurate Whisper model
- `whisper-large-v3-turbo` - Faster Whisper variant with lower latency
- `distil-whisper-large-v3-en` - English-optimized distilled model for faster performance
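A transcription sketch; `transcription_model()` is assumed by analogy with `chat_model()`, and the `do_transcribe()` argument/result shapes are assumptions (see `examples/transcription.rs` for the real call):

```rust
use llm_kit_groq::GroqClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let provider = GroqClient::new().build();
    let model = provider.transcription_model("whisper-large-v3");

    // Raw audio bytes in, transcript text out; both shapes assumed.
    let audio = std::fs::read("sample.wav")?;
    let result = model.do_transcribe(audio).await?;
    println!("{}", result.text);
    Ok(())
}
```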
### Text-to-Speech Models

- `playai-tts` - PlayAI text-to-speech model
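A speech-generation sketch; `speech_model()` is assumed by analogy with `chat_model()`, and the `do_generate()` input/output shapes are assumptions (see `examples/speech_generation.rs` for the real call):

```rust
use llm_kit_groq::GroqClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let provider = GroqClient::new().build();
    let model = provider.speech_model("playai-tts");

    // Text in, audio bytes out; both shapes assumed.
    let audio = model.do_generate("Hello from Groq!").await?;
    std::fs::write("hello.wav", &audio.bytes)?;
    Ok(())
}
```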
For a complete list of available models, see the Groq Models documentation.
## Provider-Specific Options

Groq supports advanced features through provider options.

### Reasoning Format

Control how reasoning content is returned (`"parsed"`, `"raw"`, or `"hidden"`):

```rust
use llm_kit_core::GenerateText; // path assumed
use serde_json::json;

let result = GenerateText::new(&model)
    .prompt("Solve: what is 17 * 23?")
    // The ("groq", json!({ ... })) argument shape is an assumption.
    .provider_options("groq", json!({ "reasoningFormat": "parsed" }))
    .execute()
    .await?;
```
### Reasoning Effort

Configure computational effort for reasoning models:

```rust
use llm_kit_core::GenerateText; // path assumed
use serde_json::json;

let result = GenerateText::new(&model)
    .prompt("Prove that the square root of 2 is irrational.")
    // "default" is an illustrative value; see Groq's docs for accepted levels.
    .provider_options("groq", json!({ "reasoningEffort": "default" }))
    .execute()
    .await?;
```
### Parallel Tool Calls

Enable or disable parallel tool execution (default: `true`):

```rust
use llm_kit_core::GenerateText; // path assumed
use serde_json::json;

let result = GenerateText::new(&model)
    .prompt("What is the weather in Paris and in Tokyo?")
    .tools(tools) // `tools` defined elsewhere; the tool-definition API is not shown here
    .provider_options("groq", json!({ "parallelToolCalls": false }))
    .execute()
    .await?;
```
### Service Tier

Select the service tier for processing (`"on_demand"`, `"flex"`, or `"auto"`):

```rust
use llm_kit_core::GenerateText; // path assumed
use serde_json::json;

let result = GenerateText::new(&model)
    .prompt("Summarize this document.")
    .provider_options("groq", json!({ "serviceTier": "flex" }))
    .execute()
    .await?;
```
### Cached Tokens Metadata

Groq provides metadata about cached tokens to help optimize performance:

```rust
use llm_kit_core::GenerateText; // path assumed
use llm_kit_groq::GroqClient;

let provider = GroqClient::new().build();
let model = provider.chat_model("llama-3.3-70b-versatile");

let result = GenerateText::new(&model)
    .prompt("Hello!")
    .execute()
    .await?;

// Access cache metadata; the layout inside provider_metadata is an assumption.
if let Some(metadata) = result.provider_metadata {
    println!("{:?}", metadata);
}
```
### Available Provider Options

| Option | Type | Description |
|---|---|---|
| `reasoningFormat` | `string` | Reasoning content format: `"parsed"`, `"raw"`, `"hidden"` |
| `reasoningEffort` | `string` | Computational effort level for reasoning |
| `parallelToolCalls` | `bool` | Enable parallel tool execution (default: `true`) |
| `structuredOutputs` | `bool` | Enable structured outputs (default: `true`) |
| `serviceTier` | `string` | Service tier: `"on_demand"`, `"flex"`, `"auto"` |
| `user` | `string` | End-user identifier for abuse monitoring |
## Examples

See the `examples/` directory for complete examples:

- `chat.rs` - Basic chat completion using `do_generate()` directly
- `stream.rs` - Streaming responses using `do_stream()` directly
- `chat_tool_calling.rs` - Tool calling using `do_generate()` directly
- `stream_tool_calling.rs` - Streaming with tools using `do_stream()` directly
- `speech_generation.rs` - Text-to-speech using `do_generate()` directly
- `transcription.rs` - Audio transcription using `do_transcribe()` directly

Run examples with:

```bash
cargo run --example chat   # substitute any example name above
```
## Documentation
## License

Licensed under:

- MIT license ([LICENSE-MIT](LICENSE-MIT))
## Contributing

Contributions are welcome! Please see the Contributing Guide for more details.