pub struct RequestConfig {
    pub requested_response_tokens: Option<u64>,
    pub safety_tokens: u64,
    pub temperature: f32,
    pub frequency_penalty: Option<f32>,
    pub presence_penalty: f32,
    pub top_p: Option<f32>,
    pub retry_after_fail_n_times: u8,
    pub increase_limit_on_fail: bool,
    pub cache_prompt: bool,
    /* private fields */
}
Fields

requested_response_tokens: Option<u64>
Requested maximum number of tokens for the model’s output.
This value specifies the upper limit of tokens the model should generate in its response.
The system uses this value, along with the input prompt length, to ensure the entire request (input + output) stays within the model’s token limits.
- For OpenAI API-compatible LLMs, this corresponds to the ‘max_tokens’ parameter.
- For local LLMs, this is equivalent to the ‘n_predict’ parameter.
If None, the system will use a default or calculated value based on RequestConfig::model_ctx_size or RequestConfig::inference_ctx_size.
safety_tokens: u64
A small safety margin to prevent exceeding model limits.
This is a count of tokens subtracted from the total available tokens to help ensure that the model doesn’t unexpectedly exceed its token limit. This prevents issues that might arise from slight discrepancies in token counting or unexpected model behavior.
Defaults to 10 tokens.
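Taken together, these two fields feed a simple budgeting calculation. The sketch below is illustrative only: model_ctx_size and prompt_tokens are hypothetical inputs here, not part of this struct's public API, and the crate's internal arithmetic may differ in detail.

    // Illustrative sketch of the token budget these fields participate in.
    fn available_response_tokens(
        model_ctx_size: u64,
        prompt_tokens: u64,
        requested_response_tokens: Option<u64>,
        safety_tokens: u64,
    ) -> u64 {
        // Output budget = context window - prompt - safety margin.
        let budget = model_ctx_size
            .saturating_sub(prompt_tokens)
            .saturating_sub(safety_tokens);
        // An explicit request caps the budget; None means "use what is left".
        match requested_response_tokens {
            Some(requested) => requested.min(budget),
            None => budget,
        }
    }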
temperature: f32
Controls the randomness of the model’s output.
The temperature parameter adjusts the randomness in token selection for the model’s response. It accepts values between 0.0 and 2.0:
- Higher values (e.g., 0.8) increase randomness, leading to more diverse and creative outputs.
- Lower values (e.g., 0.2) decrease randomness, resulting in more focused and deterministic responses.
Note: It’s generally recommended to adjust either this parameter or top_p, but not both simultaneously.
Special considerations:
- For Anthropic models: This value is automatically scaled to the range 0.0 to 1.0 to match the requirements of crate::llms::api::anthropic::completion::AnthropicCompletionRequest::temperature.
Supported LLMs: All
Defaults to 1.0.
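As a sketch of that Anthropic scaling, a plain linear mapping from the 0.0 to 2.0 range down to 0.0 to 1.0 looks like the helper below. This is a hypothetical illustration; the crate's actual conversion lives in AnthropicCompletionRequest and may differ in detail.

    // Hypothetical helper: rescale a 0.0..=2.0 temperature to Anthropic's
    // 0.0..=1.0 range. Not the crate's actual implementation.
    fn scale_temperature_for_anthropic(temperature: f32) -> f32 {
        (temperature / 2.0).clamp(0.0, 1.0)
    }
    // e.g. scale_temperature_for_anthropic(1.0) == 0.5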
frequency_penalty: Option<f32>
Adjusts token selection based on their frequency in the generated text.
The frequency penalty influences how the model selects tokens based on their existing frequency in the output. It accepts values between -2.0 and 2.0:
- Positive values decrease the likelihood of repeating tokens, reducing verbatim repetition.
- Negative values increase the likelihood of repeating tokens, potentially leading to more repetitive text.
- A value of 0.0 (or None) applies no frequency-based adjustments.
This can be particularly useful for:
- Encouraging more diverse vocabulary usage (with positive values)
- Maintaining consistent terminology (with negative values)
Supported LLMs: openai
Defaults to None (no frequency penalty applied).
presence_penalty: f32
Adjusts token selection based on their presence in the generated text.
The presence penalty influences how the model selects tokens based on whether they’ve appeared at all in the output, regardless of frequency. It accepts values between -2.0 and 2.0:
- Positive values decrease the likelihood of using tokens that have appeared at all, encouraging the model to introduce new concepts and topics.
- Negative values increase the likelihood of reusing tokens that have appeared, potentially leading to more focused or repetitive text.
- A value of 0.0 applies no presence-based adjustments.
This differs from frequency_penalty in that it considers only whether a token has appeared, not how often.
Use cases:
- Encouraging the model to cover more topics (with positive values)
- Maintaining focus on specific themes (with negative values)
Supported LLMs: openai
Defaults to 0.0 (no presence penalty applied).
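For example, to encourage varied vocabulary while nudging the model toward new topics, both penalties can be set on a config. A minimal sketch, assuming RequestConfig is in scope; the context sizes passed to RequestConfig::new are illustrative, and a constructor is required because the struct has private fields:

    fn configure_penalties() -> RequestConfig {
        // 4096-token context sizes are illustrative values.
        let mut config = RequestConfig::new(4096, 4096);
        config.frequency_penalty = Some(0.5); // discourage verbatim repetition
        config.presence_penalty = 0.3;        // encourage introducing new topics
        config
    }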
top_p: Option<f32>
Controls diversity via nucleus sampling.
Top-p sampling (also called nucleus sampling) is an alternative to temperature-based sampling.
It selects from the smallest possible set of tokens whose cumulative probability exceeds the probability p. The value should be between 0.0 and 1.0:
- A value of 0.1 means only the tokens comprising the top 10% probability mass are considered.
- Lower values lead to more focused and deterministic outputs.
- Higher values allow for more diverse outputs.
Key points:
- It’s generally recommended to adjust either this or temperature, but not both simultaneously.
- This method is considered more advanced than temperature and is recommended for users who need fine-grained control over output diversity.
Supported LLMs: All
Defaults to None (not used, falling back to temperature-based sampling).
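A sketch of opting into nucleus sampling instead of temperature tuning, again assuming RequestConfig is in scope and using illustrative context sizes:

    fn configure_nucleus_sampling() -> RequestConfig {
        let mut config = RequestConfig::new(4096, 4096);
        // Consider only tokens in the top 10% of probability mass, and leave
        // temperature at its default rather than tuning both at once.
        config.top_p = Some(0.1);
        config
    }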
retry_after_fail_n_times: u8
Maximum number of retry attempts after a request failure.
Specifies how many times the system should attempt to retry a failed request before giving up. This can help handle transient errors or temporary service unavailability.
Supported LLMs: All
Defaults to 3.
increase_limit_on_fail: bool
Automatically increase token limit on request failure.
When set to true, if a request fails due to token limit constraints or other errors, the system will attempt to increase the token limit using RequestConfig::increase_token_limit before retrying the request.
Supported LLMs: All
Defaults to false.
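The crate drives this retry behavior internally; the loop below is only a sketch of what the two fields above describe. The send closure and its String error type are stand-ins, not the crate's real request API.

    fn send_with_retries(
        config: &mut RequestConfig,
        total_prompt_tokens: u64,
        send: impl Fn(&RequestConfig) -> Result<String, String>,
    ) -> Result<String, String> {
        let mut last_err = String::from("no attempts made");
        for _ in 0..config.retry_after_fail_n_times {
            match send(config) {
                Ok(response) => return Ok(response),
                Err(e) => {
                    if config.increase_limit_on_fail {
                        // Grow the token limit (DEFAULT_INCREASE_FACTOR when the
                        // factor is None) before the next attempt.
                        let _ = config.increase_token_limit(total_prompt_tokens, None);
                    }
                    last_err = e;
                }
            }
        }
        Err(last_err)
    }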
cache_prompt: bool
Enable prompt caching for subsequent requests.
When set to true, the system will cache the prompt and reuse it for the next request.
This can potentially improve performance for repeated or similar queries.
Supported LLMs
Defaults to false.
Implementations

impl RequestConfig

pub const DEFAULT_INCREASE_FACTOR: f32 = 1.33000004f32
pub fn new(model_ctx_size: u64, inference_ctx_size: u64) -> Self
pub fn set_max_tokens_for_request(&mut self, total_prompt_tokens: u64) -> Result<(), RequestTokenLimitError>
pub fn increase_token_limit(&mut self, total_prompt_tokens: u64, token_increase_factor: Option<f32>) -> Result<(), RequestTokenLimitError>
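Putting the pieces together, a typical setup might look like the following sketch. It uses only the items documented above, but the context sizes, response cap, and flags are illustrative, and it assumes RequestConfig and RequestTokenLimitError are in scope.

    fn build_config(total_prompt_tokens: u64) -> Result<RequestConfig, RequestTokenLimitError> {
        // Model context of 8192 tokens, inference context of 4096 tokens.
        let mut config = RequestConfig::new(8192, 4096);
        config.requested_response_tokens = Some(512);
        config.increase_limit_on_fail = true; // retry with a larger limit on failure
        config.cache_prompt = true;           // reuse the prompt across similar requests
        // Check that prompt + response + safety margin fit the context window.
        config.set_max_tokens_for_request(total_prompt_tokens)?;
        Ok(config)
    }

If set_max_tokens_for_request fails, increase_token_limit can also be called manually; passing None for the factor uses DEFAULT_INCREASE_FACTOR (the f32 nearest to 1.33).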
Trait Implementations

impl Clone for RequestConfig

fn clone(&self) -> RequestConfig
fn clone_from(&mut self, source: &Self)

Auto Trait Implementations
impl Freeze for RequestConfig
impl RefUnwindSafe for RequestConfig
impl Send for RequestConfig
impl Sync for RequestConfig
impl Unpin for RequestConfig
impl UnwindSafe for RequestConfig