pub struct RequestConfig {
    pub requested_response_tokens: Option<u64>,
    pub safety_tokens: u64,
    pub temperature: f32,
    pub frequency_penalty: Option<f32>,
    pub presence_penalty: f32,
    pub top_p: Option<f32>,
    pub retry_after_fail_n_times: u8,
    pub increase_limit_on_fail: bool,
    pub cache_prompt: bool,
    /* private fields */
}
Fields

requested_response_tokens: Option<u64>
Requested maximum number of tokens for the model’s output.
This value specifies the upper limit of tokens the model should generate in its response.
The system uses this value, along with the input prompt length, to ensure the entire request (input + output) stays within the model’s token limits.
- For OpenAI API-compatible LLMs, this corresponds to the ‘max_tokens’ parameter.
- For local LLMs, this is equivalent to the ‘n_predict’ parameter.
If None, the system will use a default or calculated value based on RequestConfig::model_ctx_size or RequestConfig::inference_ctx_size.
safety_tokens: u64
A small safety margin to prevent exceeding model limits.
This is a count of tokens subtracted from the total available tokens to help ensure that the model doesn’t unexpectedly exceed its token limit. This prevents issues that might arise from slight discrepancies in token counting or unexpected model behavior.
Defaults to 10 tokens.
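Taken together, these two fields feed a simple budgeting calculation. The sketch below is illustrative only: model_ctx_size and prompt_tokens are hypothetical inputs here, not part of this struct's public API, and the crate's internal arithmetic may differ in detail.

    // Illustrative sketch of the token budget these fields participate in.
    fn available_response_tokens(
        model_ctx_size: u64,
        prompt_tokens: u64,
        requested_response_tokens: Option<u64>,
        safety_tokens: u64,
    ) -> u64 {
        // Output budget = context window - prompt - safety margin.
        let budget = model_ctx_size
            .saturating_sub(prompt_tokens)
            .saturating_sub(safety_tokens);
        // An explicit request caps the budget; None means "use what is left".
        match requested_response_tokens {
            Some(requested) => requested.min(budget),
            None => budget,
        }
    }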
temperature: f32
Controls the randomness of the model’s output.
The temperature parameter adjusts the randomness in token selection for the model’s response. It accepts values between 0.0 and 2.0:
- Higher values (e.g., 0.8) increase randomness, leading to more diverse and creative outputs.
- Lower values (e.g., 0.2) decrease randomness, resulting in more focused and deterministic responses.
Note: It’s generally recommended to adjust either this parameter or top_p, but not both simultaneously.
Special considerations:
- For Anthropic models: This value is automatically scaled to the range 0.0 to 1.0 to match the requirements of crate::llms::api::anthropic::completion::AnthropicCompletionRequest::temperature.
Supported LLMs: All
Defaults to 1.0.
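As a sketch of that Anthropic scaling, a plain linear mapping from the 0.0 to 2.0 range down to 0.0 to 1.0 looks like the helper below. This is a hypothetical illustration; the crate's actual conversion lives in AnthropicCompletionRequest and may differ in detail.

    // Hypothetical helper: rescale a 0.0..=2.0 temperature to Anthropic's
    // 0.0..=1.0 range. Not the crate's actual implementation.
    fn scale_temperature_for_anthropic(temperature: f32) -> f32 {
        (temperature / 2.0).clamp(0.0, 1.0)
    }
    // e.g. scale_temperature_for_anthropic(1.0) == 0.5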
frequency_penalty: Option<f32>
Adjusts token selection based on their frequency in the generated text.
The frequency penalty influences how the model selects tokens based on their existing frequency in the output. It accepts values between -2.0 and 2.0:
- Positive values decrease the likelihood of repeating tokens, reducing verbatim repetition.
- Negative values increase the likelihood of repeating tokens, potentially leading to more repetitive text.
- A value of 0.0 (or None) applies no frequency-based adjustments.
This can be particularly useful for:
- Encouraging more diverse vocabulary usage (with positive values)
- Maintaining consistent terminology (with negative values)
Supported LLMs: openai
Defaults to None (no frequency penalty applied).
presence_penalty: f32
Adjusts token selection based on their presence in the generated text.
The presence penalty influences how the model selects tokens based on whether they’ve appeared at all in the output, regardless of frequency. It accepts values between -2.0 and 2.0:
- Positive values decrease the likelihood of using tokens that have appeared at all, encouraging the model to introduce new concepts and topics.
- Negative values increase the likelihood of reusing tokens that have appeared, potentially leading to more focused or repetitive text.
- A value of 0.0 applies no presence-based adjustments.
This differs from frequency_penalty in that it considers only whether a token has appeared, not how often.
Use cases:
- Encouraging the model to cover more topics (with positive values)
- Maintaining focus on specific themes (with negative values)
Supported LLMs: openai
Defaults to 0.0 (no presence penalty applied).
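For example, to encourage varied vocabulary while nudging the model toward new topics, both penalties can be set on a config. A minimal sketch, assuming RequestConfig is in scope; the context sizes passed to RequestConfig::new are illustrative, and a constructor is required because the struct has private fields:

    fn configure_penalties() -> RequestConfig {
        // 4096-token context sizes are illustrative values.
        let mut config = RequestConfig::new(4096, 4096);
        config.frequency_penalty = Some(0.5); // discourage verbatim repetition
        config.presence_penalty = 0.3;        // encourage introducing new topics
        config
    }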
top_p: Option<f32>
Controls diversity via nucleus sampling.
Top-p sampling (also called nucleus sampling) is an alternative to temperature-based sampling.
It selects from the smallest possible set of tokens whose cumulative probability exceeds the probability p. The value should be between 0.0 and 1.0:
- A value of 0.1 means only the tokens comprising the top 10% probability mass are considered.
- Lower values lead to more focused and deterministic outputs.
- Higher values allow for more diverse outputs.
Key points:
- It’s generally recommended to adjust either this or temperature, but not both simultaneously.
- This method is considered more advanced than temperature and is recommended for users who need fine-grained control over output diversity.
Supported LLMs: All
Defaults to None (not used, falling back to temperature-based sampling).
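A sketch of opting into nucleus sampling instead of temperature tuning, again assuming RequestConfig is in scope and using illustrative context sizes:

    fn configure_nucleus_sampling() -> RequestConfig {
        let mut config = RequestConfig::new(4096, 4096);
        // Consider only tokens in the top 10% of probability mass, and leave
        // temperature at its default rather than tuning both at once.
        config.top_p = Some(0.1);
        config
    }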
retry_after_fail_n_times: u8
Maximum number of retry attempts after a request failure.
Specifies how many times the system should attempt to retry a failed request before giving up. This can help handle transient errors or temporary service unavailability.
Supported LLMs: All
Defaults to 3.
increase_limit_on_fail: bool
Automatically increase token limit on request failure.
When set to true, if a request fails due to token limit constraints or other errors, the system will attempt to increase the token limit using RequestConfig::increase_token_limit before retrying the request.
Supported LLMs: All
Defaults to false.
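The crate drives this retry behavior internally; the loop below is only a sketch of what the two fields above describe. The send closure and its String error type are stand-ins, not the crate's real request API.

    fn send_with_retries(
        config: &mut RequestConfig,
        total_prompt_tokens: u64,
        send: impl Fn(&RequestConfig) -> Result<String, String>,
    ) -> Result<String, String> {
        let mut last_err = String::from("no attempts made");
        for _ in 0..config.retry_after_fail_n_times {
            match send(config) {
                Ok(response) => return Ok(response),
                Err(e) => {
                    if config.increase_limit_on_fail {
                        // Grow the token limit (DEFAULT_INCREASE_FACTOR when the
                        // factor is None) before the next attempt.
                        let _ = config.increase_token_limit(total_prompt_tokens, None);
                    }
                    last_err = e;
                }
            }
        }
        Err(last_err)
    }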
cache_prompt: bool
Enable prompt caching for subsequent requests.
When set to true, the system will cache the prompt and reuse it for the next request.
This can potentially improve performance for repeated or similar queries.
Supported LLMs
Defaults to false.
Implementations

impl RequestConfig

pub const DEFAULT_INCREASE_FACTOR: f32 = 1.33000004f32
pub fn new(model_ctx_size: u64, inference_ctx_size: u64) -> Self
pub fn set_max_tokens_for_request(&mut self, total_prompt_tokens: u64) -> Result<(), RequestTokenLimitError>
pub fn increase_token_limit(&mut self, total_prompt_tokens: u64, token_increase_factor: Option<f32>) -> Result<(), RequestTokenLimitError>
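Putting the pieces together, a typical setup might look like the following sketch. It uses only the items documented above, but the context sizes, response cap, and flags are illustrative, and it assumes RequestConfig and RequestTokenLimitError are in scope.

    fn build_config(total_prompt_tokens: u64) -> Result<RequestConfig, RequestTokenLimitError> {
        // Model context of 8192 tokens, inference context of 4096 tokens.
        let mut config = RequestConfig::new(8192, 4096);
        config.requested_response_tokens = Some(512);
        config.increase_limit_on_fail = true; // retry with a larger limit on failure
        config.cache_prompt = true;           // reuse the prompt across similar requests
        // Check that prompt + response + safety margin fit the context window.
        config.set_max_tokens_for_request(total_prompt_tokens)?;
        Ok(config)
    }

If set_max_tokens_for_request fails, increase_token_limit can also be called manually; passing None for the factor uses DEFAULT_INCREASE_FACTOR (the f32 nearest to 1.33).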
Trait Implementations

impl Clone for RequestConfig

fn clone(&self) -> RequestConfig
fn clone_from(&mut self, source: &Self)

Auto Trait Implementations
impl Freeze for RequestConfig
impl RefUnwindSafe for RequestConfig
impl Send for RequestConfig
impl Sync for RequestConfig
impl Unpin for RequestConfig
impl UnwindSafe for RequestConfig