Struct RequestConfig

pub struct RequestConfig {
    pub requested_response_tokens: Option<u64>,
    pub safety_tokens: u64,
    pub temperature: f32,
    pub frequency_penalty: Option<f32>,
    pub presence_penalty: f32,
    pub top_p: Option<f32>,
    pub retry_after_fail_n_times: u8,
    pub increase_limit_on_fail: bool,
    pub cache_prompt: bool,
    /* private fields */
}

Fields

requested_response_tokens: Option<u64>

Requested maximum number of tokens for the model’s output.

This value specifies the upper limit of tokens the model should generate in its response.

The system uses this value, along with the input prompt length, to ensure the entire request (input + output) stays within the model’s token limits.

  • For OpenAI API-compatible LLMs, this corresponds to the ‘max_tokens’ parameter.
  • For local LLMs, this is equivalent to the ‘n_predict’ parameter.
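The mapping in the list above can be sketched as follows. The `Backend` enum and both functions are hypothetical illustrations, not part of this crate. They only show which request field would carry the limit, using `n_predict` as the llama.cpp server names it.

```rust
/// Hypothetical backend kinds, used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    /// A server that speaks the OpenAI-compatible API.
    OpenAiCompatible,
    /// A local server such as llama.cpp.
    Local,
}

/// Returns the name of the request parameter that carries the response-token limit.
fn response_limit_param(backend: Backend) -> &'static str {
    match backend {
        Backend::OpenAiCompatible => "max_tokens",
        Backend::Local => "n_predict",
    }
}

/// Renders the limit as a JSON key/value fragment, e.g. `"max_tokens": 256`.
fn response_limit_fragment(backend: Backend, limit: u64) -> String {
    format!("\"{}\": {}", response_limit_param(backend), limit)
}
```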

If None, the system will use a default or calculated value based on RequestConfig::model_ctx_size or RequestConfig::inference_ctx_size.

safety_tokens: u64

A small safety margin to prevent exceeding model limits.

This is a count of tokens subtracted from the total available tokens to help ensure that the model doesn’t unexpectedly exceed its token limit. This prevents issues that might arise from slight discrepancies in token counting or unexpected model behavior.

Defaults to 10 tokens.
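A minimal sketch of how a response budget could be derived from the context size, the prompt length, safety_tokens, and requested_response_tokens. The subtraction order, the clamping of a requested value, and the `response_budget` and `BudgetError` names are assumptions for illustration. The crate's actual logic lives in set_max_tokens_for_request and may differ, for example in how it chooses between model_ctx_size and inference_ctx_size.

```rust
/// Hypothetical error: the prompt alone leaves no room for a response.
#[derive(Debug, PartialEq, Eq)]
enum BudgetError {
    PromptTooLong { available: u64, prompt_tokens: u64 },
}

/// Computes the response-token limit for one request.
///
/// Assumptions for this sketch: the usable context is `ctx_size - safety_tokens`,
/// the prompt is subtracted from that, and a requested limit is clamped to what
/// remains instead of being rejected.
fn response_budget(
    ctx_size: u64,
    safety_tokens: u64,
    prompt_tokens: u64,
    requested: Option<u64>,
) -> Result<u64, BudgetError> {
    // Reserve the safety margin first; saturating_sub avoids underflow on tiny contexts.
    let available = ctx_size.saturating_sub(safety_tokens);
    if prompt_tokens >= available {
        return Err(BudgetError::PromptTooLong { available, prompt_tokens });
    }
    let remaining = available - prompt_tokens;
    // With no explicit request, allow the response to use everything that remains.
    Ok(requested.map_or(remaining, |r| r.min(remaining)))
}
```

With a 4096-token context, the default 10 safety tokens, and a 1000-token prompt, an unset request yields 3086 tokens, while a request for 512 is passed through unchanged.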

temperature: f32

Controls the randomness of the model’s output.

The temperature parameter adjusts the randomness in token selection for the model’s response. It accepts values between 0.0 and 2.0:

  • Higher values (e.g., 0.8) increase randomness, leading to more diverse and creative outputs.
  • Lower values (e.g., 0.2) decrease randomness, resulting in more focused and deterministic responses.

Note: It’s generally recommended to adjust either this parameter or top_p, but not both simultaneously.

Supported LLMs: All

Defaults to 1.0.
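The effect of temperature can be illustrated with temperature-scaled softmax, the standard way logits are turned into sampling probabilities: each logit is divided by the temperature before normalizing. This is a generic sketch, not code from any particular backend, and it requires a temperature above 0.0.

```rust
/// Temperature-scaled softmax: divides each logit by `temperature`, then normalizes.
/// Lower temperatures sharpen the distribution; higher ones flatten it.
fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    assert!(temperature > 0.0, "this sketch requires a positive temperature");
    // Subtracting the largest logit keeps exp() from overflowing and does not change the result.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|&z| ((z - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}
```

For logits [2.0, 1.0, 0.0], the top token receives about 0.67 of the probability mass at temperature 1.0 and about 0.99 at 0.2.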

frequency_penalty: Option<f32>

Adjusts token selection based on how often each token has already appeared in the generated text.

The frequency penalty influences how the model selects tokens based on their existing frequency in the output. It accepts values between -2.0 and 2.0:

  • Positive values decrease the likelihood of repeating tokens, reducing verbatim repetition.
  • Negative values increase the likelihood of repeating tokens, potentially leading to more repetitive text.
  • A value of 0.0 (or None) applies no frequency-based adjustments.

This can be particularly useful for:

  • Encouraging more diverse vocabulary usage (with positive values)
  • Maintaining consistent terminology (with negative values)

Supported LLMs: openai

Defaults to None (no frequency penalty applied).
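The frequency adjustment can be sketched with the additive formula given in OpenAI's API documentation, where each logit is lowered by the token's count so far multiplied by the penalty. The function name and the `HashMap` of counts are illustrative assumptions, and a given backend may implement the penalty differently.

```rust
use std::collections::HashMap;

/// Lowers each token's logit by `count * frequency_penalty`.
/// `counts` maps a token id (an index into `logits`) to how many times that
/// token already appears in the output.
fn apply_frequency_penalty(
    logits: &mut [f32],
    counts: &HashMap<usize, u32>,
    frequency_penalty: Option<f32>,
) {
    // None behaves like 0.0: leave the logits untouched.
    let Some(penalty) = frequency_penalty else {
        return;
    };
    for (&token, &count) in counts {
        // Ignore ids outside the logits slice rather than panicking.
        if let Some(logit) = logits.get_mut(token) {
            *logit -= count as f32 * penalty;
        }
    }
}
```

A token seen three times with a penalty of 0.5 loses 1.5 from its logit; a negative penalty would raise it instead.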

presence_penalty: f32

Adjusts token selection based on whether each token has already appeared in the generated text.

The presence penalty influences how the model selects tokens based on whether they’ve appeared at all in the output, regardless of frequency. It accepts values between -2.0 and 2.0:

  • Positive values decrease the likelihood of using tokens that have appeared at all, encouraging the model to introduce new concepts and topics.
  • Negative values increase the likelihood of reusing tokens that have appeared, potentially leading to more focused or repetitive text.
  • A value of 0.0 applies no presence-based adjustments.

This differs from frequency_penalty in that it considers only whether a token has appeared, not how often.

Use cases:

  • Encouraging the model to cover more topics (with positive values)
  • Maintaining focus on specific themes (with negative values)

Supported LLMs: openai

Defaults to 0.0 (no presence penalty applied).
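In the OpenAI-style additive formula, the presence term subtracts a flat amount from every token that has appeared at least once, regardless of its count. This is an illustrative sketch with hypothetical names, not the backend's implementation.

```rust
use std::collections::HashMap;

/// Lowers the logit of every token that has appeared at least once by the same
/// flat `presence_penalty`, no matter how many times it appeared.
fn apply_presence_penalty(logits: &mut [f32], counts: &HashMap<usize, u32>, presence_penalty: f32) {
    for (&token, &count) in counts {
        // Only presence matters: a count of 0 means the token has not appeared.
        if count > 0 {
            if let Some(logit) = logits.get_mut(token) {
                *logit -= presence_penalty;
            }
        }
    }
}
```

Token 0 (seen five times) and token 1 (seen once) receive the same penalty; under a frequency-based penalty, token 0 would be penalized five times as much.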

top_p: Option<f32>

Controls diversity via nucleus sampling.

Top-p sampling (also called nucleus sampling) is an alternative to temperature-based sampling. It samples only from the smallest set of tokens whose cumulative probability exceeds p. The value should be between 0.0 and 1.0:

  • A value of 0.1 means only the tokens comprising the top 10% probability mass are considered.
  • Lower values lead to more focused and deterministic outputs.
  • Higher values allow for more diverse outputs.

Key points:

  • It’s generally recommended to adjust either this or temperature, but not both simultaneously.
  • This method is considered more advanced than temperature and is recommended for users who need fine-grained control over output diversity.

Supported LLMs: All

Defaults to None (not used, falling back to temperature-based sampling).
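The nucleus can be computed by sorting tokens by probability and keeping the highest-probability tokens until their cumulative probability reaches top_p. This sketch uses `>=` for the cutoff (implementations differ on `>` versus `>=`) and only selects the candidate set; a sampler would then renormalize the kept probabilities and draw from them. The function name is illustrative.

```rust
/// Returns the indices of the nucleus: the highest-probability tokens, taken in
/// descending order until their cumulative probability reaches `top_p`.
fn nucleus(probs: &[f32], top_p: f32) -> Vec<usize> {
    let mut order: Vec<usize> = (0..probs.len()).collect();
    // Sort indices by descending probability; total_cmp gives a total order even with NaN.
    order.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));
    let mut cumulative = 0.0f32;
    let mut kept = Vec::new();
    for idx in order {
        kept.push(idx);
        cumulative += probs[idx];
        if cumulative >= top_p {
            break; // Shortest prefix of the sorted list that covers top_p.
        }
    }
    kept
}
```

With probabilities [0.1, 0.5, 0.3, 0.1] and top_p = 0.7, only tokens 1 and 2 remain eligible.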

retry_after_fail_n_times: u8

Maximum number of retry attempts after a request failure.

Specifies how many times the system should attempt to retry a failed request before giving up. This can help handle transient errors or temporary service unavailability.

Supported LLMs: All

Defaults to 3.
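A minimal retry loop, assuming that retry_after_fail_n_times counts retries after the initial attempt, so the default of 3 would allow up to four attempts in total. `with_retries` is a hypothetical helper; the crate's real retry behavior, including any delay between attempts, is not shown here.

```rust
/// Calls `attempt` once, then retries up to `retry_after_fail_n_times` more times
/// while it keeps failing. Returns the first success or the last error.
/// `attempt` receives the attempt number (0 for the initial request).
fn with_retries<T, E>(
    retry_after_fail_n_times: u8,
    mut attempt: impl FnMut(u8) -> Result<T, E>,
) -> Result<T, E> {
    let mut last_err = None;
    // Attempt 0 is the initial request; attempts 1..=n are retries.
    for n in 0..=retry_after_fail_n_times {
        match attempt(n) {
            Ok(value) => return Ok(value),
            Err(e) => last_err = Some(e),
        }
    }
    // The loop always runs at least once, so an error has been recorded.
    Err(last_err.expect("at least one attempt was made"))
}
```

A production client would usually add a backoff delay between attempts; it is omitted here to keep the sketch deterministic.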

increase_limit_on_fail: bool

Automatically increase token limit on request failure.

When set to true, if a request fails due to token limit constraints or other errors, the system will attempt to increase the token limit using RequestConfig::increase_token_limit before retrying the request.

Supported LLMs: All

Defaults to false.
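A sketch of one growth step, assuming the limit is multiplied by a factor (defaulting to the value of DEFAULT_INCREASE_FACTOR, about 1.33) and capped at the tokens still available in the context. `increased_limit` and its cap argument are hypothetical. The documented RequestConfig::increase_token_limit instead takes total_prompt_tokens and an optional token_increase_factor and updates the config in place.

```rust
/// Growth factor used when none is given; same value as DEFAULT_INCREASE_FACTOR.
const INCREASE_FACTOR: f32 = 1.33;

/// Returns a larger response-token limit, or `None` if `current` is already at the cap.
///
/// Hypothetical helper: grows `current` by `factor` (or INCREASE_FACTOR), always by
/// at least one token, and never beyond `max_available`.
fn increased_limit(current: u64, max_available: u64, factor: Option<f32>) -> Option<u64> {
    if current >= max_available {
        return None; // Nothing left to grow into.
    }
    let factor = f64::from(factor.unwrap_or(INCREASE_FACTOR));
    let grown = (current as f64 * factor).round() as u64;
    // Guarantee progress for small limits, then clamp to the available budget.
    Some(grown.max(current + 1).min(max_available))
}
```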

cache_prompt: bool

Enable prompt caching for subsequent requests.

When set to true, the system will cache the prompt and reuse it for the next request. This can potentially improve performance for repeated or similar queries.

Supported LLMs

Defaults to false.
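One common way servers implement prompt caching is to reuse the work already done for the longest shared token prefix; the llama.cpp server's cache_prompt option, for example, reuses its KV cache so that only the differing suffix is processed. Whether this crate's cache_prompt maps onto that mechanism for every backend is an assumption here. The sketch only computes the reusable prefix length, and `shared_prefix_len` is a hypothetical name.

```rust
/// Counts how many leading tokens two prompts share. Under prefix-based prompt
/// caching, only `new_prompt[shared..]` would need to be processed again.
fn shared_prefix_len(cached_prompt: &[u32], new_prompt: &[u32]) -> usize {
    cached_prompt
        .iter()
        .zip(new_prompt)
        .take_while(|(a, b)| a == b)
        .count()
}
```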

Implementations

impl RequestConfig

pub const DEFAULT_INCREASE_FACTOR: f32 = 1.33000004f32
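The displayed value 1.33000004f32 reflects f32 precision: 1.33 has no exact binary representation, and the nearest f32 is slightly larger. Assuming the constant is written as 1.33 in the source, this local copy shows both renderings.

```rust
/// Same value as RequestConfig::DEFAULT_INCREASE_FACTOR (assumed to be written as 1.33).
const DEFAULT_INCREASE_FACTOR: f32 = 1.33;

/// Shortest decimal that round-trips to the same f32, as `{}` prints it.
fn shortest_repr() -> String {
    format!("{}", DEFAULT_INCREASE_FACTOR)
}

/// The same value printed with eight decimal places, exposing the rounding error.
fn eight_decimals() -> String {
    format!("{:.8}", DEFAULT_INCREASE_FACTOR)
}
```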

pub fn new(model_ctx_size: u64, inference_ctx_size: u64) -> Self

pub fn set_max_tokens_for_request(&mut self, total_prompt_tokens: u64) -> Result<(), RequestTokenLimitError>

pub fn increase_token_limit(&mut self, total_prompt_tokens: u64, token_increase_factor: Option<f32>) -> Result<(), RequestTokenLimitError>

Trait Implementations

impl Clone for RequestConfig

fn clone(&self) -> RequestConfig

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Display for RequestConfig

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of the pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a value with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.

impl<T> ToCompactString for T
where T: Display,

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T> ToString for T
where T: Display + ?Sized,

fn to_string(&self) -> String

Converts the given value to a String.

impl<T> ToStringFallible for T
where T: Display,

fn try_to_string(&self) -> Result<String, TryReserveError>

ToString::to_string, but without panic on OOM.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<T> ErasedDestructor for T
where T: 'static,