pub struct CompletionRequest {
pub model: String,
pub hosting: Option<Hosting>,
pub prompt: Prompt,
pub maximum_tokens: u32,
pub minimum_tokens: Option<u32>,
pub echo: Option<bool>,
pub temperature: Option<f64>,
pub top_k: Option<u32>,
pub top_p: Option<f64>,
pub presence_penalty: Option<f64>,
pub frequency_penalty: Option<f64>,
pub sequence_penalty: Option<f64>,
pub sequence_penalty_min_length: Option<i32>,
pub repetition_penalties_include_prompt: Option<bool>,
pub repetition_penalties_include_completion: Option<bool>,
pub use_multiplicative_presence_penalty: Option<bool>,
pub use_multiplicative_frequency_penalty: Option<bool>,
pub use_multiplicative_sequence_penalty: Option<bool>,
pub penalty_exceptions: Option<Vec<String>>,
pub penalty_bias: Option<String>,
pub penalty_exceptions_include_stop_sequences: Option<bool>,
pub best_of: Option<u32>,
pub n: Option<u32>,
pub log_probs: Option<i32>,
pub stop_sequences: Option<Vec<String>>,
pub tokens: Option<bool>,
pub raw_completion: Option<bool>,
pub disable_optimizations: Option<bool>,
pub completion_bias_inclusion: Option<Vec<String>>,
pub completion_bias_inclusion_first_token_only: Option<bool>,
pub completion_bias_exclusion: Option<Vec<String>>,
pub completion_bias_exclusion_first_token_only: Option<bool>,
pub contextual_control_threshold: Option<f64>,
pub control_log_additive: Option<bool>,
pub logit_bias: Option<HashMap<i32, f32>>,
}
Fields
model: String
The name of the model from the Luminous model family, e.g. "luminous-base".
Models and their respective architectures can differ in parameter size and capabilities.
The most recent version of the model is always used. The model output contains information
about the model version.
hosting: Option<Hosting>
Determines in which datacenters the request may be processed. You can either set the parameter to “aleph-alpha” or omit it (defaulting to None).
Not setting this value, or setting it to None, gives us maximal flexibility in processing your request in our own datacenters and on servers hosted with other providers. Choose this option for maximal availability.
Setting it to “aleph-alpha” allows us to only process the request in our own datacenters. Choose this option for maximal data privacy.
prompt: Prompt
Prompt to complete. The supported modalities depend on the model.
maximum_tokens: u32
Limits the number of tokens generated for the completion.
minimum_tokens: Option<u32>
Generate at least this number of tokens before an end-of-text token is generated. (default: 0)
echo: Option<bool>
Echo the prompt in the completion. This may be especially helpful when log_probs is set to return logprobs for the prompt.
temperature: Option<f64>
A higher sampling temperature encourages the model to produce less probable outputs (“be more creative”). Values are expected in a range from 0.0 to 1.0. Try high values (e.g., 0.9) for a more “creative” response and the default 0.0 for a well defined and repeatable answer. It is advised to use either temperature, top_k, or top_p, but not all three at the same time. If a combination of temperature, top_k, or top_p is used, rescaling of logits with temperature is performed first, then top_k is applied, and top_p follows last.
top_k: Option<u32>
Introduces random sampling for generated tokens by randomly selecting the next token from the k most likely options. A value larger than 1 encourages the model to be more creative. Set to 0 if repeatable output is desired. It is advised to use either temperature, top_k, or top_p, but not all three at the same time. If a combination of temperature, top_k, or top_p is used, rescaling of logits with temperature is performed first, then top_k is applied, and top_p follows last.
top_p: Option<f64>
Introduces random sampling for generated tokens by randomly selecting the next token from the smallest possible set of tokens whose cumulative probability exceeds the probability top_p. Set to 0.0 if repeatable output is desired. It is advised to use either temperature, top_k, or top_p, but not all three at the same time. If a combination of temperature, top_k, or top_p is used, rescaling of logits with temperature is performed first, then top_k is applied, and top_p follows last.
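As a rough illustration of this documented order, consider the following sketch. It is hypothetical, not part of this crate, and the server's actual implementation is not public; it only mirrors the stated sequence of temperature rescaling, then top_k, then top_p.

// Hypothetical sketch of the documented sampling order; not this crate's API.
// Assumes temperature > 0 and logits indexed by token id.
fn candidate_tokens(
    logits: &[f64],
    temperature: f64,
    top_k: usize,
    top_p: f64,
) -> Vec<(usize, f64)> {
    // 1. Temperature: rescale logits, then softmax into probabilities.
    let scaled: Vec<f64> = logits.iter().map(|l| l / temperature).collect();
    let max = scaled.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let sum: f64 = scaled.iter().map(|l| (l - max).exp()).sum();
    let mut probs: Vec<(usize, f64)> = scaled
        .iter()
        .map(|l| (l - max).exp() / sum)
        .enumerate()
        .collect();

    // 2. top_k: keep only the k most likely tokens.
    probs.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    probs.truncate(top_k);

    // 3. top_p: keep the smallest prefix whose cumulative probability
    //    exceeds top_p; the next token is then sampled from this set.
    let mut cumulative = 0.0;
    let mut cutoff = probs.len();
    for (i, &(_, p)) in probs.iter().enumerate() {
        cumulative += p;
        if cumulative > top_p {
            cutoff = i + 1;
            break;
        }
    }
    probs.truncate(cutoff);
    probs
}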
presence_penalty: Option<f64>
The presence penalty reduces the likelihood of generating tokens that are already present in the generated text (repetition_penalties_include_completion=true) or, respectively, in the prompt (repetition_penalties_include_prompt=true). The presence penalty is independent of the number of occurrences. Increase the value to reduce the likelihood of repeating text.
An operation like the following is applied: logits[t] -> logits[t] - 1 * penalty, where logits[t] is the logit for any given token t. Note that the formula is independent of the number of times that a token appears.
frequency_penalty: Option<f64>
The frequency penalty reduces the likelihood of generating tokens that are already present in the generated text (repetition_penalties_include_completion=true) or, respectively, in the prompt (repetition_penalties_include_prompt=true). The frequency penalty depends on the number of occurrences of a token.
An operation like the following is applied: logits[t] -> logits[t] - count[t] * penalty, where logits[t] is the logit for any given token t and count[t] is the number of times that token appears.
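To make the two formulas concrete, here is a small hypothetical sketch restating the additive case; the multiplicative variants selected by the use_multiplicative_* flags are not shown.

// Hypothetical illustration of the documented additive penalties.
// `count` is how often token t occurs in the penalized context (prompt
// and/or completion, per the repetition_penalties_include_* flags).
fn penalized_logit(logit: f64, count: u32, presence: f64, frequency: f64) -> f64 {
    if count == 0 {
        return logit; // unseen tokens are not penalized
    }
    // Presence penalty: subtracted once, independent of the count.
    // Frequency penalty: scales with the number of occurrences.
    logit - 1.0 * presence - count as f64 * frequency
}

// Example: logit 2.0, token seen 3 times, presence 0.5, frequency 0.2:
// 2.0 - 0.5 - 3.0 * 0.2 = 0.9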
sequence_penalty: Option<f64>
Increasing the sequence penalty reduces the likelihood of reproducing token sequences that already appear in the prompt (if repetition_penalties_include_prompt is true) and in the prior completion.
sequence_penalty_min_length: Option<i32>
Minimum number of tokens to be considered as a sequence.
repetition_penalties_include_prompt: Option<bool>
Flag deciding whether the presence and frequency penalties are updated from tokens in the prompt
repetition_penalties_include_completion: Option<bool>
Flag deciding whether the presence and frequency penalties are updated from tokens in the completion
use_multiplicative_presence_penalty: Option<bool>
Flag deciding whether the presence penalty is applied multiplicatively (true) or additively (false). This changes the formula stated for the presence penalty.
use_multiplicative_frequency_penalty: Option<bool>
Flag deciding whether the frequency penalty is applied multiplicatively (true) or additively (false). This changes the formula stated for the frequency penalty.
use_multiplicative_sequence_penalty: Option<bool>
Flag deciding whether the sequence penalty is applied multiplicatively (true) or additively (false).
penalty_exceptions: Option<Vec<String>>
List of strings that may be generated without penalty, regardless of other penalty settings. By default, we will also include any stop_sequences you have set, since completion performance can be degraded if expected stop sequences are penalized. You can disable this behavior by setting penalty_exceptions_include_stop_sequences to false.
penalty_bias: Option<String>
All tokens in this text will be used in addition to the already penalized tokens for repetition penalties. These consist of the already generated completion tokens and the prompt tokens, if repetition_penalties_include_prompt is set to true.
penalty_exceptions_include_stop_sequences: Option<bool>
By default we include all stop_sequences in penalty_exceptions, so as not to penalise the presence of stop sequences that are present in few-shot prompts to give structure to your completions. You can set this to false if you do not want this behaviour. See the description of penalty_exceptions for more information on what penalty_exceptions are used for.
best_of: Option<u32>
If a value is given, the number of best_of completions will be generated on the server side. The completion with the highest log probability per token is returned. If the parameter n is greater than 1, more than one (n) completions will be returned. best_of must be strictly greater than n.
n: Option<u32>
The number of completions to return. If argmax sampling is used (temperature, top_k, top_p are all default) the same completions will be produced. This parameter should only be increased if random sampling is used.
log_probs: Option<i32>
Number of top log probabilities for each token generated. Log probabilities can be used in downstream tasks or to assess the model’s certainty when producing tokens. No log probabilities are returned if set to None. Log probabilities of generated tokens are returned if set to 0. Log probabilities of generated tokens and top n log probabilities are returned if set to n.
stop_sequences: Option<Vec<String>>
List of strings that will stop generation if they are generated. Stop sequences may be helpful in structured texts. For example, in a question-answering scenario a text may consist of lines starting with either “Question: “ or “Answer: “ (alternating). After producing an answer, the model is likely to generate “Question: “ again. “Question: “ may therefore be used as a stop sequence to keep the model from generating more questions and instead restrict text generation to the answers.
tokens: Option<bool>
Flag indicating whether the individual tokens of the completion should be returned (true) or whether the generated text (i.e. the completion) alone is sufficient (false).
raw_completion: Option<bool>
Setting this parameter to true forces the raw completion of the model to be returned. For some models, we may optimize the completion that was generated by the model and return the optimized completion in the completion field of the CompletionResponse. The raw completion, if returned, will contain the un-optimized completion. Setting tokens to true or log_probs to any value will also trigger the raw completion to be returned.
disable_optimizations: Option<bool>
We continually research optimal ways to work with our models. By default, we apply these optimizations to both your prompt and completion for you. Our goal is to improve your results while using our API, but you can always pass disable_optimizations: true and we will leave your prompt and completion untouched.
completion_bias_inclusion: Option<Vec<String>>
Bias the completion to only generate options within this list; all other tokens are disregarded at sampling.
Note that strings in the inclusion list must not be prefixes of strings in the exclusion list, and vice versa.
completion_bias_inclusion_first_token_only: Option<bool>
Only consider the first token for the completion_bias_inclusion.
completion_bias_exclusion: Option<Vec<String>>
Bias the completion to NOT generate options within this list; all other tokens are unaffected in sampling.
Note that strings in the inclusion list must not be prefixes of strings in the exclusion list, and vice versa.
completion_bias_exclusion_first_token_only: Option<bool>
Only consider the first token for the completion_bias_exclusion.
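For instance, a classification-style request might constrain the completion to a fixed label set. A minimal sketch, assuming a CompletionRequest has already been constructed (the label strings are illustrative):

// Hypothetical sketch: restrict sampling to a fixed label set. The builder
// methods listed under Implementations consume and return the request.
fn constrain_to_labels(request: CompletionRequest) -> CompletionRequest {
    request
        .completion_bias_inclusion(vec!["positive".to_owned(), "negative".to_owned()])
        .completion_bias_inclusion_first_token_only(false)
}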
contextual_control_threshold: Option<f64>
If set to None, attention control parameters only apply to those tokens that have explicitly been set in the request. If set to a non-None value, we apply the control parameters to similar tokens as well. Controls that have been applied to one token will then be applied to all other tokens that have at least the similarity score defined by this parameter. The similarity score is the cosine similarity of token embeddings.
control_log_additive: Option<bool>
true: apply controls on prompt items by adding log(control_factor) to attention scores.
false: apply controls on prompt items by (attention_scores - -attention_scores.min(-1)) * control_factor.
logit_bias: Option<HashMap<i32, f32>>
The logit bias allows you to influence the likelihood of generating tokens. A dictionary mapping token ids (int) to a bias (float) can be provided. This bias is added to the logits as generated by the model.
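A small sketch of building such a map (the token ids below are made up; real ids depend on the model's tokenizer):

use std::collections::HashMap;

// Hypothetical token ids; look the real ids up with the model's tokenizer.
fn example_bias() -> HashMap<i32, f32> {
    let mut bias = HashMap::new();
    bias.insert(4711, 2.5); // raise the logit of this token
    bias.insert(815, -100.0); // effectively suppress this token
    bias
}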
Implementations
impl CompletionRequest
pub fn minimum_tokens(self, minimum_tokens: u32) -> Self
pub fn echo(self, echo: bool) -> Self
pub fn temperature(self, temperature: f64) -> Self
pub fn top_k(self, top_k: u32) -> Self
pub fn top_p(self, top_p: f64) -> Self
pub fn presence_penalty(self, presence_penalty: f64) -> Self
pub fn frequency_penalty(self, frequency_penalty: f64) -> Self
pub fn sequence_penalty(self, sequence_penalty: f64) -> Self
pub fn sequence_penalty_min_length(self, sequence_penalty_min_length: i32) -> Self
pub fn repetition_penalties_include_prompt(self, repetition_penalties_include_prompt: bool) -> Self
pub fn repetition_penalties_include_completion(self, repetition_penalties_include_completion: bool) -> Self
pub fn use_multiplicative_presence_penalty(self, use_multiplicative_presence_penalty: bool) -> Self
pub fn use_multiplicative_frequency_penalty(self, use_multiplicative_frequency_penalty: bool) -> Self
pub fn use_multiplicative_sequence_penalty(self, use_multiplicative_sequence_penalty: bool) -> Self
pub fn penalty_exceptions(self, penalty_exceptions: Vec<String>) -> Self
pub fn penalty_bias(self, penalty_bias: String) -> Self
pub fn penalty_exceptions_include_stop_sequences(self, penalty_exceptions_include_stop_sequences: bool) -> Self
pub fn best_of(self, best_of: u32) -> Self
pub fn n(self, n: u32) -> Self
pub fn log_probs(self, log_probs: i32) -> Self
pub fn stop_sequences(self, stop_sequences: Vec<String>) -> Self
pub fn tokens(self, tokens: bool) -> Self
pub fn raw_completion(self, raw_completion: bool) -> Self
pub fn disable_optimizations(self, disable_optimizations: bool) -> Self
pub fn completion_bias_inclusion(self, completion_bias_inclusion: Vec<String>) -> Self
pub fn completion_bias_inclusion_first_token_only(self, completion_bias_inclusion_first_token_only: bool) -> Self
pub fn completion_bias_exclusion(self, completion_bias_exclusion: Vec<String>) -> Self
pub fn completion_bias_exclusion_first_token_only(self, completion_bias_exclusion_first_token_only: bool) -> Self
pub fn contextual_control_threshold(self, contextual_control_threshold: f64) -> Self
pub fn control_log_additive(self, control_log_additive: bool) -> Self
pub fn logit_bias(self, logit_bias: HashMap<i32, f32>) -> Self
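Each of these builder methods consumes self and returns Self, so optional fields can be chained onto a struct literal that sets the required fields. A minimal sketch, assuming a Prompt::from_text constructor (check the Prompt docs for the actual constructors) and the Default impl listed below:

// Hedged sketch: Prompt::from_text is assumed and may differ from the
// crate's actual Prompt API. Required fields go in the struct literal;
// the remaining Option fields default to None via ..Default::default().
fn example_request() -> CompletionRequest {
    CompletionRequest {
        model: "luminous-base".to_owned(),
        prompt: Prompt::from_text("An apple a day"),
        maximum_tokens: 64,
        ..Default::default()
    }
    .temperature(0.7)
    .top_k(50)
    .stop_sequences(vec!["\n".to_owned()])
}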
Trait Implementations
impl Debug for CompletionRequest
impl Default for CompletionRequest
fn default() -> CompletionRequest
Auto Trait Implementations
impl Freeze for CompletionRequest
impl RefUnwindSafe for CompletionRequest
impl Send for CompletionRequest
impl Sync for CompletionRequest
impl Unpin for CompletionRequest
impl UnwindSafe for CompletionRequest
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true; otherwise converts self into a Right variant.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true; otherwise converts self into a Right variant.
impl<T> Pointable for T
impl<R, P> ReadPrimitive<R> for P
fn read_from_little_endian(read: &mut R) -> Result<Self, Error>
Same as ReadEndian::read_from_little_endian().