pub struct CompletionRequest {
pub model: String,
pub hosting: Option<Hosting>,
pub prompt: Prompt,
pub maximum_tokens: u32,
pub minimum_tokens: Option<u32>,
pub echo: Option<bool>,
pub temperature: Option<f64>,
pub top_k: Option<u32>,
pub top_p: Option<f64>,
pub presence_penalty: Option<f64>,
pub frequency_penalty: Option<f64>,
pub sequence_penalty: Option<f64>,
pub sequence_penalty_min_length: Option<i32>,
pub repetition_penalties_include_prompt: Option<bool>,
pub repetition_penalties_include_completion: Option<bool>,
pub use_multiplicative_presence_penalty: Option<bool>,
pub use_multiplicative_frequency_penalty: Option<bool>,
pub use_multiplicative_sequence_penalty: Option<bool>,
pub penalty_exceptions: Option<Vec<String>>,
pub penalty_bias: Option<String>,
pub penalty_exceptions_include_stop_sequences: Option<bool>,
pub best_of: Option<u32>,
pub n: Option<u32>,
pub log_probs: Option<i32>,
pub stop_sequences: Option<Vec<String>>,
pub tokens: Option<bool>,
pub raw_completion: Option<bool>,
pub disable_optimizations: Option<bool>,
pub completion_bias_inclusion: Option<Vec<String>>,
pub completion_bias_inclusion_first_token_only: Option<bool>,
pub completion_bias_exclusion: Option<Vec<String>>,
pub completion_bias_exclusion_first_token_only: Option<bool>,
pub contextual_control_threshold: Option<f64>,
pub control_log_additive: Option<bool>,
pub logit_bias: Option<HashMap<i32, f32>>,
}
Fields
model: String
The name of the model from the Luminous model family, e.g. "luminous-base".
Models and their respective architectures can differ in parameter size and capabilities.
The most recent version of the model is always used. The model output contains information
about the model version.
hosting: Option<Hosting>
Determines in which datacenters the request may be processed. You can either set the parameter to “aleph-alpha” or omit it (defaulting to None).
Not setting this value, or setting it to None, gives us maximal flexibility in processing your request in our own datacenters and on servers hosted with other providers. Choose this option for maximal availability.
Setting it to “aleph-alpha” allows us to only process the request in our own datacenters. Choose this option for maximal data privacy.
prompt: Prompt
Prompt to complete. The supported modalities depend on the model.
maximum_tokens: u32
Limits the number of tokens generated for the completion.
minimum_tokens: Option<u32>
Generate at least this number of tokens before an end-of-text token is generated. (default: 0)
echo: Option<bool>
Echo the prompt in the completion. This may be especially helpful when log_probs is set to return logprobs for the prompt.
temperature: Option<f64>
A higher sampling temperature encourages the model to produce less probable outputs (“be more creative”). Values are expected in a range from 0.0 to 1.0. Try high values (e.g., 0.9) for a more “creative” response and the default 0.0 for a well defined and repeatable answer. It is advised to use either temperature, top_k, or top_p, but not all three at the same time. If a combination of temperature, top_k, or top_p is used, rescaling of logits with temperature is performed first, then top_k is applied, and top_p follows last.
top_k: Option<u32>
Introduces random sampling for generated tokens by randomly selecting the next token from the k most likely options. A value larger than 1 encourages the model to be more creative. Set to 0 if repeatable output is desired. It is advised to use either temperature, top_k, or top_p, but not all three at the same time. If a combination of temperature, top_k, or top_p is used, rescaling of logits with temperature is performed first, then top_k is applied, and top_p follows last.
top_p: Option<f64>
Introduces random sampling for generated tokens by randomly selecting the next token from the smallest possible set of tokens whose cumulative probability exceeds the probability top_p. Set to 0.0 if repeatable output is desired. It is advised to use either temperature, top_k, or top_p, but not all three at the same time. If a combination of temperature, top_k, or top_p is used, rescaling of logits with temperature is performed first, then top_k is applied, and top_p follows last.
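As a rough illustration of this documented order, consider the following sketch. It is hypothetical, not part of this crate, and the server's actual implementation is not public; it only mirrors the stated sequence of temperature rescaling, then top_k, then top_p.

// Hypothetical sketch of the documented sampling order; not this crate's API.
// Assumes temperature > 0 and logits indexed by token id.
fn candidate_tokens(
    logits: &[f64],
    temperature: f64,
    top_k: usize,
    top_p: f64,
) -> Vec<(usize, f64)> {
    // 1. Temperature: rescale logits, then softmax into probabilities.
    let scaled: Vec<f64> = logits.iter().map(|l| l / temperature).collect();
    let max = scaled.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let sum: f64 = scaled.iter().map(|l| (l - max).exp()).sum();
    let mut probs: Vec<(usize, f64)> = scaled
        .iter()
        .map(|l| (l - max).exp() / sum)
        .enumerate()
        .collect();

    // 2. top_k: keep only the k most likely tokens.
    probs.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    probs.truncate(top_k);

    // 3. top_p: keep the smallest prefix whose cumulative probability
    //    exceeds top_p; the next token is then sampled from this set.
    let mut cumulative = 0.0;
    let mut cutoff = probs.len();
    for (i, &(_, p)) in probs.iter().enumerate() {
        cumulative += p;
        if cumulative > top_p {
            cutoff = i + 1;
            break;
        }
    }
    probs.truncate(cutoff);
    probs
}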
presence_penalty: Option<f64>
The presence penalty reduces the likelihood of generating tokens that are already present in the generated text (repetition_penalties_include_completion=true) or, respectively, in the prompt (repetition_penalties_include_prompt=true). The presence penalty is independent of the number of occurrences. Increase the value to reduce the likelihood of repeating text.
An operation like the following is applied: logits[t] -> logits[t] - 1 * penalty, where logits[t] is the logit for any given token t. Note that the formula is independent of the number of times that a token appears.
frequency_penalty: Option<f64>
The frequency penalty reduces the likelihood of generating tokens that are already present in the generated text (repetition_penalties_include_completion=true) or, respectively, in the prompt (repetition_penalties_include_prompt=true). The frequency penalty depends on the number of occurrences of a token.
An operation like the following is applied: logits[t] -> logits[t] - count[t] * penalty, where logits[t] is the logit for any given token t and count[t] is the number of times that token appears.
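To make the two formulas concrete, here is a small hypothetical sketch restating the additive case; the multiplicative variants selected by the use_multiplicative_* flags are not shown.

// Hypothetical illustration of the documented additive penalties.
// `count` is how often token t occurs in the penalized context (prompt
// and/or completion, per the repetition_penalties_include_* flags).
fn penalized_logit(logit: f64, count: u32, presence: f64, frequency: f64) -> f64 {
    if count == 0 {
        return logit; // unseen tokens are not penalized
    }
    // Presence penalty: subtracted once, independent of the count.
    // Frequency penalty: scales with the number of occurrences.
    logit - 1.0 * presence - count as f64 * frequency
}

// Example: logit 2.0, token seen 3 times, presence 0.5, frequency 0.2:
// 2.0 - 0.5 - 3.0 * 0.2 = 0.9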
sequence_penalty: Option<f64>
Increasing the sequence penalty reduces the likelihood of reproducing token sequences that already appear in the prompt (if repetition_penalties_include_prompt is true) and in the prior completion.
sequence_penalty_min_length: Option<i32>
Minimum number of tokens to be considered as a sequence.
repetition_penalties_include_prompt: Option<bool>
Flag deciding whether the presence and frequency penalties are updated from tokens in the prompt
repetition_penalties_include_completion: Option<bool>
Flag deciding whether the presence and frequency penalties are updated from tokens in the completion
use_multiplicative_presence_penalty: Option<bool>
Flag deciding whether the presence penalty is applied multiplicatively (true) or additively (false). This changes the formula stated for the presence penalty.
use_multiplicative_frequency_penalty: Option<bool>
Flag deciding whether the frequency penalty is applied multiplicatively (true) or additively (false). This changes the formula stated for the frequency penalty.
use_multiplicative_sequence_penalty: Option<bool>
Flag deciding whether the sequence penalty is applied multiplicatively (true) or additively (false).
penalty_exceptions: Option<Vec<String>>
List of strings that may be generated without penalty, regardless of other penalty settings. By default, we will also include any stop_sequences you have set, since completion performance can be degraded if expected stop sequences are penalized. You can disable this behavior by setting penalty_exceptions_include_stop_sequences to false.
penalty_bias: Option<String>
All tokens in this text will be used in addition to the already penalized tokens for repetition penalties. These consist of the already generated completion tokens and the prompt tokens, if repetition_penalties_include_prompt is set to true.
penalty_exceptions_include_stop_sequences: Option<bool>
By default we include all stop_sequences in penalty_exceptions, so as not to penalise the presence of stop sequences that are present in few-shot prompts to give structure to your completions. You can set this to false if you do not want this behaviour. See the description of penalty_exceptions for more information on what penalty_exceptions are used for.
best_of: Option<u32>
If a value is given, the number of best_of completions will be generated on the server side. The completion with the highest log probability per token is returned. If the parameter n is greater than 1, more than one (n) completions will be returned. best_of must be strictly greater than n.
n: Option<u32>
The number of completions to return. If argmax sampling is used (temperature, top_k, top_p are all default) the same completions will be produced. This parameter should only be increased if random sampling is used.
log_probs: Option<i32>
Number of top log probabilities for each token generated. Log probabilities can be used in downstream tasks or to assess the model’s certainty when producing tokens. No log probabilities are returned if set to None. Log probabilities of generated tokens are returned if set to 0. Log probabilities of generated tokens and top n log probabilities are returned if set to n.
stop_sequences: Option<Vec<String>>
List of strings that will stop generation if they are generated. Stop sequences may be helpful in structured texts. For example, in a question-answering scenario a text may consist of lines starting with either “Question: “ or “Answer: “ (alternating). After producing an answer, the model is likely to generate “Question: “ again. “Question: “ may therefore be used as a stop sequence to keep the model from generating more questions and instead restrict text generation to the answers.
tokens: Option<bool>
Flag indicating whether the individual tokens of the completion should be returned (true) or whether the generated text (i.e. the completion) alone is sufficient (false).
raw_completion: Option<bool>
Setting this parameter to true forces the raw completion of the model to be returned. For some models, we may optimize the completion that was generated by the model and return the optimized completion in the completion field of the CompletionResponse. The raw completion, if returned, will contain the un-optimized completion. Setting tokens to true or log_probs to any value will also trigger the raw completion to be returned.
disable_optimizations: Option<bool>
We continually research optimal ways to work with our models. By default, we apply these optimizations to both your prompt and completion for you. Our goal is to improve your results while using our API, but you can always pass disable_optimizations: true and we will leave your prompt and completion untouched.
completion_bias_inclusion: Option<Vec<String>>
Bias the completion to only generate options within this list; all other tokens are disregarded at sampling.
Note that strings in the inclusion list must not be prefixes of strings in the exclusion list, and vice versa.
completion_bias_inclusion_first_token_only: Option<bool>
Only consider the first token for the completion_bias_inclusion.
completion_bias_exclusion: Option<Vec<String>>
Bias the completion to NOT generate options within this list; all other tokens are unaffected in sampling.
Note that strings in the inclusion list must not be prefixes of strings in the exclusion list, and vice versa.
completion_bias_exclusion_first_token_only: Option<bool>
Only consider the first token for the completion_bias_exclusion.
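For instance, a classification-style request might constrain the completion to a fixed label set. A minimal sketch, assuming a CompletionRequest has already been constructed (the label strings are illustrative):

// Hypothetical sketch: restrict sampling to a fixed label set. The builder
// methods listed under Implementations consume and return the request.
fn constrain_to_labels(request: CompletionRequest) -> CompletionRequest {
    request
        .completion_bias_inclusion(vec!["positive".to_owned(), "negative".to_owned()])
        .completion_bias_inclusion_first_token_only(false)
}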
contextual_control_threshold: Option<f64>
If set to None, attention control parameters only apply to those tokens that have explicitly been set in the request. If set to a non-None value, we apply the control parameters to similar tokens as well. Controls that have been applied to one token will then be applied to all other tokens that have at least the similarity score defined by this parameter. The similarity score is the cosine similarity of token embeddings.
control_log_additive: Option<bool>
true: apply controls on prompt items by adding log(control_factor) to attention scores.
false: apply controls on prompt items by (attention_scores - -attention_scores.min(-1)) * control_factor.
logit_bias: Option<HashMap<i32, f32>>
The logit bias allows you to influence the likelihood of generating tokens. A dictionary mapping token ids (int) to a bias (float) can be provided. This bias is added to the logits as generated by the model.
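A small sketch of building such a map (the token ids below are made up; real ids depend on the model's tokenizer):

use std::collections::HashMap;

// Hypothetical token ids; look the real ids up with the model's tokenizer.
fn example_bias() -> HashMap<i32, f32> {
    let mut bias = HashMap::new();
    bias.insert(4711, 2.5); // raise the logit of this token
    bias.insert(815, -100.0); // effectively suppress this token
    bias
}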
Implementations
impl CompletionRequest
pub fn minimum_tokens(self, minimum_tokens: u32) -> Self
pub fn echo(self, echo: bool) -> Self
pub fn temperature(self, temperature: f64) -> Self
pub fn top_k(self, top_k: u32) -> Self
pub fn top_p(self, top_p: f64) -> Self
pub fn presence_penalty(self, presence_penalty: f64) -> Self
pub fn frequency_penalty(self, frequency_penalty: f64) -> Self
pub fn sequence_penalty(self, sequence_penalty: f64) -> Self
pub fn sequence_penalty_min_length(self, sequence_penalty_min_length: i32) -> Self
pub fn repetition_penalties_include_prompt(self, repetition_penalties_include_prompt: bool) -> Self
pub fn repetition_penalties_include_completion(self, repetition_penalties_include_completion: bool) -> Self
pub fn use_multiplicative_presence_penalty(self, use_multiplicative_presence_penalty: bool) -> Self
pub fn use_multiplicative_frequency_penalty(self, use_multiplicative_frequency_penalty: bool) -> Self
pub fn use_multiplicative_sequence_penalty(self, use_multiplicative_sequence_penalty: bool) -> Self
pub fn penalty_exceptions(self, penalty_exceptions: Vec<String>) -> Self
pub fn penalty_bias(self, penalty_bias: String) -> Self
pub fn penalty_exceptions_include_stop_sequences(self, penalty_exceptions_include_stop_sequences: bool) -> Self
pub fn best_of(self, best_of: u32) -> Self
pub fn n(self, n: u32) -> Self
pub fn log_probs(self, log_probs: i32) -> Self
pub fn stop_sequences(self, stop_sequences: Vec<String>) -> Self
pub fn tokens(self, tokens: bool) -> Self
pub fn raw_completion(self, raw_completion: bool) -> Self
pub fn disable_optimizations(self, disable_optimizations: bool) -> Self
pub fn completion_bias_inclusion(self, completion_bias_inclusion: Vec<String>) -> Self
pub fn completion_bias_inclusion_first_token_only(self, completion_bias_inclusion_first_token_only: bool) -> Self
pub fn completion_bias_exclusion(self, completion_bias_exclusion: Vec<String>) -> Self
pub fn completion_bias_exclusion_first_token_only(self, completion_bias_exclusion_first_token_only: bool) -> Self
pub fn contextual_control_threshold(self, contextual_control_threshold: f64) -> Self
pub fn control_log_additive(self, control_log_additive: bool) -> Self
pub fn logit_bias(self, logit_bias: HashMap<i32, f32>) -> Self
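Each of these builder methods consumes self and returns Self, so optional fields can be chained onto a struct literal that sets the required fields. A minimal sketch, assuming a Prompt::from_text constructor (check the Prompt docs for the actual constructors) and the Default impl listed below:

// Hedged sketch: Prompt::from_text is assumed and may differ from the
// crate's actual Prompt API. Required fields go in the struct literal;
// the remaining Option fields default to None via ..Default::default().
fn example_request() -> CompletionRequest {
    CompletionRequest {
        model: "luminous-base".to_owned(),
        prompt: Prompt::from_text("An apple a day"),
        maximum_tokens: 64,
        ..Default::default()
    }
    .temperature(0.7)
    .top_k(50)
    .stop_sequences(vec!["\n".to_owned()])
}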
Trait Implementations
impl Debug for CompletionRequest
impl Default for CompletionRequest
fn default() -> CompletionRequest
Auto Trait Implementations
impl Freeze for CompletionRequest
impl RefUnwindSafe for CompletionRequest
impl Send for CompletionRequest
impl Sync for CompletionRequest
impl Unpin for CompletionRequest
impl UnwindSafe for CompletionRequest
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true; otherwise converts self into a Right variant.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true; otherwise converts self into a Right variant.
impl<T> Pointable for T
impl<R, P> ReadPrimitive<R> for P
fn read_from_little_endian(read: &mut R) -> Result<Self, Error>
Same as ReadEndian::read_from_little_endian().