vllora 0.1.23 - Docs.rs

- model: o1-mini
  model_provider: openai
  inference_provider:
    provider: openai
    model_name: o1-mini
    endpoint: null
  price:
    per_input_token: 3.0
    per_output_token: 12.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 128000
  description: The o1 series of large language models are trained with reinforcement learning to perform complex reasoning. o1 models think before they answer, producing a long internal chain of thought before responding to the user. Faster and cheaper reasoning model particularly good at coding, math, and science
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
- model: o1-preview
  model_provider: openai
  inference_provider:
    provider: openai
    model_name: o1-preview
    endpoint: null
  price:
    per_input_token: 15.0
    per_output_token: 7.5
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: The o1 series of large language models are trained with reinforcement learning to perform complex reasoning. o1 models think before they answer, producing a long internal chain of thought before responding to the user. Reasoning model designed to solve hard problems across domains
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
- model: gpt-4o
  model_provider: openai
  inference_provider:
    provider: openai
    model_name: gpt-4o
    endpoint: null
  price:
    per_input_token: 2.5
    per_output_token: 10.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: High-intelligence flagship model for complex, multi-step tasks. GPT-4o is cheaper and faster than GPT-4 Turbo. It is multimodal (accepting text or image inputs and outputting text), and it has the same high intelligence as GPT-4 Turbo but is much more efficient—it generates text 2x faster and is 50% cheaper. Additionally, GPT-4o has the best vision and performance across non-English languages of any of our models. GPT-4o is available in the OpenAI API to paying customers.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gpt-4o-mini
  model_provider: openai
  inference_provider:
    provider: openai
    model_name: gpt-4o-mini
    endpoint: null
  price:
    per_input_token: 0.15
    per_output_token: 0.6
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: GPT-4o mini (o for omni) is a fast, affordable small model for focused tasks. It accepts both text and image inputs, and produces text outputs (including Structured Outputs). It is ideal for fine-tuning, and model outputs from a larger model like GPT-4o can be distilled to GPT-4o-mini to produce similar results at lower cost and latency.The knowledge cutoff for GPT-4o-mini models is October, 2023.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-3-5-sonnet-20240620
  model_provider: anthropic
  inference_provider:
    provider: anthropic
    model_name: claude-3-5-sonnet-20240620
    endpoint: null
  price:
    per_input_token: 3.0
    per_output_token: 15.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 200000
  description: Claude most intelligent model. Highest level of intelligence and capability
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemini-1.5-pro-latest
  model_provider: gemini
  inference_provider:
    provider: gemini
    model_name: gemini-1.5-pro-latest
    endpoint: null
  price:
    per_input_token: 2.5
    per_output_token: 10.0
    valid_from: null
  input_formats:
  - text
  - image
  - audio
  - video
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 2000000
  description: lightweight model, smaller and faster, lower price + higher rate limits + Lower latency on small prompts (compared to 1.5 Flash)
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemini-2.0-flash-exp
  model_provider: google
  inference_provider:
    provider: gemini
    model_name: gemini-2.0-flash-exp
    endpoint: null
  price:
    per_input_token: 2.0
    per_output_token: 2.0
    valid_from: null
  input_formats:
  - text
  - image
  - audio
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 1048576
  description: Next generation features, speed, and multimodal generation for a diverse variety of tasks
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama3-2-3b-instruct-v1.0
  model_provider: meta
  inference_provider:
    provider: bedrock
    model_name: llama3-2-3b-instruct-v1.0
    endpoint: null
  price:
    per_input_token: 0.15
    per_output_token: 0.15
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 128000
  description: Text-only lightweight model built to deliver highly accurate and relevant results. Designed for applications requiring low-latency inferencing and limited computational resources. Ideal for query and prompt rewriting, mobile AI-powered writing assistants, and customer service applications, particularly on edge devices where its efficiency and low latency enable seamless integration into various applications, including mobile AI-powered writing assistants and customer service chatbots.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mistral-7b-instruct-v0.2
  model_provider: mistralai
  inference_provider:
    provider: bedrock
    model_name: mistral-7b-instruct-v0.2
    endpoint: null
  price:
    per_input_token: 0.15
    per_output_token: 0.2
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32000
  description: A 7B dense Transformer, fast-deployed and easily customizable. Small, yet powerful for a variety of use cases.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: grok-2
  model_provider: xai
  inference_provider:
    provider: xai
    model_name: grok-2
    endpoint: https://api.x.ai/v1
  price:
    per_input_token: 2.0
    per_output_token: 10.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 131072
  description: Grok-2 is an advanced AI model developed by xAI, designed to provide highly accurate and helpful responses to a wide range of questions, often with a unique perspective on humanity.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: grok-2-vision-1212
  model_provider: xai
  inference_provider:
    provider: xai
    model_name: grok-2-vision-1212
    endpoint: https://api.x.ai/v1
  price:
    per_input_token: 2.0
    per_output_token: 10.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 32768
  description: Grok-2-vision-1212 is an advanced AI model developed by xAI that integrates multimodal capabilities, allowing it to process and understand both text and visual inputs to provide more comprehensive and context-aware responses
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: deepseek-reasoner
  model_provider: deepseek
  inference_provider:
    provider: deepseek
    model_name: deepseek-reasoner
    endpoint: https://api.deepseek.com/v1
  price:
    per_input_token: 0.55
    per_output_token: 2.19
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 64000
  description: DeepSeek-Reasoner is an advanced AI model designed to enhance logical reasoning and problem-solving capabilities, leveraging deep learning techniques to provide accurate and contextually relevant insights across various domains.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    include_reasoning:
      default: false
      description: If the endpoint can return reasoning explicitly, setting this parameter will include reasoning tokens in the response (available in a separate field).
      required: false
      type: boolean
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: deepseek-chat
  model_provider: deepseek
  inference_provider:
    provider: deepseek
    model_name: deepseek-chat
    endpoint: https://api.deepseek.com/v1
  price:
    per_input_token: 0.14
    per_output_token: 0.28
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 64000
  description: DeepSeek-Chat is an advanced conversational AI model designed to provide intelligent
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: dall-e-2
  model_provider: openai
  inference_provider:
    provider: openai
    model_name: dall-e-2
    endpoint: null
  price:
    type_prices:
      standard:
        512x512: 0.018
        1024x1024: 0.02
        256x256: 0.016
    mp_price: 1.23
    valid_from: null
  input_formats:
  - text
  output_formats:
  - image
  capabilities: []
  type: image_generation
  limits:
    max_context_size: 0
  description: DALL·E 2 is an advanced AI model by OpenAI that generates high-quality images from text descriptions, allowing for creative visualizations and edits of images based on user prompts.
  parameters: null
- model: dall-e-3
  model_provider: openai
  inference_provider:
    provider: openai
    model_name: dall-e-3
    endpoint: null
  price:
    type_prices:
      hd:
        1792x1024: 0.12
        1024x1024: 0.08
        1024x1792: 0.12
      standard:
        1024x1024: 0.04
        1024x1792: 0.08
        1792x1024: 0.08
    mp_price: null
    valid_from: null
  input_formats:
  - text
  output_formats:
  - image
  capabilities: []
  type: image_generation
  limits:
    max_context_size: 0
  description: DALL·E 3 is the latest iteration of OpenAI's image generation model, offering even more accurate, detailed, and creative image creation from text prompts, with improved coherence and understanding of complex requests.
  parameters: null
- model: gpt-3.5-turbo
  model_provider: openai
  inference_provider:
    provider: openai
    model_name: gpt-3.5-turbo-0125
    endpoint: null
  price:
    per_input_token: 0.5
    per_output_token: 1.5
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 16385
  description: The latest GPT-3.5 Turbo model with higher accuracy at responding in requested formats and a fix for a bug which caused a text encoding issue for non-English language function calls.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: text-embedding-3-large
  model_provider: openai
  inference_provider:
    provider: openai
    model_name: text-embedding-3-large
    endpoint: null
  price:
    per_input_token: 0.13
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: embeddings
  limits:
    max_context_size: 8191
  description: Most capable embedding model for both english and non-english tasks
  parameters: null
- model: text-embedding-3-small
  model_provider: openai
  inference_provider:
    provider: openai
    model_name: text-embedding-3-small
    endpoint: null
  price:
    per_input_token: 0.02
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: embeddings
  limits:
    max_context_size: 8191
  description: Increased performance over 2nd generation ada embedding model
  parameters: null
- model: text-embedding-ada-002
  model_provider: openai
  inference_provider:
    provider: openai
    model_name: text-embedding-ada-002
    endpoint: null
  price:
    per_input_token: 0.1
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: embeddings
  limits:
    max_context_size: 8192
  description: Most capable 2nd generation embedding model, replacing 16 first generation models
  parameters: null
- model: claude-3-haiku-20240307
  model_provider: anthropic
  inference_provider:
    provider: anthropic
    model_name: claude-3-haiku-20240307
    endpoint: null
  price:
    per_input_token: 0.25
    per_output_token: 1.25
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 200000
  description: Fastest and most compact model for near-instant responsiveness. Quick and accurate targeted performance
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-3-opus-20240229
  model_provider: anthropic
  inference_provider:
    provider: anthropic
    model_name: claude-3-opus-20240229
    endpoint: null
  price:
    per_input_token: 5.0
    per_output_token: 75.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 200000
  description: Powerful model for highly complex tasks. Top-level intelligence, fluency, and understanding
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-3-sonnet-20240229
  model_provider: anthropic
  inference_provider:
    provider: anthropic
    model_name: claude-3-sonnet-20240229
    endpoint: null
  price:
    per_input_token: 3.0
    per_output_token: 15.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 200000
  description: Balance of intelligence and speed. Strong utility, balanced for scaled deployments
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemini-1.5-flash-8b
  model_provider: gemini
  inference_provider:
    provider: gemini
    model_name: gemini-1.5-flash-8b
    endpoint: null
  price:
    per_input_token: 0.075
    per_output_token: 0.3
    valid_from: null
  input_formats:
  - text
  - image
  - audio
  - video
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 1000000
  description: lightweight model, smaller and faster, lower price + higher rate limits + Lower latency on small prompts (compared to 1.5 Flash)
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemini-1.5-flash-latest
  model_provider: gemini
  inference_provider:
    provider: gemini
    model_name: gemini-1.5-flash-latest
    endpoint: null
  price:
    per_input_token: 0.15
    per_output_token: 0.6
    valid_from: null
  input_formats:
  - text
  - image
  - audio
  - video
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 1000000
  description: Fast and versatile performance across a diverse variety of tasks.
  parameters: {}
- model: command-r-plus-v1.0
  model_provider: cohere
  inference_provider:
    provider: bedrock
    model_name: command-r-plus-v1.0
    endpoint: null
  price:
    per_input_token: 3.0
    per_output_token: 15.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: Command R+ is an instruction-following conversational model that performs language tasks at a higher quality, more reliably, and with a longer context than previous models. It is best suited for complex RAG workflows and multi-step tool use.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: command-r-v1.0
  model_provider: cohere
  inference_provider:
    provider: bedrock
    model_name: command-r-v1.0
    endpoint: null
  price:
    per_input_token: 0.5
    per_output_token: 1.5
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: instruction-following conversational model that performs language tasks with high quality, more reliably and with a longer context than our base generative models.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama3-1-70b-instruct-v1.0
  model_provider: meta
  inference_provider:
    provider: bedrock
    model_name: llama3-1-70b-instruct-v1.0
    endpoint: null
  price:
    per_input_token: 0.72
    per_output_token: 0.72
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: Ideal for content creation, conversational AI, language understanding, research development, and enterprise applications. With new latency-optimized inference capabilities available in public preview, this model sets a new performance benchmark for AI solutions that process extensive text inputs, enabling applications to respond more quickly and handle longer queries more efficiently.
  parameters: null
- model: llama3-1-8b-instruct-v1.0
  model_provider: meta
  inference_provider:
    provider: bedrock
    model_name: llama3-1-8b-instruct-v1.0
    endpoint: null
  price:
    per_input_token: 0.22
    per_output_token: 0.22
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: Ideal for limited computational power and resources, faster training times, and edge devices.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama3-2-1b-instruct-v1.0
  model_provider: meta
  inference_provider:
    provider: bedrock
    model_name: llama3-2-1b-instruct-v1.0
    endpoint: null
  price:
    per_input_token: 0.1
    per_output_token: 0.1
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 128000
  description: Text-only lightweight model built to deliver fast and accurate responses. Ideal for edge devices and mobile applications. The model enables on-device AI capabilities while preserving user privacy and minimizing latency.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama3-70b-instruct-v1.0
  model_provider: meta
  inference_provider:
    provider: bedrock
    model_name: llama3-70b-instruct-v1.0
    endpoint: null
  price:
    per_input_token: 2.65
    per_output_token: 3.5
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32000
  description: 'ideal for content creation, conversational AI, language understanding, research development, and enterprise applications. '
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama3-8b-instruct-v1.0
  model_provider: meta
  inference_provider:
    provider: bedrock
    model_name: llama3-8b-instruct-v1.0
    endpoint: null
  price:
    per_input_token: 0.3
    per_output_token: 0.6
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32000
  description: ideal for limited computational power and resources, and edge devices. The model excels at text summarization, text classification, sentiment analysis, and language translation.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mixtral-8x7b-instruct-v0.1
  model_provider: mistral
  inference_provider:
    provider: bedrock
    model_name: mixtral-8x7b-instruct-v0.1
    endpoint: null
  price:
    per_input_token: 0.45
    per_output_token: 0.7
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32000
  description: A 7B sparse Mixture-of-Experts model with stronger capabilities than Mistral AI 7B. Uses 12B active parameters out of 45B total.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: grok-beta
  model_provider: xai
  inference_provider:
    provider: xai
    model_name: grok-beta
    endpoint: https://api.x.ai/v1
  price:
    per_input_token: 5.0
    per_output_token: 15.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 131072
  description: Grok-beta is an experimental AI model developed by xAI, designed to provide insightful, witty, and context-aware responses while continuously learning and improving through user interactions.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: grok-vision-beta
  model_provider: xai
  inference_provider:
    provider: xai
    model_name: grok-vision-beta
    endpoint: https://api.x.ai/v1
  price:
    per_input_token: 5.0
    per_output_token: 15.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 8192
  description: Grok-vision-beta is an advanced AI model developed by xAI that integrates multimodal capabilities, allowing it to process and understand both text and visual inputs to provide more comprehensive and context-aware responses
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: dbrx-instruct
  model_provider: databricks
  inference_provider:
    provider: togetherai
    model_name: databricks/dbrx-instruct
    endpoint: https://api.together.xyz/v1
  price:
    per_input_token: 0.8
    per_output_token: 0.8
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: DBRX Instruct is a model by Databricks, designed for instruction-following tasks and general language understanding.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: deepseek-r1
  model_provider: deepseek
  inference_provider:
    provider: fireworksai
    model_name: accounts/fireworks/models/deepseek-r1
    endpoint: https://api.fireworks.ai/inference/v1
  price:
    per_input_token: 8.0
    per_output_token: 8.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 160000
  description: DeepSeek-Reasoner is an advanced AI model designed to enhance logical reasoning and problem-solving capabilities, leveraging deep learning techniques to provide accurate and contextually relevant insights across various domains.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    include_reasoning:
      default: false
      description: If the endpoint can return reasoning explicitly, setting this parameter will include reasoning tokens in the response (available in a separate field).
      required: false
      type: boolean
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: DeepSeek-R1
  model_provider: deepseek
  inference_provider:
    provider: deepinfra
    model_name: deepseek-ai/DeepSeek-R1
    endpoint: https://api.deepinfra.com/v1/openai
  price:
    per_input_token: 0.75
    per_output_token: 2.4
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 16000
  description: DeepSeek-Reasoner is an advanced AI model designed to enhance logical reasoning and problem-solving capabilities, leveraging deep learning techniques to provide accurate and contextually relevant insights across various domains.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    include_reasoning:
      default: false
      description: If the endpoint can return reasoning explicitly, setting this parameter will include reasoning tokens in the response (available in a separate field).
      required: false
      type: boolean
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: DeepSeek-R1-Distill-Llama-70B
  model_provider: deepseek
  inference_provider:
    provider: deepinfra
    model_name: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
    endpoint: https://api.deepinfra.com/v1/openai
  price:
    per_input_token: 0.23
    per_output_token: 0.69
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131072
  description: DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    include_reasoning:
      default: false
      description: If the endpoint can return reasoning explicitly, setting this parameter will include reasoning tokens in the response (available in a separate field).
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: DeepSeek-R1-Distill-Llama-70B
  model_provider: deepseek
  inference_provider:
    provider: togetherai
    model_name: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
    endpoint: https://api.together.xyz/v1
  price:
    per_input_token: 2.0
    per_output_token: 2.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131072
  description: DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    include_reasoning:
      default: false
      description: If the endpoint can return reasoning explicitly, setting this parameter will include reasoning tokens in the response (available in a separate field).
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: DeepSeek-R1-Distill-Qwen-14B
  model_provider: deepseek
  inference_provider:
    provider: togetherai
    model_name: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
    endpoint: https://api.together.xyz/v1
  price:
    per_input_token: 1.6
    per_output_token: 1.6
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131072
  description: DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    include_reasoning:
      default: false
      description: If the endpoint can return reasoning explicitly, setting this parameter will include reasoning tokens in the response (available in a separate field).
      required: false
      type: boolean
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: DeepSeek-R1-Distill-Qwen-1.5B
  model_provider: deepseek
  inference_provider:
    provider: togetherai
    model_name: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
    endpoint: https://api.together.xyz/v1
  price:
    per_input_token: 0.18
    per_output_token: 0.18
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131072
  description: DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    include_reasoning:
      default: false
      description: If the endpoint can return reasoning explicitly, setting this parameter will include reasoning tokens in the response (available in a separate field).
      required: false
      type: boolean
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: DeepSeek-R1-Distill-Qwen-32B
  model_provider: deepseek
  inference_provider:
    provider: deepinfra
    model_name: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
    endpoint: https://api.deepinfra.com/v1/openai
  price:
    per_input_token: 0.12
    per_output_token: 0.18
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 131072
  description: DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    include_reasoning:
      default: false
      description: If the endpoint can return reasoning explicitly, setting this parameter will include reasoning tokens in the response (available in a separate field).
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: deepseek-v3
  model_provider: deepseek
  inference_provider:
    provider: fireworksai
    model_name: accounts/fireworks/models/deepseek-v3
    endpoint: https://api.fireworks.ai/inference/v1
  price:
    per_input_token: 0.9
    per_output_token: 0.9
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 128000
  description: A strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token from Deepseek.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: DeepSeek-V3
  model_provider: deepseek
  inference_provider:
    provider: deepinfra
    model_name: deepseek-ai/DeepSeek-V3
    endpoint: https://api.deepinfra.com/v1/openai
  price:
    per_input_token: 0.49
    per_output_token: 0.89
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 16000
  description: A strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token from Deepseek.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: DeepSeek-V3
  model_provider: deepseek
  inference_provider:
    provider: togetherai
    model_name: deepseek-ai/DeepSeek-V3
    endpoint: https://api.together.xyz/v1
  price:
    per_input_token: 1.25
    per_output_token: 1.25
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 131072
  description: A strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token from Deepseek.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemma-2-27b-it
  model_provider: google
  inference_provider:
    provider: togetherai
    model_name: google/gemma-2-27b-it
    endpoint: https://api.together.xyz/v1
  price:
    per_input_token: 0.3
    per_output_token: 0.3
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: Gemma 2 offers best-in-class performance, runs at incredible speed across different hardware and easily integrates with other AI tools.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemma-2-9b-it
  model_provider: google
  inference_provider:
    provider: togetherai
    model_name: google/gemma-2-9b-it
    endpoint: https://api.together.xyz/v1
  price:
    per_input_token: 0.3
    per_output_token: 0.3
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: Gemma 2 offers best-in-class performance, runs at incredible speed across different hardware and easily integrates with other AI tools.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: Llama-2-13b-chat-hf
  model_provider: meta
  inference_provider:
    provider: togetherai
    model_name: meta-llama/Llama-2-13b-chat-hf
    endpoint: https://api.together.xyz/v1
  price:
    per_input_token: 0.3
    per_output_token: 0.3
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: LLaMA-2 Chat (13B) is Meta's conversational AI model, designed for engaging and coherent dialogue.
  parameters: null
- model: Llama-3.1-Nemotron-70B-Instruct-HF
  model_provider: nvidia
  inference_provider:
    provider: togetherai
    model_name: nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
    endpoint: https://api.together.xyz/v1
  price:
    per_input_token: 0.9
    per_output_token: 0.9
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA in order to improve the helpfulness of LLM generated responses.
  parameters: null
- model: llama-v3p1-405b-instruct
  model_provider: meta
  inference_provider:
    provider: fireworksai
    model_name: accounts/fireworks/models/llama-v3p1-405b-instruct
    endpoint: https://api.fireworks.ai/inference/v1
  price:
    per_input_token: 3.0
    per_output_token: 3.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: Text-only lightweight model built to deliver fast and accurate responses. Ideal for edge devices and mobile applications. The model enables on-device AI capabilities while preserving user privacy and minimizing latency.
  parameters: null
- model: MythoMax-L2-13b
  model_provider: gryphe
  inference_provider:
    provider: togetherai
    model_name: Gryphe/MythoMax-L2-13b
    endpoint: https://api.together.xyz/v1
  price:
    per_input_token: 0.3
    per_output_token: 0.3
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: MythoMax-L2 (13B) is a model by Gryphe, known for its creative text generation capabilities.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: Nous-Hermes-2-Mixtral-8x7B-DPO
  model_provider: nousresearch
  inference_provider:
    provider: togetherai
    model_name: NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO
    endpoint: https://api.together.xyz/v1
  price:
    per_input_token: 0.9
    per_output_token: 0.9
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: Nous Hermes 2 - Mixtral 8x7B-DPO (46.7B) is a large model by NousResearch, utilizing advanced training techniques for improved performance.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: phi-4
  model_provider: microsoft
  inference_provider:
    provider: deepinfra
    model_name: microsoft/phi-4
    endpoint: https://api.deepinfra.com/v1/openai
  price:
    per_input_token: 0.07
    per_output_token: 0.14
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 16384
  description: Phi-4 is a model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: Qwen2.5-72B-Instruct-Turbo
  model_provider: qwen
  inference_provider:
    provider: togetherai
    model_name: Qwen/Qwen2.5-72B-Instruct-Turbo
    endpoint: https://api.together.xyz/v1
  price:
    per_input_token: 1.2
    per_output_token: 1.2
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: Qwen 2.5 72B Instruct is a large-scale model by Qwen, offering advanced capabilities for complex language tasks.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: Qwen2.5-7B-Instruct-Turbo
  model_provider: qwen
  inference_provider:
    provider: togetherai
    model_name: Qwen/Qwen2.5-7B-Instruct-Turbo
    endpoint: https://api.together.xyz/v1
  price:
    per_input_token: 0.3
    per_output_token: 0.3
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: Qwen 2.5 7B Instruct is a fast and efficient model by Qwen, optimized for quick responses in instruction-following tasks.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: Qwen2.5-Coder-32B-Instruct
  model_provider: qwen
  inference_provider:
    provider: togetherai
    model_name: Qwen/Qwen2.5-Coder-32B-Instruct
    endpoint: https://api.together.xyz/v1
  price:
    per_input_token: 0.8
    per_output_token: 0.8
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 32768
  description: Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen)
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: Qwen2-72B-Instruct
  model_provider: qwen
  inference_provider:
    provider: togetherai
    model_name: Qwen/Qwen2-72B-Instruct
    endpoint: https://api.together.xyz/v1
  price:
    per_input_token: 1.2
    per_output_token: 1.2
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: Qwen 2 Instruct (72B) is a large-scale model by Qwen, offering advanced capabilities for complex language tasks.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: qwen2p5-coder-32b-instruct
  model_provider: qwen
  inference_provider:
    provider: fireworksai
    model_name: accounts/fireworks/models/qwen2p5-coder-32b-instruct
    endpoint: https://api.fireworks.ai/inference/v1
  price:
    per_input_token: 0.9
    per_output_token: 0.9
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen)
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: Qwen/Qwen2.5-Coder-32B-Instruct
  model_provider: qwen
  inference_provider:
    provider: deepinfra
    model_name: Qwen/Qwen2.5-Coder-32B-Instruct
    endpoint: https://api.deepinfra.com/v1/openai
  price:
    per_input_token: 0.07
    per_output_token: 0.16
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 32768
  description: Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: QwQ-32B-Preview
  model_provider: qwen
  inference_provider:
    provider: togetherai
    model_name: Qwen/QwQ-32B-Preview
    endpoint: https://api.together.xyz/v1
  price:
    per_input_token: 1.2
    per_output_token: 1.2
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: QwQ-32B-Preview is an experimental research model developed by the Qwen Team, focused on advancing AI reasoning capabilities.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: SOLAR-10.7B-Instruct-v1.0
  model_provider: upstage
  inference_provider:
    provider: togetherai
    model_name: upstage/SOLAR-10.7B-Instruct-v1.0
    endpoint: https://api.together.xyz/v1
  price:
    per_input_token: 0.3
    per_output_token: 0.3
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: Upstage SOLAR Instruct v1 (11B) is a versatile model by Upstage, focused on following instructions across various domains.
  parameters: null
- model: WizardLM-2-8x22B
  model_provider: microsoft
  inference_provider:
    provider: togetherai
    model_name: microsoft/WizardLM-2-8x22B
    endpoint: https://api.together.xyz/v1
  price:
    per_input_token: 0.3
    per_output_token: 0.3
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 65536
  description: WizardLM-2 8x22B is a large language model developed by Microsoft, known for its advanced capabilities in natural language understanding and generation.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: aion-1.0
  model_provider: aion-labs
  inference_provider:
    provider: openrouter
    model_name: aion-labs/aion-1.0
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 8.0
    per_output_token: 24.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 16384
  description: Aion-1.0 is a multi-model system designed for high performance across various tasks, including reasoning and coding. It is built on DeepSeek-R1, augmented with additional models and techniques such as Tree of Thoughts (ToT) and Mixture of Experts (MoE). It is Aion Lab's most powerful reasoning model.
  parameters:
    include_reasoning:
      default: false
      description: If the endpoint can return reasoning explicitly, setting this parameter will include reasoning tokens in the response (available in a separate field).
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: aion-1.0-mini
  model_provider: aion-labs
  inference_provider:
    provider: openrouter
    model_name: aion-labs/aion-1.0-mini
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.7999999999999999
    per_output_token: 2.4
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 16384
  description: Aion-1.0-Mini 32B parameter model is a distilled version of the DeepSeek-R1 model, designed for strong performance in reasoning domains such as mathematics, coding, and logic. It is a modified variant of a FuseAI model that outperforms R1-Distill-Qwen-32B and R1-Distill-Llama-70B, with benchmark results available on its [Hugging Face page](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview), independently replicated for verification.
  parameters:
    include_reasoning:
      default: false
      description: If the endpoint can return reasoning explicitly, setting this parameter will include reasoning tokens in the response (available in a separate field).
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: aion-rp-llama-3.1-8b
  model_provider: aion-labs
  inference_provider:
    provider: openrouter
    model_name: aion-labs/aion-rp-llama-3.1-8b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.2
    per_output_token: 0.2
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: Aion-RP-Llama-3.1-8B ranks the highest in the character evaluation portion of the RPBench-Auto benchmark, a roleplaying-specific variant of Arena-Hard-Auto, where LLMs evaluate each other’s responses. It is a fine-tuned base model rather than an instruct model, designed to produce more natural and varied writing.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: airoboros-l2-70b
  model_provider: jondurbin
  inference_provider:
    provider: openrouter
    model_name: jondurbin/airoboros-l2-70b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.5
    per_output_token: 0.5
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4000
  description: |-
    A Llama 2 70B fine-tune using synthetic data (the Airoboros dataset).

    Currently based on [jondurbin/airoboros-l2-70b](https://huggingface.co/jondurbin/airoboros-l2-70b-2.2.1), but might get updated in the future.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: chatgpt-4o-latest
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/chatgpt-4o-latest
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 5.0
    per_output_token: 15.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    OpenAI ChatGPT 4o is continually updated by OpenAI to point to the current version of GPT-4o used by ChatGPT. It therefore differs slightly from the API version of [GPT-4o](/models/openai/gpt-4o) in that it has additional RLHF. It is intended for research and evaluation.

    OpenAI notes that this model is not suited for production use-cases as it may be removed or redirected to another model in the future.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-2
  model_provider: anthropic
  inference_provider:
    provider: openrouter
    model_name: anthropic/claude-2
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 8.0
    per_output_token: 24.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 200000
  description: 'Claude 2 delivers advancements in key capabilities for enterprises—including an industry-leading 200K token context window, significant reductions in rates of model hallucination, system prompts and a new beta feature: tool use.'
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-2.0
  model_provider: anthropic
  inference_provider:
    provider: openrouter
    model_name: anthropic/claude-2.0
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 8.0
    per_output_token: 24.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 100000
  description: Anthropic's flagship model. Superior performance on tasks that require complex reasoning. Supports hundreds of pages of text.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-2.0:beta
  model_provider: anthropic
  inference_provider:
    provider: openrouter
    model_name: anthropic/claude-2.0:beta
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 8.0
    per_output_token: 24.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 100000
  description: Anthropic's flagship model. Superior performance on tasks that require complex reasoning. Supports hundreds of pages of text.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-2.1
  model_provider: anthropic
  inference_provider:
    provider: openrouter
    model_name: anthropic/claude-2.1
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 8.0
    per_output_token: 24.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 200000
  description: 'Claude 2 delivers advancements in key capabilities for enterprises—including an industry-leading 200K token context window, significant reductions in rates of model hallucination, system prompts and a new beta feature: tool use.'
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-2.1:beta
  model_provider: anthropic
  inference_provider:
    provider: openrouter
    model_name: anthropic/claude-2.1:beta
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 8.0
    per_output_token: 24.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 200000
  description: 'Claude 2 delivers advancements in key capabilities for enterprises—including an industry-leading 200K token context window, significant reductions in rates of model hallucination, system prompts and a new beta feature: tool use.'
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-2:beta
  model_provider: anthropic
  inference_provider:
    provider: openrouter
    model_name: anthropic/claude-2:beta
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 8.0
    per_output_token: 24.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 200000
  description: 'Claude 2 delivers advancements in key capabilities for enterprises—including an industry-leading 200K token context window, significant reductions in rates of model hallucination, system prompts and a new beta feature: tool use.'
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-3.5-haiku
  model_provider: anthropic
  inference_provider:
    provider: openrouter
    model_name: anthropic/claude-3.5-haiku
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.7999999999999999
    per_output_token: 4.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 200000
  description: |-
    Claude 3.5 Haiku features offers enhanced capabilities in speed, coding accuracy, and tool use. Engineered to excel in real-time applications, it delivers quick response times that are essential for dynamic tasks such as chat interactions and immediate coding suggestions.

    This makes it highly suitable for environments that demand both speed and precision, such as software development, customer service bots, and data management systems.

    This model is currently pointing to [Claude 3.5 Haiku (2024-10-22)](/anthropic/claude-3-5-haiku-20241022).
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-3.5-haiku-20241022
  model_provider: anthropic
  inference_provider:
    provider: openrouter
    model_name: anthropic/claude-3.5-haiku-20241022
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.7999999999999999
    per_output_token: 4.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 200000
  description: |-
    Claude 3.5 Haiku features enhancements across all skill sets including coding, tool use, and reasoning. As the fastest model in the Anthropic lineup, it offers rapid response times suitable for applications that require high interactivity and low latency, such as user-facing chatbots and on-the-fly code completions. It also excels in specialized tasks like data extraction and real-time content moderation, making it a versatile tool for a broad range of industries.

    It does not support image inputs.

    See the launch announcement and benchmark results [here](https://www.anthropic.com/news/3-5-models-and-computer-use)
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-3.5-haiku-20241022:beta
  model_provider: anthropic
  inference_provider:
    provider: openrouter
    model_name: anthropic/claude-3.5-haiku-20241022:beta
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.7999999999999999
    per_output_token: 4.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 200000
  description: |-
    Claude 3.5 Haiku features enhancements across all skill sets including coding, tool use, and reasoning. As the fastest model in the Anthropic lineup, it offers rapid response times suitable for applications that require high interactivity and low latency, such as user-facing chatbots and on-the-fly code completions. It also excels in specialized tasks like data extraction and real-time content moderation, making it a versatile tool for a broad range of industries.

    It does not support image inputs.

    See the launch announcement and benchmark results [here](https://www.anthropic.com/news/3-5-models-and-computer-use)
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-3.5-haiku:beta
  model_provider: anthropic
  inference_provider:
    provider: openrouter
    model_name: anthropic/claude-3.5-haiku:beta
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.7999999999999999
    per_output_token: 4.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 200000
  description: |-
    Claude 3.5 Haiku features offers enhanced capabilities in speed, coding accuracy, and tool use. Engineered to excel in real-time applications, it delivers quick response times that are essential for dynamic tasks such as chat interactions and immediate coding suggestions.

    This makes it highly suitable for environments that demand both speed and precision, such as software development, customer service bots, and data management systems.

    This model is currently pointing to [Claude 3.5 Haiku (2024-10-22)](/anthropic/claude-3-5-haiku-20241022).
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-3.5-sonnet
  model_provider: anthropic
  inference_provider:
    provider: openrouter
    model_name: anthropic/claude-3.5-sonnet
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 3.0
    per_output_token: 15.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 200000
  description: |-
    New Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at:

    - Coding: Scores ~49% on SWE-Bench Verified, higher than the last best score, and without any fancy prompt scaffolding
    - Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights
    - Visual processing: excelling at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone
    - Agentic tasks: exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems)

    #multimodal
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-3.5-sonnet-20240620
  model_provider: anthropic
  inference_provider:
    provider: openrouter
    model_name: anthropic/claude-3.5-sonnet-20240620
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 3.0
    per_output_token: 15.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 200000
  description: |-
    Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at:

    - Coding: Autonomously writes, edits, and runs code with reasoning and troubleshooting
    - Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights
    - Visual processing: excelling at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone
    - Agentic tasks: exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems)

    For the latest version (2024-10-23), check out [Claude 3.5 Sonnet](/anthropic/claude-3.5-sonnet).

    #multimodal
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-3.5-sonnet-20240620:beta
  model_provider: anthropic
  inference_provider:
    provider: openrouter
    model_name: anthropic/claude-3.5-sonnet-20240620:beta
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 3.0
    per_output_token: 15.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 200000
  description: |-
    Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at:

    - Coding: Autonomously writes, edits, and runs code with reasoning and troubleshooting
    - Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights
    - Visual processing: excelling at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone
    - Agentic tasks: exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems)

    For the latest version (2024-10-23), check out [Claude 3.5 Sonnet](/anthropic/claude-3.5-sonnet).

    #multimodal
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-3.5-sonnet:beta
  model_provider: anthropic
  inference_provider:
    provider: openrouter
    model_name: anthropic/claude-3.5-sonnet:beta
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 3.0
    per_output_token: 15.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 200000
  description: |-
    New Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at:

    - Coding: Scores ~49% on SWE-Bench Verified, higher than the last best score, and without any fancy prompt scaffolding
    - Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights
    - Visual processing: excelling at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone
    - Agentic tasks: exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems)

    #multimodal
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-3-haiku
  model_provider: anthropic
  inference_provider:
    provider: openrouter
    model_name: anthropic/claude-3-haiku
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.25
    per_output_token: 1.25
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 200000
  description: |-
    Claude 3 Haiku is Anthropic's fastest and most compact model for
    near-instant responsiveness. Quick and accurate targeted performance.

    See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku)

    #multimodal
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-3-haiku:beta
  model_provider: anthropic
  inference_provider:
    provider: openrouter
    model_name: anthropic/claude-3-haiku:beta
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.25
    per_output_token: 1.25
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 200000
  description: |-
    Claude 3 Haiku is Anthropic's fastest and most compact model for
    near-instant responsiveness. Quick and accurate targeted performance.

    See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-haiku)

    #multimodal
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-3-opus
  model_provider: anthropic
  inference_provider:
    provider: openrouter
    model_name: anthropic/claude-3-opus
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 15.0
    per_output_token: 75.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 200000
  description: |-
    Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding.

    See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family)

    #multimodal
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-3-opus:beta
  model_provider: anthropic
  inference_provider:
    provider: openrouter
    model_name: anthropic/claude-3-opus:beta
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 15.0
    per_output_token: 75.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 200000
  description: |-
    Claude 3 Opus is Anthropic's most powerful model for highly complex tasks. It boasts top-level performance, intelligence, fluency, and understanding.

    See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family)

    #multimodal
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-3-sonnet
  model_provider: anthropic
  inference_provider:
    provider: openrouter
    model_name: anthropic/claude-3-sonnet
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 3.0
    per_output_token: 15.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 200000
  description: |-
    Claude 3 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments.

    See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family)

    #multimodal
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: claude-3-sonnet:beta
  model_provider: anthropic
  inference_provider:
    provider: openrouter
    model_name: anthropic/claude-3-sonnet:beta
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 3.0
    per_output_token: 15.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 200000
  description: |-
    Claude 3 Sonnet is an ideal balance of intelligence and speed for enterprise workloads. Maximum utility at a lower price, dependable, balanced for scaled deployments.

    See the launch announcement and benchmark results [here](https://www.anthropic.com/news/claude-3-family)

    #multimodal
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: codestral-2501
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/codestral-2501
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.3
    per_output_token: 0.8999999999999999
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 256000
  description: "[Mistral](/mistralai)'s cutting-edge language model for coding. Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction and test generation. \n\nLearn more on their blog post: https://mistral.ai/news/codestral-2501/"
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: codestral-mamba
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/codestral-mamba
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.25
    per_output_token: 0.25
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 256000
  description: |-
    A 7.3B parameter Mamba-based model designed for code and reasoning tasks.

    - Linear time inference, allowing for theoretically infinite sequence lengths
    - 256k token context window
    - Optimized for quick responses, especially beneficial for code productivity
    - Performs comparably to state-of-the-art transformer models in code and reasoning tasks
    - Available under the Apache 2.0 license for free use, modification, and distribution
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: command
  model_provider: cohere
  inference_provider:
    provider: openrouter
    model_name: cohere/command
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.95
    per_output_token: 1.9
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: |-
    Command is an instruction-following conversational model that performs language tasks with high quality, more reliably and with a longer context than our base generative models.

    Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: command-r
  model_provider: cohere
  inference_provider:
    provider: openrouter
    model_name: cohere/command-r
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.475
    per_output_token: 1.4249999999999998
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    Command-R is a 35B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents.

    Read the launch post [here](https://txt.cohere.com/command-r/).

    Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: command-r-03-2024
  model_provider: cohere
  inference_provider:
    provider: openrouter
    model_name: cohere/command-r-03-2024
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.475
    per_output_token: 1.4249999999999998
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    Command-R is a 35B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents.

    Read the launch post [here](https://txt.cohere.com/command-r/).

    Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: command-r-08-2024
  model_provider: cohere
  inference_provider:
    provider: openrouter
    model_name: cohere/command-r-08-2024
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.14250000000000002
    per_output_token: 0.5700000000000001
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    command-r-08-2024 is an update of the [Command R](/models/cohere/command-r) with improved performance for multilingual retrieval-augmented generation (RAG) and tool use. More broadly, it is better at math, code and reasoning and is competitive with the previous version of the larger Command R+ model.

    Read the launch post [here](https://docs.cohere.com/changelog/command-gets-refreshed).

    Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: command-r7b-12-2024
  model_provider: cohere
  inference_provider:
    provider: openrouter
    model_name: cohere/command-r7b-12-2024
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0375
    per_output_token: 0.15
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 128000
  description: Command R7B (12-2024) is a small, fast update of the Command R+ model, delivered in December 2024. It excels at RAG, tool use, agents, and similar tasks requiring complex reasoning and multiple steps.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: command-r-plus
  model_provider: cohere
  inference_provider:
    provider: openrouter
    model_name: cohere/command-r-plus
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 2.8499999999999996
    per_output_token: 14.25
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    Command R+ is a new, 104B-parameter LLM from Cohere. It's useful for roleplay, general consumer usecases, and Retrieval Augmented Generation (RAG).

    It offers multilingual support for ten key languages to facilitate global business operations. See benchmarks and the launch post [here](https://txt.cohere.com/command-r-plus-microsoft-azure/).

    Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: command-r-plus-04-2024
  model_provider: cohere
  inference_provider:
    provider: openrouter
    model_name: cohere/command-r-plus-04-2024
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 2.8499999999999996
    per_output_token: 14.25
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    Command R+ is a new, 104B-parameter LLM from Cohere. It's useful for roleplay, general consumer usecases, and Retrieval Augmented Generation (RAG).

    It offers multilingual support for ten key languages to facilitate global business operations. See benchmarks and the launch post [here](https://txt.cohere.com/command-r-plus-microsoft-azure/).

    Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: command-r-plus-08-2024
  model_provider: cohere
  inference_provider:
    provider: openrouter
    model_name: cohere/command-r-plus-08-2024
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 2.375
    per_output_token: 9.5
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    command-r-plus-08-2024 is an update of the [Command R+](/models/cohere/command-r-plus) with roughly 50% higher throughput and 25% lower latencies as compared to the previous Command R+ version, while keeping the hardware footprint the same.

    Read the launch post [here](https://docs.cohere.com/changelog/command-gets-refreshed).

    Use of this model is subject to Cohere's [Acceptable Use Policy](https://docs.cohere.com/docs/c4ai-acceptable-use-policy).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: dbrx-instruct
  model_provider: databricks
  inference_provider:
    provider: openrouter
    model_name: databricks/dbrx-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.2
    per_output_token: 1.2
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    DBRX is a new open source large language model developed by Databricks. At 132B, it outperforms existing open source LLMs like Llama 2 70B and [Mixtral-8x7b](/models/mistralai/mixtral-8x7b) on standard industry benchmarks for language understanding, programming, math, and logic.

    It uses a fine-grained mixture-of-experts (MoE) architecture. 36B parameters are active on any input. It was pre-trained on 12T tokens of text and code data. Compared to other open MoE models like Mixtral-8x7B and Grok-1, DBRX is fine-grained, meaning it uses a larger number of smaller experts.

    See the launch announcement and benchmark results [here](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm).

    #moe
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: deepseek-chat
  model_provider: deepseek
  inference_provider:
    provider: openrouter
    model_name: deepseek/deepseek-chat
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.49
    per_output_token: 0.8899999999999999
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 16000
  description: |-
    DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations reveal that the model outperforms other open-source models and rivals leading closed-source models.

    For model details, please visit [the DeepSeek-V3 repo](https://github.com/deepseek-ai/DeepSeek-V3) for more information, or see the [launch announcement](https://api-docs.deepseek.com/news/news1226).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: deepseek-chat-v2.5
  model_provider: deepseek
  inference_provider:
    provider: openrouter
    model_name: deepseek/deepseek-chat-v2.5
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 2.0
    per_output_token: 2.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions. For model details, please visit [DeepSeek-V2 page](https://github.com/deepseek-ai/DeepSeek-V2) for more information.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: deepseek-r1
  model_provider: deepseek
  inference_provider:
    provider: openrouter
    model_name: deepseek/deepseek-r1
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 3.0
    per_output_token: 8.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 163840
  description: |-
    DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.

    Fully open-source model & [technical report](https://api-docs.deepseek.com/news/news250120).

    MIT licensed: Distill & commercialize freely!
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    include_reasoning:
      default: false
      description: If the endpoint can return reasoning explicitly, setting this parameter will include reasoning tokens in the response (available in a separate field).
      required: false
      type: boolean
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: deepseek-r1-distill-llama-70b
  model_provider: deepseek
  inference_provider:
    provider: openrouter
    model_name: deepseek/deepseek-r1-distill-llama-70b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.23
    per_output_token: 0.69
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131072
  description: |-
    DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including:

    - AIME 2024 pass@1: 70.0
    - MATH-500 pass@1: 94.5
    - CodeForces Rating: 1633

    The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    include_reasoning:
      default: false
      description: If the endpoint can return reasoning explicitly, setting this parameter will include reasoning tokens in the response (available in a separate field).
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: deepseek-r1-distill-llama-70b:free
  model_provider: deepseek
  inference_provider:
    provider: openrouter
    model_name: deepseek/deepseek-r1-distill-llama-70b:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including:

    - AIME 2024 pass@1: 70.0
    - MATH-500 pass@1: 94.5
    - CodeForces Rating: 1633

    The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    include_reasoning:
      default: false
      description: If the endpoint can return reasoning explicitly, setting this parameter will include reasoning tokens in the response (available in a separate field).
      required: false
      type: boolean
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_a:
      default: 0.0
      description: Consider only tokens with sufficiently high probabilities relative to the top token. A lower value focuses the selection on tokens near the top probability, acting like a dynamic Top-P filter.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: deepseek-r1-distill-qwen-14b
  model_provider: deepseek
  inference_provider:
    provider: openrouter
    model_name: deepseek/deepseek-r1-distill-qwen-14b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.6
    per_output_token: 1.6
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131072
  description: |-
    DeepSeek R1 Distill Qwen 14B is a distilled large language model based on [Qwen 2.5 14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

    Other benchmark results include:

    - AIME 2024 pass@1: 69.7
    - MATH-500 pass@1: 93.9
    - CodeForces Rating: 1481

    The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    include_reasoning:
      default: false
      description: If the endpoint can return reasoning explicitly, setting this parameter will include reasoning tokens in the response (available in a separate field).
      required: false
      type: boolean
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: deepseek-r1-distill-qwen-1.5b
  model_provider: deepseek
  inference_provider:
    provider: openrouter
    model_name: deepseek/deepseek-r1-distill-qwen-1.5b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.18
    per_output_token: 0.18
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131072
  description: |-
    DeepSeek R1 Distill Qwen 1.5B is a distilled large language model based on  [Qwen 2.5 Math 1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It's a very small and efficient model which outperforms [GPT 4o 0513](/openai/gpt-4o-2024-05-13) on Math Benchmarks.

    Other benchmark results include:

    - AIME 2024 pass@1: 28.9
    - AIME 2024 cons@64: 52.7
    - MATH-500 pass@1: 83.9

    The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    include_reasoning:
      default: false
      description: If the endpoint can return reasoning explicitly, setting this parameter will include reasoning tokens in the response (available in a separate field).
      required: false
      type: boolean
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: deepseek-r1-distill-qwen-32b
  model_provider: deepseek
  inference_provider:
    provider: openrouter
    model_name: deepseek/deepseek-r1-distill-qwen-32b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.12
    per_output_token: 0.18
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131072
  description: |-
    DeepSeek R1 Distill Qwen 32B is a distilled large language model based on [Qwen 2.5 32B](https://huggingface.co/Qwen/Qwen2.5-32B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

    Other benchmark results include:

    - AIME 2024 pass@1: 72.6
    - MATH-500 pass@1: 94.3
    - CodeForces Rating: 1691

    The model leverages fine-tuning from DeepSeek R1's outputs, enabling competitive performance comparable to larger frontier models.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    include_reasoning:
      default: false
      description: If the endpoint can return reasoning explicitly, setting this parameter will include reasoning tokens in the response (available in a separate field).
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: deepseek-r1:free
  model_provider: deepseek
  inference_provider:
    provider: openrouter
    model_name: deepseek/deepseek-r1:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 163840
  description: |-
    DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.

    Fully open-source model & [technical report](https://api-docs.deepseek.com/news/news250120).

    MIT licensed: Distill & commercialize freely!
  parameters:
    include_reasoning:
      default: false
      description: If the endpoint can return reasoning explicitly, setting this parameter will include reasoning tokens in the response (available in a separate field).
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
- model: deepseek-r1:nitro
  model_provider: deepseek
  inference_provider:
    provider: openrouter
    model_name: deepseek/deepseek-r1:nitro
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 3.0
    per_output_token: 8.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 163840
  description: |-
    DeepSeek R1 is here: Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.

    Fully open-source model & [technical report](https://api-docs.deepseek.com/news/news250120).

    MIT licensed: Distill & commercialize freely!
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    include_reasoning:
      default: false
      description: If the endpoint can return reasoning explicitly, setting this parameter will include reasoning tokens in the response (available in a separate field).
      required: false
      type: boolean
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: dolphin-mixtral-8x7b
  model_provider: cognitivecomputations
  inference_provider:
    provider: openrouter
    model_name: cognitivecomputations/dolphin-mixtral-8x7b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.5
    per_output_token: 0.5
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    This is a 16k context fine-tune of [Mixtral-8x7b](/models/mistralai/mixtral-8x7b). It excels in coding tasks due to extensive training with coding data and is known for its obedience, although it lacks DPO tuning.

    The model is uncensored and is stripped of alignment and bias. It requires an external alignment layer for ethical use. Users are cautioned to use this highly compliant model responsibly, as detailed in a blog post about uncensored models at [erichartford.com/uncensored-models](https://erichartford.com/uncensored-models).

    #moe #uncensored
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: eva-llama-3.33-70b
  model_provider: eva-unit-01
  inference_provider:
    provider: openrouter
    model_name: eva-unit-01/eva-llama-3.33-70b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 4.0
    per_output_token: 6.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 16384
  description: |
    EVA Llama 3.33 70b is a roleplay and storywriting specialist model. It is a full-parameter finetune of [Llama-3.3-70B-Instruct](https://openrouter.ai/meta-llama/llama-3.3-70b-instruct) on mixture of synthetic and natural data.

    It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and "flavor" of the resulting model

    This model was built with Llama by Meta.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: eva-qwen-2.5-32b
  model_provider: eva-unit-01
  inference_provider:
    provider: openrouter
    model_name: eva-unit-01/eva-qwen-2.5-32b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 2.6
    per_output_token: 3.4
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 16384
  description: |-
    EVA Qwen2.5 32B is a roleplaying/storywriting specialist model. It's a full-parameter finetune of Qwen2.5-32B on mixture of synthetic and natural data.

    It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and "flavor" of the resulting model.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: eva-qwen-2.5-72b
  model_provider: eva-unit-01
  inference_provider:
    provider: openrouter
    model_name: eva-unit-01/eva-qwen-2.5-72b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 4.0
    per_output_token: 6.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 16384
  description: |-
    EVA Qwen2.5 72B is a roleplay and storywriting specialist model. It's a full-parameter finetune of Qwen2.5-72B on mixture of synthetic and natural data.

    It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and "flavor" of the resulting model.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: fimbulvetr-11b-v2
  model_provider: sao10k
  inference_provider:
    provider: openrouter
    model_name: sao10k/fimbulvetr-11b-v2
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.7999999999999999
    per_output_token: 1.2
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: |-
    Creative writing model, routed with permission. It's fast, it keeps the conversation going, and it stays in character.

    If you submit a raw prompt, you can use Alpaca or Vicuna formats.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemini-2.0-flash-001
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/gemini-2.0-flash-001
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.1
    per_output_token: 0.4
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 1000000
  description: Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5). It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemini-2.0-flash-exp:free
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/gemini-2.0-flash-exp:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 1048576
  description: Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](google/gemini-pro-1.5). It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemini-2.0-flash-lite-preview-02-05:free
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/gemini-2.0-flash-lite-preview-02-05:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 1000000
  description: Gemini Flash Lite 2.0 offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](google/gemini-pro-1.5). Because it's currently in preview, it will be **heavily rate-limited** by Google. This model will move from free to paid pending a general rollout on February 24th, at $0.075 / $0.30 per million input / ouput tokens respectively.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemini-2.0-flash-thinking-exp-1219:free
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/gemini-2.0-flash-thinking-exp-1219:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 40000
  description: Gemini 2.0 Flash Thinking Mode is an experimental model that's trained to generate the "thinking process" the model goes through as part of its response. As a result, Thinking Mode is capable of stronger reasoning capabilities in its responses than the [base Gemini 2.0 Flash model](/google/gemini-2.0-flash-exp).
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemini-2.0-flash-thinking-exp:free
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/gemini-2.0-flash-thinking-exp:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 1048576
  description: |-
    Gemini 2.0 Flash Thinking Experimental (01-21) is a snapshot of Gemini 2.0 Flash Thinking Experimental.

    Gemini 2.0 Flash Thinking Mode is an experimental model that's trained to generate the "thinking process" the model goes through as part of its response. As a result, Thinking Mode is capable of stronger reasoning capabilities in its responses than the [base Gemini 2.0 Flash model](/google/gemini-2.0-flash-exp).
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemini-2.0-pro-exp-02-05:free
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/gemini-2.0-pro-exp-02-05:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 2000000
  description: |-
    Gemini 2.0 Pro Experimental is a bleeding-edge version of the Gemini 2.0 Pro model. Because it's currently experimental, it will be **heavily rate-limited** by Google.

    Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).

    #multimodal
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemini-exp-1206:free
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/gemini-exp-1206:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 2097152
  description: Experimental release (December 6, 2024) of Gemini.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemini-flash-1.5
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/gemini-flash-1.5
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.075
    per_output_token: 0.3
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 1000000
  description: |-
    Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots.

    Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves comparable quality to other Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter.

    Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).

    #multimodal
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemini-flash-1.5-8b
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/gemini-flash-1.5-8b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0375
    per_output_token: 0.15
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 1000000
  description: |-
    Gemini Flash 1.5 8B is optimized for speed and efficiency, offering enhanced performance in small prompt tasks like chat, transcription, and translation. With reduced latency, it is highly effective for real-time and large-scale operations. This model focuses on cost-effective solutions while maintaining high-quality results.

    [Click here to learn more about this model](https://developers.googleblog.com/en/gemini-15-flash-8b-is-now-generally-available-for-use/).

    Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemini-flash-1.5-8b-exp
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/gemini-flash-1.5-8b-exp
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 1000000
  description: |-
    Gemini Flash 1.5 8B Experimental is an experimental, 8B parameter version of the [Gemini Flash 1.5](/models/google/gemini-flash-1.5) model.

    Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).

    #multimodal

    Note: This model is currently experimental and not suitable for production use-cases, and may be heavily rate-limited.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemini-pro
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/gemini-pro
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.5
    per_output_token: 1.5
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 32760
  description: |-
    Google's flagship text generation model. Designed to handle natural language tasks, multiturn text and code chat, and code generation.

    See the benchmarks and prompting guidelines from [Deepmind](https://deepmind.google/technologies/gemini/).

    Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemini-pro-1.5
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/gemini-pro-1.5
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.25
    per_output_token: 5.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 2000000
  description: |-
    Google's latest multimodal model, supports image and video[0] in text or chat prompts.

    Optimized for language tasks including:

    - Code generation
    - Text generation
    - Text editing
    - Problem solving
    - Recommendations
    - Information extraction
    - Data extraction or generation
    - AI agents

    Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).

    * [0]: Video input is not available through OpenRouter at this time.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemini-pro-vision
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/gemini-pro-vision
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.5
    per_output_token: 1.5
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 16384
  description: |-
    Google's flagship multimodal model, supporting image and video in text or chat prompts for a text or code response.

    See the benchmarks and prompting guidelines from [Deepmind](https://deepmind.google/technologies/gemini/).

    Usage of Gemini is subject to Google's [Gemini Terms of Use](https://ai.google.dev/terms).

    #multimodal
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemma-2-27b-it
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/gemma-2-27b-it
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.27
    per_output_token: 0.27
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    Gemma 2 27B by Google is an open model built from the same research and technology used to create the [Gemini models](/models?q=gemini).

    Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning.

    See the [launch announcement](https://blog.google/technology/developers/google-gemma-2/) for more details. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemma-2-9b-it
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/gemma-2-9b-it
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.03
    per_output_token: 0.06
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    Gemma 2 9B by Google is an advanced, open-source language model that sets a new standard for efficiency and performance in its size class.

    Designed for a wide variety of tasks, it empowers developers and researchers to build innovative applications, while maintaining accessibility, safety, and cost-effectiveness.

    See the [launch announcement](https://blog.google/technology/developers/google-gemma-2/) for more details. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemma-2-9b-it:free
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/gemma-2-9b-it:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    Gemma 2 9B by Google is an advanced, open-source language model that sets a new standard for efficiency and performance in its size class.

    Designed for a wide variety of tasks, it empowers developers and researchers to build innovative applications, while maintaining accessibility, safety, and cost-effectiveness.

    See the [launch announcement](https://blog.google/technology/developers/google-gemma-2/) for more details. Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gemma-7b-it
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/gemma-7b-it
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.15
    per_output_token: 0.15
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    Gemma by Google is an advanced, open-source language model family, leveraging the latest in decoder-only, text-to-text technology. It offers English language capabilities across text generation tasks like question answering, summarization, and reasoning. The Gemma 7B variant is comparable in performance to leading open source models.

    Usage of Gemma is subject to Google's [Gemma Terms of Use](https://ai.google.dev/gemma/terms).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: goliath-120b
  model_provider: alpindale
  inference_provider:
    provider: openrouter
    model_name: alpindale/goliath-120b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 9.375
    per_output_token: 9.375
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 6144
  description: |-
    A large LLM created by combining two fine-tuned Llama 70B models into one 120B model. Combines Xwin and Euryale.

    Credits to
    - [@chargoddard](https://huggingface.co/chargoddard) for developing the framework used to merge the model - [mergekit](https://github.com/cg123/mergekit).
    - [@Undi95](https://huggingface.co/Undi95) for helping with the merge ratios.

    #merge
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_a:
      default: 0.0
      description: Consider only tokens with sufficiently high probabilities relative to the top token. A lower value focuses the selection on tokens near the top probability, acting like a dynamic Top-P filter.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gpt-3.5-turbo
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/gpt-3.5-turbo
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.5
    per_output_token: 1.5
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 16385
  description: |-
    GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks.

    Training data up to Sep 2021.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gpt-3.5-turbo-0125
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/gpt-3.5-turbo-0125
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.5
    per_output_token: 1.5
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 16385
  description: |-
    The latest GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Training data: up to Sep 2021.

    This version has a higher accuracy at responding in requested formats and a fix for a bug which caused a text encoding issue for non-English language function calls.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gpt-3.5-turbo-0613
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/gpt-3.5-turbo-0613
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.0
    per_output_token: 2.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 4095
  description: |-
    GPT-3.5 Turbo is OpenAI's fastest model. It can understand and generate natural language or code, and is optimized for chat and traditional completion tasks.

    Training data up to Sep 2021.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gpt-3.5-turbo-1106
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/gpt-3.5-turbo-1106
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.0
    per_output_token: 2.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 16385
  description: 'An older GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Training data: up to Sep 2021.'
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gpt-3.5-turbo-16k
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/gpt-3.5-turbo-16k
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 3.0
    per_output_token: 4.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 16385
  description: 'This model offers four times the context length of gpt-3.5-turbo, allowing it to support approximately 20 pages of text in a single request at a higher cost. Training data: up to Sep 2021.'
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gpt-3.5-turbo-instruct
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/gpt-3.5-turbo-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.5
    per_output_token: 2.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4095
  description: 'This model is a variant of GPT-3.5 Turbo tuned for instructional prompts and omitting chat-related optimizations. Training data: up to Sep 2021.'
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gpt-4
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/gpt-4
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 30.0
    per_output_token: 60.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 8191
  description: 'OpenAI''s flagship model, GPT-4 is a large-scale multimodal language model capable of solving difficult problems with greater accuracy than previous models due to its broader general knowledge and advanced reasoning capabilities. Training data: up to Sep 2021.'
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gpt-4-0314
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/gpt-4-0314
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 30.0
    per_output_token: 60.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 8191
  description: 'GPT-4-0314 is the first version of GPT-4 released, with a context length of 8,192 tokens, and was supported until June 14. Training data: up to Sep 2021.'
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gpt-4-1106-preview
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/gpt-4-1106-preview
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 10.0
    per_output_token: 30.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling.

    Training data: up to April 2023.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gpt-4-32k
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/gpt-4-32k
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 60.0
    per_output_token: 120.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 32767
  description: 'GPT-4-32k is an extended version of GPT-4, with the same capabilities but quadrupled context length, allowing for processing up to 40 pages of text in a single pass. This is particularly beneficial for handling longer content like interacting with PDFs without an external vector database. Training data: up to Sep 2021.'
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gpt-4-32k-0314
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/gpt-4-32k-0314
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 60.0
    per_output_token: 120.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 32767
  description: 'GPT-4-32k is an extended version of GPT-4, with the same capabilities but quadrupled context length, allowing for processing up to 40 pages of text in a single pass. This is particularly beneficial for handling longer content like interacting with PDFs without an external vector database. Training data: up to Sep 2021.'
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gpt-4o
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/gpt-4o
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 2.5
    per_output_token: 10.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.

    For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209)

    #multimodal
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gpt-4o-2024-05-13
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/gpt-4o-2024-05-13
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 5.0
    per_output_token: 15.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.

    For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209)

    #multimodal
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gpt-4o-2024-08-06
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/gpt-4o-2024-08-06
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 2.5
    per_output_token: 10.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    The 2024-08-06 version of GPT-4o offers improved performance in structured outputs, with the ability to supply a JSON schema in the respone_format. Read more [here](https://openai.com/index/introducing-structured-outputs-in-the-api/).

    GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.

    For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209)
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gpt-4o-2024-11-20
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/gpt-4o-2024-11-20
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 2.5
    per_output_token: 10.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    The 2024-11-20 version of GPT-4o offers a leveled-up creative writing ability with more natural, engaging, and tailored writing to improve relevance & readability. It’s also better at working with uploaded files, providing deeper insights & more thorough responses.

    GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gpt-4o:extended
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/gpt-4o:extended
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 6.0
    per_output_token: 18.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.

    For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209)

    #multimodal
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gpt-4o-mini
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/gpt-4o-mini
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.15
    per_output_token: 0.6
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs.

    As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than [GPT-3.5 Turbo](/models/openai/gpt-3.5-turbo). It maintains SOTA intelligence, while being significantly more cost-effective.

    GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 on chat preferences [common leaderboards](https://arena.lmsys.org/).

    Check out the [launch announcement](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) to learn more.

    #multimodal
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gpt-4o-mini-2024-07-18
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/gpt-4o-mini-2024-07-18
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.15
    per_output_token: 0.6
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs.

    As their most advanced small model, it is many multiples more affordable than other recent frontier models, and more than 60% cheaper than [GPT-3.5 Turbo](/models/openai/gpt-3.5-turbo). It maintains SOTA intelligence, while being significantly more cost-effective.

    GPT-4o mini achieves an 82% score on MMLU and presently ranks higher than GPT-4 on chat preferences [common leaderboards](https://arena.lmsys.org/).

    Check out the [launch announcement](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) to learn more.

    #multimodal
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gpt-4-turbo
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/gpt-4-turbo
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 10.0
    per_output_token: 30.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling.

    Training data: up to December 2023.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: gpt-4-turbo-preview
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/gpt-4-turbo-preview
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 10.0
    per_output_token: 30.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    The preview GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Training data: up to Dec 2023.

    **Note:** heavily rate limited by OpenAI while in preview.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: grok-2-1212
  model_provider: x-ai
  inference_provider:
    provider: openrouter
    model_name: x-ai/grok-2-1212
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 2.0
    per_output_token: 10.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 131072
  description: Grok 2 1212 introduces significant enhancements to accuracy, instruction adherence, and multilingual support, making it a powerful and flexible choice for developers seeking a highly steerable, intelligent model.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: grok-2-vision-1212
  model_provider: x-ai
  inference_provider:
    provider: openrouter
    model_name: x-ai/grok-2-vision-1212
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 2.0
    per_output_token: 10.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    Grok 2 Vision 1212 advances image-based AI with stronger visual comprehension, refined instruction-following, and multilingual support. From object recognition to style analysis, it empowers developers to build more intuitive, visually aware applications. Its enhanced steerability and reasoning establish a robust foundation for next-generation image solutions.

    To read more about this model, check out [xAI's announcement](https://x.ai/blog/grok-1212).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: grok-beta
  model_provider: x-ai
  inference_provider:
    provider: openrouter
    model_name: x-ai/grok-beta
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 5.0
    per_output_token: 15.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 131072
  description: |-
    Grok Beta is xAI's experimental language model with state-of-the-art reasoning capabilities, best for complex and multi-step use cases.

    It is the successor of [Grok 2](https://x.ai/blog/grok-2) with enhanced context length.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: grok-vision-beta
  model_provider: x-ai
  inference_provider:
    provider: openrouter
    model_name: x-ai/grok-vision-beta
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 5.0
    per_output_token: 15.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 8192
  description: |+
    Grok Vision Beta is xAI's experimental language model with vision capability.

  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: hermes-2-pro-llama-3-8b
  model_provider: nousresearch
  inference_provider:
    provider: openrouter
    model_name: nousresearch/hermes-2-pro-llama-3-8b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.025
    per_output_token: 0.04
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131000
  description: Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: hermes-3-llama-3.1-405b
  model_provider: nousresearch
  inference_provider:
    provider: openrouter
    model_name: nousresearch/hermes-3-llama-3.1-405b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.7999999999999999
    per_output_token: 0.7999999999999999
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131000
  description: |-
    Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.

    Hermes 3 405B is a frontier-level, full-parameter finetune of the Llama-3.1 405B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user.

    The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.

    Hermes 3 is competitive, if not superior, to Llama-3.1 Instruct models at general capabilities, with varying strengths and weaknesses attributable between the two.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: hermes-3-llama-3.1-70b
  model_provider: nousresearch
  inference_provider:
    provider: openrouter
    model_name: nousresearch/hermes-3-llama-3.1-70b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.12
    per_output_token: 0.3
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131000
  description: |-
    Hermes 3 is a generalist language model with many improvements over [Hermes 2](/models/nousresearch/nous-hermes-2-mistral-7b-dpo), including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.

    Hermes 3 70B is a competitive, if not superior finetune of the [Llama-3.1 70B foundation model](/models/meta-llama/llama-3.1-70b-instruct), focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user.

    The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: inflection-3-pi
  model_provider: inflection
  inference_provider:
    provider: openrouter
    model_name: inflection/inflection-3-pi
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 2.5
    per_output_token: 10.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8000
  description: |-
    Inflection 3 Pi powers Inflection's [Pi](https://pi.ai) chatbot, including backstory, emotional intelligence, productivity, and safety. It has access to recent news, and excels in scenarios like customer support and roleplay.

    Pi has been trained to mirror your tone and style, if you use more emojis, so will Pi! Try experimenting with various prompts and conversation styles.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: inflection-3-productivity
  model_provider: inflection
  inference_provider:
    provider: openrouter
    model_name: inflection/inflection-3-productivity
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 2.5
    per_output_token: 10.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8000
  description: |-
    Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to provided guidelines. It has access to recent news.

    For emotional intelligence similar to Pi, see [Inflect 3 Pi](/inflection/inflection-3-pi)

    See [Inflection's announcement](https://inflection.ai/blog/enterprise) for more details.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: jamba-1-5-large
  model_provider: ai21
  inference_provider:
    provider: openrouter
    model_name: ai21/jamba-1-5-large
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 2.0
    per_output_token: 8.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 256000
  description: |-
    Jamba 1.5 Large is part of AI21's new family of open models, offering superior speed, efficiency, and quality.

    It features a 256K effective context window, the longest among open models, enabling improved performance on tasks like document summarization and analysis.

    Built on a novel SSM-Transformer architecture, it outperforms larger models like Llama 3.1 70B on benchmarks while maintaining resource efficiency.

    Read their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: jamba-1-5-mini
  model_provider: ai21
  inference_provider:
    provider: openrouter
    model_name: ai21/jamba-1-5-mini
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.2
    per_output_token: 0.4
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 256000
  description: |-
    Jamba 1.5 Mini is the world's first production-grade Mamba-based model, combining SSM and Transformer architectures for a 256K context window and high efficiency.

    It works with 9 languages and can handle various writing and analysis tasks as well as or better than similar small models.

    This model uses less computer memory and works faster with longer texts than previous designs.

    Read their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: jamba-instruct
  model_provider: ai21
  inference_provider:
    provider: openrouter
    model_name: ai21/jamba-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.5
    per_output_token: 0.7
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 256000
  description: |-
    The Jamba-Instruct model, introduced by AI21 Labs, is an instruction-tuned variant of their hybrid SSM-Transformer Jamba model, specifically optimized for enterprise applications.

    - 256K Context Window: It can process extensive information, equivalent to a 400-page novel, which is beneficial for tasks involving large documents such as financial reports or legal documents
    - Safety and Accuracy: Jamba-Instruct is designed with enhanced safety features to ensure secure deployment in enterprise environments, reducing the risk and cost of implementation

    Read their [announcement](https://www.ai21.com/blog/announcing-jamba) to learn more.

    Jamba has a knowledge cutoff of February 2024.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: l3.1-70b-hanami-x1
  model_provider: sao10k
  inference_provider:
    provider: openrouter
    model_name: sao10k/l3.1-70b-hanami-x1
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 3.0
    per_output_token: 3.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 16000
  description: This is [Sao10K](/sao10k)'s experiment over [Euryale v2.2](/sao10k/l3.1-euryale-70b).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: l3.1-euryale-70b
  model_provider: sao10k
  inference_provider:
    provider: openrouter
    model_name: sao10k/l3.1-euryale-70b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.7
    per_output_token: 0.7999999999999999
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131072
  description: Euryale L3.1 70B v2.2 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B v2.1](/models/sao10k/l3-euryale-70b).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: l3.3-euryale-70b
  model_provider: sao10k
  inference_provider:
    provider: openrouter
    model_name: sao10k/l3.3-euryale-70b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.7
    per_output_token: 0.7999999999999999
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131072
  description: Euryale L3.3 70B is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k). It is the successor of [Euryale L3 70B v2.2](/models/sao10k/l3-euryale-70b).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: l3-euryale-70b
  model_provider: sao10k
  inference_provider:
    provider: openrouter
    model_name: sao10k/l3-euryale-70b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.7
    per_output_token: 0.7999999999999999
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    Euryale 70B v2.1 is a model focused on creative roleplay from [Sao10k](https://ko-fi.com/sao10k).

    - Better prompt adherence.
    - Better anatomy / spatial awareness.
    - Adapts much better to unique and custom formatting / reply formats.
    - Very creative, lots of unique swipes.
    - Is not restrictive during roleplays.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: l3-lunaris-8b
  model_provider: sao10k
  inference_provider:
    provider: openrouter
    model_name: sao10k/l3-lunaris-8b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.03
    per_output_token: 0.06
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    Lunaris 8B is a versatile generalist and roleplaying model based on Llama 3. It's a strategic merge of multiple models, designed to balance creativity with improved logic and general knowledge.

    Created by [Sao10k](https://huggingface.co/Sao10k), this model aims to offer an improved experience over Stheno v3.2, with enhanced creativity and logical reasoning.

    For best results, use with Llama 3 Instruct context template, temperature 1.4, and min_p 0.1.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: learnlm-1.5-pro-experimental:free
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/learnlm-1.5-pro-experimental:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 40960
  description: An experimental version of [Gemini 1.5 Pro](/google/gemini-pro-1.5) from Google.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: lfm-3b
  model_provider: liquid
  inference_provider:
    provider: openrouter
    model_name: liquid/lfm-3b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.02
    per_output_token: 0.02
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    Liquid's LFM 3B delivers incredible performance for its size. It positions itself as first place among 3B parameter transformers, hybrids, and RNN models It is also on par with Phi-3.5-mini on multiple benchmarks, while being 18.4% smaller.

    LFM-3B is the ideal choice for mobile and other edge text-based applications.

    See the [launch announcement](https://www.liquid.ai/liquid-foundation-models) for benchmarks and more info.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: lfm-40b
  model_provider: liquid
  inference_provider:
    provider: openrouter
    model_name: liquid/lfm-40b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.15
    per_output_token: 0.15
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    Liquid's 40.3B Mixture of Experts (MoE) model. Liquid Foundation Models (LFMs) are large neural networks built with computational units rooted in dynamic systems.

    LFMs are general-purpose AI models that can be used to model any kind of sequential data, including video, audio, text, time series, and signals.

    See the [launch announcement](https://www.liquid.ai/liquid-foundation-models) for benchmarks and more info.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: lfm-7b
  model_provider: liquid
  inference_provider:
    provider: openrouter
    model_name: liquid/lfm-7b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.01
    per_output_token: 0.01
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: "LFM-7B, a new best-in-class language model. LFM-7B is designed for exceptional chat capabilities, including languages like Arabic and Japanese. Powered by the Liquid Foundation Model (LFM) architecture, it exhibits unique features like low memory footprint and fast inference speed. \n\nLFM-7B is the world’s best-in-class multilingual language model in English, Arabic, and Japanese.\n\nSee the [launch announcement](https://www.liquid.ai/lfm-7b) for benchmarks and more info."
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-2-13b-chat
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-2-13b-chat
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.22
    per_output_token: 0.22
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: A 13 billion parameter language model from Meta, fine tuned for chat completions
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-2-70b-chat
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-2-70b-chat
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.8999999999999999
    per_output_token: 0.8999999999999999
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: The flagship, 70 billion parameter language model from Meta, fine tuned for chat completions. Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.1-405b
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3.1-405b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 2.0
    per_output_token: 2.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This is the base 405B pre-trained version.

    It has demonstrated strong performance compared to leading closed-source models in human evaluations.

    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.1-405b-instruct
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3.1-405b-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.7999999999999999
    per_output_token: 0.7999999999999999
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    The highly anticipated 400B class of Llama3 is here! Clocking in at 128k context with impressive eval scores, the Meta AI team continues to push the frontier of open-source LLMs.

    Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 405B instruct-tuned version is optimized for high quality dialogue usecases.

    It has demonstrated strong performance compared to leading closed-source models including GPT-4o and Claude 3.5 Sonnet in evaluations.

    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.1-405b-instruct:free
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3.1-405b-instruct:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8000
  description: |-
    The highly anticipated 400B class of Llama3 is here! Clocking in at 128k context with impressive eval scores, the Meta AI team continues to push the frontier of open-source LLMs.

    Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 405B instruct-tuned version is optimized for high quality dialogue usecases.

    It has demonstrated strong performance compared to leading closed-source models including GPT-4o and Claude 3.5 Sonnet in evaluations.

    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.1-405b-instruct:nitro
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3.1-405b-instruct:nitro
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 14.62
    per_output_token: 14.62
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8000
  description: |-
    The highly anticipated 400B class of Llama3 is here! Clocking in at 128k context with impressive eval scores, the Meta AI team continues to push the frontier of open-source LLMs.

    Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 405B instruct-tuned version is optimized for high quality dialogue usecases.

    It has demonstrated strong performance compared to leading closed-source models including GPT-4o and Claude 3.5 Sonnet in evaluations.

    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.1-70b-instruct
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3.1-70b-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.12
    per_output_token: 0.3
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 131072
  description: |-
    Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high quality dialogue usecases.

    It has demonstrated strong performance compared to leading closed-source models in human evaluations.

    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.1-70b-instruct:free
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3.1-70b-instruct:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high quality dialogue usecases.

    It has demonstrated strong performance compared to leading closed-source models in human evaluations.

    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_a:
      default: 0.0
      description: Consider only tokens with sufficiently high probabilities relative to the top token. A lower value focuses the selection on tokens near the top probability, acting like a dynamic Top-P filter.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.1-70b-instruct:nitro
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3.1-70b-instruct:nitro
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 3.25
    per_output_token: 3.25
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 64000
  description: |-
    Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high quality dialogue usecases.

    It has demonstrated strong performance compared to leading closed-source models in human evaluations.

    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.1-8b-instruct
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3.1-8b-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.02
    per_output_token: 0.05
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 131072
  description: |-
    Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient.

    It has demonstrated strong performance compared to leading closed-source models in human evaluations.

    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.1-8b-instruct:free
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3.1-8b-instruct:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient.

    It has demonstrated strong performance compared to leading closed-source models in human evaluations.

    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.1-8b-instruct:nitro
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3.1-8b-instruct:nitro
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.18
    per_output_token: 0.18
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 131072
  description: |-
    Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient.

    It has demonstrated strong performance compared to leading closed-source models in human evaluations.

    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3-1/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters: null
- model: llama-3.1-lumimaid-70b
  model_provider: neversleep
  inference_provider:
    provider: openrouter
    model_name: neversleep/llama-3.1-lumimaid-70b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 3.375
    per_output_token: 4.5
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 16384
  description: |-
    Lumimaid v0.2 70B is a finetune of [Llama 3.1 70B](/meta-llama/llama-3.1-70b-instruct) with a "HUGE step up dataset wise" compared to Lumimaid v0.1. Sloppy chats output were purged.

    Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_a:
      default: 0.0
      description: Consider only tokens with sufficiently high probabilities relative to the top token. A lower value focuses the selection on tokens near the top probability, acting like a dynamic Top-P filter.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.1-lumimaid-8b
  model_provider: neversleep
  inference_provider:
    provider: openrouter
    model_name: neversleep/llama-3.1-lumimaid-8b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.1875
    per_output_token: 1.125
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    Lumimaid v0.2 8B is a finetune of [Llama 3.1 8B](/models/meta-llama/llama-3.1-8b-instruct) with a "HUGE step up dataset wise" compared to Lumimaid v0.1. Sloppy chats output were purged.

    Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_a:
      default: 0.0
      description: Consider only tokens with sufficiently high probabilities relative to the top token. A lower value focuses the selection on tokens near the top probability, acting like a dynamic Top-P filter.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.1-nemotron-70b-instruct
  model_provider: nvidia
  inference_provider:
    provider: openrouter
    model_name: nvidia/llama-3.1-nemotron-70b-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.12
    per_output_token: 0.3
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131000
  description: |-
    NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging [Llama 3.1 70B](/models/meta-llama/llama-3.1-70b-instruct) architecture and Reinforcement Learning from Human Feedback (RLHF), it excels in automatic alignment benchmarks. This model is tailored for applications requiring high accuracy in helpfulness and response generation, suitable for diverse user queries across multiple domains.

    Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.1-nemotron-70b-instruct:free
  model_provider: nvidia
  inference_provider:
    provider: openrouter
    model_name: nvidia/llama-3.1-nemotron-70b-instruct:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131072
  description: |-
    NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging [Llama 3.1 70B](/models/meta-llama/llama-3.1-70b-instruct) architecture and Reinforcement Learning from Human Feedback (RLHF), it excels in automatic alignment benchmarks. This model is tailored for applications requiring high accuracy in helpfulness and response generation, suitable for diverse user queries across multiple domains.

    Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_a:
      default: 0.0
      description: Consider only tokens with sufficiently high probabilities relative to the top token. A lower value focuses the selection on tokens near the top probability, acting like a dynamic Top-P filter.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.1-sonar-huge-128k-online
  model_provider: perplexity
  inference_provider:
    provider: openrouter
    model_name: perplexity/llama-3.1-sonar-huge-128k-online
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 5.0
    per_output_token: 5.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 127072
  description: Llama 3.1 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance. The model is built upon the Llama 3.1 405B and has internet access.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.1-sonar-large-128k-chat
  model_provider: perplexity
  inference_provider:
    provider: openrouter
    model_name: perplexity/llama-3.1-sonar-large-128k-chat
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.0
    per_output_token: 1.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131072
  description: |-
    Llama 3.1 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance.

    This is a normal offline LLM, but the [online version](/models/perplexity/llama-3.1-sonar-large-128k-online) of this model has Internet access.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.1-sonar-large-128k-online
  model_provider: perplexity
  inference_provider:
    provider: openrouter
    model_name: perplexity/llama-3.1-sonar-large-128k-online
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.0
    per_output_token: 1.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 127072
  description: |-
    Llama 3.1 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance.

    This is the online version of the [offline chat model](/models/perplexity/llama-3.1-sonar-large-128k-chat). It is focused on delivering helpful, up-to-date, and factual responses. #online
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.1-sonar-small-128k-chat
  model_provider: perplexity
  inference_provider:
    provider: openrouter
    model_name: perplexity/llama-3.1-sonar-small-128k-chat
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.2
    per_output_token: 0.2
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131072
  description: |-
    Llama 3.1 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance.

    This is a normal offline LLM, but the [online version](/models/perplexity/llama-3.1-sonar-small-128k-online) of this model has Internet access.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.1-sonar-small-128k-online
  model_provider: perplexity
  inference_provider:
    provider: openrouter
    model_name: perplexity/llama-3.1-sonar-small-128k-online
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.2
    per_output_token: 0.2
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 127072
  description: |-
    Llama 3.1 Sonar is Perplexity's latest model family. It surpasses their earlier Sonar models in cost-efficiency, speed, and performance.

    This is the online version of the [offline chat model](/models/perplexity/llama-3.1-sonar-small-128k-chat). It is focused on delivering helpful, up-to-date, and factual responses. #online
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.2-11b-vision-instruct
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3.2-11b-vision-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.055
    per_output_token: 0.055
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131072
  description: |-
    Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis.

    Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research.

    Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md).

    Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.2-11b-vision-instruct:free
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3.2-11b-vision-instruct:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131072
  description: |-
    Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis.

    Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research.

    Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md).

    Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.2-1b-instruct
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3.2-1b-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.01
    per_output_token: 0.01
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131072
  description: |-
    Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate efficiently in low-resource environments while maintaining strong task performance.

    Supporting eight core languages and fine-tunable for more, Llama 1.3B is ideal for businesses or developers seeking lightweight yet powerful AI solutions that can operate in diverse multilingual settings without the high computational demand of larger models.

    Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md).

    Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.2-1b-instruct:free
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3.2-1b-instruct:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: |-
    Llama 3.2 1B is a 1-billion-parameter language model focused on efficiently performing natural language tasks, such as summarization, dialogue, and multilingual text analysis. Its smaller size allows it to operate efficiently in low-resource environments while maintaining strong task performance.

    Supporting eight core languages and fine-tunable for more, Llama 1.3B is ideal for businesses or developers seeking lightweight yet powerful AI solutions that can operate in diverse multilingual settings without the high computational demand of larger models.

    Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md).

    Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.2-3b-instruct
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3.2-3b-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.015
    per_output_token: 0.025
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131000
  description: |-
    Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages.

    Trained on 9 trillion tokens, the Llama 3.2 3B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings.

    Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md).

    Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.2-3b-instruct:free
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3.2-3b-instruct:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: |-
    Llama 3.2 3B is a 3-billion-parameter multilingual large language model, optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization. Designed with the latest transformer architecture, it supports eight languages, including English, Spanish, and Hindi, and is adaptable for additional languages.

    Trained on 9 trillion tokens, the Llama 3.2 3B model excels in instruction-following, complex reasoning, and tool use. Its balanced performance makes it ideal for applications needing accuracy and efficiency in text generation across multilingual settings.

    Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md).

    Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.2-90b-vision-instruct
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3.2-90b-vision-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.8999999999999999
    per_output_token: 0.8999999999999999
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131072
  description: |-
    The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in image captioning, visual question answering, and advanced image-text comprehension. Pre-trained on vast multimodal datasets and fine-tuned with human feedback, the Llama 90B Vision is engineered to handle the most demanding image-based AI tasks.

    This model is perfect for industries requiring cutting-edge multimodal AI capabilities, particularly those dealing with complex, real-time visual and textual analysis.

    Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md).

    Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.2-90b-vision-instruct:free
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3.2-90b-vision-instruct:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: |-
    The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in image captioning, visual question answering, and advanced image-text comprehension. Pre-trained on vast multimodal datasets and fine-tuned with human feedback, the Llama 90B Vision is engineered to handle the most demanding image-based AI tasks.

    This model is perfect for industries requiring cutting-edge multimodal AI capabilities, particularly those dealing with complex, real-time visual and textual analysis.

    Click here for the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md).

    Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3.3-70b-instruct
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3.3-70b-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.12
    per_output_token: 0.3
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131072
  description: |-
    The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.

    Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

    [Model Card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md)
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3-70b-instruct
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3-70b-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.23
    per_output_token: 0.4
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high quality dialogue usecases.

    It has demonstrated strong performance compared to leading closed-source models in human evaluations.

    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3-70b-instruct:nitro
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3-70b-instruct:nitro
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.88
    per_output_token: 0.88
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 70B instruct-tuned version was optimized for high quality dialogue usecases.

    It has demonstrated strong performance compared to leading closed-source models in human evaluations.

    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters: null
- model: llama-3-8b-instruct
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3-8b-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.03
    per_output_token: 0.06
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases.

    It has demonstrated strong performance compared to leading closed-source models in human evaluations.

    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3-8b-instruct:extended
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3-8b-instruct:extended
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.1875
    per_output_token: 1.125
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 16384
  description: |-
    Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases.

    It has demonstrated strong performance compared to leading closed-source models in human evaluations.

    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters: null
- model: llama-3-8b-instruct:free
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3-8b-instruct:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases.

    It has demonstrated strong performance compared to leading closed-source models in human evaluations.

    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3-8b-instruct:nitro
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-3-8b-instruct:nitro
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.2
    per_output_token: 0.2
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    Meta's latest class of model (Llama 3) launched with a variety of sizes & flavors. This 8B instruct-tuned version was optimized for high quality dialogue usecases.

    It has demonstrated strong performance compared to leading closed-source models in human evaluations.

    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters: null
- model: llama-3-lumimaid-70b
  model_provider: neversleep
  inference_provider:
    provider: openrouter
    model_name: neversleep/llama-3-lumimaid-70b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 3.375
    per_output_token: 4.5
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    The NeverSleep team is back, with a Llama 3 70B finetune trained on their curated roleplay data. Striking a balance between eRP and RP, Lumimaid was designed to be serious, yet uncensored when necessary.

    To enhance it's overall intelligence and chat capability, roughly 40% of the training data was not roleplay. This provides a breadth of knowledge to access, while still keeping roleplay as the primary strength.

    Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_a:
      default: 0.0
      description: Consider only tokens with sufficiently high probabilities relative to the top token. A lower value focuses the selection on tokens near the top probability, acting like a dynamic Top-P filter.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3-lumimaid-8b
  model_provider: neversleep
  inference_provider:
    provider: openrouter
    model_name: neversleep/llama-3-lumimaid-8b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.1875
    per_output_token: 1.125
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 24576
  description: |-
    The NeverSleep team is back, with a Llama 3 8B finetune trained on their curated roleplay data. Striking a balance between eRP and RP, Lumimaid was designed to be serious, yet uncensored when necessary.

    To enhance it's overall intelligence and chat capability, roughly 40% of the training data was not roleplay. This provides a breadth of knowledge to access, while still keeping roleplay as the primary strength.

    Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_a:
      default: 0.0
      description: Consider only tokens with sufficiently high probabilities relative to the top token. A lower value focuses the selection on tokens near the top probability, acting like a dynamic Top-P filter.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-3-lumimaid-8b:extended
  model_provider: neversleep
  inference_provider:
    provider: openrouter
    model_name: neversleep/llama-3-lumimaid-8b:extended
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.1875
    per_output_token: 1.125
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 24576
  description: |-
    The NeverSleep team is back, with a Llama 3 8B finetune trained on their curated roleplay data. Striking a balance between eRP and RP, Lumimaid was designed to be serious, yet uncensored when necessary.

    To enhance it's overall intelligence and chat capability, roughly 40% of the training data was not roleplay. This provides a breadth of knowledge to access, while still keeping roleplay as the primary strength.

    Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_a:
      default: 0.0
      description: Consider only tokens with sufficiently high probabilities relative to the top token. A lower value focuses the selection on tokens near the top probability, acting like a dynamic Top-P filter.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: llama-guard-2-8b
  model_provider: meta-llama
  inference_provider:
    provider: openrouter
    model_name: meta-llama/llama-guard-2-8b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.2
    per_output_token: 0.2
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    This safeguard model has 8B parameters and is based on the Llama 3 family. Just like is predecessor, [LlamaGuard 1](https://huggingface.co/meta-llama/LlamaGuard-7b), it can do both prompt and response classification.

    LlamaGuard 2 acts as a normal LLM would, generating text that indicates whether the given input/output is safe/unsafe. If deemed unsafe, it will also share the content categories violated.

    For best results, please use raw prompt input or the `/completions` endpoint, instead of the chat API.

    It has demonstrated strong performance compared to leading closed-source models in human evaluations.

    To read more about the model release, [click here](https://ai.meta.com/blog/meta-llama-3/). Usage of this model is subject to [Meta's Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: magnum-72b
  model_provider: alpindale
  inference_provider:
    provider: openrouter
    model_name: alpindale/magnum-72b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.875
    per_output_token: 2.25
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 16384
  description: |-
    From the maker of [Goliath](https://openrouter.ai/models/alpindale/goliath-120b), Magnum 72B is the first in a new family of models designed to achieve the prose quality of the Claude 3 models, notably Opus & Sonnet.

    The model is based on [Qwen2 72B](https://openrouter.ai/models/qwen/qwen-2-72b-instruct) and trained with 55 million tokens of highly curated roleplay (RP) data.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_a:
      default: 0.0
      description: Consider only tokens with sufficiently high probabilities relative to the top token. A lower value focuses the selection on tokens near the top probability, acting like a dynamic Top-P filter.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: magnum-v2-72b
  model_provider: anthracite-org
  inference_provider:
    provider: openrouter
    model_name: anthracite-org/magnum-v2-72b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 3.0
    per_output_token: 3.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    From the maker of [Goliath](https://openrouter.ai/models/alpindale/goliath-120b), Magnum 72B is the seventh in a family of models designed to achieve the prose quality of the Claude 3 models, notably Opus & Sonnet.

    The model is based on [Qwen2 72B](https://openrouter.ai/models/qwen/qwen-2-72b-instruct) and trained with 55 million tokens of highly curated roleplay (RP) data.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: magnum-v4-72b
  model_provider: anthracite-org
  inference_provider:
    provider: openrouter
    model_name: anthracite-org/magnum-v4-72b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.875
    per_output_token: 2.25
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 16384
  description: |-
    This is a series of models designed to replicate the prose quality of the Claude 3 models, specifically Sonnet(https://openrouter.ai/anthropic/claude-3.5-sonnet) and Opus(https://openrouter.ai/anthropic/claude-3-opus).

    The model is fine-tuned on top of [Qwen2.5 72B](https://openrouter.ai/qwen/qwen-2.5-72b-instruct).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_a:
      default: 0.0
      description: Consider only tokens with sufficiently high probabilities relative to the top token. A lower value focuses the selection on tokens near the top probability, acting like a dynamic Top-P filter.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: midnight-rose-70b
  model_provider: sophosympatheia
  inference_provider:
    provider: openrouter
    model_name: sophosympatheia/midnight-rose-70b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.7999999999999999
    per_output_token: 0.7999999999999999
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: |-
    A merge with a complex family tree, this model was crafted for roleplaying and storytelling. Midnight Rose is a successor to Rogue Rose and Aurora Nights and improves upon them both. It wants to produce lengthy output by default and is the best creative writing merge produced so far by sophosympatheia.

    Descending from earlier versions of Midnight Rose and [Wizard Tulu Dolphin 70B](https://huggingface.co/sophosympatheia/Wizard-Tulu-Dolphin-70B-v1.0), it inherits the best qualities of each.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: minimax-01
  model_provider: minimax
  inference_provider:
    provider: openrouter
    model_name: minimax/minimax-01
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.2
    per_output_token: 1.1
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 1000192
  description: |-
    MiniMax-01 is a combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion parameters activated per inference, and can handle a context of up to 4 million tokens.

    The text model adopts a hybrid architecture that combines Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE). The image model adopts the “ViT-MLP-LLM” framework and is trained on top of the text model.

    To read more about the release, see: https://www.minimaxi.com/en/news/minimax-01-series-2
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: ministral-3b
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/ministral-3b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.04
    per_output_token: 0.04
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: Ministral 3B is a 3B parameter model optimized for on-device and edge computing. It excels in knowledge, commonsense reasoning, and function-calling, outperforming larger models like Mistral 7B on most benchmarks. Supporting up to 128k context length, it’s ideal for orchestrating agentic workflows and specialist tasks with efficient inference.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: ministral-8b
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/ministral-8b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.1
    per_output_token: 0.1
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: Ministral 8B is an 8B parameter model featuring a unique interleaved sliding-window attention pattern for faster, memory-efficient inference. Designed for edge use cases, it supports up to 128k context length and excels in knowledge and reasoning tasks. It outperforms peers in the sub-10B category, making it perfect for low-latency, privacy-first applications.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mistral-7b-instruct
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/mistral-7b-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.03
    per_output_token: 0.055
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.

    *Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.*
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mistral-7b-instruct:free
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/mistral-7b-instruct:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.

    *Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.*
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mistral-7b-instruct:nitro
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/mistral-7b-instruct:nitro
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.07
    per_output_token: 0.07
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.

    *Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.*
  parameters:
    frequency_penalty:
      default: 0
      description: frequency_penalty penalizes the repetition of words based on their frequency in the generated text. A higher frequency penalty discourages the model from repeating words that have already appeared frequently in the output, promoting diversity and reducing repetition.
      max: 2.0
      min: -2.0
      required: false
      type: number
    max_tokens:
      description: Max Tokens (integer) or Max Tokens (null) (Max Tokens).The maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      required: false
      type: int
    n:
      default: 1
      description: Number of completions to return for each request, input tokens are only billed once.
      required: false
      type: int
    prediction:
      description: ' Example: {{type: content,content: json_object}} . Enable users to specify expected results, optimizing response times by leveraging known or predictable content. This approach is especially effective for updating text documents or code files with minimal changes, reducing latency while maintaining high-quality result'
      required: false
      type: object
    presence_penalty:
      default: 0
      description: presence_penalty determines how much the model penalizes the repetition of words or phrases. A higher presence penalty encourages the model to use a wider variety of words and phrases, making the output more diverse and creative.
      max: 2.0
      min: -2.0
      required: false
      type: number
    random_seed:
      default: null
      description: The seed to use for random sampling. If set, different calls will generate deterministic results
      max: null
      min: null
      required: false
      type: int
    response_format:
      default: null
      description: 'An object specifying the format that the model must output. Setting to  type: json_object enables JSON mode, which guarantees the message the model generates is in JSON. When using JSON mode you MUST also instruct the model to produce JSON yourself with a system or a user message.'
      required: false
      type: object
    safe_prompt:
      default: false
      description: Whether to inject a safety prompt before all conversations.
      required: false
      type: boolean
    stop:
      default: null
      description: Stop generation if this token is detected. Or if one of these tokens is detected when providing an array
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      description: Temperature (number) or Temperature (null) (Temperature). What sampling temperature to use, we recommend between 0.0 and 0.7. Higher values like 0.7 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both. The default value varies depending on the model you are targeting. Call the /models endpoint to retrieve the appropriate value
      max: 0.7
      min: 0.1
      required: false
      type: number
    top_p:
      default: 1
      description: Nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      type: number
- model: mistral-7b-instruct-v0.1
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/mistral-7b-instruct-v0.1
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.2
    per_output_token: 0.2
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 32768
  description: A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mistral-7b-instruct-v0.3
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/mistral-7b-instruct-v0.3
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.03
    per_output_token: 0.055
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.

    An improved version of [Mistral 7B Instruct v0.2](/models/mistralai/mistral-7b-instruct-v0.2), with the following changes:

    - Extended vocabulary to 32768
    - Supports v3 Tokenizer
    - Supports function calling

    NOTE: Support for function calling depends on the provider.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mistral-large
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/mistral-large
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 2.0
    per_output_token: 6.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    This is Mistral AI's flagship model, Mistral Large 2 (version `mistral-large-2407`). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement [here](https://mistral.ai/news/mistral-large-2407/).

    It supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, along with 80+ coding languages including Python, Java, C, C++, JavaScript, and Bash. Its long context window allows precise information recall from large documents.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mistral-large-2407
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/mistral-large-2407
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 2.0
    per_output_token: 6.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |
    This is Mistral AI's flagship model, Mistral Large 2 (version mistral-large-2407). It's a proprietary weights-available model and excels at reasoning, code, JSON, chat, and more. Read the launch announcement [here](https://mistral.ai/news/mistral-large-2407/).

    It supports dozens of languages including French, German, Spanish, Italian, Portuguese, Arabic, Hindi, Russian, Chinese, Japanese, and Korean, along with 80+ coding languages including Python, Java, C, C++, JavaScript, and Bash. Its long context window allows precise information recall from large documents.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mistral-large-2411
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/mistral-large-2411
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 2.0
    per_output_token: 6.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    Mistral Large 2 2411 is an update of [Mistral Large 2](/mistralai/mistral-large) released together with [Pixtral Large 2411](/mistralai/pixtral-large-2411)

    It provides a significant upgrade on the previous [Mistral Large 24.07](/mistralai/mistral-large-2407), with notable improvements in long context understanding, a new system prompt, and more accurate function calling.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mistral-medium
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/mistral-medium
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 2.75
    per_output_token: 8.1
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 32000
  description: This is Mistral AI's closed-source, medium-sided model. It's powered by a closed-source prototype and excels at reasoning, code, JSON, chat, and more. In benchmarks, it compares with many of the flagship models of other companies.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mistral-nemo
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/mistral-nemo
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.035
    per_output_token: 0.08
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131072
  description: |-
    A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA.

    The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.

    It supports function calling and is released under the Apache 2.0 license.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mistral-small
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/mistral-small
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.2
    per_output_token: 0.6
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 32000
  description: With 22 billion parameters, Mistral Small v24.09 offers a convenient mid-point between (Mistral NeMo 12B)[/mistralai/mistral-nemo] and (Mistral Large 2)[/mistralai/mistral-large], providing a cost-effective solution that can be deployed across various platforms and environments. It has better reasoning, exhibits more capabilities, can produce and reason about code, and is multiligual, supporting English, French, German, Italian, and Spanish.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mistral-small-24b-instruct-2501
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/mistral-small-24b-instruct-2501
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.07
    per_output_token: 0.14
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment.

    The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware. [Read the blog post about the model here.](https://mistral.ai/news/mistral-small-3/)
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mistral-tiny
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/mistral-tiny
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.25
    per_output_token: 0.25
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 32000
  description: This model is currently powered by Mistral-7B-v0.2, and incorporates a "better" fine-tuning than [Mistral 7B](/models/mistralai/mistral-7b-instruct-v0.1), inspired by community work. It's best used for large batch processing tasks where cost is a significant factor but reasoning capabilities are not crucial.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mixtral-8x22b-instruct
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/mixtral-8x22b-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.8999999999999999
    per_output_token: 0.8999999999999999
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 65536
  description: |-
    Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include:
    - strong math, coding, and reasoning
    - large context length (64k)
    - fluency in English, French, Italian, German, and Spanish

    See benchmarks on the launch announcement [here](https://mistral.ai/news/mixtral-8x22b/).
    #moe
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mixtral-8x7b
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/mixtral-8x7b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.6
    per_output_token: 0.6
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    Mixtral 8x7B is a pretrained generative Sparse Mixture of Experts, by Mistral AI. Incorporates 8 experts (feed-forward networks) for a total of 47B parameters. Base model (not fine-tuned for instructions) - see [Mixtral 8x7B Instruct](/models/mistralai/mixtral-8x7b-instruct) for an instruct-tuned model.

    #moe
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mixtral-8x7b-instruct
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/mixtral-8x7b-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.24
    per_output_token: 0.24
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    Mixtral 8x7B Instruct is a pretrained generative Sparse Mixture of Experts, by Mistral AI, for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters.

    Instruct model fine-tuned by Mistral. #moe
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mixtral-8x7b-instruct:nitro
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/mixtral-8x7b-instruct:nitro
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.5
    per_output_token: 0.5
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    Mixtral 8x7B Instruct is a pretrained generative Sparse Mixture of Experts, by Mistral AI, for chat and instruction use. Incorporates 8 experts (feed-forward networks) for a total of 47 billion parameters.

    Instruct model fine-tuned by Mistral. #moe
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mn-celeste-12b
  model_provider: nothingiisreal
  inference_provider:
    provider: openrouter
    model_name: nothingiisreal/mn-celeste-12b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.7999999999999999
    per_output_token: 1.2
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 16384
  description: |-
    A specialized story writing and roleplaying model based on Mistral's NeMo 12B Instruct. Fine-tuned on curated datasets including Reddit Writing Prompts and Opus Instruct 25K.

    This model excels at creative writing, offering improved NSFW capabilities, with smarter and more active narration. It demonstrates remarkable versatility in both SFW and NSFW scenarios, with strong Out of Character (OOC) steering capabilities, allowing fine-tuned control over narrative direction and character behavior.

    Check out the model's [HuggingFace page](https://huggingface.co/nothingiisreal/MN-12B-Celeste-V1.9) for details on what parameters and prompts work best!
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mn-inferor-12b
  model_provider: infermatic
  inference_provider:
    provider: openrouter
    model_name: infermatic/mn-inferor-12b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.7999999999999999
    per_output_token: 1.2
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 16384
  description: |
    Inferor 12B is a merge of top roleplay models, expert on immersive narratives and storytelling.

    This model was merged using the [Model Stock](https://arxiv.org/abs/2403.19522) merge method using [anthracite-org/magnum-v4-12b](https://openrouter.ai/anthracite-org/magnum-v4-72b) as a base.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mn-starcannon-12b
  model_provider: aetherwiing
  inference_provider:
    provider: openrouter
    model_name: aetherwiing/mn-starcannon-12b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.7999999999999999
    per_output_token: 1.2
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 16384
  description: |-
    Starcannon 12B v2 is a creative roleplay and story writing model, based on Mistral Nemo, using [nothingiisreal/mn-celeste-12b](/nothingiisreal/mn-celeste-12b) as a base, with [intervitens/mini-magnum-12b-v1.1](https://huggingface.co/intervitens/mini-magnum-12b-v1.1) merged in using the [TIES](https://arxiv.org/abs/2306.01708) method.

    Although more similar to Magnum overall, the model remains very creative, with a pleasant writing style. It is recommended for people wanting more variety than Magnum, and yet more verbose prose than Celeste.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mythalion-13b
  model_provider: pygmalionai
  inference_provider:
    provider: openrouter
    model_name: pygmalionai/mythalion-13b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.7999999999999999
    per_output_token: 1.2
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: 'A blend of the new Pygmalion-13b and MythoMax. #merge'
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mythomax-l2-13b
  model_provider: gryphe
  inference_provider:
    provider: openrouter
    model_name: gryphe/mythomax-l2-13b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.065
    per_output_token: 0.065
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: 'One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge'
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mythomax-l2-13b:extended
  model_provider: gryphe
  inference_provider:
    provider: openrouter
    model_name: gryphe/mythomax-l2-13b:extended
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.125
    per_output_token: 1.125
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: 'One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge'
  parameters: null
- model: mythomax-l2-13b:free
  model_provider: gryphe
  inference_provider:
    provider: openrouter
    model_name: gryphe/mythomax-l2-13b:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: 'One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge'
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: mythomax-l2-13b:nitro
  model_provider: gryphe
  inference_provider:
    provider: openrouter
    model_name: gryphe/mythomax-l2-13b:nitro
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.2
    per_output_token: 0.2
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: 'One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge'
  parameters: null
- model: noromaid-20b
  model_provider: neversleep
  inference_provider:
    provider: openrouter
    model_name: neversleep/noromaid-20b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.5
    per_output_token: 2.25
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    A collab between IkariDev and Undi. This merge is suitable for RP, ERP, and general knowledge.

    #merge #uncensored
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_a:
      default: 0.0
      description: Consider only tokens with sufficiently high probabilities relative to the top token. A lower value focuses the selection on tokens near the top probability, acting like a dynamic Top-P filter.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: nous-hermes-2-mixtral-8x7b-dpo
  model_provider: nousresearch
  inference_provider:
    provider: openrouter
    model_name: nousresearch/nous-hermes-2-mixtral-8x7b-dpo
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.6
    per_output_token: 0.6
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    Nous Hermes 2 Mixtral 8x7B DPO is the new flagship Nous Research model trained over the [Mixtral 8x7B MoE LLM](/models/mistralai/mixtral-8x7b).

    The model was trained on over 1,000,000 entries of primarily [GPT-4](/models/openai/gpt-4) generated data, as well as other high quality data from open datasets across the AI landscape, achieving state of the art performance on a variety of tasks.

    #moe
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: nous-hermes-llama2-13b
  model_provider: nousresearch
  inference_provider:
    provider: openrouter
    model_name: nousresearch/nous-hermes-llama2-13b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.16999999999999998
    per_output_token: 0.16999999999999998
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: A state-of-the-art language model fine-tuned on over 300k instructions by Nous Research, with Teknium and Emozilla leading the fine tuning process.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: nova-lite-v1
  model_provider: amazon
  inference_provider:
    provider: openrouter
    model_name: amazon/nova-lite-v1
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.06
    per_output_token: 0.24
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 300000
  description: |-
    Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon that focused on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite can handle real-time customer interactions, document analysis, and visual question-answering tasks with high accuracy.

    With an input context of 300K tokens, it can analyze multiple images or up to 30 minutes of video in a single input.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: nova-micro-v1
  model_provider: amazon
  inference_provider:
    provider: openrouter
    model_name: amazon/nova-micro-v1
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.035
    per_output_token: 0.14
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: Amazon Nova Micro 1.0 is a text-only model that delivers the lowest latency responses in the Amazon Nova family of models at a very low cost. With a context length of 128K tokens and optimized for speed and cost, Amazon Nova Micro excels at tasks such as text summarization, translation, content classification, interactive chat, and brainstorming. It has  simple mathematical reasoning and coding abilities.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: nova-pro-v1
  model_provider: amazon
  inference_provider:
    provider: openrouter
    model_name: amazon/nova-pro-v1
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.7999999999999999
    per_output_token: 3.1999999999999997
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 300000
  description: |-
    Amazon Nova Pro 1.0 is a capable multimodal model from Amazon focused on providing a combination of accuracy, speed, and cost for a wide range of tasks. As of December 2024, it achieves state-of-the-art performance on key benchmarks including visual question answering (TextVQA) and video understanding (VATEX).

    Amazon Nova Pro demonstrates strong capabilities in processing both visual and textual information and at analyzing financial documents.

    **NOTE**: Video input is not supported at this time.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: o1
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/o1
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 15.0
    per_output_token: 60.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 200000
  description: "The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. \n\nThe o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the [launch announcement](https://openai.com/o1).\n"
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
- model: o1-mini
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/o1-mini
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.1
    per_output_token: 4.4
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding.

    The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the [launch announcement](https://openai.com/o1).

    Note: This model is currently experimental and not suitable for production use-cases, and may be heavily rate-limited.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
- model: o1-mini-2024-09-12
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/o1-mini-2024-09-12
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.1
    per_output_token: 4.4
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding.

    The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the [launch announcement](https://openai.com/o1).

    Note: This model is currently experimental and not suitable for production use-cases, and may be heavily rate-limited.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
- model: o1-preview
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/o1-preview
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 15.0
    per_output_token: 60.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding.

    The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the [launch announcement](https://openai.com/o1).

    Note: This model is currently experimental and not suitable for production use-cases, and may be heavily rate-limited.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
- model: o1-preview-2024-09-12
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/o1-preview-2024-09-12
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 15.0
    per_output_token: 60.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding.

    The o1 models are optimized for math, science, programming, and other STEM-related tasks. They consistently exhibit PhD-level accuracy on benchmarks in physics, chemistry, and biology. Learn more in the [launch announcement](https://openai.com/o1).

    Note: This model is currently experimental and not suitable for production use-cases, and may be heavily rate-limited.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
- model: o3-mini
  model_provider: openai
  inference_provider:
    provider: openrouter
    model_name: openai/o3-mini
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.1
    per_output_token: 4.4
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 200000
  description: |-
    OpenAI o3-mini is a cost-efficient language model optimized for STEM reasoning tasks, particularly excelling in science, mathematics, and coding. The model features three adjustable reasoning effort levels and supports key developer capabilities including function calling, structured outputs, and streaming, though it does not include vision processing capabilities.

    The model demonstrates significant improvements over its predecessor, with expert testers preferring its responses 56% of the time and noting a 39% reduction in major errors on complex questions. With medium reasoning effort settings, o3-mini matches the performance of the larger o1 model on challenging reasoning evaluations like AIME and GPQA, while maintaining lower latency and cost.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
- model: openchat-7b
  model_provider: openchat
  inference_provider:
    provider: openrouter
    model_name: openchat/openchat-7b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.055
    per_output_token: 0.055
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    OpenChat 7B is a library of open-source language models, fine-tuned with "C-RLFT (Conditioned Reinforcement Learning Fine-Tuning)" - a strategy inspired by offline reinforcement learning. It has been trained on mixed-quality data without preference labels.

    - For OpenChat fine-tuned on Mistral 7B, check out [OpenChat 7B](/models/openchat/openchat-7b).
    - For OpenChat fine-tuned on Llama 8B, check out [OpenChat 8B](/models/openchat/openchat-8b).

    #open-source
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: openchat-7b:free
  model_provider: openchat
  inference_provider:
    provider: openrouter
    model_name: openchat/openchat-7b:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    OpenChat 7B is a library of open-source language models, fine-tuned with "C-RLFT (Conditioned Reinforcement Learning Fine-Tuning)" - a strategy inspired by offline reinforcement learning. It has been trained on mixed-quality data without preference labels.

    - For OpenChat fine-tuned on Mistral 7B, check out [OpenChat 7B](/models/openchat/openchat-7b).
    - For OpenChat fine-tuned on Llama 8B, check out [OpenChat 8B](/models/openchat/openchat-8b).

    #open-source
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: openhermes-2.5-mistral-7b
  model_provider: teknium
  inference_provider:
    provider: openrouter
    model_name: teknium/openhermes-2.5-mistral-7b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.16999999999999998
    per_output_token: 0.16999999999999998
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: |-
    A continuation of [OpenHermes 2 model](/models/teknium/openhermes-2-mistral-7b), trained on additional code datasets.
    Potentially the most interesting finding from training on a good ratio (est. of around 7-14% of the total dataset) of code instruction was that it has boosted several non-code benchmarks, including TruthfulQA, AGIEval, and GPT4All suite. It did however reduce BigBench benchmark score, but the net gain overall is significant.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: palm-2-chat-bison
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/palm-2-chat-bison
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.0
    per_output_token: 2.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 9216
  description: PaLM 2 is a language model by Google with improved multilingual, reasoning and coding capabilities.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: palm-2-chat-bison-32k
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/palm-2-chat-bison-32k
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.0
    per_output_token: 2.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: PaLM 2 is a language model by Google with improved multilingual, reasoning and coding capabilities.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: palm-2-codechat-bison
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/palm-2-codechat-bison
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.0
    per_output_token: 2.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 7168
  description: PaLM 2 fine-tuned for chatbot conversations that help with code-related questions.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: palm-2-codechat-bison-32k
  model_provider: google
  inference_provider:
    provider: openrouter
    model_name: google/palm-2-codechat-bison-32k
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.0
    per_output_token: 2.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: PaLM 2 fine-tuned for chatbot conversations that help with code-related questions.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: phi-3.5-mini-128k-instruct
  model_provider: microsoft
  inference_provider:
    provider: openrouter
    model_name: microsoft/phi-3.5-mini-128k-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.1
    per_output_token: 0.1
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    Phi-3.5 models are lightweight, state-of-the-art open models. These models were trained with Phi-3 datasets that include both synthetic data and the filtered, publicly available websites data, with a focus on high quality and reasoning-dense properties. Phi-3.5 Mini uses 3.8B parameters, and is a dense decoder-only transformer model using the same tokenizer as [Phi-3 Mini](/models/microsoft/phi-3-mini-128k-instruct).

    The models underwent a rigorous enhancement process, incorporating both supervised fine-tuning, proximal policy optimization, and direct preference optimization to ensure precise instruction adherence and robust safety measures. When assessed against benchmarks that test common sense, language understanding, math, code, long context and logical reasoning, Phi-3.5 models showcased robust and state-of-the-art performance among models with less than 13 billion parameters.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: phi-3-medium-128k-instruct
  model_provider: microsoft
  inference_provider:
    provider: openrouter
    model_name: microsoft/phi-3-medium-128k-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.0
    per_output_token: 1.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    Phi-3 128K Medium is a powerful 14-billion parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels in tasks involving common sense, mathematics, logical reasoning, and code processing.

    At time of release, Phi-3 Medium demonstrated state-of-the-art performance among lightweight models. In the MMLU-Pro eval, the model even comes close to a Llama3 70B level of performance.

    For 4k context length, try [Phi-3 Medium 4K](/models/microsoft/phi-3-medium-4k-instruct).
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: phi-3-medium-128k-instruct:free
  model_provider: microsoft
  inference_provider:
    provider: openrouter
    model_name: microsoft/phi-3-medium-128k-instruct:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    Phi-3 128K Medium is a powerful 14-billion parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels in tasks involving common sense, mathematics, logical reasoning, and code processing.

    At time of release, Phi-3 Medium demonstrated state-of-the-art performance among lightweight models. In the MMLU-Pro eval, the model even comes close to a Llama3 70B level of performance.

    For 4k context length, try [Phi-3 Medium 4K](/models/microsoft/phi-3-medium-4k-instruct).
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: phi-3-mini-128k-instruct
  model_provider: microsoft
  inference_provider:
    provider: openrouter
    model_name: microsoft/phi-3-mini-128k-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.1
    per_output_token: 0.1
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    Phi-3 Mini is a powerful 3.8B parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels in tasks involving common sense, mathematics, logical reasoning, and code processing.

    At time of release, Phi-3 Medium demonstrated state-of-the-art performance among lightweight models. This model is static, trained on an offline dataset with an October 2023 cutoff date.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: phi-3-mini-128k-instruct:free
  model_provider: microsoft
  inference_provider:
    provider: openrouter
    model_name: microsoft/phi-3-mini-128k-instruct:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    Phi-3 Mini is a powerful 3.8B parameter model designed for advanced language understanding, reasoning, and instruction following. Optimized through supervised fine-tuning and preference adjustments, it excels in tasks involving common sense, mathematics, logical reasoning, and code processing.

    At time of release, Phi-3 Medium demonstrated state-of-the-art performance among lightweight models. This model is static, trained on an offline dataset with an October 2023 cutoff date.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: phi-4
  model_provider: microsoft
  inference_provider:
    provider: openrouter
    model_name: microsoft/phi-4
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.07
    per_output_token: 0.14
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 16384
  description: "[Microsoft Research](/microsoft) Phi-4 is designed to perform well in complex reasoning tasks and can operate efficiently in situations with limited memory or where quick responses are needed. \n\nAt 14 billion parameters, it was trained on a mix of high-quality synthetic datasets, data from curated websites, and academic materials. It has undergone careful improvement to follow instructions accurately and maintain strong safety standards. It works best with English language inputs.\n\nFor more information, please see [Phi-4 Technical Report](https://arxiv.org/pdf/2412.08905)\n"
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: pixtral-12b
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/pixtral-12b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.1
    per_output_token: 0.1
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: 'The first multi-modal, text+image-to-text model from Mistral AI. Its weights were launched via torrent: https://x.com/mistralai/status/1833758285167722836.'
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: pixtral-large-2411
  model_provider: mistralai
  inference_provider:
    provider: openrouter
    model_name: mistralai/pixtral-large-2411
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 2.0
    per_output_token: 6.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 128000
  description: |+
    Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of [Mistral Large 2](/mistralai/mistral-large-2411). The model is able to understand documents, charts and natural images.

    The model is available under the Mistral Research License (MRL) for research and educational use, and the Mistral Commercial License for experimentation, testing, and production for commercial purposes.

  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: qvq-72b-preview
  model_provider: qwen
  inference_provider:
    provider: openrouter
    model_name: qwen/qvq-72b-preview
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.25
    per_output_token: 0.5
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32000
  description: |-
    QVQ-72B-Preview is an experimental research model developed by the [Qwen](/qwen) team, focusing on enhancing visual reasoning capabilities.

    ## Performance

    |                | **QVQ-72B-Preview** | o1-2024-12-17 | gpt-4o-2024-05-13 | Claude3.5 Sonnet-20241022 | Qwen2VL-72B |
    |----------------|-----------------|---------------|-------------------|----------------------------|-------------|
    | MMMU(val)      | 70.3            | 77.3          | 69.1              | 70.4                       | 64.5        |
    | MathVista(mini) | 71.4            | 71.0          | 63.8              | 65.3                       | 70.5        |
    | MathVision(full)   | 35.9            | –             | 30.4              | 35.6                       | 25.9        |
    | OlympiadBench  | 20.4            | –             | 25.9              | –                          | 11.2        |


    ## Limitations

    1. **Language Mixing and Code-Switching:** The model might occasionally mix different languages or unexpectedly switch between them, potentially affecting the clarity of its responses.
    2. **Recursive Reasoning Loops:**  There's a risk of the model getting caught in recursive reasoning loops, leading to lengthy responses that may not even arrive at a final answer.
    3. **Safety and Ethical Considerations:** Robust safety measures are needed to ensure reliable and safe performance. Users should exercise caution when deploying this model.
    4. **Performance and Benchmark Limitations:** Despite the improvements in visual reasoning, QVQ doesn’t entirely replace the capabilities of [Qwen2-VL-72B](/qwen/qwen-2-vl-72b-instruct). During multi-step visual reasoning, the model might gradually lose focus on the image content, leading to hallucinations. Moreover, QVQ doesn’t show significant improvement over [Qwen2-VL-72B](/qwen/qwen-2-vl-72b-instruct) in basic recognition tasks like identifying people, animals, or plants.

    Note: Currently, the model only supports single-round dialogues and image outputs. It does not support video inputs.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: qwen-2.5-72b-instruct
  model_provider: qwen
  inference_provider:
    provider: openrouter
    model_name: qwen/qwen-2.5-72b-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.13
    per_output_token: 0.4
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 128000
  description: |-
    Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2:

    - Significantly more knowledge and has greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains.

    - Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g, tables), and generating structured outputs especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots.

    - Long-context Support up to 128K tokens and can generate up to 8K tokens.

    - Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

    Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: qwen-2.5-7b-instruct
  model_provider: qwen
  inference_provider:
    provider: openrouter
    model_name: qwen/qwen-2.5-7b-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.025
    per_output_token: 0.05
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2:

    - Significantly more knowledge and has greatly improved capabilities in coding and mathematics, thanks to our specialized expert models in these domains.

    - Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g, tables), and generating structured outputs especially JSON. More resilient to the diversity of system prompts, enhancing role-play implementation and condition-setting for chatbots.

    - Long-context Support up to 128K tokens and can generate up to 8K tokens.

    - Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

    Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: qwen-2.5-coder-32b-instruct
  model_provider: qwen
  inference_provider:
    provider: openrouter
    model_name: qwen/qwen-2.5-coder-32b-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.07
    per_output_token: 0.16
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 33000
  description: "Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5:\n\n- Significantly improvements in **code generation**, **code reasoning** and **code fixing**. \n- A more comprehensive foundation for real-world applications such as **Code Agents**. Not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies.\n\nTo read more about its evaluation results, check out [Qwen 2.5 Coder's blog](https://qwenlm.github.io/blog/qwen2.5-coder-family/)."
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: qwen2.5-vl-72b-instruct:free
  model_provider: qwen
  inference_provider:
    provider: openrouter
    model_name: qwen/qwen2.5-vl-72b-instruct:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 131072
  description: Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: qwen-2-72b-instruct
  model_provider: qwen
  inference_provider:
    provider: openrouter
    model_name: qwen/qwen-2-72b-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.8999999999999999
    per_output_token: 0.8999999999999999
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    Qwen2 72B is a transformer-based model that excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning.

    It features SwiGLU activation, attention QKV bias, and group query attention. It is pretrained on extensive data with supervised finetuning and direct preference optimization.

    For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2/) and [GitHub repo](https://github.com/QwenLM/Qwen2).

    Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: qwen-2-7b-instruct
  model_provider: qwen
  inference_provider:
    provider: openrouter
    model_name: qwen/qwen-2-7b-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.054
    per_output_token: 0.054
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    Qwen2 7B is a transformer-based model that excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning.

    It features SwiGLU activation, attention QKV bias, and group query attention. It is pretrained on extensive data with supervised finetuning and direct preference optimization.

    For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2/) and [GitHub repo](https://github.com/QwenLM/Qwen2).

    Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: qwen-2-7b-instruct:free
  model_provider: qwen
  inference_provider:
    provider: openrouter
    model_name: qwen/qwen-2-7b-instruct:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: |-
    Qwen2 7B is a transformer-based model that excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning.

    It features SwiGLU activation, attention QKV bias, and group query attention. It is pretrained on extensive data with supervised finetuning and direct preference optimization.

    For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2/) and [GitHub repo](https://github.com/QwenLM/Qwen2).

    Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: qwen-2-vl-72b-instruct
  model_provider: qwen
  inference_provider:
    provider: openrouter
    model_name: qwen/qwen-2-vl-72b-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.4
    per_output_token: 0.4
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: |-
    Qwen2 VL 72B is a multimodal LLM from the Qwen Team with the following key enhancements:

    - SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.

    - Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc.

    - Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions.

    - Multilingual Support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.

    For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2-vl/) and [GitHub repo](https://github.com/QwenLM/Qwen2-VL).

    Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: qwen-2-vl-7b-instruct
  model_provider: qwen
  inference_provider:
    provider: openrouter
    model_name: qwen/qwen-2-vl-7b-instruct
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.1
    per_output_token: 0.1
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: |-
    Qwen2 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements:

    - SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.

    - Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc.

    - Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions.

    - Multilingual Support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.

    For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2-vl/) and [GitHub repo](https://github.com/QwenLM/Qwen2-VL).

    Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: qwen-max
  model_provider: qwen
  inference_provider:
    provider: openrouter
    model_name: qwen/qwen-max
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.6
    per_output_token: 6.3999999999999995
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 32768
  description: Qwen-Max, based on Qwen2.5, provides the best inference performance among [Qwen models](/qwen), especially for complex multi-step tasks. It's a large-scale MoE model that has been pretrained on over 20 trillion tokens and further post-trained with curated Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) methodologies. The parameter count is unknown.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: qwen-plus
  model_provider: qwen
  inference_provider:
    provider: openrouter
    model_name: qwen/qwen-plus
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.4
    per_output_token: 1.2
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 131072
  description: Qwen-Plus, based on the Qwen2.5 foundation model, is a 131K context model with a balanced performance, speed, and cost combination.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: qwen-turbo
  model_provider: qwen
  inference_provider:
    provider: openrouter
    model_name: qwen/qwen-turbo
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.05
    per_output_token: 0.2
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities:
  - tools
  type: completions
  limits:
    max_context_size: 1000000
  description: Qwen-Turbo, based on Qwen2.5, is a 1M context model that provides fast speed and low cost, suitable for simple tasks.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    tool_choice:
      default: none
      description: Controls which (if any) tool is called by the model. Accepted values include 'none' (no tool call), 'auto' (model decides), 'required' (must call a tool), or a specific tool identifier/object.
      required: false
      type: string
    tools:
      default: []
      description: A list of tools available for or used during the generation process. This follows a specific tool-calling schema.
      required: false
      type: array
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: qwen-vl-plus:free
  model_provider: qwen
  inference_provider:
    provider: openrouter
    model_name: qwen/qwen-vl-plus:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  - image
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 7500
  description: |
    Qwen's Enhanced Large Visual Language Model. Significantly upgraded for detailed recognition capabilities and text recognition abilities, supporting ultra-high pixel resolutions up to millions of pixels and extreme aspect ratios for image input. It delivers significant performance across a broad range of visual tasks.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: qwq-32b-preview
  model_provider: qwen
  inference_provider:
    provider: openrouter
    model_name: qwen/qwq-32b-preview
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.12
    per_output_token: 0.18
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: |+
    QwQ-32B-Preview is an experimental research model focused on AI reasoning capabilities developed by the Qwen Team. As a preview release, it demonstrates promising analytical abilities while having several important limitations:

    1. **Language Mixing and Code-Switching**: The model may mix languages or switch between them unexpectedly, affecting response clarity.
    2. **Recursive Reasoning Loops**: The model may enter circular reasoning patterns, leading to lengthy responses without a conclusive answer.
    3. **Safety and Ethical Considerations**: The model requires enhanced safety measures to ensure reliable and secure performance, and users should exercise caution when deploying it.
    4. **Performance and Benchmark Limitations**: The model excels in math and coding but has room for improvement in other areas, such as common sense reasoning and nuanced language understanding.

  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: remm-slerp-l2-13b
  model_provider: undi95
  inference_provider:
    provider: openrouter
    model_name: undi95/remm-slerp-l2-13b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.7999999999999999
    per_output_token: 1.2
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: 'A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge'
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: remm-slerp-l2-13b:extended
  model_provider: undi95
  inference_provider:
    provider: openrouter
    model_name: undi95/remm-slerp-l2-13b:extended
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.125
    per_output_token: 1.125
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 6144
  description: 'A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge'
  parameters: null
- model: rocinante-12b
  model_provider: thedrummer
  inference_provider:
    provider: openrouter
    model_name: thedrummer/rocinante-12b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.25
    per_output_token: 0.5
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    Rocinante 12B is designed for engaging storytelling and rich prose.

    Early testers have reported:
    - Expanded vocabulary with unique and expressive word choices
    - Enhanced creativity for vivid narratives
    - Adventure-filled and captivating stories
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: rogue-rose-103b-v0.2:free
  model_provider: sophosympatheia
  inference_provider:
    provider: openrouter
    model_name: sophosympatheia/rogue-rose-103b-v0.2:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: |
    Rogue Rose demonstrates strong capabilities in roleplaying and storytelling applications, potentially surpassing other models in the 103-120B parameter range. While it occasionally exhibits inconsistencies with scene logic, the overall interaction quality represents an advancement in natural language processing for creative applications.

    It is a 120-layer frankenmerge model combining two custom 70B architectures from November 2023, derived from the [xwin-stellarbright-erp-70b-v2](https://huggingface.co/sophosympatheia/xwin-stellarbright-erp-70b-v2) base.
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: sonar
  model_provider: perplexity
  inference_provider:
    provider: openrouter
    model_name: perplexity/sonar
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.0
    per_output_token: 1.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 127072
  description: Sonar is lightweight, affordable, fast, and simple to use — now featuring citations and the ability to customize sources. It is designed for companies seeking to integrate lightweight question-and-answer features optimized for speed.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: sonar-reasoning
  model_provider: perplexity
  inference_provider:
    provider: openrouter
    model_name: perplexity/sonar-reasoning
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.0
    per_output_token: 5.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 127000
  description: "Sonar Reasoning is a reasoning model provided by Perplexity based on [DeepSeek R1](/deepseek/deepseek-r1).\n\nIt allows developers to utilize long chain of thought with built-in web search. Sonar Reasoning is uncensored and hosted in US datacenters. "
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    include_reasoning:
      default: false
      description: If the endpoint can return reasoning explicitly, setting this parameter will include reasoning tokens in the response (available in a separate field).
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: sorcererlm-8x22b
  model_provider: raifle
  inference_provider:
    provider: openrouter
    model_name: raifle/sorcererlm-8x22b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 4.5
    per_output_token: 4.5
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 16000
  description: |-
    SorcererLM is an advanced RP and storytelling model, built as a Low-rank 16-bit LoRA fine-tuned on [WizardLM-2 8x22B](/microsoft/wizardlm-2-8x22b).

    - Advanced reasoning and emotional intelligence for engaging and immersive interactions
    - Vivid writing capabilities enriched with spatial and contextual awareness
    - Enhanced narrative depth, promoting creative and dynamic storytelling
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: toppy-m-7b
  model_provider: undi95
  inference_provider:
    provider: openrouter
    model_name: undi95/toppy-m-7b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.07
    per_output_token: 0.07
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: |-
    A wild 7B parameter model that merges several models using the new task_arithmetic merge method from mergekit.
    List of merged models:
    - NousResearch/Nous-Capybara-7B-V1.9
    - [HuggingFaceH4/zephyr-7b-beta](/models/huggingfaceh4/zephyr-7b-beta)
    - lemonilia/AshhLimaRP-Mistral-7B
    - Vulkane/120-Days-of-Sodom-LoRA-Mistral-7b
    - Undi95/Mistral-pippa-sharegpt-7b-qlora

    #merge #uncensored
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: toppy-m-7b:free
  model_provider: undi95
  inference_provider:
    provider: openrouter
    model_name: undi95/toppy-m-7b:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: |-
    A wild 7B parameter model that merges several models using the new task_arithmetic merge method from mergekit.
    List of merged models:
    - NousResearch/Nous-Capybara-7B-V1.9
    - [HuggingFaceH4/zephyr-7b-beta](/models/huggingfaceh4/zephyr-7b-beta)
    - lemonilia/AshhLimaRP-Mistral-7B
    - Vulkane/120-Days-of-Sodom-LoRA-Mistral-7b
    - Undi95/Mistral-pippa-sharegpt-7b-qlora

    #merge #uncensored
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: toppy-m-7b:nitro
  model_provider: undi95
  inference_provider:
    provider: openrouter
    model_name: undi95/toppy-m-7b:nitro
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.07
    per_output_token: 0.07
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: |-
    A wild 7B parameter model that merges several models using the new task_arithmetic merge method from mergekit.
    List of merged models:
    - NousResearch/Nous-Capybara-7B-V1.9
    - [HuggingFaceH4/zephyr-7b-beta](/models/huggingfaceh4/zephyr-7b-beta)
    - lemonilia/AshhLimaRP-Mistral-7B
    - Vulkane/120-Days-of-Sodom-LoRA-Mistral-7b
    - Undi95/Mistral-pippa-sharegpt-7b-qlora

    #merge #uncensored
  parameters: null
- model: unslopnemo-12b
  model_provider: thedrummer
  inference_provider:
    provider: openrouter
    model_name: thedrummer/unslopnemo-12b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.5
    per_output_token: 0.5
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32000
  description: UnslopNemo v4.1 is the latest addition from the creator of Rocinante, designed for adventure writing and role-play scenarios.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: weaver
  model_provider: mancer
  inference_provider:
    provider: openrouter
    model_name: mancer/weaver
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 1.5
    per_output_token: 2.25
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8000
  description: An attempt to recreate Claude-style verbosity, but don't expect the same level of coherence or memory. Meant for use in roleplay/narrative situations.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_a:
      default: 0.0
      description: Consider only tokens with sufficiently high probabilities relative to the top token. A lower value focuses the selection on tokens near the top probability, acting like a dynamic Top-P filter.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: wizardlm-2-7b
  model_provider: microsoft
  inference_provider:
    provider: openrouter
    model_name: microsoft/wizardlm-2-7b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.07
    per_output_token: 0.07
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32000
  description: |-
    WizardLM-2 7B is the smaller variant of Microsoft AI's latest Wizard model. It is the fastest and achieves comparable performance with existing 10x larger opensource leading models

    It is a finetune of [Mistral 7B Instruct](/models/mistralai/mistral-7b-instruct), using the same technique as [WizardLM-2 8x22B](/models/microsoft/wizardlm-2-8x22b).

    To read more about the model release, [click here](https://wizardlm.github.io/WizardLM2/).

    #moe
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: wizardlm-2-8x22b
  model_provider: microsoft
  inference_provider:
    provider: openrouter
    model_name: microsoft/wizardlm-2-8x22b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.5
    per_output_token: 0.5
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 65536
  description: |-
    WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art opensource models.

    It is an instruct finetune of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b).

    To read more about the model release, [click here](https://wizardlm.github.io/WizardLM2/).

    #moe
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: xwin-lm-70b
  model_provider: xwin-lm
  inference_provider:
    provider: openrouter
    model_name: xwin-lm/xwin-lm-70b
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 3.75
    per_output_token: 3.75
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 8192
  description: Xwin-LM aims to develop and open-source alignment tech for LLMs. Our first release, built-upon on the [Llama2](/models/${Model.Llama_2_13B_Chat}) base models, ranked TOP-1 on AlpacaEval. Notably, it's the first to surpass [GPT-4](/models/${Model.GPT_4}) on this benchmark. The project will be continuously updated.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    min_p:
      default: 0.0
      description: Represents the minimum probability for a token to be considered, relative to the most likely token. For example, a value of 0.1 means only tokens with at least 10% of the top token’s probability are allowed.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    seed:
      default: null
      description: If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
      max: null
      min: null
      required: false
      step: 1
      type: int
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_a:
      default: 0.0
      description: Consider only tokens with sufficiently high probabilities relative to the top token. A lower value focuses the selection on tokens near the top probability, acting like a dynamic Top-P filter.
      max: 1.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: yi-large
  model_provider: 01-ai
  inference_provider:
    provider: openrouter
    model_name: 01-ai/yi-large
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 3.0
    per_output_token: 3.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 32768
  description: |-
    The Yi Large model was designed by 01.AI with the following usecases in mind: knowledge search, data classification, human-like chat bots, and customer service.

    It stands out for its multilingual proficiency, particularly in Spanish, Chinese, Japanese, German, and French.

    Check out the [launch announcement](https://01-ai.github.io/blog/01.ai-yi-large-llm-launch) to learn more.
  parameters:
    frequency_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
      max: 2
      min: -2
      required: false
      step: 0.1
      type: float
    logit_bias:
      default: {}
      description: A JSON object mapping token IDs to bias values. These biases (typically between -100 and 100) are added to the logits before sampling, affecting token selection.
      required: false
      type: object
    logprobs:
      default: false
      description: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned.
      required: false
      type: boolean
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    presence_penalty:
      default: 0
      description: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
      max: 1.999
      min: -2
      required: false
      step: 0.1
      type: float
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    response_format:
      default:
        type: json_object
      description: 'Forces the model to produce output in a specific format. For example, setting this to { ''type'': ''json_object'' } enables JSON mode, ensuring the response is valid JSON.'
      required: false
      type: object
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    structured_outputs:
      default: false
      description: If true, instructs the model to return structured outputs (e.g., in JSON format) using the response_format provided.
      required: false
      type: boolean
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_logprobs:
      default: null
      description: Specifies the number of most likely tokens (from 0 to 20) to return at each token position, each with its associated log probability. (Requires that logprobs is enabled.)
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float
- model: zephyr-7b-beta:free
  model_provider: huggingfaceh4
  inference_provider:
    provider: openrouter
    model_name: huggingfaceh4/zephyr-7b-beta:free
    endpoint: https://openrouter.ai/api/v1
  price:
    per_input_token: 0.0
    per_output_token: 0.0
    valid_from: null
  input_formats:
  - text
  output_formats:
  - text
  capabilities: []
  type: completions
  limits:
    max_context_size: 4096
  description: Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr-7B-β is the second model in the series, and is a fine-tuned version of [mistralai/Mistral-7B-v0.1](/models/mistralai/mistral-7b-instruct-v0.1) that was trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO).
  parameters:
    max_tokens:
      default: 1000
      description: The maximum number of tokens that can be generated in the completion. The token count of your prompt plus max_tokens cannot exceed the model's context length.
      max: null
      min: null
      required: false
      type: int
    repetition_penalty:
      default: 1.0
      description: Helps reduce repetition in the output. Higher values (up to 2.0) make the model less likely to repeat tokens, whereas values closer to 0.0 encourage token reuse.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    stop:
      default: null
      description: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
      max: null
      min: null
      required: false
      type: string/array
    temperature:
      default: 1.0
      description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.
      max: 2.0
      min: 0.0
      required: false
      step: 0.1
      type: float
    top_k:
      default: 0
      description: Limits the token sampling to only the top K tokens. A value of 0 disables this setting, allowing the model to consider all tokens.
      min: 0
      required: false
      step: 1
      type: int
    top_p:
      default: 1
      description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.
      max: 1
      min: 0
      required: false
      step: 0.05
      type: float