This page documents the LLM provider-specific configuration options for the `generative_models` object in `config.yaml`.
Individual models are customized in the `generative_models` section of the configuration, for example:
```yaml
azure_oai:
  ... # Global provider config

generative_models:
  # azure_openai provider example
  azure_gpt_4o_mini:
    # Common configuration keys
    provider: azure_openai   # Client type to use (required)
    model_name: gpt-4o-mini  # Name of the model to use (required)
    temperature: 0.0         # LLM temperature setting (optional)
    max_tokens: 2048         # Max output tokens (optional)
    system_prompt: null      # Custom system prompt (optional)
    # Provider-specific configuration
    deployment_name: "gpt-35-turbo"  # Required for Azure OpenAI models
    api_version: "2024-06-01"        # Optional
    additional_kwargs:               # Additional parameters for the OpenAI request body
      user: syftr
    # Cost example - options are the same for all models (required)
    cost:
      type: tokens  # tokens, characters, or hourly
      input: 1.00   # Cost in USD per million
      output: 2.00  # Cost in USD per million
      # rate: 12.00 # Average cost per hour of the inference server, when type is hourly
```

Some configuration options are common across all LLM providers:
- `provider`: (String, Required) Must be one of the supported provider names (`openai_like`, `openai_responses`, `azure_openai`, `azure_ai`, `vertex_ai`, `anthropic_vertex`, `cerebras`).
- `model_name`: (String, Required) The name of the model to use, which should match the model name required by the provider's API.
- `temperature`: (Float, Optional) The temperature setting to use for inference. Defaults to `0`.
- `max_tokens`: (Integer, Optional) The maximum number of output tokens to produce. Defaults to `2048`.
- `system_prompt`: (String, Optional) A custom system prompt to use for all completions. Defaults to `null`.
The cost dictionary is also common across all LLM providers.
- `type`: (String, Required) The cost model type; `tokens`, `characters`, or `hourly`.
- `input`: (Float) Required if `type` is `tokens` or `characters`. Cost per million input tokens.
- `output`: (Float) Required if `type` is `tokens` or `characters`. Cost per million output tokens.
- `rate`: (Float) Required if `type` is `hourly`. Average cost per hour of the inference server.
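When `type` is `hourly`, only `rate` is set instead of `input` and `output`. A minimal sketch, assuming a hypothetical self-hosted model entry (the model key and rate are illustrative):

```yaml
generative_models:
  my_self_hosted_model:  # hypothetical model key
    # ... provider and model settings ...
    cost:
      type: hourly
      rate: 12.00  # Average cost in USD per hour of the inference server
```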
There are no global settings for openai_like models, which each have their own endpoints and credentials.
They are configured as follows:
- `provider`: (String, Literal) Must be `openai_like` (for OpenAI-compatible APIs, including self-hosted models via vLLM, TGI, etc.).
- `api_base`: (String, HttpUrl, Required) The base URL of the OpenAI-compatible API endpoint (e.g., "http://localhost:8000/v1").
- `api_key`: (String, SecretStr, Required) The API key for authenticating with the model's endpoint. Can be placed in a file at `runtime-secrets/generative_models__{your_model_key}__api_key`.
- `api_version`: (String, Optional) The API version string, if required by the compatible API. Defaults to `None`.
- `context_window`: (Integer, Optional) The maximum number of input tokens. Defaults to `3900`.
- `timeout`: (Integer, Optional) Timeout in seconds for API requests. Defaults to `120`.
- `additional_kwargs`: (Object, Optional) A dictionary of additional keyword arguments to pass to the client. Defaults to an empty dictionary (`{}`).
- `is_chat_model`: (Boolean, Optional) Whether the model supports multi-turn chat in addition to single completion requests. Defaults to `True`.
- `is_function_calling_model`: (Boolean, Optional) Whether the model supports function calling. Defaults to `False`.
Here is an example using Together.ai:

```yaml
generative_models:
  together-r1:
    provider: openai_like
    model_name: "deepseek-ai/DeepSeek-R1"
    max_tokens: 5000
    api_base: "https://api.together.xyz/v1"
    # api_key: <your API key>  # or put a file at runtime-secrets/generative_models__together-r1__api_key
    context_window: 16384
    cost:
      type: tokens
      input: 7.00
      output: 7.00
```

This is similar to `openai_like`, but for models using the `openai_responses` API.
There are no global settings for openai_responses models, which each have their own endpoints and credentials.
They are configured as follows:
- `provider`: (String, Literal) Must be `openai_responses` (for OpenAI Responses-compatible APIs, including self-hosted models via vLLM, TGI, etc.).
- `api_base`: (String, HttpUrl, Required) The base URL of the OpenAI-compatible API endpoint (e.g., "http://localhost:8000/v1").
- `api_key`: (String, SecretStr, Required) The API key for authenticating with the model's endpoint. Can be placed in a file at `runtime-secrets/generative_models__{your_model_key}__api_key`.
- `api_version`: (String, Optional) The API version string, if required by the compatible API. Defaults to `None`.
- `context_window`: (Integer, Optional) The maximum number of input tokens. Defaults to `3900`.
- `timeout`: (Integer, Optional) Timeout in seconds for API requests. Defaults to `120`.
- `additional_kwargs`: (Object, Optional) A dictionary of additional keyword arguments to pass to the client. Defaults to an empty dictionary (`{}`).
- `is_chat_model`: (Boolean, Optional) Whether the model supports multi-turn chat in addition to single completion requests. Defaults to `True`.
- `is_function_calling_model`: (Boolean, Optional) Whether the model supports function calling. Defaults to `False`.
Here is an example using a self-hosted instance of gpt-oss-120b:

```yaml
generative_models:
  gpt-oss-120b-low:
    provider: openai_responses
    model_name: openai/gpt-oss-120b
    api_base: "http://gpt-oss-host:8000/v1"
    api_key: asdf
    max_tokens: 2000
    context_window: 126000
    additional_kwargs:
      reasoning:
        effort: low
    cost:
      type: tokens
      input: 1.80
      output: 1.80
```

The top-level `azure_oai` config object is used to set the `api_url` and `api_key`:
```yaml
azure_oai:
  api_url: "https://my-azure-endpoint.openai.azure.com/"
  api_key: "<your-api-key>"
  api_version: "2024-07-18"  # Default value
```

Individual models are further customized by the deployment name and, optionally, the API version to use:
- `provider`: (String, Literal) Must be `azure_openai`.
- `deployment_name`: (String, Required) The name of your deployment in Azure OpenAI.
- `api_url`: (String, HttpUrl, Optional) Overrides `azure_oai.api_url`.
- `api_key`: (String, Optional) Overrides `azure_oai.api_key`.
- `api_version`: (String, Optional) Overrides `azure_oai.api_version`.
- `additional_kwargs`: (Object, Optional) A dictionary of additional keyword arguments to pass to the API. Defaults to an empty dictionary (`{}`).
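To illustrate the per-model override behavior, a model entry can supply its own endpoint and API version while other models fall back to the global `azure_oai` values. A sketch with illustrative key, deployment, and endpoint names:

```yaml
azure_oai:
  api_url: "https://my-azure-endpoint.openai.azure.com/"
  api_key: "<your-api-key>"

generative_models:
  azure_gpt_4o_mini_eu:  # hypothetical model key
    provider: azure_openai
    model_name: gpt-4o-mini
    deployment_name: "my-gpt-4o-mini-deployment"         # illustrative deployment name
    api_url: "https://my-eu-endpoint.openai.azure.com/"  # overrides azure_oai.api_url
    api_version: "2024-06-01"                            # overrides azure_oai.api_version
    cost:
      type: tokens
      input: 1.00
      output: 2.00
```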
There are no common or global settings for azure_ai models, which each have their own endpoints and credentials.
They are configured as follows:
- `provider`: (String, Literal) Must be `azure_ai` (for Azure AI Completions, e.g., catalog models).
- `api_url`: (String, HttpUrl, Required) The API URL endpoint for this specific model deployment.
- `api_key`: (String, SecretStr, Required) The API key for authenticating with the model's endpoint. Can be placed in a file at `runtime-secrets/generative_models__{your_model_key}__api_key`.
- `api_version`: (String, Optional) API version string to set in requests.
- `client_kwargs`: (Object, Optional) A dictionary of additional keyword arguments to pass to the client.
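As a sketch of an `azure_ai` entry, assuming a hypothetical catalog-model deployment (the model key, model name, endpoint, and prices are all illustrative):

```yaml
generative_models:
  azure_llama_3_70b:  # hypothetical model key
    provider: azure_ai
    model_name: "Meta-Llama-3-70B-Instruct"  # illustrative catalog model
    api_url: "https://my-llama-deployment.eastus2.inference.ai.azure.com/"  # illustrative endpoint
    # api_key: <your API key>  # or put a file at runtime-secrets/generative_models__azure_llama_3_70b__api_key
    cost:
      type: tokens
      input: 3.00   # illustrative prices in USD per million tokens
      output: 9.00
```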
The top-level gcp_vertex config object is used to set the default project_id, region, and credentials:
```yaml
gcp_vertex:
  project_id: "<your-project-id>"
  region: "europe-west1"
  credentials: >  # Can also put GCP credentials in a file named runtime-secrets/gcp_vertex__credentials
    {...}
```

Individual models are further customized by the following:
- `provider`: (String, Literal) Must be `vertex_ai`.
- `context_window`: (Integer, Optional) The maximum number of input tokens. Defaults to `4096`.
- `additional_kwargs`: (Object, Optional) A dictionary of additional keyword arguments to pass to the Vertex API. Defaults to an empty dictionary (`{}`).
- `safety_settings`: (Object, Optional) A dictionary defining content safety settings. Defaults to the predefined `GCP_SAFETY_SETTINGS` (maximally permissive; see `configuration.py`).
- `project_id`: (String, Optional) The GCP Project ID. If not provided (`None`), the global `cfg.gcp_vertex.project_id` is used. Defaults to `None`.
- `region`: (String, Optional) The GCP Region. If not provided (`None`), the global `cfg.gcp_vertex.region` is used. Defaults to `None`.
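A sketch of a `vertex_ai` model entry that overrides the global region, with illustrative model key, model name, and prices:

```yaml
generative_models:
  vertex_gemini_flash:  # hypothetical model key
    provider: vertex_ai
    model_name: "gemini-1.5-flash"  # illustrative model name
    context_window: 8192
    region: "us-central1"  # overrides the global gcp_vertex.region
    cost:
      type: tokens
      input: 0.15   # illustrative prices in USD per million tokens
      output: 0.60
```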
anthropic_vertex is used for Anthropic models hosted in Vertex AI. The top-level gcp_vertex object is used to provide the default values for project_id, region, and credentials.
Individual models are further customized by the following:
- `provider`: (String, Literal) Must be `anthropic_vertex`.
- `project_id`: (String, Optional) The GCP Project ID. If not provided (`None`), the global `cfg.gcp_vertex.project_id` is used. Defaults to `None`.
- `region`: (String, Optional) The GCP Region. If not provided (`None`), the global `cfg.gcp_vertex.region` is used. Defaults to `None`.
- `thinking_dict`: (Object, Optional) Configure thinking controls for the LLM. See the Anthropic API docs for more details. For example:

  ```yaml
  thinking_dict:
    type: enabled
    budget_tokens: 16000
  ```

- `additional_kwargs`: (Object, Optional) A dictionary of additional keyword arguments to pass to the API. Defaults to an empty dictionary (`{}`).
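Putting the pieces together, a full `anthropic_vertex` entry might look like the following sketch (the model key, model name, and prices are illustrative):

```yaml
generative_models:
  claude_sonnet_vertex:  # hypothetical model key
    provider: anthropic_vertex
    model_name: "claude-3-7-sonnet"  # illustrative model name
    max_tokens: 4096
    thinking_dict:
      type: enabled
      budget_tokens: 16000
    cost:
      type: tokens
      input: 3.00   # illustrative prices in USD per million tokens
      output: 15.00
```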
The top-level `cerebras` config object is used to set the `api_url` and `api_key`:

```yaml
cerebras:
  api_url: "https://api.cerebras.ai/v1"  # Default value
  api_key: "<your-api-key>"
```

Individual models are further customized by the following:
- `provider`: (String, Literal) Must be `cerebras`.
- `context_window`: (Integer, Optional) The maximum number of input tokens. Defaults to `3900`.
- `is_chat_model`: (Boolean, Optional) Whether the model supports multi-turn chat in addition to single completion requests.
- `is_function_calling_model`: (Boolean, Optional) Whether the model supports function calling.
- `additional_kwargs`: (Object, Optional) A dictionary of additional keyword arguments to pass to the Cerebras client. Defaults to an empty dictionary (`{}`).
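A sketch of a `cerebras` model entry, with illustrative model key, model name, and prices:

```yaml
generative_models:
  cerebras_llama_70b:  # hypothetical model key
    provider: cerebras
    model_name: "llama3.1-70b"  # illustrative model name
    context_window: 8192
    is_function_calling_model: true
    cost:
      type: tokens
      input: 0.60   # illustrative prices in USD per million tokens
      output: 0.60
```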