Currently there is no way to set the maximum number of output tokens, and the default value is not documented.
Why:
In some scenarios it is important to cap the output token budget so that a turn does not consume more tokens than necessary.
For example, I ran into a case where the model kept going in circles on a complex reasoning task and wasted ~80K output tokens without solving it.
With an output token limit, one could set the limit per turn before asking the model to solve a complex problem, analyze the model's CoT and response, and raise the limit as needed.
How:
This could be a slash command, or an option that launches a web UI where the limit can be set.
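
To make the slash-command idea concrete, here is a rough sketch of how it might behave. Everything in it is hypothetical: the `/max-tokens` command name, the `Session` class, and the `handle_command` function are made up for illustration and do not correspond to anything in the current tool.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical usage string for the proposed command.
USAGE = "usage: /max-tokens [<positive integer> | default]"


@dataclass
class Session:
    # None means "fall back to the tool's (currently undocumented) default".
    max_output_tokens: Optional[int] = None


def handle_command(session: Session, line: str) -> str:
    """Handle a hypothetical `/max-tokens` command and return a status message."""
    parts = line.strip().split()
    if not parts or parts[0] != "/max-tokens":
        return "not a /max-tokens command"
    if len(parts) == 1:
        # No argument: report the current per-turn limit.
        current = session.max_output_tokens
        return f"max output tokens: {current if current is not None else 'default'}"
    if len(parts) != 2:
        return USAGE
    arg = parts[1]
    if arg == "default":
        session.max_output_tokens = None
        return "max output tokens reset to default"
    try:
        value = int(arg)
    except ValueError:
        return USAGE
    if value <= 0:
        return USAGE
    session.max_output_tokens = value
    return f"max output tokens set to {value}"
```

With something like this, the per-turn workflow would be: run `/max-tokens 8000` before a hard prompt, inspect the CoT and the response, then run `/max-tokens 20000` and retry only if the model looked like it was making progress.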