fix(vlm): add max_tokens parameter to VLM completion calls to prevent vLLM rejection (#689)
Conversation
Without max_tokens, vLLM allocates all context space to input tokens and assigns 0 output tokens, rejecting requests with "You passed N input tokens and requested 0 output tokens." Even when prompts fit, the model has no guaranteed output space, leading to truncated or empty responses.

This adds max_tokens support across all VLM backends:

- `VLMConfig`: new `max_tokens` field (default 4096)
- `VLMBase`: reads `max_tokens` from config dict
- OpenAI, VolcEngine, LiteLLM backends: pass `max_tokens` in API calls
- Conditional inclusion (`if self.max_tokens`) so `None` disables the limit

Fixes volcengine#674

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
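The rejection the commit message describes can be illustrated with a standalone arithmetic sketch (assumed numbers, not vLLM's actual scheduling code): without `max_tokens` reserving output space, a prompt at or over the context window leaves a zero-token output budget.

```python
# Illustrative only: why a request with no reserved output space fails.
CONTEXT_LEN = 65536    # assumed context window for this example
input_tokens = 65537   # the count from the error message quoted above

# With no max_tokens, nothing is held back for generation, so the
# remaining output budget is whatever the prompt leaves over (here, none).
output_budget = max(CONTEXT_LEN - input_tokens, 0)
print(output_budget)  # 0 -> "requested 0 output tokens"
```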
qin-ctx left a comment
Overall the fix is clean and well-scoped. One design concern about the default value, and a minor code robustness suggestion.
```python
max_tokens: Optional[int] = Field(
    default=4096, description="Maximum tokens for VLM completion output"
)
```
[Design] (blocking)
The default of 4096 changes behavior for all existing users, not just vLLM users. For OpenAI / VolcEngine native APIs, omitting max_tokens lets the server choose its own (typically generous) default. Forcing 4096 could silently truncate outputs that previously worked fine.
Suggestion: default to None so that max_tokens is only sent when the user explicitly configures it. vLLM users (who need this fix) can add "max_tokens": 4096 to their config; everyone else remains unaffected.
```python
max_tokens: Optional[int] = Field(
    default=None, description="Maximum tokens for VLM completion output"
)
```

You could also call this out more prominently in the config example in the PR description or docs, so vLLM users know to set it.
```python
    "messages": [{"role": "user", "content": prompt}],
    "temperature": self.temperature,
}
if self.max_tokens:
```
[Suggestion] (non-blocking)
Using if self.max_tokens treats 0 the same as None (both falsy). While max_tokens=0 is never a valid API value, if self.max_tokens is not None is semantically clearer and avoids any edge-case surprises. Same applies to all other backends.
```python
if self.max_tokens is not None:
    kwargs["max_tokens"] = self.max_tokens
```

Change default from 4096 to None so max_tokens is only sent when explicitly configured. Prevents silently truncating outputs on OpenAI/VolcEngine where omitting max_tokens lets the server choose. Also use `is not None` instead of truthiness for max_tokens guards.

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
Fixed in 605e954: changed the default to None and switched the guards to `is not None`.
Thanks for the fast turnaround on review and merge.
Summary
VLM completion calls in all three backends (OpenAI, VolcEngine, LiteLLM) do not pass `max_tokens` to the API. This causes two failures:

- Without `max_tokens`, vLLM allocates all context space to input and assigns 0 output tokens, returning: `You passed 65537 input tokens and requested 0 output tokens.`
- Even when prompts fit, the model has no guaranteed output space, leading to truncated or empty responses.

This is separate from #529 (prompt budget guard, fixed in #683). That PR addresses prompt assembly size. This PR addresses the missing `max_tokens` in the API calls themselves, which affects all VLM usage.

Changes
- `VLMConfig`: added `max_tokens` field with default 4096
- `VLMBase.__init__()`: reads `max_tokens` from config dict
- `openai_vlm.py`: all 4 completion methods pass `max_tokens` when set
- `volcengine_vlm.py`: all 4 completion methods pass `max_tokens` when set
- `litellm_vlm.py`: `_build_kwargs()` passes `max_tokens` when set
- `_build_vlm_config_dict()`: includes `max_tokens` in the config dict

Config example:
```json
{
  "vlm": {
    "model": "gpt-4o-mini",
    "api_key": "...",
    "max_tokens": 4096
  }
}
```

Fixes #674
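As a rough sketch of the config-to-kwargs flow the change list describes (class and method names here are illustrative stand-ins, not the repo's actual code):

```python
class VLMStub:
    """Hypothetical stand-in for a VLM backend reading max_tokens from config."""

    def __init__(self, config: dict):
        self.model = config.get("model")
        self.temperature = config.get("temperature", 0.0)
        # An absent key (or explicit None) means "let the server decide".
        self.max_tokens = config.get("max_tokens")

    def build_kwargs(self, prompt: str) -> dict:
        kwargs = {
            "model": self.model,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": self.temperature,
        }
        # Only send max_tokens when it was explicitly configured.
        if self.max_tokens is not None:
            kwargs["max_tokens"] = self.max_tokens
        return kwargs

cfg = {"model": "gpt-4o-mini", "api_key": "...", "max_tokens": 4096}
print(VLMStub(cfg).build_kwargs("describe this image")["max_tokens"])  # 4096
```

With `max_tokens` omitted from the config, the key never reaches the API call, so OpenAI/VolcEngine keep their server-side defaults, which is the behavior the review asked to preserve.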
This contribution was developed with AI assistance (Claude Code).
Test plan
- Set `max_tokens: null` in config to disable the limit
- `ruff format --check` and `ruff check` pass