
fix(agents): use per-model maxCompletionTokens from Venice API instead of hardcoded 8192 #38182

Closed
Sid-Qin wants to merge 1 commit into openclaw:main from Sid-Qin:fix/venice-max-tokens-cap-38168

Conversation

Sid-Qin (Contributor) commented Mar 6, 2026

Summary

Reads maxCompletionTokens from the Venice /models API response and uses per-model values in the static catalog instead of blanket 8192, preventing HTTP 400 errors on models with lower limits.

Problem

The Venice model catalog hardcoded maxTokens: 8192 for all 26 models. Several models (e.g. llama-3.3-70b, mistral-31-24b) only allow max_completion_tokens up to 4096. Venice returns:

400 "max_completion_tokens (8192) exceeds the maximum allowed for model 'llama-3.3-70b' (4096)"

This breaks the default onboarding experience for all Venice.ai users.
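The failure mode can be sketched as follows. This is an illustrative TypeScript sketch, not code from the PR: the per-model limits for the two Llama/Mistral models come from the issue's 400 error, and `wouldFail` is a hypothetical helper name.

```typescript
// Illustrative sketch of the bug: the old catalog sent one blanket value
// (8192) while Venice enforces per-model max_completion_tokens limits.
const perModelLimit: Record<string, number> = {
  "llama-3.3-70b": 4096, // limit quoted in the 400 error message
  "mistral-31-24b": 4096,
  "qwen3-235b-a22b-instruct-2507": 8192,
};

const HARDCODED_MAX_TOKENS = 8192; // the old blanket catalog value

// Returns true when the blanket value exceeds the model's own limit,
// i.e. when Venice would reject the request with HTTP 400.
function wouldFail(modelId: string): boolean {
  const limit = perModelLimit[modelId];
  return limit !== undefined && HARDCODED_MAX_TOKENS > limit;
}

console.log(wouldFail("llama-3.3-70b")); // true → HTTP 400
console.log(wouldFail("qwen3-235b-a22b-instruct-2507")); // false
```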

Changes

  • src/agents/venice-models.ts — add maxCompletionTokens?: number to VeniceModelSpec; add a resolveApiMaxCompletionTokens() helper; lower static catalog maxTokens to 4096 for the Llama, Hermes, Mistral, Gemma, and Venice Uncensored models; use the API-reported maxCompletionTokens in discovery for both catalog and non-catalog models; default to 4096 for unknown models
  • src/agents/venice-models.test.ts — add 3 tests for API discovery with maxCompletionTokens; add 4 tests for resolveApiMaxCompletionTokens edge cases

Test plan

  • npx vitest run src/agents/venice-models.test.ts — 10/10 passing
  • Manual: configure Venice provider with llama-3.3-70b, verify no 400 error
  • Manual: verify models with higher limits still work (e.g. qwen3-235b-a22b-instruct-2507)
  • Manual: verify API discovery overrides catalog defaults when maxCompletionTokens is present

Security Impact

  • Does this change handle user input? No
  • Does this change affect authentication/authorization? No
  • Does this change affect data storage or retrieval? No
  • Does this change affect network communication? No — only the max_completion_tokens value in the request body changes
  • Does this change introduce new dependencies? No

Human Verification

  1. Onboard with Venice API key, send a message using llama-3.3-70b → verify no 400 error
  2. Send a message using qwen3-235b-a22b-instruct-2507 → verify normal operation
  3. Restart gateway → verify API discovery picks up maxCompletionTokens from Venice API

Failure Recovery

Revert the commit to restore the previous hardcoded values. Users can also override maxTokens per model in their config.
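The per-model config override mentioned above could look roughly like this. The override shape, key names, and `effectiveMaxTokens` helper are hypothetical, shown only to illustrate the precedence (user override beats discovered/catalog value); the real config schema may differ.

```typescript
// Hypothetical per-model override shape; the actual config schema may differ.
interface ModelOverride {
  maxTokens?: number;
}

// User-supplied overrides keyed by model ID (illustrative).
const userOverrides: Record<string, ModelOverride> = {
  "llama-3.3-70b": { maxTokens: 4096 },
};

// An explicit user override takes precedence over the discovered/catalog value.
function effectiveMaxTokens(modelId: string, discovered: number): number {
  return userOverrides[modelId]?.maxTokens ?? discovered;
}

console.log(effectiveMaxTokens("llama-3.3-70b", 8192)); // 4096 (override wins)
console.log(effectiveMaxTokens("some-other-model", 8192)); // 8192 (no override)
```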

Risks and Mitigations

  • Risk: Some catalog models may have the wrong maxTokens cap in the static fallback
    Mitigation: API discovery overrides static values when maxCompletionTokens is present. The static catalog uses a conservative 4096 default for models known to have lower limits. Users can always override via config.

…d of hardcoded 8192

Venice models like llama-3.3-70b and mistral-31-24b only support up to
4096 max_completion_tokens, but the catalog hardcoded 8192 for all models,
causing HTTP 400 errors.  Now reads maxCompletionTokens from the Venice
/models API response and uses a conservative default of 4096 for unknown
models.

Closes openclaw#38168

Made-with: Cursor
aisle-research-bot Bot commented Mar 6, 2026

🔒 Aisle Security Analysis

We found 1 potential security issue(s) in this PR:

# Severity Title
1 🔵 Low Unbounded maxTokens derived from unauthenticated Venice /models response (cost/DoS amplification)

1. 🔵 Unbounded maxTokens derived from unauthenticated Venice /models response (cost/DoS amplification)

Severity: Low
CWE: CWE-400
Location: src/agents/venice-models.ts:417-505

Description

discoverVeniceModels() trusts the public (unauthenticated) Venice /models response field model_spec.maxCompletionTokens and maps it into internal ModelDefinitionConfig.maxTokens without applying a sane upper bound.

Implications if maxTokens is used downstream as a default/limit for generation:

  • A malicious/compromised /models response (or network manipulation in less-trusted environments) can set maxCompletionTokens to an extremely large value.
  • This value is persisted into generated models.json and then used to construct runtime models, potentially enabling very large generations (unexpected billing) and/or resource exhaustion (retries, large responses, long-running streams).

Vulnerable code:

function resolveApiMaxCompletionTokens(apiModel: VeniceModel): number | undefined {
  const raw = apiModel.model_spec.maxCompletionTokens;
  if (typeof raw === "number" && Number.isFinite(raw) && raw > 0) {
    return raw;
  }
  return undefined;
}
...
const apiMaxTokens = resolveApiMaxCompletionTokens(apiModel);
...
if (apiMaxTokens !== undefined) {
  def.maxTokens = apiMaxTokens;
}
...
maxTokens: apiMaxTokens ?? VENICE_DEFAULT_MAX_TOKENS,

Note: elsewhere in the codebase, some tools clamp their own maxTokens request options, but the core model object (used for agent runs) is still populated from discovery results and can flow into provider calls via the underlying SDK.

Recommendation

Clamp and normalize the API-provided value before using it.

Guidelines:

  • Require an integer (floor/round) and reject unreasonable values.
  • Cap by contextWindow (and/or Venice-reported availableContextTokens).
  • Apply a global/provider-specific hard maximum (e.g. 8192/16384), unless you explicitly support larger outputs.

Example fix:

const VENICE_HARD_MAX_COMPLETION_TOKENS = 8192; // or provider-specific

function resolveApiMaxCompletionTokens(apiModel: VeniceModel): number | undefined {
  const raw = apiModel.model_spec.maxCompletionTokens;
  if (typeof raw !== "number" || !Number.isFinite(raw) || raw <= 0) {
    return undefined;
  }

  const available = apiModel.model_spec.availableContextTokens;
  const capByContext =
    typeof available === "number" && Number.isFinite(available) && available > 0
      ? available
      : VENICE_HARD_MAX_COMPLETION_TOKENS;

  const normalized = Math.floor(raw);
  return Math.min(normalized, capByContext, VENICE_HARD_MAX_COMPLETION_TOKENS);
}

Additionally consider:

  • For catalog models, clamp by the catalog entry’s known maxTokens (or a safe maximum) rather than letting the network response override upward.
  • If Venice discovery is purely informational, treat the response as untrusted and do not use it to raise limits.
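The first point above could be sketched as follows. This is a hedged illustration, not code from the PR: `clampForCatalogModel` and its parameters are invented names, and the sketch assumes the catalog entry's maxTokens is available at the call site.

```typescript
// Sketch: for catalog models, never let the network response raise the cap.
// The helper name and signature are assumptions for illustration only.
function clampForCatalogModel(
  apiValue: number | undefined,
  catalogMaxTokens: number,
): number {
  if (apiValue === undefined) {
    return catalogMaxTokens; // API omitted the field: keep the catalog value
  }
  // The API value may lower the cap but never raise it above the catalog entry.
  return Math.min(Math.floor(apiValue), catalogMaxTokens);
}

console.log(clampForCatalogModel(2048, 4096)); // 2048 — API lowers the cap
console.log(clampForCatalogModel(999999, 4096)); // 4096 — cannot raise it
console.log(clampForCatalogModel(undefined, 8192)); // 8192 — catalog fallback
```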

Analyzed PR: #38182 at commit a7cb9c5

Last updated on: 2026-03-06T17:19:28Z

greptile-apps Bot (Contributor) commented Mar 6, 2026

Greptile Summary

This PR fixes a real production-breaking bug where all Venice models used a hardcoded maxTokens: 8192, causing HTTP 400 errors on models that only allow up to 4096 completion tokens. The fix is well-scoped: it reads maxCompletionTokens from the Venice /models API response, applies it via a clean validation helper, lowers the static catalog defaults for the affected models, and adds a safe fallback for unknown models.

  • Core logic in venice-models.ts is correct — the resolveApiMaxCompletionTokens guard properly rejects invalid values (NaN, Infinity, ≤0), and the discovery loop correctly prefers API values over catalog values when present, falling back to catalog values over the global default when the API omits the field.
  • Key test gap: the "uses API maxCompletionTokens for catalog models when present" test uses maxCompletionTokens: 4096, which matches the catalog's existing 4096 for llama-3.3-70b. The test passes vacuously and would not catch a regression that removed the override entirely. A distinct value (e.g. 2048) should be used.
  • Missing coverage: no test asserts that a catalog model with maxTokens: 8192 (e.g. qwen3-235b-a22b-instruct-2507) retains its catalog value when the API omits maxCompletionTokens, preventing a future regression that silently caps it to VENICE_DEFAULT_MAX_TOKENS (4096).

Confidence Score: 4/5

  • Safe to merge — the production fix is correct; two test quality issues should be addressed but do not affect runtime behavior.
  • The implementation logic is sound and correctly resolves the 400 error for affected models. The one point deduction is for the vacuous catalog-override test, which provides false confidence that the API override path works when it actually tests a no-op (same value in API and catalog). This doesn't represent a runtime bug but does mean a regression in that code path could go undetected.
  • src/agents/venice-models.test.ts — two test cases need attention as described in the review comments.

Last reviewed commit: a7cb9c5

Comment on lines +103 to +113
  it("uses API maxCompletionTokens for catalog models when present", async () => {
    const fetchMock = vi.fn(async () =>
      makeModelsResponse("llama-3.3-70b", { maxCompletionTokens: 4096 }),
    );
    vi.stubGlobal("fetch", fetchMock as unknown as typeof fetch);

    const models = await runWithDiscoveryEnabled(() => discoverVeniceModels());
    const llama = models.find((m) => m.id === "llama-3.3-70b");
    expect(llama).toBeDefined();
    expect(llama!.maxTokens).toBe(4096);
  });

Test provides false confidence — API value equals catalog value

The API mock uses maxCompletionTokens: 4096, which is identical to llama-3.3-70b's catalog entry maxTokens: 4096. Because both values are the same, this test would pass even if the API override branch (def.maxTokens = apiMaxTokens) was completely removed — the catalog default already satisfies expect(llama!.maxTokens).toBe(4096). The test doesn't actually verify that the API value takes precedence over the catalog.

To make this test meaningful, use a value that is distinct from the catalog entry:

Suggested change

Before:

  it("uses API maxCompletionTokens for catalog models when present", async () => {
    const fetchMock = vi.fn(async () =>
      makeModelsResponse("llama-3.3-70b", { maxCompletionTokens: 4096 }),
    );
    vi.stubGlobal("fetch", fetchMock as unknown as typeof fetch);

    const models = await runWithDiscoveryEnabled(() => discoverVeniceModels());
    const llama = models.find((m) => m.id === "llama-3.3-70b");
    expect(llama).toBeDefined();
    expect(llama!.maxTokens).toBe(4096);
  });

After:

  it("uses API maxCompletionTokens for catalog models when present", async () => {
    const fetchMock = vi.fn(async () =>
      makeModelsResponse("llama-3.3-70b", { maxCompletionTokens: 2048 }),
    );
    vi.stubGlobal("fetch", fetchMock as unknown as typeof fetch);

    const models = await runWithDiscoveryEnabled(() => discoverVeniceModels());
    const llama = models.find((m) => m.id === "llama-3.3-70b");
    expect(llama).toBeDefined();
    expect(llama!.maxTokens).toBe(2048);
  });

Comment on lines +127 to +135
  it("falls back to conservative default when API omits maxCompletionTokens for non-catalog models", async () => {
    const fetchMock = vi.fn(async () => makeModelsResponse("unknown-model"));
    vi.stubGlobal("fetch", fetchMock as unknown as typeof fetch);

    const models = await runWithDiscoveryEnabled(() => discoverVeniceModels());
    const unknownModel = models.find((m) => m.id === "unknown-model");
    expect(unknownModel).toBeDefined();
    expect(unknownModel!.maxTokens).toBe(VENICE_DEFAULT_MAX_TOKENS);
  });

Missing test: catalog model without API maxCompletionTokens should preserve catalog value

The existing fallback test only covers non-catalog models (using "unknown-model"). For catalog models, the fallback behavior is different — the catalog's maxTokens should be used instead of VENICE_DEFAULT_MAX_TOKENS. Several catalog models have maxTokens: 8192, which differs from VENICE_DEFAULT_MAX_TOKENS (4096), so a regression here (e.g., always applying the default instead of the catalog value) could silently cap those models.

Consider adding a complementary test alongside this one:

it("preserves catalog maxTokens for catalog models when API omits maxCompletionTokens", async () => {
  // qwen3-235b-a22b-instruct-2507 has maxTokens: 8192 in the catalog
  const fetchMock = vi.fn(async () => makeModelsResponse("qwen3-235b-a22b-instruct-2507"));
  vi.stubGlobal("fetch", fetchMock as unknown as typeof fetch);

  const models = await runWithDiscoveryEnabled(() => discoverVeniceModels());
  const qwen = models.find((m) => m.id === "qwen3-235b-a22b-instruct-2507");
  expect(qwen).toBeDefined();
  expect(qwen!.maxTokens).toBe(8192); // catalog value, NOT VENICE_DEFAULT_MAX_TOKENS (4096)
});

vincentkoc (Member) commented:

Nice work surfacing the safer per-model override path here.

I'm consolidating this Venice bug cluster under #38168 before we merge anything, so I'm keeping the issue as the single canonical thread and marking overlapping PR attempts as duplicates for hygiene. This branch is one of the strongest partial fixes in the set, and I expect to reuse the good parts rather than let a handful of similar PRs compete.

Your contribution is preserved in that credit trail. If you think this branch covers something outside #38168, point me at it and I'll re-check the split.

vincentkoc added the dedupe:child (Duplicate issue/PR child in dedupe cluster) and close:duplicate (Closed as duplicate) labels Mar 6, 2026

Labels

agents (Agent runtime and tooling), close:duplicate (Closed as duplicate), dedupe:child (Duplicate issue/PR child in dedupe cluster), size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants