
fix(agents): use per-model maxCompletionTokens from Venice API instead of hardcoded 8192 #38182

Closed
Sid-Qin wants to merge 1 commit into openclaw:main from Sid-Qin:fix/venice-max-tokens-cap-38168

Conversation

Sid-Qin (Contributor) commented Mar 6, 2026

Summary

Reads maxCompletionTokens from the Venice /models API response and uses per-model values in the static catalog instead of blanket 8192, preventing HTTP 400 errors on models with lower limits.

Problem

The Venice model catalog hardcoded maxTokens: 8192 for all 26 models. Several models (e.g. llama-3.3-70b, mistral-31-24b) only allow max_completion_tokens up to 4096. Venice returns:

400 "max_completion_tokens (8192) exceeds the maximum allowed for model 'llama-3.3-70b' (4096)"

This breaks the default onboarding experience for all Venice.ai users.
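The failure mode can be sketched as follows. This is an illustrative TypeScript sketch, not code from the PR: the per-model limits for the two Llama/Mistral models come from the issue's 400 error, and `wouldFail` is a hypothetical helper name.

```typescript
// Illustrative sketch of the bug: the old catalog sent one blanket value
// (8192) while Venice enforces per-model max_completion_tokens limits.
const perModelLimit: Record<string, number> = {
  "llama-3.3-70b": 4096, // limit quoted in the 400 error message
  "mistral-31-24b": 4096,
  "qwen3-235b-a22b-instruct-2507": 8192,
};

const HARDCODED_MAX_TOKENS = 8192; // the old blanket catalog value

// Returns true when the blanket value exceeds the model's own limit,
// i.e. when Venice would reject the request with HTTP 400.
function wouldFail(modelId: string): boolean {
  const limit = perModelLimit[modelId];
  return limit !== undefined && HARDCODED_MAX_TOKENS > limit;
}

console.log(wouldFail("llama-3.3-70b")); // true → HTTP 400
console.log(wouldFail("qwen3-235b-a22b-instruct-2507")); // false
```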

Changes

  • src/agents/venice-models.ts — add maxCompletionTokens?: number to VeniceModelSpec; add a resolveApiMaxCompletionTokens() helper; lower static catalog maxTokens to 4096 for the Llama, Hermes, Mistral, Gemma, and Venice Uncensored models; use the API-reported maxCompletionTokens in discovery for both catalog and non-catalog models; default to 4096 for unknown models
  • src/agents/venice-models.test.ts — add 3 tests for API discovery with maxCompletionTokens; add 4 tests for resolveApiMaxCompletionTokens edge cases

Test plan

  • npx vitest run src/agents/venice-models.test.ts — 10/10 passing
  • Manual: configure Venice provider with llama-3.3-70b, verify no 400 error
  • Manual: verify models with higher limits still work (e.g. qwen3-235b-a22b-instruct-2507)
  • Manual: verify API discovery overrides catalog defaults when maxCompletionTokens is present

Security Impact

  • Does this change handle user input? No
  • Does this change affect authentication/authorization? No
  • Does this change affect data storage or retrieval? No
  • Does this change affect network communication? No — only the max_completion_tokens value in the request body changes
  • Does this change introduce new dependencies? No

Human Verification

  1. Onboard with Venice API key, send a message using llama-3.3-70b → verify no 400 error
  2. Send a message using qwen3-235b-a22b-instruct-2507 → verify normal operation
  3. Restart gateway → verify API discovery picks up maxCompletionTokens from Venice API

Failure Recovery

Revert the commit to restore the previous hardcoded values. Users can also override maxTokens per model in their config.
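The per-model config override mentioned above could look roughly like this. The override shape, key names, and `effectiveMaxTokens` helper are hypothetical, shown only to illustrate the precedence (user override beats discovered/catalog value); the real config schema may differ.

```typescript
// Hypothetical per-model override shape; the actual config schema may differ.
interface ModelOverride {
  maxTokens?: number;
}

// User-supplied overrides keyed by model ID (illustrative).
const userOverrides: Record<string, ModelOverride> = {
  "llama-3.3-70b": { maxTokens: 4096 },
};

// An explicit user override takes precedence over the discovered/catalog value.
function effectiveMaxTokens(modelId: string, discovered: number): number {
  return userOverrides[modelId]?.maxTokens ?? discovered;
}

console.log(effectiveMaxTokens("llama-3.3-70b", 8192)); // 4096 (override wins)
console.log(effectiveMaxTokens("some-other-model", 8192)); // 8192 (no override)
```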

Risks and Mitigations

  • Risk: Some catalog models may have the wrong maxTokens cap in the static fallback
    Mitigation: API discovery overrides static values when maxCompletionTokens is present. The static catalog uses a conservative 4096 default for models known to have lower limits. Users can always override via config.

…d of hardcoded 8192

Venice models like llama-3.3-70b and mistral-31-24b only support up to
4096 max_completion_tokens, but the catalog hardcoded 8192 for all models,
causing HTTP 400 errors.  Now reads maxCompletionTokens from the Venice
/models API response and uses a conservative default of 4096 for unknown
models.

Closes openclaw#38168

Made-with: Cursor
aisle-research-bot Bot commented Mar 6, 2026

🔒 Aisle Security Analysis

We found 1 potential security issue(s) in this PR:

# Severity Title
1 🔵 Low Unbounded maxTokens derived from unauthenticated Venice /models response (cost/DoS amplification)

1. 🔵 Unbounded maxTokens derived from unauthenticated Venice /models response (cost/DoS amplification)

Severity: Low
CWE: CWE-400
Location: src/agents/venice-models.ts:417-505

Description

discoverVeniceModels() trusts the public (unauthenticated) Venice /models response field model_spec.maxCompletionTokens and maps it into internal ModelDefinitionConfig.maxTokens without applying a sane upper bound.

Implications if maxTokens is used downstream as a default/limit for generation:

  • A malicious/compromised /models response (or network manipulation in less-trusted environments) can set maxCompletionTokens to an extremely large value.
  • This value is persisted into generated models.json and then used to construct runtime models, potentially enabling very large generations (unexpected billing) and/or resource exhaustion (retries, large responses, long-running streams).

Vulnerable code:

function resolveApiMaxCompletionTokens(apiModel: VeniceModel): number | undefined {
  const raw = apiModel.model_spec.maxCompletionTokens;
  if (typeof raw === "number" && Number.isFinite(raw) && raw > 0) {
    return raw;
  }
  return undefined;
}
...
const apiMaxTokens = resolveApiMaxCompletionTokens(apiModel);
...
if (apiMaxTokens !== undefined) {
  def.maxTokens = apiMaxTokens;
}
...
maxTokens: apiMaxTokens ?? VENICE_DEFAULT_MAX_TOKENS,

Note: elsewhere in the codebase, some tools clamp their own maxTokens request options, but the core model object (used for agent runs) is still populated from discovery results and can flow into provider calls via the underlying SDK.

Recommendation

Clamp and normalize the API-provided value before using it.

Guidelines:

  • Require an integer (floor/round) and reject unreasonable values.
  • Cap by contextWindow (and/or Venice-reported availableContextTokens).
  • Apply a global/provider-specific hard maximum (e.g. 8192/16384), unless you explicitly support larger outputs.

Example fix:

const VENICE_HARD_MAX_COMPLETION_TOKENS = 8192; // or provider-specific

function resolveApiMaxCompletionTokens(apiModel: VeniceModel): number | undefined {
  const raw = apiModel.model_spec.maxCompletionTokens;
  if (typeof raw !== "number" || !Number.isFinite(raw) || raw <= 0) {
    return undefined;
  }

  const available = apiModel.model_spec.availableContextTokens;
  const capByContext =
    typeof available === "number" && Number.isFinite(available) && available > 0
      ? available
      : VENICE_HARD_MAX_COMPLETION_TOKENS;

  const normalized = Math.floor(raw);
  return Math.min(normalized, capByContext, VENICE_HARD_MAX_COMPLETION_TOKENS);
}

Additionally consider:

  • For catalog models, clamp by the catalog entry’s known maxTokens (or a safe maximum) rather than letting the network response override upward.
  • If Venice discovery is purely informational, treat the response as untrusted and do not use it to raise limits.
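The first point above could be sketched as follows. This is a hedged illustration, not code from the PR: `clampForCatalogModel` and its parameters are invented names, and the sketch assumes the catalog entry's maxTokens is available at the call site.

```typescript
// Sketch: for catalog models, never let the network response raise the cap.
// The helper name and signature are assumptions for illustration only.
function clampForCatalogModel(
  apiValue: number | undefined,
  catalogMaxTokens: number,
): number {
  if (apiValue === undefined) {
    return catalogMaxTokens; // API omitted the field: keep the catalog value
  }
  // The API value may lower the cap but never raise it above the catalog entry.
  return Math.min(Math.floor(apiValue), catalogMaxTokens);
}

console.log(clampForCatalogModel(2048, 4096)); // 2048 — API lowers the cap
console.log(clampForCatalogModel(999999, 4096)); // 4096 — cannot raise it
console.log(clampForCatalogModel(undefined, 8192)); // 8192 — catalog fallback
```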

Analyzed PR: #38182 at commit a7cb9c5

Last updated on: 2026-03-06T17:19:28Z

greptile-apps Bot (Contributor) commented Mar 6, 2026

Greptile Summary

This PR fixes a real production-breaking bug where all Venice models used a hardcoded maxTokens: 8192, causing HTTP 400 errors on models that only allow up to 4096 completion tokens. The fix is well-scoped: it reads maxCompletionTokens from the Venice /models API response, applies it via a clean validation helper, lowers the static catalog defaults for the affected models, and adds a safe fallback for unknown models.

  • Core logic in venice-models.ts is correct — the resolveApiMaxCompletionTokens guard properly rejects invalid values (NaN, Infinity, ≤0), and the discovery loop correctly prefers API values over catalog values when present, falling back to catalog values over the global default when the API omits the field.
  • Key test gap: the "uses API maxCompletionTokens for catalog models when present" test uses maxCompletionTokens: 4096, which matches the catalog's existing 4096 for llama-3.3-70b. The test passes vacuously and would not catch a regression that removed the override entirely. A distinct value (e.g. 2048) should be used.
  • Missing coverage: no test asserts that a catalog model with maxTokens: 8192 (e.g. qwen3-235b-a22b-instruct-2507) retains its catalog value when the API omits maxCompletionTokens, preventing a future regression that silently caps it to VENICE_DEFAULT_MAX_TOKENS (4096).

Confidence Score: 4/5

  • Safe to merge — the production fix is correct; two test quality issues should be addressed but do not affect runtime behavior.
  • The implementation logic is sound and correctly resolves the 400 error for affected models. The one point deduction is for the vacuous catalog-override test, which provides false confidence that the API override path works when it actually tests a no-op (same value in API and catalog). This doesn't represent a runtime bug but does mean a regression in that code path could go undetected.
  • src/agents/venice-models.test.ts — two test cases need attention as described in the review comments.

Last reviewed commit: a7cb9c5

Comment on lines +103 to +113
  it("uses API maxCompletionTokens for catalog models when present", async () => {
    const fetchMock = vi.fn(async () =>
      makeModelsResponse("llama-3.3-70b", { maxCompletionTokens: 4096 }),
    );
    vi.stubGlobal("fetch", fetchMock as unknown as typeof fetch);

    const models = await runWithDiscoveryEnabled(() => discoverVeniceModels());
    const llama = models.find((m) => m.id === "llama-3.3-70b");
    expect(llama).toBeDefined();
    expect(llama!.maxTokens).toBe(4096);
  });

Test provides false confidence — API value equals catalog value

The API mock uses maxCompletionTokens: 4096, which is identical to llama-3.3-70b's catalog entry maxTokens: 4096. Because both values are the same, this test would pass even if the API override branch (def.maxTokens = apiMaxTokens) was completely removed — the catalog default already satisfies expect(llama!.maxTokens).toBe(4096). The test doesn't actually verify that the API value takes precedence over the catalog.

To make this test meaningful, use a value that is distinct from the catalog entry:

Suggested change

Before:

  it("uses API maxCompletionTokens for catalog models when present", async () => {
    const fetchMock = vi.fn(async () =>
      makeModelsResponse("llama-3.3-70b", { maxCompletionTokens: 4096 }),
    );
    vi.stubGlobal("fetch", fetchMock as unknown as typeof fetch);

    const models = await runWithDiscoveryEnabled(() => discoverVeniceModels());
    const llama = models.find((m) => m.id === "llama-3.3-70b");
    expect(llama).toBeDefined();
    expect(llama!.maxTokens).toBe(4096);
  });

After:

  it("uses API maxCompletionTokens for catalog models when present", async () => {
    const fetchMock = vi.fn(async () =>
      makeModelsResponse("llama-3.3-70b", { maxCompletionTokens: 2048 }),
    );
    vi.stubGlobal("fetch", fetchMock as unknown as typeof fetch);

    const models = await runWithDiscoveryEnabled(() => discoverVeniceModels());
    const llama = models.find((m) => m.id === "llama-3.3-70b");
    expect(llama).toBeDefined();
    expect(llama!.maxTokens).toBe(2048);
  });

Comment on lines +127 to +135
  it("falls back to conservative default when API omits maxCompletionTokens for non-catalog models", async () => {
    const fetchMock = vi.fn(async () => makeModelsResponse("unknown-model"));
    vi.stubGlobal("fetch", fetchMock as unknown as typeof fetch);

    const models = await runWithDiscoveryEnabled(() => discoverVeniceModels());
    const unknownModel = models.find((m) => m.id === "unknown-model");
    expect(unknownModel).toBeDefined();
    expect(unknownModel!.maxTokens).toBe(VENICE_DEFAULT_MAX_TOKENS);
  });

Missing test: catalog model without API maxCompletionTokens should preserve catalog value

The existing fallback test only covers non-catalog models (using "unknown-model"). For catalog models, the fallback behavior is different — the catalog's maxTokens should be used instead of VENICE_DEFAULT_MAX_TOKENS. Several catalog models have maxTokens: 8192, which differs from VENICE_DEFAULT_MAX_TOKENS (4096), so a regression here (e.g., always applying the default instead of the catalog value) could silently cap those models.

Consider adding a complementary test alongside this one:

it("preserves catalog maxTokens for catalog models when API omits maxCompletionTokens", async () => {
  // qwen3-235b-a22b-instruct-2507 has maxTokens: 8192 in the catalog
  const fetchMock = vi.fn(async () => makeModelsResponse("qwen3-235b-a22b-instruct-2507"));
  vi.stubGlobal("fetch", fetchMock as unknown as typeof fetch);

  const models = await runWithDiscoveryEnabled(() => discoverVeniceModels());
  const qwen = models.find((m) => m.id === "qwen3-235b-a22b-instruct-2507");
  expect(qwen).toBeDefined();
  expect(qwen!.maxTokens).toBe(8192); // catalog value, NOT VENICE_DEFAULT_MAX_TOKENS (4096)
});

vincentkoc (Member) commented:

Nice work surfacing the safer per-model override path here.

I'm consolidating this Venice bug cluster under #38168 before we merge anything, so I'm keeping the issue as the single canonical thread and marking overlapping PR attempts as duplicates for hygiene. This branch is one of the strongest partial fixes in the set, and I expect to reuse the good parts rather than let a handful of similar PRs compete.

Your contribution is preserved in that credit trail. If you think this branch covers something outside #38168, point me at it and I'll re-check the split.

vincentkoc added the dedupe:child (Duplicate issue/PR child in dedupe cluster) and close:duplicate (Closed as duplicate) labels Mar 6, 2026

Labels

agents (Agent runtime and tooling), close:duplicate (Closed as duplicate), dedupe:child (Duplicate issue/PR child in dedupe cluster), size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants