fix(agents): use per-model maxCompletionTokens from Venice API instead of hardcoded 8192 #38182

Sid-Qin wants to merge 1 commit into openclaw:main from
Conversation
…d of hardcoded 8192

Venice models like llama-3.3-70b and mistral-31-24b only support up to 4096 `max_completion_tokens`, but the catalog hardcoded 8192 for all models, causing HTTP 400 errors. Now reads `maxCompletionTokens` from the Venice `/models` API response and uses a conservative default of 4096 for unknown models.

Closes openclaw#38168

Made-with: Cursor
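For context, the failure mode is a request whose `max_completion_tokens` exceeds the model's real limit. A minimal client-side guard in the same spirit as the fix can be sketched like this; the table, limits, and helper name here are illustrative assumptions, not the PR's actual code:

```typescript
// Hypothetical per-model completion-token limits (the real fix reads these
// from the Venice /models API; these constants are only for illustration).
const MODEL_COMPLETION_LIMITS: Record<string, number> = {
  "llama-3.3-70b": 4096,
  "mistral-31-24b": 4096,
};

// Conservative default for models not in the table, matching the PR's choice.
const FALLBACK_LIMIT = 4096;

// Clamp a requested max_completion_tokens to the model's known limit so the
// request never carries an out-of-range value that would trigger an HTTP 400.
function clampCompletionTokens(modelId: string, requested: number): number {
  const limit = MODEL_COMPLETION_LIMITS[modelId] ?? FALLBACK_LIMIT;
  return Math.min(Math.max(1, Math.floor(requested)), limit);
}

clampCompletionTokens("llama-3.3-70b", 8192); // → 4096, instead of a 400 error
```

The old behavior corresponds to always sending 8192; the clamp shows why 4096 is the safe ceiling for the affected models.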
🔒 Aisle Security Analysis

We found 1 potential security issue in this PR:
1. 🔵 Unbounded maxTokens derived from unauthenticated Venice /models response (cost/DoS amplification)
Vulnerable code:

```typescript
function resolveApiMaxCompletionTokens(apiModel: VeniceModel): number | undefined {
  const raw = apiModel.model_spec.maxCompletionTokens;
  if (typeof raw === "number" && Number.isFinite(raw) && raw > 0) {
    return raw;
  }
  return undefined;
}
// ...
const apiMaxTokens = resolveApiMaxCompletionTokens(apiModel);
// ...
if (apiMaxTokens !== undefined) {
  def.maxTokens = apiMaxTokens;
}
// ...
maxTokens: apiMaxTokens ?? VENICE_DEFAULT_MAX_TOKENS,
```

Note: elsewhere in the codebase, some tools clamp their own

Recommendation

Clamp and normalize the API-provided value before using it. Guidelines:
Example fix:

```typescript
const VENICE_HARD_MAX_COMPLETION_TOKENS = 8192; // or provider-specific

function resolveApiMaxCompletionTokens(apiModel: VeniceModel): number | undefined {
  const raw = apiModel.model_spec.maxCompletionTokens;
  if (typeof raw !== "number" || !Number.isFinite(raw) || raw <= 0) {
    return undefined;
  }
  const available = apiModel.model_spec.availableContextTokens;
  const capByContext =
    typeof available === "number" && Number.isFinite(available) && available > 0
      ? available
      : VENICE_HARD_MAX_COMPLETION_TOKENS;
  const normalized = Math.floor(raw);
  return Math.min(normalized, capByContext, VENICE_HARD_MAX_COMPLETION_TOKENS);
}
```

Additionally consider:
Analyzed PR: #38182 at commit | Last updated on: 2026-03-06T17:19:28Z
Greptile Summary

This PR fixes a real production-breaking bug where all Venice models used a hardcoded `maxTokens` of 8192.
Confidence Score: 4/5
Last reviewed commit: a7cb9c5
```typescript
it("uses API maxCompletionTokens for catalog models when present", async () => {
  const fetchMock = vi.fn(async () =>
    makeModelsResponse("llama-3.3-70b", { maxCompletionTokens: 4096 }),
  );
  vi.stubGlobal("fetch", fetchMock as unknown as typeof fetch);

  const models = await runWithDiscoveryEnabled(() => discoverVeniceModels());
  const llama = models.find((m) => m.id === "llama-3.3-70b");
  expect(llama).toBeDefined();
  expect(llama!.maxTokens).toBe(4096);
});
```
Test provides false confidence — API value equals catalog value
The API mock uses maxCompletionTokens: 4096, which is identical to llama-3.3-70b's catalog entry maxTokens: 4096. Because both values are the same, this test would pass even if the API override branch (def.maxTokens = apiMaxTokens) was completely removed — the catalog default already satisfies expect(llama!.maxTokens).toBe(4096). The test doesn't actually verify that the API value takes precedence over the catalog.
To make this test meaningful, use a value that is distinct from the catalog entry:
```diff
 it("uses API maxCompletionTokens for catalog models when present", async () => {
   const fetchMock = vi.fn(async () =>
-    makeModelsResponse("llama-3.3-70b", { maxCompletionTokens: 4096 }),
+    makeModelsResponse("llama-3.3-70b", { maxCompletionTokens: 2048 }),
   );
   vi.stubGlobal("fetch", fetchMock as unknown as typeof fetch);
   const models = await runWithDiscoveryEnabled(() => discoverVeniceModels());
   const llama = models.find((m) => m.id === "llama-3.3-70b");
   expect(llama).toBeDefined();
-  expect(llama!.maxTokens).toBe(4096);
+  expect(llama!.maxTokens).toBe(2048);
 });
```
Prompt To Fix With AI

Path: src/agents/venice-models.test.ts
Line: 103-113
How can I resolve this? If you propose a fix, please make it concise.

```typescript
it("falls back to conservative default when API omits maxCompletionTokens for non-catalog models", async () => {
  const fetchMock = vi.fn(async () => makeModelsResponse("unknown-model"));
  vi.stubGlobal("fetch", fetchMock as unknown as typeof fetch);

  const models = await runWithDiscoveryEnabled(() => discoverVeniceModels());
  const unknownModel = models.find((m) => m.id === "unknown-model");
  expect(unknownModel).toBeDefined();
  expect(unknownModel!.maxTokens).toBe(VENICE_DEFAULT_MAX_TOKENS);
});
```
Missing test: catalog model without API maxCompletionTokens should preserve catalog value
The existing fallback test only covers non-catalog models (using "unknown-model"). For catalog models, the fallback behavior is different — the catalog's maxTokens should be used instead of VENICE_DEFAULT_MAX_TOKENS. Several catalog models have maxTokens: 8192, which differs from VENICE_DEFAULT_MAX_TOKENS (4096), so a regression here (e.g., always applying the default instead of the catalog value) could silently cap those models.
Consider adding a complementary test alongside this one:
```typescript
it("preserves catalog maxTokens for catalog models when API omits maxCompletionTokens", async () => {
  // qwen3-235b-a22b-instruct-2507 has maxTokens: 8192 in the catalog
  const fetchMock = vi.fn(async () => makeModelsResponse("qwen3-235b-a22b-instruct-2507"));
  vi.stubGlobal("fetch", fetchMock as unknown as typeof fetch);
  const models = await runWithDiscoveryEnabled(() => discoverVeniceModels());
  const qwen = models.find((m) => m.id === "qwen3-235b-a22b-instruct-2507");
  expect(qwen).toBeDefined();
  expect(qwen!.maxTokens).toBe(8192); // catalog value, NOT VENICE_DEFAULT_MAX_TOKENS (4096)
});
```

Prompt To Fix With AI
Path: src/agents/venice-models.test.ts
Line: 127-135

How can I resolve this? If you propose a fix, please make it concise.
Nice work surfacing the safer per-model override path here. I'm consolidating this Venice bug cluster under #38168 before we merge anything, so I'm keeping the issue as the single canonical thread and marking overlapping PR attempts as duplicates for hygiene. This branch is one of the strongest partial fixes in the set, and I expect to reuse the good parts rather than let a handful of similar PRs compete. Your contribution is preserved in that credit trail. If you think this branch covers something outside #38168, point me at it and I'll re-check the split.
Summary
Reads `maxCompletionTokens` from the Venice `/models` API response and uses per-model values in the static catalog instead of a blanket 8192, preventing HTTP 400 errors on models with lower limits.

Problem

The Venice model catalog hardcoded `maxTokens: 8192` for all 26 models. Several models (e.g. `llama-3.3-70b`, `mistral-31-24b`) only allow `max_completion_tokens` up to 4096, so Venice returns an HTTP 400 error. This breaks the default onboarding experience for all Venice.ai users.
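The fix's intended resolution order (API-reported limit first, then the static catalog entry, then a conservative default for unknown models) can be sketched as follows. The helper name and shapes here are illustrative assumptions, not the PR's exact code:

```typescript
// Minimal shape standing in for a static catalog entry.
interface CatalogEntry {
  id: string;
  maxTokens: number;
}

// Conservative default for models absent from both the API and the catalog.
const VENICE_DEFAULT_MAX_TOKENS = 4096;

// Resolve the effective maxTokens: an API-reported maxCompletionTokens wins,
// a catalog value is preserved when the API omits the field, and unknown
// models fall back to the conservative default.
function resolveMaxTokens(
  apiMaxTokens: number | undefined,
  catalogEntry: CatalogEntry | undefined,
): number {
  if (apiMaxTokens !== undefined && Number.isFinite(apiMaxTokens) && apiMaxTokens > 0) {
    return Math.floor(apiMaxTokens); // API value takes precedence
  }
  if (catalogEntry !== undefined) {
    return catalogEntry.maxTokens; // catalog value preserved when API omits it
  }
  return VENICE_DEFAULT_MAX_TOKENS; // unknown model: conservative default
}
```

This ordering is also what the review comments above probe: the API value must be able to override the catalog, and a missing API value must not clobber a catalog entry.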
Changes
- `src/agents/venice-models.ts`: add `maxCompletionTokens?: number` to `VeniceModelSpec`; add `resolveApiMaxCompletionTokens()` helper; lower static catalog `maxTokens` to 4096 for Llama, Hermes, Mistral, Gemma, and Venice Uncensored models; use API-reported `maxCompletionTokens` in discovery for both catalog and non-catalog models; default to 4096 for unknown models
- `src/agents/venice-models.test.ts`: cover `maxCompletionTokens`; add 4 tests for `resolveApiMaxCompletionTokens` edge cases

Test plan
- `npx vitest run src/agents/venice-models.test.ts`: 10/10 passing
- `llama-3.3-70b`, verify no 400 error
- (`qwen3-235b-a22b-instruct-2507`)
- `maxCompletionTokens` is present

Security Impact
The `max_completion_tokens` value in the request body changes.

Human Verification
- `llama-3.3-70b` → verify no 400 error
- `qwen3-235b-a22b-instruct-2507` → verify normal operation
- `maxCompletionTokens` from Venice API

Failure Recovery
Revert the commit to restore the previous hardcoded values. Users can also override `maxTokens` per model in their config.

Risks and Mitigations

- `maxTokens` cap in the static fallback

Mitigation: API discovery overrides static values when `maxCompletionTokens` is present. The static catalog uses a conservative 4096 default for models known to have lower limits. Users can always override via config.