fix(agents): recognize OpenAI/Codex server_error as retryable for model fallback#45761
fix(agents): recognize OpenAI/Codex server_error as retryable for model fallback#45761ToneLoke wants to merge 2 commits intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR extends
Confidence Score: 4/5
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/agents/pi-embedded-helpers/errors.ts
Line: 859-863
Comment:
**Missing test coverage for new code path**
The existing test file (`src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts`, line 847) has a dedicated test case for the Anthropic `api_error` pattern:
```ts
it("classifies JSON api_error internal server failures as timeout", () => {
expect(
classifyFailoverReason(
'{"type":"error","error":{"type":"api_error","message":"Internal server error"}}',
),
).toBe("timeout");
});
```
A parallel test for the new OpenAI/Codex `server_error` path should be added alongside it to ensure the new branch stays covered and to document the expected format:
```ts
it("classifies JSON server_error internal server failures as timeout", () => {
expect(
classifyFailoverReason(
'{"type":"error","error":{"type":"server_error","code":"server_error","message":"An error occurred..."}}',
),
).toBe("timeout");
});
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/agents/pi-embedded-helpers/errors.ts
Line: 861
Comment:
**Fragile string-matching may miss pretty-printed JSON**
The check uses `value.includes('"type":"server_error"')` (no space after the colon), which will silently fail to match if OpenAI ever sends a pretty-printed response like `"type": "server_error"`. The Anthropic branch has the same pre-existing fragility. Since you're touching this function, consider switching to a safe JSON parse + property access, or at minimum accepting an optional space:
```suggestion
if (value.includes('"type":"server_error"') && value.includes('"code":"server_error"')) {
```
A more robust alternative would be:
```ts
try {
const parsed = JSON.parse(raw);
const errType = parsed?.error?.type;
const errCode = parsed?.error?.code;
if (errType === "server_error" && errCode === "server_error") {
return true;
}
} catch {
// not valid JSON, fall through
}
```
This is a pre-existing issue but worth addressing while the function is being modified.
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: 0b58993 |
| // OpenAI/Codex wraps transient 500s as: | ||
| // {"type":"error","error":{"type":"server_error","code":"server_error","message":"An error occurred..."}} | ||
| if (value.includes('"type":"server_error"') && value.includes('"code":"server_error"')) { | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Missing test coverage for new code path
The existing test file (src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts, line 847) has a dedicated test case for the Anthropic api_error pattern:
it("classifies JSON api_error internal server failures as timeout", () => {
expect(
classifyFailoverReason(
'{"type":"error","error":{"type":"api_error","message":"Internal server error"}}',
),
).toBe("timeout");
});A parallel test for the new OpenAI/Codex server_error path should be added alongside it to ensure the new branch stays covered and to document the expected format:
it("classifies JSON server_error internal server failures as timeout", () => {
expect(
classifyFailoverReason(
'{"type":"error","error":{"type":"server_error","code":"server_error","message":"An error occurred..."}}',
),
).toBe("timeout");
});Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-helpers/errors.ts
Line: 859-863
Comment:
**Missing test coverage for new code path**
The existing test file (`src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts`, line 847) has a dedicated test case for the Anthropic `api_error` pattern:
```ts
it("classifies JSON api_error internal server failures as timeout", () => {
expect(
classifyFailoverReason(
'{"type":"error","error":{"type":"api_error","message":"Internal server error"}}',
),
).toBe("timeout");
});
```
A parallel test for the new OpenAI/Codex `server_error` path should be added alongside it to ensure the new branch stays covered and to document the expected format:
```ts
it("classifies JSON server_error internal server failures as timeout", () => {
expect(
classifyFailoverReason(
'{"type":"error","error":{"type":"server_error","code":"server_error","message":"An error occurred..."}}',
),
).toBe("timeout");
});
```
How can I resolve this? If you propose a fix, please make it concise.| } | ||
| // OpenAI/Codex wraps transient 500s as: | ||
| // {"type":"error","error":{"type":"server_error","code":"server_error","message":"An error occurred..."}} | ||
| if (value.includes('"type":"server_error"') && value.includes('"code":"server_error"')) { |
There was a problem hiding this comment.
Fragile string-matching may miss pretty-printed JSON
The check uses value.includes('"type":"server_error"') (no space after the colon), which will silently fail to match if OpenAI ever sends a pretty-printed response like "type": "server_error". The Anthropic branch has the same pre-existing fragility. Since you're touching this function, consider switching to a safe JSON parse + property access, or at minimum accepting an optional space:
| if (value.includes('"type":"server_error"') && value.includes('"code":"server_error"')) { | |
| if (value.includes('"type":"server_error"') && value.includes('"code":"server_error"')) { |
A more robust alternative would be:
try {
const parsed = JSON.parse(raw);
const errType = parsed?.error?.type;
const errCode = parsed?.error?.code;
if (errType === "server_error" && errCode === "server_error") {
return true;
}
} catch {
// not valid JSON, fall through
}This is a pre-existing issue but worth addressing while the function is being modified.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-helpers/errors.ts
Line: 861
Comment:
**Fragile string-matching may miss pretty-printed JSON**
The check uses `value.includes('"type":"server_error"')` (no space after the colon), which will silently fail to match if OpenAI ever sends a pretty-printed response like `"type": "server_error"`. The Anthropic branch has the same pre-existing fragility. Since you're touching this function, consider switching to a safe JSON parse + property access, or at minimum accepting an optional space:
```suggestion
if (value.includes('"type":"server_error"') && value.includes('"code":"server_error"')) {
```
A more robust alternative would be:
```ts
try {
const parsed = JSON.parse(raw);
const errType = parsed?.error?.type;
const errCode = parsed?.error?.code;
if (errType === "server_error" && errCode === "server_error") {
return true;
}
} catch {
// not valid JSON, fall through
}
```
This is a pre-existing issue but worth addressing while the function is being modified.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0b589939f2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
| // OpenAI/Codex wraps transient 500s as: | ||
| // {"type":"error","error":{"type":"server_error","code":"server_error","message":"An error occurred..."}} | ||
| if (value.includes('"type":"server_error"') && value.includes('"code":"server_error"')) { |
There was a problem hiding this comment.
Match server_error without requiring code field
The new OpenAI/Codex branch only returns true when both "type":"server_error" and "code":"server_error" are present, but we already handle payloads in this repo where server_error appears without a code field (for example src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts uses {"error":{"type":"server_error"...}}). In that common shape, this detector still returns false, so transient provider failures remain non-retryable and model fallback can still hard-fail instead of retrying.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 84b7d324a8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (errType === "server_error" && errCode === "server_error") { | ||
| return true; |
There was a problem hiding this comment.
Match server_error even when code is missing
The new OpenAI/Codex branch only returns true when both error.type and error.code are exactly "server_error", but this repo already contains server_error payload shapes without a code field (for example src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts:73). In that common shape, classifyFailoverReason() still won’t classify the error as retryable (timeout), so transient provider failures can continue to bypass model fallback and hard-fail requests.
Useful? React with 👍 / 👎.
This comment was marked as spam.
This comment was marked as spam.
…el fallback
The transient HTTP error detector only recognized Anthropic's api_error
format. OpenAI and Codex use a different envelope:
{"type":"server_error","code":"server_error","message":"..."}
These transient 500s were treated as fatal, skipping model fallback
entirely. Now they're correctly identified as retryable.
…error - Add parallel test case for OpenAI/Codex server_error classification - Replace fragile .includes() string matching with JSON.parse + property access, falling back to substring check for non-JSON payloads - Handles pretty-printed JSON with spaces after colons
84b7d32 to
b7a7e55
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b7a7e55167
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const parsed = JSON.parse(raw); | ||
| const errType = parsed?.error?.type; | ||
| const errCode = parsed?.error?.code; | ||
| if (errType === "server_error" && errCode === "server_error") { |
There was a problem hiding this comment.
Accept server_error even when code is absent
This check only classifies the payload as retryable when both error.type and error.code are exactly "server_error", but OpenAI/Codex error payloads commonly omit code, so the same transient 5xx still falls through without failover. Fresh evidence in this repo: src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts:79 uses a server_error payload without a code field, and parseApiErrorInfo already treats err.type as primary while err.code is optional (src/agents/pi-embedded-helpers/errors.ts:627-632).
Useful? React with 👍 / 👎.
|
Closing this as implemented after Codex review. Current What I checked:
So I’m closing this as already implemented rather than keeping a duplicate issue open. Review notes: reviewed against f03252aaf9cc; fix evidence: commit 37c37eecfbe4. |
Problem
The transient HTTP error detector (
isJsonApiInternalServerError) only recognized Anthropic'sapi_errorJSON format. OpenAI and Codex use a different envelope:{"type":"error","error":{"type":"server_error","code":"server_error","message":"An error occurred..."}}These transient 500s were treated as fatal errors, causing the model fallback chain to skip retrying — resulting in unnecessary failures.
Fix
Extend
isJsonApiInternalServerError()to also match the OpenAI/Codexserver_errorpattern, so these transient errors trigger proper fallback behavior.Impact
src/agents/pi-embedded-helpers/errors.ts