Responses API: misleading error on context overflow, must communicate token limit exceeded #19

## Bug Description

When the conversation context hits the `ctx-size` limit during a `/v1/responses` tool-calling session, the API returns a misleading 500 error that provides no indication of the actual cause (context overflow). This creates unrecoverable retry loops in agentic clients like Codex CLI.

## Reproduction

1. Use Codex CLI (or any agentic client) against `/v1/responses` with a model like `qwen3.5-27b` at `ctx-size=32768`
2. Let the conversation grow through tool call rounds until context approaches 32K tokens
3. Observe the failure cascade

## What happens

### Phase 1: Silent truncation
When context reaches the limit, the model generates a truncated tool call (e.g., only 35 tokens before hitting the ceiling). The server returns this as **HTTP 200** with:
- `"status": "completed"` (should be `"incomplete"`)  
- Truncated, invalid JSON in the tool call arguments (e.g., `{"command": "cat s` — cut off at column 20)
- `finish_reason` is set to `"length"`, but no `incomplete_details` object is included

### Phase 2: Misleading error on next request
The client includes the truncated tool call in the conversation history for the next request. The `func_args_not_string()` function in `common/chat.cpp:1407` tries to parse it and throws:

```
Failed to parse tool call arguments as JSON: [json.exception.parse_error.101] 
parse error at line 1, column 20: syntax error while parsing object - 
unexpected end of input; expected '}'
```

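This is returned as a **generic 500** with no indication that context overflow is the root cause. The client treats the 500 as a transient server failure and retries, but the history still contains the truncated call (and only grows), so every retry fails the same way.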
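To see why the status code matters, here is a sketch of the kind of retry policy many HTTP clients use. This is an assumption for illustration, not Codex CLI's actual logic, and `retry_decision` and `classify_failure` are hypothetical names: server errors are retried, client errors are not.

```cpp
// Hypothetical outcome of a failed request, from an agentic client's view.
enum class retry_decision { retry, give_up };

// Sketch of a common client policy (an assumption, not Codex CLI's code):
// 429 and 5xx are treated as transient and retried; other statuses,
// including 4xx, are treated as permanent and surfaced to the user.
retry_decision classify_failure(int http_status) {
    if (http_status == 429) {
        return retry_decision::retry;    // rate limited: waiting may help
    }
    if (http_status >= 500 && http_status <= 599) {
        return retry_decision::retry;    // "server error": assumed transient
    }
    return retry_decision::give_up;      // the request itself is wrong
}
```

Under a policy like this, the truncated history produces a 500 and therefore `retry_decision::retry` on every attempt, while a 400 would end the loop after the first failure.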

## Expected behavior

Two things need to happen:

### 1. Truncated responses must signal `"status": "incomplete"`

Both `to_json_oaicompat_resp()` (line ~973) and `to_json_oaicompat_resp_stream()` (line ~1080) currently hardcode `"status": "completed"`. When `stop == STOP_TYPE_LIMIT`, the response must:
- Set `"status": "incomplete"` 
- Include `"incomplete_details": {"reason": "max_output_tokens"}` (per OpenAI spec)
- Either omit the truncated tool call entirely, or mark it as incomplete so the client doesn't try to use it
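A minimal sketch of the mapping described above, using simplified stand-ins rather than the real server types. The `stop_type` enum here is a hypothetical local mirror (only `STOP_TYPE_LIMIT` is named in this issue), and `tool_call`, `response_status`, `finalize_response` and `status_fields_json` are illustrative names. It takes the "omit" option and assumes that, on a length stop, the last tool call is the one that was cut off.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mirror of the server's stop reasons; only STOP_TYPE_LIMIT is
// taken from the issue, the other values are illustrative.
enum stop_type { STOP_TYPE_NONE, STOP_TYPE_EOS, STOP_TYPE_WORD, STOP_TYPE_LIMIT };

// Simplified stand-in for a generated tool call (not the real server type).
struct tool_call {
    std::string name;
    std::string arguments;  // raw JSON text as produced by the model
};

struct response_status {
    std::string status;                            // "completed" or "incomplete"
    std::optional<std::string> incomplete_reason;  // set only when incomplete
    std::vector<tool_call> tool_calls;             // calls safe to return
};

// Derive the Responses API status from the stop type instead of hardcoding
// "completed". On a length stop, the last tool call is assumed to be the one
// that was cut off and is dropped (the "omit" option from the issue).
response_status finalize_response(stop_type stop, std::vector<tool_call> calls) {
    response_status out;
    if (stop == STOP_TYPE_LIMIT) {
        out.status = "incomplete";
        out.incomplete_reason = "max_output_tokens";
        if (!calls.empty()) {
            calls.pop_back();
        }
    } else {
        out.status = "completed";
    }
    out.tool_calls = std::move(calls);
    return out;
}

// Render the status fields as a JSON fragment, e.g.
// "status":"incomplete","incomplete_details":{"reason":"max_output_tokens"}
std::string status_fields_json(const response_status & r) {
    std::string json = "\"status\":\"" + r.status + "\"";
    if (r.incomplete_reason) {
        json += ",\"incomplete_details\":{\"reason\":\"" + *r.incomplete_reason + "\"}";
    }
    return json;
}
```

Dropping only the last call keeps any earlier, fully generated calls usable. A stricter variant could instead keep the call and attach an incomplete marker, which is the other option listed above.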

### 2. Input validation must return a clear, actionable error

When `func_args_not_string()` fails to parse tool call arguments from input messages, the server should check whether the token count is near the context limit and return:
- **HTTP 400** (not 500: the input is malformed, so this is not a server error)
- A message that explains the actual cause, e.g.:
  ```
  Tool call arguments in message history contain invalid JSON (truncated at column 20). 
  This typically happens when a previous response was truncated due to context length limits. 
  Consider reducing conversation history or increasing ctx-size.
  ```

At minimum, even without this context-aware detection, the error should be an HTTP 400 with a message like:
```
Invalid tool call arguments in input messages: JSON parse error at column 20 (unexpected end of input). 
Check that all tool_calls in the conversation history contain valid JSON arguments.
```

## Relevant code

- `common/chat.cpp:1396-1414` — `func_args_not_string()` throws the misleading error
- `tools/server/server-task.cpp:~973` — `to_json_oaicompat_resp()` hardcodes `"status": "completed"`
- `tools/server/server-task.cpp:~1080` — `to_json_oaicompat_resp_stream()` same issue
- `tools/server/server-context.cpp:1250-1253` — truncation sets `STOP_TYPE_LIMIT` but this is not propagated to the Responses API status

## Impact

This causes agentic tool-calling loops (Codex CLI, etc.) to enter infinite retry cycles when context is exhausted, with no way for the client to diagnose or recover from the issue.
