Empty claude-cli subprocess responses misclassified as billing cooldown

## Version

OpenClaw `2026.5.12` (build `f066dd2`), Node `claude-cli` provider, Linux.

## Summary

When the bundled Claude CLI subprocess returns a zero-token, no-text completion (no error, no abort, no timeout), the provider classifier records it as a `billing` failure in `~/.openclaw/agents/main/agent/auth-state.json`. Three such responses trip a cooldown on the profile, after which **every** subsequent run on that profile aborts in ~300 ms with no model call and no trajectory file.

This is not a real billing/wallet condition — the user's Claude account is funded and other clients on the same account succeed.

## Reproduction signature

After a run completes with `status: "success"` but an empty assistant reply, `auth-state.json` shows:

```json
"usageStats": {
  "anthropic:claude-cli": {
    "errorCount": 1,
    "failureCounts": { "billing": 1 },
    "lastFailureAt": <ts>
  }
}
```

The corresponding trajectory's `model.completed` event has:

```
lastCallUsage: { input_tokens: 0, output_tokens: 0, total_tokens: 0 }
assistantTexts: []
aborted: false
timedOut: false
promptErrorSource: null
```

After three such events, the profile is cooled down:

```json
"disabledUntil": <future-ts>,
"disabledReason": "billing",
"errorCount": 3,
"failureCounts": { "billing": 3 }
```

Subsequent runs fail in ~300 ms with:

```
FallbackSummaryError: All models failed (N):
  claude-cli/claude-sonnet-4-6: Provider claude-cli is in cooldown (suspending lanes) (billing)
  | anthropic/claude-haiku-4-5-...: Provider anthropic is in cooldown (suspending lanes) (billing)
```

and no `<sessionId>.jsonl` / `.trajectory.jsonl` is created — the run aborts before any model call.

## Expected

A zero-token, no-text, no-error response from the Claude CLI subprocess should not be classified as a billing failure. Either:

1. Treat empty completions as a distinct failure mode (e.g. `empty-response`) and apply a separate cooldown policy, or
2. Don't increment any failure counter for empty responses (treat as a no-op / retry candidate), or
3. Inspect the CLI exit code and stderr before classifying — an empty stdout with exit 0 is not the same as a billing rejection.

## Actual

The classifier maps the empty response to `billing`, the cooldown trips after 3 consecutive empty responses (which happen organically during normal Discord/cron load), and all dependent jobs fail until the cooldown window elapses or the state file is manually edited.

## Workaround attempted

Adding a systemd `ExecStartPre` that clears `disabledUntil` / `disabledReason` / `errorCount` / `failureCounts` from `auth-state.json` on gateway start works at startup, but is not sufficient — during a 16-minute window between gateway restart and a manual cron trigger, two ordinary Discord/cron sessions re-tripped `failureCounts.billing` to 3.

`ExecStartPre` snippet (for reference):

```ini
ExecStartPre=/usr/bin/python3 -c "import json,os;f=os.path.expanduser('~/.openclaw/agents/main/agent/auth-state.json');d=json.load(open(f));[s.pop('disabledUntil',None) or s.pop('disabledReason',None) or s.update(errorCount=0,failureCounts={}) for s in d.get('usageStats',{}).values()];json.dump(d,open(f,'w'),indent=2)"
```

## Suggested fix locations

The classifier path that maps subprocess result → failure category should distinguish between:

- non-zero exit / stderr containing a billing-shaped signal → `billing` (current behaviour, correct)
- exit 0 with empty stdout / `usage.total_tokens == 0` and `assistantTexts == []` → **new bucket** (or no-op)

## Impact

Every cron job and Discord session that depends on `anthropic:claude-cli` becomes unreliable after ~3 empty-response coincidences. Operators see this as "claude-cli is dead" or "billing issue" with no actual billing problem. The cooldown is invisible from `openclaw health` (it shows `Discord: configured` and gateway healthy) and only visible by reading `auth-state.json` directly or seeing the `FallbackSummaryError` in `openclaw cron runs`.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Empty claude-cli subprocess responses misclassified as billing cooldown #83231

Version

Summary

Reproduction signature

Expected

Actual

Workaround attempted

Suggested fix locations

Impact

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Empty claude-cli subprocess responses misclassified as billing cooldown #83231

Description

Version

Summary

Reproduction signature

Expected

Actual

Workaround attempted

Suggested fix locations

Impact

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions