Azure OpenAI Responses stalls before first event when memory tools are exposed

## Summary

Azure OpenAI Responses can return HTTP 200 headers for an OpenClaw embedded run and then never deliver a first parsed Responses event when memory tools are exposed. The same Azure deployment streams normally for direct Responses API calls, and the same OpenClaw run succeeds when the memory tools are excluded from `tools.allow`.

This is related to the broader Responses stream-stall reports in #76305 and Azure Responses adapter problems in #79570, but the trigger isolated here is the OpenClaw tool surface, specifically `memory_get` / memory tools.

## Environment

- OpenClaw checkout: `7116cc8a2c` plus local debug/fix branch
- Runtime: Node 24.15.0
- Provider API: `azure-openai-responses`
- Azure API version used by the process: `2025-04-01-preview`
- Azure route shape verified: `/openai/responses?api-version=2025-04-01-preview`
- Model/deployment: `gpt-5.4-pro`

## Before proof: memory tool stalls before first event

With `memory_get` enabled, OpenClaw reaches Azure and receives streaming response headers, but no `[responses] first_event` log arrives before timeout.

Command shape:

```bash
OPENCLAW_DEBUG_MODEL_TRANSPORT=1 \
OPENCLAW_DEBUG_SSE=events \
OPENCLAW_DEBUG_MODEL_PAYLOAD=summary \
AZURE_OPENAI_API_VERSION=2025-04-01-preview \
pnpm openclaw agent --local \
  --session-id devdiv-memoryget-062004 \
  --model azure-openai-responses-devdiv/gpt-5.4-pro \
  --thinking off \
  --timeout 35 \
  --message 'Reply with only: ok' \
  --json
```

Observed transport lines:

```text
[responses] start provider=azure-openai-responses-devdiv api=azure-openai-responses model=gpt-5.4-pro ... tools=count=16 sample=edit,exec,memory_get,process,read,... reasoningEffort=undefined ...
[model-fetch] start ... method=POST url=https://devdiv-test-playground.openai.azure.com/openai/responses timeoutMs=180000 proxy=none policy=custom
[model-fetch] response ... status=200 elapsedMs=1154 contentType=text/event-stream; charset=utf-8
[responses] headers ... elapsedMs=1175
```

No `[responses] first_event` or `[responses] stream_done` followed before the run was killed/timeout-bound.

A prior full/default tool-surface run showed the same pattern and eventually timed out:

```text
[responses] start ... tools=count=21 sample=cron,edit,exec,image,image_generate,memory_get,memory_search,...
[model-fetch] response ... status=200 ... contentType=text/event-stream; charset=utf-8
[responses] headers ...
[agent/embedded] embedded run timeout ... timeoutMs=90000
```

## Control proof: endpoint and streaming are valid

A direct non-OpenClaw curl call to the same Azure Responses endpoint with `stream=true` returned SSE immediately, including `response.created`, `response.output_text.delta`, and `response.completed`.

A direct curl call with cache fields, reasoning include, about 30k input chars, and 21 simple function tools also streamed promptly. That points away from endpoint/auth/API-version and toward the OpenClaw tool payload or tool-conditioned prompt surface.

## After proof / workaround: exclude memory tools

With memory tools excluded and core/session/web tools allowed, the same OpenClaw model call succeeds:

```json
"tools.allow": [
  "read", "edit", "write", "exec", "process", "update_plan",
  "sessions_list", "sessions_history", "sessions_send", "sessions_spawn",
  "sessions_yield", "subagents", "session_status", "web_search", "web_fetch"
]
```

Command shape:

```bash
OPENCLAW_DEBUG_MODEL_TRANSPORT=1 \
OPENCLAW_DEBUG_SSE=events \
OPENCLAW_DEBUG_MODEL_PAYLOAD=summary \
AZURE_OPENAI_API_VERSION=2025-04-01-preview \
pnpm openclaw agent --local \
  --session-id devdiv-web-060949 \
  --model azure-openai-responses-devdiv/gpt-5.4-pro \
  --thinking off \
  --timeout 45 \
  --message 'Reply with only: ok' \
  --json
```

Observed success:

```text
[responses] start ... tools=count=15 sample=edit,exec,process,read,session_status,sessions_history,sessions_list,sessions_send,sessions_spawn,sessions_yield,subagents,update_plan ...
[model-fetch] response ... status=200 elapsedMs=2955 contentType=text/event-stream; charset=utf-8
[responses] headers ... elapsedMs=2975
[responses] first_event ... elapsedMs=5 type=response.created
[responses] stream_done ... events=11 ... stopReason=stop contentBlocks=2
```

The final assistant payload was `ok`.

## Local fix branch evidence

I found one concrete provider-compat problem while isolating this: memory tool `corpus` schemas used `Type.Union([Type.Literal(...)])`, which compiles to `anyOf`. The repo already has a flat `stringEnum` helper for this class of provider compatibility issue. I patched memory-core to use flat string enums and added a regression test that asserts no `anyOf` is emitted for those corpus fields.

That cleanup is useful but does not fully resolve this Azure stall: after the enum patch, enabling `memory_get` still reproduced the pre-first-event stall. The remaining bug is likely either another part of the memory tool schema/description or Azure's handling of this specific memory tool surface.

## Verification run locally

```text
pnpm test extensions/memory-core/src/tools.test.ts -- --reporter=verbose
# 1 file passed, 7 tests passed

pnpm build
# passed

pnpm openclaw config validate
# Config valid: ~/.openclaw/openclaw.json
```

## Expected behavior

Azure OpenAI Responses should either stream a first event promptly or fail with a bounded, actionable error when a provider cannot handle a tool payload. It should not leave the embedded run in `model_call:started` until timeout after headers have already arrived.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Azure OpenAI Responses stalls before first event when memory tools are exposed #80926

Summary

Environment

Before proof: memory tool stalls before first event

Control proof: endpoint and streaming are valid

After proof / workaround: exclude memory tools

Local fix branch evidence

Verification run locally

Expected behavior

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Azure OpenAI Responses stalls before first event when memory tools are exposed #80926

Description

Summary

Environment

Before proof: memory tool stalls before first event

Control proof: endpoint and streaming are valid

After proof / workaround: exclude memory tools

Local fix branch evidence

Verification run locally

Expected behavior

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions