Skip to content

Azure OpenAI Responses stalls before first event when memory tools are exposed #80926

@galiniliev

Description

@galiniliev

Summary

Azure OpenAI Responses can return HTTP 200 headers for an OpenClaw embedded run and then never deliver a first parsed Responses event when memory tools are exposed. The same Azure deployment streams normally for direct Responses API calls, and the same OpenClaw run succeeds when the memory tools are excluded from tools.allow.

This is related to the broader Responses stream-stall reports in #76305 and Azure Responses adapter problems in #79570, but the trigger isolated here is the OpenClaw tool surface, specifically memory_get / memory tools.

Environment

  • OpenClaw checkout: 7116cc8a2c plus local debug/fix branch
  • Runtime: Node 24.15.0
  • Provider API: azure-openai-responses
  • Azure API version used by the process: 2025-04-01-preview
  • Azure route shape verified: /openai/responses?api-version=2025-04-01-preview
  • Model/deployment: gpt-5.4-pro

Before proof: memory tool stalls before first event

With memory_get enabled, OpenClaw reaches Azure and receives streaming response headers, but no [responses] first_event log arrives before timeout.

Command shape:

OPENCLAW_DEBUG_MODEL_TRANSPORT=1 \
OPENCLAW_DEBUG_SSE=events \
OPENCLAW_DEBUG_MODEL_PAYLOAD=summary \
AZURE_OPENAI_API_VERSION=2025-04-01-preview \
pnpm openclaw agent --local \
  --session-id devdiv-memoryget-062004 \
  --model azure-openai-responses-devdiv/gpt-5.4-pro \
  --thinking off \
  --timeout 35 \
  --message 'Reply with only: ok' \
  --json

Observed transport lines:

[responses] start provider=azure-openai-responses-devdiv api=azure-openai-responses model=gpt-5.4-pro ... tools=count=16 sample=edit,exec,memory_get,process,read,... reasoningEffort=undefined ...
[model-fetch] start ... method=POST url=https://devdiv-test-playground.openai.azure.com/openai/responses timeoutMs=180000 proxy=none policy=custom
[model-fetch] response ... status=200 elapsedMs=1154 contentType=text/event-stream; charset=utf-8
[responses] headers ... elapsedMs=1175

No [responses] first_event or [responses] stream_done followed before the run was killed/timeout-bound.

A prior full/default tool-surface run showed the same pattern and eventually timed out:

[responses] start ... tools=count=21 sample=cron,edit,exec,image,image_generate,memory_get,memory_search,...
[model-fetch] response ... status=200 ... contentType=text/event-stream; charset=utf-8
[responses] headers ...
[agent/embedded] embedded run timeout ... timeoutMs=90000

Control proof: endpoint and streaming are valid

A direct non-OpenClaw curl call to the same Azure Responses endpoint with stream=true returned SSE immediately, including response.created, response.output_text.delta, and response.completed.

A direct curl call with cache fields, reasoning include, about 30k input chars, and 21 simple function tools also streamed promptly. That points away from endpoint/auth/API-version and toward the OpenClaw tool payload or tool-conditioned prompt surface.

After proof / workaround: exclude memory tools

With memory tools excluded and core/session/web tools allowed, the same OpenClaw model call succeeds:

"tools.allow": [
  "read", "edit", "write", "exec", "process", "update_plan",
  "sessions_list", "sessions_history", "sessions_send", "sessions_spawn",
  "sessions_yield", "subagents", "session_status", "web_search", "web_fetch"
]

Command shape:

OPENCLAW_DEBUG_MODEL_TRANSPORT=1 \
OPENCLAW_DEBUG_SSE=events \
OPENCLAW_DEBUG_MODEL_PAYLOAD=summary \
AZURE_OPENAI_API_VERSION=2025-04-01-preview \
pnpm openclaw agent --local \
  --session-id devdiv-web-060949 \
  --model azure-openai-responses-devdiv/gpt-5.4-pro \
  --thinking off \
  --timeout 45 \
  --message 'Reply with only: ok' \
  --json

Observed success:

[responses] start ... tools=count=15 sample=edit,exec,process,read,session_status,sessions_history,sessions_list,sessions_send,sessions_spawn,sessions_yield,subagents,update_plan ...
[model-fetch] response ... status=200 elapsedMs=2955 contentType=text/event-stream; charset=utf-8
[responses] headers ... elapsedMs=2975
[responses] first_event ... elapsedMs=5 type=response.created
[responses] stream_done ... events=11 ... stopReason=stop contentBlocks=2

The final assistant payload was ok.

Local fix branch evidence

I found one concrete provider-compat problem while isolating this: memory tool corpus schemas used Type.Union([Type.Literal(...)]), which compiles to anyOf. The repo already has a flat stringEnum helper for this class of provider compatibility issue. I patched memory-core to use flat string enums and added a regression test that asserts no anyOf is emitted for those corpus fields.

That cleanup is useful but does not fully resolve this Azure stall: after the enum patch, enabling memory_get still reproduced the pre-first-event stall. The remaining bug is likely either another part of the memory tool schema/description or Azure's handling of this specific memory tool surface.

Verification run locally

pnpm test extensions/memory-core/src/tools.test.ts -- --reporter=verbose
# 1 file passed, 7 tests passed

pnpm build
# passed

pnpm openclaw config validate
# Config valid: ~/.openclaw/openclaw.json

Expected behavior

Azure OpenAI Responses should either stream a first event promptly or fail with a bounded, actionable error when a provider cannot handle a tool payload. It should not leave the embedded run in model_call:started until timeout after headers have already arrived.

Metadata

Metadata

Assignees

No one assigned

    Labels

    maintainerMaintainer-authored PR

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions