Skip to content

fix(agents): bypass loadWebMedia for media-uri refs to prevent WSL2 keyring deadlock (#59104)#60278

Closed
Syysean wants to merge 99 commits into
openclaw:mainfrom
Syysean:fix-media-uri-keyring-deadlock
Closed

fix(agents): bypass loadWebMedia for media-uri refs to prevent WSL2 keyring deadlock (#59104)#60278
Syysean wants to merge 99 commits into
openclaw:mainfrom
Syysean:fix-media-uri-keyring-deadlock

Conversation

@Syysean

@Syysean Syysean commented Apr 3, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Problem: loadImageFromRef passed a local filesystem path (returned by resolveMediaBufferPath) to loadWebMedia for media-uri refs.
  • Why it matters: On WSL2 / Ubuntu 24.04, loadWebMedia launches Chromium, which attempts to access GNOME Keyring via libsecret. This causes a synchronous block, making the Gateway hang silently with no logs, no errors, and no response to SIGTERM.
  • What changed: Bypassed the Chromium browser stack completely for media-uri refs. Replaced loadWebMedia(physicalPath) with a direct fs.readFile(physicalPath) call.
  • What did NOT change (scope boundary): The handling of standard remote web URLs and inline image references remains untouched.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Root Cause / Regression History (if applicable)

  • Root cause: Attempting to use a headless browser stack (Chromium) to load an already-validated and persisted local media file.
  • Missing detection / guardrail: Missing differentiation in the loading strategy between remote URLs (which need browser rendering/fetching) and local Gateway-managed files.
  • Prior context (git blame, prior PR, issue, or refactor if known): Unknown
  • Why this regressed now: Unknown
  • If unknown, what was ruled out: N/A

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/agents/pi-embedded-runner/run/images.test.ts (or equivalent)
  • Scenario the test should lock in: Assert that providing a media-uri ref calls fs.readFile and strictly does NOT invoke loadWebMedia.
  • Why this is the smallest reliable guardrail: It prevents the heavy browser stack from being invoked for local files, which is the direct trigger for the OS-level keyring deadlock.
  • Existing test that already covers this (if any): None
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

None. The Gateway will now successfully process images on WSL2 instead of silently crashing.

Diagram (if applicable)

Before:
[media-uri ref] -> [loadWebMedia(Chromium)] -> [libsecret / GNOME Keyring] -> [Deadlock / Gateway Hangs]

After:
[media-uri ref] -> [fs.readFile] -> [Base64 Image Data Returned]

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: WSL2 / Ubuntu 24.04
  • Runtime/container: Node.js (Local)
  • Model/provider: Any vision-capable model
  • Integration/channel (if any): WebUI / Chat
  • Relevant config (redacted): N/A

Steps

  1. Run Gateway on WSL2 (Ubuntu 24.04).
  2. Send a large image attachment (e.g., >2MB) in the chat to trigger the claim-check offload path (media-uri).
  3. Observe Gateway behavior.

Expected

  • The image is loaded, processed, and the model responds normally.

Actual

  • Before this PR: Gateway hangs silently, consumes CPU/RAM without returning, ignores SIGTERM.
  • After this PR: Gateway processes the image instantly via local filesystem read.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios: Reproduced the deadlock on WSL2 with large images. Applied the fix and verified the image loads successfully via fs.readFile without launching Chromium.
  • Edge cases checked: Verified small images (inline paths) are unaffected. Verified MIME type is correctly inferred from the persisted file extension.
  • What you did not verify: Did not test on macOS/native Windows, as the keyring deadlock is specific to Linux/WSL2 desktop environments.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: MIME type inference now relies on the local file extension rather than browser headers.
    • Mitigation: The Gateway's saveMediaBuffer already strictly validates and assigns the correct extension during the initial offload/persist phase, making the file extension a reliable source of truth.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S labels Apr 3, 2026
@greptile-apps

greptile-apps Bot commented Apr 3, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a real WSL2/Ubuntu 24.04 regression where sending large image attachments caused the Gateway to deadlock silently. The root cause was loadImageFromRef routing already-persisted local media-uri files through loadWebMedia, which launches Chromium and triggers a synchronous libsecret/GNOME Keyring call on Linux desktop environments. The fix correctly short-circuits to a direct fs.readFile for media-uri refs, relying on resolveMediaBufferPath's existing path-traversal guards for safety.

Key concerns found during review:

  • Indentation is broken throughout the new if (ref.type === "media-uri") block — body lines sit at 2 spaces (the same column as the if keyword) and the closing } is at 0 spaces. Braces still match syntactically so there is no runtime impact, but this will fail pnpm format (Oxfmt).
  • MEDIA_EXT_TO_MIME is defined inline inside a hot try block, re-allocating an object on every media-uri image load. It should be a module-level constant alongside IMAGE_EXTENSION_NAMES. It is also missing entries for ".bmp", ".tiff", and ".tif", which IMAGE_EXTENSION_NAMES does list — files stored with those extensions would be returned with an incorrect image/jpeg MIME type.
  • No regression test was added, despite the PR's own regression test plan explicitly naming a target file and scenario. Without a test asserting fs.readFile is called and loadWebMedia is not, a future refactor could silently reintroduce the WSL2 deadlock.

Confidence Score: 3/5

The fix is logically correct and path-safety guards are preserved, but it will fail the formatting gate and is missing a regression test.

The core approach is sound — resolveMediaBufferPath already enforces path-traversal/symlink guards, so switching to fs.readFile is safe. The runtime behavior for the common case (jpg/png/webp) is correct. The score is lowered because the misindented block will break pnpm format, the inline MIME table has missing entries for valid stored formats, and the PR's own stated regression test was not delivered.

src/agents/pi-embedded-runner/run/images.ts — indentation fix and module-level constant extraction needed before merging.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/images.ts
Line: 356-392

Comment:
**Broken indentation throughout the new block**

The entire `if (ref.type === "media-uri")` block has been de-indented by one level: the body sits at 2 spaces (the same column as the `if` keyword itself) and the closing `}` is at 0 spaces instead of 2. Semantically the braces still match, so this is not a runtime bug, but it will fail the Oxfmt formatting gate (`pnpm format`) and makes the block look structurally incorrect to any reader.

The body lines (e.g. `const uriMatch`, `if (!uriMatch)`, `try { … }`) need to be indented to 4 spaces (the normal function-body + one block level), and the closing `}` needs to be at 2 spaces, consistent with the surrounding `if (ref.type === "media-uri") {`.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/images.ts
Line: 372-380

Comment:
**Inline constant should be module-level**

`MEDIA_EXT_TO_MIME` is an immutable lookup table, but it is currently declared inside the `try` block and re-allocated on every `loadImageFromRef` call for `media-uri` refs. It should be moved to module scope (alongside the existing `IMAGE_EXTENSION_NAMES` constant at the top of the file) so it is only created once.

Additionally, the table is missing MIME mappings for formats that `IMAGE_EXTENSION_NAMES` lists at the top of this file: `".bmp"`, `".tiff"`, and `".tif"`. Any file stored with those extensions would fall through to the `"image/jpeg"` default, producing an incorrect MIME type declaration for the base64 data sent to the model.

Suggested module-level constant:
```ts
const MEDIA_EXT_TO_MIME: Record<string, string> = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
  ".webp": "image/webp",
  ".gif": "image/gif",
  ".heic": "image/heic",
  ".heif": "image/heif",
  ".bmp": "image/bmp",
  ".tiff": "image/tiff",
  ".tif": "image/tiff",
};
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/images.ts
Line: 364-385

Comment:
**No test added despite explicit regression test plan**

The PR description's regression test plan names a concrete target file (`src/agents/pi-embedded-runner/run/images.test.ts`) and a specific scenario: assert that a `media-uri` ref calls `fs.readFile` and does **not** invoke `loadWebMedia`. The "If no new test is added, why not" field is answered "N/A", implying a test is expected but none was submitted.

Without this test, a future refactor that accidentally re-routes `media-uri` refs through `loadWebMedia` would silently reintroduce the WSL2 deadlock. Adding even a minimal unit test that stubs `fs.readFile` and asserts `loadWebMedia` is never called would lock in the fix and satisfy the stated regression plan.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(agents): use fs.readFile for media-u..." | Re-trigger Greptile

Comment thread src/agents/pi-embedded-runner/run/images.ts Outdated
Comment thread src/agents/pi-embedded-runner/run/images.ts Outdated
Comment thread src/agents/pi-embedded-runner/run/images.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2269ba69c7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-runner/run/images.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5a02e2c0d7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-runner/run/images.test.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4dc97da32c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-runner/run/images.test.ts
@Syysean

Syysean commented Apr 3, 2026

Copy link
Copy Markdown
Contributor Author

Hi maintainers 👋

The core fix for the WSL2 keyring deadlock is ready for review.

I noticed a few CI workflows failed, but after checking the logs, they appear to be unrelated upstream infrastructure issues or flaky tests:

  • checks-fast-extensions: Failed with a Node.js OOM error (FATAL ERROR: JavaScript heap out of memory).
  • checks-node-test: Failed due to a 120s timeout in the microsoft-foundry Entra token refresh tests.
  • checks-windows-node-test: Failed due to a 120s timeout in the cron job isolated execution tests.

Since my PR only touches the Chromium launch arguments and adds a watchdog for the attachment pipeline, these failures seem completely unrelated to my changes.

Could someone please help re-trigger the failed CI jobs, or review the code as-is? Thank you!

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 79b569d130

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-runner/run/images.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 55dbcd2c6a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-runner/run/images.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 945b66dfa1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-runner/run/images.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d049b7b17c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +376 to +377
const physicalPath = await resolveMediaBufferPath(mediaId, "inbound");
const buffer = await fs.readFile(physicalPath);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep media-uri reads on the safe-open path

Reading physicalPath with fs.readFile drops the race-resistant file checks that were previously enforced through loadWebMedia (readLocalFileSafely uses O_NOFOLLOW plus identity checks). Here, resolveMediaBufferPath only validates via lstat before returning the path, so a writer that can mutate the media directory between that check and readFile can swap in a symlink/other file and have arbitrary bytes ingested as an image. This is a security regression from the prior implementation and should use the same safe-open read path when bypassing Chromium.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The media directory is restricted to 0o700 and resolveMediaBufferPath already rejects symlinks via lstat. The TOCTOU window exists in theory but is not exploitable in this deployment context. Can treat as a future hardening item.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ab9636e0ee

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-runner/run/images.ts Outdated
Comment thread src/agents/pi-embedded-runner/run/images.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b5bf1179c4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-runner/run/images.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f55eff8bf6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-runner/run/images.ts Outdated
@Syysean Syysean closed this Apr 8, 2026
@Syysean

Syysean commented Apr 8, 2026

Copy link
Copy Markdown
Contributor Author

Closing this PR after identifying the root cause.

This issue is not just about timeout or connection handling.
There is a deeper architectural problem in the CDP connection chain:

  • Chromium is not exposing CDP on the expected port
  • A socat proxy layer is introduced to bridge the mismatch
  • This adds unnecessary complexity and instability

This PR attempts to fix the symptoms (e.g. timeout / connection issues),
but does not address the root cause.

To ensure a clean and maintainable solution, I will split the fix into smaller PRs:

  1. Remove socat and unify the CDP port (direct connection)
  2. Stabilize Chromium launch flags
  3. Improve sandbox security configuration
  4. Add watchdog / recovery mechanism

Closing this PR in favor of a more structured approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Inbound image processing hangs silently in WSL2 due to GNOME Keyring blocking

5 participants