fix: repetitive tracebacks after "socket.send() raised exception" #11284

Merged
timoffex merged 1 commit into main from timoffex/02-06-socket_send_exception
Feb 7, 2026

@timoffex timoffex commented Feb 6, 2026

Fixes the huge traceback output that follows a wandb-core crash.

Fixes WB-31072.

The tracebacks have long repeated sections because asyncio's StreamReader/StreamWriter store an exception and re-raise it on later calls. When wandb-core crashes, the next communication attempt raises a ConnectionResetError. This usually triggers error-handling logic that attempts to send even more data to wandb-core, because it cannot distinguish a wandb-core crash from other types of errors. Each time the stored exception is re-raised, its traceback grows.

I'm not sure exactly how, but this can easily result in extremely long tracebacks (megabytes of text). Because raising an exception and unwinding the stack both add entries to that exception's traceback, each raise saved_exception statement lengthens the traceback by the entire call stack leading up to it.
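The growth described above can be reproduced without asyncio or wandb. The following is a minimal synchronous sketch, not the code from this PR: `drain` is a hypothetical stand-in for any function that re-raises a stored exception object, and `traceback_lengths` is an illustrative helper that counts traceback entries after each attempt.

```python
import traceback


def traceback_lengths(attempts: int) -> list[int]:
    """Re-raise one stored exception repeatedly and record its traceback size."""
    # Hypothetical stand-in for the exception that a stream object stores
    # after the connection is lost.
    saved_exception = ConnectionResetError("connection lost")

    def drain() -> None:
        # Re-raises the *same* exception object every time, so entries
        # from earlier raises are still attached to it.
        raise saved_exception

    lengths = []
    for _ in range(attempts):
        try:
            drain()
        except ConnectionResetError as exc:
            # Count the entries currently attached to the exception.
            lengths.append(len(traceback.extract_tb(exc.__traceback__)))
    return lengths


if __name__ == "__main__":
    # The count increases on every attempt because nothing resets it.
    print(traceback_lengths(3))
```

In this sketch the call stack is only two frames deep; with a deeper stack and a retry loop, each attempt would add correspondingly more entries.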

This problem may be fixed in Python >= 3.11, but I haven't tested it. See https://bugs.python.org/issue45924.

Testing

To reproduce the issue, print something and call run.log() in a loop, then run pkill wandb-core. Without the fix, this outputs many "socket.send() raised exception" lines and exception tracebacks containing this repeated section:

  File "/Users/timoffex/Documents/workspace/wandb/wandb/sdk/lib/asyncio_manager.py", line 181, in fn_wrap_exceptions
    await fn()
  File "/Users/timoffex/Documents/workspace/wandb/wandb/sdk/lib/service/service_client.py", line 38, in publish
    await self._send_server_request(request)
  File "/Users/timoffex/Documents/workspace/wandb/wandb/sdk/lib/service/service_client.py", line 64, in _send_server_request
    await self._writer.drain()
  File "/Users/timoffex/.local/share/uv/python/cpython-3.10.16-macos-aarch64-none/lib/python3.10/asyncio/streams.py", line 359, in drain
    raise exc
  File "/Users/timoffex/Documents/workspace/wandb/wandb/sdk/lib/asyncio_manager.py", line 181, in fn_wrap_exceptions
    await fn()
  File "/Users/timoffex/Documents/workspace/wandb/wandb/sdk/lib/service/service_client.py", line 38, in publish
    await self._send_server_request(request)
  File "/Users/timoffex/Documents/workspace/wandb/wandb/sdk/lib/service/service_client.py", line 64, in _send_server_request
    await self._writer.drain()
  File "/Users/timoffex/.local/share/uv/python/cpython-3.10.16-macos-aarch64-none/lib/python3.10/asyncio/streams.py", line 359, in drain
    raise exc
  File "/Users/timoffex/Documents/workspace/wandb/wandb/sdk/lib/asyncio_manager.py", line 181, in fn_wrap_exceptions
    await fn()

@timoffex timoffex changed the title from 'socket send exception' to 'fix: repetitive "socket.send()' Feb 6, 2026

timoffex commented Feb 6, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

@timoffex timoffex changed the title from 'fix: repetitive "socket.send()' to 'fix: repetitive tracebacks after "socket.send() raised exception"' Feb 6, 2026
@timoffex timoffex force-pushed the timoffex/02-06-socket_send_exception branch from a37c5ac to b52538a Compare February 6, 2026 20:52
@timoffex timoffex marked this pull request as ready for review February 6, 2026 20:55
@timoffex timoffex requested a review from a team as a code owner February 6, 2026 20:55

codecov Bot commented Feb 6, 2026

Codecov Report

❌ Patch coverage is 75.00000% with 5 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| wandb/sdk/lib/service/service_client.py | 75.00% | 5 Missing ⚠️ |

@timoffex timoffex force-pushed the timoffex/02-06-socket_send_exception branch from b52538a to 96cc862 Compare February 6, 2026 23:42
timoffex added a commit that referenced this pull request Feb 7, 2026

To reraise a saved exception, it's important to reset its traceback.

I haven't heard of anyone ever hitting this particular exception, but it's good to use the correct patterns.

See PR #11284.
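
One way to reset a saved exception's traceback before re-raising it is `BaseException.with_traceback(None)`. The sketch below illustrates that pattern under assumptions and is not necessarily what this commit or PR #11284 does: `reraise_fresh` and `traceback_lengths_with_reset` are hypothetical names introduced for the example.

```python
import traceback


def reraise_fresh(saved_exception: BaseException) -> None:
    """Re-raise a stored exception with only the current call stack attached."""
    # with_traceback(None) drops the entries left over from earlier raises
    # and returns the same exception object.
    raise saved_exception.with_traceback(None)


def traceback_lengths_with_reset(attempts: int) -> list[int]:
    """Record the traceback size after each reset-and-raise attempt."""
    # Hypothetical stored exception, created once and reused.
    saved = ConnectionResetError("connection lost")
    lengths = []
    for _ in range(attempts):
        try:
            reraise_fresh(saved)
        except ConnectionResetError as exc:
            lengths.append(len(traceback.extract_tb(exc.__traceback__)))
    return lengths


if __name__ == "__main__":
    # Every attempt reports the same size because the traceback is reset.
    print(traceback_lengths_with_reset(3))
```

Resetting discards where the exception was originally raised, so this fits cases where only the current re-raise site matters.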
@timoffex timoffex force-pushed the timoffex/02-06-socket_send_exception branch 3 times, most recently from b695f5f to ffaff72 Compare February 7, 2026 01:24

timoffex commented Feb 7, 2026

Merge activity

  • Feb 7, 1:25 AM UTC: Graphite rebased this pull request as part of a merge.
  • Feb 7, 1:27 AM UTC: Graphite rebased this pull request as part of a merge.
  • Feb 7, 1:39 AM UTC: @timoffex merged this pull request with Graphite.

@timoffex timoffex force-pushed the timoffex/02-06-socket_send_exception branch from ffaff72 to 1e36296 Compare February 7, 2026 01:26
@timoffex timoffex merged commit 9d33294 into main Feb 7, 2026
33 of 34 checks passed
@timoffex timoffex deleted the timoffex/02-06-socket_send_exception branch February 7, 2026 01:39

2 participants