
fix: keep a strong reference to the websocket stream task in autogen-studio#7825

Open
kratos0718 wants to merge 1 commit into microsoft:main from kratos0718:fix/ws-stream-task-reference

@kratos0718

Why are these changes needed?

The run websocket handler in autogen-studio (web/routes/ws.py) starts the stream with a bare asyncio.create_task(...) and discards the returned task:

# Start the stream in a separate task
asyncio.create_task(ws_manager.start_stream(run_id, task, team_config))

Per the asyncio docs, the event loop only keeps a weak reference to a task. A task with no other reference can be garbage-collected before it finishes — here that means a user's streaming run can be silently dropped mid-flight under GC pressure, with no error surfaced.

Fix

Keep a strong reference to the task in a module-level set and remove it on completion via add_done_callback (the pattern recommended in the asyncio docs):

stream_task = asyncio.create_task(ws_manager.start_stream(run_id, task, team_config))
_background_tasks.add(stream_task)
stream_task.add_done_callback(_background_tasks.discard)

No behavior change beyond ensuring the stream task is retained until it completes. The change touches a single file.

Related issue number

N/A — small self-contained reliability fix.

Checks

  • I've made sure the change compiles (python -m py_compile).
  • The change is minimal and focused on one issue.

The run websocket handler starts ws_manager.start_stream() with a bare
asyncio.create_task() and discards the returned task. The event loop only
keeps a weak reference to a task, so it can be garbage-collected before
start_stream finishes — silently dropping a user's run mid-stream.

Track the task in a module-level set and remove it on completion via
add_done_callback so a strong reference is held until it finishes.

@avinashkamat48-design left a comment

Keeping a strong reference fixes the GC risk, but this still drops task failures silently. If ws_manager.start_stream(...) raises, the done callback only discards the task from _background_tasks; nobody retrieves task.exception(), logs it, or notifies the websocket/client. Could the callback inspect/log failures so background stream errors do not become invisible?