fix(agent): propagate ContextVars to concurrent tool worker threads#16660
fix(agent): propagate ContextVars to concurrent tool worker threads#16660banditburai wants to merge 2 commits into
Conversation
`_execute_tool_calls_concurrent` submits tools via `executor.submit(_run_tool, ...)`
without `copy_context().run`, so worker threads do not inherit the parent's
ContextVar values — including `_approval_session_key` set by the gateway before
`agent.run`. Worker tools fall through `tools/approval.py:get_current_session_key`'s
resolution order to the `os.environ` fallback ("default" session key), silently
collapsing per-session dispatch for any tool that runs on a worker thread.
Fix: wrap the submitted callable in `contextvars.copy_context().run`, mirroring
`asyncio.to_thread`'s implementation. The existing threading.local callback
propagation (0046d17 / GHSA-qg5c-hvr5-hjgr) is preserved unchanged — it
handles a different propagation surface that ContextVars cannot carry.
|
Confirming a real-world reproduction on Setup: Two concurrent gateway sessions on Slack, separate channels A and B, with different active users in each channel. Both had registered approval notify callbacks via Trigger: Session A's agent decided to run a recursive delete, which fired the dangerous-command approval flow. Observed: the approval card (with Allow Once / Allow Session / Deny buttons) was delivered to channel B, not channel A. The user in channel B saw an approval prompt for a command they had no context for, while session A's agent thread was blocked waiting for a response that would never come from there. Matches this PR's analysis exactly. The tool ran in a worker thread spawned without P1 feels right — the misroute happens at the worst possible moment (dangerous-command confirmation), and the wrong user could plausibly click "Allow Once" thinking it's their own session. Happy to retest once this is merged. |
Five tests under tests/eval_018/ that go from FAIL → PASS when an
agent successfully completes one of the five prompts authored in
talaria-evals/prompts/edit_*.{md,yaml}:
test_file_tools_typed tools/file_tools.py — public-fn annotations
test_cli_errors cli.py — CliError subclass + ≥3 raise sites
test_rename_compactor agent/* — ContextCompressor → ContextCompactor
test_usage_pricing_docstrings agent/usage_pricing.py — class docstrings
test_title_timeout agent/title_generator.py — timeout default 60.0
Plus conftest.py with an autouse AnchorStateManager.reset() (defensive
even though each eval run is a subprocess) and an empty __init__.py.
All five tests fail at this commit by design — they are the assertion
side of the eval; agents earn the pass by editing the named files.
…18123) Propagates ContextVars (notably `tools.approval._approval_session_key`) into concurrent tool worker threads via `copy_context().run` — mirrors `asyncio.to_thread` semantics. Fixes approval-card cross-session misrouting in concurrent gateway traffic. Repro'd on Slack: session A's dangerous-command approval was delivered to channel B (@syahidfrd). Salvages #16660 — core 4-LOC fix preserved, unrelated `tests/eval_018/` scope contamination dropped. Adds 5 regression guards including an AST-level source check on the real call site. Closes #16660. Co-authored-by: firefly <promptsiren@gmail.com> Co-authored-by: banditburai <banditburai@users.noreply.github.com>
|
Salvaged as #18123, merged as e5dad4ac5. The core 4-LOC fix was correct — diagnosis of the Dropped the Added a regression suite: Thanks for finding this — the cross-session approval misroute is a real P1 and your diagnosis pointed straight at the fix. |
…ousResearch#18123) Propagates ContextVars (notably `tools.approval._approval_session_key`) into concurrent tool worker threads via `copy_context().run` — mirrors `asyncio.to_thread` semantics. Fixes approval-card cross-session misrouting in concurrent gateway traffic. Repro'd on Slack: session A's dangerous-command approval was delivered to channel B (@syahidfrd). Salvages NousResearch#16660 — core 4-LOC fix preserved, unrelated `tests/eval_018/` scope contamination dropped. Adds 5 regression guards including an AST-level source check on the real call site. Closes NousResearch#16660. Co-authored-by: firefly <promptsiren@gmail.com> Co-authored-by: banditburai <banditburai@users.noreply.github.com>
…ousResearch#18123) Propagates ContextVars (notably `tools.approval._approval_session_key`) into concurrent tool worker threads via `copy_context().run` — mirrors `asyncio.to_thread` semantics. Fixes approval-card cross-session misrouting in concurrent gateway traffic. Repro'd on Slack: session A's dangerous-command approval was delivered to channel B (@syahidfrd). Salvages NousResearch#16660 — core 4-LOC fix preserved, unrelated `tests/eval_018/` scope contamination dropped. Adds 5 regression guards including an AST-level source check on the real call site. Closes NousResearch#16660. Co-authored-by: firefly <promptsiren@gmail.com> Co-authored-by: banditburai <banditburai@users.noreply.github.com>
…ousResearch#18123) Propagates ContextVars (notably `tools.approval._approval_session_key`) into concurrent tool worker threads via `copy_context().run` — mirrors `asyncio.to_thread` semantics. Fixes approval-card cross-session misrouting in concurrent gateway traffic. Repro'd on Slack: session A's dangerous-command approval was delivered to channel B (@syahidfrd). Salvages NousResearch#16660 — core 4-LOC fix preserved, unrelated `tests/eval_018/` scope contamination dropped. Adds 5 regression guards including an AST-level source check on the real call site. Closes NousResearch#16660. Co-authored-by: firefly <promptsiren@gmail.com> Co-authored-by: banditburai <banditburai@users.noreply.github.com>
…ousResearch#18123) Propagates ContextVars (notably `tools.approval._approval_session_key`) into concurrent tool worker threads via `copy_context().run` — mirrors `asyncio.to_thread` semantics. Fixes approval-card cross-session misrouting in concurrent gateway traffic. Repro'd on Slack: session A's dangerous-command approval was delivered to channel B (@syahidfrd). Salvages NousResearch#16660 — core 4-LOC fix preserved, unrelated `tests/eval_018/` scope contamination dropped. Adds 5 regression guards including an AST-level source check on the real call site. Closes NousResearch#16660. Co-authored-by: firefly <promptsiren@gmail.com> Co-authored-by: banditburai <banditburai@users.noreply.github.com>
…ousResearch#18123) Propagates ContextVars (notably `tools.approval._approval_session_key`) into concurrent tool worker threads via `copy_context().run` — mirrors `asyncio.to_thread` semantics. Fixes approval-card cross-session misrouting in concurrent gateway traffic. Repro'd on Slack: session A's dangerous-command approval was delivered to channel B (@syahidfrd). Salvages NousResearch#16660 — core 4-LOC fix preserved, unrelated `tests/eval_018/` scope contamination dropped. Adds 5 regression guards including an AST-level source check on the real call site. Closes NousResearch#16660. Co-authored-by: firefly <promptsiren@gmail.com> Co-authored-by: banditburai <banditburai@users.noreply.github.com>
…ousResearch#18123) Propagates ContextVars (notably `tools.approval._approval_session_key`) into concurrent tool worker threads via `copy_context().run` — mirrors `asyncio.to_thread` semantics. Fixes approval-card cross-session misrouting in concurrent gateway traffic. Repro'd on Slack: session A's dangerous-command approval was delivered to channel B (@syahidfrd). Salvages NousResearch#16660 — core 4-LOC fix preserved, unrelated `tests/eval_018/` scope contamination dropped. Adds 5 regression guards including an AST-level source check on the real call site. Closes NousResearch#16660. Co-authored-by: firefly <promptsiren@gmail.com> Co-authored-by: banditburai <banditburai@users.noreply.github.com>
…ousResearch#18123) Propagates ContextVars (notably `tools.approval._approval_session_key`) into concurrent tool worker threads via `copy_context().run` — mirrors `asyncio.to_thread` semantics. Fixes approval-card cross-session misrouting in concurrent gateway traffic. Repro'd on Slack: session A's dangerous-command approval was delivered to channel B (@syahidfrd). Salvages NousResearch#16660 — core 4-LOC fix preserved, unrelated `tests/eval_018/` scope contamination dropped. Adds 5 regression guards including an AST-level source check on the real call site. Closes NousResearch#16660. Co-authored-by: firefly <promptsiren@gmail.com> Co-authored-by: banditburai <banditburai@users.noreply.github.com>
Companion to 0046d17 (threading.local callback propagation) and 603a68b (per-session ContextVar isolation).
_execute_tool_calls_concurrentsubmits tools viaexecutor.submit(_run_tool, ...)withoutcopy_context().run, so worker threads do not inherit the parent's ContextVar values — including_approval_session_keyset by the gateway beforeagent.run. Worker tools fall throughtools/approval.py:get_current_session_key's resolution order to theos.environfallback ("default"session key), silently collapsing per-session dispatch.Fix: wrap
_run_toolincopy_context().runat submit time, mirroringasyncio.to_thread. The existing threading.local callback propagation atrun_agent.py:8796-8806(from 0046d17 / GHSA-qg5c-hvr5-hjgr) is preserved — it handles a different propagation surface that ContextVars cannot carry.