fix: resolve 5 bugs — broadcast exclusion, plans cleanup, launch perms, config, fetch race#15
fix: resolve 5 bugs — broadcast exclusion, plans cleanup, launch perms, config, fetch race#15win4r wants to merge 1 commit intoHKUDS:mainfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 40241bfcf3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| member_names = {m.name for m in config.members} | ||
| for f in plans_dir.glob("*.md"): | ||
| try: | ||
| f.unlink() | ||
| except OSError: | ||
| pass | ||
| # Plan files are named {agent_name}-{plan_id}.md | ||
| # Extract agent name (everything before the last dash + 12-char hex id) | ||
| stem = f.stem | ||
| parts = stem.rsplit("-", 1) | ||
| if len(parts) == 2 and parts[0] in member_names: |
There was a problem hiding this comment.
Keep plan cleanup isolated to the current team
Because PlanManager.submit_plan() stores plans as {agent_name}-{plan_id}.md with no team prefix (clawteam/team/plan.py), this still deletes other teams' plans whenever they reuse the same member names. For example, two hedge-fund teams both produce risk-manager-*.md; running team cleanup on one team will unlink the other team's risk-manager plans as well, so the shared plans/ directory is still not actually scoped per team.
Useful? React with 👍 / 👎.
| consumed = f.with_suffix(".consumed") | ||
| try: | ||
| f.unlink() | ||
| f.rename(consumed) | ||
| except OSError: |
There was a problem hiding this comment.
Do not acknowledge file messages before they are returned
Renaming the inbox file to .consumed here makes the message disappear from every later fetch()/count() call before it has actually been handed back to the caller. If the worker is interrupted after rename() succeeds but before messages.append(raw) completes, the stranded .consumed file is never retried because the transport only scans msg-*.json, so that message is lost permanently instead of being redelivered.
Useful? React with 👍 / 👎.
Combines the best of both open PRs, resolving their overlapping approaches and applying all 6 fixes that are not yet in main: **mailbox.py** - broadcast(): resolve logical agent names to inbox directory names before exclusion check, so the sender is not delivered their own broadcast when inboxes use user-prefixed format (e.g. "alice_worker" vs "worker") [from PR HKUDS#15] - receive() / peek(): replace bare list comprehension with _parse_messages() helper that wraps each message in try/except, so a single corrupted file no longer crashes the entire fetch [from PR HKUDS#9] **transport/file.py** - fetch(consume=True): atomically rename msg file to .consumed before reading, so concurrent callers cannot both receive the same message [from PR HKUDS#15] **cli/commands.py** - config set / config get: derive valid_keys from ClawTeamConfig.model_fields instead of a hardcoded 3-item set, making all 7 config keys accessible; handle boolean fields (skip_permissions) correctly [from PR HKUDS#15] - approve-join: return an error and exit(1) when no matching join request is found, instead of silently fabricating an agent name [from PR HKUDS#9] - launch_team: read skip_permissions from config and pass it to be.spawn(), matching the behaviour of the standalone spawn command [from PR HKUDS#15] Note: the plans-cleanup fix (PR HKUDS#9 subdirectory approach, PR HKUDS#15 member-filter approach) was already merged via PR HKUDS#6 using the subdirectory strategy, so it is intentionally excluded here. All 135 existing tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…issions, config set, fetch race 1. **broadcast exclude logic** (mailbox.py): Sender was never excluded from broadcast because `list_recipients()` returns inbox directory names (e.g. "alice_worker") while the exclude set contains logical agent names (e.g. "worker"). Now resolves names to inbox names before comparison. 2. **plans cleanup deletes all teams** (manager.py): `cleanup()` deleted ALL `.md` files in the shared plans directory instead of only files belonging to the team being cleaned up. Now loads team config first and only deletes plan files matching the team's member names. 3. **launch missing skip_permissions** (commands.py): Agents spawned via `clawteam launch` templates never received `--dangerously-skip-permissions` even when config had `skip_permissions: true`. Now reads config and passes the flag to `be.spawn()`. 4. **config set/get only accepts 3 of 7 keys** (commands.py): `config set` and `config get` hardcoded only `data_dir`, `user`, `default_team` as valid keys, making it impossible to configure `transport`, `workspace`, `default_backend`, or `skip_permissions` via CLI. Now dynamically reads valid keys from the Pydantic model, with proper boolean handling. 5. **file transport fetch race condition** (file.py): Concurrent `fetch()` calls could read the same message file before either process deleted it, causing duplicate message delivery. Now atomically renames the file to `.consumed` before reading, so only one process can claim each message. All 124 existing tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
40241bf to
b226347
Compare
Thanks for the fixes here. I’m closing this PR because the useful parts have now been absorbed into In particular, the following issues covered here have already been fixed upstream:
So this PR is now effectively superseded by later changes, rather than something I want to merge directly. Thanks again — the diagnosis and patch direction were useful. |
Summary
Fixes 5 independent bugs discovered during deep testing of the project.
1. Broadcast exclude logic does not work (mailbox.py)
broadcast()compares inbox directory names (e.g.alice_worker) against logical agent names (e.g.worker) in the exclude set. Since these never match, the sender always receives their own broadcast. Fixed by resolving logical names to inbox names before comparison.2.
cleanupdeletes ALL teams' plan files (manager.py)TeamManager.cleanup()calledplans_dir.glob("*.md")and deleted everything in the sharedplans/directory — wiping plans from every team, not just the one being cleaned up. Fixed by loading the team config first and only deleting plan files whose agent name prefix matches a team member.3.
launchdoes not passskip_permissionsto spawned agents (commands.py)The
spawncommand correctly readsskip_permissionsfrom config and passes it to backends. Thelaunchcommand did not, so agents spawned via templates (e.g.clawteam launch hedge-fund) never got--dangerously-skip-permissionseven when configured. Fixed by reading the config value and forwarding it.4.
config set/config getonly accept 3 of 7 valid keys (commands.py)Both commands hardcoded
{"data_dir", "user", "default_team"}as valid keys, makingtransport,workspace,default_backend, andskip_permissionsimpossible to set via CLI (even thoughconfig showdisplays all 7). Fixed by dynamically reading valid keys from the Pydantic model, with proper boolean field handling.5. File transport
fetch()has a race condition (file.py)When multiple processes call
fetch(consume=True)concurrently, both canread_bytes()the same message file before either callsunlink(), resulting in duplicate message delivery. Fixed by atomically renaming the file to.consumedbefore reading — only one process can win therename(), and losers skip the message.Changes
clawteam/team/mailbox.pyclawteam/team/manager.pyclawteam/cli/commands.pyclawteam/transport/file.pyTest plan
pytest tests/ -v)ruff check— no new lint errors (pre-existingN802ondo_GETis unchanged)config set transport filenow worksconfig set skip_permissions truehandles booleans correctly🤖 Generated with Claude Code