Skip to content

fix: resolve 5 bugs — broadcast exclusion, plans cleanup, launch perms, config, fetch race#15

Closed
win4r wants to merge 1 commit intoHKUDS:mainfrom
win4r:fix/multiple-bugs
Closed

fix: resolve 5 bugs — broadcast exclusion, plans cleanup, launch perms, config, fetch race#15
win4r wants to merge 1 commit intoHKUDS:mainfrom
win4r:fix/multiple-bugs

Conversation

@win4r
Copy link
Copy Markdown
Contributor

@win4r win4r commented Mar 18, 2026

Summary

Fixes 5 independent bugs discovered during deep testing of the project.

1. Broadcast exclude logic does not work (mailbox.py)

broadcast() compares inbox directory names (e.g. alice_worker) against logical agent names (e.g. worker) in the exclude set. Since these never match, the sender always receives their own broadcast. Fixed by resolving logical names to inbox names before comparison.

2. cleanup deletes ALL teams' plan files (manager.py)

TeamManager.cleanup() called plans_dir.glob("*.md") and deleted everything in the shared plans/ directory — wiping plans from every team, not just the one being cleaned up. Fixed by loading the team config first and only deleting plan files whose agent name prefix matches a team member.

3. launch does not pass skip_permissions to spawned agents (commands.py)

The spawn command correctly reads skip_permissions from config and passes it to backends. The launch command did not, so agents spawned via templates (e.g. clawteam launch hedge-fund) never got --dangerously-skip-permissions even when configured. Fixed by reading the config value and forwarding it.

4. config set / config get only accept 3 of 7 valid keys (commands.py)

Both commands hardcoded {"data_dir", "user", "default_team"} as valid keys, making transport, workspace, default_backend, and skip_permissions impossible to set via CLI (even though config show displays all 7). Fixed by dynamically reading valid keys from the Pydantic model, with proper boolean field handling.

5. File transport fetch() has a race condition (file.py)

When multiple processes call fetch(consume=True) concurrently, both can read_bytes() the same message file before either calls unlink(), resulting in duplicate message delivery. Fixed by atomically renaming the file to .consumed before reading — only one process can win the rename(), and losers skip the message.

Changes

File Lines What
clawteam/team/mailbox.py +12 -1 Resolve agent names to inbox names in broadcast exclude
clawteam/team/manager.py +19 -5 Scope plan cleanup to team members only
clawteam/cli/commands.py +25 -7 Add skip_permissions to launch; expand config set/get keys
clawteam/transport/file.py +26 -7 Atomic rename-before-read in fetch

Test plan

  • All 124 existing tests pass (pytest tests/ -v)
  • ruff check — no new lint errors (pre-existing N802 on do_GET is unchanged)
  • Manually verified broadcast no longer delivers to sender
  • Manually verified config set transport file now works
  • Manually verified config set skip_permissions true handles booleans correctly

🤖 Generated with Claude Code

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 40241bfcf3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread clawteam/team/manager.py Outdated
Comment on lines +195 to +201
member_names = {m.name for m in config.members}
for f in plans_dir.glob("*.md"):
try:
f.unlink()
except OSError:
pass
# Plan files are named {agent_name}-{plan_id}.md
# Extract agent name (everything before the last dash + 12-char hex id)
stem = f.stem
parts = stem.rsplit("-", 1)
if len(parts) == 2 and parts[0] in member_names:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep plan cleanup isolated to the current team

Because PlanManager.submit_plan() stores plans as {agent_name}-{plan_id}.md with no team prefix (clawteam/team/plan.py), this still deletes other teams' plans whenever they reuse the same member names. For example, two hedge-fund teams both produce risk-manager-*.md; running team cleanup on one team will unlink the other team's risk-manager plans as well, so the shared plans/ directory is still not actually scoped per team.

Useful? React with 👍 / 👎.

Comment on lines +52 to 55
consumed = f.with_suffix(".consumed")
try:
f.unlink()
f.rename(consumed)
except OSError:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Do not acknowledge file messages before they are returned

Renaming the inbox file to .consumed here makes the message disappear from every later fetch()/count() call before it has actually been handed back to the caller. If the worker is interrupted after rename() succeeds but before messages.append(raw) completes, the stranded .consumed file is never retried because the transport only scans msg-*.json, so that message is lost permanently instead of being redelivered.

Useful? React with 👍 / 👎.

Longado pushed a commit to Longado/ClawTeam that referenced this pull request Mar 18, 2026
Combines the best of both open PRs, resolving their overlapping approaches
and applying all 6 fixes that are not yet in main:

**mailbox.py**
- broadcast(): resolve logical agent names to inbox directory names before
  exclusion check, so the sender is not delivered their own broadcast when
  inboxes use user-prefixed format (e.g. "alice_worker" vs "worker")
  [from PR HKUDS#15]
- receive() / peek(): replace bare list comprehension with _parse_messages()
  helper that wraps each message in try/except, so a single corrupted file
  no longer crashes the entire fetch [from PR HKUDS#9]

**transport/file.py**
- fetch(consume=True): atomically rename msg file to .consumed before
  reading, so concurrent callers cannot both receive the same message
  [from PR HKUDS#15]

**cli/commands.py**
- config set / config get: derive valid_keys from ClawTeamConfig.model_fields
  instead of a hardcoded 3-item set, making all 7 config keys accessible;
  handle boolean fields (skip_permissions) correctly [from PR HKUDS#15]
- approve-join: return an error and exit(1) when no matching join request
  is found, instead of silently fabricating an agent name [from PR HKUDS#9]
- launch_team: read skip_permissions from config and pass it to be.spawn(),
  matching the behaviour of the standalone spawn command [from PR HKUDS#15]

Note: the plans-cleanup fix (PR HKUDS#9 subdirectory approach, PR HKUDS#15 member-filter
approach) was already merged via PR HKUDS#6 using the subdirectory strategy, so it
is intentionally excluded here.

All 135 existing tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…issions, config set, fetch race

1. **broadcast exclude logic** (mailbox.py): Sender was never excluded from
   broadcast because `list_recipients()` returns inbox directory names
   (e.g. "alice_worker") while the exclude set contains logical agent names
   (e.g. "worker"). Now resolves names to inbox names before comparison.

2. **plans cleanup deletes all teams** (manager.py): `cleanup()` deleted ALL
   `.md` files in the shared plans directory instead of only files belonging
   to the team being cleaned up. Now loads team config first and only deletes
   plan files matching the team's member names.

3. **launch missing skip_permissions** (commands.py): Agents spawned via
   `clawteam launch` templates never received `--dangerously-skip-permissions`
   even when config had `skip_permissions: true`. Now reads config and passes
   the flag to `be.spawn()`.

4. **config set/get only accepts 3 of 7 keys** (commands.py): `config set`
   and `config get` hardcoded only `data_dir`, `user`, `default_team` as
   valid keys, making it impossible to configure `transport`, `workspace`,
   `default_backend`, or `skip_permissions` via CLI. Now dynamically reads
   valid keys from the Pydantic model, with proper boolean handling.

5. **file transport fetch race condition** (file.py): Concurrent `fetch()`
   calls could read the same message file before either process deleted it,
   causing duplicate message delivery. Now atomically renames the file to
   `.consumed` before reading, so only one process can claim each message.

All 124 existing tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@win4r win4r force-pushed the fix/multiple-bugs branch from 40241bf to b226347 Compare March 18, 2026 14:36
@tjb-tech
Copy link
Copy Markdown
Collaborator

Summary

Fixes 5 independent bugs discovered during deep testing of the project.

1. Broadcast exclude logic does not work (mailbox.py)

broadcast() compares inbox directory names (e.g. alice_worker) against logical agent names (e.g. worker) in the exclude set. Since these never match, the sender always receives their own broadcast. Fixed by resolving logical names to inbox names before comparison.

2. cleanup deletes ALL teams' plan files (manager.py)

TeamManager.cleanup() called plans_dir.glob("*.md") and deleted everything in the shared plans/ directory — wiping plans from every team, not just the one being cleaned up. Fixed by loading the team config first and only deleting plan files whose agent name prefix matches a team member.

3. launch does not pass skip_permissions to spawned agents (commands.py)

The spawn command correctly reads skip_permissions from config and passes it to backends. The launch command did not, so agents spawned via templates (e.g. clawteam launch hedge-fund) never got --dangerously-skip-permissions even when configured. Fixed by reading the config value and forwarding it.

4. config set / config get only accept 3 of 7 valid keys (commands.py)

Both commands hardcoded {"data_dir", "user", "default_team"} as valid keys, making transport, workspace, default_backend, and skip_permissions impossible to set via CLI (even though config show displays all 7). Fixed by dynamically reading valid keys from the Pydantic model, with proper boolean field handling.

5. File transport fetch() has a race condition (file.py)

When multiple processes call fetch(consume=True) concurrently, both can read_bytes() the same message file before either calls unlink(), resulting in duplicate message delivery. Fixed by atomically renaming the file to .consumed before reading — only one process can win the rename(), and losers skip the message.

Changes

File Lines What
clawteam/team/mailbox.py +12 -1 Resolve agent names to inbox names in broadcast exclude
clawteam/team/manager.py +19 -5 Scope plan cleanup to team members only
clawteam/cli/commands.py +25 -7 Add skip_permissions to launch; expand config set/get keys
clawteam/transport/file.py +26 -7 Atomic rename-before-read in fetch

Test plan

  • All 124 existing tests pass (pytest tests/ -v)
  • ruff check — no new lint errors (pre-existing N802 on do_GET is unchanged)
  • Manually verified broadcast no longer delivers to sender
  • Manually verified config set transport file now works
  • Manually verified config set skip_permissions true handles booleans correctly

🤖 Generated with Claude Code

Thanks for the fixes here.

I’m closing this PR because the useful parts have now been absorbed into main through later, current-main-compatible changes.

In particular, the following issues covered here have already been fixed upstream:

  • launch passing skip_permissions
  • config set/get supporting the full config surface
  • broadcast exclude working correctly with namespaced inboxes
  • file transport reducing duplicate-consumption risk when consume=True

So this PR is now effectively superseded by later changes, rather than something I want to merge directly.

Thanks again — the diagnosis and patch direction were useful.

@tjb-tech tjb-tech closed this Mar 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants