Skip to content

fix: address post-merge bot feedback across PRs #157–#162 #163

@Aureliolo

Description

@Aureliolo

Post-Merge Bot Feedback — PRs #157#162

Five of the last six merged PRs (#157, #158, #160, #161, #162) received bot review comments after the merge. This issue tracks all unaddressed findings for a single comprehensive fix.


PR #157 — Communication Foundation

1. unsubscribe() leaves blocked receive() callers permanently stuck

  • Source: Greptile | File: communication/bus_memory.py:403
  • unsubscribe() pops the subscriber's queue, but any coroutine already blocked in _await_with_shutdown holds a local reference and never gets a wakeup signal. Fix: put a sentinel None into the queue before removing it.

2. COMM_RECEIVE_TIMEOUT fires on graceful shutdown, not only timeout

  • Source: Greptile + Copilot | File: communication/bus_memory.py:450
  • result is None is returned for both timeout and shutdown, but both paths log COMM_RECEIVE_TIMEOUT. Monitoring can't distinguish slow consumers from orderly shutdown. Fix: return a sentinel/enum distinguishing TIMEOUT vs SHUTDOWN.

3. AgentMessenger facade is missing a receive() method

  • Source: Greptile | File: communication/messenger.py:96
  • The class auto-fills agent_id for send/subscribe/unsubscribe but has no receive() wrapper — consumers must bypass the facade and call bus.receive() directly with explicit agent_id.

4. MessageHandler isinstance check doesn't verify async

  • Source: Copilot | File: communication/dispatcher.py:99
  • @runtime_checkable only checks for a handle attribute, not that it's async. A synchronous handle() will pass registration but await handler.handle(...) will raise TypeError at runtime.

5. Cancelled tasks not awaited after .cancel() in _await_with_shutdown

  • Source: Copilot | File: communication/bus_memory.py:485
  • After cancelling get_task/shutdown_task, they aren't awaited, which can leave pending tasks and emit "Task was destroyed but it is pending" warnings. Fix: await with suppress(CancelledError).

PR #158 — Personality & Department Hardening

6. Missing log-before-raise in 3 validators

  • Source: Greptile | Files: core/company.py:42, company.py:303–313, templates/schema.py:119–127
  • _validate_not_self_report, _validate_unique_subordinates, and _validate_personality_mutual_exclusion raise without logging first, violating CLAUDE.md convention.

PR #160 — Delegation & Loop Prevention

7. get_state() / check() allocate unbounded _PairState entries

  • Source: Copilot | File: communication/loop_prevention/circuit_breaker.py:86
  • _get_pair() uses setdefault, creating entries for every pair ever checked. In long-running systems, _pairs grows unbounded.

8. DelegationDeduplicator never prunes stale entries

  • Source: Copilot | File: communication/loop_prevention/dedup.py:96
  • Expired entries only deleted when check() is called for the same key. One-off task titles accumulate indefinitely.

9. Dedup keys on task_title instead of stable identifier

  • Source: Copilot | File: communication/loop_prevention/guard.py:61
  • Titles aren't stable — tests already retitle tasks to bypass dedup. Should use task.id or a content hash instead.

10. delegate() identity/ID consistency not validated

  • Source: Copilot | File: communication/delegation/service.py:100
  • request.delegator_id/delegatee_id used for guard checks but authority validated against AgentIdentity objects. If these ever differ, callers can bypass dedup/rate-limit/circuit-breaker.

11. Rate limiter creates empty entries for unchecked pairs

  • Source: Copilot | File: communication/loop_prevention/rate_limit.py:70
  • check() always writes back self._timestamps[key] even when the pair has never been recorded, creating empty entries that grow unbounded.

PR #161 — Parallel Execution

12. except* Exception: pass silently swallows ExceptionGroup

  • Source: Copilot | File: engine/parallel.py:224
  • Unconditionally suppresses any ExceptionGroup from TaskGroup. Can mask unexpected bugs. Should only suppress when fail_fast is enabled and should log before suppressing.

13. Cancelled task logging gap

  • Source: Greptile | File: engine/parallel.py:290
  • CancelledError handler records outcome but emits no structured log event, unlike every other error path that emits PARALLEL_AGENT_ERROR.

14. progress.in_progress semantics wrong with semaphore

  • Source: Greptile | File: engine/parallel.py:256
  • in_progress incremented before semaphore acquisition. With max_concurrency=2 and 5 agents, all 5 show as in_progress immediately while only 2 are actually running.

15. Lock release failure logged as PARALLEL_VALIDATION_ERROR

  • Source: Copilot | File: engine/parallel.py:162
  • Resource lock release failures logged under PARALLEL_VALIDATION_ERROR — wrong event type makes observability harder.

16. plan_parsing.py duplicated in DESIGN_SPEC file tree

  • Source: Copilot + Greptile | File: DESIGN_SPEC.md:2379
  • plan_parsing.py listed twice with different descriptions. Remove the duplicate.

PR #162 — Template Inheritance

17. DESIGN_SPEC conventions table says (role, department) instead of (role, department, merge_id)

  • Source: Copilot | File: DESIGN_SPEC.md:2573
  • The conventions table entry omits merge_id from the merge key description, inconsistent with §14.1.

18. _expand_single_agent drops merge_id — CONFIRMED BUG

  • Source: Copilot + External reviewer
  • File: renderer.py:609–636
  • _expand_single_agent builds agent_dict with name, role, department, level, personality, model, and _remove — but never copies merge_id. Templates with multiple same-(role, department) agents cannot be individually targeted from a child template because _agent_key always computes merge_id="".

19. Unnecessary deepcopy in _remove path

  • Source: Copilot | File: templates/merge.py:175
  • When is_remove=True and a match is found, clean = copy.deepcopy(...) is computed but never used — the code immediately sets matched_entry.agent = None. Wasteful deepcopy.

Severity Summary

Severity # Items
Bug 2 #18 (merge_id dropped), #1 (unsubscribe hang)
Logic/Semantics 5 #2, #4, #9, #12, #14
Observability 3 #6, #13, #15
Unbounded growth 3 #7, #8, #11
Missing API 1 #3
Docs 2 #16, #17
Cleanup 2 #5, #19
Safety 1 #10

Metadata

Metadata

Assignees

No one assigned

    Labels

    prio:criticalBlocks other work, must do firstscope:large3+ days of workspec:agent-systemDESIGN_SPEC Section 3 - Agent Systemspec:architectureDESIGN_SPEC Section 15 - Technical Architecturespec:communicationDESIGN_SPEC Section 5 - Communication Architecturespec:providersDESIGN_SPEC Section 9 - Model Provider Layerspec:task-workflowDESIGN_SPEC Section 6 - Task & Workflow Enginetype:fixBug fixes and correctionstype:infraCI/CD, tooling, project setuptype:refactorCode restructuring, cleanuptype:testTest coverage, test infrastructure

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions