[Bug]: Kanban parent-child handoff: scratch workspace GC destroys artifacts before child can read them #33774

@Julian-Bob

Summary

When a Kanban task A (with workspace_kind=scratch) completes and has a child task B linked via parents=[A], the child task B starts and finds that A's scratch workspace has already been garbage-collected — all files A wrote as handoff artifacts are gone.

The dependency chain (parents=[A]) should guarantee that B can read A's output, but the immediate _cleanup_workspace() on A's kanban_complete() destroys the workspace before the dispatcher promotes B to ready and spawns it.

Impact

  • Child workers crash with FileNotFoundError on the workspace directory, or find that $HERMES_KANBAN_WORKSPACE points to a deleted path
  • Workers silently fall back to re-extracting data from source URLs, duplicating work
  • In multi-step pipelines, each step wastes tokens re-fetching/re-deriving parent output
  • Hard to diagnose: the log shows "cannot access '/path/to/workspace/': No such file or directory" with no explicit mention of GC
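
One way to make this failure easier to diagnose is for a worker to check the workspace path up front and fail with an explicit message. The following is only a sketch: `require_workspace` and `WorkspaceMissingError` are hypothetical names and not part of the existing code; only the `HERMES_KANBAN_WORKSPACE` variable comes from this report.

```python
import os
from pathlib import Path
from typing import Mapping


class WorkspaceMissingError(RuntimeError):
    """Hypothetical error type, used only in this sketch."""


def require_workspace(env: Mapping[str, str] = os.environ) -> Path:
    """Return the workspace directory, or raise with an explicit hint."""
    raw = env.get("HERMES_KANBAN_WORKSPACE")
    if not raw:
        raise WorkspaceMissingError("HERMES_KANBAN_WORKSPACE is not set")
    path = Path(raw)
    if not path.is_dir():
        # Name the likely cause instead of surfacing a bare "No such file".
        raise WorkspaceMissingError(
            f"workspace {path} does not exist; if it was a scratch workspace "
            "it may have been removed when its task completed"
        )
    return path
```

A guard like this does not fix the handoff, it only replaces the generic "No such file or directory" line with a message that points at scratch cleanup.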

Root Cause

kanban_complete() in hermes_cli/kanban_db.py calls _cleanup_workspace() synchronously for scratch workspaces:

# complete_task(...)
_cleanup_workspace(conn, task_id)

The cleanup function removes the entire workspace directory immediately on completion:

if kind != "scratch" or not path:
    return

wp = Path(path)
if wp.is_dir():
    shutil.rmtree(wp, ignore_errors=True)

Nothing checks whether the completing task has children that still need the workspace. The dependency engine handles task status promotion (parents done → child ready), but the workspace lifecycle ends with the completing parent and does not extend to its children.

Steps to Reproduce

  1. Create a Kanban task A with workspace_kind=scratch (default)
  2. Worker A writes files into the workspace (e.g., summary.txt, guide.md)
  3. Worker A calls kanban_complete(summary=..., metadata={"artifacts": [...]})
  4. Create Kanban task B with parents=[A]
  5. Dispatcher promotes B to ready after A is done
  6. Worker B starts and tries to read $HERMES_KANBAN_WORKSPACE — it's gone
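
The sequence above can be simulated without Hermes at all. The sketch below is a stand-in, not the real code path: `cleanup_scratch` mimics the eager `shutil.rmtree()` shown under Root Cause, and a temporary directory plays the role of A's scratch workspace.

```python
import shutil
import tempfile
from pathlib import Path


def cleanup_scratch(path: Path) -> None:
    """Stand-in for the eager cleanup (not the real _cleanup_workspace)."""
    if path.is_dir():
        shutil.rmtree(path, ignore_errors=True)


def simulate_handoff() -> str:
    # Task A: gets a scratch workspace and writes a handoff artifact.
    workspace = Path(tempfile.mkdtemp(prefix="kanban-scratch-"))
    (workspace / "summary.txt").write_text("parent output")

    # Task A completes: the workspace is removed immediately.
    cleanup_scratch(workspace)

    # Task B starts afterwards and tries to read the parent's artifact.
    try:
        return (workspace / "summary.txt").read_text()
    except FileNotFoundError:
        return "missing"


print(simulate_handoff())  # prints "missing"
```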

Evidence

Observed worker output when the parent scratch workspace is already GC'd:

The parent task's workspace has been cleaned up (it's a scratch workspace that gets GC'd). But the parent task's handoff tells me the artifacts were created. Since the files are gone (scratch GC'd), I need to re-extract from the source URL and get the content directly.

Proposed Fix — 3 Options

Option A — Deferred cleanup for linked parents:
_cleanup_workspace() should check via the task_links table whether the completing task has children still pending/ready/running. If so, defer the shutil.rmtree() until all children are also complete or archived. A background janitor can sweep orphaned scratch workspaces.
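
A minimal sketch of what the Option A check could look like, using `sqlite3`. The schema is an assumption made for illustration: a `tasks` table with `id`, `status`, `workspace_kind`, `workspace_path` columns and a `task_links` table with `parent_id`, `child_id` columns. The real column names and status values in `kanban_db.py` may differ, and `cleanup_workspace_deferred` is a hypothetical name.

```python
import shutil
import sqlite3
from pathlib import Path

# Statuses under which a child may still need the parent's files (assumed names).
ACTIVE_STATUSES = ("pending", "ready", "running")


def has_active_children(conn: sqlite3.Connection, task_id: str) -> bool:
    """True if any linked child of task_id is still pending, ready or running."""
    placeholders = ",".join("?" for _ in ACTIVE_STATUSES)
    row = conn.execute(
        f"""
        SELECT 1
        FROM task_links AS l
        JOIN tasks AS c ON c.id = l.child_id
        WHERE l.parent_id = ? AND c.status IN ({placeholders})
        LIMIT 1
        """,
        (task_id, *ACTIVE_STATUSES),
    ).fetchone()
    return row is not None


def cleanup_workspace_deferred(conn: sqlite3.Connection, task_id: str) -> bool:
    """Remove a scratch workspace unless a child still needs it.

    Returns True if cleanup ran, False if it was skipped or deferred.
    """
    row = conn.execute(
        "SELECT workspace_kind, workspace_path FROM tasks WHERE id = ?",
        (task_id,),
    ).fetchone()
    if row is None:
        return False
    kind, path = row
    if kind != "scratch" or not path:
        return False
    if has_active_children(conn, task_id):
        return False  # deferred: a later sweep has to retry
    wp = Path(path)
    if wp.is_dir():
        shutil.rmtree(wp, ignore_errors=True)
    return True
```

Deferral only works if something retries later, for example the janitor mentioned above or a hook on child completion. It also only helps when the link to the child already exists at the moment the parent completes.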

Option B — Inherited scratch workspace for linked tasks:
When a task is created with parents=[...], inherit the parent's scratch workspace path. Children read/write from the same physical directory. Cleanup happens only when the last linked task in the chain completes.
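
Option B could be sketched in memory as follows. Everything here is hypothetical: the `Task` dataclass, the `done`/`archived` status names, and the rule of inheriting from the first scratch parent when there are several are assumptions for illustration, not the existing data model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    task_id: str
    status: str = "pending"
    workspace_kind: str = "scratch"
    workspace_path: Optional[str] = None
    parents: list["Task"] = field(default_factory=list)


def resolve_workspace(task: Task, fresh_path: str) -> str:
    """Inherit the first scratch parent's path, else fall back to fresh_path."""
    for parent in task.parents:
        if parent.workspace_kind == "scratch" and parent.workspace_path:
            return parent.workspace_path
    return fresh_path


def can_remove(path: str, all_tasks: list[Task]) -> bool:
    """True once every task sharing the path is done or archived."""
    sharers = [t for t in all_tasks if t.workspace_path == path]
    return all(t.status in ("done", "archived") for t in sharers)
```

With this shape, cleanup becomes a question about the shared path, not about a single task. A task with several scratch parents would need an explicit rule for which directory it inherits.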

Option C — Document the constraint:
Update the kanban-worker SKILL.md to explicitly state that filesystem handoffs between parent/child scratch tasks are NOT supported. Artifacts must be passed via kanban_complete(metadata={...}) (inline data) or workers must use dir:<path> workspaces for persistent file sharing.
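
If Option C is chosen, a worker would have to inline its artifacts before completing. The helper below is a sketch of that step only: `build_handoff_metadata`, the `name`/`content` keys, and the size cap are assumptions, since the report only shows `metadata={"artifacts": [...]}` without its inner structure. The resulting dict would then be passed as `metadata=` to `kanban_complete()`.

```python
from pathlib import Path

# Arbitrary cap for this sketch, so large files are not inlined by accident.
MAX_INLINE_BYTES = 64 * 1024


def build_handoff_metadata(workspace: Path, names: list[str]) -> dict:
    """Read small UTF-8 text artifacts and inline them into a metadata dict."""
    artifacts = []
    for name in names:
        data = (workspace / name).read_bytes()
        if len(data) > MAX_INLINE_BYTES:
            raise ValueError(f"{name} is too large to inline ({len(data)} bytes)")
        artifacts.append({"name": name, "content": data.decode("utf-8")})
    return {"artifacts": artifacts}
```

Inline handoff suits small text outputs such as summary.txt; larger or binary outputs would still need a dir:<path> workspace.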

    Labels

    P3 (Low: cosmetic, nice to have), comp/plugins (Plugin system and bundled plugins), type/bug (Something isn't working)
