[Feature]: Harden Kanban task validity, dispatch preflight, and worker ownership #23209

@qWaitCrypto

Description

Problem or Use Case

Kanban has recently grown into a durable multi-process task queue, but some queue invariants are still only enforced indirectly through prompts, skill docs, or manual operator recovery.

This shows up as a cluster of related failure modes rather than one isolated bug:

  • tasks can be created with invalid skills values that are actually toolset names
  • ready tasks can stay eligible for dispatch even when they clearly cannot run (for example, invalid skills or a missing assignee profile)
  • stale / reclaimed / superseded workers can still begin work unless ownership is checked at startup
  • operators lack diagnostics and narrow recovery commands for the cases where task state gets stuck

Related issues include:

The problem is not that Kanban needs a redesign. The problem is that a few important invariants are not yet enforced consistently across task creation, dispatch, and worker startup.

Proposed Solution

Do a narrow Kanban hardening pass across three layers:

  1. Task validity and diagnostics
    • reject known toolset names in task.skills at create time
    • surface historical bad rows through diagnostics
    • add minimal recovery support for invalid persisted skills
  2. Dispatch preflight for obviously undispatchable tasks
    • hard-skip ready tasks that are clearly invalid before spawn
    • initial hard-skip cases:
      • invalid persisted task skills
      • missing assignee profile
    • keep softer capability concerns advisory-only rather than blocking dispatch
  3. Worker startup ownership guard
    • verify task/run/claim ownership at worker startup before useful work begins
    • let stale / reclaimed / superseded workers exit benignly
    • avoid counting those ownership/lifecycle exits as real worker failures
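
To make the first two layers concrete, here is a rough sketch of how a create-time skills check and a dispatch preflight could share one validity helper. Everything in it is an assumption for illustration, not the existing Kanban code: the `Task` shape, the `KNOWN_TOOLSETS` contents, the reason strings, and the choice to treat an unassigned task the same as a task whose assignee profile cannot be found.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toolset names; the real set would come from the project's registry.
KNOWN_TOOLSETS = frozenset({"terminal", "browser", "file"})


@dataclass
class Task:
    """Hypothetical task row; field names are illustrative only."""
    id: str
    skills: list = field(default_factory=list)
    assignee: Optional[str] = None
    status: str = "ready"


def invalid_skills(skills):
    """Return the entries in `skills` that are actually toolset names."""
    return sorted(s for s in skills if s in KNOWN_TOOLSETS)


def create_task(task_id, skills, assignee=None):
    """Create-time check: refuse toolset names in task.skills."""
    bad = invalid_skills(skills)
    if bad:
        raise ValueError(f"task.skills contains toolset names: {bad}")
    return Task(id=task_id, skills=list(skills), assignee=assignee)


def preflight(task, known_profiles):
    """Return hard-skip reasons; an empty list means the task may be spawned.

    Only the two initial hard-skip cases are checked. Softer capability
    concerns would be reported elsewhere as advisories, not returned here.
    """
    reasons = []
    if invalid_skills(task.skills):
        reasons.append("invalid_skills")
    # Assumption: an unassigned task is treated like a missing profile.
    if task.assignee is None or task.assignee not in known_profiles:
        reasons.append("missing_assignee_profile")
    return reasons


def dispatchable(tasks, known_profiles):
    """Filter ready tasks down to those that pass preflight."""
    return [
        t for t in tasks
        if t.status == "ready" and not preflight(t, known_profiles)
    ]
```

The same `invalid_skills` helper could also back the diagnostics pass: running `preflight` over persisted rows would surface historical bad tasks that were written before the create-time check existed, without the dispatcher ever spawning them.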

This should stay intentionally narrow:

  • no required_toolsets manifest
  • no default profile permission expansion
  • no full capability model
  • no broad dispatcher redesign

Alternatives Considered

A few broader approaches were considered, but deferred on purpose:

  • Full capability modeling for profiles and tasks
    • likely useful later, but too large for the current bug cluster
  • Expanding default profile toolsets
    • changes default permissions and invites a bigger policy debate
  • A broader dispatcher redesign
    • unnecessary for the current failure chain
  • Relying only on prompt/skill guidance
    • this is the current weak point; core ownership and validity checks should live in the system, not only in model
      behavior

The proposed approach is better because it closes the concrete failure chain with a small number of reviewable
changes.
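
As an illustration of what "checks that live in the system" could look like for worker ownership, the sketch below compares what a worker was spawned with against the current claim row before any useful work starts. The `ClaimRow` fields, the `claim_token` idea, the order of the checks, and the exit-code convention are all hypothetical; the real schema and lifecycle states may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Ownership(Enum):
    OWNED = "owned"
    STALE = "stale"            # the worker's claim has expired
    RECLAIMED = "reclaimed"    # another worker now holds the claim
    SUPERSEDED = "superseded"  # a newer run exists for the task


@dataclass
class ClaimRow:
    """Hypothetical persisted claim state for one task."""
    task_id: str
    run_id: str
    claim_token: str
    claim_expires_at: float


@dataclass
class StartupResult:
    proceed: bool
    exit_code: int
    counts_as_failure: bool
    reason: str


def check_ownership(row: Optional[ClaimRow], run_id, claim_token, now):
    """Classify whether this worker still owns the task/run/claim."""
    if row is None or row.claim_token != claim_token:
        return Ownership.RECLAIMED
    if row.run_id != run_id:
        return Ownership.SUPERSEDED
    if row.claim_expires_at <= now:
        return Ownership.STALE
    return Ownership.OWNED


def worker_startup(row, run_id, claim_token, now):
    """Decide at startup whether to proceed or exit benignly."""
    verdict = check_ownership(row, run_id, claim_token, now)
    if verdict is Ownership.OWNED:
        return StartupResult(True, 0, False, verdict.value)
    # Ownership/lifecycle exits are benign: exit code 0 and not a failure.
    return StartupResult(False, 0, False, verdict.value)
```

A worker that receives `proceed=False` would log the reason and exit without touching the task, and the dispatcher would leave the failure counter unchanged for that run.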

Feature Type

Reliability / correctness

Scope

Medium (few files, < 300 lines)

Contribution

  • I'd like to implement this myself and submit a PR

Linked / stacked PRs

Metadata

Labels

  • P3: Low (cosmetic, nice to have)
  • comp/plugins: Plugin system and bundled plugins
  • type/feature: New feature or request
