Bug: initc panic on pod update after termination #548

@steved

What happened?

grove-initc panic: send on closed channel [recovered, repanicked]
grove-initc
grove-initc goroutine 201 [running]:
grove-initc k8s.io/apimachinery/pkg/util/runtime.handleCrash({0x17d8688, 0xc0006f18f0}, {0x137e620, 0x17baf30}, {0x0, 0x0, 0x2?})
grove-initc 	/go/pkg/mod/k8s.io/apimachinery@v0.34.2/pkg/util/runtime/runtime.go:114 +0x1a9
grove-initc k8s.io/apimachinery/pkg/util/runtime.HandleCrashWithLogger({{0x17db960?, 0xc0006ab380?}, 0xc000935c38?}, {0x0, 0x0, 0x0})
grove-initc 	/go/pkg/mod/k8s.io/apimachinery@v0.34.2/pkg/util/runtime/runtime.go:91 +0x115
grove-initc panic({0x137e620?, 0x17baf30?})
grove-initc 	/usr/local/go/src/runtime/panic.go:783 +0x132
grove-initc github.com/ai-dynamo/grove/operator/initc/internal.(*ParentPodCliqueDependencies).registerEventHandler.func1({0x15b5f20?, 0xc0008e1208?})
grove-initc 	/go/src/github.com/ai-dynamo/grove/operator/initc/internal/wait.go:203 +0x186
grove-initc k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(...)
grove-initc 	/go/pkg/mod/k8s.io/client-go@v0.34.2/tools/cache/controller.go:259
grove-initc k8s.io/client-go/tools/cache.(*processorListener).run.func1(0xc000545f5f, 0xc0006f6820, {0x142b900?, 0xc0003ba468?})
grove-initc 	/go/pkg/mod/k8s.io/client-go@v0.34.2/tools/cache/shared_informer.go:1076 +0xd1
grove-initc k8s.io/client-go/tools/cache.(*processorListener).run(0xc0006f6820)
grove-initc 	/go/pkg/mod/k8s.io/client-go@v0.34.2/tools/cache/shared_informer.go:1086 +0x3c
grove-initc k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
grove-initc 	/go/pkg/mod/k8s.io/apimachinery@v0.34.2/pkg/util/wait/wait.go:72 +0x4c
grove-initc created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 292
grove-initc 	/go/pkg/mod/k8s.io/apimachinery@v0.34.2/pkg/util/wait/wait.go:70 +0x73

If WaitForReady returns and allReadyCh is closed, a shared informer event indicating that a podclique is ready can still arrive before the process exits. The event handler then attempts to send on the closed channel, which panics.
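The failure mode and one possible way to avoid it can be sketched in isolation. This is not the code in wait.go; `readyNotifier`, `signal`, `isReady`, `wait`, and `sendPanics` are hypothetical names made up for illustration, and the sketch assumes readiness only needs to be signalled once. Instead of sending on a channel that the waiter may already have closed, the handler closes a dedicated channel through `sync.Once`, so late or repeated events become no-ops.

```go
package main

import (
	"context"
	"sync"
)

// sendPanics reproduces the reported failure in miniature: sending on a
// channel that has already been closed panics. It recovers the panic and
// reports whether one occurred. Only call it with a closed or buffered
// channel; a send on an open unbuffered channel would block.
func sendPanics(ch chan struct{}) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	ch <- struct{}{}
	return false
}

// readyNotifier is a hypothetical replacement for a send-based ready channel.
// The producer side (an informer event handler) is the only side that closes
// the channel, and it does so at most once.
type readyNotifier struct {
	once sync.Once
	done chan struct{}
}

func newReadyNotifier() *readyNotifier {
	return &readyNotifier{done: make(chan struct{})}
}

// signal marks readiness. It is safe to call any number of times from any
// goroutine, including after the waiter has already returned.
func (n *readyNotifier) signal() {
	n.once.Do(func() { close(n.done) })
}

// isReady reports, without blocking, whether signal has been called.
func (n *readyNotifier) isReady() bool {
	select {
	case <-n.done:
		return true
	default:
		return false
	}
}

// wait blocks until signal is called or ctx is cancelled.
func (n *readyNotifier) wait(ctx context.Context) error {
	select {
	case <-n.done:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}
```

With this shape, an event handler would call `signal()` where it currently sends, and the waiting side would call `wait(ctx)` and never close anything itself. Other approaches, such as removing the event handler or stopping the informer before closing the channel, may fit the existing code better.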

What did you expect to happen?

No response

Environment

  • Kubernetes version
  • Grove version
  • Scheduler details
  • Cloud provider or hardware configuration
  • Tools that you are using Grove together with
  • Anything else that is relevant
