Skip to content

support removal of event handlers from SharedIndexInformers#98657

Closed
mandelsoft wants to merge 6 commits intokubernetes:masterfrom
mandelsoft:informer
Closed

support removal of event handlers from SharedIndexInformers#98657
mandelsoft wants to merge 6 commits intokubernetes:masterfrom
mandelsoft:informer

Conversation

@mandelsoft
Copy link
Copy Markdown
Contributor

@mandelsoft mandelsoft commented Feb 1, 2021

What type of PR is this?
/kind feature

What this PR does / why we need it:
To be able to implement controllers that are dynamically deciding
on which resources to watch, it is required to get rid of
dedicated watches and event handlers again. This requires the
possibility to remove event handlers from SharedIndexInformers again.
Stopping an informer is not sufficient, because there might
be multiple controllers in a controller manager that independently
decide which resources to watch.

Unfortunately the ResourceEventHandler interface encourages to use
value objects for handlers (like the ResourceEventHandlerFuncs
struct, that uses value receivers to implement the interface).
Go does not support comparison of function pointers and therefore
the comparison of such structs is not possible, also. To be able
to remove all kinds of handlers and to solve the problem of
multi-registrations of handlers a registration handle is introduced.
It is returned when adding a handler and can later be used to remove
the registration again. This handle directly stores the created
listener to simplify the deletion.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:
Remark: If as the result of a handler removal a complete informer
should be disabled it is higly recommended to incorporate
pull request #98653, which fixes a race condition when stopping
watches for an informer using the stop channel.

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 1, 2021
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Hi @mandelsoft. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 1, 2021
@fedebongio
Copy link
Copy Markdown
Contributor

/assign @kevindelgado
/triage accepted

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@fedebongio: GitHub didn't allow me to assign the following users: kevindelgado.

Note that only kubernetes members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

Details

In response to this:

/assign @kevindelgado
/triage accepted

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 2, 2021
@kevindelgado
Copy link
Copy Markdown
Contributor

/assign @DirectXMan12

Copy link
Copy Markdown
Contributor

@kevindelgado kevindelgado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Uwe, great minds think alike (or fools seldom differ)! I submitted an almost identical PR a bit before yours over here at #98536. We can go ahead with yours though since I think it might be a bit more complete than mine, but I’ve left a few comments/questions on some of the parts where we differ.

I was adding this change as part of some other work I had been doing related to giving finer grained control over the lifecycle of SharedIndexInformers at #97214 I’d love for you to take a look or give it a review that if you have time and are interested.

I’m tagging Yu Liao from sig-api-machinery on this because he has been helping review my other work in this area (and I’m not an org member so I don’t have approval access).

Copy link
Copy Markdown
Contributor

@kevindelgado kevindelgado Feb 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the advantages are of extending the SharedIndexInformer Interface (rather than just adding to the SharedIndexInformer interface directly. I can't think of a reason why an informer would not want to expose this and it seems a bit unnecessary to manage more than one interface.

Copy link
Copy Markdown
Contributor Author

@mandelsoft mandelsoft Feb 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not know whether the informer interface is used by other implementations, also. In such a case extending the interface would break all those other implementations.

This was the reason I decided not to extend the existing interface, but introduce an extended one.

If you don't see a compatibility problem here, I surely can extend the existing interface, instead,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know in kubernetes/kubernetes there are no other implementations, not sure about outside of k/k. I'll leave it @yliaog or someone from sig-api-machinery to decide.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is a breaking change if extending the SharedIndexInformer Interface directly. For breaking change, a few options: 1) send an email to kubernetes-sig-api-machinery@google.com, or kubernetes-dev@google.com 2) discuss it in biweekly sig-api-machinery meeting. 3) add to release notes about the breaking change

I think it is cleaner to extend the SharedIndexInformer directly, but it's better to check the community to ensure it won't break people unintentionally.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've sent I mail to kubernetes-sig-api-machinery@google.com referring to this PR

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll put it on the agenda; I can see either breaking compat or adding a "EventHandlerRemovable" as above, but I can't see making this "ExtendedSharedIndexInformer" interface; what will we do when we add another method, make a "ExtendedExtendedSharedIndexInformer"?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious what your reasoning and specific use-case is for exposing IsStopped and IsStarted publicly. I hadn't needed or thought to do so.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It gives you more control managing informers besides factories, once you have bare informers in your hands. and you want to explicitly create/delete watches and/or add/remove handlers again.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have specific use case in mind for EventHandlerCount, IsStopped, IsStarted? add/remove handlers are already supported via AddEventHandler/RemoveEventHandler, could you elaborate on the need for create/delete watches?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The creation and deletion of watches is not part of the PR, it could completely be done on top. The scenario for the usage of such a feature are controllers that are controlled by other resources. Just an example, you want to provide indices of arbitrary kubernetes resources, based on a dedicated CRD. Once an index is created for a resource, a watch has to be established (or if already existent, only an additional handler might be configured). If the index is deleted again, the handler and, if the watch is not used by other controllers in the same controller manager, the watch should be removed again. Similar actions would be required, if a controller is potentially responsible for multiple clusters, depending on certain conditions provided by other resources.

The deletion of watches is possible just by a dedicated cancel context (on top) used to initiate the watch and the informer. This works fine (beside the problem described in #98653) and is minimal invasive. So, all this is potentially possible just by adding the feature of being able to remove handlers again.

If a traditional InformerFactory is used this is for sure not possible without extending the factory, too. But this PR is the precondition to propagate such a feature up the abstraction levels and focuses on the informer itself.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the explanations. i'm not sure i fully understand the use case, two quick questions:

  1. do you still need a separate watch if informer is used? put it another way, wouldn't an informer for a resource sufficient, hence no need to use both an informer and a watch on the same resource?
  2. since this PR is about removing event handlers, probably it's better to have a separate PR for supporting watches creation/deletion if needed. what do you think?

@kevindelgado
Copy link
Copy Markdown
Contributor

/unassign @DirectXMan12
/assign @yliaog

@k8s-ci-robot k8s-ci-robot assigned yliaog and unassigned DirectXMan12 Feb 2, 2021
@kevindelgado
Copy link
Copy Markdown
Contributor

lgtm % a final decision on extending/addding to the shared informer interface. Leaving that up to a reviewer sig-api-machinery.

@yliaog
Copy link
Copy Markdown
Contributor

yliaog commented Feb 3, 2021

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 3, 2021
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have specific use case in mind for EventHandlerCount, IsStopped, IsStarted? add/remove handlers are already supported via AddEventHandler/RemoveEventHandler, could you elaborate on the need for create/delete watches?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is a breaking change if extending the SharedIndexInformer Interface directly. For breaking change, a few options: 1) send an email to kubernetes-sig-api-machinery@google.com, or kubernetes-dev@google.com 2) discuss it in biweekly sig-api-machinery meeting. 3) add to release notes about the breaking change

I think it is cleaner to extend the SharedIndexInformer directly, but it's better to check the community to ensure it won't break people unintentionally.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the explanations. i'm not sure i fully understand the use case, two quick questions:

  1. do you still need a separate watch if informer is used? put it another way, wouldn't an informer for a resource sufficient, hence no need to use both an informer and a watch on the same resource?
  2. since this PR is about removing event handlers, probably it's better to have a separate PR for supporting watches creation/deletion if needed. what do you think?

@mandelsoft
Copy link
Copy Markdown
Contributor Author

@yliaog With the actual state for the pure watch removal no further changes are required. The additional Is... methods just completes the SharedIndexInformer interface concerning the handler registration allowing to work with bare informers.
For sure, there is no additional watch required besides the one behind the informer. But during dynamic handler registration/deregistration it is required to know whether the informer has still other handlers and whether it has already been started to decide whether it needs to be stopped to ensure the watch is canceled. This way these operations just complete the lifecycle api of a shared informer.

The optional extension of the upper API layer, the informer factories, to support cancelling of dedicated informers should certainly be done separately. But the factory so far does not know about handler registrations, which are directly performed on the bare informers. Therefore it might be difficult to completely hide the informer management in the factories, besides the feature of stopping an informer (and removing it from the factory again).
This one just provides a cohesive and clear interface for a shared informer enabling building things on top.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Aug 22, 2021
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 21, 2021
Copy link
Copy Markdown
Contributor

@lavalamp lavalamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. The interface seems OK now so I reviewed the implementation, too.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it do any harm to remove it anyway?

In any case, this doesn't seem worth returning an error to the user over.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it does, if this is removed the addCh of the listener is closed twice, during the stop and the remove. This could be fixed by restructuring the existing code. But I would prefer to delegate this to a separate PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this, we don't set these to nil anymore so this won't optimize anything, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just to avoid nil pointer dereferencing in case of some strange problem.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason not to switch the lists to maps/sets? e.g. make the type map[*processorListener]struct{}. Then this check and the removal are easy.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the original coding, I just kept this. May be this could be refactored in a separate PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should only need to do this second search if found is true, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just kept to handle this in a more defensive way. This should really not be performance critical.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can break here unless the listener can get in the loop twice? (another reason to switch to maps)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

break added and i-- removed. Basically the former coding was more defensive.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would just wait in a loop until the condition is true, sleep 50ms between each check. Waiting a second and then failing might be flaky.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've generalized the ok methods with for a polled check with timeout to use this here, also.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated above, I don't think this should be an error.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error case removed. But I left the error return to be prepared for further checks. For example whether the handle is valid for the actual informer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test: add and remove handlers while an informer is running and processing items, to detect any locking problems.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests added for nested add and remove of handlers. May be this is what you wanted.

To be able to implement controllers that are dynamically deciding
on which resources to watch, it is required to get rid of
dedicated watches and event handlers again. This requires the
possibility to remove event handlers from SharedIndexInformers again.
Stopping an informer is not sufficient, because there might
be multiple controllers in a controller manager that independently
decide which resources to watch.

Unfortunately the ResourceEventHandler interface encourages to use
value objects for handlers (like the ResourceEventHandlerFuncs
struct, that uses value receivers to implement the interface).
Go does not support comparison of function pointers and therefore
the comparison of such structs is not possible, also. To be able
to remove all kinds of handlers and to solve the problem of
multi-registrations of handlers a registration handle is introduced.
It is returned when adding a handler and can later be used to remove
the registration again. This handle directly stores the created
listener to simplify the deletion.
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 6, 2021
@atoato88
Copy link
Copy Markdown

Gentle ping @mandelsoft
How about current state for this PR?

@mandelsoft
Copy link
Copy Markdown
Contributor Author

mandelsoft commented Nov 10, 2021

@atoato88 I added requested changes and rebases in October and added some comments to conversations to ask whether this is ok. But I didn't get answers up to now. Should I resolve those conversations? From my point of view I'm done, I just add requested changes to bring this PR to an end.

@dims
Copy link
Copy Markdown
Member

dims commented Jan 4, 2022

/assign @lavalamp

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 25, 2022
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@mandelsoft: PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@atoato88
Copy link
Copy Markdown

atoato88 commented Feb 2, 2022

@mandelsoft
This PR needs rebase.
Could you do that?

@k8s-triage-robot
Copy link
Copy Markdown

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 3, 2022
@atoato88
Copy link
Copy Markdown

atoato88 commented May 5, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 5, 2022
@lavalamp
Copy link
Copy Markdown
Contributor

lavalamp commented Jun 8, 2022

/reopen

I think I was OOO when this last had activity and didn't see it

@FillZpp
Copy link
Copy Markdown

FillZpp commented Jun 9, 2022

👀 on this.

We have an issue kubernetes-sigs/controller-runtime#1884 that proposes to remove watches dynamically, but controller-runtime couldn't do this unless client-go supports removal of event handlers.

@alexzielenski
Copy link
Copy Markdown
Member

@mandelsoft Do you have plans to continue contributing to this PR? If not I am happy to continue this work in another PR.

/assign alexzielenski

@alexzielenski
Copy link
Copy Markdown
Member

other pr is merged so closing this

/close

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@alexzielenski: Closed this PR.

Details

In response to this:

other pr is merged so closing this

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Projects

None yet

Development

Successfully merging this pull request may close these issues.