Skip to content

Make GC occasionally refresh its discovery information#46000

Closed
lavalamp wants to merge 3 commits intokubernetes:masterfrom
lavalamp:gc-poll-types
Closed

Make GC occasionally refresh its discovery information#46000
lavalamp wants to merge 3 commits intokubernetes:masterfrom
lavalamp:gc-poll-types

Conversation

@lavalamp
Copy link
Copy Markdown
Contributor

ref #44507 - this should unblock AA and TPR even though it's not a full fix.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 17, 2017
@k8s-github-robot
Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lavalamp

Details Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels May 17, 2017
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've had issues caching empty returns. If the list is empty, don't cache it. Same fo rresources.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What causes empty lists?

This is going to be updated every 30s so it should be ok if it forgets stuff for 30s

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What causes empty lists?

This is going to be updated every 30s so it should be ok if it forgets stuff for 30s

Server start seems to start listening for discovery before registering groups. It would be pretty bad if GC didn't work for a 30 second window for core types on startup. It ought not be a difficult check to add.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like the wrong fix, server could be in the middle of registering groups, too, and then we'd get some right and some wrong for 30s.

If an empty list is an error, then apiserver should probably not return 200s on empty lists?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks off. Why are you always fresh? Deserves a comment if its right.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I think I see. Setting this means that you're never being invalidated by higher logic. Definitely needs a comment explaining it and how its to be used.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly the interface didn't document what this is supposed to do...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the code that calls Fresh I still don't understand what it is supposed to do.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why? Just for the cached return on error? I'd rather see that handled by outter logic that handles the error.

@deads2k
Copy link
Copy Markdown
Contributor

deads2k commented May 18, 2017

I think that this needs at least one e2e test that creates a TPR and proves that GC collects it.

@lavalamp
Copy link
Copy Markdown
Contributor Author

thanks for feedback.

we've got to stop making unwieldy e2e tests. This should be testable as a unit or integration test in the GC, which I should add.

@ncdc
Copy link
Copy Markdown
Member

ncdc commented May 19, 2017

@lavalamp I don't think this is sufficient - the graph builder currently only sets up its monitors to list/watch resources when the GC controller is first instantiated, but it doesn't refresh right now.

@lavalamp
Copy link
Copy Markdown
Contributor Author

@ncdc good point, I will fix that in another PR.

@lavalamp lavalamp force-pushed the gc-poll-types branch 2 times, most recently from b67ad9b to 31800f9 Compare May 19, 2017 22:34
Copy link
Copy Markdown
Contributor

@caesarxuchao caesarxuchao May 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a better definition of Fresh(). Could you update the comment in the interface as well?

The current comment in the interface describes a narrower case that's implemented by CachedDiscoveryClient, could you move the current comment in the interface to there?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current comment makes no sense to me whatsoever, so ok.

@lavalamp
Copy link
Copy Markdown
Contributor Author

@mikedanese any idea why bazel can't find the new import?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lavalamp what does DiscoveryInterface provide ? Is this related to service discovery or api discovery ?

@k8s-github-robot k8s-github-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 22, 2017
@lavalamp lavalamp force-pushed the gc-poll-types branch 2 times, most recently from 07e95d7 to 89134ac Compare May 22, 2017 21:08
@mikedanese
Copy link
Copy Markdown
Member

Looks like you figured it out.

@lavalamp
Copy link
Copy Markdown
Contributor Author

@mikedanese yes, I did, sorry for the noise. (git commit -a doesn't actually add new files, duh...)

defer d.lock.RUnlock()
cachedVal, ok := d.groupToServerResources[groupVersion]
if !ok {
return nil, ErrCacheEmpty
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks more like a NotFound than ErrCacheEmpty.

@k8s-github-robot
Copy link
Copy Markdown

@lavalamp PR needs rebase

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 29, 2017
deletableResources := discovery.FilteredBy(discovery.SupportsAllVerbs{Verbs: []string{"delete"}}, preferredResources)
deletableGroupVersionResources, err := discovery.GroupVersionResources(deletableResources)
if err != nil {
return nil, fmt.Errorf("Failed to parse resources from server: %v", err)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: lower-case

@deads2k
Copy link
Copy Markdown
Contributor

deads2k commented Jun 1, 2017

@lavalamp are you going to come back to this as a bugfix after freeze?

@lavalamp
Copy link
Copy Markdown
Contributor Author

lavalamp commented Jun 1, 2017

@deads2k I am not sure. My calendar is pretty booked over the next two weeks. :/ Let me see if Dawn will buy this is a bug fix.

@ash2k
Copy link
Copy Markdown
Member

ash2k commented Jun 2, 2017

Please consider moving the client to client-go repo so that it is more reusable. See kubernetes/client-go/issues/200

@ironcladlou
Copy link
Copy Markdown
Contributor

@lavalamp do you have any update on your plans for this PR? Is there anything I can do to help move it forward?

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@lavalamp: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
Jenkins Bazel Build 31800f978d61b3c09d714356d8a438cfe9ab7491 link @k8s-bot bazel test this
Jenkins verification 31800f978d61b3c09d714356d8a438cfe9ab7491 link @k8s-bot verify test this
Jenkins GCE etcd3 e2e 31800f978d61b3c09d714356d8a438cfe9ab7491 link @k8s-bot gce etcd3 e2e test this
pull-kubernetes-bazel 246241e link @k8s-bot pull-kubernetes-bazel test this
pull-kubernetes-kubemark-e2e-gce 246241e link @k8s-bot pull-kubernetes-kubemark-e2e-gce test this
pull-kubernetes-e2e-kops-aws 246241e link @k8s-bot pull-kubernetes-e2e-kops-aws test this
pull-kubernetes-unit 246241e link @k8s-bot pull-kubernetes-unit test this
pull-kubernetes-verify 246241e link @k8s-bot pull-kubernetes-verify test this
pull-kubernetes-federation-e2e-gce 246241e link @k8s-bot pull-kubernetes-federation-e2e-gce test this
pull-kubernetes-node-e2e 246241e link @k8s-bot pull-kubernetes-node-e2e test this
pull-kubernetes-e2e-gce-etcd3 246241e link @k8s-bot pull-kubernetes-e2e-gce-etcd3 test this

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

}
}

type monitorSingle struct {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// monitorSingle wraps a cache.Controller and adds another stop channel (in addition to that of Run(stopCh)). It will stop
// if either of them is closed. 

}

func (gb *GraphBuilder) HasSynced() bool {
for _, monitor := range gb.monitors {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this misses a lock. Or at the very least it should be documented that it assumes it being taken.

func (gb *GraphBuilder) Run(stopCh <-chan struct{}) {
gb.monitorLock.Lock()
defer gb.monitorLock.Unlock()
gb.globalStop = stopCh
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something is fishy: NewGarbageCollector calls syncMonitors which depends on globalStop.

absentOwnerCache *UIDCache
}

type stopMux struct {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this stopMux thing does not make the code easier to understand IMO. Just inline the mux logic into monitorSingle.Run.

It might even be much simpler to have one Go routine waiting on the global lock and shutting down all monitors in a loop. Any special reason that you want to have this logic per monitor?

}

func NewMetadataCodecFactory() serializer.CodecFactory {
type MetadataCodecFactory struct {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this thing doing?

@@ -0,0 +1,70 @@
/*
Copyright 2016 The Kubernetes Authors.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2017

k8s-github-robot pushed a commit that referenced this pull request Jul 5, 2017
Automatic merge from submit-queue (batch tested with PRs 47700, 48464, 48502)

Add a refreshing discovery client

Introduce a discovery client (implementing `CachedDiscoveryInterface`) which caches discovery information in memory and which can be actively refreshed by the user.

This implementation fetches from discovery upon refresh and could later be improved to maintain updates from a watch.

Extracted from #47665 and #46000 to help reduce the scope of #48065.

```release-note
NONE
```
k8s-github-robot pushed a commit that referenced this pull request Jul 29, 2017
Automatic merge from submit-queue (batch tested with PRs 49538, 49708, 47665, 49750, 49528)

Enable garbage collection of custom resources

Enhance the garbage collector to periodically refresh the resources it monitors (via discovery) to enable custom resource definition GC (addressing #44507 and reverting #47432).

This is a replacement for #46000.

/cc @lavalamp @deads2k @sttts @caesarxuchao 

/ref #48065

```release-note
The garbage collector now supports custom APIs added via CustomeResourceDefinition or aggregated apiservers. Note that the garbage collector controller refreshes periodically, so there is a latency between when the API is added and when the garbage collector starts to manage it.
```
@ironcladlou
Copy link
Copy Markdown
Contributor

This PR was superseded by #47665 and can be closed.

@enisoc enisoc closed this Jul 31, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.