feat: introduce state for generator and new grafana SA generator#3982
feat: introduce state for generator and new grafana SA generator#3982
Conversation
da3b860 to
8cb33ba
Compare
|
Nice! Three questions:
|
It has no state, there's no state to store or anything to clean up.
At the moment it's just a flag for the controller, hence static across all PushSecrets and ExternalSecrets. That's just the simplest and straight-forward thing, we can even consider to make it part of the
Not sure yet if it is even possible. I think it's quite the pain to make it work with ExternalSecret. |
0a8b3c0 to
ed9a1a4
Compare
101d94c to
f1a6963
Compare
|
What happens if the Status of an object is changed? The status field is an elusive thing. Using that as a store is problematic at times. The status is regarded as something that we should be able to regenerate to the same value on each reconcile attempt. For example, having an endpoint in there that other controllers then can use to reach something. A SHA or something that doesn't change. A condition, etc... Thus, when it's missing, it can just be regenerated. Status is supposed to be a view from the Kubernete's point of the current system. In this case, the status will store state for something that comes from a different thing that if it's running again, will result in a new thing if it's missing and could mis existing things, can it not? |
|
So what I'm thinking about is the following scenario:
|
|
What I'm trying to say is that we are basically turning the status field into a key-value store. Basically graceperiod is flimsy since the status can disappear or can be regenerated and then all the stored config is lost. We are also increasing yaml serialization and size for each CR when we store the spec in the field. Wouldn't it be better if we would use annotations or a specific CR for this? What if, instead of serializing it into the status field, we create a separate CR saying generator tracking or something and store that information in the spec of that CR? If you think it's okay that the field might be lost and this is temporary data at best, I guess it's fine. :) Just keep these things in mind and don't rely on the field being there all the time. :) |
Thank you for your input! You're right with all your concerns! The problem is that we can not re-build the generator state, not all providers allow us to store ownership metadata on resources we generate, e.g. grafana service account tokens. For some providers it could work, e.g. AWS IAM Keys with tags or GCP Service Accounts with labels/annotations. I think it's ok if the status gets lost in some corner cases. Users don't typically fumble around with statuses (other than reading from it), so i don't expect that the status suddenly vanishes. I would add that behaviour to the docs: ESO may not be able to 🧹 clean up generated contents in the following cases:
I would categorize 1, 2, 3 and 5 as user errors. 4 is a risk we can not control. I think dedicated resources for generator status would help with some of the limitations, but i think this comes with a cost: more resources and complexity, more documentation and code, training for the user that they need to be aware of that etc. |
027f30c to
1ea5d1b
Compare
1ea5d1b to
36dd28b
Compare
|
I'm going to find some time to do this this weekend. 😊 |
|
Ugh, so sorry for the conflict. :/ |
|
Actually those came from the controller API thingy... |
|
I used: Where the token is one with basically apiVersion: generators.external-secrets.io/v1alpha1
kind: Grafana
metadata:
name: test
spec:
url: "moolen.grafana.net"
auth:
token:
name: "grafana-token"
key: "token"
serviceAccount:
name: "foo"
role: "Editor"
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: stateful-generator
spec:
refreshInterval: 1m
dataFrom:
- sourceRef:
generatorRef:
apiVersion: generators.external-secrets.io/v1alpha1
kind: Grafana
name: testedit Once applied and reconciling you should be able to see how service accounts come and go (ever counting ids) |
|
Thanks! |
Skarlso
left a comment
There was a problem hiding this comment.
I had a deeper look and have a few questions. :)
| func (m *Manager) MoveStateToGC(resource *apiextensions.JSON, stateKey string, gen v1alpha1.Generator, state v1alpha1.GeneratorProviderState) { | ||
| m.internalState = append(m.internalState, QueueItem{ | ||
| Commit: func() error { | ||
| return m.ImmediateMoveStateToGC(resource, stateKey, gen, state) | ||
| }, | ||
| }) | ||
| } | ||
|
|
||
| // ImmediateMoveStateToGC will move the generator state to GC immediately. | ||
| func (m *Manager) ImmediateMoveStateToGC(resource *apiextensions.JSON, stateKey string, gen v1alpha1.Generator, state v1alpha1.GeneratorProviderState) error { |
There was a problem hiding this comment.
These two are a bit confusing API and usage wise. Like the first one sets Commit function the second one actually performs and Enqueue but at the look of the API I wouldn't have thought that.
Like, Move doesn't actually do anything until you call Commit. I would call this a setup or a set commit or something like that to make apparent that you'd have to do something actually afterwards.
WDYT?
There was a problem hiding this comment.
I agree, that's bad design/naming. Let me rethink that.
|
Meh, found a major issue after rebasing.. the status update mechanics have changed and status updates get dropped. I'll need to dig this 🐇 🕳️ |
|
@moolen I think most of the times in that PR the markAsFailed is not used. The reason behind that was that it would mess with the reconcile time I believe. Maybe that was an error?! |
36dd28b to
62dea0e
Compare
62dea0e to
a739884
Compare
Signed-off-by: Moritz Johner <beller.moritz@googlemail.com>
a739884 to
a372c4e
Compare
|
|
WRT #4086
Let me illustrate the important bits from ES's Reconcile() func: externalSecret := &esv1beta1.ExternalSecret{}
err = r.Get(ctx, req.NamespacedName, externalSecret)
// the PR removed this bit right here which seems superfluous, but see below
// for a specific case where this might cause issues.
err = r.SubResource("status").Get(ctx, externalSecret, externalSecret)
if err != nil {
log.Error(err, "failed to get status subresource")
return ctrl.Result{}, err
}
// ...things
defer func(){
// the status update goes straight to kube-apiserver, and does not update the cache.
// if we change the status AND requeue immediately (see below), then we requeue with the old status.
_ = r.Status().Update(ctx, externalSecret)
if updateErr == nil {
return
}
}()
// in some conditions we do requeue immediately. This should happen rarely, but it can
return ctrl.Result{Requeue: true}, nilBy (Kubernetes) design, a controller should be able to reconstruct the status. In our case we store stateful information With this PR introducing another stateful field to the status subresource, this bug becomes more severe, triggering ~30-50% of the time due to many changes in I think we shouldn't reintroduce the With that being said, i've made the decision to rewrite the state handling using a custom resource, as it seems to be less error prone and more closely follows the Kubernetes design principles. |
|
I'm gonna close this PR and open a new one with the new implementation ⌛ |



Stateful Generator Proposal
This PR is a propsal to introduce state for generators alongside a implementation for a new stateful generator for managing grafana service accounts.
📜 You can find the design proposal here