Register privileges in Kibana Platform Security plugin and remove legacy getUser API. #65472
Conversation
Pinging @elastic/kibana-security (Team:Security)
Force-pushed d4c6f7b to e44e540
Force-pushed e44e540 to 44b39b3
Force-pushed 44b39b3 to baa580c
fetchedSpaces => setSpaces({ enabled: true, list: fetchedSpaces }),
(err: IHttpFetchError) => {
  // Spaces plugin can be disabled and hence this endpoint can be unavailable.
  if (err.response?.status === 404) {
note: I moved this change to a separate commit as I want to hear your thoughts. It's "the least evil" solution I could come up with. Alternatively, we can introduce an endpoint that returns isSpacesEnabled based only on the "Spaces Service" that Spaces registers with Security during setup.
I think this is tolerable. It'd make sense (in a follow-up) for us to export a useSpaces hook from the Spaces plugin itself, so that the Spaces plugin at least owns this determination.
The disabled check aside, I could see other plugins benefiting from this hook too.
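For illustration, the piece such a shared hook could own is the decision of whether a spaces-endpoint failure means "Spaces is disabled" or is a genuine error. This is a hypothetical sketch: the helper name and the HttpErrorLike shape are invented here, not part of the Kibana API.

```typescript
// Hypothetical helper (not from the Kibana codebase). The Spaces plugin can
// be disabled, in which case its HTTP endpoints are simply not registered
// and respond with 404, so a 404 is treated as "Spaces disabled" rather
// than a failure worth surfacing to the user.
interface HttpErrorLike {
  response?: { status?: number };
}

function isSpacesDisabledError(err: HttpErrorLike): boolean {
  return err.response?.status === 404;
}
```

Keeping this classification in one place (ideally inside the Spaces plugin) avoids every consumer re-encoding the "404 means disabled" convention.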
validateFeaturePrivileges(allFeatures);
validateReservedPrivileges(allFeatures);

await registerPrivilegesWithCluster(
I think we have a gap in error handling here. The legacy implementation had retry logic (with backoff) in the event of a transient failure, which I think we want here as well. If await registerPrivilegesWithCluster() throws an error, but neither this.status.core$ nor this.license.features$ emit a new value, then Kibana will be stuck in an unusable state from an authorization perspective. We will either have to wait for one of those two observables to change (if that ever happens), or Kibana will have to be restarted in order to reattempt privilege registration.
That's a good point, thanks! I naively expected the status service to catch intermittent 503s and emit new status updates, but it seems not to be the case since you're bringing this up 🙂 I'm not sure how I feel about having this logic (not simple, prone to errors) in every plugin though. Let me think a bit...
> That's a good point, thanks! I naively expected the status service to catch intermittent 503s and emit new status updates
I think it will definitely catch some of these, but I could also imagine a race condition where Core thinks everything is OK, but then we get a network blip where we lose our connection for a very short period of time. Core might not catch this blip if it wasn't doing anything with ES at the time.
It's also possible that the ES connection would be technically OK, but if the cluster is starved for resources, our specific call to register privileges could still fail (disk space watermark, read-only indices, etc.)
> I'm not sure how I feel about having this logic (not simple, prone to errors) in every plugin though. Let me think a bit...
Yeah, that's definitely a tradeoff. That's probably one of the reasons we originally had this as a reusable utility, so that each plugin wouldn't have to implement its own.
> It's also possible that the ES connection would be technically OK, but if the cluster is starved for resources, our specific call to register privileges could still fail (disk space watermark, read-only indices, etc.)
The thing that concerns me most is that Core determines when we first try to perform an action, but if that action fails, responsibility shifts to the consumer, which now decides when and whether it's appropriate to retry. It seems we don't have a way to force Core to check whether ES is available at the time we retry. If ES is struggling for whatever reason, we (every plugin that needs to retry) don't want to make things worse by bombarding it with requests. Ideally, Core should be the single source of truth that decides whether it's appropriate to retry or not.
That's a fair point. @elastic/kibana-platform do you have any thoughts around this?
To summarize, we are waiting for Core to tell us when ES is ready via the exposed CoreStatus service. Once all prerequisites are met, we attempt to make a couple of calls to ES. If these checks fail, though, we would like to be able to retry them at some point in the future, but it's possible that everything appears "fine" from the perspective of CoreStatus, so we will never get an updated value from the Observable.
Having each plugin implement its own retry logic feels a bit complex and error-prone. Aleh also makes a good point that plugins shouldn't exacerbate the problem by retrying whenever they feel it is appropriate, but should instead wait on a signal from Core. The problem is that we don't have a way to alert Core that something went wrong.
@legrego I don't remember us discussing this case. Added a comment to the status service meta-issue #41983 (comment)
cchaos left a comment:
SASS change is fine. Thanks for adding the Sass linting!
let retryTimeout: NodeJS.Timeout;

// Register cluster privileges once Elasticsearch is available and Security plugin is enabled.
this.statusSubscription = combineLatest([
note: presumably we'll get rid of this code in 7.10, so I gave up on learning how to properly leverage retryWhen here and went with a plain setTimeout approach 🙈
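A plain-setTimeout retry could look roughly like the following. This is a minimal sketch, not the actual Kibana implementation: scheduleRegistration, backoffDelayMs, and the delay values are all illustrative.

```typescript
// Hypothetical sketch of retry-on-failure with plain setTimeout.
// A capped exponential backoff avoids hammering a struggling cluster;
// keeping it as a pure helper makes the schedule easy to test.
function backoffDelayMs(attempt: number, baseMs = 100, maxMs = 10000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

type ScheduleFn = (fn: () => void, ms: number) => unknown;

// Attempt registration; on failure, schedule a single retry. Neither core
// status nor license features re-emit on a transient failure, so the plugin
// has to reattempt on its own. A real implementation would also keep the
// timer handle (cf. `let retryTimeout` above) so stop() can clear it.
function scheduleRegistration(
  register: () => Promise<void>, // e.g. registerPrivilegesWithCluster
  schedule: ScheduleFn = setTimeout,
  attempt = 0
): Promise<void> {
  return register().catch(() => {
    schedule(
      () => scheduleRegistration(register, schedule, attempt + 1),
      backoffDelayMs(attempt)
    );
  });
}
```

The injectable schedule parameter is only there so the retry cadence can be exercised in tests without real timers.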
@legrego should be ready for another pass, thanks!

Pinging @elastic/apm-ui (Team:apm)

Pinging @elastic/uptime (Team:uptime)

Pinging @elastic/ingest-management (Team:Ingest Management)
note: I'm running a flaky test suite job for the api_integration ciGroup to make sure this is stable. I worry that we might not be immediately notified of the license change, or that privilege registration will take longer than the subsequent API request to get the next set of privileges.
https://kibana-ci.elastic.co/job/kibana+flaky-test-suite-runner/512/
Once the flaky runner completes, this LGTM! Thanks for the edits!
jen-huang left a comment:
Ingest manager README changes LGTM
💚 Build Succeeded

Thanks! All 42 runs of the flaky test suite passed, but I'll keep an eye on this. In case they start failing, I'll just convert the final
…acy `getUser` API. (elastic#65472)
# Conflicts:
#	x-pack/README.md
#	x-pack/plugins/uptime/README.md
#	x-pack/scripts/functional_tests.js
7.x/7.9.0: 2890d5b
Summary
In this PR we're:
xpack.spaces.enabled)

Note to reviewers: it's a bit easier to review commit by commit (git mv + changes).
If you received a review ping: the main X-Pack API integration test config was migrated to TS and hence required fixing all references to it.
Blocked by: #65461