Skip to content

Clean up licensing APIs#109121

Merged
DaveCTurner merged 2 commits intoelastic:mainfrom
DaveCTurner:2024/05/28/licensing-trappy-timeouts
May 29, 2024
Merged

Clean up licensing APIs#109121
DaveCTurner merged 2 commits intoelastic:mainfrom
DaveCTurner:2024/05/28/licensing-trappy-timeouts

Conversation

@DaveCTurner
Copy link
Copy Markdown
Member

Fixes trappy timeouts related to #107984 and also cleans up all the
indirection via unnecessary request builders.

Fixes trappy timeouts related to elastic#107984 and also cleans up all the
indirection via unnecessary request builders.
@DaveCTurner DaveCTurner added >non-issue :Security/License License functionality for commercial features >tech debt v8.15.0 labels May 28, 2024
@DaveCTurner DaveCTurner requested a review from jakelandis May 28, 2024 16:15
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-security (Team:Security)

@elasticsearchmachine elasticsearchmachine added the Team:Security Meta label for security team label May 28, 2024
Copy link
Copy Markdown
Contributor

@jakelandis jakelandis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing this. LGTM, but a couple follow up questions (I can follow up).

Not sure I understand the difference between the timeout applied on the MasterNodeRequest vs. the submitTask. Should we be passing in request.masterNodeTimeout() to the MasterServiceTaskQueue#submitTask in the following:

https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/core/src/main/java/org/elasticsearch/license/ClusterStateLicenseService.java#L317
https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/core/src/main/java/org/elasticsearch/license/ClusterStateLicenseService.java#L348
https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/core/src/main/java/org/elasticsearch/license/ClusterStateLicenseService.java#L362

Also, I am correct to assume that the ackTimeout (w.r.t. cluster state updates) only applies for AckedClusterStateUpdateTask and AckedBatchedClusterStateUpdateTask tasks ? I think PostStartBasicRequest could be a MasterNodeRequest instead of a AcknowledgedRequest since it is the wrong kind of "acknowledged" since StartBasicClusterTask is not acked.

@DaveCTurner
Copy link
Copy Markdown
Member Author

Not sure I understand the difference between the timeout applied on the MasterNodeRequest vs. the submitTask.

The timeout on the MasterNodeRequest applies to routing the request from the coordinating node to the master (i.e. it's how long it waits to discover a new master if the old master fails) whereas the one on the submitTask call determines how long the task will sit in the elected master's pending task queue waiting for other tasks to finish processing before it gives up and fails. That said, in almost every other API we pass the ?master_timeout request parameter to both of these places, so the behaviour of these APIs is inconsistent.

Should we be passing in request.masterNodeTimeout() to the MasterServiceTaskQueue#submitTask in the following:

Yes. Probably. The trouble is that today once these tasks make it to the elected master they will today sit in the queue for as long as it takes to execute them. That's likely an oversight, but it's one which dates back a very long way. Changing this to align with the common ?master_timeout behaviour in most other APIs would drop this timeout to 30s and that might cause some visible breakage in cases where the master is busy doing other tasks.

Also, I am correct to assume that the ackTimeout (w.r.t. cluster state updates) only applies for AckedClusterStateUpdateTask and AckedBatchedClusterStateUpdateTask tasks ?

Technically it applies to all tasks that implement ClusterStateAckListener and to all task executors that supply an ack listener on success via ClusterStateTaskExecutor.TaskContext#success(Runnable, ClusterStateAckListener)), but mostly that's AckedClusterStateUpdateTask and AckedBatchedClusterStateUpdateTask indeed.

I think PostStartBasicRequest could be a MasterNodeRequest

Possibly, but possibly also the tasks which adjust the license should be made into acked tasks? It'd make sense to me to have these tasks wait for the updated license to be applied on all nodes, rather than just on the master as happens today.

@DaveCTurner DaveCTurner merged commit 0e03920 into elastic:main May 29, 2024
@DaveCTurner DaveCTurner deleted the 2024/05/28/licensing-trappy-timeouts branch May 29, 2024 07:14
@DaveCTurner DaveCTurner restored the 2024/05/28/licensing-trappy-timeouts branch June 17, 2024 06:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>non-issue :Security/License License functionality for commercial features Team:Security Meta label for security team v8.15.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants