Handle the new session after session expiry#770

Merged
vmaheshw merged 12 commits into linkedin:master from vmaheshw:reinitNewSession
Aug 3, 2021

Conversation

@vmaheshw
Collaborator

This is the final change to handle a new session after session expiry. This change re-initializes all local state, listeners, and event threads, and makes the node rejoin the cluster.
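
The re-initialization described above can be pictured with a small in-memory model. This is only a sketch under stated assumptions, not the code from this PR: `FakeCluster`, `SessionAwareNode`, `onSessionExpired`, `onNewSession`, and the listener names are all hypothetical, and the event thread is represented by a counter rather than a real thread.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory stand-in for the cluster's live-instance registry. */
class FakeCluster {
    final Set<String> liveSessions = new HashSet<>();

    void join(String sessionId) { liveSessions.add(sessionId); }

    void expire(String sessionId) { liveSessions.remove(sessionId); }
}

/** Hypothetical node that rebuilds all session-scoped state when a new session starts. */
class SessionAwareNode {
    private final FakeCluster cluster;
    private final List<String> cachedAssignments = new ArrayList<>();
    private final List<String> listeners = new ArrayList<>();
    private String sessionId;         // null while disconnected
    private int eventLoopGeneration;  // stands in for a restarted event thread

    SessionAwareNode(FakeCluster cluster) { this.cluster = cluster; }

    /** Called when the old session is gone: drop everything tied to it. */
    void onSessionExpired() {
        sessionId = null;
        cachedAssignments.clear();  // local state may be stale by now
        listeners.clear();          // listeners registered on the old session are dead
    }

    /** Called once a new session is established: rebuild and rejoin. */
    void onNewSession(String newSessionId) {
        sessionId = newSessionId;
        listeners.add("assignment-listener");     // hypothetical listener names
        listeners.add("live-instance-listener");
        eventLoopGeneration++;       // a real node would start a fresh event thread here
        cluster.join(newSessionId);  // rejoin the cluster as a new live instance
    }

    void cacheAssignment(String task) { cachedAssignments.add(task); }

    int cachedAssignmentCount() { return cachedAssignments.size(); }

    int listenerCount() { return listeners.size(); }

    int eventLoopGeneration() { return eventLoopGeneration; }

    String sessionId() { return sessionId; }
}
```

The point of the model is ordering: everything tied to the expired session is discarded first, and only then does the node register listeners again and rejoin, so nothing cached under the old session leaks into the new one.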

DEEPTHIKORAT previously approved these changes Nov 5, 2020
somandal previously approved these changes Nov 7, 2020
Contributor

@somandal somandal left a comment

I recall we had a discussion earlier where we said that when a node's session expires and the node then reconnects, it may need to update the locks it used to hold (if it still has the same tasks) to indicate that it's the new live instance, right? Should this fix be addressed here?

The PR where we discussed this: #747
Look for this comment:
"Quick question, if a Session expiry happens, the _instanceName remains the same? Just wondering if we could have a case where we're trying to release the lock but an expiry + connect happened before we call this, creating a new liveinstance node for this host. Will the task still be releasable?"

Not sure if such a fix is required, but it'll be great if you can validate and explain why or why not.

@vmaheshw vmaheshw dismissed stale reviews from somandal and DEEPTHIKORAT via 3a8030f January 25, 2021 19:50
@vmaheshw vmaheshw requested review from DEEPTHIKORAT and somandal May 7, 2021 18:51
@somandal
Contributor

@vmaheshw can you please look at this comment and leave a response about whether this is a concern or not? If it is, please address it. If it is not, please explain why not. I just want to ensure that there are no weird race conditions that we need to think about here, even though from what I understand this shouldn't be a problem.

@vmaheshw
Collaborator Author

vmaheshw commented Jun 7, 2021

Sorry, I forgot to reply. Yes, if the expiry and reconnect happen before the node tries to release the lock, the release will fail with the error "Not the owner" (because the owner was the previous instance). The task will then move to some other instance, and that instance, while trying to acquire the lock, will find it to be an orphan lock and force-acquire it.
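
To make that sequence concrete, here is a minimal in-memory sketch of the behavior described in the reply. It is an illustration under assumptions, not Brooklin's lock implementation: `TaskLockTable`, `AcquireResult`, `markLive`, and `markDead` are hypothetical names, and it assumes the lock records the live-instance identity of its owner and that this identity changes when a new session is created.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical lock table keyed by task, recording the live instance that owns each lock. */
class TaskLockTable {
    enum AcquireResult { ACQUIRED, FORCE_ACQUIRED, HELD_BY_LIVE_OWNER }

    private final Map<String, String> ownerByTask = new HashMap<>();
    private final Set<String> liveInstances = new HashSet<>();

    void markLive(String instance) { liveInstances.add(instance); }

    void markDead(String instance) { liveInstances.remove(instance); }

    String ownerOf(String task) { return ownerByTask.get(task); }

    AcquireResult acquire(String task, String instance) {
        String owner = ownerByTask.get(task);
        if (owner == null || owner.equals(instance)) {
            ownerByTask.put(task, instance);
            return AcquireResult.ACQUIRED;
        }
        if (!liveInstances.contains(owner)) {
            // Orphan lock: the recorded owner is no longer a live instance, so take it over.
            ownerByTask.put(task, instance);
            return AcquireResult.FORCE_ACQUIRED;
        }
        return AcquireResult.HELD_BY_LIVE_OWNER;
    }

    void release(String task, String instance) {
        String owner = ownerByTask.get(task);
        if (!instance.equals(owner)) {
            // After expiry and reconnect, the caller is a new live instance and no longer matches.
            throw new IllegalStateException("Not the owner");
        }
        ownerByTask.remove(task);
    }
}
```

In this model, a host that reconnects under a new identity fails to release its old lock, but the lock does not stay stuck: once the previous identity is no longer live, the next instance assigned the task gets `FORCE_ACQUIRED`.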

@vmaheshw vmaheshw closed this Jun 7, 2021
@vmaheshw vmaheshw reopened this Jun 7, 2021
@vmaheshw vmaheshw merged commit 457ac60 into linkedin:master Aug 3, 2021
vmaheshw added a commit to vmaheshw/brooklin that referenced this pull request Mar 1, 2022
3 participants