-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Refactor scheduler state mod #1913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| pub(crate) codec: BallistaCodec<T, U>, | ||
|
|
||
| // for in-memory cache | ||
| executors_metadata: Arc<RwLock<HashMap<String, ExecutorMetadata>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should executor metadata be persistent? Seems like it could be useful to speed up a scheduler restart since it would have to wait for executors to re-register but it seems like in the case of push-based scheduling we need to wait for heartbeats anyway and in the poll-based scheduling we wouldn't schedule work until the executor registers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @thinkharderdev for reviewing this PR. Actually the executor metadata is persistent to some backend storage. However, for fast reading, they are cached in memory. For all of those states in PersistentSchedulerState, firstly they will be persistent. Then they will be cached in memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that part makes sense. I just am wondering whether it makes sense to, on a scheduler restart, to reload the executor state from persistent storage rather than let the executors re-register. In the latter case, executor metadata could be volatile state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose there's a cluster deployed on k8s in a standalone mode and we want to redeploy the scheduler. It would be better for the scheduler to reload the topology info from persistent storage. Whether the executor metadata be overdue or not, it should be managed by the executor heartbeat.
|
Let's try and get #1909 merged as soon as possible so you don't have to manage a large chain of PRs The work going into Ballista these days is pretty exciting! |
|
Hi @alamb and @thinkharderdev, could you help review this PR? |
|
Sorry, meant to approve earlier. I'm good with this PR. |
|
This seems like it is moving code around and thus largely unobjectionable. Merging it to keep things moving. Thanks @yahoNanJing and @thinkharderdev |
Which issue does this PR close?
Closes #1910.
Rationale for this change
Details see #1910.
What changes are included in this PR?
Details see #1910.
Are there any user-facing changes?
It's blocked by #1908 and #1909