Skip to content

Conversation

@yahoNanJing
Copy link
Contributor

Which issue does this PR close?

Closes #1910.

Rationale for this change

Details see #1910.

What changes are included in this PR?

Details see #1910.

Are there any user-facing changes?

It's blocked by #1908 and #1909

pub(crate) codec: BallistaCodec<T, U>,

// for in-memory cache
executors_metadata: Arc<RwLock<HashMap<String, ExecutorMetadata>>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should executor metadata be persistent? Seems like it could be useful to speed up a scheduler restart since it would have to wait for executors to re-register but it seems like in the case of push-based scheduling we need to wait for heartbeats anyway and in the poll-based scheduling we wouldn't schedule work until the executor registers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @thinkharderdev for reviewing this PR. Actually the executor metadata is persistent to some backend storage. However, for fast reading, they are cached in memory. For all of those states in PersistentSchedulerState, firstly they will be persistent. Then they will be cached in memory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that part makes sense. I just am wondering whether it makes sense to, on a scheduler restart, to reload the executor state from persistent storage rather than let the executors re-register. In the latter case, executor metadata could be volatile state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose there's a cluster deployed on k8s in a standalone mode and we want to redeploy the scheduler. It would be better for the scheduler to reload the topology info from persistent storage. Whether the executor metadata be overdue or not, it should be managed by the executor heartbeat.

@alamb
Copy link
Contributor

alamb commented Mar 7, 2022

Let's try and get #1909 merged as soon as possible so you don't have to manage a large chain of PRs

The work going into Ballista these days is pretty exciting!

@yahoNanJing
Copy link
Contributor Author

Hi @alamb and @thinkharderdev, could you help review this PR?

@thinkharderdev
Copy link
Contributor

Sorry, meant to approve earlier. I'm good with this PR.

@alamb
Copy link
Contributor

alamb commented Mar 10, 2022

This seems like it is moving code around and thus largely unobjectionable. Merging it to keep things moving.

Thanks @yahoNanJing and @thinkharderdev

cc @liukun4515 @realno @mingmwang

@alamb alamb merged commit 962c018 into apache:master Mar 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Ballista] Refactor scheduler state

3 participants