Refactor WorkerServer #1658
Closed
Labels
new feature
Description
Background
WorkerServer executes tasks by scanning ZK and the DB. When a WorkerServer starts, it tries to acquire a lock in ZK and then executes tasks by loading data from the DB. This polling model is a poor fit for a distributed system, and the current implementation delays task execution.
Suggestion
We want to refactor WorkerServer to use a TCP channel.
General Implementation Idea
- Use Netty as the TCP framework.
- MasterServer keeps its current logic; when it picks a task, it sends the task directly to the target worker, chosen with a round-robin policy.
- WorkerServer starts up as a member of a predefined group and registers itself under a ZK node.
- WorkerServer starts a TCP server that listens on a port for tasks to execute, instead of scanning ZK and the DB.
- The execution result is sent back to the MasterServer node over the same channel the task arrived on.
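The round-robin dispatch step could look roughly like the sketch below. It is only an illustration: the class name `RoundRobinWorkerSelector`, the `select` method, and the use of `host:port` strings for worker addresses are assumptions, not part of the existing code base. The worker list is assumed to come from the group's ZK registration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin selector over the workers registered in one group. */
class RoundRobinWorkerSelector {
    // Shared counter so concurrent dispatch threads still rotate fairly.
    private final AtomicInteger counter = new AtomicInteger(0);

    /** Returns the next worker address, cycling through the list in order. */
    String select(List<String> workers) {
        if (workers == null || workers.isEmpty()) {
            throw new IllegalStateException("no worker registered in this group");
        }
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), workers.size());
        return workers.get(index);
    }
}
```

With three registered workers, successive calls to `select` return the first, second, and third address and then wrap around to the first again. The selected address would then be used to open (or reuse) the Netty channel to that worker.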
General Failover Idea
- A task counts as acknowledged only after the WorkerServer has received the task command and returned an ack command.
- If the WorkerServer executes the task normally, it sends the result back over the same channel.
- If the WorkerServer dies after receiving a task, MasterServer pings the WorkerServer once the execution timeout elapses to check liveness. If the ping fails, it tries another worker node. In this case, the task may execute more than once.
- If the MasterServer dies after sending out a task, WorkerServer retries up to N times to rebuild the channel to the original MasterServer. If all retries fail, it chooses a new MasterServer and sends the result there. The new MasterServer analyzes the task and decides the next step (stop or continue execution by instanceId/processId, or just update the status).
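The worker-side fallback in the last bullet can be sketched as follows. This is a minimal illustration under stated assumptions: `ResultReporter`, `report`, and the `trySend` callback are hypothetical names, and the callback stands in for "rebuild the channel and send the result", which in a real implementation would be done through Netty.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper that decides which master receives a task result. */
class ResultReporter {
    /**
     * Tries the original master up to maxRetries times, then each other
     * known master once. Returns the address that accepted the result,
     * or null if no master could be reached.
     */
    static String report(String originalMaster,
                         List<String> allMasters,
                         int maxRetries,
                         Predicate<String> trySend) {
        // Phase 1: retry the master that sent the task (the "N times" above).
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (trySend.test(originalMaster)) {
                return originalMaster;
            }
        }
        // Phase 2: fall back to any other master; it decides the next step.
        for (String master : allMasters) {
            if (!master.equals(originalMaster) && trySend.test(master)) {
                return master;
            }
        }
        return null;
    }
}
```

The sketch omits any delay between retries. A real worker would probably wait or back off between attempts, and would need to keep the result locally until some master accepts it.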