Refactor WorkerServer #1658

@Technoboy-

Background

WorkerServer executes tasks by scanning ZK and the DB. When a WorkerServer starts, it tries to acquire a lock in ZK and then executes tasks by loading data from the DB. This is a poor fit for a distributed system, and the current implementation delays task execution.

Suggestion

We propose refactoring WorkerServer to use TCP channels.

General Implementation Idea

  1. Use Netty as the TCP framework.
  2. MasterServer keeps its current logic; when it picks a task, it sends the task directly to a target worker chosen by a round-robin policy.
  3. WorkerServer starts up as a member of a predefined group and registers itself under a ZK node.
  4. WorkerServer starts a TCP server that listens on a port for tasks to execute, instead of scanning ZK and the DB.
  5. The execution result is sent back to the MasterServer node over the channel on which the task arrived.
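
As a rough illustration of the round-robin dispatch in step 2, here is a minimal sketch. The class name `RoundRobinWorkerSelector`, the `select` method, and the use of plain `host:port` strings for worker addresses are all assumptions made for the example; the worker list would come from the ZK registration in step 3, which is not shown.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: picks the next worker address in round-robin order.
public class RoundRobinWorkerSelector {

    // Shared counter so concurrent dispatch threads advance the rotation safely.
    private final AtomicInteger counter = new AtomicInteger(0);

    /**
     * @param workers current worker addresses (assumed to be read from ZK)
     * @return the address the next task should be sent to
     */
    public String select(List<String> workers) {
        if (workers == null || workers.isEmpty()) {
            throw new IllegalStateException("no registered workers");
        }
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), workers.size());
        return workers.get(index);
    }
}
```

The list is passed in on every call rather than cached, so a worker that joins or leaves the ZK group is picked up on the next dispatch. The rotation is only approximately fair when the list changes size between calls.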

General Failover Idea

  1. A task counts as acknowledged only after the WorkerServer receives the task command and sends back an ack command.
  2. If the WorkerServer executes the task normally, it sends the result back over the channel on which the task arrived.
  3. If the WorkerServer dies after receiving a task, the MasterServer pings the WorkerServer once the execution timeout elapses to check liveness. If the ping fails, it tries another worker node. In this case, the task may execute more than once.
  4. If the MasterServer dies after sending out a task, the WorkerServer retries up to N times to rebuild the channel to the original MasterServer. If all retries fail, it chooses a new MasterServer to send the result to. The new MasterServer analyzes the task and decides the next step (stop or continue execution by instanceId/processId, or just update the status).
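
The worker-side behaviour in step 4 could look roughly like the sketch below. Everything here is hypothetical: `ResultReporter`, `report`, and the `send` callback (which stands in for "rebuild the channel and write the result", returning `true` on success) are names invented for the example, and master addresses are plain strings.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical helper: delivers a task result with retry and master failover.
public class ResultReporter {

    /**
     * @param originalMaster address of the master that sent the task
     * @param masters        all known master addresses (assumed to come from ZK)
     * @param maxRetries     N, the number of attempts against the original master
     * @param result         serialized task result
     * @param send           stand-in for reconnect-and-send; true means delivered
     * @return the master that accepted the result, or null if none did
     */
    public static String report(String originalMaster,
                                List<String> masters,
                                int maxRetries,
                                String result,
                                BiPredicate<String, String> send) {
        // First, retry the original master up to N times.
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (send.test(originalMaster, result)) {
                return originalMaster;
            }
        }
        // Then fall back to any other master; it decides what to do with the task.
        for (String master : masters) {
            if (master.equals(originalMaster)) {
                continue;
            }
            if (send.test(master, result)) {
                return master;
            }
        }
        return null;
    }
}
```

A real implementation would likely add a delay between retries and persist the result if no master accepts it; both are left out to keep the sketch short.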
