-
Notifications
You must be signed in to change notification settings - Fork 5k
[Bug] [Master] zookeeper failover error #7534
Copy link
Copy link
Closed
Labels
Waiting for replyWaiting for replyWaiting for replybugSomething isn't workingSomething isn't working
Description
Search before asking
- I had searched in the issues and found no similar issues.
What happened
[INFO] 2021-12-22 07:02:38.909 org.apache.zookeeper.ClientCnxn:[1158] - Unable to read additional data from server sessionid 0x200000aec62004e, likely server has closed socket, closing socket connection and attempting reconnect
[INFO] 2021-12-22 07:02:39.010 org.apache.curator.framework.state.ConnectionStateManager:[251] - State change: SUSPENDED
[WARN] 2021-12-22 07:02:39.011 org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperConnectionStateListener:[50] - Registry suspended
[WARN] 2021-12-22 07:02:39.011 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[417] - registry connection state is SUSPENDED, ready to stop myself
[INFO] 2021-12-22 07:02:39.011 org.apache.dolphinscheduler.server.master.MasterServer:[196] - master server is stopping ..., cause : registry connection state is SUSPENDED, stop myself
[INFO] 2021-12-22 07:02:39.681 org.apache.zookeeper.ClientCnxn:[1025] - Opening socket connection to server z-3.bi-kafka.d797mt.c8.kafka.us-west-2.amazonaws.com/192.168.22.252:2181. Will not attempt to authenticate using SASL (unknown error)
[INFO] 2021-12-22 07:02:39.684 org.apache.zookeeper.ClientCnxn:[879] - Socket connection established to z-3.bi-kafka.d797mt.c8.kafka.us-west-2.amazonaws.com/192.168.22.252:2181, initiating session
[INFO] 2021-12-22 07:02:39.686 org.apache.zookeeper.ClientCnxn:[1158] - Unable to read additional data from server sessionid 0x200000aec62004e, likely server has closed socket, closing socket connection and attempting reconnect
[INFO] 2021-12-22 07:02:39.811 org.apache.zookeeper.ClientCnxn:[1025] - Opening socket connection to server z-1.bi-kafka.d797mt.c8.kafka.us-west-2.amazonaws.com/192.168.25.7:2181. Will not attempt to authenticate using SASL (unknown error)
[INFO] 2021-12-22 07:02:39.811 org.apache.zookeeper.ClientCnxn:[879] - Socket connection established to z-1.bi-kafka.d797mt.c8.kafka.us-west-2.amazonaws.com/192.168.25.7:2181, initiating session
[INFO] 2021-12-22 07:02:39.812 org.apache.zookeeper.ClientCnxn:[1158] - Unable to read additional data from server sessionid 0x200000aec62004e, likely server has closed socket, closing socket connection and attempting reconnect
[INFO] 2021-12-22 07:02:41.740 org.apache.zookeeper.ClientCnxn:[1025] - Opening socket connection to server z-2.bi-kafka.d797mt.c8.kafka.us-west-2.amazonaws.com/192.168.20.121:2181. Will not attempt to authenticate using SASL (unknown error)
[INFO] 2021-12-22 07:02:41.741 org.apache.zookeeper.ClientCnxn:[1162] - Socket error occurred: z-2.bi-kafka.d797mt.c8.kafka.us-west-2.amazonaws.com/192.168.20.121:2181: Connection refused
[INFO] 2021-12-22 07:02:42.098 org.apache.dolphinscheduler.remote.NettyRemotingClient:[390] - netty client closed
[INFO] 2021-12-22 07:02:42.098 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[160] - master schedule service stopped...
[INFO] 2021-12-22 07:02:42.104 org.apache.dolphinscheduler.remote.NettyRemotingServer:[243] - netty server closed
[INFO] 2021-12-22 07:02:42.492 org.apache.zookeeper.ClientCnxn:[1025] - Opening socket connection to server z-3.bi-kafka.d797mt.c8.kafka.us-west-2.amazonaws.com/192.168.22.252:2181. Will not attempt to authenticate using SASL (unknown error)
[INFO] 2021-12-22 07:02:42.493 org.apache.zookeeper.ClientCnxn:[879] - Socket connection established to z-3.bi-kafka.d797mt.c8.kafka.us-west-2.amazonaws.com/192.168.22.252:2181, initiating session
[INFO] 2021-12-22 07:02:42.494 org.apache.zookeeper.ClientCnxn:[1299] - Session establishment complete on server z-3.bi-kafka.d797mt.c8.kafka.us-west-2.amazonaws.com/192.168.22.252:2181, sessionid = 0x200000aec62004e, negotiated timeout = 30000
[INFO] 2021-12-22 07:02:42.494 org.apache.curator.framework.state.ConnectionStateManager:[251] - State change: RECONNECTED
[INFO] 2021-12-22 07:02:42.502 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[437] - master node : 192.168.25.200:15678 unRegistry to register center.
[INFO] 2021-12-22 07:02:42.502 org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[286] - master node : /nodes/master/192.168.25.200:15678 down.
[INFO] 2021-12-22 07:02:42.502 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[160] - MASTER node deleted : /nodes/master/192.168.25.200:15678
[INFO] 2021-12-22 07:02:42.502 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[439] - heartbeat executor shutdown
[INFO] 2021-12-22 07:02:42.505 org.apache.curator.framework.imps.CuratorFrameworkImpl:[955] - backgroundOperationsLoop exiting
[INFO] 2021-12-22 07:02:42.505 org.apache.zookeeper.ClientCnxn:[522] - EventThread shut down for session: 0x200000aec62004e
[INFO] 2021-12-22 07:02:42.505 org.apache.zookeeper.ZooKeeper:[693] - Session: 0x200000aec62004e closed
[INFO] 2021-12-22 07:02:42.506 org.quartz.core.QuartzScheduler:[666] - Scheduler DolphinScheduler_$_ip-192-168-25-2001639723938667 shutting down.
[INFO] 2021-12-22 07:02:42.506 org.quartz.core.QuartzScheduler:[585] - Scheduler DolphinScheduler_$_ip-192-168-25-2001639723938667 paused.
[INFO] 2021-12-22 07:02:42.509 com.zaxxer.hikari.HikariDataSource:[350] - DolphinScheduler - Shutdown initiated...
[INFO] 2021-12-22 07:02:42.512 com.zaxxer.hikari.HikariDataSource:[352] - DolphinScheduler - Shutdown completed.
[INFO] 2021-12-22 07:02:42.513 org.quartz.core.QuartzScheduler:[740] - Scheduler DolphinScheduler_$_ip-192-168-25-2001639723938667 shutdown complete.
[INFO] 2021-12-22 07:02:42.513 org.apache.dolphinscheduler.service.quartz.QuartzExecutors:[210] - Quartz service stopped, and halt all tasks
[INFO] 2021-12-22 07:02:42.513 org.apache.dolphinscheduler.server.master.MasterServer:[214] - Quartz service stopped
[INFO] 2021-12-22 07:02:42.515 org.quartz.core.QuartzScheduler:[585] - Scheduler quartzScheduler_$_NON_CLUSTERED paused.
[INFO] 2021-12-22 07:02:42.517 org.apache.dolphinscheduler.remote.NettyRemotingClient:[390] - netty client closed
[INFO] 2021-12-22 07:02:42.517 org.apache.dolphinscheduler.service.log.LogClientService:[74] - logger client closed
[INFO] 2021-12-22 07:02:42.517 org.springframework.scheduling.quartz.SchedulerFactoryBean:[845] - Shutting down Quartz Scheduler
[INFO] 2021-12-22 07:02:42.517 org.quartz.core.QuartzScheduler:[666] - Scheduler quartzScheduler_$_NON_CLUSTERED shutting down.
[INFO] 2021-12-22 07:02:42.517 org.quartz.core.QuartzScheduler:[585] - Scheduler quartzScheduler_$_NON_CLUSTERED paused.
[INFO] 2021-12-22 07:02:42.518 org.quartz.core.QuartzScheduler:[740] - Scheduler quartzScheduler_$_NON_CLUSTERED shutdown complete.
[INFO] 2021-12-22 07:02:42.520 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[139] - StateEventResponseWorker stopped
[WARN] 2021-12-22 07:02:42.522 org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService:[115] - persist task error
java.lang.InterruptedException: null
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService$StateEventResponseWorker.run(StateEventResponseService.java:112)
[INFO] 2021-12-22 07:02:42.522 org.apache.dolphinscheduler.server.master.processor.queue.StateEventResponseService:[120] - StateEventResponseWorker stopped
[ERROR] 2021-12-22 07:02:50.169 org.apache.curator.framework.imps.CuratorFrameworkImpl:[703] - Background exception was not retry-able or retry gave up
java.lang.IllegalStateException: Client is not started
at org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:507)
at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:162)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:969)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:638)
at org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152)
at org.apache.curator.framework.imps.FindAndDeleteProtectedNodeInBackground.execute(FindAndDeleteProtectedNodeInBackground.java:60)
at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:619)
at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:597)
at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:575)
at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:51)
at org.apache.curator.framework.recipes.locks.StandardLockInternalsDriver.createsTheLock(StandardLockInternalsDriver.java:54)
at org.apache.curator.framework.recipes.locks.LockInternals.attemptLock(LockInternals.java:225)
at org.apache.curator.framework.recipes.locks.InterProcessMutex.internalLock(InterProcessMutex.java:237)
at org.apache.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:89)
at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.acquireLock(ZookeeperRegistry.java:203)
at org.apache.dolphinscheduler.service.registry.RegistryClient.getLock(RegistryClient.java:237)
at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.removeNodePath(MasterRegistryClient.java:163)
at org.apache.dolphinscheduler.server.master.registry.MasterRegistryDataListener.handleMasterEvent(MasterRegistryDataListener.java:66)
at org.apache.dolphinscheduler.server.master.registry.MasterRegistryDataListener.notify(MasterRegistryDataListener.java:52)
at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.lambda$subscribe$1(ZookeeperRegistry.java:127)
at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:760)
at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:754)
at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
at org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92)
at org.apache.curator.framework.recipes.cache.TreeCache.callListeners(TreeCache.java:753)
at org.apache.curator.framework.recipes.cache.TreeCache.access$1900(TreeCache.java:75)
at org.apache.curator.framework.recipes.cache.TreeCache$4.run(TreeCache.java:865)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
What you expected to happen
failover success.
How to reproduce
zookeeper failover
Anything else
No response
Version
2.0.1
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
Waiting for replyWaiting for replyWaiting for replybugSomething isn't workingSomething isn't working