Description
Feature Request
tikv-server handles SIGTERM and shuts itself down properly. However, it does not inform the PD server that it is exiting before it shuts down. As a result, TiDB servers do not know the store is gone: new queries still try to connect to the dead TiKV pod and only discover that it is dead after a timeout. PD eventually notices as well and starts evicting leaders from the dead TiKV pod. All of this happens within a few seconds, but queries issued during that window are affected and latency increases. I would rather have tikv-server itself ask PD, during a graceful shutdown, to add an evict-leader scheduler, so that PD removes all the leaders before the pod is dead.
Is your feature request related to a problem? Please describe:
This issue comes up on a centrally managed k8s cluster that hosts more than just TiDB clusters, where TiKV uses locally attached disks. When the cluster operators run k8s API server upgrades, the nodes go down for 5 minutes. PD is unaware of this downtime, and until it figures out that the pod is down, queries keep being sent to the dead TiKV. This results in latency spikes lasting ~5s.
Describe the feature you'd like:
When kubectl drain is run on the node, it sends SIGTERM to the TiKV pod. I would like tikv-server to inform PD that it is going down and to have its leaders removed before it exits.
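To make the requested shutdown order concrete, here is a minimal sketch in Python (not TiKV code, which is written in Rust). It assumes two hypothetical callables supplied by the caller: `add_evict_leader`, which would ask PD to add an evict-leader scheduler for this store (the equivalent of running `scheduler add evict-leader-scheduler <store-id>` in pd-ctl), and `leader_count`, which would return how many leaders PD still reports on the store. The function names, the timeout, and the polling interval are all illustrative assumptions.

```python
import signal
import time
from typing import Callable


def drain_leaders(
    add_evict_leader: Callable[[], None],   # hypothetical: asks PD to evict leaders from this store
    leader_count: Callable[[], int],        # hypothetical: leaders PD still reports on this store
    timeout_s: float = 30.0,                # assumed upper bound on how long shutdown may be delayed
    poll_interval_s: float = 0.5,           # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], float] = time.monotonic,
) -> bool:
    """Request leader eviction, then wait until no leaders remain or the timeout expires.

    Returns True if the store reported zero leaders before the deadline.
    """
    add_evict_leader()
    deadline = now() + timeout_s
    while True:
        if leader_count() == 0:
            return True
        if now() >= deadline:
            # Give up and let shutdown proceed; the pod is going away regardless.
            return False
        sleep(poll_interval_s)


def install_sigterm_handler(drain: Callable[[], bool], shutdown: Callable[[], None]) -> None:
    """On SIGTERM, drain leaders first and only then run the normal shutdown path."""

    def handler(signum, frame):
        drain()
        shutdown()

    signal.signal(signal.SIGTERM, handler)
```

The timeout matters because kubectl drain only waits for the pod's termination grace period; if leaders cannot be moved in time, the sketch falls back to the current behavior and shuts down anyway.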
Describe alternatives you've considered:
We have already gone through https://docs.pingcap.com/tidb-in-kubernetes/stable/maintain-a-kubernetes-node#if-the-node-storage-can-be-automatically-migrated-1, but this requires manual intervention before the node goes down, which is not possible because the k8s clusters are centrally managed.