BR won't clean up the environment when exit by SIGTERM #557
Description
Please answer these questions before submitting your issue. Thanks!
- What did you do?
If possible, provide a recipe for reproducing the error.
  - Start BR (restore or backup with --remove-schedulers).
  - Wait for the progress bar to appear, then press Ctrl+C.
- What did you expect to see?
The cluster config changed by BR should be undone, since SIGTERM allows us to stop gracefully.
- What did you see instead?
The cluster is stuck with the config that BR set. (On current master, PD schedulers could be reset thanks to "scheduler: use pause instead of remove schedulers" #551.)
- What version of BR and TiDB/TiKV/PD are you using?
v4.0.7
Note:
We listen to signals here:
Lines 34 to 39 in d2d5bba
    case syscall.SIGTERM:
        cancel()
        os.Exit(0)
    default:
        cancel()
        os.Exit(1)
Canceling the context could make other goroutines eventually exit and clean up, but we leave no time for them.
Adding a time.Sleep(30 * time.Second) or removing those os.Exit calls could help. But there are still some problems:
Lines 222 to 227 in d2d5bba
    restore, e := mgr.RemoveSchedulers(ctx)
    defer func() {
        if restoreE := restore(ctx); restoreE != nil {
            log.Warn("failed to restore removed schedulers, you may need to restore them manually", zap.Error(restoreE))
        }
    }()
We use the global context for the cleanup tasks, so they will always fail once the outer context is canceled. We should run the cleanup with a new context that has a timeout; the timeout could be the same as the sleep time before exiting.
